No, Seriously, It's Naming

Thu, Mar 17, 2016 · Tagged Ruby, Clojure · 10 minute read

I just read David Bryant Copeland’s post It’s not Naming That’s Hard—It’s Types, which he wrote in response to Katrina Owen’s What’s in a Name? Anti-Patterns to a Hard Problem, and I feel compelled to say a few words. Katrina’s post offers suggestions for the perennial developer challenge of naming things in code. Along the way, she asserts that including the type of a variable in its name is typically an antipattern, and that “type information is just not that compelling,” a claim David took great exception to.

David argues that the actual problem is we don’t have enough types: “types are a better way to solve the problems Katrina identifies in her post.”

He then proceeds to turn Katrina’s perfectly reasonable bit of code…

def anagrams(subject, candidates)
  candidates.select do |candidate|
    subject != candidate && same_alphagram?(subject, candidate)
  end
end

def same_alphagram?(subject, candidate)
  # ... (not provided, but I want to offer a fair comparison of length)
end

…into this…

class Word
  attr_reader :letters

  def initialize(string)
    @letters = string.chars.map { |char|
      Letter.new(char)
    }
  rescue ArgumentError => ex
    raise ArgumentError, "'#{string}' is not a word: #{ex.message}"
  end

  def to_s
    @letters.map(&:to_s).join("")
  end

  def ==(other_word)
    self.letters == other_word.letters
  end

  def same_alphagram?(other_word)
    self.letters.sort == other_word.letters.sort
  end
end

class Letter
  def initialize(char)
    unless char =~ /\A[[:alpha:]]\z/
      raise ArgumentError, "'#{char}' is not a letter"
    end
    @char = char
  end

  def to_s
    @char
  end

  def ==(other_letter)
    self.to_s == other_letter.to_s
  end

  def <=>(other_letter)
    self.to_s <=> other_letter.to_s
  end
end

def anagrams(word, candidate_words)
  candidate_words.select do |candidate_word|
    word != candidate_word && word.same_alphagram?(candidate_word)
  end
end

And I’m like 😱…

So much nope

But after unflipping my table, taking a long walk, and having a big mug of coffee, I think I’m ready to speak again. The truth is I can understand why David went down this road. In fact, it’s perfectly reasonable Kingdom of Nouns thinking, and possibly something I would have done a few years back. But I’ve come to believe it’s backwards, so I’ll make my case:

Performance

Let’s just get this one out of the way, because it’s the easiest, though not the one I want to focus on: David’s version wraps every single character in a Ruby object, which is spectacularly wasteful of memory.

Sllloooooooooww

Someone’s going to point out that we shouldn’t be using Ruby if we care about performance. This argument is specious: we shouldn’t use Ruby if we care most about performance. I certainly tend to spend machine cycles to save human ones, but don’t assume you’ll never find yourself wishing for better performance in a part of an application where Ruby was generally a good choice.
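If you want to see the per-character overhead for yourself, here’s a rough sketch that counts allocations with GC.stat(:total_allocated_objects) (a CRuby/MRI counter; other Ruby implementations may not provide it). WrappedChar is a hypothetical one-field class standing in for something like Letter, and exact counts vary by Ruby version, so only the relative difference matters.

```ruby
# Hypothetical one-field wrapper standing in for a per-character class like Letter.
class WrappedChar
  def initialize(char)
    @char = char
  end
end

# Returns how many objects the block allocated (CRuby/MRI-specific counter).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

text = "conversationalists" * 100  # 1,800 characters

plain   = allocations { text.chars }
wrapped = allocations { text.chars.map { |char| WrappedChar.new(char) } }

puts "plain strings: #{plain} objects"
puts "wrapped chars: #{wrapped} objects"
```

The wrapped version allocates everything the plain one does, plus one extra object per character and one extra array.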

Naming

The naming situation didn’t even improve; now we just have a proliferation of things to name: Word, Letter, letters, other_word, other_letter, candidate_word, candidate_words. Really! Look again!

Complexity

There’s one very obvious difference between the two solutions: length. Length is not, itself, a deal-breaker, but all things being equal, it is a disadvantage. Every line of code is an opportunity for bugs.

If there’s a bug in both of these implementations, which would you rather work on? The one with two functions¹, testable in isolation, and completely referentially transparent? Or the one with two classes, several times as many lines, and state?

But more importantly, ask yourself: why is there more code?

Look at the methods David had to (re-)implement: the to_s methods, ==, <=>. What do the contents of these methods have to do with finding anagrams? Nothing. This is what’s called “incidental complexity”; this code exists solely to solve challenges introduced by the shape of the solution–it doesn’t actually relate to the details of the problem we’re trying to solve (the “intrinsic complexity”).

Why’d ya have to go and make things so…

Looking at the contents of those extraneous methods, all they do is allow access to underlying capabilities of the (wrapped) String class. Katrina’s version didn’t require this rigmarole because she didn’t wrap the String class.
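To see how much of that is already there, here’s a quick check in plain core Ruby (nothing hypothetical) that every operation Letter re-implements already works on the strings it wraps:

```ruby
letters = "silent".chars

p letters.first.to_s                   # "s": String#to_s returns the string itself
p letters.first == "s"                 # true: value equality, no custom == needed
p("e" <=> "s")                         # -1: String includes Comparable
p letters.sort                         # ["e", "i", "l", "n", "s", "t"]
p letters.sort == "listen".chars.sort  # true: the whole alphagram check
```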

Put it this way: say we have a perfectly good humanoid robot, and we want to teach that robot to serve us gin martinis. Katrina’s solution says: “OK, robot, follow these commands to make the martinis happen.” Simple! David’s solution says: “Robot, I’m gonna throw a bag over your entire body, then cut two arm holes (so you can reach the booze, of course!), then eye holes, and then you can make martinis!”

And if this whole one-step-forward-two-steps-back thing weren’t strange enough, it’s actually even stranger–not just any bartending-capable bag will do–it has to be exactly the one labeled Word².

Totally a robot

If we think about this in the abstract, David’s version is saying to us: “I could do my job perfectly fine as a function that manipulates what you already have, but instead I want you to transform yourself to a new format I invented before I’ll let you use my functions.” Why do we want this?

Think about what we’re doing here: we’re creating two unnecessary new classes just to annotate that certain values can participate in certain interactions. If we really feel it’s important to call these out, programming provides a mechanism for this: it’s called an interface. An interface is a way to tell a type checker or a human that one thing works with another. This idea exists because a value cannot reasonably change its identity (which is what wrapping it in a class does) just to participate in an interaction with a piece of code. Expecting it to assumes that the receiving bit of code is the most important thing in the program and thus the rest of the world should conform to it. This kind of thinking leads to a proliferation of odd, single-use wrapper classes and incompatible objects.

Now, Ruby doesn’t have reified interfaces. It’s built on the idea of duck typing, which says that an interface should be descriptive (“what can you respond to?”) rather than prescriptive (“what are you supposed to respond to?”). We can argue about whether or not that is a good idea, but that is quite definitely the Ruby way, and one can at least say it does lead to some useful and fun tricks (at the expense of safety and informative annotation).

But both duck typing and reified interfaces capture the central point: to the extent possible³, we should avoid demanding that arguments be certain things; we should only demand that they do certain things.
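In Ruby, one way to demand behavior rather than identity is to lean on a conversion protocol such as #to_str. The sketch below is illustrative only: the same_alphagram? body and the Headline class are hypothetical, not code from either post.

```ruby
# Demand behavior, not identity: anything that responds to #to_str
# (Ruby's implicit string-conversion protocol) is accepted.
def same_alphagram?(subject, candidate)
  subject.to_str.chars.sort == candidate.to_str.chars.sort
end

# A hypothetical domain object that does not inherit from String,
# but can still take part because it implements #to_str.
class Headline
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end
end

p same_alphagram?("listen", "silent")               # true
p same_alphagram?(Headline.new("silent"), "listen") # true
p same_alphagram?("listen", "google")               # false
```

Nothing had to be converted into a special class first; the function only cares that its arguments can act like strings.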

I can understand how we got here. There’s a temptation to scope objects down to the bits we care about; to make them clean, and their possible operations easily enumerable. It’s a prophylactic. A form of isolation. But this presumes that we know everything anyone will ever want to do with the objects. And that means we need to know the future! When we really think about writing good software, software that can survive the long-haul of requirements changes, of developer turnover–are we really going to look back and think “thank Minsky I blocked access to the length method on String, that really could have cooked my goose!”, or are we just being overly picky about aesthetics?

One might suggest that we’re abstracting away the details of the String class, and abstraction enables flexibility and code reuse, right? Well, let’s look at the details and decide–starting with a question:

What’s true about a String that’s not true about a Word? Answer: we all know WTF a String is!

But more importantly, so does every debugger, REPL, and test framework–they all know how to work with these things. I can serialize them, transmit them, clone them, visualize them, compare them. I know they’re value objects. I know their concurrency semantics. I know their performance implications.
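To make the REPL and test-framework point concrete, here’s a small sketch comparing a plain String with a hypothetical, stripped-down wrapper (WrappedWord is made up for illustration; it is not David’s Word). It uses only core Ruby: inspect and Marshal.

```ruby
# Hypothetical minimal wrapper that just holds a string.
class WrappedWord
  def initialize(string)
    @string = string
  end
end

word    = "listen"
wrapped = WrappedWord.new("listen")

# REPLs, debuggers and test failure messages show a string as a literal...
puts word.inspect     # "listen"
# ...while the wrapper shows up as an object dump, something like
# #<WrappedWord:0x... @string="listen">
puts wrapped.inspect

# Strings compare by value, even after a serialization round trip.
copy = Marshal.load(Marshal.dump(word))
p copy == word                 # true

# The wrapper falls back to identity comparison (Object#==) unless someone
# remembers to implement ==, so an identical copy counts as "different".
wrapped_copy = Marshal.load(Marshal.dump(wrapped))
p wrapped_copy == wrapped      # false
```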

In David’s post he says:

Strings (and Hashes) are great for exploring your domain, but once you understand your domain, data types will make your code easier to understand and easier to change.

I vehemently disagree. Hold fast to pure data, and only yield ground under exceptional circumstances. In your career you’ll be burned many, many more times by the opacity and statefulness of objects than you will reap the rewards of transparently reworking objects’ innards.

When you’re trying to recreate a complex application state to understand a bug, you’ll be much happier if your data is composed of core data types rather than a graph (possibly with cycles!) of dynamic objects (possibly opaque!) which may be dynamically generating branches as you traverse it.

When you’re trying to test a piece of code in isolation, you’ll want to feed it pure data, not spend hours trying to figure out which series of constructors can manufacture the appropriate tree. In fact, if the piece of code you want to test requires running a constructor function, you literally can’t test it in isolation.⁴
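Here’s a sketch of what testing with pure data looks like. The anagrams body is my own hypothetical stand-in (Katrina’s post doesn’t provide same_alphagram?, so I’ve inlined a sorted-letters comparison); the point is that every input and every expected output is a literal.

```ruby
# Hypothetical implementation, used only to show a data-only test.
def anagrams(subject, candidates)
  candidates.select do |candidate|
    subject != candidate && candidate.chars.sort == subject.chars.sort
  end
end

# Inputs and expected outputs are plain literals: no constructors, no setup.
CASES = [
  ["listen", %w[enlist google inlets banana], %w[enlist inlets]],
  ["listen", %w[listen silent],               %w[silent]],
  ["stone",  [],                              []],
]

CASES.each do |subject, candidates, expected|
  actual = anagrams(subject, candidates)
  unless actual == expected
    raise "anagrams(#{subject.inspect}, ...) returned #{actual.inspect}"
  end
end

puts "all #{CASES.size} cases passed"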

With pure data you can dump a readable version from a live production server, take it home with you, and have a perfect snapshot of a real bug. With a graph of objects? God help you.
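A minimal sketch of that workflow with Ruby’s standard json library, assuming the state is made only of hashes, arrays, strings, numbers, and booleans (the state hash below is invented for illustration):

```ruby
require "json"

# Hypothetical application state built only from core data types.
state = {
  "user"    => { "id" => 42, "roles" => ["admin"] },
  "cart"    => [{ "sku" => "gin", "qty" => 2 }],
  "flagged" => false
}

# Dump a human-readable snapshot (to a log, a file, a bug report)...
snapshot = JSON.pretty_generate(state)
puts snapshot

# ...and later rebuild an equal value on another machine.
restored = JSON.parse(snapshot)
p restored == state  # true: string keys, arrays, numbers and booleans round-trip
```

A graph of custom objects offers no such round trip unless someone writes the serialization for every class in it.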

So, back to the original question–is wrapping values in objects like this composable abstraction?

No! It doesn’t compose at all. Let me say that again: insisting upon object identities is antithetical to the idea of composable abstraction. And this is why the promise of OO as a silver bullet did not come true–we were sold the idea that objectifying something would make it more reusable, but what we ended up with is something we can’t use anywhere but a single location!
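For contrast, here’s a rough sketch of what composition looks like when the data stays plain: the same sorted-letters idea plugs straight into core Enumerable methods to find every anagram group in a list (the word list is arbitrary).

```ruby
words = %w[listen silent enlist google inlets banana stone tones notes]

# Key each word by its alphagram (sorted letters), then keep real groups.
keyed  = words.group_by { |w| w.chars.sort.join }
groups = keyed.values.select { |group| group.size > 1 }

p groups
# [["listen", "silent", "enlist", "inlets"], ["stone", "tones", "notes"]]
```

No adapter classes were needed: group_by, values, and select already know how to work with strings and arrays.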

So what went wrong? Well, as Rich Hickey points out in Simple Made Easy (49m22s), abstraction that just hides things is not important. It’s a kind of faux abstraction. Real abstraction is about not needing to know things. And this code does the exact opposite of that: instead of not caring specifically what it operates on, it chooses to operate on a single new thing. This is the opposite of abstraction.

Now, lest you think I’m coming down on David too hard, I want to mention something: David is a smart guy. You know how I know? He wrote something really good about exactly this problem, and I recommend you go over to his blog and read it. It’s called Dishonest Abstractions are Not Abstractions. He says:

In my book, I encourage the reader to use JavaScript and learn SQL, because the tools given to you by Rails aren’t abstractions—they are extra things to learn that provide at best a marginal increase in productivity, and that productivity only applies during the least time-consuming part of software development: typing in source code.

In your head, substitute “functions” for “JavaScript and learn SQL”, and “wrapper classes” for “Rails”. He continues:

These tools don’t meet any higher-order need a developer has. They provide the ability to execute code only and when compared to the technologies they replace, they appeal more to aesthetics than the ability to better deliver quality software.

Please don’t think I’m being facetious–I think he’s making an important point, and I think that point also happens to apply nicely to the problem at hand.

Well, that’s basically my rant.

I will yield this: one thing we lost along the way is the knowledge about what types we’re expected to use with code. Try a comment. Or even a pre-condition / guard clause. Hold fast to true simplicity–it’s the best friend a developer has.
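Something like the following is what I have in mind: a plain function with a doc comment stating the expected types, plus a guard clause that fails fast. It’s a sketch; the comment format and the respond_to?(:chars) check are my own choices, not anything from Katrina’s or David’s posts.

```ruby
# Finds the anagrams of a word.
#
# subject    - a String (anything that responds to #chars)
# candidates - an Array (or other Enumerable) of Strings
#
# Returns an Array of the candidates that are anagrams of subject.
def anagrams(subject, candidates)
  # Guard clause: state the expectation and fail fast with a clear message.
  unless subject.respond_to?(:chars)
    raise ArgumentError, "subject must respond to #chars, got #{subject.inspect}"
  end

  sorted_subject = subject.chars.sort
  candidates.select do |candidate|
    candidate != subject && candidate.chars.sort == sorted_subject
  end
end

p anagrams("listen", %w[silent google listen])  # ["silent"]
```

The caller keeps passing ordinary strings; the type expectation is documented and enforced without inventing a new class.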

I’ll end with this slide borrowed from Simple Made Easy⁵; I hope you’ll see how well it applies to this conversation:

Simple Made Easy, slide 34

Happy hacking. 👋

Update:

In fairness, I figured I’d share what my solution to this “problem” would look like so I’m equally subjected to scrutiny:

(defn anagrams-for [word candidates]
  (let [normalized-word (sort word)]
    (filter #(and (= (sort %) normalized-word)
                  (not= % word))
            candidates)))

Tear me apart.

¹ In general, don’t use a class when a function will do. Classes are namespaces + functions + mutable state. Don’t give mutable state places to live.
² OK, there’s probably a special place in hell for people who mix metaphors, but you likely know what I mean.
³ The reason for this hedge is that at the very bottom, things may change a touch. If our built-in types aren’t objects, we’ll need to worry about their identities. But that’s fine for value objects.
⁴ This is another thing Rich says really well in Simple Made Easy (~56m): “information is simple. The only thing you can possibly do with data is ruin it. Don’t do it.…If you leave data alone you can build things once that manipulate data and use them all over the place.”
⁵ If you haven’t watched it, go now! It’s an important talk, and one of the reasons I write Clojure.
