Tuesday, April 5, 2011

Just the Facts Ma'am

When doing object-oriented programming, the hardest things to define are relationships. How does one class relate to another? Does an instance of class A have a bunch of class B? Do objects of class B need to know about the class A objects that hold them? What's the best way to model this relationship?

On the one hand, we want loose coupling. Just throwing in a pointer from B back to A might be convenient, but it adds a dependency. On the other hand, we don't want to avoid the natural relationships between things. We struggle with these concepts regularly.

But there's an even more fundamental question underneath these struggles:

Is what we're trying to accomplish descriptive or behavioral in nature? Is this piece of our code capturing a set of facts? Or is it a collection of actors that act on and react to each other?

First, let's think about what we mean by these. When we have a set of facts, we mean a set of public data describing each fact. If our fact were a piece of information about a baby name, we'd include lots of useful information about that name: for example, its popularity, its culture of origin, its meaning, etc. We also freely build associations between facts and other facts. Let's say we create a separate fact describing everything we care about in the cultures these names come from. Through whatever mechanism is available, we can relate the names back to the culture.
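To make the "facts" idea concrete, here is a minimal Python sketch, assuming frozen dataclasses as the fact mechanism and a culture name as the association key. The class names, fields, and sample values are hypothetical and purely illustrative; they are not drawn from any real baby-name dataset.

```python
from dataclasses import dataclass


# Facts: public, inert data with no behavior. frozen=True makes them read-only.
@dataclass(frozen=True)
class Culture:
    name: str
    region: str


@dataclass(frozen=True)
class BabyName:
    name: str
    popularity_rank: int  # illustrative value, not real data
    meaning: str
    culture: str  # association to a Culture fact via its name (hypothetical key)


cultures = {"Scottish": Culture("Scottish", "Scotland")}
names = [
    BabyName("Angus", 120, "one choice", "Scottish"),
    BabyName("Isla", 15, "island", "Scottish"),
]


# Following an association is just a lookup; nobody is commanded to act.
def culture_of(baby_name: BabyName) -> Culture:
    return cultures[baby_name.culture]
```

Anyone can read these values and follow the `culture` link, but nothing in them "does" anything.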

When we design actors, we care about encapsulation in the true tradition of object orientation. We want to expose a small set of specific behaviors that can be executed. Actors may use facts internally, or in their interfaces as value types. Nevertheless, we treat an actor as a black box; any internal facts it uses are hidden from us. If our fictional baby system let you adopt a virtual baby, the baby itself would be a good example of this. The baby can be fed, burped, and entertained. We have no idea what's going on inside the baby. We just know that sometimes it poops and sometimes it cries. Sometimes feeding, burping, or entertaining helps, but we can't go in and unset the poop bit in the baby, no matter how much we'd like to.
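For contrast, a minimal sketch of the virtual baby as an actor. The public methods mirror the behaviors above; the internal flags and the rule for when the baby cries are hypothetical. In Python the leading underscore only marks state as private by convention, so the "black box" here is a discipline rather than something the language enforces.

```python
class Baby:
    """An actor: a small set of behaviors, with its internal state hidden."""

    def __init__(self) -> None:
        # Internal facts (hypothetical); callers are not meant to touch these.
        self._hungry = True
        self._needs_burp = False
        self._bored = True

    def feed(self) -> None:
        self._hungry = False
        self._needs_burp = True  # eating leads to needing a burp

    def burp(self) -> None:
        self._needs_burp = False

    def entertain(self) -> None:
        self._bored = False

    def is_crying(self) -> bool:
        # Observable behavior only; callers can't see which need caused it.
        return self._hungry or self._needs_burp or self._bored
```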

It should be pretty clear that these are orthogonal concepts. Facts represent information in its most uncluttered form, with no behavior. Actors represent behavior, with information hiding. This orthogonality is the main theme of Chapter 6 of Robert C. Martin's Clean Code, where Martin discusses the differences between working with facts (he calls them data structures) and actors (he calls them objects):

"Objects hide their data behind abstractions and expose functions that operate on that data. Data structures expose their data and have no meaningful functions. Go back and read that again. Notice the complimentary nature of the two deļ¬nitions. They are virtual opposites. This difference may seem trivial, but it has far-reaching implications."

Martin emphasizes the strength of data structures to the object-oriented crowd: we can easily write a procedural function that does something new with the data without impacting the rest of the system. Somebody can take our baby name data and output it in their mobile baby app, because it's just a collection of data. Awesome!
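A rough illustration of Martin's point, assuming baby-name facts modeled as a frozen dataclass: a new, hypothetical consumer (`to_mobile_rows`) formats names for a mobile app without any change to the `BabyName` type or to other code that uses it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BabyName:
    name: str
    meaning: str
    culture: str


def to_mobile_rows(names: list[BabyName]) -> list[str]:
    """A new procedural consumer; BabyName itself never has to change."""
    return [
        f"{n.name} ({n.culture}): {n.meaning}"
        for n in sorted(names, key=lambda n: n.name)
    ]
```

Adding another output format later means writing another function like this one, not touching the data.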

I think the distinction goes even deeper than Martin describes, though. The two aren't just opposites on the encapsulation spectrum; they relate to their brethren in completely orthogonal ways. Facts relate to other facts in a fundamentally different way than actors relate to other actors. Actors use other actors to do work: the baby object is going to use the mouth object to eat. Actors also implement interfaces or override base classes to provide modified behavior. Facts, on the other hand, associate with other facts to indicate additional information: this baby name is Scottish; the other Scottish baby names are ...; these are the people who like Scottish baby names. Facts associate themselves with other facts (i.e., baby names associate themselves with cultures) to provide even more implicit facts. We like our facts this way, and it's natural to operate on information along these lines.
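The two kinds of relationships can be sketched side by side. This is a hypothetical illustration, not a prescribed design: `Mouth` is a `typing.Protocol` standing in for an interface, `SimpleMouth` and its milk-only rule are made up, and `names_in_culture` derives a new fact simply by following the culture association.

```python
from dataclasses import dataclass
from typing import Protocol


# Actors relate by delegation: Baby tells its Mouth to do the work.
class Mouth(Protocol):
    def eat(self, food: str) -> bool: ...


class SimpleMouth:
    # Satisfies the Mouth protocol structurally; no inheritance needed.
    def eat(self, food: str) -> bool:
        return food == "milk"  # hypothetical rule


class Baby:
    def __init__(self, mouth: Mouth) -> None:
        self._mouth = mouth
        self._hungry = True

    def feed(self, food: str) -> None:
        if self._mouth.eat(food):
            self._hungry = False

    def is_hungry(self) -> bool:
        return self._hungry


# Facts relate by association: follow a shared key to derive new facts.
@dataclass(frozen=True)
class BabyName:
    name: str
    culture: str


def names_in_culture(names: list[BabyName], culture: str) -> list[str]:
    return [n.name for n in names if n.culture == culture]
```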

To summarize:
  • From a "Facts" perspective, the universe is a database of useful, interrelated but inert information. Exciting connections can be made between different pieces of information and followed freely.

  • From an "Actors" perspective, the universe is a chain of command with main() as God. From there behavior is delegated to different objects. main() orders an object to do something, which orders someone else to do something. Behavior is commanded in an orderly fashion.

We're dealing with two different social systems. As you can imagine, the worst thing we can do is try to make a fact act like an actor, or vice versa. Along these lines, in Clean Code Martin talks about avoiding "Hybrids" between the two, which he describes as data structures with methods added to them that perform some significant behavior.

I'd go a step further. Now that we see how these two social systems function on different planets, it's even more mind-blowingly dangerous to make hybrids between the two. This is the road to everyone knowing about everything. This is where spaghetti code comes from. The "fact" side of the hybrid makes us want to reach out and know everything about everyone; we need to build associations between interrelated facts. However, the actor side of the hybrid wants to tell things what to do. Put those two together, and we've built something where any object can tell any other object what to do. Suppose we had a Baby object that also had all the stuff about its name globbed into it. Say Baby had accessors for getting the name, the name's originating culture, and so on. From there, we can find all the other Babys in that culture through the hybrid culture object. Now we can add methods in Baby that operate on all Babys! Cruft like this builds and builds in classes like Baby like you wouldn't believe. With hybrids in play, you end up with a big ball of mud. Woohoo!
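Here is a deliberately bad sketch of the hybrid described above, with hypothetical names. Because `Culture` holds references to `Baby` actors, any Baby can walk the fact associations and command every other Baby it reaches.

```python
class Culture:
    """Hybrid 'fact': its association points at actors, not at other facts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.babies: list["Baby"] = []


class Baby:
    """Hybrid: an actor that also carries name facts. This is the anti-pattern."""

    def __init__(self, name: str, culture: Culture) -> None:
        self.name = name
        self.culture = culture
        culture.babies.append(self)  # facts now reach back into actors
        self.fed = False

    def feed(self) -> None:
        self.fed = True

    # Cruft: one Baby commanding every other Baby it can reach through facts.
    def feed_everyone_in_my_culture(self) -> None:
        for other in self.culture.babies:
            other.feed()
```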

This situation arises because Baby starts out as an actor, jumps down into the parallel universe of facts, jumps around the fact network of associations, and then, because our facts are also hybrids, jumps back up to the actor universe and tells those facts, err, I mean actors, err, I mean facts, to do something. We've created a very bad, confusing, object-oriented version of goto.

With that pitfall in mind, when we're back to deciding how objects of a given class should be coupled, we need to ask ourselves why we're coupling them. Is it because they are related pieces of information? If so, then as long as the information stays inert and behavior-less, allow facts to be facts. Let them relate to each other as they would in fact society. Maybe this evolves into a relational database with foreign keys linking rows together, providing additional implicit facts. Maybe it's just a couple of nested structs, generated and passed around as values, summarizing the important facts.
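As one hedged example of the relational route, here is a sketch using Python's built-in `sqlite3` module with an in-memory database; the table names, schema, and rows are hypothetical. SQLite only enforces foreign keys when `PRAGMA foreign_keys` is turned on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE culture (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE baby_name (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    culture_id INTEGER NOT NULL REFERENCES culture(id)
);
INSERT INTO culture (id, name) VALUES (1, 'Scottish'), (2, 'Japanese');
INSERT INTO baby_name (name, culture_id)
VALUES ('Angus', 1), ('Isla', 1), ('Aiko', 2);
""")

# Joining along the foreign key yields an implicit fact:
# which names share the Scottish culture.
rows = conn.execute("""
    SELECT b.name
    FROM baby_name AS b
    JOIN culture AS c ON c.id = b.culture_id
    WHERE c.name = ?
    ORDER BY b.name
""", ("Scottish",)).fetchall()
```

The rows stay inert; any number of queries can derive new facts from them without the data knowing or caring.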

We should also be sure to take our facts out of our actors. Let information stand alone for all to consume and use. Don't muddle the information with behavior, and be careful not to turn behavioral entities into pieces of information. Let actors tell you facts, but make sure those facts can be used independently of the actor. For example, it's natural to use a baby name fact inside the Baby actor, but avoid passing around Babys just to get baby names; use the fact instead. If you don't, you'll soon end up with an insane mass of spaghetti code. Once the facts are separated out, we can respect them for what they're good at: giving us information. And now that we've separated the facts from our actors, *then* we can figure out how the actors should interrelate to implement behavior.
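A minimal sketch of the separated version, with hypothetical names: `Baby` is handed a `BabyName` fact and uses it internally, while code that only needs name information (`name_card` here) takes the fact directly and never touches a Baby.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BabyName:
    """Fact: standalone, inert, usable by anyone."""
    name: str
    culture: str


class Baby:
    """Actor: uses the name fact internally but is not a route to it."""

    def __init__(self, name: BabyName) -> None:
        self._name = name
        self._hungry = True

    def feed(self) -> None:
        self._hungry = False

    def greet(self) -> str:
        return f"Hi, I'm {self._name.name}!"


def name_card(name: BabyName) -> str:
    """A consumer that needs name info takes the fact, never a Baby."""
    return f"{name.name} ({name.culture})"
```

Because `name_card` depends only on `BabyName`, it keeps working no matter how Baby's behavior evolves.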

When we let facts be just facts, we can feel confident navigating their web of knowledge for whatever we need to know. That web becomes the "model" in Model-View-Controller terminology. When we muddle the differences between the facts/model and the actors/controllers, the inertia of software development will begin to create connections between otherwise unrelated pieces of code through the underlying information substrate of our software.