How to program without OOP
In three recent videos, I explained at length why Object-Oriented Programming is generally a bad idea. To finish up the series, I want to address a couple of common counterarguments and then state in positive terms how I think non-OOP code should be written.
Many commenters stepped up to defend OOP by suggesting that, while it may have problems, OOP is still a valid tool for tackling certain problems and one should ‘use the right tool for the job.’ What this defense gets wrong is that OOP is clearly intended as a holistic prescription, a recipe for writing good code. Why would I ever not want to write good code? By what rules or guidelines am I supposed to sometimes do the OO thing or sometimes not? The OO authorities never say.
Another common response from commenters was that, without OOP, code inevitably becomes spaghetti. Fear of the spaghetti monster is a healthy programmer phobia, but OOP doesn’t protect us from spaghetti — instead it merely obscures spaghetti through indirection. Excessive shared state lies at the heart of most spaghetti code, and any sizeable OO program is a complex graph of mutable objects all mutating each other through complex call chains. The mutation may be indirect, but that indirectness only disguises the shared state problem without actually solving it.
So how do we avoid spaghetti code without OOP? Well, as Casey Muratori puts it, the problem with Object-Oriented Programming is not the Objects but the Orientedness: an insistence on shoving all aspects of code into small units of encapsulation called objects. Yes, we should decompose our systems into units of encapsulation, but these units should generally be quite large: typically a few KLOC to 100 KLOC, but sometimes larger and sometimes smaller. There should be no set cap on the size of these units: exceeding N LOC is not itself a code smell that requires code restructuring.
To avoid confusion, let’s call these units of encapsulation ‘modules’. Just like an object, a module has private state, and all interactions with the module from other modules should be performed via its public interface. Whether our language enforces this public/private distinction is not essential, but enforcement can give us a little more peace of mind. In Golang, for example, I treat each package as its own module, and Golang packages only expose elements whose names start with an uppercase letter.
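As a rough sketch of a single-instance module in Golang: the page-hit tracker below is a made-up example, and every name in it (hits, total, Record, Count, Total) is hypothetical. In a real project it would sit in its own package; it is folded into package main here only so that it runs as a single file, which also means the lowercase names are private by convention rather than by enforcement.

```go
// Sketch of a single-instance module (hypothetical page-hit tracker).
// In a real codebase this would be its own package, not package main.
package main

import "fmt"

// Private state: in a separate package, these lowercase names
// would be invisible to every other package.
var (
	hits  = map[string]int{}
	total int
)

// Record is part of the public interface (uppercase name).
func Record(page string) {
	hits[page]++
	total++
}

// Count reports how many hits a single page has received.
func Count(page string) int {
	return hits[page]
}

// Total reports the number of hits across all pages.
func Total() int {
	return total
}

func main() {
	Record("/home")
	Record("/home")
	Record("/about")
	fmt.Println(Count("/home"), Total()) // prints "2 3"
}
```

Other modules never touch hits or total directly; they only call Record, Count, and Total.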
In the large majority of cases, I only need one instance of a module, but if I ever need to instantiate a module as I would a class, that can be arranged. In Golang, when I want a package to be instantiable, I express all its globals as members of a struct and add a ‘constructor’ function; users of the package must first call the constructor to get an instance of the struct, and then all public functions of the package take this instance as the first argument. In this way, I can have a single package that is used by multiple other uncoordinated packages, each using its own instance. (Instantiability is especially desirable for library modules because it allows a dependency to be shared by multiple other unrelated dependencies without creating a shared state conflict.)
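Here is a minimal sketch of that arrangement, using a made-up page-hit tracker. All names (Tracker, NewTracker, Record, Total) are hypothetical, and the code is placed in package main only so that it runs as one file; in practice it would be its own package.

```go
// Sketch of an instantiable module (hypothetical page-hit tracker).
package main

import "fmt"

// Tracker holds what would otherwise be the package's globals.
// Its fields are lowercase, so other packages could not reach them.
type Tracker struct {
	hits  map[string]int
	total int
}

// NewTracker is the 'constructor': callers must obtain an instance first.
func NewTracker() *Tracker {
	return &Tracker{hits: map[string]int{}}
}

// Record takes the instance as its first argument.
func Record(t *Tracker, page string) {
	t.hits[page]++
	t.total++
}

// Total reports the number of hits recorded on this instance only.
func Total(t *Tracker) int {
	return t.total
}

func main() {
	a := NewTracker() // instance owned by one client
	b := NewTracker() // independent instance owned by another
	Record(a, "/home")
	Record(a, "/about")
	Record(b, "/home")
	fmt.Println(Total(a), Total(b)) // prints "2 1"
}
```

Writing these as methods on *Tracker would be the more common Golang idiom and amounts to the same thing, since the receiver is just the first argument under another name.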
This encapsulation and instantiability talk may make modules sound suspiciously like objects, but there are two key differences:
- As mentioned, modules are typically much larger in scale. Unlike in OOP, we are not trying to atomize our code into bite-sized units. It’s a quantitative difference that produces a significant qualitative difference: when our units of encapsulation get too small, our code becomes dominated by interfaces and ceremony rather than actual business. Instead of a large, complex graph of objects, we want a small, simple graph of modules.
- Modules may contain data, but they are not themselves data types. Modules also may contain definitions of data types public to other modules. Unlike in OOP, we are not trying to corral a data type and all of its operations into a single unit of encapsulation. The purpose of module encapsulation is to protect state, not necessarily to keep knowledge of data types private.
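To illustrate the second difference, here is a small sketch of a module that exposes a plain data type while keeping its state private. The order-keeping module and all of its names (Order, Place, Find) are hypothetical, and it lives in package main only so that it runs as one file.

```go
// Sketch: a module that publishes a data type but protects its state.
package main

import "fmt"

// Order is a plain data type, public to other modules.
// Its fields are open: knowledge of the type is not what we are hiding.
type Order struct {
	ID       int
	Customer string
	Cents    int
}

// Private module state: this is what the encapsulation protects.
var (
	orders []Order
	nextID = 1
)

// Place stores a new order and returns a copy of it.
func Place(customer string, cents int) Order {
	o := Order{ID: nextID, Customer: customer, Cents: cents}
	nextID++
	orders = append(orders, o)
	return o
}

// Find returns a copy of the stored order, so callers cannot
// mutate the module's state through the result.
func Find(id int) (Order, bool) {
	for _, o := range orders {
		if o.ID == id {
			return o, true
		}
	}
	return Order{}, false
}

func main() {
	o := Place("ada", 1250)
	o.Cents = 0 // changes only the caller's copy
	stored, _ := Find(o.ID)
	fmt.Println(stored.Cents) // prints "1250"
}
```

Any module can construct, read, and pass around Order values freely, but only Place can change what is stored.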
Though we allow modules to grow very large, sometimes it becomes apparent that a complex module can be simplified if broken up into ‘submodules’. In other cases, a group of modules may be usefully presented to the rest of the system through the public interface of a single ‘meta-module’. However the system gets decomposed, we…
- …do not split large modules into smaller modules out of guilt.
- …keep the graph of module relationships acyclical.
- …as much as possible, avoid sharing the state of a module amongst multiple other modules. (Or more accurately, avoid sharing the state of a module ‘instance’.)
Lastly, just like we avoid atomizing our units of encapsulation, we avoid atomizing our functions. Of course, any piece of code repeated in more than one place is a good candidate for being split into its own function, but we do not endeavor to hold our functions down below some max N LOC, and so if a function is called in just one place, we ask ourselves whether it should simply be inlined where it’s called. We don’t attempt to future-proof our code by speculating about what might be needed in more than one place down the line; instead, we wait until we actually need a separate function before creating a separate function. This does not mean that we never split one-off chunks of business into their own functions: sometimes the logic of a function gets sufficiently complicated that we take opportunities to extract coherent chunks. Again though, we simply wait until we have an actual problem before chopping up our code.
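A small sketch of what this looks like in practice: the hypothetical Summarize function below performs three steps, each used only here, so they stay in one body under comments instead of becoming three single-use helpers. The function and its task are invented purely for illustration.

```go
// Sketch: one function with commented sections instead of one-off helpers.
package main

import (
	"fmt"
	"strings"
)

// Summarize counts case-insensitive word occurrences and formats them
// in first-seen order.
func Summarize(input string) string {
	// Step 1: split the input into words.
	words := strings.Fields(input)

	// Step 2: count occurrences, remembering first-seen order.
	counts := map[string]int{}
	var order []string
	for _, w := range words {
		w = strings.ToLower(w)
		if counts[w] == 0 {
			order = append(order, w)
		}
		counts[w]++
	}

	// Step 3: format the result as "word=count" pairs.
	parts := make([]string, 0, len(order))
	for _, w := range order {
		parts = append(parts, fmt.Sprintf("%s=%d", w, counts[w]))
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(Summarize("go Go gopher")) // prints "go=2 gopher=1"
}
```

If the counting step were later needed somewhere else, that would be the moment to pull it out into its own function.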
So in conclusion, my prescription can be boiled down to three slogans:
- Don’t atomize units of encapsulation.
- Don’t atomize functions/methods.
- Don’t conflate units of encapsulation and data types.
P.S. Despite what some people may have assumed from my videos, I believe objects, classes, polymorphism, and even inheritance can be valid tools in some cases. However, contra OOP, these are niche cases rather than the pervasive default. For example, an ADT has a very clearly defined and well-known public interface, so hiding its data behind a public interface of methods has minor benefits and few downsides. If your language has classes, go ahead and express your ADTs as classes.
P.P.S. If instantiability is sometimes useful, why not just make all modules instantiable by default? Did Java have it right by forcing everything into a class? Even if I don’t need instantiability most of the time, I concede that having it when I don’t need it isn’t much of a problem. What Java got very wrong, though, is expecting the units of encapsulation to almost never exceed a few KLOC, and so Java requires each class to be written in a single file, which is totally unsuitable if our classes might grow to 100 KLOC or beyond. (C# actually fixes this problem with ‘partial’ classes, allowing a single class to be written across multiple source files.) We also need the ability to define data types within our modules, but I suppose Java’s nested static classes fit that bill. So…maybe classes are what we want after all? Java and OOP just have the wrong scale of encapsulation in mind.
P.P.P.S. Object-Oriented Programming is worse than Hitler