Photo by Sasikan Ulevik on Unsplash

Leaky Abstractions

Engineering Insights

Talin
6 min read · Nov 24, 2019

A “leaky abstraction” is one that fails to completely hide the details of the underlying implementation. Before we get into some examples of leaky abstractions, though, let’s first talk about what we mean by an “abstraction”.

An abstract system is one in which we mostly deal with ideas and concepts rather than physical, concrete objects.

Consider for example the history of money: initially trading was accomplished by simple barter — exchanging one concrete item of value for another. This was followed by a series of innovations, each one more abstract than the last: first gold and silver coins, then paper checks, printed currency, and finally electronic funds transfer and credit cards.

Using gold as a medium of exchange is more abstract than bartering goods directly. Why? Because for most people, gold has little practical use. In fact, that lack of use is what makes it a good store of value: it doesn’t get used up. The main reason to hang on to gold is that someone, someday, might want it badly enough to trade something for it. In other words, much of its value lies in an idea.

Transferring funds electronically is even more abstract than traditional money. While it is true that a credit card transaction does have effects on the physical world (electric signals being transmitted through wires), most of the action is in the conceptual, mathematical space — updating two ledgers in two different banks.

In software, we use a slightly different meaning of “abstraction”, because even low-level systems are built out of concepts and ideas. We often use the word “abstract” as a synonym for “general” or “universal”: a more abstract system is one involving general rules and patterns that apply broadly across many cases, rather than being tied to specific problems and solutions. We also sometimes call such systems “higher level”, as if they existed in some Platonic realm, far removed from the real world.

An example of a software abstraction: file systems

In early computers, each type of storage device had its own set of programming interfaces. Writing data to a hard disk required a completely different set of programming techniques compared to writing to a floppy disk.

In modern computers, however, all of these differences have been abstracted away. A program that writes data to a hard drive can just as easily write to a floppy or even a solid-state drive. This is because the programming interfaces for these drives have been generalized: application programs no longer interface directly with specific hardware storage devices; instead, they interface with an abstract “file system” which provides an idealized model of what a drive is. This model includes the idea that a drive is a “container” of files and folders, and that each entry within a folder has a unique name that can be used to access the contents of that entry. The model hides unimportant details, such as how the data is mapped onto the tracks and sectors of the disk.

The model also has to take into account that different storage devices have different capabilities. A hard drive is typically much larger than a floppy, and a recordable CD (CD-R) can only be written once. Fortunately, the file system abstraction accommodates these differences as well.
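To make this concrete, here is a minimal Python sketch of application code written against the file system abstraction. The function name `save_report` and its true/false return convention are hypothetical. The point is that the code never mentions a device, and that capability differences such as a full disk or a read-only medium come back through the generic `OSError` type with standard `errno` codes, not through device-specific interfaces.

```python
import errno
import os
import tempfile


def save_report(path: str, text: str) -> bool:
    """Hypothetical helper: write text to path, return False if the device can't take it."""
    try:
        # Identical calls whether `path` is on an HDD, an SSD, a USB stick or a
        # network share: the OS file system hides how bytes map onto the medium.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        return True
    except OSError as err:
        # Device capability differences arrive through one generic error type,
        # distinguished by errno codes (disk full, read-only file system).
        if err.errno in (errno.ENOSPC, errno.EROFS):
            return False
        raise


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "report.txt")
    print(save_report(path, "hello"))  # True
    with open(path, encoding="utf-8") as f:
        print(f.read())  # hello
```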

Example of a Leaky Abstraction

Let’s imagine that we are going to create a database access library. This library will allow our application to read and write records to various databases. Of course, such libraries already exist (examples are SQLAlchemy for Python and Sequelize for Node.js), but for this example we will imagine that we are going to write our own.

There are many different flavors of databases, and each has a slightly different set of capabilities. In this example, however, we don’t need all of the advanced features of a database; we just need a way to create tables, read and write records, perform basic queries, and so on. This means that we can “abstract over” the differences between database types, and create a simple API that defines a common basic set of operations.

The library will contain adapters for each of the database products we plan to support, each one implementing the simple common API defined above with exactly the same behavior. Our goal is that the application code need not know or care which database is being used. This gives us the freedom to “swap out” one database for another without having to rewrite all of our application code.
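A minimal sketch of such a library in Python might look like the following. Everything here is hypothetical: the `Database` interface and its `create_table`, `insert` and `find` methods are invented for illustration, and because PostgreSQL and MySQL need external drivers, the two adapters use Python’s built-in `sqlite3` module and a plain in-memory dictionary as stand-ins.

```python
import sqlite3
from abc import ABC, abstractmethod


class Database(ABC):
    """Hypothetical common API that every adapter must implement."""

    @abstractmethod
    def create_table(self, table: str, columns: list[str]) -> None: ...

    @abstractmethod
    def insert(self, table: str, record: dict) -> None: ...

    @abstractmethod
    def find(self, table: str, **criteria) -> list[dict]: ...


class MemoryAdapter(Database):
    """Stand-in backend: each table is a list of dicts."""

    def __init__(self):
        self.tables = {}

    def create_table(self, table, columns):
        self.tables[table] = []

    def insert(self, table, record):
        self.tables[table].append(dict(record))

    def find(self, table, **criteria):
        return [r for r in self.tables[table]
                if all(r.get(k) == v for k, v in criteria.items())]


class SqliteAdapter(Database):
    """Stand-in backend built on the standard-library sqlite3 driver."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def create_table(self, table, columns):
        self.conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")

    def insert(self, table, record):
        cols = ", ".join(record)
        marks = ", ".join("?" for _ in record)
        self.conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                          tuple(record.values()))
        self.conn.commit()

    def find(self, table, **criteria):
        where = " AND ".join(f"{k} = ?" for k in criteria) or "1"
        cur = self.conn.execute(f"SELECT * FROM {table} WHERE {where}",
                                tuple(criteria.values()))
        names = [d[0] for d in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]


def register_user(db: Database, name: str) -> list[dict]:
    # Application code depends only on the abstract API, never on a backend.
    db.create_table("users", ["name"])
    db.insert("users", {"name": name})
    return db.find("users", name=name)


for db in (MemoryAdapter(), SqliteAdapter()):
    print(register_user(db, "Ada"))  # [{'name': 'Ada'}] for both adapters
```

`register_user` runs unchanged against either adapter, which is the “swap out” freedom described above. Building SQL from table and column names with f-strings is tolerable here only because those names come from the application, never from user input; a real library would validate them.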

Unfortunately, in this example, we forgot about error handling.

Different databases report errors differently. Each database’s client driver raises its own exception classes: an error in PostgreSQL surfaces as a PostgreSQL exception, whereas an error in MySQL surfaces as a MySQL exception.

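So when you call the library function to write a record, you might get either a PostgreSQL exception or a MySQL exception, depending on which underlying database is being used. Users of the library who have written code to handle MySQL exceptions will be very surprised if one day they start seeing PostgreSQL exceptions instead: their error handlers silently stop matching, and errors they believed were handled now propagate unhandled, possibly crashing the application or leaving data in an inconsistent state. This is a very serious bug, and it is triggered by exactly the kind of database swap the abstraction promised would be safe.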
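The leak is easy to reproduce. In this hypothetical sketch, Python’s built-in `sqlite3` stands in for one database and an in-memory set for another; the class and function names are invented for illustration. `add_user_once` was written and tested against `MemoryAdapter`, so it only knows about `KeyError`, and the driver’s own `sqlite3.IntegrityError` slips straight past it.

```python
import sqlite3


class SqliteAdapter:
    """Hypothetical adapter whose driver errors leak straight through."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")

    def insert_user(self, name):
        # A duplicate key raises sqlite3.IntegrityError, which is not caught here.
        self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


class MemoryAdapter:
    """Hypothetical adapter that signals the same condition with KeyError."""

    def __init__(self):
        self.names = set()

    def insert_user(self, name):
        if name in self.names:
            raise KeyError(f"duplicate user {name!r}")
        self.names.add(name)


def add_user_once(db, name):
    # Written against MemoryAdapter, so it only handles KeyError.
    try:
        db.insert_user(name)
        return "added"
    except KeyError:
        return "already exists"


db = SqliteAdapter()
add_user_once(db, "Ada")
try:
    add_user_once(db, "Ada")
except sqlite3.IntegrityError as err:
    print("leaked:", type(err).__name__)  # leaked: IntegrityError
```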

Thus, the abstraction was “leaky” in that it allowed some aspects of the lower-level system (in this case the exception classes) to leak out of the API.

The correct way to solve this is to include error handling in the design of the abstraction layer. The database library should never allow PostgreSQL or MySQL exceptions to reach the client. Instead, any such exceptions should be caught by the library itself, and converted into generalized exception classes or error codes that are defined as part of the library API.
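Here is one way to plug the leak, as a hypothetical sketch with Python’s built-in `sqlite3` standing in for a real database driver. The library defines its own `DatabaseError` hierarchy (the names are invented for illustration), and each adapter translates driver exceptions at the boundary. Using `raise ... from err` keeps the original driver exception available as `__cause__` for debugging, without making it part of the API.

```python
import sqlite3


class DatabaseError(Exception):
    """Base class for errors that are part of our hypothetical library API."""


class DuplicateKeyError(DatabaseError):
    """Raised by every adapter when a unique key already exists."""


class SqliteAdapter:
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")

    def insert_user(self, name):
        try:
            self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        except sqlite3.IntegrityError as err:
            # In this toy schema the only constraint is the primary key, so an
            # IntegrityError means a duplicate; a real adapter would inspect it.
            raise DuplicateKeyError(f"duplicate user {name!r}") from err
        except sqlite3.Error as err:
            # Any other driver error still surfaces as a library-defined type.
            raise DatabaseError(str(err)) from err


class MemoryAdapter:
    def __init__(self):
        self.names = set()

    def insert_user(self, name):
        if name in self.names:
            raise DuplicateKeyError(f"duplicate user {name!r}")
        self.names.add(name)


def add_user_once(db, name):
    # Client code now handles only the library's own exception types.
    try:
        db.insert_user(name)
        return "added"
    except DuplicateKeyError:
        return "already exists"


for db in (MemoryAdapter(), SqliteAdapter()):
    print(add_user_once(db, "Ada"), "/", add_user_once(db, "Ada"))
```

Both adapters now print `added / already exists`, so the client code keeps working when the backend changes.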

Plugging the leak in this way means that, once again, users of the library can rest assured that their code will not break if the underlying database changes.

Leaky Abstraction Anti-Patterns

Is there a more general way to think about this problem, one that will help us identify potential leaky abstractions in the future?

In the example above, the “leak” happened because there were two different channels of communication between the client and the database, and only one of those channels was properly abstracted. For the database adapter library, the primary information channel was the actual API function calls — the input parameters and return results — used to read and write records into the database. But there was a second, side channel consisting of the exception classes used to report errors.

In fact, a lot of leaky abstractions fail for exactly this reason: there is some side channel or “out of band” communication that bypasses the abstraction. In many cases, the side channel is only activated in unusual or exceptional circumstances. The programmer creating the abstraction didn’t consider these circumstances, and only focused on the normal operation of the system.

Code generators as leaky abstractions

Here’s another example of a side channel. Let’s say you have some nifty software tool that generates a starter application for you. The idea is that instead of taking the trouble to learn a complex framework, you just run this tool, answer a few questions, and it generates all of the source code you need.

Sounds great, huh? However, consider what happens when you discover a bug in the generated code. Maybe it was your bug, or maybe it was a bug in the generator. Whichever it is, now you are going to have to step through that code in the debugger. And that in turn means that you have to understand the code that was generated.

In fact, you might now be worse off than if you had written the code yourself, by hand. Many code generators produce complex, messy code that isn’t designed to be read or understood by human programmers. The generated code also often has to handle a lot of strange contingencies and edge cases since, unlike a human programmer, the generator doesn’t “know” that those edge cases will never happen in your particular project. All of this makes the code hard to follow.

Our code generator abstraction is leaky because it fails to do what was promised: provide a simple, easy-to-understand interface that lets you focus on high-level architecture and keeps you from having to learn all the nitty-gritty details of the generated code. As in the previous example, the problem only arises in an exceptional situation (you detected a bug), and manifests because you’re using a side channel — in this case, the debugger, which has access to all the internal workings of the system.
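A toy example shows how this side channel exposes the generated internals. The generator below (`generate_record_class`) is hypothetical and tiny, but it behaves like the real thing in the ways described above: it emits a defensive check for every field whether or not your project needs it, and when one of those checks fires, the traceback points into source code you never wrote and have never read.

```python
import traceback


def generate_record_class(name: str, fields: list[str]) -> str:
    """Hypothetical toy generator: emits source for a class with checked fields."""
    lines = [f"class {name}:", f"    def __init__(self, {', '.join(fields)}):"]
    for field in fields:
        # The generator can't know which checks your project needs,
        # so it emits one for every field.
        lines.append(f"        if {field} is None:")
        lines.append(f"            raise ValueError('{field} must not be None')")
        lines.append(f"        self.{field} = {field}")
    return "\n".join(lines) + "\n"


source = generate_record_class("Customer", ["name", "email"])
namespace = {}
exec(compile(source, "<generated:Customer>", "exec"), namespace)
Customer = namespace["Customer"]

try:
    Customer("Ada", None)
except ValueError as err:
    frame = traceback.extract_tb(err.__traceback__)[-1]
    # The innermost frame is inside generated code, not in anything you wrote.
    print(frame.filename, frame.lineno)  # <generated:Customer> 7
    print(source.splitlines()[frame.lineno - 1].strip())
```

To understand that error, you have to read the generated source, which is exactly the detail the tool promised to hide.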

Conclusion

There’s no hard-and-fast rule for avoiding leaky abstractions. However, understanding the general idea, and in particular watching for side channels that bypass the abstraction in exceptional circumstances, can help you detect and possibly avoid them in the future.

Talin
Machine Words

I’m not a mad scientist. I’m a mad natural philosopher.