When building software that has to stand the test of time, it is wise to assume that change is inevitable. It may come in many forms: the business evolves, engineers come and go, and new technologies emerge. To guard against this last category, it's worth following the "depend on interfaces, not implementations" principle.
As an example, let's imagine that you're building an append-only accounting ledger that persists its entries to storage. For this use case you only need two things: some simple validation of the input record, and then saving it to disk using SQLite as the database. The requirements are clear, so depending on your experience and context you would probably implement this as straightforwardly as possible, with something like:
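A direct implementation might look something like this sketch in Python (class and method names are hypothetical, chosen just for illustration), where validation and SQLite persistence live in the same class:

```python
import sqlite3


class AccountingLedger:
    """Append-only ledger: validates entries and persists them to SQLite."""

    def __init__(self, db_path):
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS entries (account TEXT, amount INTEGER)"
        )

    def append(self, account, amount_cents):
        # Simple validations on the input record.
        if not account:
            raise ValueError("account must not be empty")
        if amount_cents == 0:
            raise ValueError("amount must be non-zero")
        # Persistence is hard-wired to SQLite.
        self._conn.execute(
            "INSERT INTO entries VALUES (?, ?)", (account, amount_cents)
        )
        self._conn.commit()
```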
This is fine as a starting point, but what happens if you want to add support for PostgreSQL? The simplest way would be to just add an if:
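Continuing the hypothetical sketch, the if-based approach could look like this. The PostgreSQL branch is shown schematically: it assumes a driver such as psycopg2, which is not part of the standard library, and is only imported if that branch is taken:

```python
import sqlite3


class AccountingLedger:
    def __init__(self, storage_type, dsn):
        self._storage_type = storage_type
        if storage_type == "sqlite":
            self._conn = sqlite3.connect(dsn)
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS entries (account TEXT, amount INTEGER)"
            )
        elif storage_type == "postgresql":
            import psycopg2  # hypothetical dependency, for illustration
            self._conn = psycopg2.connect(dsn)
        else:
            raise ValueError(f"unknown storage: {storage_type}")

    def append(self, account, amount_cents):
        if not account:
            raise ValueError("account must not be empty")
        if self._storage_type == "sqlite":
            self._conn.execute(
                "INSERT INTO entries VALUES (?, ?)", (account, amount_cents)
            )
        elif self._storage_type == "postgresql":
            # PostgreSQL drivers use %s placeholders instead of ?.
            with self._conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO entries VALUES (%s, %s)", (account, amount_cents)
                )
        self._conn.commit()
```

Notice how every method now needs to know about every backend, and even the SQL placeholder syntax differs between the two branches.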
While not a terrible solution, this is already starting to smell, since we can anticipate that things will only get worse every time we add another storage backend. At this point it seems sensible to introduce an AccountingLedgerStorage interface to hide this chain of ifs behind polymorphism:
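One way to sketch that interface in Python (names hypothetical; the PostgreSQL class again assumes a driver such as psycopg2) is an abstract AccountingLedgerStorage with one concrete class per backend, injected into the ledger:

```python
import sqlite3
from abc import ABC, abstractmethod


class AccountingLedgerStorage(ABC):
    """Thin interface: the ledger only ever needs to save an entry."""

    @abstractmethod
    def save(self, account, amount_cents): ...


class SQLiteStorage(AccountingLedgerStorage):
    def __init__(self, db_path):
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS entries (account TEXT, amount INTEGER)"
        )

    def save(self, account, amount_cents):
        self._conn.execute(
            "INSERT INTO entries VALUES (?, ?)", (account, amount_cents)
        )
        self._conn.commit()


class PostgreSQLStorage(AccountingLedgerStorage):
    def __init__(self, dsn):
        import psycopg2  # hypothetical dependency, for illustration
        self._conn = psycopg2.connect(dsn)

    def save(self, account, amount_cents):
        with self._conn.cursor() as cur:
            cur.execute(
                "INSERT INTO entries VALUES (%s, %s)", (account, amount_cents)
            )
        self._conn.commit()


class AccountingLedger:
    """Business logic only; the storage backend is injected."""

    def __init__(self, storage: AccountingLedgerStorage):
        self._storage = storage

    def append(self, account, amount_cents):
        if not account:
            raise ValueError("account must not be empty")
        self._storage.save(account, amount_cents)
```

Picking a backend now happens once, at construction time, e.g. `AccountingLedger(SQLiteStorage(":memory:"))`, and the ledger itself never mentions SQLite or PostgreSQL again.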
The advantages of this are:
- Proper encapsulation: each concrete storage class contains only the bare minimum needed to manipulate that particular storage. No more SQLite code mixed with PostgreSQL code.
- Separation of concerns: AccountingLedger now contains only the business logic and delegates all interaction with storage to separate classes.
This makes the code easier to understand and test, which leads to future productivity gains: extending it with new behaviour or fixing bugs becomes easier.
With this new design you're also empowering your users (who might be other engineers) to write their own AccountingLedgerStorage! Even if this code had been written before NoSQL was a thing and you had long since stopped maintaining it, your users would still be able to plug in their own implementations and use, for instance, something like Cassandra.
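Assuming the ledger depends only on a single-method storage interface (a hypothetical sketch, with invented names), a user can write and plug in their own backend without touching your code. Here an in-memory implementation stands in for something like a Cassandra-backed one:

```python
from abc import ABC, abstractmethod


# Hypothetical single-method interface the ledger depends on.
class AccountingLedgerStorage(ABC):
    @abstractmethod
    def save(self, account, amount_cents): ...


class AccountingLedger:
    def __init__(self, storage: AccountingLedgerStorage):
        self._storage = storage

    def append(self, account, amount_cents):
        if not account:
            raise ValueError("account must not be empty")
        self._storage.save(account, amount_cents)


# User-written backend: a stand-in for e.g. a Cassandra-backed implementation.
class InMemoryStorage(AccountingLedgerStorage):
    def __init__(self):
        self.entries = []

    def save(self, account, amount_cents):
        self.entries.append((account, amount_cents))


storage = InMemoryStorage()
ledger = AccountingLedger(storage)
ledger.append("cash", 1500)
```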
At Feedzai we apply this principle in quite a few places, which means that if needed we can quickly swap the underlying tools our product uses. Some examples include:
- Distributed consensus: at the moment ZooKeeper is our implementation of choice, but if we wanted some kind of enterprise support we could easily add support for Consul.
- Messaging: our products support both RabbitMQ and ActiveMQ. Running in production? Go with RabbitMQ (it scales horizontally, is reliable, and performs well). Just doing development? ActiveMQ is the way to go, because no setup is needed.
- Storage: a ton of SQL databases (H2, PostgreSQL, Oracle, MySQL, …), plus Cassandra and Amazon's DynamoDB, so that the product adapts to whatever our clients are used to (and prefer) running.
- Distributed data processing: we currently use Spark (and AWS EMR) and will be adding support for Flink as well. While Spark is widely used across the industry, and many people have experience writing jobs for it and managing it, Flink is more suitable for streaming jobs.
However useful this may sound, it was only possible because the people who developed these abstractions took special care to keep the interfaces thin and to make as few assumptions as possible about the systems that would implement them.
You must do this not only to avoid tremendous amounts of work when rolling out a new implementation, but to make it possible at all. As an example, imagine that you are writing a storage abstraction for Cassandra. If you assume that all future implementations will also be highly available, you won't be able to introduce PostgreSQL as a substitute without a substantial rewrite.
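To illustrate the point with a hypothetical sketch: an interface designed with a highly available store like Cassandra in mind might bake availability concepts into its methods, leaving a single-node PostgreSQL backend with no meaningful way to implement them, whereas a thin interface avoids the problem:

```python
from abc import ABC, abstractmethod


# Assumption-laden: written with a replicated, highly available store in mind.
class ReplicatedStorage(ABC):
    @abstractmethod
    def save(self, entry, consistency_level, replica_count): ...
    # A single-node PostgreSQL backend has no meaningful answer for
    # per-write consistency levels or replica counts.


# Thin: assumes nothing beyond "entries can be saved".
class Storage(ABC):
    @abstractmethod
    def save(self, entry): ...
```

Any backend, replicated or not, can implement the thin `Storage`; replication details stay inside the implementations that actually have them.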
In spite of all the advantages of these techniques and all efforts to make our abstractions as generic as possible we have to keep in mind that all non-trivial abstractions are leaky.
The consequence is that no matter how well designed they are, you will eventually encounter a situation where you need to understand the underlying implementation details; you can't just treat the abstraction as a black box.
What do I mean by this? Recently, when extending our distributed data processing engine to support Flink, I had to implement the Map Partition function. It doesn't really matter what the function does. What matters is that it had already been implemented using Spark, and now I had to add support for Flink.
The problem I encountered (despite getting identical results from Flink and Spark) was a hidden but significant difference in behaviour: while the Spark version used lazy iterators (elements are loaded into memory only when needed), the Flink version was not capable of that and instead loaded the entire dataset into memory!
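The difference is the same one you see in Python between a generator and a fully materialised list. This is a simplified analogy, not the actual Spark or Flink code:

```python
import itertools


def lazy_partition(records):
    # Lazy: yields one element at a time; memory use stays constant
    # regardless of how many records flow through.
    for record in records:
        yield record * 2


def eager_partition(records):
    # Eager: materialises every element up front, like the Flink version
    # that loaded the whole dataset into memory.
    return [record * 2 for record in records]


# The lazy version can start consuming an effectively unbounded stream...
first_three = list(itertools.islice(lazy_partition(itertools.count()), 3))
# ...while eager_partition(itertools.count()) would never return.
```

Both functions satisfy the same functional contract ("double every record"), but only one of them survives contact with a dataset larger than memory.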
This is not tolerable when you're dealing with terabytes of data, so we had to get rid of the Map Partition abstraction before it ever reached the hands of a client.
In short, even if an abstraction satisfies a given functional requirement, unless the non-functional ones (performance, availability, and so on) are exactly the same across implementations (which is nearly impossible), you end up with leaky abstractions. That forces your users into the headache of being aware of what's happening underneath, while the whole point of an abstraction is that they shouldn't have to worry about that.
It may sound contradictory, but as you become aware of these pros and cons you will not only design programs better, but also find it easier to understand the libraries and tools you use.
As a caveat for this whole post: over-engineering is a real problem, even if we sometimes go down that road without realising it. Software is only as good as the value it brings to the business, so the techniques outlined in this article are only useful if applied in the right context.
Writing software in an abstract way is an investment that only pays off if you end up taking advantage of those abstractions. The examples here are simple and don't add much to the cost of development, but in complex real-world applications, doing things "The Right Way" on the grounds that the code will be easier to extend later may turn out to be a waste of time.
Overall, this needs to be evaluated on a case-by-case basis; there is no one-size-fits-all answer.