The Case for Consistency over Perfection

Matt Schuchhardt
Published in Engineers at Sea
Mar 3, 2021

When working on software projects, you will frequently encounter situations where you must make a decision: do I use the existing code (or system, or framework, or library, or configuration) that is already standard throughout the code base, or should I try to create something new and better? I want to make the case that using well-proven code and configurations provides a massive improvement in the maintainability and long-term viability of a code base.

To be clear, I don’t claim that this is some new revolutionary idea. But I’ve seen enough software systems begin to suffer under their own weight, because too little consideration was given up front to avoiding maintenance redundancies, that I feel it bears repeating. I also couldn’t pass up the opportunity to introduce even more experientially-founded software engineering best practice guidelines into the world.

Before proceeding, I want to define “consistency” more precisely. Consistency is closely related to the “Don’t Repeat Yourself” (or DRY) principle — that is, using abstractions or other patterns to avoid duplicating code or concepts throughout a code base’s architecture. Consistency is a bit more general than DRY, however; for instance, you could scatter your documentation across five different locations and not violate DRY, but it would not be a consistent way to store documentation.

Case study: interfaces

To illustrate this more concretely, I wanted to dive into a couple of examples that our engineering team has encountered, and what some of the org-wide impacts were. One of the core parts of our code base consists of a variety of heavily customized ETL (Extract-Transform-Load, a common data warehousing pattern) pipelines. This was not built up all at once, but rather grew steadily over time. There was a significant amount of similar logic between these ETL processes (as is the case for most ETL processes), but unfortunately, only limited consideration was given to code sharing and to creating a common interface that the ETL processes could use. Instead, each time a new service needed an ETL pipeline, some boilerplate code was copied and pasted from an existing ETL pipeline and modified for the new use case.
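
As a deliberately simplified sketch of what that copy-and-paste pattern looks like (the sources, field names, and functions below are invented for illustration, not our actual pipeline code), each new pipeline repeats the same plumbing and differs only in a small detail:

```python
# Invented, simplified example of the copy-and-paste pattern: two pipelines
# that are nearly identical, differing only in one filtering rule.

def run_orders_etl(source_rows):
    # "extract" is assumed to have happened upstream; source_rows is raw data
    cleaned = [dict(r, amount=float(r["amount"])) for r in source_rows]  # shared transform
    return [r for r in cleaned if r["status"] != "void"]                 # orders-specific rule

def run_invoices_etl(source_rows):
    # copied from run_orders_etl and lightly modified; a bug fix in one
    # function does not automatically reach the other
    cleaned = [dict(r, amount=float(r["amount"])) for r in source_rows]
    return [r for r in cleaned if r["amount"] > 0]                       # invoices-specific rule
```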

On one hand, the advantage was that each ETL pipeline tended to be well-optimized for its integration’s requirements. Unfortunately, since each new ETL pipeline shared only a limited amount of code, fixing a bug or performance problem in one pipeline wouldn’t automatically apply to all existing pipelines. This might be acceptable for one or two pipelines, but when you have 10+ similar-but-isolated pipelines, architecture-wide improvements are difficult at best, and at worst don’t happen at all. The result is that daily maintenance of these systems consumed more and more of our engineering bandwidth. In the end, we created a centralized replacement for our old ETL pipelines which abstracted a significant chunk of these ETL processes into one location. Instead of each ETL process being independent of the others, this central interface allowed us to focus only on the differences between pipelines. The similarities were handled by shared code, which allowed us to iteratively improve the overall performance of the system while reducing ongoing maintenance to a fraction of the original.
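
A minimal sketch of that shared-interface idea (the class and field names are invented; the real system is considerably more involved) might look something like this, with the shared plumbing in one place and each pipeline supplying only what actually differs:

```python
from abc import ABC, abstractmethod

class EtlPipeline(ABC):
    """Shared skeleton: a fix or optimization here reaches every pipeline."""

    def run(self, source_rows):
        # shared transform logic lives in exactly one place
        cleaned = [dict(r, amount=float(r["amount"])) for r in source_rows]
        return [r for r in cleaned if self.keep(r)]

    @abstractmethod
    def keep(self, row):
        """The only piece each concrete pipeline has to implement."""

class OrdersPipeline(EtlPipeline):
    def keep(self, row):
        return row["status"] != "void"

class InvoicesPipeline(EtlPipeline):
    def keep(self, row):
        return row["amount"] > 0
```

The point is not the class hierarchy itself; any mechanism that keeps the shared plumbing in one place (plain functions, composition, a small framework) gives the same payoff.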

Case study: configuration management

This tradeoff extends to system configurations as well. We use Ansible as our primary configuration management system. If you aren’t familiar with Ansible, it enables “configuration as code”, which lets us use templates and simple commands to configure our various machines. Common configuration targets include databases, users and permissions, and even the operating system and the application itself. However, our first few attempts at using Ansible were not very idiomatic; instead of making use of Ansible’s template system, which lets you use variables to drive the configuration based on context, we would effectively copy and paste a new configuration for each machine and each configuration target.
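
This is not Ansible syntax, but as a small Python analogy for the templating idea (the host names and settings below are made up), the difference is between hand-editing a full copy of the configuration for every machine and keeping one shared template that per-machine variables fill in:

```python
from string import Template

# One shared template; every machine's config is rendered from it.
DB_CONF_TEMPLATE = Template(
    "max_connections = $max_connections\n"
    "shared_buffers = $shared_buffers\n"
)

# Per-host variables (invented values), analogous to Ansible host/group vars:
# only the genuine differences between machines live here.
HOST_VARS = {
    "db-prod-1": {"max_connections": 500, "shared_buffers": "8GB"},
    "db-staging-1": {"max_connections": 100, "shared_buffers": "1GB"},
}

for host, variables in HOST_VARS.items():
    print(f"--- {host} ---")
    print(DB_CONF_TEMPLATE.substitute(variables))
```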

The result is that it was very common for the configurations to “drift”, much like the ETL pipelines covered in the previous section. For instance, tweaking or improving a configuration for one database on one machine would only fix things for that one instance, not for any of the other database configurations. Of course, sometimes you want to modify a configuration for only one instance, but in our experience this is relatively uncommon (and simple to do in Ansible anyway) and is rarely worth the difficulty of managing a large number of separate database configurations. Continual improvement of the “common” part of the configuration has dramatically improved the stability of our database and production environments.

Case study recap

To briefly summarize our findings from these case studies, try to picture your code base or configurations as a branching tree structure (after all, isn’t everything in CS representable as a tree of some sort anyway?). Every time you duplicate something from one of the tree’s leaves, you create an additional branch that must be managed and cared for. This is acceptable in small trees; if your sapling only has one or two leaves, it’s often trivial for one engineer to reason about the whole tree. Even better, not having to deal with an abstracted interface is by definition more direct, and generally easier to reason about. “Don’t repeat yourself” is a rule, not a law, and should be applied pragmatically. But once you see the same code resurface three or more times, there’s a very good chance your sapling has grown into an expensive-to-maintain mess.

An innocent pair of copy and paste operations effectively triples the maintenance surface of the original block of code. If you can generalize that block to be reusable instead of copying and pasting it, you keep the upkeep surface small.

In solo or small projects, code duplication isn’t going to kill your project. If you have to update a couple of locations each time you make a data access modification, you already know exactly where those locations are. Where this concept becomes important is when the entire code base’s state no longer fits in one developer’s head. The moment that happens, duplication and code drift become much more of a problem. Hence, this is a particular concern for larger projects and teams of developers. By properly avoiding inconsistent code (and systems, and configs), you reduce the tech debt drag on your teams, which lets you spend less time putting out fires and more time actually moving your projects forward.

If you are an MBA who somehow stumbled onto this article: reducing tech debt and putting effort into high-quality code is what keeps your engineers’ projects from grinding to a halt over time. Maintenance overhead can absolutely be a project killer.

Best practices and recommendations

To make this all a bit more concrete, I wanted to provide some best practices to help guide the tradeoff between consistency and optimization. These don’t have any empirical research behind them, and are based purely on my experience, so take this advice accordingly.

  • Start consistent, then pursue perfection — instead of immediately trying to create your own “better” system, start with the existing standard. If you can modify the existing standard to also fully suit your requirements, you can take advantage of your development team’s economy of scale while minimizing the maintenance surface.
  • Only duplicate code or configurations up to one time — if you need to use a (relatively small) code chunk or configuration in two different locations, it may just be a coincidence or a one-off. But if you need to copy the code to a third location, you’ve established a pattern, and it’s highly likely that you should be abstracting it to a central location (see the sketch after this list). I think allowing this in up to two locations does give you some artistic license around the “do I abstract this” question, as abstractions are necessarily more difficult to reason about than more linear code. But once you repeat yourself three times, it’s really hard to make a case for not centralizing things.
  • Don’t create a new interface or abstraction unless you know you need it — the counterpoint to this discussion is that heavily abstracted code that doesn’t benefit from the abstraction can be just as damaging as duplicated code. Abstractions make code harder to reason about, and they provide no benefit if only a single system ever uses them.
  • Never duplicate code unless you deeply understand it — if you are duplicating code (or moving it to a central interface), you need a deep understanding not only of what the code does, but also of why it was written that way. Copying and pasting code without that understanding can severely damage a code base over time, and you should never take code history as gospel. There’s a certain amount of self-loathing in any mature code base, and discussing foundational code with other developers will often expose better designs than copy and pasting would.
  • Don’t roll your own unless you can claim a 5x improvement over the existing system — if you want to roll your own system instead of using something that already exists, it needs to stand on its own to justify the cost of maintenance. This means that the improvement can’t be merely incremental: you need a full 5x or better improvement (in whatever metric the context demands) from the new system. Ideally, you can sunset the legacy system and move all existing code over to the new system, but this is generally only feasible if you can provide interface compatibility, which can be tricky.
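
To illustrate the second bullet with a tiny, invented example (the file names and helper are hypothetical), the third appearance of the same snippet is the usual trigger for pulling it into one shared helper:

```python
from datetime import datetime, timezone

# Before: the same line copied into report.py, billing.py, and now audit.py
# (all invented names):
#     ts = datetime.fromisoformat(raw_ts).astimezone(timezone.utc)

# After: the third occurrence triggers extraction into a single shared helper,
# so a parsing bug only ever has to be fixed once.
def parse_event_timestamp(raw_ts: str) -> datetime:
    return datetime.fromisoformat(raw_ts).astimezone(timezone.utc)

print(parse_event_timestamp("2021-03-03T12:00:00+00:00"))
```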
