From DRY to Confusion

Having read “Integration API vs. Internal API” by Jamis Buck, I felt the urge to elaborate on the subject of integration APIs based on my own experience, but from a different, perhaps more social or cultural, angle.

I have worked on a legacy codebase for some years now, and one of the things I continuously have a hard time answering is “does this concept or operation exist only for integration?” By legacy codebase I mean a system without any automated tests, originally built by many different people over time without clear structural guidelines (or architecture).

In this system a pattern has emerged of turning everything into a domain or service layer operation on the first go. I guess you could call this a pre-emptive attempt to never repeat anything, as opposed to the Don’t Repeat Yourself (DRY) principle proper. However, talking about DRY is much more common, so I will not define any new acronyms here.

While our codebase suffers from a number of issues, this pre-emptive strike, hoping never to repeat oneself or anyone else, has perhaps done the most damage. It has not even cured us of repetition: with so many different infrastructure operations available, it is no longer possible to remember which operations are already supported. This applies even to the simplest utility services: we have multiple implementations for deduplicating string lists.

The duplicated effort on deduplicating string lists is a good way of showing the next problem with the (pre-emptive) pursuit of DRY: what if you are exposing a bad idea as an easy-to-use service? In the case of “deduplication for string lists” it is obvious: given the easy infrastructure support for deduplication, no one has ever thought of using a Set data structure.
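To make the deduplication point concrete, here is a minimal sketch in Python. The `deduplicate` helper is hypothetical, standing in for the kind of shared utility described above; it is not taken from the actual codebase. The sketch only illustrates how an easy helper can hide the fact that the data should have been a set from the start.

```python
def deduplicate(items: list[str]) -> list[str]:
    """Hypothetical shared helper, of the kind that ends up implemented many times."""
    result: list[str] = []
    for item in items:
        if item not in result:  # linear scan of the result on every item
            result.append(item)
    return result


names = ["alice", "bob", "alice", "carol", "bob"]

# With the helper available, call sites keep building lists that contain
# duplicates and clean them up afterwards.
via_helper = deduplicate(names)

# A set never holds duplicates in the first place.
unique_names = set(names)

# If first-seen order matters, dict keys preserve insertion order (Python 3.7+).
ordered_unique = list(dict.fromkeys(names))
```

Which of the two built-in alternatives fits depends on whether order matters at the call site, which is exactly the kind of question the shared helper lets everyone avoid asking.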

The deduplication example is a good one because anyone can easily understand it. How about the more complex, domain-specific bad ideas you currently have? The knowledge of why this easy-to-use domain-specific service exists might be lost, but the end result seems acceptable, so what’s the problem with using it?

The cure for all of this is almost too simple: you tolerate code duplication. It is just technical debt. You should handle it eventually, but even before erasing the duplication, you and your team should simply be aware of it. Perhaps when you notice that the same code pops up in three different places, you could start discussing whether something should be done about it. Perhaps someone will realize that a Set data structure exists.

How about concept-level duplication? Should it be tolerated? Intuition says no, it should definitely be DRY’d out.

However, if we are talking about different bounded contexts in the sense of Evans’ Domain-Driven Design, it depends. For example, a simple application could have a single bounded context for the application itself plus an additional integration bounded context. The idea of bounded contexts allows you to confine your integration concerns away from your actual application, offering a degree of separation. While there is more to the actual definition of bounded contexts, seeing them as a way to separate concerns suffices for now.

Should we tolerate duplication between different contexts? Absolutely.

The concepts might seem duplicated today, but since we are talking about integration, as in integrating with some other system, the two systems will most likely evolve at very different paces.
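As a rough illustration of tolerated duplication between contexts, the following Python sketch keeps one customer concept in two places. All names here (`Customer`, `ExportedCustomer`, `to_exported`) are hypothetical and chosen only for the example. The two classes look identical today, and that is fine: each belongs to a different context and is free to change on its own schedule.

```python
from dataclasses import dataclass


# Application context: shaped by what the application itself needs.
@dataclass
class Customer:
    customer_id: int
    name: str
    email: str


# Integration context: shaped by what the external system expects.
# Deliberately a separate class, even though the fields match today.
@dataclass
class ExportedCustomer:
    customer_id: int
    name: str
    email: str


def to_exported(customer: Customer) -> ExportedCustomer:
    """The only place where the two contexts touch."""
    return ExportedCustomer(
        customer_id=customer.customer_id,
        name=customer.name,
        email=customer.email,
    )


exported = to_exported(Customer(1, "Ada", "ada@example.com"))
```

When the external system later changes a field, only `ExportedCustomer` and `to_exported` need to change, and the application's own `Customer` stays untouched.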

Refactoring your way from a DRY’d out codebase, with not a single line of code duplicated, to one that supports the evolution of the different systems is not impossible, but it is a lot of work. By the time you come to this realization you will already have code where one operation accommodates the two actual systems, with conditional structures mixed in with the common behaviour. Identifying which blocks of code apply to the internal system and which to the external one is most likely difficult, and the comments, if you find any, could be outdated and wrong.
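The shape of such an operation might look like the sketch below. The order example and every name in it (`format_order`, `for_integration`, `legacy_code`) are invented for illustration and are not from the system discussed here. The first function serves both systems through a flag; the two functions after it show the same behaviour with a little duplication tolerated.

```python
def format_order(order: dict, for_integration: bool = False) -> dict:
    """One shared operation serving both systems through a flag."""
    result = {"id": order["id"], "total": order["total"]}
    if for_integration:
        result["total"] = str(order["total"])  # assumed: external system wants strings
    else:
        result["customer"] = order["customer"]
    if for_integration and order.get("legacy_code"):
        result["code"] = order["legacy_code"]
    return result


def format_order_internal(order: dict) -> dict:
    """Internal use only; free to change with the application."""
    return {"id": order["id"], "total": order["total"], "customer": order["customer"]}


def format_order_for_export(order: dict) -> dict:
    """Integration use only; free to change with the external system."""
    result = {"id": order["id"], "total": str(order["total"])}
    if order.get("legacy_code"):
        result["code"] = order["legacy_code"]
    return result


order = {"id": 7, "total": 12.5, "customer": "Ada", "legacy_code": "X1"}
```

In the split version the `id` and `total` keys are repeated, but each function can be read, tested, and changed without asking which of the two systems a given branch belongs to.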

So do anything other than pre-emptive DRY. Leave duplication alone if it is between systems or bounded contexts; otherwise leave it for later, but make a mental note of it. Good things to do instead include writing tests for your code that are aligned with its actual use. Having tests makes the deduplication later on a non-issue.

Note: This is not my original idea; many wiser people have discussed the same topic. In addition to the already mentioned Eric Evans, one such person is Rich Hickey in his great talks.