Software Development Best Practice — #1 Do Not Repeat Yourself

Milan Gatyás
Life at Apollo Division
7 min read · Sep 25, 2020
Photo by CJ Dayrit on Unsplash

I have several years of experience writing business-critical source code that runs in production, and over time I have had many chances to see that some of the most well-known software development best practices really do matter. Some of them also matter more than others in a Software Developer’s regular day.

I have noticed that some of these truths of the software engineering world are unknown to junior/mid (sometimes even “senior”) Developers, or that it is not understood why these best practices should be followed and what benefits they bring.

In the following series I will list a few guides and best practices I have found important and valuable on my source code writing journey.

Let’s start with the one I personally consider to be one of the most important: Do Not Repeat Yourself

Let’s say you want to update your CV with a new entry: you finished your academic degree, or you got the certification you were preparing for. You want to let active IT recruiters know about the change so that your chances of getting a better job position increase. However, you sent your CV to each recruiter individually. Now you need to send the updated version to every one of them, and you have already lost track of who received the CV in the first place.

Something similar is very easy to do, and usually very hard to fix, in programming. In my experience, a repeated and forgotten code fragment is a frequent reason for inconsistent and unexpected application behavior.

In one of the legacy applications I was working on, we added a new database column for a language-specific variation of a text (let’s put database table design considerations aside for a moment). The table itself was queried from many places in the application. In each place, the query was created by listing the queried columns. Thus, to query the new column, we needed to revisit every query construction and update the query string. The task took several times longer than it would have if the query had been constructed once by a dedicated entity and reused by the data consumers.

In another application, we validated voucher codes in multiple places. The validation logic was partially replicated in each place. It used “helper methods”, but a small part of the code was still duplicated. At one point, we added a bit of logic to one of the validation areas but did not make the same change in the others. As a result, a voucher could be invalid in one area but still valid in others.

Lastly, I was reading the legacy source code of a partner company. One class consisted of many methods, each of which was wrapped in a try-catch block with logging logic. The handling was almost identical for all methods and made up the majority of the class, creating visual clutter that made the code hard to read and understand.

The above are only a few examples of the hassles that duplicated code logic can create in your everyday work.

How To Approach The Code Duplication Problem?

Making sure your code logic is not duplicated is, in general, fairly straightforward. However, it can be challenging to adhere to in the long term. Sometimes duplication emerges gradually, and you can neither predict nor prevent it. In that case you usually need to refactor the code and deduplicate it.

Let’s take a look at several examples of theoretical code duplication on different code levels and a few strategies to prevent or resolve them. The solutions are based on the .NET C# language, though they are transferable to most common programming languages.

Literal and constant duplication is one of the most common kinds. Examples include the date serialization format used across the application, the email verification regular expression, or the number of decimal places your numbers are rounded to. If these are standardized across your application, you do not want to keep multiple instances of these constants and literals, for the reasons already mentioned.

Let’s take the email verification regular expression as an example. The most straightforward way to prevent duplication is to create a field that is available across the application and can be referenced wherever it is needed:

string emailValidationRegex = "...";

When the validation policy changes, you update the expression only once and the change is reflected in all consumers of the expression. Depending on the needs and nature of the literal/constant, this might be a good enough solution to prevent code duplication. Of course, there is a chance that, for example, a new team member is not aware that the field exists, does not look around the source code first, and duplicates the value. This problem usually gets caught in the code review process by a more senior team member.
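As a minimal sketch of such a shared field, assuming a hypothetical ValidationConstants holder class and two hypothetical consumers (SignUpForm and ContactImport), the value can live in one place and be referenced from everywhere else. The pattern used here is a deliberately simple placeholder chosen for illustration, not a recommended email expression.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical holder type: the single place where the literal lives.
public static class ValidationConstants
{
    // Placeholder pattern (assumption): "something@something.something".
    public const string EmailValidationRegex = @"^[^@\s]+@[^@\s]+\.[^@\s]+$";
}

// Hypothetical consumer #1: references the shared constant instead of its own copy.
public static class SignUpForm
{
    public static bool IsEmailAccepted(string email) =>
        email != null && Regex.IsMatch(email, ValidationConstants.EmailValidationRegex);
}

// Hypothetical consumer #2: a policy change in ValidationConstants reaches it automatically.
public static class ContactImport
{
    public static int CountValid(IEnumerable<string> emails) =>
        emails.Count(e => e != null && Regex.IsMatch(e, ValidationConstants.EmailValidationRegex));
}
```

Both consumers still repeat the matching call itself, which is a case of method duplication rather than literal duplication.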

A more defensive approach is to design your code so that it requires a specific type to be provided. You can put the email validation expression into a type, which you then require in your consumers.
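One possible shape of this design is sketched below. The type names EmailValidator and EmailValidationConfiguration follow the text; everything else (the nested Provider type, the Pattern property, the IsValid method, and the placeholder pattern) is an assumption made for illustration. The private constructor is one way, among others, to make sure only a single type can build the configuration.

```csharp
using System;
using System.Text.RegularExpressions;

// Carries the validation expression as a type instead of a loose string.
public sealed class EmailValidationConfiguration
{
    public string Pattern { get; }

    // Private: no code outside this type can create an instance directly.
    private EmailValidationConfiguration(string pattern)
    {
        Pattern = pattern;
    }

    // Hypothetical single source of configuration instances. A nested type
    // may call the private constructor of its containing type.
    public static class Provider
    {
        // Placeholder pattern (assumption), not a recommended expression.
        private const string DefaultPattern = @"^[^@\s]+@[^@\s]+\.[^@\s]+$";

        public static EmailValidationConfiguration Create() =>
            new EmailValidationConfiguration(DefaultPattern);
    }
}

// The consumer cannot be built from a raw string, only from the configuration type.
public sealed class EmailValidator
{
    private readonly Regex _regex;

    public EmailValidator(EmailValidationConfiguration configuration)
    {
        if (configuration is null) throw new ArgumentNullException(nameof(configuration));
        _regex = new Regex(configuration.Pattern, RegexOptions.CultureInvariant);
    }

    public bool IsValid(string email) => email != null && _regex.IsMatch(email);
}
```

A validator would then be created with `new EmailValidator(EmailValidationConfiguration.Provider.Create())`. Passing an ad hoc string literal does not compile, which nudges the developer towards the existing configuration source.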

A developer who wants to construct an instance of EmailValidator is now inherently driven to inspect the code and find out whether another type in the solution is able to construct the EmailValidationConfiguration instance. There should be only a single type able to construct the configuration instance. If there are more of them, there should be a factory type that decides which one is allowed to instantiate the configuration object.

Method code duplication is usually solved by encapsulating the reusable code block into a separate method. The type containing the method is injected into the consumers and consumed. Let’s continue with the email validation example.
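A minimal sketch of such an extraction is shown below, assuming a hypothetical IEmailValidator interface and two hypothetical consumers (RegistrationService and NewsletterService) that receive the validator through their constructors. The pattern is a simple placeholder, and the consumer methods only return the validation result where real code would go on to do its actual work.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical abstraction so that consumers depend on a contract, not on a concrete class.
public interface IEmailValidator
{
    bool IsValid(string email);
}

// The one place where the validation logic lives.
public sealed class EmailValidator : IEmailValidator
{
    // Placeholder pattern (assumption), not a recommended expression.
    private static readonly Regex Pattern =
        new Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$", RegexOptions.CultureInvariant);

    public bool IsValid(string email) => email != null && Pattern.IsMatch(email);
}

// Hypothetical consumer #1: the validator is injected, not re-implemented.
public sealed class RegistrationService
{
    private readonly IEmailValidator _validator;

    public RegistrationService(IEmailValidator validator)
    {
        _validator = validator ?? throw new ArgumentNullException(nameof(validator));
    }

    // Returns false when the email is rejected; real code would create the account here.
    public bool Register(string email) => _validator.IsValid(email);
}

// Hypothetical consumer #2: shares exactly the same validation behavior.
public sealed class NewsletterService
{
    private readonly IEmailValidator _validator;

    public NewsletterService(IEmailValidator validator)
    {
        _validator = validator ?? throw new ArgumentNullException(nameof(validator));
    }

    // Returns false when the email is rejected; real code would store the subscription here.
    public bool Subscribe(string email) => _validator.IsValid(email);
}
```

Because both services receive the same abstraction, a change to the validation rule is made once in EmailValidator, and a test double can be injected in unit tests.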

All consumers requiring a common validation mechanism should consume this method. Again, it is possible to increase the visibility of this method by requiring a specific type in the downstream processing types.
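A sketch of this idea follows. The type names ValidEmail, EmailValidator and EmailProcessor come from the text, while the TryValidate method, the Value property, the Process method, the internal constructor and the placeholder pattern are assumptions for illustration. The sketch assumes that ValidEmail and EmailValidator live together in a small dedicated assembly, so the internal constructor keeps other projects from creating a ValidEmail on their own.

```csharp
using System;
using System.Text.RegularExpressions;

// Represents an email address that has already passed validation.
public sealed class ValidEmail
{
    public string Value { get; }

    // Internal (assumption): only code in the validation assembly can create instances.
    internal ValidEmail(string value)
    {
        Value = value;
    }
}

public sealed class EmailValidator
{
    // Placeholder pattern (assumption), not a recommended expression.
    private static readonly Regex Pattern =
        new Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$", RegexOptions.CultureInvariant);

    // Returns true and a ValidEmail instance only when the input matches the pattern.
    public bool TryValidate(string email, out ValidEmail validEmail)
    {
        if (email != null && Pattern.IsMatch(email))
        {
            validEmail = new ValidEmail(email);
            return true;
        }

        validEmail = null;
        return false;
    }
}

// Downstream type: its signature accepts only validated emails, never raw strings.
public sealed class EmailProcessor
{
    // Illustrative processing step: normalizes the address to lower case.
    public string Process(ValidEmail email)
    {
        if (email is null) throw new ArgumentNullException(nameof(email));
        return email.Value.ToLowerInvariant();
    }
}
```

A caller outside the validation assembly has to go through `TryValidate` to obtain a ValidEmail before `Process` can be called, so the validation step cannot be skipped or silently re-implemented there.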

The type design of the code now naturally requires a specific sequence of actions: for an email to be processed, it must first be validated by the email validator. However, the Developer of the EmailProcessor type still needs to be aware that the ValidEmail type exists in order to create this link.

On a bigger scale, code duplication occurs across applications. Take, for example, a microservice environment. Typically, each microservice produces logs, and usually you want these logs to be in a uniform format. It might be a dangerous idea to maintain a logging helper in each microservice separately. It is easier to create a logging library package that is shared among the microservices. A new feature is then developed at the package level and rolled out by installing the new package version in each microservice. On the other hand, the isolation level of the microservices is decreased, which might not be acceptable in certain scenarios.

Other Benefits Of Code Deduplication

Attempting to write non-duplicated code also brings other benefits to the table. It promotes the single responsibility principle, as the shared code you write usually does only a single thing. It also helps to promote loose coupling, as the isolated code chunks are easy to inject into consumer types. That, in turn, supports unit testability, as you can test isolated pieces of functionality while mocking their dependencies.

As with everything in software development, there are always exceptions. In some cases, even though you are writing a code fragment you have already written elsewhere, it is better to keep it that way. This can be the case for a small fragment of code that does nothing meaningful on its own (and so does not fulfil the single responsibility principle), where isolating the fragment and injecting it back makes the code harder to read. Or you may have a utility/extension method that you use in multiple solutions but do not want to create a separate package for, because it is just not worth it and having possibly different forms of the method in different solutions is not an issue. In each such case, one should think critically about the payoff of deduplication and the risks associated with future change requirements (and you can bet that change requests will always come, sooner or later).

I do hope that the text above made some reasonable points for trying your best to avoid code duplications, and possibly gave you some ideas to help you with the effort.

We are ACTUM Digital and this piece was written by Milan Gatyas, .NET Tech Lead of Apollo Division. Feel free to get in touch.
