TDD (Technical Design Document) as part of the code

Ganesh Kumar
Published in techBrews
5 min read · Mar 15, 2020
Photo by Glenn Carstens-Peters on Unsplash

Most applications cannot evolve with time for one reason: the developers are not aware of the reasoning that went into certain design decisions, and so nobody dares to change them.

I started my career as a developer working on a modular monolith application for a global bank. The application had been in place for years before I joined the team, and it has been enhanced for years since I left. During my time, the one big difficulty I faced was finding out why a web service was designed the way it was. Even when I knew something could be changed for the better, nobody wanted me to touch it, because of the infamous saying in IT: "If it works, don't touch it." The reason was simple: nobody knew why things were the way they were. They assumed there was a reason behind the design and that a change might bring undesirable results. They were right. What you see as a developer is not always in sync with the system design; those are decisions taken at a point in time for very specific reasons. The entire thought process goes into one document, which is the holy bible for all the developers and DevOps engineers: the Technical Design Document.

But the problem at that time was that the design document did not talk about design decisions and the reasons behind them. It talked about how the code was organised, what each block functionally did, and so on, all of which I could see by browsing through the code.

Upon my transition from developing applications to designing them, one of the first rules I made for myself was to record the reasons behind design decisions in the TDD. Let me share a few pointers on how I write a TDD and, more importantly, how I decide what goes into it, with an example.

API Integration Layer at the centre of the action, connecting the mobile channel to the CRM and CBS systems.

The Integration layer is designed to subscribe to messages on a topic, validate and enrich the data, and send it to two core systems (CRM and CBS). Now, this could have been done as a single API, but I had to split it into two APIs (a good developer will curse me at first glance). The first API gets messages from the topic, validates and enriches the data, sends it to CRM, and drops the messages into a queue. The second API consumes messages from the queue and sends them to CBS; it does nothing but this. Any developer can gather everything I have described so far just by browsing through the code. All they are missing is one piece of the puzzle: why two APIs? That is what should go into the TDD.
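The split described above can be sketched in a few lines. This is a toy simulation, not the project's actual interfaces: an in-memory `queue.Queue` stands in for the JMS queue, and `validate_and_enrich`, `send_to_crm`, and `send_to_cbs` are hypothetical names.

```python
from queue import Queue

def validate_and_enrich(msg):
    # Hypothetical validation/enrichment step: reject empty payloads
    # and tag the message with a derived field.
    if not msg.get("payload"):
        raise ValueError("invalid message")
    return {**msg, "enriched": True}

def api_one(topic_messages, send_to_crm, cbs_queue):
    """First API: consume from the topic, validate/enrich,
    send to CRM, and drop the message onto the CBS queue."""
    for msg in topic_messages:
        enriched = validate_and_enrich(msg)
        send_to_crm(enriched)
        cbs_queue.put(enriched)

def api_two(cbs_queue, send_to_cbs):
    """Second API: drain the queue and forward to CBS -- nothing else."""
    while not cbs_queue.empty():
        send_to_cbs(cbs_queue.get())

# Toy run with in-memory stand-ins for CRM and CBS
crm, cbs, q = [], [], Queue()
api_one([{"payload": "txn-1"}, {"payload": "txn-2"}], crm.append, q)
api_two(q, cbs.append)
```

The point of the shape, as the article explains next, is that the queue between the two functions decouples how fast messages arrive from how fast CBS can absorb them.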

The downstream CBS cannot handle huge loads. The Integration API receives a high volume of messages, and if all of them were processed in real time, CBS would fail some of them. Communication with CBS is over HTTP, so the onus is on the Integration layer API to balance the load. Hence the messages are posted to a JMS queue instead of calling the HTTP endpoint directly.

API Integration Layer components and the Data flow explained

Now the interesting part is balancing the load. The second API had two important factors that justified making it a separate API: maximum sessions and the acknowledgement mode. Max sessions is configured on the JMS queue listener; at any given point in time, the second API will run only 'N' threads/jobs, where N is configurable. N was calculated at the time based on how many parallel requests CBS could process. The acknowledgement mode is set to client acknowledge, which means that if the listener does not confirm after processing a message (i.e. if processing fails), the message stays in the queue to be processed again until it is acknowledged. Now, just because CBS could not take up a request, we cannot repeat the validation and enrichment of the data, and we most definitely cannot send the same processed message to CRM twice either. Thus, sending the processed message to CBS had to be separated into its own API.
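The interplay of the two settings can be illustrated with a small simulation. This is not the actual JMS listener: a `deque` stands in for the broker queue, `ConnectionError` stands in for a failed HTTP call to CBS, and the message is only removed for good (acknowledged) after a successful send.

```python
from collections import deque

MAX_SESSIONS = 2  # "N": parallel jobs the listener may run, sized to CBS capacity

def run_listener(queue, send_to_cbs, max_sessions=MAX_SESSIONS):
    """Client-acknowledge semantics: a message leaves the queue
    only after processing succeeds; a failure leaves it for redelivery."""
    delivered, in_flight = [], []
    while queue:
        # Take at most `max_sessions` messages in flight at once.
        while queue and len(in_flight) < max_sessions:
            in_flight.append(queue.popleft())
        for msg in list(in_flight):
            try:
                send_to_cbs(msg)
                delivered.append(msg)   # success: acknowledge
            except ConnectionError:
                queue.append(msg)       # failure: no ack, message is redelivered
            in_flight.remove(msg)
    return delivered

# A flaky CBS that rejects every message on its first attempt
attempts = {}
def flaky_send(msg):
    attempts[msg] = attempts.get(msg, 0) + 1
    if attempts[msg] == 1:
        raise ConnectionError("CBS overloaded")

delivered = run_listener(deque(["m1", "m2", "m3"]), flaky_send)
```

Every message is eventually delivered exactly once to CBS, on its second attempt, without re-running validation or re-sending anything to CRM; that is precisely why the second API must not contain those steps.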

Now the TDD also speaks about what needs to be taken care of during Ops. The Integration layer APIs are deployed on an easy-to-scale platform (read: Docker and Kubernetes). Under high load it is common to scale up the number of instances (read: pods). The component running the queue listener also hosts other APIs that need scaling up (why these APIs are grouped into a single container is again something to explain in the TDD). So while scaling, care must be taken that the max sessions of all the active pods sum to 'N' at any given time, to avoid stressing CBS, which means a production support engineer who scales up the pods has to reconfigure max sessions.
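The scaling constraint reduces to simple arithmetic. Here is a hypothetical helper, not from the project, that a support engineer or an operator script could use to re-split the session budget N whenever the pod count changes:

```python
def per_pod_sessions(total_sessions, pods):
    """Split the global session budget N across `pods` replicas so the
    sum never exceeds N (earlier pods absorb the remainder)."""
    if pods < 1:
        raise ValueError("need at least one pod")
    base, extra = divmod(total_sessions, pods)
    return [base + 1 if i < extra else base for i in range(pods)]

# N = 10: a single pod gets the whole budget; scaling to 3 pods re-splits it.
print(per_pod_sessions(10, 1))  # [10]
print(per_pod_sessions(10, 3))  # [4, 3, 3]
```

The invariant worth testing in any such script is that the per-pod values always sum back to N; otherwise scaling out silently multiplies the load on CBS.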

As a developer or support engineer, I would hate to read a TDD that explains how to set properties in a configuration file (read: config maps and YAML files). Those are things I can either figure out myself or find a Google search away. What I really need to know is what value they should be configured to, and why.

This TDD is now part of the codebase, so whenever a developer does a git pull, they get the TDD and can read why something is designed the way it is. Years after I moved off the project, the bank migrated to a modern CBS that could support high volumes in real time. With the TDD in the code, the current development team concluded that the two-API approach was no longer needed. Their knowledge of the legacy design, and the reasons behind it, straight from the TDD, empowered them to make the right decision.
