DEV’S POV

Why Your Software Quality Degrades With Time

the short story

Handmade Software

Published in

The Startup

7 min read · Jan 12, 2021

Learning from your own mistakes is a big deal. You give your best, and so do your developers, yet the software collects more and more bugs, the app gets slower, and nothing changes. How come?

Follow the story, told from the developer's point of view, of an imaginary car store that loses its market share as its software quality degrades, and see how playing by the rules can pay off in IT.

Architectural decision

One of the weaknesses of REST architecture is its inflexible data model. In RPC, every client can call the methods of the server API as if they were its own; REST instead defines a fixed data structure called a resource. On the one hand, this lets a developer keep the data client-agnostic and universal. On the other hand, accessing compound data objects can become a challenge for the client. One possible solution is to keep the objects as rich as possible, but that can lead to god objects and performance degradation. Keeping the objects slim, or using different data signatures for collection and single-object queries, can lead to misuse and n+1 problems.

Imaginary cars

Imagine you have a car store and a data collection of cars:

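from dataclasses import dataclass
from typing import List

@dataclass
class Car:
    wheels: List["Wheel"]  # Wheel is defined elsewhere; stored as an m2m relation
    color: str
    name: str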

Under the hood, the wheels collection is a many-to-many (m2m) relation table, so loading it requires complex joined database queries.

The frontend needs to show the user a list of cars with their names and colors. The backend developer logically creates an interface, /cars/, a collection request that lists the cars. It does not include the wheels, to avoid unnecessary m2m queries to the database. The job is done, version 1 is released, the user is happy.

The first bump

To improve the listing, the user asks to see the number of wheels, to know how many tires the vehicle requires. The frontend developer asks the backend team to provide this information, but the backend team is busy and refuses to implement the feature in an acceptable timeframe. The stakeholders pressure the managers, and the managers push the frontend: the feature is required now.

The manager wonders: how is that a problem? We have the list of wheels, and ergo their number, on the vehicle's detail page. That’s right, thinks the frontend developer, and performs one detail request per car in the list view. In his development environment, everything works fine: just 100 milliseconds to perform the task. The shiny new listing with the number of wheels is there. It works perfectly fine on the feature branch and on staging because there is no load on them. QA gives a “go,” and the feature is released.
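The frontend's shortcut is the classic n+1 pattern: one list request followed by one detail request per car. Here is a minimal sketch of why it looks harmless locally and hurts under load. The `FakeBackend` class, its method names, and the sample data are assumptions made up for illustration; they only stand in for the /cars/ listing and the detail page from the story.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeBackend:
    """In-memory stand-in for the car API that counts incoming requests."""
    wheels_by_car: Dict[int, List[str]]
    request_count: int = 0

    def list_cars(self) -> List[dict]:
        # Stands in for GET /cars/: slim listing, no wheels (no m2m join).
        self.request_count += 1
        return [{"id": car_id} for car_id in self.wheels_by_car]

    def get_car(self, car_id: int) -> dict:
        # Stands in for the detail request: includes the expensive relation.
        self.request_count += 1
        return {"id": car_id, "wheels": self.wheels_by_car[car_id]}


def wheel_counts_n_plus_one(api: FakeBackend) -> Dict[int, int]:
    """What the frontend did: one list request, then one detail request per car."""
    return {
        car["id"]: len(api.get_car(car["id"])["wheels"])
        for car in api.list_cars()
    }


backend = FakeBackend({car_id: ["w"] * 4 for car_id in range(50)})
counts = wheel_counts_n_plus_one(backend)
print(backend.request_count)  # 51 requests to render a list of 50 cars
```

Every page view of the listing now costs as many expensive detail queries as there are cars, which is invisible with one developer and no load.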

The next day, production is down. The backend developer is up early, which is already a bad thing. New Relic and database monitoring say the DB is down. Further investigation shows that calls to a previously unpopular interface have increased a hundredfold, and there is no plausible explanation for that.

The sprint passes, the retro comes. The backend developer claims the frontend misused the interface without notifying anyone. The frontend says the feature request was made and underlines the pressure from the business side. The backend team is sick and tired, and they declare a rule: the client asks for any additional data it needs through query parameters. Like this: /cars/?with=wheels. The frontend can finally specify which data is needed.
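A rough sketch of what such a rule could look like on the backend side. Only the /cars/?with=wheels URL comes from the story; the allow-list, the function names, and the sample record are assumptions for illustration.

```python
from typing import Dict, List, Set
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow-list of optional relations a client may request.
ALLOWED_EXPANSIONS = {"wheels"}

CARS = [  # sample data, invented for illustration
    {"name": "Beetle", "color": "red", "wheels": ["fl", "fr", "rl", "rr"]},
]


def requested_expansions(url: str) -> Set[str]:
    """Parse ?with=a,b into a set, ignoring anything not on the allow-list."""
    query = parse_qs(urlsplit(url).query)
    raw = ",".join(query.get("with", []))
    return {part for part in raw.split(",") if part in ALLOWED_EXPANSIONS}


def list_cars(url: str) -> List[Dict]:
    """Return the slim listing, adding only the relations asked for."""
    expand = requested_expansions(url)
    base_fields = ("name", "color")
    return [
        {key: car[key] for key in (*base_fields, *sorted(expand))}
        for car in CARS
    ]


print(list_cars("/cars/"))              # name and color only
print(list_cars("/cars/?with=wheels"))  # same URL path, now with wheels
```

Note that the same path now returns differently shaped objects depending on the caller, which is exactly the flexibility the frontend asked for.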

Let’s make it quick and dirty; we’ll fix it later.

Time passes. At the beginning, the frontend is happy with the approach: all the data is at hand, just gorgeous. Until one day a strange problem comes up: null doesn’t have an attribute, or whatever it is called in JavaScript :D The frontend developer figures out that the field name is somehow empty on the car detail view. What happened? A junior developer reused the backend connector from another view, the graph showing the distribution of wheel counts for the cars in the database.

With great power comes great responsibility: the freedom to change the data signature has undermined the central principle of REST, data model consistency. The objects the frontend requested were almost identical, but each was reshaped for its immediate purpose. That broke the service contract between the frontend and the backend, which led to the error.
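The story does not say exactly how the name went missing, so the following is only one plausible mechanism, sketched in Python rather than JavaScript. The `fetch_car` connector, its `fields` parameter, and the sample record are hypothetical.

```python
from typing import Dict, Optional

CAR = {"name": "Beetle", "color": "red", "wheels": 4}  # invented sample record


def fetch_car(fields: Optional[str] = None) -> Dict:
    """Hypothetical connector: the caller decides which fields come back."""
    if fields is None:
        return dict(CAR)
    return {key: CAR[key] for key in fields.split(",")}


def stats_connector() -> Dict:
    # Written for the wheel-count chart, which only needs one field.
    return fetch_car(fields="wheels")


def render_detail(car: Dict) -> str:
    # The detail view assumes every car has a name.
    name = car.get("name")
    return name.upper()  # fails if the payload was reshaped


try:
    render_detail(stats_connector())  # connector reused in the wrong view
except AttributeError as error:
    print(f"detail view broke: {error}")
```

Both views believe they are handling "a car", but the object only has that shape for the view it was tailored to.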

The sprint passes, the retro comes. The frontend developers curse the decision to make the queries adjustable and insist that the backend deliver only a consistent data model, which means a separate interface for each view. The wheel count stats view gets a shiny new /car-wheels-stats/ interface: all happy, all good. From now on: different data, different interfaces.

Three sides of the same coin

Time passes. The company decides to launch a mobile app for both Android and iPhone, native because the development team is concerned about performance and about the acceptance of the design language. The apps are designed from scratch: shiny, new, and good enough to eat.

The app developers create the click dummies; everybody is thrilled with how the project is shaping up. The time comes to connect the backend. The apps have a different layout and user interaction, and they are completely different from what the users see in the web app. The data model has become quite complex by now, but the backend delivers the new interfaces just in time. The apps are released; everyone is happy.

Because different teams are responsible for the app and the web app, the design language is not uniform. The next endeavor is to unify that language. Slightly different views get an identical look, and so does our car list. People come and go, and so does the knowledge.

One day a strange bug comes up: the number of wheels is displayed incorrectly, but only in the iPhone app. Investigation finds that the backend doesn’t deliver the right number. The backend team is motivated to narrow down the problem.

After a closer look, the backend team finds that the number is calculated in two places: one in the utility code and one in the request handler itself. The two are almost identical, but not exactly. The reason: someone copy-pasted the code, a person who doesn’t even work here anymore. Fixed.
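The story doesn't show the fix, but a common remedy for this kind of drift is to keep one shared definition and have every handler call it. A minimal sketch, with invented handler names and sample data:

```python
from typing import Dict, List

Car = Dict[str, List[str]]


def wheel_count(car: Car) -> int:
    """Single shared definition of 'number of wheels' for every endpoint."""
    return len(car["wheels"])


def web_car_listing(cars: List[Car]) -> List[int]:
    # Hypothetical handler for the web client: delegates to the shared helper.
    return [wheel_count(car) for car in cars]


def iphone_car_listing(cars: List[Car]) -> List[int]:
    # Hypothetical handler for the iPhone client: same helper, no copy.
    return [wheel_count(car) for car in cars]


cars = [{"wheels": ["fl", "fr", "rl", "rr"]}, {"wheels": ["front", "rear"]}]
print(web_car_listing(cars) == iphone_car_listing(cars))  # True
```

With a single helper, a change to how wheels are counted reaches every client at once instead of only the copy someone remembered to update.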

The sprint passes, the retro comes. How come the code was duplicated? How come the same functionality was implemented twice? Nobody remembers the data consistency problem anymore, not even those who took part in the original discussion. Further investigation shows that similar functionality is doubled, tripled, and quadrupled across the whole backend, depending on which client requested the data.

Finally, finding the wrong guilty one.

From the beginning of the project, the CTO has not seen unit testing as a crucial investment, because the product team just had to launch something and satisfy the investors. After all, we have our QA team; they’ll find the most crucial bugs. The project grows bigger and bigger, leaving no realistic way to cover it with unit tests after the fact.

Impulse-driven decisions led to the data model inconsistency. The initial framework agreement for data transmission (REST) was broken for the sake of a small but visible and immediate benefit. The frontend-driven backend is no longer extensible and is collapsing under its own destructive complexity and self-contradiction. Refactoring seems such a titanic task that no one sees any way to accomplish it.

That stinky legacy no one wants to touch.

Time passes; feature development is getting slow and sloppy. Adding a simple new feature becomes a huge struggle because the frontend-driven backend needs to implement each feature for each client separately. The parts of the supply chain that are not familiar with the implementation details can’t comprehend the reasons for the struggle. The tension grows.

Every interference with the existing functionality leads to bugs that are hard to understand. There is neither the will nor the knowledge required for the diagnosis. Together with the business, the product team decides to freeze new feature development and start over with Cars 2.0. “Cars” officially gets no new features, only maintenance. The CTO gets fired, and the frontend lead takes his place. They have worked with JavaScript throughout their entire career, and now it’s time to switch to JavaScript in the backend, too, so all the developers become full-stack developers.

Some of the old developers get fired, and some have to retrain for JavaScript. JavaScript is a popular language; it’s so much easier to find developers for those positions. The new CTO even gets honored for reducing costs. There are no developers with more than 3 years of backend JavaScript experience, though, because production-ready JavaScript for the backend is not much older.

Reinventing the wheel count

The newly formed full-stack team decides to move away from the old, sad REST API and start using GraphQL. The backend becomes slim and is no longer an impediment for the frontend to access the data. The old platform is still online. The team is very optimistic about its plans.

The new responsive web app is almost done; it’s deployed on the test and staging environments, and all looks perfect. The day Cars 2.0 gets released, 5% of the traffic gets redirected to the new app after the user’s consent. 10% of the users agree. The application collapses under 1/20th of the original traffic. The release is rolled back; the CEO apologizes to the users personally by email.

Sprint passes, retro comes. GraphQL takes over much of the work, but the team slowly realizes that everything comes at a price. The more freedom the client has, the more power it has to overwhelm the data storage. And that is exactly what happened. Sounds familiar? As with the customizable query parameters in the early days of our imaginary car store, increasing the data interface's flexibility has led to severe performance issues.
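One common way to put a price tag on that freedom is to bound how deeply a client may nest its query before it reaches the database. The sketch below does not use any real GraphQL library: the query is modelled as plain nested dictionaries, and `MAX_DEPTH`, the function names, and the sample queries are all assumptions for illustration.

```python
from typing import Dict

# A query selection modelled as nested dicts: field name -> sub-selection.
Selection = Dict[str, "Selection"]

MAX_DEPTH = 3  # hypothetical limit, chosen only for this example


def selection_depth(selection: Selection) -> int:
    """Return how many levels of nested fields the client asked for."""
    if not selection:
        return 0
    return 1 + max(selection_depth(child) for child in selection.values())


def is_allowed(selection: Selection) -> bool:
    """Reject queries that nest deeper than the configured limit."""
    return selection_depth(selection) <= MAX_DEPTH


cheap = {"cars": {"name": {}, "color": {}}}
expensive = {"cars": {"wheels": {"cars": {"wheels": {"cars": {"name": {}}}}}}}

print(is_allowed(cheap))      # True
print(is_allowed(expensive))  # False
```

A depth limit is only a crude proxy for query cost, but it shows the trade-off: the backend takes back some of the control it gave to the client.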

The inability to release the new version of the software, combined with no new features for the old one, kicked our car store out of business. The competition learned from our car store's mistakes and stuck to the programming paradigm, regardless of how dogmatic it seemed. You can only benefit from a technology if you use it correctly.

Next time you reinvent the wheel count, check that you haven’t gone off track.