How I deal with greenfield technical debt

First, let’s get the definition of “technical debt” out of the way.
Here’s what I know about it:

  • Technical debt is mostly associated with duct-tape programming and a focus on short-term outcomes.
  • Technical debt is also a result of over-engineering and a focus on long-term growth, although this is less frequently acknowledged.
  • Technical debt is not only about code; it’s also about architecture.

From this, therefore, I’d like to define it as:

Technical debt is any technical choice that hampers the short-term and long-term growth of a business.

Introduction

We’ll go through the relationship between technical debt and planning, prototyping, testing, validation and discipline. We shall learn how each of these things affects the end result.

Determining viability

“The right way to scale your company” begins with learning and validation.

We’ll assume that your client is interested in validating his idea. If he isn’t, he’s either already very rich because of his genius, or he’s simply gambling. He’ll also know that the map doesn’t tell you everything about the terrain. In my experience, in most cases there isn’t even a map, which shows up as poor planning, vague and wobbly requirements, anaemic specification, too many degrees of freedom, ill-fitting implementation, technical debt, crappy execution and, consequently, failure.

Technical debt is a symptom, not a cause.

Determining viability is an experiment. An experiment starts with a hypothesis and rewards the client with the competitive advantage of information. Learning it sooner means he’s able to act faster, and thus succeed.

The experiment begins with non-technical steps and once it’s made it far enough, technical solutions will be needed.

This is where you as an engineer come in.

Note on validation

Validation is for the masters of the trade. You have to figure out the right metrics and success acceptance criteria. It is difficult to do because of distractions. Always reconnect to your end goals.

I’m experimenting with this on my non-profit Git Work project. I must say, it’s not easy, especially since, like most people, I’m used to doing rather than thinking. An example is starting a project by writing code before exploring the competition.

The Prototype

This could also be a proof of concept (POC) or a minimum viable product (MVP).

The client needs a prototype to further his experiment. Perhaps he wants to show it to potential users. But this experiment requires some realistic live data processing and may benefit from a message queue rather than a database.

I’ll create a file with some events, called /log/file, and an app called experimental-app that reads from standard input. This is enough to support more advanced streaming in the future. To append a message to the queue, I do echo message >> /log/file, and to process messages live, I do tail -f /log/file | experimental-app. The web developer can append messages to the log from PHP without having to ever know about Kafka/RabbitMQ connection strings and SSL certificates. I place it on a cheap VPS and hand it off to the client. Job done, first iteration complete.
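
As a rough illustration of the prototype above, here is a minimal sketch of what experimental-app could look like in Python. The language, the function names and the idea of counting messages are all assumptions made for the example; the only thing taken from the setup above is that the app reads one message per line from standard input.

```python
import sys
from collections import Counter
from typing import Iterable, TextIO


def process(lines: Iterable[str], out: TextIO) -> Counter:
    """Consume one message per line and keep a running count per message.

    Hypothetical processing step: a real prototype would do whatever
    the client's experiment needs at this point.
    """
    counts: Counter = Counter()
    for line in lines:
        message = line.strip()
        if not message:
            continue  # ignore blank lines
        counts[message] += 1
        # Flush on every message so results appear live behind `tail -f`.
        print(f"{message}\t{counts[message]}", file=out, flush=True)
    return counts


def main() -> None:
    # Reads lines as they arrive, so it works with a pipe that never ends.
    process(sys.stdin, sys.stdout)
```

Saved as a hypothetical experimental_app.py that calls main() when run as a script, this would slot into the pipeline as tail -f /log/file | python3 experimental_app.py. Because the processing lives in a plain function that takes any iterable of lines, swapping the file for a real queue later should only touch main().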

Technical debt..? Only if you’re wicked enough to take this into production… so, absolutely not!

Note on testing

Even at the prototype stage, you may need to debug code. Given that debugging is, in many cases, a poorer alternative to test-driven development in both the short and the long term, I will actually write tests so long as they work to my advantage in producing a prototype quickly. This typically means at least a few unit tests. It’s not as hard as you think.
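
To give a feel for how small "a few unit tests" can be, here is a sketch using Python’s built-in unittest module. The parse_event function and its "kind payload" line format are hypothetical, invented purely for the example; the point is only that the tests take minutes to write and pin down the behaviour you would otherwise rediscover in a debugger.

```python
import unittest
from typing import Tuple


def parse_event(line: str) -> Tuple[str, str]:
    """Split a hypothetical 'kind payload' log line into its two parts."""
    kind, _, payload = line.strip().partition(" ")
    if not kind:
        raise ValueError("empty event line")
    return kind, payload


class ParseEventTest(unittest.TestCase):
    def test_splits_kind_and_payload(self):
        self.assertEqual(parse_event("signup alice\n"), ("signup", "alice"))

    def test_payload_may_contain_spaces(self):
        self.assertEqual(parse_event("note hello world"), ("note", "hello world"))

    def test_kind_without_payload(self):
        self.assertEqual(parse_event("ping"), ("ping", ""))

    def test_blank_line_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_event("   \n")
```

Saved in a file, these can be run with python3 -m unittest followed by the file name.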

Plan

Being a good citizen, you explicitly note down a bunch of issues on JIRA or in the README of the project: “No automated deployment, cannot scale beyond manual deployment”, “No proper message queue, cannot scale beyond 1 server”, etc. These can be called bugs and now you can do bug-driven development.

You move on happily knowing you didn’t waste 2 days of the client’s time giving them a RabbitMQ cluster (whoop de doo…), an event processing framework and an automated deployment pipeline, should the result be anything other than “great, let’s continue exactly in the path we were going”. It may also turn out that what you thought was an issue is not an issue after all: good on you for noting it without solving it prematurely. For example, you find out that you didn’t need to shard your data after all when your client updates you with the exact data rates.

When you take a shortcut, you must document it: “it is not scalable right now”. Do not hide this critical information. Make sure this is in the issue tracker, so that the client is aware of how much work there is left in making the system production-worthy.

Fix when appropriate

When the time is right, your client knows that there are scaling hurdles like lack of automated deployment. Make sure he knows the cost of solving the problem at the wrong time: I’ve had a project manager spend 8 months on the front-end and then give 2 weeks for a back-end that needed 3 months, causing a huge delay and, I think, a financial penalty. Make sure that every time you solve a problem, he can tell the difference. The simplest measurement is how long it takes to respond to a change. It is easier to do when you work incrementally.
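
One way to make "how long it takes to respond to a change" visible is to record when each change was requested and when it was delivered, then compare the typical gap before and after a fix. The sketch below assumes you track those two dates per change; the function name and the use of the median are my own choices for illustration, not a prescribed metric.

```python
from datetime import date
from statistics import median
from typing import Iterable, Tuple


def median_lead_time_days(changes: Iterable[Tuple[date, date]]) -> float:
    """Median number of days between a change being requested and delivered.

    Each item is a (requested, delivered) pair of dates.
    """
    return median((delivered - requested).days for requested, delivered in changes)
```

Comparing the value for changes delivered before a fix (say, automated deployment) with the value for changes delivered after it gives the client a single number to watch.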

It’s simply not technical debt if it doesn’t slow down the business. Pretty code is not the be-all and end-all; it’s only a means to an end.

Now that the core solution is delivered, work can be done against it to make it production-worthy, iteratively and in tandem with the key solution. No more crazy separation between development, ops, testing and design: they all work on the same repository and the same issue tracker.

To keep the core lean and mean, extract generic bits where and when appropriate, and hand them off to specialists to improve.

If you do not tackle these technical debts in time, you’re heading for ruin. But if you tackle them prematurely, you’re also heading for disaster: you build an unfit platform, which is an even more painful choice to undo. This is why some developers are completely against frameworks. For example, using Kafka to achieve scalability when the data rates are under 50k messages/second, and the user is a data scientist rather than a developer, forces the creation of extra layers.

Preventing retrofitting

When invested in a choice, we tend to rationalise it to avoid cognitive dissonance. I’ve faced many cases where an engineering team will over-engineer a platform and resist attempts to solve problems outside of it. It’s hard to abandon a baby you’ve nurtured for so long.

When asked to process a 10MB CSV file, you spent 1 day writing a distributed Hadoop job that fits into your framework.
Your competitor spent 30 minutes on a 20-line awk script to do the same. Don’t mawk the awk.
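
The competitor’s script in the story is awk; to stay in one language for the examples here, this is a sketch of the same spirit in Python’s standard library. The task (summing one column per value of another) and the column names are assumptions for illustration, since the story does not say what the CSV job actually computed.

```python
import csv
from collections import defaultdict
from typing import Dict, TextIO


def totals_by_key(source: TextIO, key: str, value: str) -> Dict[str, float]:
    """Sum the `value` column for each distinct entry in the `key` column.

    Streams the file row by row, so a 10MB CSV needs no cluster
    and very little memory.
    """
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(source):
        totals[row[key]] += float(row[value])
    return dict(totals)
```

With a real file this would be called as totals_by_key(open(path, newline=""), "customer", "amount"), where path and both column names are placeholders.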

When the platform is over-engineered, so is the solution. And this is technical debt. The team created a platform from the problem rather than from the solution. They tackled the distributed problem without a basis.

Rely on what you know rather than on what you think you know.

This of course includes the knowledge that you simply “don’t know” things.

And of course, let’s face it: doing stand-ups doesn’t make you agile.

Scale

At some point your client will have enough proof of the value of this project or product. Now you begin scaling the system and its parts: put in proper engineering and robustness, and consider all of the edge cases.

Refactor the hacks

Don’t be like this

Now that you’ve got a hacky awk solution with tests, you don’t have to walk a tightrope by refactoring without them. You can be certain that your second and further iterations will be noticeably better than your first iteration. Having tests lets you change your architecture while keeping individual bits working, which gives you that much more confidence to move with agility in your market.

Converge

As your audience expands, your system converges towards perfection and scalability. At this point what was new becomes old and stable.

Opportunities open up for new ideas, and you hack them into the existing system without affecting existing users too much. When appropriate, you slice parts off into new systems, applications and services so you can continue scaling and iterating responsively. You’re keeping the short term and the long term in mind at all times.

Technical debt fades out of existence.
You and your client designed it that way.

How I learned this

I learned this through the pain of developing and delivering software in many commercial settings, and also on a non-commercial product.

How I continue learning this

I know there’s still a lot to learn, and also a lot to write :-). The path for me is still untrodden. I must focus on the user’s needs more than mine. There’s a lot that’s not been said in this article so I may write something later.

I seek out allies who welcome such a process or have their own already. If you enjoyed this article you may be one of them, so keep in touch and do send me a message if you want to talk.

Conclusion

We learned how to be appropriate and disciplined at every stage of the process, including how to take intentional shortcuts that won’t bite as “technical debt” either now or in the future. We looked at tests, creating documentation, scaling up, not solving problems prematurely, and always staying on track.

Thanks

  • To swardley for his feedback, which helped me refine the “Scale” section of the article.
  • To Burt Chen for his feedback, which led to adding examples and illustrations.
  • To Herdy Handoko for his feedback, which helped connect the sections better.
  • To all the people I’ve worked with and learned from :-).