How To Save Your Organisation With Documentation

Documents I write to improve decision-making in software development

Axel Scheele
Published in NAVARA
Sep 12, 2023

Introduction

A while back I worked at a company in a senior tech position. Before I even started, I knew things would be difficult. I had been told that the change failure rate was very high. What I walked into, though, was something I had never seen before.

From day one I got sucked into a crisis, doing hot fixes on the software and manual updates on production databases. Operations ran about 18 hours a day and nothing could be done without IT. The remaining 6 hours of the day were used to synchronise systems with partners so operations could resume the next day.

At any moment things would break due to changes and operational load. Everyone just did what was important in that moment. New features were never delivered on time and things seemed to take forever. It was panic, it was chaos, it was the way things worked.

Post Mortem

In a situation like that you have two options: look for another job, or improve the situation. As I was allowed a little bit of influence, I chose the second option. The first thing I did after a major incident was introduce post-mortem meetings.

Post-mortem is a term from Google's Site Reliability Engineering (SRE) practice. It is a meeting to discuss incidents in a structured way and to come up with lessons and improvements that can be implemented. A key part for me is that an incident report is written. The report can be cited when arguing that certain work needs to be done, and it preserves the lessons for the future.

Apart from having a cathartic effect on the people involved, post-mortem meetings also showed us what we wanted to improve. One of the things we wanted to do was to create test plans.

Test Plan

A test plan in its crudest form is nothing more than a checklist of things we want to be sure about before a release. For example, when releasing a new version of a mobile app, one check would be that it doesn’t crash when running on an actual mobile device. Apart from the functional criteria we could think of, our beta testers were of considerable value, so following up with all of them is on those lists too.
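As an illustration, such a checklist could also be kept in code. This is only a minimal sketch, not the format we actually used: the names `Check` and `ReleaseChecklist` and the release label are hypothetical, and the two checks are the examples from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class Check:
    """One item we want to be sure about before a release."""
    description: str
    done: bool = False


@dataclass
class ReleaseChecklist:
    """A test plan in its crudest form: a list of checks for one release."""
    release: str
    checks: list[Check] = field(default_factory=list)

    def open_checks(self) -> list[str]:
        # Everything that still blocks the release.
        return [c.description for c in self.checks if not c.done]

    def ready_to_release(self) -> bool:
        return not self.open_checks()


# Hypothetical release label; the checks come from the text above.
plan = ReleaseChecklist(
    release="mobile app, next version",
    checks=[
        Check("App does not crash on an actual mobile device"),
        Check("Follow-up done with all beta testers"),
    ],
)
plan.checks[0].done = True

print(plan.ready_to_release())
print(plan.open_checks())
```

The list of open checks is also the part you can show a manager when discussing release planning.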

This all seems very obvious, but by keeping track of these things we were sure they happened. Furthermore, a test plan offers a way to communicate very clearly with a manager about release planning. The overall effect is peace of mind, a value shared by another documentation format that I call protocols.

Protocols

In the introduction of this article, I mentioned making changes in production databases. This is something I would rather not do. I once brought all operations to a halt by running an incorrect update. However, the world is not perfect and badly written legacy code forces us to do things we would rather not do. In fact, sometimes it is part of the daily routine.

Things get really bad, though, when you have to do these things late at night, alone. The first time I had to do this I asked one of the veterans in the company to help me write out the steps of the work I had to do, including rollbacks and verification.

So our protocols were born. For every manual action, we created one. They gave us confidence that we would cause less trouble and, at the same time, they exposed the holes in the applications. Things we needed to do and to fix became clear at this point, but to make plans we needed to discuss solutions.
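The structure of a protocol (steps, each with a verification and a rollback) can be sketched in Python. This is an assumption about how one might model it, not a tool we had: `Step` and `run_protocol` are hypothetical names, and a plain dictionary stands in for the production database.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    action: Callable[[], object]    # the manual change itself
    verify: Callable[[], bool]      # how we check that it worked
    rollback: Callable[[], object]  # how we undo it


def run_protocol(steps: list[Step]) -> bool:
    """Run steps in order; if a verification fails, undo in reverse order."""
    completed: list[Step] = []
    for step in steps:
        step.action()
        completed.append(step)
        if not step.verify():
            for done in reversed(completed):
                done.rollback()
            return False
    return True


# Stand-in for a production table (hypothetical data).
orders = {"A1": "open"}

steps = [
    Step(
        description="Close order A1",
        action=lambda: orders.update(A1="closed"),
        verify=lambda: orders["A1"] == "closed",
        rollback=lambda: orders.update(A1="open"),
    ),
    Step(
        description="Cancel order A2",
        action=lambda: orders.update(A2="cancelled"),
        verify=lambda: False,  # simulate a failed verification
        rollback=lambda: orders.pop("A2", None),
    ),
]

succeeded = run_protocol(steps)
print(succeeded, orders)
```

Even on paper, writing down the rollback for every step before starting is what makes the late-night work less frightening. The sketch does not handle an action that raises an exception; a real protocol would need a step for that too.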

Architectural Designs & Decision Records

Solutions in IT are often very complex and abstract. Furthermore, many decisions are taken based on the context available at that time. Some people can do all this in their heads, especially when they are alone on a project. However, when you have to discuss solutions together, chances are that you misunderstand each other. And if you have to remember decisions taken a long time ago, it becomes hopeless.

Architectural designs are key in discussing solutions, looking at complex cases and coming to a common conclusion. I personally have a preference for C4 and sequence diagrams. C4 is great for discussing high-level solutions with people, and sequence diagrams show the intricate interactions between systems and where things get tricky. Which notation you use doesn’t really matter, as long as you can understand each other.

However, choices in software are often made on more than just the designs. Often there is a context to take into consideration and that is where decision records shine. You will not see an immediate result from decision records, but your future self and future colleagues will be very thankful. Having a reference to why solutions were chosen at the time is great input for refactors and new features.
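To make the idea concrete, here is a minimal sketch of a decision record in Python. The fields (status, context, decision, consequences) follow the commonly used architecture decision record layout, which is an assumption rather than a description of the exact format we used. The class name, the date and the consequences text are hypothetical; the example content is taken from the protocols section above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DecisionRecord:
    number: int
    title: str
    decided_on: date
    status: str
    context: str       # what we knew and what constrained us at the time
    decision: str      # what we chose
    consequences: str  # what follows from the choice

    def render(self) -> str:
        # Plain text, so the record can live next to the code.
        return "\n".join([
            f"Decision {self.number:04d}: {self.title}",
            f"Date: {self.decided_on.isoformat()}",
            f"Status: {self.status}",
            f"Context: {self.context}",
            f"Decision: {self.decision}",
            f"Consequences: {self.consequences}",
        ])


record = DecisionRecord(
    number=1,
    title="Write a protocol for every manual database change",
    decided_on=date(2023, 1, 1),  # placeholder date
    status="accepted",
    context="Legacy code forces manual updates on production databases.",
    decision="Every manual action gets written steps, rollbacks and verification.",
    consequences="Manual work takes longer to prepare but causes less trouble.",
)
print(record.render())
```

The context field is the part that is hardest to reconstruct later, which is why it is worth writing down at the moment the decision is taken.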

So far I have discussed documents that assist in making technical decisions, something that is straightforward if the requirements are clear. New features, though, seldom have clear requirements.

Feature Scopes

New features were often a source of disappointment. Somewhere in an email sent a few weeks earlier, a very important request about the feature had been made. This email was overlooked and now the feature had to be redone. An email chain does not really work as a requirements list.

Therefore we introduced a new concept, to collaborate better on business requirements and the new software that we had to build. I call this document the feature scope. What we did first was create a rough description of what was needed. Then we added a few acceptance criteria and some scenarios that described how it would probably function. Then I discussed this document with the people who requested the feature. Once everyone was happy we would freeze this scope and build it.
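The life cycle of a feature scope (describe, add criteria and scenarios, agree, freeze) can be sketched as follows. This is a minimal sketch under my own assumptions: `FeatureScope` and its methods are hypothetical names and the example feature is invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureScope:
    title: str
    description: str
    acceptance_criteria: list[str] = field(default_factory=list)
    scenarios: list[str] = field(default_factory=list)
    frozen: bool = False

    def add_criterion(self, text: str) -> None:
        # After the freeze, new wishes go into the next increment.
        if self.frozen:
            raise ValueError("scope is frozen; propose a new increment")
        self.acceptance_criteria.append(text)

    def freeze(self) -> None:
        if not self.acceptance_criteria:
            raise ValueError("cannot freeze a scope without acceptance criteria")
        self.frozen = True


# Invented example feature.
scope = FeatureScope(
    title="Export orders",
    description="Operations can download the orders of one day as a file.",
)
scope.add_criterion("The export contains every order of the selected day")
scope.scenarios.append("A user selects yesterday and receives one file")
scope.freeze()

try:
    scope.add_criterion("The export can also be sent by email")
except ValueError as error:
    print(error)
```

Rejecting changes after the freeze is the point of the design: a late request is not lost in an email chain, it becomes the start of the next, smaller scope.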

Of course, the first version was never what was wanted. But it was what was agreed upon and it was finished. Discussing changes was much more constructive and the document allowed us to agree on smaller increments and therefore quick feedback. A much more pleasant way of working for everyone and a much better result.

Conclusion

Did things go better at the company? The applications were still pretty horrible, but we had a bit more grip. Furthermore, collaboration and self-reliance improved among the engineers. The business was more involved.

Still, there was the elephant in the room: the software landscape that needed a proper refactor or rebuild. I think, and that was also the goal of my efforts, that the documentation made clear what was needed. It could be discussed. A priority could be given based on evidence, and rational choices could be made based on facts. Documentation in itself is not a solution for problems, but it helps in solving them.

For almost every step of the software development cycle, I have proposed a form of documentation that could improve your work. Maybe you will not need all of them, but perhaps there is one that you would like to try.

References

  1. Google SRE
  2. Post mortem template
  3. C4 diagrams
  4. Sequence diagrams
  5. Decision records
