Build maintainable software in a human-friendly way
I have been a software engineer at Amazon for 5 years (to those who know, I am a yellow badger). I have worked in Amazon retail, Kindle, and AWS. These three departments have drastically different business objectives and build entirely different software on different technology stacks, yet across all three teams there is one complaint I kept hearing from people: “XXX (some engineer) writes such shi**y code and I could not make sense of it!!!”
I have said this sentence at least 101 times in my career, and I am sure other people have said it more often than that. I have personally drowned in code with 12 layers of nested for and if blocks, walls of try-catch blocks that cut through at least 5 levels of method invocation, variable names that don’t match what the method is about, and a database that our service relies on but that has zero documentation of its schema. None of this is fun, to say the least.
I started asking myself: why is software such a hard problem? Why can’t we build software in a way that spares a software manager like me the awkward moments of trying to argue that our team’s code base is not that bad and is maintainable? You may argue that software is hard. Yes, some software problems are hard, like “is P = NP?” or the Byzantine generals problem, but I am willing to bet that 90% of the code you see in production software in industry is not hard, since we have built cloud services like AWS to do a huge amount of the heavy lifting.
I think one of the problems is that engineering schools, curricula, and even company onboarding materials focus mostly on tools: how to write Java code, how to connect to a relational database, how to design a highly available architecture, and so on. Rarely do I see any of these programs focus on the HUMAN, the user of the tool.
Why don’t we talk more about humans, the people who are actually going to interact with the software? Humans are going to be in charge of writing code, reviewing code, deploying code to production, testing code, monitoring code and extending code. How can you expect humans to maintain a code base that is built to be anti-human?
What a piece of work is a man! how noble in reason! how infinite in faculty! in form and moving how express and admirable! in action how like an angel! in apprehension how like a god! the beauty of the world! the paragon of animals
— Hamlet (2.2.295–302), Hamlet to Rosencrantz and Guildenstern
Although humans are amazing in the eyes of Shakespeare, there are some facts that I believe engineers should remember when developing software.
Fact 1. Humans forget
What did I have for dinner exactly 57 days ago? No human could answer that question accurately without access to a calendar with every event documented. Before I talk about writing technical documents, commenting your complicated logic, and so on, I would like to take this idea a bit further.
Humans are good at remembering high-level concepts and structures once they have mastered them, but they are not good at remembering details. An example from history: it is relatively easy to remember that “in the industrial era, America had monopolies that controlled important economic sectors”, and much harder to remember a detail like “in 1890, the Standard Oil Company controlled 80% of the oil in America”.
We can build software that leverages this characteristic so that humans can work with it effectively. You can build concepts around your software and let the code comments carry the details. For example, I once built a real-time request-response service that served ~50 APIs. In the Java code they all followed the same pattern: the API handler method signature is always “public XResponse XHandler(XRequest)”, where X is the API name, and all the data access objects (DAOs) live in a module called “dao” and are injected into the class via Google Guice. You might not notice it, but if you follow these two rules across all 50 APIs, the convention itself becomes a document that tells your engineers where to find code. And they can remember it very easily.
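Here is a minimal sketch of what one API following those two rules might look like. The API name (GetOrder), the OrderDao interface and its findStatus method are hypothetical names I made up for illustration, and plain constructor injection stands in for the Guice wiring so the sketch needs nothing beyond the JDK.

```java
import java.util.Optional;

// Hypothetical member of the "dao" module; a real one would talk to a database.
interface OrderDao {
    Optional<String> findStatus(String orderId);
}

// Request and response types are named after the API: X = GetOrder.
record GetOrderRequest(String orderId) {}

record GetOrderResponse(String orderId, String status) {}

final class GetOrderApi {
    private final OrderDao orderDao;

    // Rule 2: DAOs are injected, never constructed inside the handler class.
    // With Google Guice this constructor would carry an @Inject annotation;
    // that wiring is left out here to keep the sketch dependency-free.
    GetOrderApi(OrderDao orderDao) {
        this.orderDao = orderDao;
    }

    // Rule 1: the signature is always "public XResponse XHandler(XRequest)".
    public GetOrderResponse GetOrderHandler(GetOrderRequest request) {
        String status = orderDao.findStatus(request.orderId()).orElse("UNKNOWN");
        return new GetOrderResponse(request.orderId(), status);
    }
}
```

An engineer who has seen one such class can guess where the handler, the request type and the data access code of the other 49 APIs live without opening a wiki page.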
As for the details, you need to be more vigilant about the code you are writing. Always ask yourself: “will I remember this after 30 days?” I personally find these types of information especially easy to forget:
- Facts that could go multiple ways. One example is a “TTL”. Say you have just taken over a relational database that has a column called “TTL”. It is reasonable to guess that this is a UNIX timestamp of when the data is supposed to be removed. But now the question is: “Is the unit milliseconds or seconds?” You go to the code, and the variable is just named “TTL”, with no hint of its unit. So you think really hard, assume it is seconds, deploy to production, and start writing new entries and updating existing ones with the TTL in seconds. After some time, the entire database is wiped clean because the unit was actually milliseconds. So if a variable is a number with a meaning, put the unit in its name and leave no room for mistakes.
- Proprietary business logic. Imagine you work in a highly specialized field with rules like “I need to filter my search results based on machine learning model X, field Y, external API Z, etc”. Because this is not common sense, you are very likely to forget it unless you look at it every day.
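For the TTL case, this is a small sketch of what "put the unit in the name" can look like in Java. CacheEntry and its members are hypothetical names, and I am assuming the column stores an expiry timestamp in epoch seconds; the point is only that the unit is spelled out in every name and that conversions go through java.time instead of hand-written arithmetic at each call site.

```java
import java.time.Instant;

// Hypothetical row type: the unit lives in the field name, not in someone's memory.
record CacheEntry(String key, long expiresAtEpochSeconds) {

    // Convert once, at the boundary, so callers work with Instant instead of raw numbers.
    Instant expiresAt() {
        return Instant.ofEpochSecond(expiresAtEpochSeconds);
    }

    // An entry counts as expired at or after its expiry instant.
    boolean isExpired(Instant now) {
        return !now.isBefore(expiresAt());
    }

    // Factory for callers that hold a millisecond timestamp; the parameter name
    // makes the unit explicit and the conversion happens in exactly one place.
    static CacheEntry fromEpochMillis(String key, long expiresAtEpochMillis) {
        return new CacheEntry(key, Math.floorDiv(expiresAtEpochMillis, 1000L));
    }
}
```

Whoever inherits this code in 30 days no longer has to guess between seconds and milliseconds, because a wrong guess would have to contradict the name right in front of them.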
So if you need to document, know that not all content is equal, and make sure you document efficiently. If your software follows a pattern, you only need to document the exceptions to the pattern. If your software contains information that is prone to being forgotten, write it down somewhere people know to look. (Another pattern.)
Fact 2. Humans make mistakes
Humans make mistakes, whether they are under pressure or not. The financial industry has seen a number of blunders in the past; my personal favorites are fat-finger errors. Writing bugs is the most common mistake a software engineer makes, and engineers also make operational mistakes like accidentally dropping a production table. Although some mistakes, like the one that led to the discovery of penicillin, changed the course of human history, most mistakes are not that impressive.
Software engineers operate critical software. Perhaps it is not as critical as a nuclear power plant safety inspector’s job, but it is important enough to some companies’ business that they will page an on-call engineer even at 2 AM. There are ways to prevent humans from making mistakes. These methods should be employed, and they should be reviewed frequently, as they can be expensive and can hurt efficiency.
- Two-person your software. The two-person rule is adopted from the military, where it is used when launching missiles to make sure the correct target is set. Always have a second person review the code for bugs, watch which button someone is about to press on a production system, and double-check other critical operations.
- Limit human interaction. Humans make mistakes, so don’t let them touch anything. If you have a manual release mechanism where an engineer needs to SCP a Java zip file onto the service host and run a script with 100 parameters to start the service, replace it with an automated CI/CD pipeline. Humans are SSH’ing into the host all the time for unknown reasons? Apply the principle of least privilege and limit their access to what they actually need. You need to do a gradual DNS cutover that can roll back immediately when an alarm fires? Write a program to do it; a program is faster than a human and, once implemented correctly, more accurate.
- Let machines do the boring jobs. Suppose I give you a zip file with 2000 files and ask you to change their compression format from GZIP to WinRAR for a super important project for your CTO. You could do it by hand, sure, but can you confidently tell your CTO that you converted every single file without a mistake? Are you sure you did not skip a file by holding the scroll-down button for an extra half second? If you can’t be confident, just write a program to do it. It would be an easy for-loop. And know one thing: when you are getting bored of tedious work, the machine will happily take it without complaint, and it will do it 100% correctly as long as you implement it right.
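The "easy for-loop" from the last bullet could be sketched roughly like this. One loud assumption: the JDK cannot write RAR archives, so this sketch converts GZIP files to ZIP instead, purely to show the shape of the loop. BatchRecompressor and its method names are made up for the example, and the files are assumed to be already extracted into one directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class BatchRecompressor {

    // Converts every *.gz file in sourceDir into a single-entry *.zip in targetDir
    // and returns the number of files converted.
    static int convertAll(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        List<Path> inputs;
        try (Stream<Path> files = Files.list(sourceDir)) {
            inputs = files.filter(p -> p.getFileName().toString().endsWith(".gz"))
                          .sorted()
                          .collect(Collectors.toList());
        }
        int converted = 0;
        for (Path input : inputs) { // the "easy for-loop"
            Path output = convertOne(input, targetDir);
            // Fail loudly instead of silently skipping a file.
            if (!Files.isRegularFile(output)) {
                throw new IOException("No output produced for " + input);
            }
            converted++;
        }
        return converted;
    }

    // Decompresses one GZIP file and writes its content as the only entry of a ZIP file.
    static Path convertOne(Path gzFile, Path targetDir) throws IOException {
        String name = gzFile.getFileName().toString();
        String entryName = name.substring(0, name.length() - ".gz".length());
        Path output = targetDir.resolve(entryName + ".zip");
        try (InputStream raw = Files.newInputStream(gzFile);
             InputStream in = new GZIPInputStream(raw);
             ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(output))) {
            zip.putNextEntry(new ZipEntry(entryName));
            in.transferTo(zip);
            zip.closeEntry();
        }
        return output;
    }
}
```

The returned count is the number you can hand to your CTO: if it says 2000, no file was skipped because of a tired finger on the scroll button.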
Humans will make mistakes, so don’t design software that requires humans to perform long, tedious tasks that are better suited to machines, and leave less room for human mistakes by limiting access.
Fact 3. Humans cannot process a large amount of information at once
Do you know why phone numbers are written in the format 111–111–1111 instead of 11111111111 in most places? (If you read carefully, I actually pressed an extra 1 in my first draft and I am keeping it deliberately. I am pretty sure most readers would miss it. If you did not, kudos to you!) Because the separators break the number into smaller chunks that are easier to comprehend. This example also shows that human comprehension prefers “more, smaller tasks” to “fewer, larger tasks”. Somehow the human brain seems to favor a horizontally scalable model.
As a software engineer, you should not let the software you design and implement get cluttered with information. “Information” here covers code, logs, operational dashboards and metrics.
- Watch your code’s cyclomatic complexity. I recommend putting a cyclomatic complexity check in your Checkstyle rule set and actually failing the release process when the complexity gets too high. There are plenty of ways to lower complexity, and I will not list them all here. Some simple techniques that work well are breaking a large method down into sub-methods and using dependency injection to break a complex object into smaller, easy-to-understand objects.
- Frequently review your logs and dashboards. Even better, get people who have never seen your system to come and review them. Is your dashboard conveying important information about system health and business health, or just useless noise? If there were a critical software issue right now, would your logs and dashboards provide enough information to determine which component is at fault, or just too much noise?
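To make the sub-method advice from the cyclomatic complexity bullet concrete, here is a small sketch. OrderValidator and its rules are invented for this example. Picture the same four checks written as nested if/else blocks in one method; here each rule gets its own tiny method, and the top-level method reads like a checklist.

```java
import java.util.List;

final class OrderValidator {

    // Hypothetical order shape, just enough fields to have something to validate.
    record Order(String customerId, List<Integer> itemQuantities, long totalCents) {}

    // Top-level method: one line per rule, evaluated left to right with short-circuiting.
    static boolean isValid(Order order) {
        return hasCustomer(order)
                && hasItems(order)
                && hasPositiveQuantities(order)
                && hasNonNegativeTotal(order);
    }

    private static boolean hasCustomer(Order order) {
        return order.customerId() != null && !order.customerId().isBlank();
    }

    private static boolean hasItems(Order order) {
        return order.itemQuantities() != null && !order.itemQuantities().isEmpty();
    }

    // Only reached after hasItems passed, so the list is known to be non-null here.
    private static boolean hasPositiveQuantities(Order order) {
        return order.itemQuantities().stream().allMatch(quantity -> quantity > 0);
    }

    private static boolean hasNonNegativeTotal(Order order) {
        return order.totalCents() >= 0;
    }
}
```

Each method stays small enough to understand at a glance, which is the same chunking trick as the dashes in a phone number.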
Not handing humans artifacts that are too heavy to consume goes a long way toward making your software easy to maintain. And of course, you cannot rely on humans alone to do this job. Again, they will make mistakes, or simply get emotional and not want to do it. You need automation, tools and processes to compensate for human inefficiency.
Summary
I hope this post gives you a clear view of software engineering from the human side of things. I truly believe that software should be built to be friendly for humans to maintain, extend or even deprecate. That requires engineers to have good insight into human behavior, and to know how to make human faculties shine brighter and limit the repercussions of human weaknesses.