Self-Documenting Code is Not Enough
I have heard in conversations at work and in online forums that one’s code should be self-documenting. This means following certain naming conventions and organizing code in a consistent, structured way, so that another developer can look at your code and understand what it is doing. While this is reasonable at a surface level, my experience has been that following self-documenting practices is only the bare minimum, and that the label is often used as a crutch to assume that one’s code is understandable when it is not. In this article, I would like to discuss the problems that I have run into in the wild.
Self-Documenting Code According to Wikipedia
In Wikipedia, self-documenting code is described as the following:
In computer programming, self-documenting (or self-describing) source code and user interfaces follow naming conventions and structured programming conventions that enable use of the system without prior specific knowledge
The idea here is that you follow the conventions of the frameworks you are using and name functions, variables, and classes accordingly. The goal is to reduce the need for code comments and documentation.
This all seems quite reasonable until you dig a little bit deeper.
Wikipedia provides an example with a function called count_alphabetic_chars.
size_t count_alphabetic_chars(const char *text)
{
    if (text == NULL)
        return 0;
    size_t count = 0;
    while (*text != '\0')
    {
        if (is_alphabetic(*text))
            count++;
        text++;
    }
    return count;
}
It should be noted that there are no comments, and no indications of what the standards actually are. The approach therefore requires prior knowledge on the part of anyone working within the source code. What may be easy to follow for one person may be difficult to follow for someone else.
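To make the gap concrete, here is a rough sketch of the same function with the unstated conventions written down. Wikipedia's snippet never defines is_alphabetic, so the definition below (a thin wrapper around the standard isalpha) is my own assumption, as is every statement in the comments about what the function is meant to do.

```c
#include <ctype.h>
#include <stddef.h>

/* ASSUMPTION: "alphabetic" means whatever isalpha() accepts in the current
 * C locale (only A-Z and a-z in the default "C" locale). The original
 * snippet leaves is_alphabetic() undefined, so this is a guess. */
static int is_alphabetic(char c)
{
    /* isalpha() is undefined for negative values other than EOF,
     * so convert to unsigned char before calling it. */
    return isalpha((unsigned char)c) != 0;
}

/* Counts the alphabetic bytes in a NUL-terminated string.
 *
 * - A NULL pointer is treated as an empty string and yields 0.
 * - This counts bytes, not characters: in the default locale, the bytes
 *   of a multi-byte UTF-8 letter such as "é" are not counted. */
size_t count_alphabetic_chars(const char *text)
{
    if (text == NULL)
        return 0;

    size_t count = 0;
    while (*text != '\0')
    {
        if (is_alphabetic(*text))
            count++;
        text++;
    }
    return count;
}
```

None of the three points in the comments (the locale dependence, the NULL handling, bytes versus characters) can be recovered from the function name alone, yet each one changes how a caller should use it.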
Self-Documenting Code Assumes A Certain Level of Experience
When you work in a code base that has new developers poking their heads in here and there, it is necessary to fill them in on what your standards are, where certain functionality lives within a large code base, how build processes work, etc. Typically, repositories have a readme and a contributing markdown file to offer some insight into how to set up a code base for local development, and what processes to follow for making changes, branching, and pull requests. The reason is that unless you are already familiar with the technologies and libraries being used, you likely won’t know what to do just by looking at the code. This is especially true when you are dealing with hundreds of files, each holding hundreds to thousands of lines of code.
Non-native speakers of English and Junior Developers will struggle with just code
There are a number of words in English that are used in other languages, but have a different meaning in those other languages. For example, in English, tenant might refer to a person, but in Japanese, a tenant is more likely to be a location. Since self-documenting code tries to minimize external documentation and code comments, it is a form of documentation that places a higher emphasis on brevity, which implies a higher rate of misunderstandings, especially among collaborators who are not native English speakers. The same applies when you are working in a non-English language and decide to use non-English names in the code.
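As a small illustration of the tenant problem, here is a hypothetical sketch in which a short comment pins down which meaning of the word the code base uses. The struct, its fields, and the function are all invented for this example and do not come from any real project.

```c
#include <stddef.h>

/* HYPOTHETICAL example type, invented for illustration.
 *
 * "Tenant" here means a customer organization that rents capacity in our
 * application. It does NOT mean a physical location such as a rented
 * shop space, which is what the word often suggests in Japanese. */
struct tenant {
    unsigned int id;
    const char *organization_name;
    size_t seat_limit; /* maximum number of user accounts the tenant pays for */
};

/* Returns 1 if the tenant can add one more user account, 0 otherwise.
 * A NULL tenant is treated as having no free seats. */
int tenant_has_free_seat(const struct tenant *t, size_t seats_in_use)
{
    if (t == NULL)
        return 0;
    return seats_in_use < t->seat_limit;
}
```

The names tenant and tenant_has_free_seat are perfectly reasonable on their own, but without the comment a reader who associates the word with a location has no way to tell that their reading is wrong.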
If you are a junior developer, you may know what a function is doing, and what a variable might be used for, but what about flow and logic? What is the intention of this condition? Why use a for loop instead of a lambda expression? How can I do this or that? While the expectation here is that you should Google the answer, if someone has already solved the problem that you are trying to solve within the code base, you should be informed about it.
This is particularly the case when I need to determine if there is something that exists within a code base that I can reuse. Why should I reinvent the wheel when someone else on my team already did that a few months ago?
What about local setup, build, and deployment?
When it comes to setting up a project locally, building it, and deploying it, code can be fairly limited. In a React application, you may be able to take a look at the package.json and follow online tutorials. You may even be able to search for Jenkinsfiles, Dockerfiles, etc. and figure things out on your own. However, it is unlikely that you will come away with a complete understanding just from looking at these files. Not to mention, such files usually follow a particular format, and if you are not already familiar with what to expect in them, you may be lost. In addition, while such files may give you an idea of how setup, build, and deployment work, they won’t give the full picture, and they definitely won’t tell you where you can view additional data such as metrics and dashboards related to builds and deployments.
What exactly are our standards?
Self-documenting code may give you an idea as to what some of the standards are with regard to code style, but it won’t give you the full picture with regard to architecture, security, performance, and other important aspects of software development. This is why a more verbose explanation is often needed. Going back to the example of React, you are given a decent amount of flexibility in how you organize your files and which libraries you use, which is quite different from other frontend frameworks that enforce certain rules in terms of architecture and organization. In the case of such flexible frameworks, code by itself is not enough, given the freedom developers have to do whatever they want and potentially shoot themselves in the foot.
Self-Documenting is not Self-Enforcing
Speaking of shooting yourself in the foot, depending on the number of people working in a codebase, it may be necessary to idiot-proof your code. Just because code is self-documenting does not mean that people can’t get buggy or overly complex code through code review and merged in. Not everyone working in a code base has the same views on how code should be, and just because code is self-documenting does not mean that people will follow existing standards. This is particularly the case when the reasoning behind such standards is not explained anywhere in adequate detail.
Why are things being done this way?
Self-documenting code might explain the what, but not the why. Why organize the code this way? Why is it necessary to have four nested loops? That’s what comments are for, but if you get rid of those, good luck finding out the reasoning behind critical pieces of code without speaking to the person that wrote the code, or spending a fair amount of time to figure out the answers to your questions from the code alone.
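Here is a minimal sketch of a case where the what is obvious but the why is not. The function and its name are hypothetical, and this is meant as an illustration of a "why" comment rather than as vetted security code (a compiler is free to optimize the loop in ways that undo its intent).

```c
#include <stddef.h>

/* Compares two equal-length secrets (for example, authentication tokens).
 * Returns 1 if every byte matches, 0 otherwise.
 *
 * WHY there is no early return on the first mismatch: the running time
 * would then depend on how many leading bytes match, and an attacker who
 * can measure that time could guess the secret one byte at a time.
 * Please do not "optimize" this loop. */
int secrets_equal(const unsigned char *a, const unsigned char *b, size_t len)
{
    unsigned char diff = 0;

    /* Accumulate the differences of all bytes instead of stopping early. */
    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);

    return diff == 0;
}
```

Strip the comment and the code still reads clearly, but the next developer to touch it has every reason to replace the loop with an early exit and quietly remove the one property that mattered.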
Conclusion
A picture may be worth a thousand words, but unexplained code is worth a thousand questions. At the risk of sounding like a broken record, self-documenting code does not answer enough questions to stand in for the comments and documentation it is trying to replace. Although less documentation gets written, abandoning code comments and documentation risks costing more time than it saves. It also places a higher burden on tribal knowledge, which is not guaranteed to always exist as a resource for you to take advantage of. Whether your code is understandable determines how quickly bugs can be fixed and features can be added to an application. As people enter and leave your teams and organizations, remember that you are writing code that others, including your future self, will need to be able to understand regardless of whether you are available to explain it.