Towards self-documenting code

In most cases, if you rely on comments to explain code, you’re doing it wrong. I’ll show you some ways to achieve self-documenting code. No, it’s not utopian.

Luís Soares
Apr 25 · 10 min read

The myth that code needs comments comes from university. Every company I have worked at had the rule of making code self-explanatory rather than relying on comments, and it worked out fine. So what are the problems with code comments?

Subjectivity. Comments are subject to the reader’s interpretation. Natural languages are ambiguous but code doesn’t lie.

“Code never lies; comments sometimes do.” (Ron Jeffries)

They allow bad code to exist. Comments are an excuse to make bad code with good explanations.

“A comment is a failure to express yourself in code. If you fail, then write a comment; but try not to fail.” (Uncle Bob)

They quickly get outdated. The code is the ultimate source of truth, so why have another? Code changes while comments go stale, and outdated comments confuse more than they help. Most coders skim past comments, so believing that others will read them, let alone update them, is wishful thinking. A comment is a lie waiting to happen.

“The comments will inevitably become out of date, and untrustworthy comments are worse than no comments at all.” (The Pragmatic Programmer)

I’ll skip the obvious suggestions like having small functions and classes, the single-responsibility principle, consistency, simplicity, decoupling, following language conventions, etc. Also, a few types of comments are easy to remove: commented-out code (that’s what source control is for), obvious comments, auto-generated comments, and function/class header comments (unless it’s a public API like a library or a web service). Just get rid of them.

Don’t make me think

Comments are not inherently bad; relying on them is. Whenever you feel the need to comment, ask yourself: “Can it be converted to code? If not, why?” Try to acknowledge the underlying smells rather than adding comments.

“A comment is the code’s way of asking to be more clear.” (Kent Beck)

As the saying goes: “first make it work, then make it right, and, finally, make it fast.” Making it right mostly means making it self-explanatory. If you have to explain code, you’re doing it wrong. Self-documenting code is self-explanatory and screams its intent. A good UI should not rely on instructions for exactly the same reason.

“Your objective should always be to eliminate instructions entirely by making everything self-explanatory, or as close to it as possible.” (Don’t Make Me Think)

Many teachings from the book Don’t Make Me Think could also be applied to codebases

The common objection to avoiding code comments is that they document non-trivial code. So, why not make the code trivial? Don’t stick to the first implementation you came up with. Usually, the ideal solution is the simplest one. The paradox is that there’s no quick way to get to simple. Therefore, iterate multiple times. Simplicity will naturally emerge from the iterations.

Don’t comment bad code — rewrite it. The Elements of Programming Style

“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous.” (Refactoring: Improving the Design of Existing Code)

Focus on intent; not technical details

As a general rule, whenever you feel like writing a comment, ask if you could express it in the names of variables, functions, classes, and other programming elements. To achieve it, make the intent clear by separating the what (intent/purpose) from the how (implementation details).

Naming things

A lot could be said about naming variables, but to me the most precious rule is to focus on the purpose, not the technicalities. For example, clientList, callback, and emailAddress are weak variable names because they describe the type or mechanism, which is a technical detail (the same instinct behind Hungarian notation). Better names would be eligibleClients, onCleanupComplete, and notificationAddress, because they express the business intent. Regarding function signatures, clarify their purpose through the names (e.g. rather than total(map), consider calculatePrice(user)). The same applies to class names: avoid names like Mapper, as they say nothing about intent.
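Here is a minimal sketch of those renames in Kotlin. Only the names quoted above come from the article; the Client and User types, their fields, and the function bodies are assumptions made up for the illustration.

```kotlin
// Hypothetical domain types, introduced only for this illustration.
data class Client(val name: String, val hasActiveContract: Boolean)
data class User(val name: String, val cartPricesInCents: List<Int>)

// Before: the name and the parameter only describe technical details.
fun total(map: Map<String, Int>): Int = map.values.sum()

// After: the signature states the business intent.
fun calculatePrice(user: User): Int = user.cartPricesInCents.sum()

// The result is named after why we want it, not after its type.
fun eligibleClients(clients: List<Client>): List<Client> =
    clients.filter { it.hasActiveContract }
```

Both versions of the price function do the same arithmetic; only the second one tells the reader what the number means.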

📝 If you’re having a hard time naming a class, check whether it’s hiding some kind of smell. The class may lack cohesion, i.e. it might be doing unrelated things or doing too much. Class names like hub, manager or service are black holes of code.

Don’t mix levels of abstraction

Most of the code comments I find could be refactored into methods. Compare the following:

// parse request
… code
// delete user from server
… code
// send email to the user
… code
🆚
user = parse(request)
delete(user)
notify(user)
📝 Generalization:
fun doSomething() {
    what1()
    what2()
    what3()
}
fun what1() { /* how1 */ }
fun what2() { /* how2 */ }
fun what3() { /* how3 */ }

The first approach is full of technical details and resorts to comments to explain what is going on while the second approach relies on the code itself to do it; it separates the whats from the hows.
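To make the second approach concrete, here is a small runnable sketch. The parse, delete and notify names come from the snippet above; the UserRemoval class, the request format ("id, email") and the in-memory storage are assumptions that stand in for real work.

```kotlin
// All bodies below are hypothetical stand-ins for real parsing, storage and email.
data class User(val id: Int, val email: String)

class UserRemoval(private val users: MutableMap<Int, User>) {
    val sentEmails = mutableListOf<String>()

    // The "what": reads as a list of steps at a single level of abstraction.
    fun handle(request: String) {
        val user = parse(request)
        delete(user)
        notify(user)
    }

    // The "hows": each detail lives behind a name that states its purpose.
    private fun parse(request: String): User {
        val (id, email) = request.split(",")
        return User(id.trim().toInt(), email.trim())
    }

    private fun delete(user: User) {
        users.remove(user.id)
    }

    private fun notify(user: User) {
        sentEmails += "Account ${user.id} was deleted: ${user.email}"
    }
}
```

Reading handle() is enough to know what happens; you only open the private functions when you care about how.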

“If you have to spend effort into looking at a fragment of code to figure out what it’s doing, then you should extract it into a function and name the function after that ‘what’. That way when you read it again, the purpose of the function leaps right out at you, and most of the time you won’t need to care about how the function fulfills its purpose, which is the body of the function.” (Martin Fowler)

This is known as the single level of abstraction principle, which tells us not to mix different levels of abstraction in the same function. Delegating details to private functions creates a high-level language that is closer to an actual spoken language. You can vertically scan the code reading “do this, then do that, also that…” without having to grasp the implementation details at the same time.

“In order to make sure our functions are doing ‘one thing,’ we need to make sure that the statements within our function are all at the same level of abstraction.” (Clean Code)

“Switching between levels of abstraction makes code harder to read. While reading the code you have to mentally construct the missing abstractions by trying to find groups of statements which belong together.” (Single Level of Abstraction, SLA)

📝 Don’t be afraid of long method names if you need to express what it does (e.g. hackForRetroCompatibilityWithOldCustomers). Make those methods small so that they can speak for themselves.

Make your app’s intentions clear

Screaming architecture states that the app should be oriented around its business/user intents rather than technical details like framework artifacts. It recommends that your architecture have a clear set of use cases. These use cases support the software’s functionalities.

“Just as the plans for a house or a library scream about the use cases of those buildings, so should the architecture of a software application scream about the use cases of the application.” (Screaming Architecture)

I recommend splitting your app’s business logic by use case. This lets a reader see at a glance what the app does, which makes it a source of self-documentation.

Use cases are what the app does for you. This example is from a sample project.
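As a rough sketch of what splitting by use case can look like in code, assume a small recordings app. Every class name below is hypothetical and the repository is an in-memory stand-in; the point is only that the list of use-case classes reads like a feature list.

```kotlin
// Hypothetical domain type and in-memory repository, for illustration only.
data class Recording(val id: Int, val title: String)

class RecordingRepo {
    private val recordings = mutableMapOf<Int, Recording>()
    fun save(recording: Recording) { recordings[recording.id] = recording }
    fun delete(id: Int) { recordings.remove(id) }
    fun findAll(): List<Recording> = recordings.values.toList()
}

// One class per use case: the class names alone tell what the app does.
class UploadRecording(private val repo: RecordingRepo) {
    operator fun invoke(recording: Recording) = repo.save(recording)
}

class DeleteRecording(private val repo: RecordingRepo) {
    operator fun invoke(id: Int) = repo.delete(id)
}

class ListRecordings(private val repo: RecordingRepo) {
    operator fun invoke(): List<Recording> = repo.findAll()
}
```

Whether each use case is a class, a function or a file is a matter of taste; what matters is that they are named after what the user wants to do.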

The way you set up your app’s dependency injection is also a great source of documentation. It clearly tells the reader what depends on what. Here’s an example:

val app = RecordingsApp(
    accessControl = AccessControl(),
    recordingRepo = RecordingRepo(
        database = Database.connect(
            url = System.getenv("DB_URL")
        ),
        logger = MyLogger(
            level = Level.WARN,
        ),
    ),
    recordingUploader = RecordingUploader(
        apiBaseUrl = System.getenv("API_URL"),
    ),
    clock = Clock.systemUTC(),
    monitor = NewRelicMonitor(),
)

Custom data types

Custom types (e.g. entities, enums) are a key factor in the expressiveness of your codebase: they provide semantics and self-documentation. They capture domain concepts, thereby enriching your type system, and act as a bridge between the codebase and the business domain. Besides enforcing usage (in statically typed languages), they communicate intent and document the functions’ inputs and outputs and their call sites. Custom types are an abstraction over primitive types and therefore the cure for the primitive obsession antipattern.
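A small sketch of the difference, assuming a hypothetical subscription feature. The Email and Plan types, the deliberately simplistic validation rule, and both functions are made up for the example.

```kotlin
// Hypothetical custom types; the validation rule is deliberately simplistic.
data class Email(val value: String) {
    init {
        require("@" in value) { "Not an email address: $value" }
    }
}

enum class Plan { FREE, PREMIUM }

// With primitives, nothing stops callers from swapping the two strings.
fun subscribeUnsafe(email: String, plan: String): String = "$email:$plan"

// With custom types, the signature documents and enforces what is expected.
fun subscribe(email: Email, plan: Plan): String = "${email.value}:${plan.name}"
```

With the second signature, swapping the arguments or passing an unknown plan does not compile, so there is nothing left to explain in a comment.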

“Being abstract is something profoundly different from being vague (…) The purpose of abstraction is not to be vague but to create a new semantic level in which one can be absolutely precise.” (Edsger W. Dijkstra)

Request/response models are DTOs: they encapsulate a cohesive set of data without logic. A request is just an input message that documents what the caller must provide (e.g. CreateUserRequest). Response models, on the other hand, capture the functions’ outcomes. For example, you could have a UsersResponse that encapsulates a pagination result (data, total results, total pages, current page).
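These models could be sketched as below. CreateUserRequest and UsersResponse are named in the text, and the four pagination fields follow the list above; the request fields, the UserView type and the pageOf helper are assumptions.

```kotlin
// CreateUserRequest is named in the text; its fields are assumed.
data class CreateUserRequest(val name: String, val email: String)

// Hypothetical read model for a single user.
data class UserView(val id: Int, val name: String)

// Pagination result: data, total results, total pages, current page.
data class UsersResponse(
    val data: List<UserView>,
    val totalResults: Int,
    val totalPages: Int,
    val currentPage: Int,
)

// Hypothetical helper that builds one page (1-based) from an in-memory list.
fun pageOf(users: List<UserView>, page: Int, pageSize: Int): UsersResponse =
    UsersResponse(
        data = users.drop((page - 1) * pageSize).take(pageSize),
        totalResults = users.size,
        totalPages = (users.size + pageSize - 1) / pageSize,
        currentPage = page,
    )
```

A caller reading the return type knows what comes back without opening the function.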

Value objects are custom types that are compared by value rather than by reference. They have no identity of their own: two instances carrying the same data are the same thing (e.g. Point, Email, URL).
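In Kotlin, a data class gets you most of the way there. Point is named in the text; the translate function and the Session class used for contrast are assumptions.

```kotlin
// Point is named in the text; the translate function is an assumption.
data class Point(val x: Int, val y: Int) {
    // Value objects are immutable: operations return a new instance.
    fun translate(dx: Int, dy: Int): Point = Point(x + dx, y + dy)
}

// A regular class for contrast: equality falls back to reference identity.
class Session(val token: String)
```

Two Point(1, 2) instances are equal, while two Session("t") instances are not, because only the data class compares the data it carries.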

📝 In Kotlin, consider named parameters to provide meaning to callers. As a general rule, try to learn your language’s idiomatic ways of doing things.
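A quick sketch of the difference named arguments make at the call site; the resize function and its parameters are hypothetical.

```kotlin
// Hypothetical function with parameters that are easy to mix up.
fun resize(width: Int, height: Int, keepAspectRatio: Boolean): String =
    "${width}x${height}, keepAspectRatio=$keepAspectRatio"

// Positional call: the reader has to look up what each value means.
val positional = resize(800, 600, true)

// Named arguments: the call site documents itself.
val named = resize(width = 800, height = 600, keepAspectRatio = true)
```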

Contextuality

When the scope (i.e. visibility) of something is wider than needed, the reader is forced to think about where it might be used and may even be afraid to change the code. My recommendation is to reduce the scope of things as much as you can. This way, you’re telling future coders: “hey, this is only relevant here”. By reducing the scope of classes and other artifacts, you’re reducing the software’s surface area. This also contributes to implicit documentation, as it highlights the pieces one has to consider when looking from a bird’s eye view. Additionally, the tests won’t see implementation details and are forced to test as a client, which is a good practice. Here are some examples:

  • Define variables as close as possible to where they’re needed. If the variable is used only once, consider inlining it — unless its existence is acting as documentation itself.
  • Constants are often overused. Reconsider their usage.
  • Make methods private by default, unless they’re part of the class’s public interface.
  • Rely on inner/private classes for local concerns. These are implementation details that should not be visible to everyone (e.g. (de)serializers, command handlers). This informs the reader they’re a local concern.

After making a class private, and if there’s a one-to-one relationship with its only user, consider merging them because they might be the same thing.
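The suggestions above could look like this in a hypothetical report exporter. All names are made up; the sketch only shows a variable declared where it is used and a private nested class for a local concern.

```kotlin
// Hypothetical report exporter; only `export` is part of its public interface.
class ReportExporter {
    fun export(rows: List<Pair<String, Int>>): String {
        // Declared right where it is needed, not at the top of the class.
        val serializer = CsvSerializer(separator = ';')
        return rows.joinToString("\n") { serializer.line(it) }
    }

    // A local concern: invisible outside ReportExporter, so safe to change.
    private class CsvSerializer(private val separator: Char) {
        fun line(row: Pair<String, Int>): String = "${row.first}$separator${row.second}"
    }
}
```

A test can only go through export(), so it exercises the class the way a client would.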

“Each piece of design infrastructure added to a system, such as an interface, argument, function, class, or definition, adds complexity, since developers must learn about this element. In order for an element to provide a net gain against complexity, it must eliminate some complexity that would be present in the absence of the design element.” (A Philosophy of Software Design)

📝 If you’re using the clean/hexagonal architecture, all that needs to be public (and visible to the tests) are the ports (their interfaces, request/response models, and errors), the adapters, and the use cases. All the rest should be local/private.

Even if the scope of something is public, contextuality is still essential and is self-documenting. This is easier to show through examples:

  • Rather than having global enums, put them inside the entities or APIs that they belong to. For example, a UserStatus enum could be put inside the User entity (and just be called Status).
  • Rather than having global request/response models, define them as inner classes of the interface or use case that they belong to.
  • Instead of creating the “exceptions” folder, prefer defining them as inner classes of the features (use cases) they belong to.
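Here is how the three suggestions above might look together. User and Status come from the text; the ActivateUser use case, its Request model and its AlreadyActive error are hypothetical.

```kotlin
// User and Status come from the text; everything else is hypothetical.
data class User(val name: String, val status: Status) {
    // Lives inside User, so it reads as User.Status at the call site.
    enum class Status { ACTIVE, SUSPENDED }
}

class ActivateUser {
    // The request model and the error are scoped to the use case they belong to.
    data class Request(val user: User)
    class AlreadyActive(name: String) : Exception("$name is already active")

    operator fun invoke(request: Request): User {
        if (request.user.status == User.Status.ACTIVE) {
            throw AlreadyActive(request.user.name)
        }
        return request.user.copy(status = User.Status.ACTIVE)
    }
}
```

Names like ActivateUser.Request and ActivateUser.AlreadyActive carry their context with them, so there is no need for a global folder of requests or exceptions.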

Automated tests

Tests are the best documentation of a system: they act as its user manual. They’re also live documentation and executable specifications. They document what the system can do (the implementation is the how) and, unlike code comments, they fail if they don’t match the implementation.
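A tiny sketch of tests acting as a specification. The pricing rule is invented for the example, and the tests are hand-rolled functions using check() so the snippet stays self-contained; a real project would most likely use a test framework instead.

```kotlin
// Hypothetical price rule, invented for this example.
fun calculatePrice(unitPriceInCents: Int, quantity: Int): Int {
    val subtotal = unitPriceInCents * quantity
    // Orders of 10 or more units get 10% off.
    return if (quantity >= 10) subtotal * 90 / 100 else subtotal
}

// Test names state the behaviour; the bodies are executable examples.
fun `charges the full price below ten units`() {
    check(calculatePrice(unitPriceInCents = 100, quantity = 5) == 500)
}

fun `applies a ten percent discount from ten units`() {
    check(calculatePrice(unitPriceInCents = 100, quantity = 10) == 900)
}
```

If someone changes the discount rule and forgets these tests, they fail; a comment stating the same rule would silently go stale.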

Tests are my starting point whenever I look at a new codebase. Even if the implementation is not great, I feel safe to refactor it if the tests are good. Depending on the level of documentation that I need, I may resort to higher-level or lower-level tests. Documentation is actually one of the goals of automated tests.

Final remarks

To sum it up, the way you document your codebase is through the code itself and that's called self-documentation. How? It depends on the abstraction level. For low-level concerns, naming things by their intent helps a lot. For mid-level, the unit tests do the job. For high-level, you can rely on the screaming architecture and acceptance tests.

You may say “it depends” or “there are exceptions” but that helps to prove my point since they’re not the general rule. Here are a few articles that cover those exceptions and further explore the topic:
