Writing Code for Humans — A Language-Agnostic Guide

Christian Paro
Jul 23, 2017 · 15 min read

…because code which people can’t read and understand is easy to break and hard to maintain.

…and because there are a lot of style guides for writing code in specific languages (which are totally worth reading if you’re using one of those languages), but surprisingly little about writing code for humans outside of a few classic books on the subject.

This piece is inspired by a tweet about code not being “self documenting” from a friend and former co-worker at DigitalOcean (Matt Layher), and a follow-on question from @CatSwetel on my +1 on that statement about whether I had a post or talk about this. Well, now I do. And you should follow these people — Matt is an extremely talented developer who regularly makes useful contributions to both DigitalOcean’s software infrastructure and Go’s open-source ecosystem, and Cat is a technically-oriented executive consultant whose insights into engineering practice are what I _wish_ “agile consulting” looked like as a whole.

Maintenance Matters

I’ve seen a lot of stats thrown around for the cost of software maintenance relative to the cost of development. In practice, most of the work I’ve done or been around has spent far more time on things like:

  • feature enhancements
  • bug fixes
  • refactoring
  • solving scaling problems

…than it has on building the first working version of a given system.

Some of what you will write is truly “disposable” code, which you write once, run once, and then throw away — but most developers spend most of their time on software which lasts long enough to require some form of maintenance. Big, important projects (the kind that tend to have words like “system”, “infrastructure”, “library”, or “framework” in their descriptions) will skew especially far toward maintenance costs vs. initial development costs. And, while back-office business-logic code and front-end interfaces tend not to require the same kind of long-term technical maintenance, they are prone to requirements changes even after they’ve been happily running in production for months or years.

Also, it’s hard to guess just how temporary a “temporary” system will be. I’ve seen code that was meant to last forever get tossed a quarter after it was written — and have been personally responsible for a handful of quick “hacks” that were meant to serve as short-term patches while working on a proper long-term solution which never came because of changes in priorities. Sometimes a bigger problem or opportunity comes along. Sometimes the hack works well enough that there’s not much incentive left to spend 10x the resources on something that’ll solve what’s essentially an already-solved problem.

So it’s safest to assume that nothing that makes it past the day you wrote it is immune from needing someone to go in and work on it later.

Cleverness is Dangerous

Nothing here that most developers haven’t already heard — but seeing how often developers (myself included) manage to forget this when writing things, it doesn’t hurt to reiterate.

If you think you just did something very clever, be very wary about whether it’ll cause maintenance headaches down the road.

Sometimes “clever” code is actually helpful, or even necessary. There are a handful of cases where that extra bit of performance or flexibility is actually useful for practical purposes. Usually, though, it’s more of a liability than an asset.

Even when that “clever” thing is done with good reasons for doing it, it should be:

  • In an abstraction that makes it so no one has to worry about how it works unless working directly on that specific “clever” part of the code. (For example, you might have a neat logging or tracing trick which uses some ugly reflection hacks to help gain some visibility into the code — but nothing making use of it should depend on how it works and nothing other than the trace output should depend on it working at all.)
  • Heavily commented with explanations of both how it works and why this approach was chosen over something simpler/safer/more-obvious.

If you can’t explain how the “clever thing” works in simple terms, you probably don’t understand it that well yourself. If you can’t explain why it’s necessary, it probably isn’t. If you can do both of these things, the nice documentation you attached to your clever new code can turn that code into a learning tool for other developers (and a concrete “notebook entry” for your future self) rather than a liability for the team.
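As a sketch of what that looks like in practice, here is a hypothetical Python tracing helper in the spirit of the reflection example above. Everything here is illustrative rather than a real library: the introspection ugliness is boxed into one decorator, callers depend only on its interface, and the comments say both how it works and why it was done this way.

```python
import functools
import inspect

def traced(func):
    # A hypothetical "clever" helper: it uses introspection ("reflection")
    # internally, but nothing outside this function depends on HOW it works,
    # and only the trace output depends on it working at all.
    #
    # Why introspection instead of just printing *args? Binding the arguments
    # to their declared parameter names makes the trace output readable
    # without the caller having to pass anything extra in.
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        print(f"TRACE {func.__name__}({dict(bound.arguments)})")
        return func(*args, **kwargs)

    return wrapper

@traced
def scale(value, factor=2):
    return value * factor
```

Calling `scale(3)` behaves exactly as it would without the decorator; the trick only adds trace output on the side.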

Code is Not (Always) Self-Documenting

Sorry. It’s just not.

I first heard the claim of “self documenting code” in reference to the Linux kernel. I’ve dabbled in kernel hacking for fun and understanding, and did some more-substantial work with QEMU and LibVirt as part of a port to the s390x architecture while at IBM. And I honestly believe that the communities working on these projects would be larger, more effective, and perceived as more welcoming to new contributors if the commenting and documentation for these projects wasn’t so severely bare-bones.

The key here is that naming standards, formatting standards, and rigorous code review can do a lot to make sure there isn’t any line or block of code that someone reasonably familiar with the language can’t “parse” like a compiler and reason through — but that doesn’t solve the problem of said person having to grep their way around the codebase and construct their own maps and notes when they want to see how it all fits together or why a thing does what it does.

In toy examples like people make in most academic programs that purport to teach you how to develop software (whether at a university, a 12-week bootcamp, or something in-between), this isn’t a very big deal. There’s not that much code to sort through, and many developers don’t even need to write down notes on their exploration while developing a mental map of the codebase.

In large projects (like operating-system kernels, databases, systems-management stacks, and a surprising number of business-back-office systems once all the actual rules of the preexisting human-based system have been discovered and incorporated), this exploration starts looking more like a dungeon crawl than the search for your hotel room after check-in. So, unless the point is to challenge people with rediscovering things that someone else already knows, it’s only sensible to hand your intrepid new explorer a decent map and some guidance on what the various artifacts and incantations in this strange place actually do.

…But Names and Structure Go a Long Way

While “clean code” doesn’t pass as a complete solution for “good documentation”, it does have a place in the overall picture.

If you need to put a comment on a line of code to say what it does, then that may be a sign that better naming or clearer structure would be helpful. For instance, there’s a reason why, in many languages, you can express something like this:

maskedValue = inputValue & (1 << 3 | 1 << 1)
multipliedValue = inputValue << 3

…or like this:

maskedValue = inputValue & 10
multipliedValue = inputValue * 8

In the first set, we’re helpfully showing that the mask has bits 3 and 1 set (counting from zero) without making the user work that out backwards from a decimal representation of the mask — but we’re multiplying in a way which isn’t going to make as much sense to most users as the way shown in the second set of examples. In many cases, the compiler/interpreter will render these two examples to the same underlying behavior, so the difference isn’t for the computer. It’s for expressing intention and framing something in the way that’s most intuitive to the user.

On that front, wouldn’t it be even nicer to have something like this?:


toCreate = 1 << 0
toDelete = 1 << 1
toRename = 1 << 2
toShare = 1 << 3
toCopy = 1 << 4

mask(originalFlags, maskFlags) = {
  originalFlags & maskFlags
}

canCopyOrShare(permissions) = {
  mask(permissions, toShare | toCopy) != 0
}

// ---- And, elsewhere... ----
threadsPerCore = 8 // TODO: Read from environment variable.
workerThreads = cpuCores * threadsPerCore

The above code isn’t written in any real language (though it is a little C-like and a little Scala-like). But using meaningful names and a little structure, along with language syntax which hints at the actual intended semantics, helps to make the purpose and behavior of the two abstract “mask” and “multiply” actions clear.
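For comparison, here is the same idea as a runnable Python sketch. The flag names and the TODO line are carried over from the pseudocode; the cpuCores value is a placeholder, since the pseudocode doesn’t say where it comes from.

```python
# Named permission flags instead of "magic" decimal masks.
TO_CREATE = 1 << 0
TO_DELETE = 1 << 1
TO_RENAME = 1 << 2
TO_SHARE = 1 << 3
TO_COPY = 1 << 4

def mask(original_flags, mask_flags):
    """Keep only the bits of original_flags selected by mask_flags."""
    return original_flags & mask_flags

def can_copy_or_share(permissions):
    """True if either the share bit or the copy bit is set."""
    return mask(permissions, TO_SHARE | TO_COPY) != 0

# ---- And, elsewhere... ----
threads_per_core = 8  # TODO: Read from environment variable.
cpu_cores = 4  # Placeholder value; would come from the runtime environment.
worker_threads = cpu_cores * threads_per_core
```

The point survives translation: the reader never has to decode what `& 24` would have meant, because the names carry the intent.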

Explain Context and Purpose

There’s an old saying that comments are for saying why a thing is done, and code is for saying how it is done. There’s also been plenty written about this before.

Most of your comments should be explanations of why something is done:

  • Why you implemented something in a non-obvious way.
  • Why you implemented something in a particular way, instead of some other plausible alternative.
  • Why you implemented something at all.

The first of these explanations will likely save hours (or weeks) of people trying to figure out what you were thinking when you did something — because there, right next to where you did it, you told them exactly what you were thinking at the time. It can also prevent people from “fixing” something that might look off in the immediate context but is actually correct in some larger context. Ideally you don’t have too many cases of the latter — but like most things, working software implementations are rarely free of structural flaws.

The explanation of choices between alternatives can help quash future bike-shedding arguments about whether X should be replaced with Y. Or, it may be a declaration that you actually did just throw something in there without taking the time to explore all the alternatives — making an explicit acknowledgement of the fact that there may be a better way which just hasn’t been considered yet.

And the statement of why something was made in the first place helps to draw a path from low-level implementation details up into the architecture and/or out to actual stakeholders for whom the project exists. Not all developers care as much about the problem domain as they do about the solution domain — but others (among which I count myself) have a hard time making decisions about things without first knowing what the project is meant to accomplish, for whom it is being done, and how anything already present in the system fits (or fails to fit) into that big picture.

Beyond the “why” comments, I’m a big fan of high-level documentation which outlines:

  • What parts a system is made of
  • How they fit together
  • Why, at a high-level, they exist
  • How, at a high-level, they work

This documentation is what I think of as the “map” to your software. It provides a tool which allows new developers to navigate a system they haven’t had much time to become familiar with. It also gives long-time maintainers an organizational framework for their knowledge which can help them to manage problems and solutions of much greater scale and complexity than they could effectively work on with just a “mental map” of the system.

Often, this documentation will be a mix of generated docs (of the sort which gained popularity with Java and JavaDoc) and prose-and-diagram-based architectural specs or overviews. I like the idea of everyone working on a project having a shared picture of the system they are working on (ideally one they could sketch on a whiteboard without much structural variation across what each person would draw), and having an at-a-glance index into everything already present and available in the system.

Highlight The Tricky Bits

TODO Tags:

One of the examples above uses one of my favorite comments, the TODO tag:

threadsPerCore = 8 // TODO: Read from environment variable.

In this case, we gave our threadsPerCore a meaningful name instead of leaving it as a “magic number”. But we haven’t gotten around to making it configurable, though it is the sort of thing that probably should be at some point. So the TODO tag gives us something we can search for, make listings of, or get a count of as an idea of what “nice-to-have” items we would like to take care of at some point in the future.

If you’re thinking “hey, shouldn’t that be a ticket in a tracker instead of a comment buried in the code?”, I can get on board with that. But who said it was either/or?:

threadsPerCore = 8 // TODO #1936: Read from environment variable.

Now you have something that can point you to the issue you opened in the tracker, and something you can search for using the issue number when you get around to actually working on that issue. Whether someone sees the comment while working on something else, and decides to knock off the issue in a bit of slack time, or someone picks the ticket off the queue, neither is going to need to waste time looking for the other half of that equation.

FIXME Tags:

FIXME is another tag which has become a popular convention. Like a TODO tag, it indicates that something needs more work. Unlike a TODO tag, it indicates that the work is needed not just as an improvement, but as a remedy for something which is known to be broken.

There are some people who don’t think you should ever have these in a codebase. I am not one of those people. Instead, I see them as a useful tool for when:

  • You are working on something which is not even pretending to be production-ready, and need a quick way to mark stuff that you know isn’t quite right yet.
  • You have some edge case that isn’t implemented yet, or where the behavior yielded is incorrect or undesired, but it’s not common or critical enough to stop progress on the project or shipment of a new feature until it is fixed. Sometimes shipping a 90% correct feature and coming back to fix that last 10% is better than not shipping one at all. There’s a judgement call to be made there, but one thing you definitely should not do is ship a 90% correct feature and then forget about the 10% that’s broken because you never wrote it down anywhere you’d see it again.
  • The FIXME isn’t a behavioral bug, but is some piece of high-interest technical debt which should be addressed sooner rather than later because its presence is a noticeable drag on further progress on the product.

DRAGON Tags:

Like the TODO and the FIXME, the DRAGON tag marks some bit of ugliness in your code which may need to be addressed in the future. Unlike them, it’s something where the relevance of addressing that problem is conditional on some other change or condition.

This tag is something I started using at DigitalOcean after realizing that I had NOTE tags and similar on a mixture of things which were good-to-know but not hazardous to be ignorant of and things which could bite you hard if you weren’t aware of them. So, for the sake of having something as highlightable, searchable, and countable as the TODO and FIXME tags, DRAGON was born with the implication of “here be dragons” and “be wary of waking the dragon”.

There’s an old story about how a missile was programmed with known memory leaks which would crash the guidance system if the program ran for long enough. The developers responsible were okay with this, and consciously left the leaks in place because the rate of leakage was calculable, there was headroom to spare in the hardware for the leaked memory, and the missile’s range limitations meant that it was going to hit its target or run out of fuel — in either case self-destructing before the leak could cause a crash.

That would be an excellent (if extreme) example of where a DRAGON tag would be useful. There’s a “problem” which is known not to be a practical issue because of something the developers know about the context within which the program is used. It is, however, based on an assumption (about the limitations of the missile’s range) which may fail to hold true in the future. Imagine the same software being reused in a later generation of the missile which has a longer range. It would be very helpful for the maintainers of this software to have a way to search for any cases of “stuff that works the way it should under the conditions of its current usage, but is brittle in a way that could make a seemingly-innocuous design change trigger an ugly failure mode”.
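To make that concrete, a hypothetical DRAGON comment for a leak of that kind might look like the following in Python. The function, the leak rate, and the headroom numbers are all invented for illustration; the point is the shape of the comment.

```python
def record_telemetry(samples, buffer):
    # DRAGON: This buffer is never freed. Leakage is roughly 1 KB/s against
    # roughly 4 GB of headroom, and flight time is bounded at ~20 minutes by
    # fuel, so it cannot exhaust memory in practice. If the range (and
    # therefore the maximum flight time) ever increases, this MUST be
    # revisited. (All numbers here are hypothetical.)
    buffer.extend(samples)
    return len(buffer)
```

A future maintainer who changes the assumption (longer flight time) can find this spot by searching for the tag, rather than by rediscovering the leak in the field.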

Like TODO and FIXME tags, DRAGON tags probably shouldn’t proliferate too much without someone taking the time to eliminate some of these cases altogether — a few such issues can be managed, but once there are enough that it’s hard to do a quick scan of them they can start to look like “noise” and be ignored or overlooked with nasty consequences.
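The “searchable, listable, countable” property of these tags is easy to demonstrate. Here is a minimal Python sketch of a tag scanner; the tag names follow the conventions described above, and the regex and sample source are illustrative rather than any standard tooling.

```python
import re

# Match TODO/FIXME/DRAGON, optionally followed by an issue number
# in the "TODO #1936" style discussed above.
TAG_PATTERN = re.compile(r"\b(TODO|FIXME|DRAGON)(?:\s+#(\d+))?\b")

def scan_tags(source):
    """Return a list of (line_number, tag, issue_or_None, text) tuples."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = TAG_PATTERN.search(line)
        if match:
            found.append((number, match.group(1), match.group(2), line.strip()))
    return found

example = """\
threadsPerCore = 8  # TODO #1936: Read from environment variable.
retries = 3  # FIXME: Off-by-one when the first attempt fails.
cache = {}  # DRAGON: Never evicted; fine only while restarts are frequent.
"""

for line_number, tag, issue, text in scan_tags(example):
    print(line_number, tag, issue, text)
```

A quick count of each tag type (`len` over the results, or a one-line `grep -c` equivalent) is exactly the kind of at-a-glance health signal described above; once that listing gets too long to scan, it’s time to pay some of the debt down.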

That said, I’d much rather have a sign warning me about something “dangerous” in the code than an engineering culture which discourages people from marking their ugly hacks by picking on the markings rather than recognizing them as part of an overall QA system (which also includes testing, code review, visibility tools, and good tooling for gradual deploys and fast rollbacks) which can help to reduce the overall number and severity of production incidents.

Test Your Documentation

A funny thing about documentation is that it has a way of drifting apart from the reality of the system being documented. The further it is from the system, the worse this is. For example, comments on or adjacent to a line of code are usually updated when the thing they apply to is changed in a way which would invalidate them. Function-level comments are more easily missed. Class- or file-level comments, more easily still. External specs and overviews rarely remain accurate to the system they describe unless active measures are taken to keep them in sync (or code review disallows any changes which would violate the spec).

Documentation can also, like the code being documented, have bugs in it. Some are trivial, like spelling and grammatical errors which don’t change the meaning of anything but do look unprofessional or distract from the intended meaning. Some are serious, like a misstatement of the behavior of a component which can lead a developer to use it in a way which creates a defect in the software because the documentation has misled them.

But, like code, documentation can be both reviewed and tested.

Reviewing Documentation

When doing code reviews, documentation should be part of the review.

  • Does documentation exist where needed for clarity?
  • Is the “why” documentation accurate to the actual problem space and findings in comparing alternative possible approaches?
  • Does any “what” or “how” documentation match what the code actually does? If there is a summary of what some unit of code does, is the summary one which follows from the code’s actual behavior?
  • Do the high-level “map” documents actually point to the right places? Or do they have misspelled names, broken links, or references to the wrong thing (as can happen with copy/paste errors or the occasional mental mix-up)?
  • Is there anything in the documentation which is confusing, ambiguous, or in need of further detail or better structure in order to clearly express the author’s intent?

Testing Prescriptive Documentation

Prescriptive documentation (like “how-to” documents) is easy to test. Essentially, it’s the same thing as a test script used for manual end-to-end testing, and it can be tested by handing it to someone who had little to do with making either the document or the system being documented, to see if they can follow it and reach both the intended results and the intended understanding of what they did.

If they are confused, or ran into a dead end because something in the doc didn’t work, they’ve found a bug in the document, the system, or both.

Incidentally, passing such a document to a test-automation developer and having them use it (rather than the “white-boxed” code) as the basis for automating an integration or end-to-end test can similarly test both the doc and the system.

Testing Descriptive Documentation

Documents for things like interface contracts are a little less obvious to test, but I still feel they are testable by having these documents used as the basis for test automation rather than (or in addition to) writing unit tests with full knowledge of and access to the code being tested.

Different shops have different ideas on testing, but my thought here is that this is a good place to split up unit testing by having the feature developers write unit tests based on what they know about the code, and by having separate testers (who may or may not be dedicated QA engineers) write a battery of tests against the system as it is documented. The first set of tests can go for the areas the developers know are thorny from having done that development, while the second is probably a better simulation of actual usage given that most code is “used” most often by people who didn’t originally write it, are just calling it from elsewhere, and who read the library docs to understand how it works rather than reading the code itself.

Summary

For all the fights that break out over brace placement and how indentation is done, most of the human-friendliness of code comes from structure, naming, and documentation.

  • Give things meaningful names
  • Put related things together, keep unrelated things separate
  • Keep the code simple wherever possible
  • When something “clever” is necessary, explain in friendly terms how the code works and why it was not done in the “obvious” way
  • Use comments to give context to why a thing exists and why it behaves the way it does
  • Use high-level documents as a map through which new maintainers can easily find a good starting point for troubleshooting an issue or introducing a new feature
  • Mark the scary and thorny bits of the codebase as such, both to encourage remediation and as a guard against people making careless modifications in areas of the codebase which may contain non-obvious hazards
  • Reviewing and testing documentation can be as important as reviewing and testing your code in a long-lived project; bad documentation can cause at least as many problems as bad code
