Signs of code readability: Naming

Artsiom Miksiuk
11 min read · Nov 11, 2022

Everybody has their own “unique” vision of why something is more readable than <write your version here>. I’d like to reflect out loud on how variable naming affects code readability and why explicit full naming doesn’t necessarily make code better.

Name variables whatever you like

1: You can’t have a variable named x.
2: Did you see the code?
1: Yes, it’s unclear and less readable when a variable is called x.

Before proceeding, try to read this code and guess what it does.

Example 1. Pseudocode which looks like JavaScript. Tokenized version.
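
The original snippet was published as an image and is not reproduced here. Below is a minimal sketch of what the tokenized version might look like, based only on the description in this article. The sample data and the names transitions and inverted are assumptions, and its character count will not match the figures quoted later.

```javascript
// Hypothetical sample data: each transition has a previous and a next state.
const transitions = [
  { prev: null, next: "idle" },
  { prev: "idle", next: "running" },
  { prev: "running", next: "stopped" },
];

// Tokenized version: the callback argument is just `x`.
const inverted = transitions
  .filter((x) => x.prev) // drop transitions with an empty previous state
  .map((x) => ({ prev: x.next, next: x.prev })); // swap prev and next

console.log(inverted);
```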

The concern here is that the variable x, used in the filter and map calls, decreases the readability of the code.

Did you get what this code is doing? It takes some entity representing transitions, filters out the transitions that have an empty previous state, and then returns new objects with the prev and next fields swapped. The name of the output variable confirms our guess that the result of the whole operation is an inversion of the transition objects.

Reader: But you’re cheating! You’re using named root level variables, not some x-es!
Me: Hold that indignation for a moment. I’ll return to it later.

Have a look at the named-variable version of the same code.

Example 2. Pseudocode. Named version.
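
This snippet was also an image in the original. A minimal sketch of the named version, under the same assumptions as before (the sample data and the names transitions and inverted are illustrative, not the author’s exact code):

```javascript
// Hypothetical sample data: each transition has a previous and a next state.
const transitions = [
  { prev: null, next: "idle" },
  { prev: "idle", next: "running" },
  { prev: "running", next: "stopped" },
];

// Named version: identical logic, the callback argument is spelled out.
const inverted = transitions
  .filter((transition) => transition.prev)
  .map((transition) => ({ prev: transition.next, next: transition.prev }));

console.log(inverted);
```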

The code means the same. Only the argument x was renamed to transition.
How did the rename change the code? It made it noisier. I like the way Kevlin Henney touches on readability in his talk. In particular, he uses an analogue of the signal-to-noise ratio (SNR) to measure the difference qualitatively.

Note: SNR analogies are used in natural-language readability studies. For example, in the study “Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features”, the closest analogy is called a semantic noise ratio.

Kevlin didn’t do the actual math with it, but we will. We will try to measure quantitatively the difference between the SNRs of the named and tokenized examples. It is a relatively naïve approach, a bit speculative, but illustrative.

Following the definition of SNR, we have to define our signal and noise values. As the signal we take the meaning of the code: the inversion of transitions. Quantitatively, it always equals one. Therefore, the signal Ps = 1 for both versions of the code.

Note: We could try to decompose this meaning and count its parts line by line (variable assignment, filtering, mapping, object construction), but since both versions of the code are built from the same meanings, in the final math they would all cancel out to 1 anyway.

As for the noise value, we can take the number of characters, including the newlines that line wrapping may introduce. For the tokenized version Pn = 150, and for the named version Pn = 195.

Note: The same applies here as with Ps. Choosing another metric for Pn would lead to the same relative reduction in the final calculation.

It seems like there is not much of a difference, is there? Let’s calculate a delta.
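
The original formula was embedded as an image and is not reproduced here. As a rough sketch of one way to do the calculation, assume SNR is simply Ps / Pn and the delta is the relative drop in SNR from the tokenized to the named version. The names snr and relativeDrop are illustrative, and the article’s own figure may have been computed differently.

```javascript
// Signal is 1 for both versions; noise is the character count quoted above.
const Ps = 1;
const PnTokenized = 150;
const PnNamed = 195;

// SNR as a plain ratio of signal to noise (an assumption of this sketch).
const snr = (signal, noise) => signal / noise;

const snrTokenized = snr(Ps, PnTokenized);
const snrNamed = snr(Ps, PnNamed);

// Relative drop in SNR when moving from the tokenized to the named version.
// With Ps = 1 this reduces to 1 - PnTokenized / PnNamed.
const relativeDrop = (snrTokenized - snrNamed) / snrTokenized;

console.log((relativeDrop * 100).toFixed(1) + "%"); // about 23%
```

Read the other way round, the same numbers say that the named version carries 30% more characters (195 / 150 = 1.3) for the same signal.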

Note: Yes, the equation could be simplified to a ratio of the two code lengths, but I decided to keep things aligned with the original SNR idea.
In the end, it looks more impressive when you have some formulas on the page!

I made a simple page for calculating SNR, so you can try it on your own samples if you like: https://questionandanswer.github.io/snr.html

It still feels like not much. To get a feel for what this value means, let’s look at some comparisons:

  • Color intensity changed by 20%. Depending on the display you are looking at, this may be more or less obvious. Both pictures are black, but the left one is at 100% and the right one at 80%.
#000000 vs #333333
  • What about a 10% raise to your current salary? Would that be noticeable? $1000 vs $1100? Not much? And how about 20%? $5000 vs $6000? Looks more reasonable?
  • V8 added Sparkplug, which brought only a 5–15% improvement in runtime performance. If that is reasonable for an engine, why not for code?

Consider another well-known code snippet, one that is less often a subject of naming debates.

Example 3.
Example 4.
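
Examples 3 and 4 were images in the original. The surrounding text suggests a plain counting loop written with a short and a long counter name. Here is a minimal sketch under that assumption; the array values and the name currentElementIndex are made up for illustration, and the character counts will not match the percentage quoted in the text.

```javascript
const values = [1, 2, 3];

// Short counter name (in the spirit of Example 3).
let sumShort = 0;
for (let i = 0; i < values.length; i++) {
  sumShort += values[i];
}

// Fully spelled-out counter name (in the spirit of Example 4).
let sumLong = 0;
for (let currentElementIndex = 0; currentElementIndex < values.length; currentElementIndex++) {
  sumLong += values[currentElementIndex];
}

console.log(sumShort === sumLong); // both loops compute the same sum
```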

In meaning and intent this is the same code, but the majority of programmers prefer the first version, at least while the loop itself is simple. A 70% change is now immediately noticeable.

But why does it matter? It breaks the recognition of code structures by our eyes and brains. We have syntax highlighting specifically for that: it adds an extra structural layer for our brains by color-coding the syntactic elements and structures of a language. Longer names increase the time our eyes need to parse the code and blend its different structural parts together. This creates unnecessary cognitive load when trying to extract the meaningful parts.

Increased noise raises cognitive load and breaks the reading process.

But let’s leave this statement for now.

How does variable naming work?

In the named versions of the code, along with the increased number of characters, we also get the additional semantics that the names bring. By semantics I mean lexical semantics: in short, the meaning that every word has.
Is this additional semantics necessarily good? Not always. A badly chosen name only confuses, adding mental noise on top of the character noise.
As an example, rename transition to option in the map call and try to explain why. Some might come up with an explanation, some not. A semantically inappropriate choice of name is not frequent, but not that rare either. It usually happens during refactoring, especially in dynamically typed languages, where static analysis is limited and refactoring tools can’t track the whole invocation chain for a rename. It can also be caused by poor domain understanding or a general lack of discipline while writing code. Still, this example shouldn’t be considered a serious argument.

But consider this. Renaming transition to, say, item breaks the semantic bridge between the array of transitions and each of its elements. In the domain language, an element of the transitions array would be called a transition; in computer science (CS), an element of a generic array is an element or an item. The first variant is domain semantics, the second is CS semantics. Neither is good or bad, but it is a choice. The same goes for renaming to x: this variant is closer to the mathematical definition of a map function, f(x).

1: So, let’s then name all our variables x, y, a, b, tt, r2, d2, 3po. Why not? It’s so convenient as you have said.
2: Have I?
1: Well, following your logic, it seems so.

When we name things in code, we name concepts. We could name all of them x and y, and it would work, so why don’t we?
First, I’m not proposing to use math semantics everywhere.
Second, have you heard about contexts? Maybe about bounded contexts from DDD? Or maybe about closures in JavaScript?

Oxford dictionary:

context [noun]

1. The situation in which something happens and that helps you to understand it
2. The words that come just before and after a word, phrase or statement and help you to understand its meaning

In lexical semantics, a word’s meaning is defined not only by the word alone but also by the context in which it is used. You might also have a look at the context principle and the principle of compositionality. Compare these two sentences:

  1. There are transitions. Transition is an object with fields from and to. Transition can be inverted by swapping from and to values.
  2. There are transitions. Transition is an object with fields from and to. It can be inverted by swapping from and to values.

Both reflect the transition inversion snippet from the top of the article. The first corresponds to the named version, the second to the tokenized version. Can you spot what x is in the second version? It.
We use context-dependent references all the time. Moreover, in natural language, repeating the same word for the same thing across a series of sentences is considered bad style.

Long story short: I’m not saying we don’t need explicit names. I’m saying:

A variable’s name should fit into the context it is used in; it doesn’t always have to include the context itself.

Functions are rarely candidates for non-explicit naming: they usually already live in a broad domain context, so there is no way to shorten their names and keep the meaning clear. But variables exist inside those functions, which can be decomposed and their context narrowed, so we have some naming options here. Maybe we should use them?
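
As a small illustration of this idea, here is a sketch in which the function name carries the broad context while the variable inside a one-line callback relies on it. All names here (invertTransitions, workflowTransitions, t) are hypothetical and chosen only for the example.

```javascript
// Broad context: the function name has to say what is going on by itself.
function invertTransitions(transitions) {
  // Narrow context: inside this one-line callback `t` can only be
  // one element of `transitions`, so a short name is enough.
  return transitions.map((t) => ({ prev: t.next, next: t.prev }));
}

// At the call site the context is broad again, so the names are explicit.
const workflowTransitions = [{ prev: "draft", next: "published" }];
const invertedWorkflowTransitions = invertTransitions(workflowTransitions);

console.log(invertedWorkflowTransitions);
```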

Applicability note: How well this view applies differs between statically typed and dynamically typed languages. In statically typed languages the type communicates a lot of context, so the value of a parameter’s name is noticeably reduced. In dynamic languages without type annotations, such shortening is still applicable, though not always, because there the name is the only way to communicate the context and the concept being represented.

Readability

Let’s now check what readability means in plain English.

Oxford dictionary:

readability [noun]

1. the fact of being easy, interesting and enjoyable to read
2. [synonym legibility] the fact of being clear and easy to read.

While the second meaning is reasonably clear, it is unfortunately still not clear how to interpret the phrase “easy to read”. Let’s check the definition of read.

Oxford dictionary:

read [verb]

1. to look at and understand the meaning of written or printed words or symbols
2. to go through written or printed words, etc. in silence or speaking them to other people
3. to discover or find out about somebody/something by reading
4. read somebody’s mind/thoughts to guess what somebody else is thinking

6. to understand something in a particular way

We can assume, though without absolute confidence, that the first definition refers to ease of understanding or comprehension. The second definition most probably refers to ease of perception and the very possibility of reading: the text is recognizable, visible, large enough, etc. Working in standard conditions, in the office or at home, the second condition is usually met (the text is visible, large and clear), so we are more interested in the first definition of readability than in the second.

When we talk about code readability, most of the time we mean how difficult the code is to understand. The ability to understand any code depends not only on how it is written but also on who is reading it. As code is a collection of different ideas, its complexity is driven by the complexity of those ideas in the first place. Full naming is important, but it makes only a tiny difference when the ideas are not trivial.

Code is not readable by design, even if we follow the same or standard formatting rules. I like using math analogies to express non-math ideas, even though I’m not that strong in math. I’ll try to express this one that way, as I think it makes the idea clearer.
If R is the readability function of some code, f is the formatting rules the code is written with, and p is the programmer who reads it, then R can be defined as R(f, p).
The usual misconception, or oversimplification, in these debates is that we are dealing only with R(f), presumably treating p as a constant. Given everything companies say about diverse teams with different seniority levels, such an assumption can’t be correct.

Readability does not mean auto-comprehension. You still need to figure out what code is doing.

In the MIT study “Comprehension of computer code relies primarily on domain-general executive brain regions”, code comprehension and the brain areas it activates were studied using fMRI. Participants were shown problems in Python (textual) and ScratchJr (graphical), along with sentence problems.
In general, the researchers found a strong association between code comprehension and the multiple demand (MD) system, which is responsible for math, logic and problem solving. The language system, responsible for linguistic processing, was also activated, but less intensely and inconsistently for the code problems.
In the section “Language system responses during code comprehension are weak and inconsistent”, they report that for Python code the use of semantically transparent, meaningful identifiers made no difference to language system activation. The cautious explanation the researchers offer is

“One possible explanation is that participants do not deeply engage with the words’ meanings in these problems because these meanings are irrelevant to finding the correct solution.”

This is not direct proof that naming is completely irrelevant, but it suggests that naming probably doesn’t play a significant role in code comprehension. Maybe code comprehension is tied less to actual names than to the brain’s general cognitive functions and conceptual thinking.

There is a difference between conceptual naming and naming variables in code.

You can name variables however you like, as long as you can communicate the concept clearly: verbally and, ideally, through code.

But sometimes you will need to bring others to the same level of conceptual understanding. And the latter is more of a social science problem than a purely engineering or programming one. In short: you can’t solve everything with standards, and especially not with code. It’s just an illusion.

We are subjective, but this is not an excuse

Look at the following two snippets.
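
The two snippets were images in the original and are not reproduced here. Judging by the caption below, they showed quicksort written with short and with fully expanded names. The following is only a sketch in that spirit; both functions and all of their identifiers are assumptions, not the author’s code.

```javascript
// Terse version: short names that rely on the small local context.
function qsort(xs) {
  if (xs.length < 2) return xs;
  const [p, ...rest] = xs;
  const lo = rest.filter((x) => x < p);
  const hi = rest.filter((x) => x >= p);
  return [...qsort(lo), p, ...qsort(hi)];
}

// Verbose version: identical logic, every name spelled out.
function quickSortArrayOfNumbers(arrayOfNumbersToSort) {
  if (arrayOfNumbersToSort.length < 2) return arrayOfNumbersToSort;
  const [pivotElement, ...remainingElements] = arrayOfNumbersToSort;
  const elementsLessThanPivot = remainingElements.filter(
    (currentElement) => currentElement < pivotElement
  );
  const elementsGreaterOrEqualToPivot = remainingElements.filter(
    (currentElement) => currentElement >= pivotElement
  );
  return [
    ...quickSortArrayOfNumbers(elementsLessThanPivot),
    pivotElement,
    ...quickSortArrayOfNumbers(elementsGreaterOrEqualToPivot),
  ];
}

console.log(qsort([3, 1, 2]));
console.log(quickSortArrayOfNumbers([3, 1, 2]));
```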

QSort SNR delta

For me this is an example where expressive naming makes no positive difference. It doesn’t make the code more comprehensible or easier to work with, but it looks more “impressive”, like an enterprise version of foo bar: it has a lot of noise (hello, SNR metric); it brings additional semantics that are not always meaningful and no less confusing; and it keeps comprehension at the same level while potentially increasing the cognitive load of extracting the meaning.

Standards are not evil. I’m not against them: they should exist and they should be used. But some are outdated, some are misinterpreted, some are abused. We are subjective, but this is not an excuse to blindly shorten names left and right. Naming should be full and comprehensive when the context is huge, but kept simple when the context is small. Look for ways to express ideas compactly and in pieces, instead of just abusing formatting rules and calling the result readable. Explicit full naming does not mean automatic readability (or, better, automatic comprehension), and it’s okay to shorten names even to a single character; there are cases for that.

This is the case

I don’t think I’ve said much that is new here; I’ve tried to recap what I think and how I see things. Still, I hope this makes someone wonder about some fundamental standard or practice they have been using all this time, and perhaps start questioning it.

Artsiom Miksiuk

I'm a Software Engineer with a passion for psychology and all kinds of engineering: electrical, software, mechanical, hardware, etc.