Ranges, Code Quality, and the Future of C++
Many of you have seen the recent blog post by Eric Niebler about the acceptance of his C++ Ranges proposal into the C++2a standard. This is a feature set I’ve wanted in C++ for some time. In fact, using C#’s standard LINQ library, I’ve become accustomed to writing code in this style.
I found it unfortunate, then, to see people respond to this post on Reddit and Twitter by complaining that this feature makes code unreadable. Apparently, C++ is becoming more complex and less useful.
I think this is completely untrue. C++2a is going to be the best version of C++ yet, and a big reason for that is Eric’s Ranges library.
But even to me, his Pythagorean Triples example is bad code. This is not because the ranges library makes code harder to read, but because he uses the library very poorly.
Now, code quality is subjective, pure and simple. But generally, people consider code that is easy to read, write, and debug to be high quality. With that in mind, ranges enable much higher code quality than was previously achievable in C++.
Here is the Pythagorean Triples example Eric posted:
Even a programmer familiar with “range-transformation style” will find it takes at least a few minutes to figure out what this code is doing.
To make code easier to read, you need to reduce both the time it takes to understand the code and the potential for someone to misunderstand it. This goes as much for writing code as for reading it: it’s easier to make mistakes writing confusing code than clean code, even if you fully understand in your head what the code is supposed to do.
The above code doesn’t do any of these things. It takes an algorithm that is very simple, an algorithm that can be represented very clearly, and makes it harder to understand.
Here is what the algorithm (with one difference) looks like when written in a more classic C/C++ style:
This isn’t terrible, but there’s enough complexity exposed that it still takes a significant amount of time to understand the algorithm. The reader needs to parse and keep track of each of the statements within those three for-loops. Any of those loops could be written incorrectly, and you have to read each one very carefully to be sure you understand what it’s actually doing.
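For reference, a minimal sketch of what such a classic nested-loop version might look like. The function name triples_classic and the max_z bound are my own assumptions: the version discussed here loops forever and prints, whereas this sketch stops at max_z and collects results so it can be run and checked.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

using triple = std::tuple<int, int, int>;

// Classic nested for-loops. The max_z bound is an assumption added so the
// sketch terminates; without it the outer loop would run forever.
std::vector<triple> triples_classic(int max_z) {
    assert(max_z >= 0);
    std::vector<triple> out;
    for (int z = 1; z <= max_z; ++z)          // hypotenuse
        for (int x = 1; x <= z; ++x)          // shorter leg
            for (int y = x; y <= z; ++y)      // longer leg, y >= x avoids duplicates
                if (x * x + y * y == z * z)
                    out.emplace_back(x, y, z);
    return out;
}
```

Every bound and increment in those three loop headers is something the reader has to verify by hand.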
Those for-loops are a common pattern (hell, it’s probably the most common pattern in C++), but every time they’re used, they increase the complexity of the function and increase the likelihood of incorrect interpretation.
Let’s clean it up a bit:
Now, to understand this code, you do need to understand the “iota” function. In the standard ranges library, iota(a, b) produces the consecutive integers from a up to, but not including, b, and iota(a) counts up from a without end. This dependency is fine! Encapsulation is a good thing. Instead of explicitly writing those for-int-loops, we leverage an existing, tested function to do it. And even without understanding iota, you can still read this and abstractly understand the algorithm.
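To make the idea concrete without depending on any particular library, here is a minimal sketch with a hand-rolled, half-open iota. The int_range class, this iota helper, and the max_z bound are all hypothetical stand-ins of mine; a real ranges library provides its own iota, and the bound exists only so the sketch terminates.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

using triple = std::tuple<int, int, int>;

// Hypothetical half-open integer range [first, last) for range-based for.
class int_range {
public:
    class iterator {
    public:
        explicit iterator(int v) : value_(v) {}
        int operator*() const { return value_; }
        iterator& operator++() { ++value_; return *this; }
        bool operator!=(const iterator& other) const { return value_ != other.value_; }
    private:
        int value_;
    };
    // An inverted range is treated as empty.
    int_range(int first, int last) : first_(first), last_(last < first ? first : last) {}
    iterator begin() const { return iterator{first_}; }
    iterator end() const { return iterator{last_}; }
private:
    int first_, last_;
};

// Stand-in for a library iota: yields first, first + 1, ..., last - 1.
inline int_range iota(int first, int last) { return int_range{first, last}; }

// Same algorithm as the nested for-int-loops, with the loop bookkeeping
// moved into iota. max_z is an assumption so the sketch terminates.
std::vector<triple> triples_up_to(int max_z) {
    assert(max_z >= 0);
    std::vector<triple> out;
    for (int z : iota(1, max_z + 1))
        for (int x : iota(1, z + 1))
            for (int y : iota(x, z + 1))
                if (x * x + y * y == z * z)
                    out.emplace_back(x, y, z);
    return out;
}
```

The loop headers now state only the range each variable covers; the start, test, and increment live in one tested place.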
This reduction of complexity is precisely what makes code easier to read.
However, there’s a small difference from the functionality of Eric’s example: this version doesn’t stop at 10 triples. It will print triples forever. How do we modify it to only print 10 triples?
The answer is… well, you can’t do it elegantly. The algorithm for generating triples is embedded into the code utilizing the triples (by printing them). Because of this, in order to limit the number of triples, we have to modify the algorithm itself.
Now the code isn’t simple at all. While it doesn’t take quite as long to understand as Eric’s example, the code is no longer trivial to read. How long does it take you to check that line 13 is correct? Should it be ++count or count++? Should it check for == or !=?
The cognitive overhead of reading this function is now significantly higher. This one algorithm generates triples, counts how many triples were generated, and stops generating when it hits 10. It’s hard to read because this isn’t just one algorithm; it’s multiple algorithms written together.
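A minimal sketch of that tangled shape, assuming a counter checked inside the innermost loop. The names print_triples and triples_text, the output format, and the limit parameter are hypothetical choices of mine.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Generating, counting, and stopping are interleaved in one function.
void print_triples(std::ostream& out, int limit) {
    assert(limit >= 0);
    if (limit == 0) return;
    int count = 0;
    for (int z = 1;; ++z)
        for (int x = 1; x <= z; ++x)
            for (int y = x; y <= z; ++y)
                if (x * x + y * y == z * z) {
                    out << '(' << x << ", " << y << ", " << z << ")\n";
                    // Pre-increment, so the updated count is compared.
                    if (++count == limit) return;
                }
}

// Hypothetical convenience wrapper that captures the output as a string.
std::string triples_text(int limit) {
    std::ostringstream out;
    print_triples(out, limit);
    return out.str();
}
```

Calling print_triples(std::cout, 10) prints ten lines. The early return buried three loops deep is the part that has to be re-verified every time the function changes.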
Ranges and Coroutines
This situation illustrates precisely why ranges are so great (and important). A range allows you to return the algorithm itself, rather than the data the algorithm generates. This way, you can combine it with other algorithms without modifying it directly.
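One way to picture “returning the algorithm” without any library support is a hand-written lazy source: an object that produces the next triple each time it is asked. This is only a sketch under my own assumptions (the triple_source class and this take helper are hypothetical names), and it is not a real range type, but it shows the limit living outside the generating code.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

using triple = std::tuple<int, int, int>;

// Hypothetical lazy source: each call to next() resumes the search where
// the previous call stopped and returns the next triple found.
class triple_source {
public:
    triple next() {
        for (;;) {
            const int x = x_, y = y_, z = z_;
            advance();
            if (x * x + y * y == z * z) return triple{x, y, z};
        }
    }

private:
    // Step (x, y, z) in the same order as three nested loops:
    // z outermost, then x in [1, z], then y in [x, z].
    void advance() {
        if (++y_ <= z_) return;
        if (++x_ <= z_) { y_ = x_; return; }
        ++z_;
        x_ = y_ = 1;
    }
    int x_ = 1, y_ = 1, z_ = 1;
};

// Generic "take": pulls the first n items from any source with next().
// The source knows nothing about the limit.
template <typename Source>
auto take(Source source, int n) {
    assert(n >= 0);
    std::vector<decltype(source.next())> out;
    for (int i = 0; i < n; ++i) out.push_back(source.next());
    return out;
}
```

Here take(triple_source{}, 10) yields the first ten triples, and neither piece had to change to accommodate the other. The price is the manual state machine in advance(), which is exactly the bookkeeping a nicer mechanism should hide.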
There are a few ways to turn an algorithm into a range. There’s the way Eric did it, by combining existing range algorithms. However, the “triples” algorithm isn’t a natural combination of other basic range algorithms. That’s why his code is so hard to read.
The simplest way to write this algorithm is to use a coroutine. To be fair, this feature is not accepted for C++2a yet, but I think it’s fundamental to writing new range algorithms.
Now we have a triples function that returns a generator (a type of range) that generates triples infinitely. Reading that function is as easy as reading the “clean” version I showed earlier, but now it has all the advantages of a range! We can trivially combine it with the “take” function (on line 13) in order to limit the number of triples output.
Like iota, you need to know what the “take” function does to understand its use. However, like iota, the function is so simple that understanding it in context is trivial.
In fact, the reason I believe coroutines are so fundamental to ranges is that they allow functions like take and iota to be written just as cleanly!
With coroutines, it’s as easy to understand any given range function as it is to use it.
The point of ranges isn’t to arbitrarily use functional-style coding in order to make a simple algorithm harder to read. The point is to reduce complexity by breaking an algorithm into its component parts and allowing you to easily put them together.
tl;dr Ranges are for utilizing algorithms and coroutines are for implementing algorithms. When used properly, ranges improve code quality significantly.