A little bit of code [C++20 Ranges]

An introduction to one of the awaited features of C++20: the ranges.

Vanand Gasparyan
ITNEXT

--

C++20 is coming soon, with a number of cool features, one of which is the ranges library. In this article, I aim to demonstrate the power behind the elegant syntax it brings.

Disclaimers

  1. The library used in the code examples is not really the C++20 ranges, it’s the ranges-v3 open-source library from Eric Niebler, which is the basis of the proposal to add ranges to the C++. It’s a header-only library compatible with C++11/14/17.
  2. The code in the article is purely experimental, doesn’t really care about the cleanness and readability.

Functional Programming in C++

Functional Programming (hereinafter FP) is one of the programming paradigms C++ supports. However, compared to other functional-only languages it isn’t very comfortable. Yet, having such a sharp and powerful tool as C++ you’re free to create your own stuff. That’s exactly what ranges-v3 is. The best part of that library is the Unix pipe syntax it provides. That improves the readability, enables us to have less code, thus fewer errors. That’s just one of the pros of the FP: for more in-depth knowledge I highly recommend this book — Functional Programming in C++.

ranges-v3

Before understanding what this library brings to the table, let’s quickly recap what do we have. We have STL, in it containers, algorithms, a decent level of abstraction with iterators that enable the algorithms to work with those containers, and some functors.

The ranges library brings the following:

  1. range — a wrapper to anything like a range, e.g. containers, initializer lists, lazy range-like objects.
  2. algorithms — more or less the same set of algorithms that STL provides, just that these work with the ranges.
  3. actions — objects that modify the underlying range, like sort.
  4. views — lazy objects that behave like ranges or wrap actual ones, like filter.

Before going to the code, let’s dive into the views deeper. They’re significant for being something really new and lazy, thus highly efficient. Consider this pseudocode:

ContainerT container = { ... };
auto widgets = container
| views::filter(...)
| views::transform(...)
| views::take(10);

Imagine a scenario where you have a big container of user objects and you want to

  1. select those meeting a certain criteria
  2. create a widget from each user object
  3. take the first 10 widgets only

The straightforward way is to simply do all these steps, which might be very inefficient (considering the container is really big and we only need the first 10 of the result). But when we do the same with the views, their laziness makes it very efficient. In the pseudocode above, nothing has happened yet, we’ve just created a view, a combination of other views. It’s something that behaves like a range, i.e. we can iterate over it. Once we start iterating, the action begins:

for (auto widget: widgets) {
/* ... */
}

The widgets will ask the views::take for the next widget, which in its turn will ask views::transform for a widget, but only 10 times. views::transform then will ask views::filter for the next user to create a widget from. views::filter will keep asking for the next user from the underlying real container until its condition is met. The same would be possible with a bunch of nested loops which is ugly and error-prone. And also, keep in mind that the container is actually a range, i.e. it can be a real container, a stream or just another view.

The idea is to demonstrate how powerful the ranges are going to make the language. So I’ll show two pieces of code that solve some real-world problems, trying to keep the code as short as possible, using the ranges-v3 library as much as possible.

Print the n frequent-most words

I came across this problem in the “Functional Programming in C++” book. The story tells, back in 1986, Donald Knuth was asked to implement a code for a journal. The problem stated:

Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.

Knuth came up with an implementation in Pascal that was 10 pages long. In response, Doug McIlroy wrote a UNIX shell script that solved the same problem, and it was this long:

tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q

This story was the main motive of this article, so I’ve decided to start with the very same problem.

It’s longer than McIlroy’s solutions but still, it’s so short you can write it without scrolling. Let’s go line by line and see how it works.

We’ll be using both actions and views from the library, thus the includes. is_alpha is a pretty trivial function that checks whether the character is a letter or not. The program basically needs two inputs, (a) the n number of most frequent words we want to see, which we’ll provide as a command-line argument, and (b) the text, which we’ll feed through the standard input ( std::cin ).

The next section starts with an istream_view, which is a lazy container of std::strings from standard input. For each of those words we’re doing the following transformation:

With views::all we treat the word as a container of chars. views::trim is another lazy view, that filters out the characters in the beginning and the end of that container that don’t satisfy the predicate. Here the predicate is the negation ( not_fn ) of the is_alpha function defined above. views::transform transforms every character to lower case. And finally, we convert the result back into a string.

Generally, views::transform expects a function that takes a value as an input and returns some other value, not necessarily of the same type.

After that transformation, we end up with a lazy container of trimmed lowercase words. The trimming might result in empty words. views::filter takes a boolean predicate, in this case, one that checks for the emptiness of a string, and filters those out. Next to_vector (or to<std::vector>) creates an actual container of those words, and the reason is the actions::sort that follows. Obviously it sorts the container, but it’s important to note that sort is an action, not a view. That’s because it has to move around the elements in the actual container, whereas views don’t affect the source range anyhow.

As a result, sorted_words is a sorted vector of trimmed lowercase strings. For a lorem-ipsum input the first 100 elements look like this:

[a,a,a,a,a,a,a,a,a,a,ac,ac,ac,ac,ac,ac,ac,ac,ac,ac,ac,accumsan,accumsan,accumsan,accumsan,adipiscing,adipiscing,adipiscing,adipiscing,adipiscing,adipiscing,adipiscing,aenean,aenean,aenean,aenean,aliquam,aliquam,aliquam,aliquam,aliquam,aliquam,aliquam,aliquam,aliquam,aliquam,aliquet,aliquet,aliquet,aliquet,aliquet,aliquet,aliquet,aliquet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,ante,ante,arcu,arcu,arcu,arcu,arcu,arcu,arcu,arcu,arcu,arcu,arcu,at,at,at,at,at,at,at,at,at,at,at,at,at]

The views::group_by carves up the sorted_words into a range of ranges of consecutively repeating elements (because of the std::equal ). Then we transform each of these groups into a pair of the length ( distance(group) ) and the first element of it. As of this moment, we have something like this:

[ {10, "a"}, {11, "ac"}, {4, "accumsan"} ... ]

The only thing left is to sort them by their firsts. We can make use of the default less-than operator of std::pair and simply put everything into a multimap. As a result, we’ll have them sorted ascendingly.

We needed the n most frequent words, thus the last n elements. To do that we simply take the first n elements from the reverse view on the multimap.

Note: The library also provides aviews::take_last view, but it’s not the same as | reverse | take.

std::vector vec = { 1, 2, 3, 4, 5, 6 };
auto tl_view = vec | views::take_last(3); // [ 4, 5, 6 ]
auto rt_view = vec | views::reverse | views::take(3); // [ 6, 5, 4 ]

Print the calendar

The next problem is to write a program that prints the calendar for a given year. The idea came from this CppCon talk from Eric Niebler, the creator of the ranges-v3. The difference is I'm trying to keep the code as short as possible.

Again, this code only has a demonstrative and cognitive purpose. Otherwise, it’s full of bad practices.

First, we define two very fundamental functions: is_leap returns true if the given year is a leap one, first_weekday_of returns an integer in the [0–6] range for a given year, the weekday of the 1st of January of that year (0 being Sunday). These are the necessary and sufficient conditions to print the calendar.

The first interesting chunk is the following:

As the name suggests, we want to have an array of 12 integers denoting the number of days per month. We could simply initialize the values like the month_names, but let’s not forget why are we here: to have fun with ranges. Remember the trick with the knuckles?

First, we create a view that counts 1 to 7, the number of knuckles and depressions between them on the left hand, with views::ints(1, 8). Then we transform them into 1s and 0s consecutively. With the views::cycle we make the view repeat after itself infinitely, imitating the attaching the right hand. Then we take the first 12 values, cause that’s how many we need. Then we transform them again by adding 30. And finally, we resolve the February case based on the leapness of the year.

Now that we know the number of days in each month and the first weekday of the year, we can calculate the first weekdays of each month.

It might look a bit complicated, but it’s just simple counting. The only interesting view here is the partial_sum, which results in a new range of the same size, where the nth element is the result of applying the given binary function ( std::plus in this case) to the nth element of the source range and the (n-1)th element of the resulting range.

As a result, we get the first weekdays of each month starting from February. As for January, we’ve had it already, in fact, we used it in the last transform view to calculate the rest, so we just prepend it to this with the help of concat view and there we have the first_weekdays_per_month.

Moving on, we iterate over the months and print the calendar for each. The views::zip takes three ranges of size 12 as an input and returns a single range of 12 tuples, each consisting of 3 values — month_name, number of days in that month, first_weekday of that month:

[ ("January", 31, 2), ("February", 28, 5), ("March", 31, 5) ... ]

Then we print the header of the month and the padding for the first week (boring formatting stuff).

The next section generates the weeks of the month.

The views::zip zips two ranges:

  1. views::ints(1, days + 1) generates the days of the month, 1 to days inclusive.
  2. The second views::ints does the same counting starting from the first weekday of the month, which, combined with the transform view results in a weekday circle of the same size.

As a result, we get such tuples:

[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 0), (7, 1) ... (31, 4)]

On this range, we then apply the group_by view with a binary predicate that returns false when a new week begins.

[
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
[(6, 0), (7, 1), (8, 2), (9, 3), (10, 4), (11, 5), (12, 6)]
...
[(27, 0), (28, 1), (29, 2), (30, 3), (31, 4)]
]

There we have our weeks. All that’s left is to iterate over the weeks, for each week over the days and print them out.

Simple as that.

As stated in the disclaimer, the code in the examples uses the ranges-v3 library. It’s a header-only library so cloning and including is all you need to be able to use it. I’d highly recommend playing around with it to get used to the new syntax, dive deeper into the views and actions, at least to be better prepared to the C++20. Unfortunately, not everything from this library will end up in the language. However, the ranges-v3 is mature and well-maintained enough to be used for more.

--

--