A little bit of code [C++20 Ranges]
An introduction to one of the awaited features of C++20: the ranges.
C++20 is coming soon, with a number of cool features, one of which is the ranges library. In this article, I aim to demonstrate the power behind the elegant syntax it brings.
Disclaimers
- The library used in the code examples is not really the C++20 ranges, it’s the ranges-v3 open-source library from Eric Niebler, which is the basis of the proposal to add ranges to the C++. It’s a header-only library compatible with C++11/14/17.
- The code in the article is purely experimental, doesn’t really care about the cleanness and readability.
Functional Programming in C++
Functional Programming (hereinafter FP) is one of the programming paradigms C++ supports. However, compared to other functional-only languages it isn’t very comfortable. Yet, having such a sharp and powerful tool as C++ you’re free to create your own stuff. That’s exactly what ranges-v3 is. The best part of that library is the Unix pipe syntax it provides. That improves the readability, enables us to have less code, thus fewer errors. That’s just one of the pros of the FP: for more in-depth knowledge I highly recommend this book — Functional Programming in C++.
ranges-v3
Before understanding what this library brings to the table, let’s quickly recap what do we have. We have STL, in it containers, algorithms, a decent level of abstraction with iterators that enable the algorithms to work with those containers, and some functors.
The ranges library brings the following:
- range — a wrapper to anything like a range, e.g. containers, initializer lists, lazy range-like objects.
- algorithms — more or less the same set of algorithms that STL provides, just that these work with the ranges.
- actions — objects that modify the underlying range, like sort.
- views — lazy objects that behave like ranges or wrap actual ones, like filter.
Before going to the code, let’s dive into the views deeper. They’re significant for being something really new and lazy, thus highly efficient. Consider this pseudocode:
ContainerT container = { ... };
auto widgets = container
| views::filter(...)
| views::transform(...)
| views::take(10);
Imagine a scenario where you have a big container of user objects and you want to
- select those meeting a certain criteria
- create a widget from each user object
- take the first 10 widgets only
The straightforward way is to simply do all these steps, which might be very inefficient (considering the container is really big and we only need the first 10 of the result). But when we do the same with the views, their laziness makes it very efficient. In the pseudocode above, nothing has happened yet, we’ve just created a view, a combination of other views. It’s something that behaves like a range, i.e. we can iterate over it. Once we start iterating, the action begins:
for (auto widget: widgets) {
/* ... */
}
The widgets
will ask the views::take
for the next widget, which in its turn will ask views::transform
for a widget, but only 10 times. views::transform
then will ask views::filter
for the next user
to create a widget from. views::filter
will keep asking for the next user
from the underlying real container until its condition is met. The same would be possible with a bunch of nested loops which is ugly and error-prone. And also, keep in mind that the container
is actually a range, i.e. it can be a real container, a stream or just another view.
The idea is to demonstrate how powerful the ranges are going to make the language. So I’ll show two pieces of code that solve some real-world problems, trying to keep the code as short as possible, using the ranges-v3 library as much as possible.
Print the n frequent-most words
I came across this problem in the “Functional Programming in C++” book. The story tells, back in 1986, Donald Knuth was asked to implement a code for a journal. The problem stated:
Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.
Knuth came up with an implementation in Pascal that was 10 pages long. In response, Doug McIlroy wrote a UNIX shell script that solved the same problem, and it was this long:
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
This story was the main motive of this article, so I’ve decided to start with the very same problem.
It’s longer than McIlroy’s solutions but still, it’s so short you can write it without scrolling. Let’s go line by line and see how it works.
We’ll be using both actions and views from the library, thus the includes. is_alpha
is a pretty trivial function that checks whether the character is a letter or not. The program basically needs two inputs, (a) the n
number of most frequent words we want to see, which we’ll provide as a command-line argument, and (b) the text, which we’ll feed through the standard input ( std::cin
).
The next section starts with an istream_view
, which is a lazy container of std::string
s from standard input. For each of those words we’re doing the following transformation:
With views::all
we treat the word
as a container of char
s. views::trim
is another lazy view, that filters out the characters in the beginning and the end of that container that don’t satisfy the predicate. Here the predicate is the negation ( not_fn
) of the is_alpha
function defined above. views::transform
transforms every character to lower case. And finally, we convert the result back into a string
.
Generally, views::transform
expects a function that takes a value as an input and returns some other value, not necessarily of the same type.
After that transformation, we end up with a lazy container of trimmed lowercase words. The trimming might result in empty words. views::filter
takes a boolean predicate, in this case, one that checks for the emptiness of a string
, and filters those out. Next to_vector
(or to<std::vector>
) creates an actual container of those words, and the reason is the actions::sort
that follows. Obviously it sorts the container, but it’s important to note that sort
is an action, not a view. That’s because it has to move around the elements in the actual container, whereas views don’t affect the source range anyhow.
As a result, sorted_words
is a sorted vector of trimmed lowercase strings. For a lorem-ipsum input the first 100 elements look like this:
[a,a,a,a,a,a,a,a,a,a,ac,ac,ac,ac,ac,ac,ac,ac,ac,ac,ac,accumsan,accumsan,accumsan,accumsan,adipiscing,adipiscing,adipiscing,adipiscing,adipiscing,adipiscing,adipiscing,aenean,aenean,aenean,aenean,aliquam,aliquam,aliquam,aliquam,aliquam,aliquam,aliquam,aliquam,aliquam,aliquam,aliquet,aliquet,aliquet,aliquet,aliquet,aliquet,aliquet,aliquet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,amet,ante,ante,arcu,arcu,arcu,arcu,arcu,arcu,arcu,arcu,arcu,arcu,arcu,at,at,at,at,at,at,at,at,at,at,at,at,at]
The views::group_by
carves up the sorted_words
into a range of ranges of consecutively repeating elements (because of the std::equal
). Then we transform each of these groups into a pair of the length ( distance(group)
) and the first element of it. As of this moment, we have something like this:
[ {10, "a"}, {11, "ac"}, {4, "accumsan"} ... ]
The only thing left is to sort them by their first
s. We can make use of the default less-than operator of std::pair
and simply put everything into a multimap. As a result, we’ll have them sorted ascendingly.
We needed the n most frequent words, thus the last n elements. To do that we simply take the first n elements from the reverse view on the multimap.
Note: The library also provides a
views::take_last
view, but it’s not the same as| reverse | take
.
std::vector vec = { 1, 2, 3, 4, 5, 6 };
auto tl_view = vec | views::take_last(3); // [ 4, 5, 6 ]
auto rt_view = vec | views::reverse | views::take(3); // [ 6, 5, 4 ]
Print the calendar
The next problem is to write a program that prints the calendar for a given year. The idea came from this CppCon talk from Eric Niebler, the creator of the ranges-v3. The difference is I'm trying to keep the code as short as possible.
Again, this code only has a demonstrative and cognitive purpose. Otherwise, it’s full of bad practices.
First, we define two very fundamental functions: is_leap
returns true if the given year is a leap one, first_weekday_of
returns an integer in the [0–6] range for a given year, the weekday of the 1st of January of that year (0 being Sunday). These are the necessary and sufficient conditions to print the calendar.
The first interesting chunk is the following:
As the name suggests, we want to have an array of 12 integers denoting the number of days per month. We could simply initialize the values like the month_names
, but let’s not forget why are we here: to have fun with ranges. Remember the trick with the knuckles?
First, we create a view that counts 1 to 7, the number of knuckles and depressions between them on the left hand, with views::ints(1, 8)
. Then we transform
them into 1s and 0s consecutively. With the views::cycle
we make the view repeat after itself infinitely, imitating the attaching the right hand. Then we take the first 12 values, cause that’s how many we need. Then we transform them again by adding 30. And finally, we resolve the February case based on the leapness of the year.
Now that we know the number of days in each month and the first weekday of the year, we can calculate the first weekdays of each month.
It might look a bit complicated, but it’s just simple counting. The only interesting view here is the partial_sum
, which results in a new range of the same size, where the nth element is the result of applying the given binary function ( std::plus
in this case) to the nth element of the source range and the (n-1)th element of the resulting range.
As a result, we get the first weekdays of each month starting from February. As for January, we’ve had it already, in fact, we used it in the last transform
view to calculate the rest, so we just prepend it to this with the help of concat
view and there we have the first_weekdays_per_month
.
Moving on, we iterate over the months and print the calendar for each. The views::zip
takes three ranges of size 12 as an input and returns a single range of 12 tuples, each consisting of 3 values — month_name
, number of days
in that month, first_weekday
of that month:
[ ("January", 31, 2), ("February", 28, 5), ("March", 31, 5) ... ]
Then we print the header of the month and the padding for the first week (boring formatting stuff).
The next section generates the weeks of the month.
The views::zip
zips two ranges:
views::ints(1, days + 1)
generates the days of the month, 1 todays
inclusive.- The second
views::ints
does the same counting starting from the first weekday of the month, which, combined with thetransform
view results in a weekday circle of the same size.
As a result, we get such tuples:
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 0), (7, 1) ... (31, 4)]
On this range, we then apply the group_by
view with a binary predicate that returns false when a new week begins.
[
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
[(6, 0), (7, 1), (8, 2), (9, 3), (10, 4), (11, 5), (12, 6)]
...
[(27, 0), (28, 1), (29, 2), (30, 3), (31, 4)]
]
There we have our weeks. All that’s left is to iterate over the weeks, for each week over the days and print them out.
Simple as that.
As stated in the disclaimer, the code in the examples uses the ranges-v3 library. It’s a header-only library so cloning and including is all you need to be able to use it. I’d highly recommend playing around with it to get used to the new syntax, dive deeper into the views and actions, at least to be better prepared to the C++20. Unfortunately, not everything from this library will end up in the language. However, the ranges-v3 is mature and well-maintained enough to be used for more.