Breaking Erlang Maps #1


Erlang got a new data structure in release 17.0, the map(). This structure is a finite mapping from keys to values with a flat internal representation: a map is essentially two arrays: one of sorted keys and one of values. In turn, maps in this so-called flatmap representation are efficient for small map sizes, but they don’t perform well for large maps.

Release 18.0 is going to eliminate the large-map problem. In 18.0, maps switch to using a HAMT (Hash-Array Mapped Trie) internally once the map grows enough. It is a clever data structure which combines the properties of a hash table with a (level compressed) trie to provide fast lookup as well as persistence. This is the same data structure languages like Clojure, Scala and Haskell (unordered-containers) use. They were designed by the now late Phil Bagwell and the Erlang variant leans itself up against the work of Rich Hickey in Clojure.

Such a new structure however, needs testing. Erlang is a language which prides itself on stability of operation, not necessarily execution speed. So it would seem obvious that one should attempt to test the Erlang Maps extensively before an eventual release of 18.0. I decided to see how far we could get by building an Erlang QuickCheck model for the low-level map operations in Erlang and then testing it against the current stable release as well as 18.0-rc1 release candidate.

The goal of this blog post is to tell the story about the current QuickCheck model. How few lines of code there is in the model, and how little work you need to begin testing by generating tests. It also serves as an example of how to use QuickCheck on a real-world project. It serves to entertain and teach — or that is my hope at least.

There will be multiple blog posts, since it felt right to split up the work into multiple parts. The first part addresses how to generate data and the first simple tests. Later parts will address the more complicated model which uses state machines to drive the generation of tests.

QuickCheck primer

QuickCheck[1] is similar to fuzzing. Where a fuzzer bombs a system with sinister input in order to break it — so one can exploit the mistakes — a QC model checks internal consistency of a system by randomly executing commands against it. The crucial difference is while fuzzers are black-box and often have relatively simple understanding of the inner workings of a program, a QC test is a model. It knows what is supposed to happen in a given situation and makes sure it is so. This gives a QC model far more precision, because it can make use of prior knowledge to guide what should happen next. The price for this additional power is that one has to come up with a proper model in the first place.

In full model checking, one begins by producing a model — a simplified formal description — of the system under test (SUT). The model checker then exhaustively verifies the correctness of the model with respect to the SUT by systematically trying every possible state. The complexity of such an operation is that some models exhibit infinite command sequences. To handle these, it may be necessary to prove that one has a “loop” which will be stable, in order to tie the knot on the infinite sequences, so to speak.

QuickCheck[1] operates as a probalistic model checker in that we don’t check the model exhaustively. Rather, we derive random event traces from the model and verify that they are correct. In the words of John Hughes, we don’t write tests; We generate them. By simple configuration, we can tune the confidence we have in the tests by generating more of them. Spending more time gets us closer to the exhausting model checking algorithms. We can also decide what areas of the model we are interested in. We can weigh the system, skewing the events toward certain areas we know are harder to make correct in the system.

Any test system has a specification limit at which it can’t see further. Even proof assitant systems like, e.g., Coq, are not able to verify a property which is not part of the specification. We simply hope that enough testing form a protective web over the subject matter: math, programs, or hardware making it highly unlikely an error persists. And we hope that any deviation inside the protective web reverberates throughout the web so we eventually catch the mistake. In short,

Testing shows the presence, not the absence of bugs — Dijkstra, 1969

QuickCheck has proven to be efficient at showing the presence of errors, however. The method tends to find very subtle errors in programs which are later found to be malicious rather than benign. And that for a modest additional programming effort. While slower to write than a unit-test, the advantage is coming back thousand-fold once the model is up and running. The ability to run a couple million tests over a night, all different usually finds problems in a code base.

Erlang QuickCheck is an implementation of the QuickCheck idea. It defines a domain-specific-language in which one writes models. And then it contains tools to run those models against real-world code.


The first thing one must do is to come up with a strategy for testing maps. We immediately split low-level maps from things pertaining to the compiler. The maps syntax is compiler specific. But before one tests the notation, we should test that the ‘maps’ module is correct. There is little meaning in testing one without having established confidence at a lower level.

Second, should the test case be stateless or should it be stateful. In a stateless test case, the quickcheck model has no state and thus no knowledge between calls to the map (that is, it is closer to a fuzzer). For functions which are pure, this is often an adequate test method, but real programs are rarely entirely pure. We want to exploit knowledge about “what” and “when”: we want to know what is inside the map at any point in time so we can decide what to do. For instance, we want to check that we can remove an existing element in the map. To do this, we must know what elements are in the map — which is state. Also to check for removal of a non-existing element in the map, we must know what elements are not in the map — which likewise is state.

Furthermore, we need to look at two types of data generation:

  • One, we need to generate random map keys and map values we can use to manipulate maps. We also need to quickly be able to generate random maps. That is, we need to generate concrete data we can push into functions we call in the ‘maps’ module.
  • Two, we need to generate random commands from the ‘maps’ module. This allows us to avoid static tests where the commands are always the same. We will run tens — sometimes hundreds — of commands in each test, randomly deciding what to do, in order to make sure that any combination of ‘maps’ calls are stable. In Erlang QuickCheck, we have the ‘eqc_statem’ system, which encodes tests as state machines. In turn allowing for stateful command generation.

For a state machine model, we can exploit an isomorphism. A list of K/V pairs, subject to some constraints, is isomorphic to a map at any time. Altering the map with a function from the ‘maps’ module has a corresponding operation on lists. This leads to the idea that the model should use the list-representation of maps and then verify the map is manipulated in the right way. It is a common trick for QuickCheck modeling: the model has a simple representation of an advanced data structure. And hence we can use the simple and unoptimized model to make sure the complex and optimized structure is correct.

Note: One might think it is necessary to make the model execute quickly. This is rarely the case, if ever. A model should be focused on clarity and readability. It is much harder to read an advanced representation of data than a simple one. In addition, how a model shrinks is important for generating minimal counter-examples. Often, this means you need a less efficient model, but that sacrifice yields the counter-examples of minimality.

To check maps, we use two models. One is a simple stateless test of simple isomorphism properties for maps. The other is a more advanced stateful test using the above list({K, V}) representation. By splitting the model in two, we can weed out simple properties first. If the simple stateless model fails, it is often easier to find the culprit. Furthermore, once the simple model passes, it provides some confidence which can be supplanted to the more advanced model. It will probably not fail in certain ways.

Generation of map data

First, we need to address how to generate map contents. If we are to fire off random commands against maps, we need to be able to generate parameters for the map. Say we are to generate a maps:put/3 command. It is called as maps:put(Key, Value, Map), so we need to be able to generate random keys, random values and also random maps.

We define that map keys and values are going to be using the same generator:

%% Keys and values are map terms
map_key() -> map_term().
map_value() -> map_term().

and then we can define the real map generator in the map_term() function:

map_term() ->
?SIZED(Sz, map_term(Sz)).

We use a standard trick in QuickCheck. The ?SIZED macro allows us to obtain the current size of what is generated by making it into the explicit variable Sz. The size usually starts around 0 in runs and then increases to around 40 in the course of random testing. By having access to the size, we can control how to generate map terms of a given size:

map_term(0) ->
{100, oneof([int(), largeint(), atom(), binary(),
bitstring(), bool(), char(), evil_real()])},
{10, oneof([function0(int()), function2(int())])},
{10, eqc_gen:largebinary()}
map_term(K) ->
?LAZY(list(map_term(K div 8)))},
?LAZY(?LET(L, list(map_term(K div 8)),
?LAZY(eqc_gen:map(map_term(K div 8),
map_term(K div 8)))}

There are two cases here. Generating a map of size 0 will always create a scalar value. It will weight its generation. 100 out of 120 times it will generate integers, atoms, binaries, booleans and so on. 10 out of 120 times it will generate a function of either 0 or 2 arguments (which are pure, determinstic and returns ints). And 10 out of 120 a large binary is generated (between 65 bytes and 64 Kilobytes). The evil reals are reals of the form:

evil_real() ->
{20, real()},
{1, return(0.0)},
{1, return(0.0/-1)}]).

If one wonders why these are interesting, consider erlang:phash2(0.0) versus erlang:phash2(0.0/-1)…[2].

If generating a map term of size K, 40 out of 43 times a scalar is generated. But sometimes a list of map terms, a tuple or a map of map terms are generated. The recursive call divides the size by quite a lot to make sure the next generations are smaller and that we will eventually hit scalars. The ?LAZY parameter avoids generating the recursive composite variants unless the frequency/1 combinator ends up picking that branch. Had we not used ?LAZY, then we would have generated all of the tree every time and then picked in the tree. With this, we only generate a “path” in the tree, which is much faster. Haskell programmers get this for free, but in strict languages, one needs to explicitly state when one wants lazy evaulation.

Now, we can look to generate lists of Key/Value pairs:

gen_map() ->
?SIZED(Sz, resize(Sz * 15,
list({resize(Sz, map_key()), resize(Sz, map_value())}))).

map_list() ->

map_map() ->
?LET(ML, map_list(), maps:from_list(ML)).

The gen_map() uses the ?SIZED macro to obtain the current generation size. We then adjust it up by a factor of 15 for the list generator, but keep it normal for the keys and values. This lets us generate very large lists by default without having to tune a lot more.

Erlang 18.0 has the HAMT structure internally and its internal breakoff point for switching to HAMT are currently 32 elements (2015–03–29). So we need to make sure we generate lists well in that ballpark to test that the HAMT structure is acting like it is supposed to. Also, if we find an error, having a 400 element list is not a problem. QuickCheck shrinks lists by finding elements that can be deleted from the list. So chances are we will find a much smaller list as the counterexample.

With gen_map(), we can directly generate lists suitable for maps. And we can generate maps by calling map_list() and then using the output with maps:from_list/1 to build up a map with the contents.

Stateless testing

Maps support the functions f = maps:from_list/1 and g = maps:to_list/1. These two functions form a structural isomorphism between the domain M of map(A,B), and domain L, that of list({A, B})[3]. We can utilize this because if m is any map in M, then f(g(m)) = m. And if l is in L, then g(f(l)) = l. This establishes an isomorphism in the same sense as a mathematical category.

We can implement these rules in two QuickCheck properties. The fg variant is the following:

prop_list_iso_fg() ->
?FORALL(M, maps_eqc:map_map(),
List = maps:to_list(M),
M2 = maps:from_list(List),
equals(M, M2)

It is essentially an implementation of the above requirement. Its counterpart running the gf path is a bit more involved:

prop_list_iso_gf() ->
?FORALL(L, maps_eqc:map_list(),
LD = dedup(L),
M = maps:from_list(L),
LD2 = maps:to_list(M),
equals(lists:sort(LD), lists:sort(LD2))

The variant has to remove duplicate keys from the list for the L domain to be isomorphic to M in a well-defined way. And the result has to be sorted in order to make sure there is a canonical representation of the lists. Maps can return K/V pairs in any order.

Before going for 18.0, we run test cases on the current stable version 17.4.1. Running 100 tests for each of these turns out to be fine:

4> eqc:quickcheck(maps_iso_eqc:prop_list_iso_fg()).
OK, passed 100 tests
5> eqc:quickcheck(maps_iso_eqc:prop_list_iso_gf()).
OK, passed 100 tests

Binary variants

Maps can be converted through the functions term_to_binary/2 and binary_to_term/1. These functions are used to serialize data on a wire and are central to Erlang. The distribution protocol uses a variant of these. A lot of places where you need to transfer terms benefit from a compact, compressed, binary representation. Again, it is an isomorphism, since we can convert back and forth from a map. Given enough patience, we can also construct a binary which decodes into a map term. But often this is left out since it is not that easy to build up the binary from scratch.

The term_to_binary/2 function takes a list of options, which we can encode for in the test:

opts() ->
?LET({Compressed, MinorVersion},
{oneof([[], [compressed], [{compressed, choose(0,9)}]]),
oneof([[], [{minor_version, choose(0,1)}]])},
Compressed ++ MinorVersion).

And then we can use the opts() generator to generate the real test case:

prop_binary_iso_fg() ->
?FORALL([Opts, M],
[opts(), maps_eqc:map_map()],
Binary = term_to_binary(M, Opts),
M2 = binary_to_term(Binary),
equals(M, M2)

It works exactly like in the list_fg case: flip back and forth between the binary representation and verify for equality. We run 100 test cases on Erlang 17.4.1, the stable version:

13> eqc:quickcheck(maps_iso_eqc:prop_binary_iso_fg()).
....Failed! After 5 tests.
#{-3878269413 => hill,
-1 => stone,
#Fun<eqc_gen.133.121384563> => sand,
<<3:2>> => 0.0} /=
#{-1 => stone,
-3878269413 => hill,
#Fun<eqc_gen.133.121384563> => sand,
<<3:2>> => 0.0}
Shrinking xxxx.x.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx................................xxxxxxxxxxxxxxxxxxxxxxxxxxxx(x10)xxxxxxx(x1).x(35 times)

#{-1 => flower,2147483647 => flower} /=
#{2147483647 => flower,-1 => flower}

Aha! Failed after 5 tests, and it looks suspicious! The internal ordering of the map is suddenly reversed. This violates the assumption about the internal flatmap representation. Indeed, Wolfram Alpha reports that — apart from being a prime — 2147483647 is also 2³¹-1. It set off some alarm bells when I reported it[4], and a newer stable version of Erlang will have a fix (OTP-12623 is on the current maintenance branch and will be in 17.5 with a bit of luck).

Going deeper

A later post will cover the state machine ‘maps_eqc’ model which is used to check the maps module. It verifies every command of the ‘maps’ module and reports far more statistics. The current work is ongoing and we are using these models to weed out errors from Erlang 18.0 and gain confidence we don’t introduce obvious mistakes in the new HAMT implementation. And while here, we also fix existing bugs that we find[5]. The model has already proven its worth, and has found errors, in the release candidate 18.0-rc1. But with the latest patches from the OTP team on top of the release candidate, the models doesn’t provoke errors anymore.

If you are interested in the full model, it is Open Source under an Apache 2.0 license here (at the time of this writing the commit ID is 5a9523f98635. Expect the model to change over time):

Note: I simplified some of the code in this post because it reads better and avoids having to explain some details. I don’t think anything was lost, but in the interest of full transparency, I better note I did this.

[1] QuickCheck was developed by John Hughes and Koen Claessen. For a good starting point, see the wikipedia article:

[2] This one was found by Björn-Egil Dahlberg from the OTP Team.

[3] L needs some additional rules. For instance, the list must have unique keys, because otherwise there is no corresponding well defined map. So the domain is not any list, but rather “lists of a certain structure with certain no-junk properties”.


[5] “We” means the OTP team, which are doing all of the hard work on HAMTs. The author of this post only writes the QuickCheck model to verify correctness.