An Erlang/OTP 20.0 optimization

Edit: some word choices have been altered slightly in order to make some parts more clear.

This is a short note about a specific optimization in Erlang/OTP 20.0, which is scheduled for release in June 2017. The README file mentions the following:

OTP-13529    Application(s): erts

Erlang literals are no longer copied during process to process messaging.

There have been a couple of questions about what this is and what it means. Suppose we have the following little Erlang module:

-module(z).

-export([f/0]).

bin() ->
    <<"Some binary value">>.

map() ->
    #{ a => 3,
       b => "Hello" }.

f() ->
    {bin(), map()}.

which we now compile with beam instruction output (where the instructions are represented as Erlang terms):

erlc -S z.erl

The beam instructions are dumped in z.S, which we can open and read. First come a few standard header entries:

{module, z}.  %% version = 0
{exports, [{f,0},{module_info,0},{module_info,1}]}.
{attributes, []}.
{labels, 11}.

Next, our functions follow. First, the bin function:

{function, bin, 0, 2}.
{label,1}.
{line,[{location,"z.erl",5}]}.
{func_info,{atom,z},{atom,bin},0}.
{label,2}.
{move,{literal,<<"Some binary value">>},{x,0}}.
return.

Execution of the function starts at label 2. Note that the body is a single move instruction which places a literal value into the register x0. The Erlang system stores such literals outside the process heap, ready to be referenced. The map function looks the same. Since the map is just a constant value, it too can be represented as a literal outside the heap:

{function, map, 0, 4}.
{label,3}.
{line,[{location,"z.erl",8}]}.
{func_info,{atom,z},{atom,map},0}.
{label,4}.
{move,{literal,#{a => 3,b => "Hello"}},{x,0}}.
return.
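
One way to observe this from inside a single process is to compare a literal with a term that is built at run time. The following is a minimal sketch, not part of the original example: the module name lit_same and its functions are made up for illustration, and erts_debug:same/2 is an undocumented debugging BIF (it reports whether two terms are the very same term in memory), so its behaviour may differ between releases.

```erlang
-module(lit_same).
-export([check/1]).

%% A constant map: the compiler can place it in the module's literal pool.
lit() ->
    #{a => 3, b => "Hello"}.

%% A map that depends on a run-time argument, so it has to be
%% allocated on the process heap every time the function is called.
fresh(N) ->
    #{a => N, b => "Hello"}.

%% Returns {LiteralIsShared, FreshIsShared}.
%% erts_debug:same/2 is an undocumented debugging aid (assumption:
%% it is available and compares term identity, not term equality).
check(N) ->
    {erts_debug:same(lit(), lit()),
     erts_debug:same(fresh(N), fresh(N))}.
```

Calling lit_same:check(3) would typically be expected to report true for the literal, because both calls hand back a reference to the same stored value, and false for the freshly built map, because each call allocates a new one.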

Next, the function f/0 follows. This function allocates a stack slot, calls bin() to get the first literal and stashes it in the stack slot. Then it calls map() to get the second literal. Now, a tuple is allocated on the heap and the two literal values are put inside the tuple. Finally, the tuple is moved into the x0 register as the return value and we reestablish the original stack by de-allocating the extra slot we used:

{function, f, 0, 6}.
{label,5}.
{line,[{location,"z.erl",12}]}.
{func_info,{atom,z},{atom,f},0}.
{label,6}.
{allocate_zero,1,0}.
{line,[{location,"z.erl",13}]}.
{call,0,{f,2}}.
{move,{x,0},{y,0}}.
{line,[{location,"z.erl",13}]}.
{call,0,{f,4}}.
{test_heap,3,1}.
{put_tuple,2,{x,1}}.
{put,{y,0}}.
{put,{x,0}}.
{move,{x,1},{x,0}}.
{deallocate,1}.
return.

The Erlang system gains many benefits from literal values, since they are easy to reference from multiple processes and are generally “free” values to work with. However, in Erlang versions before 20.0, when you send a literal as a message, it is copied into the message as an ordinary value. This means you lose the beneficial sharing the literal value otherwise gives you.

The code for message passing in Erlang/OTP 20.0 now handles literal values directly. Rather than copying the contents of the literal into the message, we pass a pointer to the literal area. Of course, in order to do this safely, you must ensure the invariants of literal values are in place. In particular, the literal lives in a module, and if that module is purged from the system, the literal value must be saved somewhere else so references to it are preserved.
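
The effect on message passing can be probed with a small round trip. This is only a sketch under stated assumptions: the module lit_send is hypothetical, and it again leans on the undocumented erts_debug:same/2 debugging BIF to ask whether the term that comes back is the same term in memory as the one that was sent.

```erlang
-module(lit_send).
-export([roundtrip/0]).

%% A constant map, stored as a literal of this module.
lit() ->
    #{a => 3, b => "Hello"}.

roundtrip() ->
    Parent = self(),
    %% The echo process sends whatever it receives straight back.
    Echo = spawn(fun() ->
                         receive
                             {Parent, Msg} -> Parent ! {self(), Msg}
                         end
                 end),
    Lit = lit(),
    Echo ! {Parent, Lit},
    receive
        {Echo, Back} ->
            %% The first element compares by value and holds on any release.
            %% The second asks whether Back is the very same term as Lit,
            %% which can only be the case if no copy was made on the way.
            {Back =:= Lit, erts_debug:same(Back, Lit)}
    after 1000 ->
            timeout
    end.
```

On a release with this optimization, the second element would be expected to be true, since only a pointer to the literal travels in the message. On earlier releases the literal is copied on each send, so it would be expected to be false.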

Why does it matter?

This change is typical of how the Erlang BEAM VM has improved over the years. Most systems won’t need it in normal operation, but it helps a little bit along the way. And a few systems will benefit tremendously from it.

If you value long-running systems without restarts, it tends to be the case that the errors you have to fix become more and more outrageous. The kinds of errors which make the system fail in the end require complex interactions between several subsystems. Added memory pressure is among them. Robust operation is about more than efficiency alone, although this change also makes the system more efficient.

The compile-module hack

A hack that has seen some use over time is this: if you use a tool such as merl to construct a module and then compile the module, its literals are essentially “free” in the Erlang VM. Thus, you can avoid some memory pressure if you need some kind of data lookup table, provided the table stays mostly the same and rarely changes. You simply recompile and hot-load the new table on change.

With this change, the compile-module hack is even more powerful, because you can pass the values around between processes without risking a copy and thus increased memory pressure.
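
To make the hack concrete, here is a rough sketch of generating and loading such a table module at run time. It does not use merl; instead it builds the abstract forms by hand with erl_parse:abstract/1 and compiles them with compile:forms/1, which is a simpler stand-in for the same idea. The module name lit_table, the function load/2, and the generated value/0 accessor are all made up for this illustration.

```erlang
-module(lit_table).
-export([load/2]).

%% load(Mod, Term) compiles and loads a module named Mod that exports
%% value/0, which returns Term as a compile-time constant.
%% Term must be representable in source code (no pids, funs or
%% references), since erl_parse:abstract/1 cannot convert those.
load(Mod, Term) when is_atom(Mod) ->
    Forms = [{attribute, 1, module, Mod},
             {attribute, 2, export, [{value, 0}]},
             {function, 3, value, 0,
              [{clause, 3, [], [], [erl_parse:abstract(Term)]}]}],
    {ok, Mod, Beam} = compile:forms(Forms),
    %% Drop any old version left over from a previous reload, so that
    %% the current version can become "old" when the new one is loaded.
    _ = code:purge(Mod),
    {module, Mod} = code:load_binary(Mod, atom_to_list(Mod) ++ ".erl", Beam),
    ok.
```

A call such as lit_table:load(my_table, #{a => 3, b => "Hello"}) would then make my_table:value() available to every process, and calling load/2 again with a new term replaces the table. Note that code:purge/1 kills processes that are still running the old code, so a production version would likely want a gentler reload strategy.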

All in all, it looks like it is a neat optimization.
