Using constant pools to speed up your Elixir code

A nice picture of a pool.

Introduction

Writing Elixir code that is optimized for speed can sometimes be challenging due to the concurrency model of the Erlang runtime system (BEAM). Instead of using OS threads like, say, Java and other languages, BEAM provides its own abstraction in the form of processes implemented in the runtime itself. These processes have their own stack and heap, but more importantly they share nothing with other processes. Communication between processes is done via message passing. In practice, this means that any message (which is just an Erlang term) sent to another process must be copied into the destination process's heap.

For the most part, this limitation is fine. A lot of the time, a message that gets passed will be small enough, usually something like :ok or {:ok, result} where result is any term. Of course, result could be of arbitrary size, and that is where the problem could start to appear.

A naive GenServer constant store

To get a feel for the problem, let us start with a contrived scenario: we wish to store some data to be available to other processes. We will naively do so with a GenServer like in the following example.
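A minimal sketch of such a GenServer, holding the data in its state, might look like this (the start_link/1 signature is an assumption; the Store name and get/0 API are used throughout the rest of the post):

```elixir
defmodule Store do
  use GenServer

  # Start the store holding the given data.
  def start_link(data) do
    GenServer.start_link(__MODULE__, data, name: __MODULE__)
  end

  # Every caller's read becomes a message in this process's mailbox.
  def get() do
    GenServer.call(__MODULE__, :get)
  end

  @impl true
  def init(data), do: {:ok, data}

  @impl true
  def handle_call(:get, _from, data) do
    {:reply, data, data}
  end
end
```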

Now every process that wishes to access the data can do so by calling Store.get/0. The immediate problem with this is that processes go through the messages in their mailbox one at a time. This means that reads are effectively serialized by the host process, which is less than ideal when you have thousands of processes trying to read the data.

ETS tables as a first optimization

Erlang Term Storage (ETS) is an in-memory data store with some modest concurrency guarantees, namely that updates to single objects are both atomic and isolated. What this means is that any update to an object either succeeds or fails without effect, and no other processes will see intermediate results of the update. An ETS table is created from within a process and is stored off-heap. The process maintains ownership and the table persists until its owner process dies.

The default behavior is that the owner process is the only one that can write to the table, but any other process may read from it. This is a nice way of solving the sequential read problem with the pure GenServer implementation from before. A naive ETS store might look like the following.
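A sketch of such a store, using the :store table name and :key referenced below (the GenServer wrapper around table creation is illustrative):

```elixir
defmodule Store do
  use GenServer

  def start_link(data) do
    GenServer.start_link(__MODULE__, data, name: __MODULE__)
  end

  @impl true
  def init(data) do
    # A named table with the default :protected access: only this owner
    # process can write, but any process can read from it.
    :ets.new(:store, [:named_table, read_concurrency: true])
    :ets.insert(:store, {:key, data})
    {:ok, nil}
  end
end
```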

Other processes may access the data via :ets.lookup(:store, :key).

There is still a slight problem though. The data in the ETS table is indeed stored off-heap, but any read requires the data to be copied into the reading process's heap. So if the data is so large that copying it between processes incurs a performance penalty, we are mostly back where we started.

Module constant pools

Constant Erlang terms are also called literals and are treated in a somewhat special manner. Per section 8.2 of the Erlang efficiency guide, literals are stored in a constant pool attached to the module in which they appear. These pools are heaps in their own right, but they may be referenced directly by any process without being copied. This implementation detail can be quite useful for alleviating memory pressure. So we can solve both the serialized-read and data-copying problems in our scenario with the following.
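Concretely, if the data can be written out as a literal in the source, a module like the following keeps it in its constant pool (the hardcoded list is just an example value):

```elixir
defmodule Store do
  # [:foo, :bar, :baz] is a compile-time literal, so it lives in this
  # module's constant pool rather than on any process heap. Callers get
  # a reference to it instead of a copy.
  def get() do
    [:foo, :bar, :baz]
  end
end
```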

This is all well and good, but how often will this scenario ever play out? The vast majority of the time, the data will be dynamic in nature and not something that can be baked into the code. So what we really need is a way to dynamically set the data as a literal in a module at runtime. This is something that can be accomplished with a simple macro.

defmodule StoreSetter do
  defmacro set(val) do
    quote do
      defmodule Store do
        def get() do
          unquote(val)
        end
      end
    end
  end
end

There is a fair amount going on here, so let's take some time to understand it. The keyword defmacro defines a macro called set, which we will end up calling like a regular function. The quote keyword instructs the compiler to treat everything in its block as data, which in this case is the module definition of Store with its get function. It transforms the block of code into its internal abstract syntax tree representation, a nested tree of 3-tuples, which is then injected back into the caller's compile tree. Finally, unquote lets us evaluate an expression, in this case the term we want to set in the store, but defers that evaluation until the code generated by the quote block is executed.

So that was a quick and dirty rundown of Elixir macros. For more, check out the official docs and Chris McCord's excellent book. Let's try it out in iex.

iex(1)> c "naive_dynamic_module.ex"
[StoreSetter]
iex(2)> require StoreSetter
StoreSetter
iex(3)> StoreSetter.set([:foo])
{:module, Store,
<<70, 79, 82, 49, 0, 0, 3, 140, 66, 69, 65, 77, 65, 116, 85, 56, 0, 0, 0, 82,
0, 0, 0, 8, 12, 69, 108, 105, 120, 105, 114, 46, 83, 116, 111, 114, 101, 8,
95, 95, 105, 110, 102, 111, 95, 95, 9, ...>>, {:get, 0}}
iex(4)> Store.get
[:foo]

So far so good: when we call StoreSetter.set/1 with a term, we generate a new module called Store, and we can retrieve the value we set by calling Store.get. Now, what if we were to set a new value in the store?

iex(5)> StoreSetter.set([:foo, :bar])
warning: redefining module Store (current version defined in memory)
iex:5
{:module, Store,
<<70, 79, 82, 49, 0, 0, 3, 152, 66, 69, 65, 77, 65, 116, 85, 56, 0, 0, 0, 82,
0, 0, 0, 8, 12, 69, 108, 105, 120, 105, 114, 46, 83, 116, 111, 114, 101, 8,
95, 95, 105, 110, 102, 111, 95, 95, 9, ...>>, {:get, 0}}
iex(6)> Store.get
[:foo, :bar]

It still works as expected, but we get that pesky warning. To get rid of it, we need to first remove the module from the modules loaded in the runtime. This is easily done thanks to the :code module, which is core to BEAM's hot code upgrade feature. Let's continue our iex session.

iex(7)> :code.purge(Store)
false
iex(8)> :code.delete(Store)
true
iex(9)> StoreSetter.set([:foo, :bar, :baz])
{:module, Store,
<<70, 79, 82, 49, 0, 0, 3, 156, 66, 69, 65, 77, 65, 116, 85, 56, 0, 0, 0, 82,
0, 0, 0, 8, 12, 69, 108, 105, 120, 105, 114, 46, 83, 116, 111, 114, 101, 8,
95, 95, 105, 110, 102, 111, 95, 95, 9, ...>>, {:get, 0}}

We no longer get the warning, and we have the ability to allocate data in a special read-only heap that can be read by any process without copying the data. As a special note, OTP 20 included an optimization such that literals are no longer copied when sending messages, which makes this technique even more powerful.

A few months ago, Discord's CTO wrote a blog post that gave a very nice high-level overview of the Elixir-powered portion of their backend. Discord is a distributed chat app whose backend relies on a ring data structure built using consistent hashing. They were experiencing slow reads and even slower lookups in the ring. One of their optimization attempts involved storing the ring in ETS. Even though ETS has sophisticated pattern matching to limit the size of the data that would eventually be copied back to the calling process, the size of the ring was still prohibitively large: they quoted 17.5 seconds for looking up values. Eventually they discovered an Erlang library called mochiglobal, which abstracts away the previous technique into a nice API, and ported it to Elixir under the name fastglobal. With their ring stored in a constant pool, their lookup time decreased to 0.3 microseconds. The downside is that compiling and loading a module with a large amount of data can take some time; in Discord's case they reported up to one second. Therefore, this technique should only really be used when the number of reads vastly outnumbers the number of writes, and this difference is known.

If you want to learn more about this technique, sometimes colloquially known as the "compile module hack", I would highly recommend reading through Discord's fastglobal library to understand how it is implemented. Do note that they use Erlang syntax to build up their modules, while here I used a macro.