The Elixir Parallel Compiler


In a previous post we saw how the core of Elixir compilation works.

As you would expect, though, the Elixir compiler is not sequential: files are compiled in parallel. This is not just a matter of spawning processes to speed things up; it is way more interesting.

Runtime dependencies at compile time

As we saw in the previous post, elixirc executes code just like elixir does. The difference is that elixirc emits .beam files as a side-effect. Hence, for example, in order to compile m.ex:

defmodule M do
  @x N.f()
end

elixirc has to invoke N.f() at compile time, and therefore N has to be loaded into the virtual machine at that point. But you do not need to pass n.ex before m.ex in the list of files to be compiled, nor is the dependency declared in m.ex. That makes a naive parallelization insufficient: if m.ex happens to be compiled before N is available, the call to N.f() would simply fail.
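To make the ordering problem concrete, here is a small sketch that compiles two modules from strings with Code.compile_string/1, outside of the parallel compiler. The names DemoM and DemoN are made up for the example and stand in for M and N. The failing case only records that compilation raised; it does not assume which exception.

```elixir
# DemoM and DemoN are hypothetical stand-ins for M and N from the text.
m_source = """
defmodule DemoM do
  @x DemoN.f()
  def x, do: @x
end
"""

n_source = """
defmodule DemoN do
  def f, do: 42
end
"""

# Compiling DemoM first fails: the module body runs at compile time
# and DemoN is not available yet.
first_attempt =
  try do
    Code.compile_string(m_source)
    :compiled
  rescue
    exception -> {:error, exception.__struct__}
  end

# Once DemoN is compiled and loaded, the very same source compiles fine.
Code.compile_string(n_source)
Code.compile_string(m_source)

# apply/3 avoids a compile-time reference to a module defined at runtime.
value = apply(DemoM, :x, [])
```

Here the order is fixed by hand. The rest of the post is about how the parallel compiler gets away without anybody having to do that.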

In order to understand how that is solved, let’s take a step back and introduce the error_handler Erlang module.

The error_handler Erlang module

Erlang modules do not need to declare their runtime dependencies either. Functions in module m may call functions of module n without declaring that m depends on n.

That is possible because, by default, the Erlang virtual machine autoloads modules on demand (there is also an alternative eager loading mechanism).

When the BEAM needs to call a function in an unknown module, it suspends the execution and invokes a callback in error_handler. In the happy path, the missing module is loaded, the function called, and execution resumed.

Let me underline that this happens in the same process.
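Autoloading can be observed from Elixir with a short experiment. This is only a sketch, assuming the virtual machine runs in its default, autoloading mode: AutoloadDemo is a hypothetical module, and the scratch directory is a throwaway location chosen for the example.

```elixir
# AutoloadDemo is a hypothetical module used only for this illustration.
source = """
defmodule AutoloadDemo do
  def hello, do: :world
end
"""

# Compile the module and write its bytecode to a scratch directory.
[{mod, bytecode}] = Code.compile_string(source)
dir = Path.join(System.tmp_dir!(), "autoload_demo_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)
File.write!(Path.join(dir, "#{mod}.beam"), bytecode)

# Unload the module so that only the .beam file on disk remains,
# and make that directory visible to the code server.
:code.delete(mod)
:code.purge(mod)
Code.prepend_path(dir)

loaded_before = :code.is_loaded(mod)

# Nothing declares this dependency: the first call makes the VM load the module.
result = apply(mod, :hello, [])

loaded_after = :code.is_loaded(mod)

# Clean up the scratch directory.
Code.delete_path(dir)
File.rm_rf!(dir)
```

The module is reported as not loaded before the call and as loaded right after it, without any explicit load in between.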

The Kernel.ErrorHandler Elixir module

The Elixir parallel compiler has an optimistic and clever approach to solve the dependency problem explained before.

Erlang provides an API to configure the error handler of a process, and Elixir pulls a magic trick out of the hat: it configures its own error handler.

:erlang.process_flag(:error_handler, Kernel.ErrorHandler)

The callbacks in Kernel.ErrorHandler add a thin coordination layer, which falls back to the Erlang error_handler module.
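To get a feel for the mechanism, here is a minimal sketch of a custom error handler. DemoHandler and NoSuchModule are hypothetical names, and this is not how Kernel.ErrorHandler is written; it only shows that the handler's callbacks run in the calling process and can fall back to the Erlang error_handler module.

```elixir
defmodule DemoHandler do
  # Hypothetical handler for illustration; it is not Kernel.ErrorHandler.

  # Called by the VM, in the calling process, when a function is undefined.
  def undefined_function(module, fun, args) do
    case :code.ensure_loaded(module) do
      {:module, _} ->
        # The module can be loaded: defer to the stock Erlang handler.
        :error_handler.undefined_function(module, fun, args)

      {:error, _} ->
        # The module is missing: whatever we return becomes the call's result.
        {:intercepted, self(), module, fun, args}
    end
  end

  # The other callback the VM expects; delegate to the stock handler.
  def undefined_lambda(module, fun, args) do
    :error_handler.undefined_lambda(module, fun, args)
  end
end

parent = self()

pid =
  spawn(fn ->
    :erlang.process_flag(:error_handler, DemoHandler)
    # NoSuchModule is deliberately undefined.
    send(parent, {:result, apply(NoSuchModule, :f, [1, 2])})
  end)

result =
  receive do
    {:result, value} -> value
  after
    1_000 -> :timeout
  end
```

The pid captured inside the handler is the pid of the spawned process, which is the point: the handler gets to decide what happens next while the caller sits right there, in the middle of its call.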

How does parallel compilation work?

We now have all the pieces to understand how parallel compilation works.

The Elixir module Kernel.ParallelCompiler spawns compilers, each one responsible for a file. How many compilers at once? Think as many as cores. This number grows if needed, but that is the idea.

When a compiler is done with a file, it tells the coordinator by sending an :ok message with the list of modules it has compiled (remember, in Elixir one file can define several modules).

When a compiler encounters a runtime dependency that cannot be resolved, like the call N.f() in the example, the Elixir error handler is triggered by the virtual machine. The handler sends a :waiting message to the coordinator to let it know it is waiting for N, and waits until the coordinator calls back:

receive do
  {^ref, :found} -> true
  {^ref, :not_found} -> false
end

Note that, since the error handler runs in the same process, as noted above, blocking in that receive effectively pauses the compiler.

In the happy path, when a compiler tells the coordinator N is ready, the coordinator broadcasts a :found message to all compilers waiting for N. Their error handlers get out of the receive block, and compilations resume.
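The message flow of the happy path can be modeled with plain processes. The following is a toy sketch under heavy assumptions: ToyCoordinator, its message shapes, and the explicit list of dependencies per job are all invented for the example. In the real compiler nobody declares dependencies, the error handler discovers them, and there is also a :not_found reply that this toy leaves out.

```elixir
defmodule ToyCoordinator do
  # Hypothetical toy model of the message flow; the real
  # Kernel.ParallelCompiler differs in detail.

  # jobs is a keyword list of name => dependencies, e.g. [m: [:n], n: []].
  # Returns the names in the order in which the jobs finished.
  def run(jobs) do
    coordinator = self()

    Enum.each(jobs, fn {name, deps} ->
      spawn_link(fn -> compile(coordinator, name, deps) end)
    end)

    loop(length(jobs), MapSet.new(), %{}, [])
  end

  # A "compiler": blocks on each dependency, then reports that it is done.
  defp compile(coordinator, name, deps) do
    Enum.each(deps, fn dep ->
      ref = make_ref()
      send(coordinator, {:waiting, self(), ref, dep})

      receive do
        {^ref, :found} -> :ok
      end
    end)

    send(coordinator, {:ok, name})
  end

  defp loop(0, _ready, _waiting, order), do: Enum.reverse(order)

  defp loop(pending, ready, waiting, order) do
    receive do
      {:ok, name} ->
        # Broadcast :found to everybody waiting for this name.
        {waiters, waiting} = Map.pop(waiting, name, [])
        Enum.each(waiters, fn {pid, ref} -> send(pid, {ref, :found}) end)
        loop(pending - 1, MapSet.put(ready, name), waiting, [name | order])

      {:waiting, pid, ref, dep} ->
        if MapSet.member?(ready, dep) do
          send(pid, {ref, :found})
          loop(pending, ready, waiting, order)
        else
          waiting = Map.update(waiting, dep, [{pid, ref}], &[{pid, ref} | &1])
          loop(pending, ready, waiting, order)
        end
    end
  end
end

# m depends on n, but is listed (and spawned) first.
order = ToyCoordinator.run(m: [:n], n: [])
```

Even though m is started first, it cannot finish until n has been announced, so n always ends up first in the resulting order.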

Think for a moment: We’ve paused a compiler at who knows which deep and complicated state of execution, and resumed it with the dependency resolved. Pausing, waiting, and resuming are masterfully implemented in a few lines of code simply by taking advantage of the built-in error handler logic.

Wicked, isn’t it?

PS: Thanks a lot to José Valim for reviewing a draft of this post ❤️.