How does Elixir compile/execute code?
Introduction
Elixir always compiles and always executes source code. Both elixir and elixirc do both things.
You read that right: always, both compilation and execution. elixir compiles (in addition to executing), and elixirc executes (in addition to compiling).
Main phases of Elixir compilation
Both elixir and elixirc work the same way:
- Load the contents of the file into memory.
- Produce an AST from it using a custom tokenizer and yecc.
- Expand macros, inline functions, … A bunch of transformations are applied here, in what is known as the expansion phase. This yields an expanded AST, which still conforms to the same spec.
- Transform that final AST into Erlang Abstract Format, which is a standard representation of an Erlang AST using Erlang terms.
- Manually build an abstract format tree for a function called __FILE__/1 in a module called elixir_compiler_X, where X is an integer, with the abstract format of the program from the step above as function body.
- Compile the result to BEAM assembly on the fly with compile:forms/2, which returns a binary (no file is written).
- Load said binary into the Erlang VM using the Erlang code server.
- Call elixir_compiler_X.__FILE__/1. Since this function has your whole program as its body, the VM is effectively running the program. Put this one-liner in an .ex(s) file and you'll see it report that function and module name: IO.inspect(:erlang.process_info(self(), :current_function)).
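The phases above can be approximated from user code with the public Code module. This is only a rough sketch, not the compiler's internal code path: it assumes Code.string_to_quoted/1 as a stand-in for the parsing step and Code.compile_quoted/1 as a stand-in for everything from expansion to loading. The module name PipelineDemo is made up for the illustration.

```elixir
# Hypothetical source text; PipelineDemo is an invented module name.
source = """
defmodule PipelineDemo do
  def double(x), do: x * 2
end
"""

# Parsing: turn the source text into an Elixir AST (quoted expression).
{:ok, ast} = Code.string_to_quoted(source)

# The top-level node is just a call to the defmodule macro.
{:defmodule, _meta, _args} = ast

# Expansion, compilation and loading, all in memory.
# Code.compile_quoted/1 returns one {module, bytecode} tuple per module defined.
[{module, bytecode}] = Code.compile_quoted(ast)

# The bytecode is a plain binary, and the module is already loaded in the VM.
IO.inspect(module)
IO.inspect(is_binary(bytecode))
IO.inspect(module.double(21))
```

Nothing is written to disk here: the module exists only in the running VM, which mirrors what happens when a script is run with elixir.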
There is some nesting in this process that explains the loop illustrated in the picture above. This is due to the way module definition is implemented, but we’ll leave it here.
Observations
Both elixir and elixirc do the same thing. elixirc executes top-level and module-level code just like elixir does; it is the same code path.
For example, you can conditionally define a function while compiling. Why? Because the code is being executed. The other way around, elixir is able to invoke functions in modules defined in the same script. Why? Because they are compiled and loaded into the VM on the fly.
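Both observations fit in one small script. The following is a minimal sketch: the module name ConditionalDemo and the environment variable DEMO_VERBOSE are invented for the example. The if inside the module body is ordinary code that runs while the module is being defined, and the last line is top-level code calling into the module that was just compiled and loaded.

```elixir
defmodule ConditionalDemo do
  # Module-level code: this `if` is executed while the module is compiled,
  # so only one of the two definitions of greet/1 ends up in the module.
  # DEMO_VERBOSE is a hypothetical environment variable.
  if System.get_env("DEMO_VERBOSE") do
    def greet(name), do: "Hello, #{name}! (verbose build)"
  else
    def greet(name), do: "Hello, #{name}!"
  end
end

# Top-level code: by the time this line runs, ConditionalDemo has already
# been compiled and loaded, so the call works within the same script.
IO.puts(ConditionalDemo.greet("world"))
```

The same file behaves identically whether it is run with elixir or compiled with elixirc: in both cases the condition is evaluated and the greeting is printed.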
Since programs executed by elixir are compiled, they run at the speed of compiled modules. Compilation has a cost, of course, so the wall-clock time is different, but the code itself runs equally fast.
How are elixir and elixirc different?
The main difference between elixir and elixirc is that elixirc produces a .beam file per module as a side effect of module definition. It does so by dumping the binary returned by compile:forms/2. That's about it.
Extensions in file names do not matter, .ex and .exs are only conventions.
You can also compile a file that contains five modules, and you’ll get five different .beam files, each named after the module name (regardless of the name of the file defining the modules).
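The "one .beam file per module" behaviour can be imitated by hand. The sketch below is not how elixirc is implemented: it assumes Code.compile_string/2 to obtain the in-memory binaries and then writes them out itself. The module names MultiA and MultiB, the file name whatever.txt and the temporary directory are all invented for the example.

```elixir
# Two hypothetical modules defined in a single source string.
source = """
defmodule MultiA do
  def hi, do: :a
end

defmodule MultiB do
  def hi, do: :b
end
"""

# A scratch directory for the output (an assumption of this sketch).
dir = Path.join(System.tmp_dir!(), "beam_demo_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)

# Code.compile_string/2 returns a list of {module, bytecode} tuples.
# The second argument is only a file name attached to the compiled code;
# its extension plays no role.
compiled = Code.compile_string(source, "whatever.txt")

# Dump each binary to a file named after the module, not after the source file.
Enum.each(compiled, fn {module, bytecode} ->
  File.write!(Path.join(dir, "#{module}.beam"), bytecode)
end)

files = dir |> File.ls!() |> Enum.sort()
IO.inspect(files)
# → ["Elixir.MultiA.beam", "Elixir.MultiB.beam"]

# Clean up the scratch directory.
File.rm_rf!(dir)
```

Note the Elixir. prefix in the file names: it comes from the full atom behind each module alias.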
Top-level code or module-level code that does not end up in a persisted module attribute or a function is gone in the .beam files. Those files contain module definitions for the VM expressed in object code. Elixir is gone there: those are BEAM programs that could technically have been generated by some other tool.
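A small sketch of what survives in the compiled module, using invented names (PersistDemo, @built_by, @answer). It assumes Module.register_attribute/3 with persist: true to mark an attribute as persisted, and reads the persisted attributes back through __info__(:attributes).

```elixir
defmodule PersistDemo do
  # Persisted attribute: stored in the compiled module itself.
  Module.register_attribute(__MODULE__, :built_by, persist: true)
  @built_by "demo"

  # Plain attribute: the expression 40 + 2 is evaluated while compiling,
  # and only its value survives, inlined into the function below.
  @answer 40 + 2
  def answer, do: @answer
end

attrs = PersistDemo.__info__(:attributes)

# Persisted attribute values are wrapped in a list.
IO.inspect(Keyword.get(attrs, :built_by))
# → ["demo"]

# The plain attribute is not part of the module's attributes.
IO.inspect(Keyword.has_key?(attrs, :answer))
# → false

IO.inspect(PersistDemo.answer())
# → 42
```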
PS: Thanks a lot to José Valim for reviewing a draft of this post ❤️.
PS2: The Elixir Parallel Compiler is my next post on this topic.