Intermediate Representations, Toolchain and Internals in Erlang

Aarya Joshi

9 min read · Apr 1, 2019

This is our fourth post on Erlang. Our previous posts covered:

Installation, Simple Program Execution and Package Manager

Data Types, Expressions and Abstraction

Debugging and Testing Support

Intermediate Representations

Every programming language must be translated into a form the machine can execute. Along the way, the language's compiler or interpreter translates the source into an intermediate code, or intermediate representation (IR). Two basic properties of an IR are accuracy, the ability to represent the source code without losing any information, and independence, the ability of the IR to work on any given machine.

Erlang uses the SSA (Static Single Assignment) form in its IR. Other compilers and runtimes that use SSA in some of their parts include Swift (which designs its own SSA), ML, Microsoft Visual Studio 2015, and the Java Virtual Machine (extended SSA). We try to explain some rules of the SSA form using the example below:

In the above program, line 1 of the code is useless and can be omitted from the IR entirely. As its name suggests, the two main rules of the SSA form are that every variable is assigned exactly once and that every variable is defined before it is used. The Erlang compiler applies compiler optimization techniques to reach this IR.
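A minimal sketch of these two rules, using an assumed three-line imperative program (shown as comments) rather than any actual compiler output. The module name ssa_sketch and the variable names are illustrative. Erlang variables are already single-assignment, so the renamed form can be written directly:

```erlang
-module(ssa_sketch).
-export([run/0]).

%% Imperative pseudocode (not Erlang), assumed for illustration:
%%   y := 1
%%   y := 2
%%   x := y
%% SSA renaming gives every assignment a fresh name: y1, y2, x1.
run() ->
    _Y1 = 1,   % y1 is never read again, so an optimizer may drop it
    Y2 = 2,    % y2 is defined before its only use
    X1 = Y2,   % x1 is assigned exactly once
    X1.
```

Once each name has a single definition, it is easy to see that the first assignment is dead: nothing ever reads y1.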

History

Until January 2018, before the SSA form was adopted for the IR, the Erlang compiler worked on BEAM code directly. The developers made the decision to switch after running into the limits of what could be optimized at the BEAM code level. Erlang IR is generated as:

Core Erlang

In Erlang, the Erlang source code is first translated to Core Erlang code. Core Erlang is itself a programming language; it is functional and concurrent. An example function in Erlang:

This one-line function ‘simplefunc’ takes no arguments and returns the value ‘Result’. The code is written in Erlang and will be translated to Core Erlang, which is very similar to Erlang but much simpler. Two key differences are that atoms are always quoted (unlike in Erlang, where atoms that start with a lowercase letter may be written without quotes) and that a function's name and its implementation are separate: the name is declared first and is then bound to the fun that implements it.

A translation of the above program to Core Erlang would look like:

'simplefunc'/0 = fun() -> 'Result'

Variable names typically start with a ‘_’ in Core Erlang. This is not a rule (the restrictions on variable names are the same in Core Erlang as in Erlang); it is simply an inherited convention. When translating a program, the Erlang compiler assigns new variable names. A program that looks like this in Erlang:

getNumber(Num) -> Num.

would look something like this once translated to Core Erlang:

'getNumber'/1 = fun(_@c0) -> _@c0

even though the following, which keeps the original variable name, is also syntactically valid Core Erlang:

'getNumber'/1 = fun(Num) -> Num
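To inspect the translation yourself, one option is to ask the compiler for its Core Erlang listing. The sketch below gathers the two functions from this section into one module. The module name core_demo is an assumption, and so is the body of simplefunc, which is written here to match the Core Erlang line shown earlier.

```erlang
-module(core_demo).
-export([simplefunc/0, getNumber/1]).

%% Assumed body: returns the quoted atom 'Result'.
simplefunc() -> 'Result'.

%% Returns its argument unchanged.
getNumber(Num) -> Num.
```

Compiling with `erlc +to_core core_demo.erl` writes the listing to core_demo.core instead of producing a .beam file. The generated variable names may differ between OTP releases.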

Why use Core Erlang?

As mentioned earlier, Core Erlang is simpler for the compiler to work with, and thus better suited to the code analysis tools used for optimization. While translating to Core Erlang may seem like an extra task for the compiler, it actually makes optimizing the entire program simpler and therefore more accurate. This split lets the programmer keep writing code in a readable format while the tools work on the simpler form.

Toolchain

A toolchain is a group of programming tools used to develop a complex software product, which is typically another computer program. The tools in a toolchain are executed one after the other, so that the output of each tool becomes the input, or the input environment, for the next one. A toolchain typically consists of a compiler and linker, libraries, and a debugger.

To develop a software product in Erlang using the rebar3 toolchain, we follow the steps below:

Pick the right type of project:

Setting up Dependencies:
The basic configuration of a project should do at least the following:

1. Always track the rebar.lock file in version control.
2. Ignore the _build directory in version control.

Tracking the lock file gives repeatable builds and allows rebar3 to update dependencies automatically whenever we switch branches.

Adding Dependencies: The next step is to add dependencies to the project. The {deps,[…]} configuration tells rebar3 which dependencies to download and which to track. Downloading a dependency does not by itself integrate it into the project, however: we must also configure our system to use these dependencies.
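As a hedged sketch of both halves, assume a project that depends on the Hex package cowboy. The package, its version, and the application name myapp are illustrative and not taken from any particular project. The first term belongs in rebar.config and tells rebar3 what to fetch. The second is the project's .app.src file, whose applications list is what makes the dependency part of the running system.

```erlang
%% rebar.config: which packages rebar3 should download and track.
{deps, [
    {cowboy, "2.9.0"}
]}.

%% src/myapp.app.src (hypothetical application): list the dependency
%% as a runtime application so it is started together with myapp.
{application, myapp, [
    {description, "Example application"},
    {vsn, "0.1.0"},
    {applications, [kernel, stdlib, cowboy]}
]}.
```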

Upgrading the dependencies: To upgrade a dependency, we follow the steps below.
1. Update the index cache (rebar3 update).
2. Upgrade the dependency (rebar3 upgrade followed by the name of the dependency).

Create Aliases for common tasks: Complex projects require more tools to run. For example, we may need to run xref to find dead code, dialyzer for type analysis, ct for Common Test suites, and cover for coverage analysis.

Instead of calling these tasks one by one, we can create an alias that runs all of them. With the configuration below, rebar3 check runs every listed task in order.

{alias, [
    {check, [xref, dialyzer, edoc,
             {proper, "--regressions"},
             {proper, "-c"}, {ct, "-c"}, {cover, "-v --min_coverage=80"}]}
]}.

Recommended Configuration for various other tools: Some rebar3 default settings restrict what other tools can do. These defaults are kept to avoid breaking projects that rely on them, but when starting a new project we can use the following collection of settings as a new default.

{dialyzer, [
    {warnings, [
        %% Warn about undefined types and unknown functions
        unknown
    ]}
]}.

{xref_checks, [
    %% enable most checks, but avoid 'unused calls' which is often
    %% very verbose
    undefined_function_calls, undefined_functions, locals_not_used,
    deprecated_function_calls, deprecated_functions
]}.

{profiles, [
    {test, [
        %% Avoid warnings when test suites use `-compile(export_all)`
        {erl_opts, [nowarn_export_all]}
    ]}
]}.

Internals

Knowing the internals of a programming language helps us improve the performance of our programs, understand how the language really works, and even build our own runtime environment.

The Erlang Runtime System (ERTS) is a complex system made up of many interdependent components. It is written in a very portable way and can therefore run on anything from an ordinary computer to the largest multicore systems with terabytes of memory. To optimize the performance of such a system for our application, knowing the application is not enough. We also need a thorough understanding of ERTS.

Once we have a good knowledge of how ERTS works, we can understand how our application behaves when running on it, and we can find and fix the application's performance-related problems.

Let us now look at some basic Erlang concepts:

Erlang is called a concurrency-oriented language. So, to understand how an Erlang system works, we need to understand Erlang's concurrency model. Erlang uses processes to achieve concurrency. Erlang's processes are similar to OS processes: they execute in parallel and communicate through signals. In practice, however, Erlang's processes are much more lightweight than OS processes.
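A minimal sketch of this model, using only the built-in spawn, send (!), and receive. The module name ping_demo and the message shapes are assumptions chosen for illustration:

```erlang
-module(ping_demo).
-export([run/0]).

%% Spawns one process, sends it a ping, and waits for the pong.
run() ->
    Parent = self(),
    Pid = spawn(fun() ->
        receive
            {ping, From} -> From ! {pong, self()}
        end
    end),
    Pid ! {ping, Parent},
    receive
        {pong, Pid} -> ok
    after 1000 ->
        timeout   % give up if no reply arrives within one second
    end.
```

The two processes share no memory. The only interaction between them is the pair of messages.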

Introducing and Understanding Erlang Runtime System (ERTS)

Erlang is designed for building fault-tolerant, distributed systems that can contain a large number of concurrent processes. Erlang's development environment consists of the following building blocks: the Erlang runtime system; an integrated, window-based interface for program development; and the application development tools.

The Erlang Runtime system is made up of the Erlang virtual machine, the kernel and the standard library.

Erlang Virtual Machine

The Erlang virtual machine runs on top of a host operating system. BEAM is Erlang's virtual machine: it executes Erlang code much as the JVM executes Java code, and it runs inside an Erlang node. The virtual machine provides Erlang programs with memory allocation and real-time garbage collection, location and encapsulation of run-time errors, lightweight concurrency, and support for thousands of simultaneous tasks. More precisely, ERTS is one implementation of the general concept of an Erlang runtime system, and BEAM is one implementation of the general concept of an Erlang virtual machine.

Kernel

It is the first application to be started. It provides some low-level services necessary for an Erlang system to start. It provides some services to handle errors, to participate in a distributed system and to perform IO operations.

Standard Library

Erlang’s standard library consists of a large number of re-usable software modules which help the Erlang system developers to a great extent. Many of these modules are specially adapted to program concurrent, distributed systems.

Components of ERTS

Erlang Node

When you start an Erlang application, what you really start is an Erlang node. The node runs the Erlang Runtime System and the virtual machine. The application code executes inside the node, and every layer of the node affects the performance of the application. In object-oriented terms, one can say that an Erlang node is an instance of the Erlang runtime system class.

Layers in the Execution Environment

The application or the program you run will run on one or more nodes and the performance of the program depends not only on the application code but also on the layers below your code on the ERTS stack.

Let us now look at each layer of the stack and see how we can tune them to our application needs.

The bottom layer is the hardware on which we run the application. The easiest way to improve the performance of the application is to run it on better hardware. The next layer is the OS. Most development is done on Linux and OS X, so we can expect better performance on these platforms. The third layer in the stack is the Erlang Runtime System (ERTS), and the fourth layer is the Erlang virtual machine, BEAM. The fifth layer is the Open Telecom Platform (OTP), which supplies the Erlang standard libraries. The last layer is our application together with any third-party libraries that we use. The application can use all the functionality provided by the layers below it.

Erlang Compiler

The Erlang compiler compiles Erlang source files (.erl) into .beam files, which contain the code that BEAM, the virtual machine, executes. The compiler itself is written in Erlang.
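Because the compiler is an ordinary Erlang application, it can also be called from Erlang code through the compile module. The sketch below writes a tiny source file, compiles it, and checks that the .beam file appeared. The module and file names (compile_demo, hello_gen) are hypothetical, and the example assumes the current directory is writable.

```erlang
-module(compile_demo).
-export([run/0]).

%% Generates hello_gen.erl, compiles it to hello_gen.beam, and reports
%% whether the .beam file exists afterwards.
run() ->
    Source = <<"-module(hello_gen).\n"
               "-export([hi/0]).\n"
               "hi() -> hello.\n">>,
    ok = file:write_file("hello_gen.erl", Source),
    {ok, hello_gen} = compile:file("hello_gen"),
    filelib:is_regular("hello_gen.beam").
```

From the command line, `erlc hello_gen.erl` performs the same .erl to .beam step.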

Processes

An Erlang process is similar to an OS process. Every process has its own memory and a process control block, which stores information about the process. Each Erlang node can run many processes, and they communicate with one another through messages and signals.
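Some of the per-process bookkeeping can be observed from Erlang with erlang:process_info/2. The following sketch (the module name proc_info_demo is an assumption) spawns an idle process and asks the runtime for its status, mailbox length, and memory use:

```erlang
-module(proc_info_demo).
-export([run/0]).

%% Returns a list of {Item, Value} pairs describing a freshly spawned
%% process that is blocked in receive.
run() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    Pid ! hello,   % not matched by the receive, so it stays in the mailbox
    Info = erlang:process_info(Pid, [status, message_queue_len, memory]),
    Pid ! stop,    % let the process finish
    Info.
```

The memory figure is reported in bytes and covers only that one process, which reflects the point that each process owns its memory.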

Scheduling

The scheduler chooses which Erlang process to execute. It has two queues: a ready queue, which holds the processes that are ready to run, and a waiting queue, which holds the processes that are waiting to receive a message. The scheduler picks the process at the front of the ready queue and hands it to BEAM to execute for one time slice. Once the time slice is used up, BEAM adds the process to the end of the ready queue. If the process blocks during the time slice, it is added to the waiting queue instead.
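The ready-queue half of this description can be sketched as a toy round-robin loop. This is a simplified model under stated assumptions, not the real scheduler. Each toy process is a {Name, Work} pair, one time slice performs one unit of work, and the waiting queue is left out. The real BEAM scheduler measures a slice in reductions rather than in units like these.

```erlang
-module(sched_sketch).
-export([run/1]).

%% Takes a list of {Name, Work} pairs and returns the names in the
%% order in which the toy processes finish.
run(Procs) ->
    loop(queue:from_list(Procs), []).

loop(Ready, Done) ->
    case queue:out(Ready) of
        {empty, _} ->
            lists:reverse(Done);
        {{value, {Name, Work}}, Rest} when Work =< 1 ->
            %% Finishes within this slice: leaves the ready queue.
            loop(Rest, [Name | Done]);
        {{value, {Name, Work}}, Rest} ->
            %% Slice used up: back to the end of the ready queue.
            loop(queue:in({Name, Work - 1}, Rest), Done)
    end.
```

For example, sched_sketch:run([{a, 2}, {b, 1}, {c, 3}]) returns [b, a, c]: the shortest job finishes first even though it was not first in the queue.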

Erlang Tag Scheme

Erlang uses a tagging scheme to provide the runtime system a way to keep track of the type of each data object. These tags are used for primitive operations, for pattern matching and even used by the garbage collector.
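The tags themselves are not visible from Erlang code, but their effect is: type-test guards let a function branch on the runtime type of any term. A small sketch, with the module name tag_demo and the returned atoms chosen purely for illustration:

```erlang
-module(tag_demo).
-export([kind/1]).

%% Classifies a term by its runtime type using guard tests.
kind(X) when is_integer(X) -> integer;
kind(X) when is_float(X)   -> float;
kind(X) when is_atom(X)    -> atom;
kind(X) when is_binary(X)  -> binary;
kind(X) when is_list(X)    -> list;
kind(X) when is_tuple(X)   -> tuple;
kind(X) when is_pid(X)     -> pid;
kind(_)                    -> other.
```

Note that a string such as "abc" is classified as a list, since Erlang strings are lists of integers.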

Memory Handling

Memory is allocated and deallocated automatically. Every process has its own heap and stack, which can grow and shrink as needed. When a process runs out of heap space, the virtual machine uses garbage collection to reclaim heap space that is no longer in use.
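Heap growth can be observed with erlang:process_info/2. In the sketch below (the module name heap_demo is an assumption), a fresh process records its heap figures, builds a large list, and records them again. The sizes are reported in machine words, and the exact numbers depend on the OTP release and platform.

```erlang
-module(heap_demo).
-export([run/0]).

%% Returns {Before, During, Sum}, where Before and During are the heap
%% figures of a worker process before and after it builds a big list.
run() ->
    Parent = self(),
    spawn(fun() ->
        Before = info(),
        Data = lists:seq(1, 100000),   % forces the heap to grow
        During = info(),
        Parent ! {report, Before, During, lists:sum(Data)}
    end),
    receive
        {report, B, D, Sum} -> {B, D, Sum}
    after 5000 ->
        timeout
    end.

info() ->
    erlang:process_info(self(), [heap_size, stack_size, total_heap_size]).
```

When the worker process exits, all of its memory is released at once, without affecting any other process.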

Authors of Group 13: