Erlang Toolchains and Intermediate representations

a brief summary of Erlang’s Intermediate representations, Toolchain and internals

Mustafa
12 min readApr 1, 2019

Now on the fourth part of Erlang in 2019 which you can read the first three parts from here

Part 1

Part 2

Part 3

Moving now on an important part of any programming language, which is Erlang’s Intermediate representations, Toolchain and internals.

Erlang has a very robust development chain, especially if you are an EMACS maven. There is an Erlang specific build system, there is robust support for packaging your application and its dependencies for deployment and don’t forget OTP.

As for tools, there is Dialyzer, real time tracing on running systems, hot code loading ( you can enable and disable or add logging to a running system without restarting it, for example ), remote code execution, there is so much to learn it is dizzying when you start out.

  • Editor: you can use whatever you want. I used emacs for my first year of erlang, but I’m currently using gedit.
  • Version Control: I like git. It seems that most of the erlang community agrees (most projects are hosted on github).
  • Workflow: I’d recommend getting familiar with rebar.

Here is an example of a rebar-flavored Makefile:

REBAR := ./rebar.PHONY: all deps doc test clean releaseall: deps
$(REBAR) compile
deps:
$(REBAR) get-deps
doc:
$(REBAR) doc skip_deps=true
test:
$(REBAR) eunit skip_deps=true
clean:
$(REBAR) clean
release: all test
dialyzer --src src/*.erl deps/*/src/*.erl

Here are some basic pointers:

  • Put your unit tests in the same modules as the code they are testing. See the rebar wiki for details.
  • Add {cover_enabled, true} to your rebar.config file. Every time you run make test you get a coverage report in HTML!
  • Add your project’s dependencies to your rebar.config and you can fetch and build them when you run make deps.
  • Make sure to comment your code with edoc. If you do, rebar can build all of your docs when your run make doc.

Setting up Dependencies

The basic configuration of a project should do at least two things:

  1. Always track the rebar.lock file
  2. Ignore the _build directory

Tracking the lock file will let you have repeatable builds, and will allow rebar3 to do things like automatically re-update dependencies when switching branches.

The _build directory should be safe to delete but you shouldn’t need to

Rebar3 tracks all the applications declared in your rebar.config files and should be able to track all required changes.

There are a few edge cases where this is not possible and may lead to weird bugs, specifically when you are changing your project structure. If you are moving from a single-app project to an umbrella project (i.e. all source files move from src/ to apps/myapp/src) or the opposite, there are chances that various artifacts in the _build directory will conflict with each other. Delete it and ask for a fresh build in this case.

The next thing you’ll want to do is add dependencies to your project. See the Dependenciessection for this. Adding dependencies does not automatically integrate them to your project, however.

The {deps, [...]} configuration value tells rebar3 which dependencies to fetch and download and track, but that's as far as it goes. You must then configure your application to make use of the dependency:

  • If the dependency is needed at runtime by your application in order for it to work (e.g. you need a web server or call the library directly), add it to your application’s .app.src file under the {applications, [stdlib, kernel, ...]} tuple. This will let the Erlang VM know not to boot your app without the dependency being present
  • If the dependency is only needed for releases (for example, observer or recon, which are debugging tools that your application likely does not depend on but you'd like to bundle it), then you need to app that application explicitly to the {releases, ...} configuration tuple. Any dependency in that tuple will have its transitive dependencies included as well.

Other build tools tend to make no distinction between these types of project inclusions, but rebar3 tries to be strict with regards to what should be included or not. It can let you make specific builds, such as escripts or test releases that won’t require specific toolchains like the Wx graphical toolkit in order to run.

GRISP Toolchain

Intermediate representations

Records

Records are, first of all, a hack. They are more or less an afterthought to the language and can have their share of inconveniences. I’ll cover that later. They’re still pretty useful whenever you have a small data structure where you want to access the attributes by name directly. As such, Erlang records are a lot like structs in C (if you know C.)

They’re declared as module attributes in the following manner:

-module(records).
-compile(export_all).
-record(robot, {name,
type=industrial,
hobbies,
details=[]}).

So here we have a record representing robots with 4 fields: name, type, hobbies and details. There is also a default value for type and details, industrial and [], respectively. Here's how to declare a record in the module records:

first_robot() ->
#robot{name="Mechatron",
type=handmade,
details=["Moved by a small man inside"]}.

And running the code:

1> c(records).
{ok,records}
2> records:first_robot().
{robot,"Mechatron",handmade,undefined,
["Moved by a small man inside"]}

Woops! Here comes the hack! Erlang records are just syntactic sugar on top of tuples. Fortunately, there’s a way to make it better. The Erlang shell has a command rr(Module) that lets you load record definitions from Module:

3> rr(records).
[robot]
4> records:first_robot().
#robot{name = "Mechatron",type = handmade,
hobbies = undefined,
details = ["Moved by a small man inside"]}

Ah there! This makes it much easier to work with records that way. You’ll notice that in first_robot/0, we had not defined the hobbies field and it had no default value in its declaration. Erlang, by defaults, sets the value to undefined for you.

To see the behavior of the defaults we set in the robot definition, let's compile the following function:

car_factory(CorpName) ->
#robot{name=CorpName, hobbies="building cars"}.

And run it:

5> c(records).
{ok,records}
6> records:car_factory("Jokeswagen").
#robot{name = "Jokeswagen",type = industrial,
hobbies = "building cars",details = []}

And we have an industrial robot that likes to spend time building cars.

Note: The function rr() can take more than a module name: it can take a wildcard (like rr("*")) and also a list as a second argument to specify which records to load.

There are a few other functions to deal with records in the shell: rd(Name, Definition) lets you define a record in a manner similar to the -record(Name, Definition) used in our module. You can use rf() to 'unload' all records, or rf(Name) or rf([Names]) to get rid of specific definitions.

You can use rl() to print all record definitions in a way you could copy-paste into the module or use rl(Name) or rl([Names]) to restrict it to specific records.

Finally, rp(Term) lets you convert a tuple to a record (given the definition exists)

Writing records alone won’t do much. We need a way to extract values from them. There are basically two ways to do this. The first one is with a special ‘dot syntax’. Assuming you have the record definition for robots loaded:

5> Crusher = #robot{name="Crusher", hobbies=["Crushing people","petting cats"]}. 
#robot{name = "Crusher",type = industrial,
hobbies = ["Crushing people","petting cats"],
details = []}
6> Crusher#robot.hobbies.
["Crushing people","petting cats"]

Ugh, not a pretty syntax. This is due to the nature of records as tuples. Because they’re just some kind of compiler trick, you have to keep keywords around defining what record goes with what variable, hence the #robot part of Crusher#robot.hobbies. It's sad, but there's no way out of it. Worse than that, nested records get pretty ugly:

7> NestedBot = #robot{details=#robot{name="erNest"}}.
#robot{name = undefined,type = industrial,
hobbies = undefined,
details = #robot{name = "erNest",type = industrial,
hobbies = undefined,details = []}}
8> (NestedBot#robot.details)#robot.name.
"erNest"

And yes, the parentheses are mandatory.

Update:
Starting with revision R14A, it is now possible to nest records without the parentheses. The NestedBot example above could also be written as NestedRobot#robot.details#robot.name and work the same.

To further show the dependence of records on tuples, see the following:

9> #robot.type.
3

What this outputs is which element of the underlying tuple it is.

One saving feature of records is the possibility to use them in function heads to pattern match and also in guards. Declare a new record as follows on top of the file, and then add the functions under:

-record(user, {id, name, group, age}).%% use pattern matching to filter
admin_panel(#user{name=Name, group=admin}) ->
Name ++ " is allowed!";
admin_panel(#user{name=Name}) ->
Name ++ " is not allowed".
%% can extend user without problem
adult_section(U = #user{}) when U#user.age >= 18 ->
%% Show stuff that can't be written in such a text
allowed;
adult_section(_) ->
%% redirect to sesame street site
forbidden.

The syntax to bind a variable to any field of a record is demonstrated in the admin_panel/1 function (it's possible to bind variables to more than one field). An important thing to note about the adult_section/1 function is that you need to do SomeVar = #some_record{} in order to bind the whole record to a variable. Then we do the compiling as usual:

10> c(records).
{ok,records}
11> rr(records).
[robot,user]
12> records:admin_panel(#user{id=1, name="ferd", group=admin, age=96}).
"ferd is allowed!"
13> records:admin_panel(#user{id=2, name="you", group=users, age=66}).
"you is not allowed"
14> records:adult_section(#user{id=21, name="Bill", group=users, age=72}).
allowed
15> records:adult_section(#user{id=22, name="Noah", group=users, age=13}).
forbidden

What this lets us see is how it is not necessary to match on all parts of the tuple or even know how many there are when writing the function: we can only match on the age or the group if that’s what’s needed and forget about all the rest of the structure. If we were to use a normal tuple, the function definition might need to look a bit like function({record, _, _, ICareAboutThis, _, _}) -> .... Then, whenever someone decides to add an element to the tuple, someone else (probably angry about it all) would need to go around and update all functions where that tuple is used.

Key-Value Stores

I’ve had you build a tree back a few chapters, and the use was to use it as a key-value store for an address book. That book sucked: we couldn’t delete or convert it to anything useful. It was a good demonstration of recursion, but not much more. Now is the time to introduce you to a bunch of useful data structures and modules to store data under a certain key. I won’t define what every function does nor go through all the modules. I will simply link to the doc pages. Consider me as someone responsible about ‘raising awareness about key-value stores in Erlang’ or something. Sounds like a good title. I just need one of these ribbons.

For small amounts of data, there are basically two data structures that can be used. The first one is called a proplist. A proplist is any list of tuples of the form [{Key,Value}]. They're a weird kind of structure because there is no other rule than that. In fact the rules are so relaxed that the list can also contain boolean values, integers and whatever you want. We're rather interested by the idea of a tuple with a key and a value in a list here, though. To work with proplists, you can use the proplists module. It contains functions such as proplists:delete/2, proplists:get_value/2, proplists:get_all_values/2, proplists:lookup/2 and proplists:lookup_all/2.

You’ll notice there is no function to add or update an element of the list. This shows how loosely defined proplists are as a data structure. To get these functionalities, you must cons your element manually ([NewElement|OldList]) and use functions such as lists:keyreplace/4. Using two modules for one small data structure is not the cleanest thing, but because proplists are so loosely defined, they're often used to deal with configuration lists, and general description of a given item. Proplists are not exactly complete data structures. They're more of a common pattern that appears when using lists and tuples to represent some object or item; the proplists module is a bit of a toolbox over such a pattern.

If you do want a more complete key-value store for small amounts of data, the orddict module is what you need. Orddicts (ordered dictionaries) are proplists with a taste for formality. Each key can be there once, the whole list is sorted for faster average lookup, etc. Common functions for the CRUD usage include orddict:store/3, orddict:find/2 (when you do not know whether the key is in the dictionaries), orddict:fetch/2 (when you know it is there or that it must be there) and orddict:erase/2.

I would therefore say that your application’s needs is what should govern which key-value store to choose. Different factors such as how much data you’ve got to store, what you need to do with it and whatnot all have their importance. Measure, profile and benchmark to make sure.

Arrays

But what about code that requires data structures with nothing but numeric keys? Well for that, there are arrays. They allow you to access elements with numerical indices and to fold over the whole structure while possibly ignoring undefined slots.

A Set of Sets

If you’ve ever studied set theory in whatever mathematics class you have an idea about what sets can do. If you haven’t, you might want to skip over this. However, I’ll just say that sets are groups of unique elements that you can compare and operate on: find which elements are in two groups, in none of them, only in one or the other, etc. There are more advanced operations letting you define relations and operate on these relations and much more. I’m not going to dive into the theory (again, it’s out of the scope of this book) so I’ll just describe them as it is.

There are 4 main modules to deal with sets in Erlang. This is a bit weird at first, but it makes more sense once you realize that it’s because it was agreed by implementers that there was no ‘best’ way to build a set. The four modules are ordsets, sets, gb_sets and sofs (sets of sets):

ordsetsOrdsets are implemented as a sorted list. They’re mainly useful for small sets, are the slowest kind of set, but they have the simplest and most readable representation of all sets. There are standard functions for them such as ordsets:new/0, ordsets:is_element/2, ordsets:add_element/2, ordsets:del_element/2, ordsets:union/1, ordsets:intersection/1, and a bunch more.

sets Sets (the module) is implemented on top of a structure really similar to the one used in dict. They implement the same interface as ordsets, but they're going to scale much better. Like dictionaries, they're especially good for read-intensive manipulations, like checking whether some element is part of the set or not.

gb_sets Gb_sets themselves are constructed above a General Balanced Tree structure similar to the one used in the gb_trees module. gb_sets are to sets what gb_tree is to dict; an implementation that is faster when considering operations different than reading, leaving you with more control. While gb_sets implement the same interface as sets and ordsets, they also add more functions. Like gb_trees, you have smart vs. naive functions, iterators, quick access to the smallest and largest values, etc.

sofs Sets of sets (sofs) are implemented with sorted lists, stuck inside a tuple with some metadata. They’re the module to use if you want to have full control over relationships between sets, families, enforce set types, etc. They’re really what you want if you need mathematics concept rather than ‘just’ groups of unique elements.

Directed Graphs

There is one other data structure that I want to mention here (not that there are not more than what’s mentioned in this chapter, on the contrary): directed graphs. Again, this data structure is more for readers who already know the mathematical theory that goes with it.

Directed graphs in Erlang are implemented as two modules, digraph and digraph_utils. The digraph module basically allows the construction and modification of a directed graph: manipulating edges and vertices, finding paths and cycles, etc. On the other hand, digraph_utils allows you to navigate a graph (postorder, preorder), testing for cycles, arborescences or trees, finding neighbors, and so on.

Because directed graphs are closely related to set theory, the ‘sofs’ module contains a few functions letting you convert families to digraphs and digraphs to families.

Queues

The queue module implements a double-ended FIFO (First In, First Out) queue:

They’re implemented a bit as illustrated above: two lists (in this context, stacks) that allow to both append and prepend elements rapidly.

The queue module basically has different functions in a mental separation into 3 interfaces (or APIs) of varying complexity, called ‘Original API’, ‘Extended API’ and ‘Okasaki API’:

  • The original API contains the functions at the base of the queue concept, including: new/0, for creating empty queues, in/2, for inserting new elements, out/1, for removing elements, and then functions to convert to lists, reverse the queue, look if a particular value is part of it, etc.
  • The extended API mainly adds some introspection power and flexibility: it lets you do things such as looking at the front of the queue without removing the first element (see get/1 or peek/1), removing elements without caring about them (drop/1), etc. These functions are not essential to the concept of queues, but they're still useful in general.
  • The Okasaki API is a bit weird. It’s derived from Chris Okasaki’s Purely Functional Data Structures. The API provides operations similar to what was available in the two previous APIs, but some of the function names are written backwards and the whole thing is relatively peculiar. Unless you do know you want this API, I wouldn’t bother with it.

You’ll generally want to use queues when you’ll need to ensure that the first item ordered is indeed the first one processed. So far, the examples I’ve shown mainly used lists as a accumulators that would then be reversed. In cases where you can’t just do all the reversing at once and elements are frequently added, the queue module is what you want (well, you should test and measure first! Always test and measure first!)

--

--