Getting started with OTP: creating psycho families!

OTP, or Open Telecom Platform, is what makes Erlang “Erlang” (there are a lot of other things too, but I just wanted to sound dramatic!). It is such an integral part of Erlang that whenever people talk or write about Erlang they usually say Erlang/OTP. So you might be wondering what this OTP thing is. OTP is essentially a framework for creating servers and process hierarchies. Just as you have web frameworks like Django and Ruby on Rails for creating websites, Erlang has the OTP framework for creating servers and process trees. This post describes what process trees are, why OTP is so cool, and why OTP creates psycho families!

The OTP framework is designed to create psycho process families!

In the following two sections I will discuss message passing in Erlang and process linking. If you are familiar with these, you can jump straight to the juicy stuff, i.e. the OTP section.

Processes and message passing

We begin by understanding how to create a process in Erlang and send messages to it. Consider the following code,

-module(proc_msg).
-export([echo/0]).

echo() ->
    receive
        {msg, Msg} -> io:format("~p received ~p~n", [self(), Msg])
    end.

In the module proc_msg (file proc_msg.erl) we have defined the echo/0 function, which contains a receive expression. When echo/0 is executed it blocks at the receive and waits until a message arrives. Once it gets a message of the form {msg, Msg} it binds the payload to “Msg”, prints it to the shell (using io:format/2) and then exits. In general, receive has the following form,

receive
    Pattern1 [when GuardSeq1] ->
        Body1;
    ...;
    PatternN [when GuardSeqN] ->
        BodyN
after
    Timeout ->
        BodyT
end

Above we can see that a received message is pattern matched and can also be checked with guards. Note that we can also specify a timeout using after, and in “BodyT” we decide what to do if the timeout expires. An example usage is as follows,

pos_neg_checker() ->
    receive
        {msg1, Number} when Number < 0 ->
            io:format("Got a negative number~n");
        {msg1, Number} when Number > 0 ->
            io:format("Got a positive number~n")
    after
        5000 ->
            io:format("Tired of waiting for the number! Exiting~n")
    end.

Now we go back to our proc_msg module and spawn a process,

1> Pid = spawn(proc_msg, echo, []).
<0.42.0>
2> Pid ! {msg, hello}.
<0.42.0> received hello
{msg,hello}
  • In the first line we use spawn/3 to spawn our function. The spawn/3 function simply executes our echo/0 function as a separate process. Since spawn executes echo/0, the new process blocks at the receive and waits for an incoming message.
  • In the second line we send a message to “Pid”, i.e. to our spawned echo/0 process. Notice that the moment we send this message we get an output on the shell. Recall that echo/0 was waiting at the receive, so when the message arrives it carries on, prints the message and exits.
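Our echo/0 process dies after a single message. If you want a process that keeps answering, the usual trick is to let the function call itself after handling each message. Here is a minimal sketch of that idea; the module name echo_loop, the {msg, From, Msg} message shape and the demo/0 helper are made up for this illustration and are not part of proc_msg.

```erlang
-module(echo_loop).
-export([start/0, loop/0, demo/0]).

%% Spawn the looping receiver and return its pid.
start() ->
    spawn(?MODULE, loop, []).

%% Handle one message, then call loop/0 again to wait for the next one.
loop() ->
    receive
        {msg, From, Msg} ->
            From ! {echoed, Msg},   % reply to the sender
            loop();
        stop ->
            ok                      % returning ends the process
    end.

%% Send three messages and collect the replies in the order they arrive.
demo() ->
    Pid = start(),
    lists:foreach(fun(N) -> Pid ! {msg, self(), N} end, [1, 2, 3]),
    Replies = [receive {echoed, M} -> M after 1000 -> timeout end
               || _ <- [1, 2, 3]],
    Pid ! stop,
    Replies.
```

Calling echo_loop:demo() from the shell should give back [1,2,3], because messages between one pair of processes arrive in the order they were sent.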

Link and Monitor

Consider a situation where you might want to restart a spawned process if it crashes. In order to accomplish this you will need to observe the spawned process and see if it crashes, but how do we do that? Have a look below,

-module(proc_link).
-export([only_echo/0]).

only_echo() ->
    receive
        echo -> io:format("~p received echo~n", [self()]);
        _ -> exit("received unexpected message")
    end.

Following is how we observe it,

1> Pid = spawn(proc_link, only_echo, []).
<0.46.0>
2> link(Pid).
true
3> self().
<0.34.0>
4> Pid ! echo.
<0.46.0> received echo
echo
5> Pid2 = spawn(proc_link, only_echo, []).
<0.50.0>
6> link(Pid2).
true
7> self().
<0.34.0>
8> Pid2 ! something_else.
** exception exit: "received unexpected message"
9> self().
<0.55.0>
  • In the first line we spawn the only_echo/0 function and store its process id in “Pid”.
  • Next we observe “Pid” using the built-in link/1 function. The process that executes link/1 is the one that observes “Pid”. Since link/1 is executed by the shell, the shell is observing our process.
  • Next we print the process id of the shell using the self/0 function.
  • In the 4th line we send our process an “echo” message; it receives it, prints it to the shell and exits normally.
  • We again spawn our only_echo/0 function and store its process id in “Pid2”.
  • We link the shell to observe “Pid2”.
  • Then we again print the pid of the shell, which is the same as the value we got previously.
  • In the 8th line we send a message to “Pid2”. Recall that our process (i.e. only_echo/0) only accepts the “echo” message; in case it receives any other message it crashes. Also recall that our shell was linked to our process, so it crashed too! In order to verify that the shell crashed we check the process id of the shell again. This time it's “<0.55.0>” and previously it was “<0.34.0>”. What this means is that our shell crashed and was restarted by somebody (don't worry about who restarted it), and the new shell has the pid “<0.55.0>”.

The lesson learnt here is that when we observe a process using link/1 and the observed process crashes, the observing process crashes too! (Links are in fact bidirectional, so it works the other way round as well.) But what if you don't want the observing process (e.g. the shell) to crash even when the observed process crashes? Following is one way to tackle this,

1> process_flag(trap_exit, true).
false
2> Pid = spawn_link(proc_link, only_echo, []).
<0.49.0>
3> self().
<0.34.0>
4> Pid ! something_else.
something_else
5> flush().
Shell got {'EXIT',<0.49.0>,"received unexpected message"}
ok
6> self().
<0.34.0>
  • In the first line we use the built-in function process_flag/2 to make the shell trap exits (it returns the previous value of the flag). When a process crashes, an exit signal is sent to all the processes linked to it, and any linked process that does not trap exits dies too. With trap_exit set to true, the signal is instead delivered to the shell as an ordinary {'EXIT', Pid, Reason} message, which prevents the shell from crashing.
  • spawn_link/3 is a built-in function that spawns a process and links to it in one step.
  • flush/0 prints out all the messages in the mailbox of the shell. We can see that our process crashed and that the linked process, i.e. the shell, received an ‘EXIT’ message for it.
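Trapping exits is also what lets us do the thing we set out to do, namely restart a process when it crashes. Below is a rough sketch of a hand-rolled "keeper" process that does this. Everything in it (the module name keeper, the status message, the demo/0 helper) is invented for this illustration; it is just the trap_exit idea from above put into a loop, not how OTP itself implements supervision.

```erlang
-module(keeper).
-export([start/0, status/1, demo/0]).

%% Spawn the keeper, which in turn spawns and watches a worker.
start() ->
    spawn(fun init/0).

init() ->
    process_flag(trap_exit, true),          % turn exit signals into messages
    keep(spawn_link(fun worker/0), 0).

%% Wait for the worker to die; restart it unless it exited normally.
keep(Worker, Restarts) ->
    receive
        {'EXIT', Worker, normal} ->
            ok;                             % clean exit, the keeper stops too
        {'EXIT', Worker, _Reason} ->
            keep(spawn_link(fun worker/0), Restarts + 1);
        {status, From} ->
            From ! {status, Worker, Restarts},
            keep(Worker, Restarts)
    end.

%% Like only_echo/0, but it keeps looping on echo and stops on stop.
worker() ->
    receive
        echo -> worker();
        stop -> ok;
        _    -> exit(unexpected_message)
    end.

%% Ask the keeper for the current worker pid and the restart count.
status(Keeper) ->
    Keeper ! {status, self()},
    receive {status, W, N} -> {W, N} after 1000 -> timeout end.

demo() ->
    Keeper = start(),
    {W1, 0} = status(Keeper),
    W1 ! something_else,                    % crash the worker
    timer:sleep(100),                       % give the keeper time to restart it
    {W2, 1} = status(Keeper),
    true = (W1 =/= W2),                     % a fresh worker took its place
    W2 ! stop,
    ok.
```

If keeper:demo() returns ok, the worker crashed once and was replaced by a new process without the caller ever noticing.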

We can also keep the observing process from crashing by making use of monitors.
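A monitor is one-directional: the observing process simply receives a 'DOWN' message when the observed process dies, and it never crashes along with it. Here is a small sketch using the built-in spawn_monitor/1; the module name mon_demo and the anonymous worker function are made up for this example.

```erlang
-module(mon_demo).
-export([demo/0]).

demo() ->
    %% Spawn a process and start monitoring it in one step.
    {Pid, Ref} = spawn_monitor(fun() ->
                                   receive
                                       echo -> ok;
                                       _    -> exit(unexpected_message)
                                   end
                               end),
    Pid ! something_else,                   % make the process crash
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            {down, Reason}                  % we are told why it died
    after 1000 ->
        timeout
    end.
```

Calling mon_demo:demo() should return {down,unexpected_message}, and the calling process stays alive without having to set trap_exit.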

OTP

As previously discussed, OTP is a powerful framework which allows you to easily create process trees or hierarchies. OTP can be best explained with the analogy of a family which has children, parents, grandparents and so on, each of which plays a different role in the family. For example, children are expected to do their homework, parents are supposed to keep an eye on their children or supervise them, and if the parents ever go rogue the grandparents are there to handle/supervise them, and it goes on like this. Similarly, OTP captures these roles in the form of behaviours like supervisor, gen_server, gen_fsm and gen_event. Here gen_server, gen_fsm (superseded by gen_statem in newer OTP releases) and gen_event are worker behaviours and supervisor, as the name suggests, is the supervising behaviour.

Supervisor

A supervisor process is usually responsible for creating/spawning child processes which can be supervisors or worker processes. Have a look below,

-module(ch_sup).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(ch_sup, []).

init(_Args) ->
    SupFlags = #{strategy => one_for_all, intensity => 5, period => 3},
    ChildSpecs = [#{id => ch3,
                    start => {ch3, start_link, []},
                    restart => temporary,
                    shutdown => brutal_kill,
                    type => worker,
                    modules => [ch3]}],
    {ok, {SupFlags, ChildSpecs}}.
  • Notice that in the second line we specify the behaviour for this module using the “-behaviour(Name)” attribute.
  • The module exports two functions, i.e. start_link/0 and init/1.
  • We can start the supervisor by calling “ch_sup:start_link/0”, which internally calls start_link/2 from the supervisor module, passing our module name as the first argument. supervisor:start_link/2 spawns the supervisor process and links to it. Once the process is spawned, it calls the init/1 function of the module whose name was passed as the first argument, i.e. ch_sup.
  • In the init/1 function of our supervisor we describe the child processes to start. A child process is started from a child specification, which can be thought of as the qualities the parent wants in their children! So our supervisor not only gets to produce children but also gets the desired qualities in them. You see, it's good to have children in the OTP tribe!
  • “ChildSpecs” is a list of child specifications, and the supervisor starts one child for each specification. Our supervisor has only one child, so let's have a close look at its specification
#{id => ch3,
  start => {ch3, start_link, []},
  restart => temporary,
  shutdown => brutal_kill,
  type => worker,
  modules => [ch3]}
  • Notice that the specification is nothing but a map. The map above shows all the fields available for a child.
  • id is used to identify the child specification internally by the supervisor.
  • start defines the function call used to start the child process. It must be a module-function-arguments tuple {M,F,A} used as apply(M,F,A). The start function must create and link to the child process, and must return {ok,ChildPid} or {ok,ChildPid,Info}, where ChildPid is the pid of the child process and Info is an arbitrary term which is ignored by the supervisor. We will discuss the child process in the next section. For now just remember that you need to specify {M,F,A} and the supervisor will call M:F(A1, A2 ..) to start the child.
  • restart defines what happens when a child process terminates. Going back to our family analogy, let's say that the parent is teaching its child to ride a bicycle. The restart type defines what the parent does in case the child falls off the bicycle (i.e. crashes) or in case the child successfully completes the lesson (i.e. exits normally). There are three restart types: permanent, temporary and transient. A permanent child process will always be restarted, a temporary child process will never be restarted, and a transient child process will be restarted only if it terminates abnormally, i.e. if the child crashes it will be restarted. So depending on how much the supervisor loves its children it will choose a restart type. Our supervisor seems to be a bad parent! It chooses temporary, which means that even if the child falls off the bicycle (i.e. crashes) it won't help it (i.e. restart it).
  • shutdown defines how a child process shall be terminated. As strange as it may sound, a parent in the OTP tribe can kill its children! That is why I said OTP creates psycho families, with parents killing their own children. Here brutal_kill means that the child process will be unconditionally terminated using exit(ChildPid,kill). The other options are a timeout in milliseconds, which gives the child that long to shut down after receiving exit(ChildPid,shutdown) before it is killed, and infinity, which waits indefinitely.
  • type specifies if the child process is a supervisor or a worker.
  • Just take modules as is for now; the supervisor documentation has the details.
  • Supervisor also has supervisor flags which define its own behavior. We return the supervisor flags along with the specs in the init/1 function.
SupFlags = #{strategy => one_for_all, intensity => 5, period => 3},
  • The supervisor flags let the user define what the supervisor does when one of its children dies. There are 4 different restart strategies available (one_for_one, one_for_all, rest_for_one and simple_one_for_one). We use one_for_all, which means that if a child dies and has to be restarted, all the other children are terminated and restarted along with it; with a single child this behaves just like one_for_one, where only the dead child is restarted. Since we don't want the supervisor to get into an infinite loop of child process terminations and restarts, a maximum restart intensity is defined using two integer values specified with the intensity and period keys in the above map. What the above means is that if more than 5 restarts occur within a period of 3 seconds, the supervisor terminates all its children and then itself.
  • Notice the format of return value by the init/1 function in the module.
The lesson learnt here is that supervisors make psycho parents!
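To actually see a supervisor at work without writing a gen_server first, here is a small self-contained sketch. Note the assumptions: the module name sup_demo, the child id kid and the demo/0 helper are invented for this example, the child is a plain spawn_link'ed process rather than a proper OTP worker, and unlike ch_sup it uses one_for_one with a permanent child so that a restart really happens.

```erlang
-module(sup_demo).
-behaviour(supervisor).

-export([start_link/0, start_child/0, demo/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, sup_demo}, ?MODULE, []).

init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 3},
    ChildSpecs = [#{id => kid,
                    start => {?MODULE, start_child, []},
                    restart => permanent,      % always restart this child
                    shutdown => brutal_kill,
                    type => worker,
                    modules => [?MODULE]}],
    {ok, {SupFlags, ChildSpecs}}.

%% Start function for the child: it must link and return {ok, Pid}.
start_child() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.

%% Look up the pid the supervisor currently has for the child.
child_pid() ->
    [{kid, Pid, worker, _Modules}] = supervisor:which_children(sup_demo),
    Pid.

demo() ->
    {ok, Sup} = start_link(),
    Old = child_pid(),
    exit(Old, kill),                 % the child "falls off the bicycle"
    timer:sleep(100),                % give the supervisor time to restart it
    New = child_pid(),
    ok = gen_server:stop(Sup),       % stop the supervisor and its child
    {restarted, is_pid(New) andalso New =/= Old}.
```

sup_demo:demo() should return {restarted,true}; you will probably also see a supervisor report printed in the shell for the killed child. If you change restart to temporary, the child is not started again and disappears from the supervisor's child list.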

Workers

As discussed previously there are 3 types of worker behaviours, out of which we will discuss the most widely used one, gen_server. Recall that in the previous section we created a supervisor with a child spec which had,

start => {ch3, start_link, []}

So we will create a module named ch3 with the gen_server behaviour,

-module(ch3).
-behaviour(gen_server).

-export([start_link/0]).
-export([alloc/1, free/1]).
-export([init/1,
         handle_call/3,
         handle_cast/2,
         handle_info/2,
         terminate/2,
         code_change/3]).

start_link() ->
    gen_server:start_link({local, ch3}, ch3, [], []).

init(_Args) ->
    {ok, #{channels => []}}.

alloc(Channel) ->
    gen_server:call(ch3, {alloc, Channel}).

free(Ch) ->
    gen_server:cast(ch3, {free, Ch}).

handle_call({alloc, Ch}, _From, #{channels := Chs} = State) ->
    Chs_new = [Ch | Chs],
    {reply, ok, State#{channels => Chs_new}}.

handle_cast({free, Ch}, #{channels := Chs} = State) ->
    %% Keep every channel except the one being freed.
    Chs2 = lists:filter(fun(X) -> X /= Ch end, Chs),
    {noreply, State#{channels => Chs2}}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVersion, State, _Extra) ->
    {ok, State}.
  • The initial lines of the module should feel familiar, after which we have start_link/0, which is used to start ch3. It internally uses gen_server:start_link/4, which starts the gen_server process. Recall that in the supervisor section we said that the start function must create a link to the child process; start_link/4 does that, i.e. once start_link/4 returns, the supervisor is linked to our gen_server process and is watching over it. We have our little hierarchy or process tree, where the supervisor is connected to the child and watching over it!
  • Notice that the first argument is used to specify the name of the spawned process. “{local, ch3}” means that the process is registered as ch3 and that this name is local to the current Erlang node, meaning that if you connect two Erlang nodes you won't be able to reach this process from the other node using just the name ch3.
  • After the process is spawned, gen_server:start_link/4 automatically calls init/1, which returns “{ok, #{channels => []}}”, where the second element of the tuple is usually called the state of the process (we will talk more about this state thing down below).
  • After the above is completed our new process is spawned and initialized. There are three ways a message can reach a gen_server process: call, cast and info.
  • The call way of sending a message is just like calling your friend, conveying your message and then waiting for the answer. This is achieved using gen_server:call/2, which takes the name of the process and the message as arguments and sends a call message to that process. Call messages received by the gen_server process are handled by the handle_call/3 function. Its first argument is the message itself, the second argument (From) identifies the sender as a {Pid, Tag} tuple, and the third argument is the State; recall that this State is the second element of the tuple we returned from init/1. In our module the alloc/1 function is used to allocate a channel; it internally uses gen_server:call/2, which ends up invoking handle_call/3. Notice the return value of handle_call/3 in our module,
{reply, ok, State#{channels => Chs_new}}
  • The first element means that the tuple contains a reply, the second element is the reply and the third element is the updated state. The other return values allowed from handle_call/3 are listed in the gen_server documentation.
  • In the cast way of sending a message, the sender sends the message and continues with its work, i.e. the sender does not wait for a reply. Cast messages are sent using gen_server:cast/2, which takes the same arguments as call/2. They are handled by handle_cast/2, where the first argument is the message itself and the second argument is the state. Notice the return value in handle_cast/2,
{noreply, State#{channels => Chs2}}
  • The first element says that there is no reply, i.e. the sender will not receive any reply from this process, and the second element is the updated State. The other return values allowed from handle_cast/2 are listed in the gen_server documentation.
  • The handle_info/2 function handles any other messages that the process might receive apart from the call and cast messages. These are often called out-of-band messages. Following is how you can send such a message,
Pid_of_GenServer ! message
  • The terminate/2 function is called when our gen_server process is about to shut down, so that it can do so gracefully. You can close any files you have opened or perform any other clean-up routine here.
  • Lastly, code_change/3 is used for hot code upgrade. For now I advise you not to concern yourself with this function.
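To see call, cast and info side by side, here is a tiny self-contained gen_server, separate from ch3. It is only a sketch: the module name counter_srv, the get, {add, K} and reset messages and the demo/0 helper are all invented for this illustration.

```erlang
-module(counter_srv).
-behaviour(gen_server).

-export([start_link/0, demo/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

start_link() ->
    gen_server:start_link(?MODULE, 0, []).

%% The state is just an integer counter.
init(Start) ->
    {ok, Start}.

%% call: the caller waits for the reply (here, the current count).
handle_call(get, _From, N) ->
    {reply, N, N}.

%% cast: fire and forget, nobody waits for an answer.
handle_cast({add, K}, N) ->
    {noreply, N + K}.

%% info: anything sent with a plain "Pid ! Message" ends up here.
handle_info(reset, _N) ->
    {noreply, 0};
handle_info(_Other, N) ->
    {noreply, N}.

terminate(_Reason, _N) ->
    ok.

demo() ->
    {ok, Pid} = start_link(),
    ok = gen_server:cast(Pid, {add, 5}),   % handled by handle_cast/2
    Five = gen_server:call(Pid, get),      % handled by handle_call/3
    Pid ! reset,                           % handled by handle_info/2
    Zero = gen_server:call(Pid, get),
    ok = gen_server:stop(Pid),             % runs terminate/2
    {Five, Zero}.
```

counter_srv:demo() should return {5,0}: the cast is processed before the first call because both come from the same sender, and the plain reset message goes through handle_info/2 before the second call.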

With this I have covered the basic idea of what OTP is all about and discussed behaviours like supervisor and gen_server. I encourage the reader to go ahead and dig deep into OTP, following are some helpful links:

If you find the above discussion interesting and helpful, let me know by sharing it!