Part 6. Managing multiple Supervisors and a supervision tree for the key-value store — Elixir/OTP

Published in

Gamezop Tech

8 min read · Mar 1, 2020

This is part 6 of a 7-part series. Please read the previous articles to catch up on the topic.

In the previous article, we achieved fault tolerance by using supervisors in our process of storing values. Now we need to go further and isolate the effects of errors. Currently, no matter which process in our system crashes, our Manager will be restarted, which in turn restarts our database and its workers, even if the error had nothing to do with them.

This problem can result in too many restarts, and after some time the supervisor will give up and stop restarting the server, since there is a limit on restarts by default (3 restarts within 5 seconds). Also, it is a good idea to keep errors as locally scoped as possible if you know the scopes beforehand.
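To make the restart limit concrete, here is a minimal sketch of where it is configured. The module name RestartLimitSketch.Supervisor and the Agent stand-in worker are my own assumptions for illustration and are not part of the KeyVal project; :max_restarts and :max_seconds are the standard options accepted by Supervisor.init/2.

```elixir
defmodule RestartLimitSketch.Supervisor do
  # Hypothetical module, used only to illustrate the restart limit.
  use Supervisor

  def start_link(_arg) do
    Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    children = [
      # An Agent is used here purely as a stand-in worker.
      %{id: :counter, start: {Agent, :start_link, [fn -> 0 end]}}
    ]

    # These are the default values, written out explicitly: if children are
    # restarted more than 3 times within 5 seconds, the supervisor itself
    # terminates instead of restarting them again.
    Supervisor.init(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end

{:ok, sup} = RestartLimitSketch.Supervisor.start_link([])
```

The narrower the scope of each supervisor, the less likely it is that unrelated crashes add up and push one supervisor over this limit.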

So what we will do is separate the parts that are only loosely dependent on each other and give each of them its own supervisor, so that restarting one does not restart the others.

Let’s get started 🏎️

To isolate the effects of errors, we are going to pull the call to the database start function out of the manager’s init function and put it in KeyVal.System’s init function. Now if the database fails, the manager won’t be restarted, and if the manager fails, the database won’t be restarted.

This change in structure gives us a system like the one shown in the image.

Now, since there are a lot of moving parts in the application and a process may get restarted many times without our knowledge, we need a way to keep track of PIDs, because a process will not necessarily have the same PID after a restart. Otherwise we can end up in a situation where a process has been restarted and some other process is still calling it using the old PID.

To add this functionality, we need to use something called a registry. A registry allows developers to look up one or more processes with a given key. If the registry has :unique keys, a key points to 0 or 1 processes. If the registry allows :duplicate keys, a single key may point to any number of processes. In both cases, different keys could identify the same process.

Each entry in the registry is associated with the process that has registered the key. If the process crashes, the keys associated with that process are automatically removed.

So, this lets us give names to our processes. If we give some process the name A while its PID is #PID<0.143.0> and the process is restarted, we can still call it by the name A; behind the scenes, the name is now mapped to the PID of the restarted process, which could be something like #PID<0.142.0>. This means the developer no longer has to worry about storing PIDs.
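The idea can be tried out in isolation with a small script. This is only a sketch: the registry name SketchRegistry and the key :store_a are made up for the demo, and an unsupervised Agent stands in for our real processes, so the "restart" is done by hand.

```elixir
# Start a registry with unique keys. SketchRegistry is a hypothetical name.
{:ok, _registry} = Registry.start_link(keys: :unique, name: SketchRegistry)

# A via tuple that points at the key :store_a in that registry.
name = {:via, Registry, {SketchRegistry, :store_a}}

# Start an Agent registered under the name instead of tracking its PID.
{:ok, first_pid} = Agent.start(fn -> %{} end, name: name)
[{^first_pid, _value}] = Registry.lookup(SketchRegistry, :store_a)

# Kill the process and wait until it is really gone.
ref = Process.monitor(first_pid)
Process.exit(first_pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^first_pid, _reason} -> :ok
end

# Start a replacement under the same name (a supervisor would do this for us).
{:ok, second_pid} = Agent.start(fn -> %{} end, name: name)

# Callers keep using the name and never see that the PID has changed.
:ok = Agent.update(name, &Map.put(&1, :apple, 3))
apples = Agent.get(name, &Map.get(&1, :apple))
```

Here first_pid and second_pid are different, yet every call goes through the same name.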

After implementing our registry module our system design would look like this.

You can find out more about Registry here.

Now create a mix project by typing

mix new ex6_better_supervison_with_process_discovery

This command should create your project structure like this

Copy all the files from the previous exercise into the lib folder, and also create two new files called app.ex and registry.ex.

I will let you know why we need an app file in a while.

Show me some code 👨‍💻

Since we will now manage the manager, the database and the registry via a supervisor, there is some extra code that we need to write around them to make them compatible with the supervisor.

We will also introduce a new module, the registry module.

Here the registry module is a straightforward module that starts up a registry process under our supervisor. The keys are unique, as specified in the call to Registry.start_link/2. The via_tuple/1 function accepts a unique key, which can be supplied by any module that wants to register a process under the registry.
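In case the embedded code does not load for you, a registry module along these lines could look like the sketch below. The module name KeyValSketch.ProcessRegistry is my assumption, and I am using the keyword-options form Registry.start_link/1 here; treat it as an illustration rather than the exact file from the repository.

```elixir
defmodule KeyValSketch.ProcessRegistry do
  # Hypothetical module name, chosen for this sketch.

  # Start a registry with unique keys, registered under this module's name.
  def start_link do
    Registry.start_link(keys: :unique, name: __MODULE__)
  end

  # Build a via tuple that GenServer can use to register or find a process.
  def via_tuple(key) do
    {:via, Registry, {__MODULE__, key}}
  end

  # Lets a supervisor start this module like any other child.
  def child_spec(_arg) do
    Supervisor.child_spec(
      Registry,
      id: __MODULE__,
      start: {__MODULE__, :start_link, []}
    )
  end
end

# Usage: start the registry under a supervisor and build a name for a worker.
{:ok, _sup} = Supervisor.start_link([KeyValSketch.ProcessRegistry], strategy: :one_for_one)
worker_name = KeyValSketch.ProcessRegistry.via_tuple({:db_worker, 1})
```

The key can be any term, so a tuple such as {:db_worker, 1} is a handy way to keep names of different kinds of processes from clashing.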

You may be wondering how we are going to use it in our current processes. Let me show you.

As shown in this example, you can pass the value returned by via_tuple/1 as the name of your GenServer. All GenServer requires is that the function returns a value in the shape of {:via, some_module, some_arg}. Such a tuple is also called a via tuple.

If you provide a via tuple as the name option, GenServer will invoke a well-defined function from some_module to register the process. Likewise, you can pass a via tuple as the first argument to GenServer.cast and GenServer.call, and GenServer will discover the PID using some_module. In this sense, some_module acts like a custom third-party process registry, and the via tuple is the way of connecting such a registry with GenServer and similar OTP abstractions.
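To see both uses of a via tuple together (as the name option and as the target of call and cast), here is a small self-contained sketch. ViaSketch.Store and ViaSketch.Registry are hypothetical names, and the store keeps its data in a plain map, which is a simplification of the real server.

```elixir
defmodule ViaSketch.Store do
  # Hypothetical store, used only to show via tuples in action.
  use GenServer

  # The via tuple is passed as the :name option, so the process is
  # registered in the registry as soon as it starts.
  def start_link(store_name) do
    GenServer.start_link(__MODULE__, %{}, name: via_tuple(store_name))
  end

  # The same via tuple is the first argument to cast and call,
  # so callers never handle a PID.
  def put(store_name, key, value) do
    GenServer.cast(via_tuple(store_name), {:put, key, value})
  end

  def get(store_name, key) do
    GenServer.call(via_tuple(store_name), {:get, key})
  end

  defp via_tuple(store_name) do
    {:via, Registry, {ViaSketch.Registry, {__MODULE__, store_name}}}
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:put, key, value}, state), do: {:noreply, Map.put(state, key, value)}

  @impl true
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: ViaSketch.Registry)
{:ok, _pid} = ViaSketch.Store.start_link("fruits")
:ok = ViaSketch.Store.put("fruits", :apple, 3)
apples = ViaSketch.Store.get("fruits", :apple)
```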

Now, since we are promoting our Database from being a GenServer to being a Supervisor of GenServer processes, we have to remove the old code that used GenServer. We also have to provide a child_spec/1 function, since a Supervisor expects a child to expose a child spec in order to be able to start it.

Also, we are passing the worker_ids of the db_workers, which via_tuple picks up to form a unique name for each worker. Since each worker will be supervised separately, we need to provide a child spec for each one of them. The rest stays the same.
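A rough sketch of a database that supervises its own pool of workers is shown here. The names DbSketch.Database, DbSketch.Worker and DbSketch.Registry, the pool size of 3, and the Agent used as a stand-in worker are all assumptions made for this example.

```elixir
defmodule DbSketch.Worker do
  # Stand-in worker: an Agent registered under {:db_worker, worker_id}.
  def start_link(worker_id) do
    Agent.start_link(fn -> %{} end, name: via_tuple(worker_id))
  end

  def via_tuple(worker_id) do
    {:via, Registry, {DbSketch.Registry, {:db_worker, worker_id}}}
  end
end

defmodule DbSketch.Database do
  # Hypothetical pool size for the sketch.
  @pool_size 3

  # The database is now a supervisor with one child per worker id.
  def start_link do
    children = Enum.map(1..@pool_size, &worker_spec/1)
    Supervisor.start_link(children, strategy: :one_for_one)
  end

  # Needed so that a parent supervisor knows how to start the database.
  def child_spec(_arg) do
    %{id: __MODULE__, start: {__MODULE__, :start_link, []}, type: :supervisor}
  end

  # Every worker needs its own child spec with a unique id.
  defp worker_spec(worker_id) do
    %{id: {:db_worker, worker_id}, start: {DbSketch.Worker, :start_link, [worker_id]}}
  end
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: DbSketch.Registry)
{:ok, db} = DbSketch.Database.start_link()
```

Because the strategy is :one_for_one, a crash in worker 2 only restarts worker 2 and leaves the other workers alone.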

Now, the only things that have changed in the db_worker file are the 4 functions that include a via_tuple call; everything else is the same. As explained earlier, to register a worker’s PID we call via_tuple, which registers it against a specific name in the Registry. In the future we can then refer to the worker by that name instead of by the PID that was allotted to it, which may have changed if the worker has been restarted. So wherever we were previously using PIDs, we now use the via_tuple call, which resolves the name to a PID for us.
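To convince yourself that a name survives a restart, you can try a sketch like the following. RestartSketch.Worker and RestartSketch.Registry are hypothetical names, the :ping call exists only for this demo, and await_restart/2 is a small helper I added so that the script waits until the supervisor has brought the worker back.

```elixir
defmodule RestartSketch.Worker do
  # Hypothetical worker used to show that names survive restarts.
  use GenServer

  def start_link(worker_id) do
    GenServer.start_link(__MODULE__, worker_id, name: via_tuple(worker_id))
  end

  # Callers pass the worker id, never a PID.
  def ping(worker_id), do: GenServer.call(via_tuple(worker_id), :ping)

  def via_tuple(worker_id) do
    {:via, Registry, {RestartSketch.Registry, {:db_worker, worker_id}}}
  end

  # Demo helper: poll until a process other than old_pid holds the name.
  def await_restart(worker_id, old_pid) do
    case GenServer.whereis(via_tuple(worker_id)) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_restart(worker_id, old_pid)
    end
  end

  @impl true
  def init(worker_id), do: {:ok, worker_id}

  @impl true
  def handle_call(:ping, _from, worker_id) do
    {:reply, {:pong, worker_id, self()}, worker_id}
  end
end

children = [
  {Registry, keys: :unique, name: RestartSketch.Registry},
  Supervisor.child_spec({RestartSketch.Worker, 1}, id: {:db_worker, 1})
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

{:pong, 1, old_pid} = RestartSketch.Worker.ping(1)

# Kill the worker; the supervisor restarts it with a new PID.
Process.exit(old_pid, :kill)
new_pid = RestartSketch.Worker.await_restart(1, old_pid)

# The same call works again without the caller knowing about the new PID.
{:pong, 1, answered_by} = RestartSketch.Worker.ping(1)
```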

The manager file holds a big change because we have introduced the concept of a Dynamic Supervisor here. The Supervisor module was designed to handle mostly static children that are started in the given order when the supervisor starts. A Dynamic Supervisor starts with no children. Instead, children are started on-demand via start_child/2. When a dynamic supervisor terminates, all children are shut down at the same time, with no guarantee of ordering.

It’s quite simple: when the manager is started, we just boot up our supervisor process with no children, unlike in the database. When we receive a command to create a store, we call the start_child function to dynamically spin up a GenServer and put it under the supervision of the manager.
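A minimal sketch of such a manager is given here. ManagerSketch.Manager, ManagerSketch.Server, ManagerSketch.Registry and the create_store/1 function name are assumptions for illustration; DynamicSupervisor.start_child/2 and DynamicSupervisor.init/1 are the standard library calls.

```elixir
defmodule ManagerSketch.Server do
  # Stand-in for the store server: a GenServer registered by store name.
  use GenServer

  def start_link(store_name) do
    GenServer.start_link(__MODULE__, %{}, name: via_tuple(store_name))
  end

  def via_tuple(store_name) do
    {:via, Registry, {ManagerSketch.Registry, {:store, store_name}}}
  end

  @impl true
  def init(state), do: {:ok, state}
end

defmodule ManagerSketch.Manager do
  use DynamicSupervisor

  # The manager starts with no children at all.
  def start_link(_arg) do
    DynamicSupervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Children are started on demand; asking twice for the same store
  # returns the server that is already running.
  def create_store(store_name) do
    case DynamicSupervisor.start_child(__MODULE__, {ManagerSketch.Server, store_name}) do
      {:error, {:already_started, pid}} -> {:ok, pid}
      other -> other
    end
  end

  @impl true
  def init(:ok), do: DynamicSupervisor.init(strategy: :one_for_one)
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: ManagerSketch.Registry)
{:ok, _manager} = ManagerSketch.Manager.start_link([])

{:ok, store_pid} = ManagerSketch.Manager.create_store("fruits")
{:ok, same_pid} = ManagerSketch.Manager.create_store("fruits")
```

The :already_started case works because the server registers itself through the via tuple, so a second start attempt for the same store name fails and hands back the existing PID.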

As in db_worker.ex, we will now replace all the functions in server.ex that were using PIDs with via_tuple calls, which resolve the PID for us when we provide the store name. Everything else in server.ex stays the same as before.

Finally, we will start our system by specifying all the modules in the list passed to the init/2 function.
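Put together, a top-level supervisor could be sketched like this. I am using the SystemSketch namespace and built-in stand-ins (an empty Supervisor for the database and a bare DynamicSupervisor for the manager) so that the example runs on its own; the real KeyVal.System would list our own modules instead.

```elixir
defmodule SystemSketch.System do
  # Hypothetical top-level supervisor with three independent children.
  use Supervisor

  def start_link(_arg \\ []) do
    Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    children = [
      # The registry comes first because the other children register in it.
      {Registry, keys: :unique, name: SystemSketch.Registry},
      # Stand-in for the database supervisor (no workers in this sketch).
      %{
        id: SystemSketch.Database,
        start: {Supervisor, :start_link, [[], [strategy: :one_for_one, name: SystemSketch.Database]]},
        type: :supervisor
      },
      # Stand-in for the manager.
      {DynamicSupervisor, name: SystemSketch.Manager, strategy: :one_for_one}
    ]

    # :one_for_one restarts only the child that crashed.
    Supervisor.init(children, strategy: :one_for_one)
  end
end

{:ok, _system} = SystemSketch.System.start_link()

manager_before = Process.whereis(SystemSketch.Manager)
database_before = Process.whereis(SystemSketch.Database)

# Kill the database and wait until it is down.
ref = Process.monitor(database_before)
Process.exit(database_before, :kill)

receive do
  {:DOWN, ^ref, :process, ^database_before, _reason} -> :ok
end

# The manager was not touched by the database crash.
manager_after = Process.whereis(SystemSketch.Manager)
```

This is exactly the isolation we were after: manager_before and manager_after are the same PID even though the database was killed.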

The final run 📟

Before starting the final demo, let me tell you about an interesting feature of Elixir: you can observe your application in a GUI.

You can open the GUI by calling :observer.start in your Elixir session. But to see the KeyVal store in it, you need to register your project as an Elixir application, which you can do in these simple steps.

Remember that I asked you to create a file called app.ex in the previous section? Now you need to make app.ex look like this.
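The module named in the mod: option of mix.exs has to implement the Application behaviour, which means defining start/2 and returning the result of starting the top-level supervisor. Here is a minimal sketch under my own assumptions: the AppSketch namespace and the single Registry child are placeholders, and in the real app.ex you would start the system supervisor from the previous section instead.

```elixir
defmodule AppSketch.Contractor do
  # Hypothetical application callback module.
  use Application

  # Called when the application boots, for example by `iex -S mix`.
  @impl true
  def start(_type, _args) do
    children = [
      # Placeholder child; the real project would list its own modules here.
      {Registry, keys: :unique, name: AppSketch.Registry}
    ]

    # start/2 must return {:ok, pid} of the top-level supervisor.
    Supervisor.start_link(children, strategy: :one_for_one, name: AppSketch.System)
  end
end

# Calling the callback by hand, just to show what it returns.
{:ok, top_pid} = AppSketch.Contractor.start(:normal, [])
```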

Also, go to the mix.exs file and change the application function to look like this

def application do
  [
    extra_applications: [:logger],
    mod: {KeyVal.Contractor, []}
  ]
end

All these changes mean that whenever you start your Elixir session using mix, your application is started automatically, because you have told mix which module to call to start your application.

To start the whole system, we first start an Elixir session by typing

iex -S mix

You will see that the Manager, Database and the Workers have been automatically started from the application specification. Now type in the :observer.start command to see the nice GUI.

Create a store and provide any name to it.

Go to the Applications tab in the GUI and then select the entry with the name of your project folder. You will see something like the image.

This is how your app looks. You can easily make out which processes are supervisors and which are supervised. Now it’s time to test the app you made. Go ahead and kill the server store that you created by right-clicking on its PID, as shown in the image.

As soon as you kill the process, the manager supervisor will start a new server, and you will get this message in your command prompt.

And in the GUI you will see another process spawned to take the place of your killed server. You can see in the image that the new server has a different PID.

So with this, I conclude this post. It is the second-to-last post of this series. In the next post, we will learn how to expose our app to the web.

I hope this post has helped you get a little bit better understanding of the whole process.

The complete source code of all the parts is here.



Written by Arpit Dubey