Call Me… Maybe?

Terminating a relationship is never easy.

This story is about something that’s perfectly documented, but still a bit surprising for newbies and experienced Erlangers alike. The question is: when will gen_server's Module:terminate/2 be evaluated?

A really catchy song, I must say

The Server

When you learn how to write gen_servers in Erlang, you eventually learn about the (now optional) callback terminate/2. That callback is described as the opposite of init/1 in the sense that, while init/1 is used to initialize your server and set up everything that it needs to run, terminate/2 is called when the server dies and is therefore used to clean up and tear down whatever was started or created in init/1.

To give you an example, let’s create a very simple server…
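A minimal sketch of such a server might look like this. The module and function names match the console session that follows; the use of error_logger:info_msg/2 for logging and the empty map used as state are assumptions.

```erlang
-module(maybe_server).
-behaviour(gen_server).

%% API
-export([start_link/0, stop/0]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

stop() ->
    gen_server:stop(?MODULE).

%% Log on startup (the logging call is an assumption).
init([]) ->
    error_logger:info_msg("Server ~p starting.~n", [self()]),
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Log again when the server goes down, including the reason.
terminate(Reason, _State) ->
    error_logger:info_msg("Server ~p terminating with reason ~p~n",
                          [self(), Reason]),
    ok.
```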

As you can see there, the server logs something on init/1 and then logs something else when terminate/2 is called. Let’s see it in action…

1> maybe_server:start_link().
=INFO REPORT==== 25-Sep-2017::23:08:00 ===
Server <0.1986.0> starting.
2> maybe_server:stop().
=INFO REPORT==== 25-Sep-2017::23:08:05 ===
Server <0.1986.0> terminating with reason normal

Perfect. So far, everything works as expected.

The Supervisor

But gen_servers are usually not alone, they tend to live in supervision trees. So, let’s add a supervisor!
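A supervisor for it could be sketched as follows. The names start/0 and stop/0 come from the console session further down; the one_for_one strategy and the map-based child spec are assumptions. The child spec leaves shutdown unset, so the worker gets the default of 5000 milliseconds.

```erlang
-module(maybe_sup).
-behaviour(supervisor).

%% API
-export([start/0, stop/0]).
%% supervisor callback
-export([init/1]).

start() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Supervisors are implemented as gen_servers, so gen_server:stop/1 works.
stop() ->
    gen_server:stop(?MODULE).

init([]) ->
    SupFlags = #{strategy => one_for_one},
    %% No shutdown key: a worker defaults to a 5000 ms shutdown timeout.
    Child = #{id => maybe_server,
              start => {maybe_server, start_link, []}},
    {ok, {SupFlags, [Child]}}.
```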

I’m using a small trick here: since supervisor:stop/1 does not exist, I’m using gen_server:stop/1 to stop the supervisor, because I know supervisors are implemented as gen_servers. Trust me: it doesn’t affect anything I’m about to show you… or don’t trust me and stop your supervisor in your own way ;)

In any case, let’s see what happens when we run this…

3> maybe_sup:start().
=INFO REPORT==== 25-Sep-2017::23:26:00 ===
Server <0.3453.0> starting.
4> maybe_sup:stop().
5> is_process_alive(pid(0,3452,0)).

Where is our terminate message? As you can see, terminate/2 is not evaluated anymore.

Where did it go?

What’s going on here?

As I stated in the intro, this time the documentation is pretty clear about what’s happening:

If the gen_server process is part of a supervision tree and is ordered by its supervisor to terminate, this function is called with Reason=shutdown if the following conditions apply:
· The gen_server process has been set to trap exit signals.
· The shutdown strategy as defined in the child specification of the supervisor is an integer time-out value, not brutal_kill.

Notice the first condition: if your server is in a supervision tree, for terminate/2 to be evaluated, it needs to be trapping exits.

Let’s try that for ourselves…
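A sketch of the server set to trap exit signals could look like the following. Everything except the process_flag/2 call and the names used in the console is an assumption (the logging calls and the empty map as state in particular).

```erlang
-module(maybe_server).
-behaviour(gen_server).

-export([start_link/0, stop/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

stop() ->
    gen_server:stop(?MODULE).

init([]) ->
    %% Turn incoming exit signals into messages, so that gen_server can
    %% call terminate/2 when the supervisor asks this process to shut down.
    process_flag(trap_exit, true),
    error_logger:info_msg("Server ~p starting.~n", [self()]),
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

terminate(Reason, _State) ->
    error_logger:info_msg("Server ~p terminating with reason ~p~n",
                          [self(), Reason]),
    ok.
```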

I added process_flag(trap_exit, true) to init/1. If we recompile and start/stop the supervisor in the console now…

6> maybe_sup:start().
=INFO REPORT==== 25-Sep-2017::23:41:49 ===
Server <0.3711.0> starting.
7> maybe_sup:stop().
=INFO REPORT==== 25-Sep-2017::23:41:51 ===
Server <0.3711.0> terminating with reason shutdown

There you have it. But now the question is why? Why do we need to trap exits for terminate/2 to be evaluated?

And the answer to that question is a bit related to what happened with our old friend The Unstoppable Exception: Exit signals travel faster than function evaluations. When a supervisor dies, the mechanism it uses to terminate its children is based on good old exit signals. The dying supervisor uses exit/2 to send exit signals to all its children (or to just kill them, if brutal_kill is their shutdown strategy).

When processes are not trapping exit signals (and gen_servers are not, by default), as soon as they receive one with a reason other than normal, they die (i.e. there is no way the process will evaluate terminate/2 or any other function before it’s gone for good). That’s why the only way to let a gen_server evaluate terminate/2 is to trap exit signals and let it deal with them appropriately.
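The same behaviour can be seen with plain processes, no gen_server involved. This is only an illustrative sketch: the module exit_demo and its function names are hypothetical. It spawns one process that traps exits and one that doesn't, sends both a shutdown signal with exit/2, and reports what each one managed to do.

```erlang
-module(exit_demo).
-export([run/0]).

run() ->
    Parent = self(),
    %% This process does not trap exits: the signal kills it on the spot.
    {Plain, MRef} = spawn_monitor(fun() -> wait_for_exit(Parent) end),
    %% This one traps exits: the signal arrives as an {'EXIT', From, Reason} message.
    Trapper = spawn(fun() ->
                        process_flag(trap_exit, true),
                        Parent ! {ready, self()},
                        wait_for_exit(Parent)
                    end),
    %% Make sure the flag is set before any signal is sent.
    receive {ready, Trapper} -> ok end,
    exit(Plain, shutdown),
    exit(Trapper, shutdown),
    PlainResult = receive
                      {'DOWN', MRef, process, Plain, Reason} -> {died, Reason}
                  end,
    TrapperResult = receive
                        {cleaned_up, Trapper, Trapped} -> {cleaned_up, Trapped}
                    end,
    {PlainResult, TrapperResult}.

%% Stand-in for terminate/2: only reachable if the exit signal became a message.
wait_for_exit(Parent) ->
    receive
        {'EXIT', _From, Reason} ->
            Parent ! {cleaned_up, self(), Reason}
    end.
```

Calling exit_demo:run() returns {{died,shutdown},{cleaned_up,shutdown}}: the first process never got to run its clean-up code, while the second one did.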

A Warning Note

But be careful: if a process is trapping exit signals, those are converted to messages and added to the process message queue. That means, following gen_server logic, the message will only be read after all the messages ahead of it in the queue have been processed. In other words, the server won’t terminate instantaneously. If your supervisor has a shutdown timeout defined for your server that is shorter than the time the server needs to process all the messages queued before the one that corresponds to the exit signal, then the server will be brutally killed and therefore terminate/2 will not be evaluated.

Want to see it for yourself?
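Here is a sketch of a server that can be kept busy on demand. The function name sleep/0 and the cast of the atom something come from the console session that follows; the 10000 ms duration, the logging calls and the state are assumptions.

```erlang
-module(maybe_server).
-behaviour(gen_server).

-export([start_link/0, stop/0, sleep/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

stop() ->
    gen_server:stop(?MODULE).

%% Asks the server to block for a while (returns immediately, it's a cast).
sleep() ->
    gen_server:cast(?MODULE, something).

init([]) ->
    process_flag(trap_exit, true),
    error_logger:info_msg("Server ~p starting.~n", [self()]),
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

%% Assumed duration: 10 seconds, twice the default 5000 ms shutdown timeout.
handle_cast(something, State) ->
    timer:sleep(10000),
    {noreply, State};
handle_cast(_Msg, State) ->
    {noreply, State}.

terminate(Reason, _State) ->
    error_logger:info_msg("Server ~p terminating with reason ~p~n",
                          [self(), Reason]),
    ok.
```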

Notice how I added a long-running function to handle_cast/2 so that I can get the server to be busy for longer than the default shutdown timeout of 5000 milliseconds that our supervisor is using. Now let’s see what happens…

9> maybe_sup:start().
=INFO REPORT==== 26-Sep-2017::00:04:13 ===
Server <0.3972.0> starting.
10> maybe_server:sleep().
*DBG* maybe_server got cast something
11> maybe_sup:stop().
12> is_process_alive(pid(0,3972,0)).

terminate/2 is not evaluated.

So… is terminate/2 worth implementing at all, now that it’s an optional callback? What do you think? Let me know in the comments below.

OffTopic Shameless Plug

On October 14th I’ll be giving a talk about the stuff I write on this blog (but in Spanish :P) at EmprenDevs. So, if you happen to be around Rosario that day and you want to listen to me and other Argentinian devs/entrepreneurs, register yourself at the website or follow the conference team on Twitter or Facebook for more information.