Call Me… Maybe?
Terminating a relationship is never easy.
This story is about something that’s perfectly documented, but still a bit surprising for newbies and experienced Erlangers alike. The question is: when will gen_server’s Module:terminate/2 be evaluated?
When you learn how to write gen_servers in Erlang, you eventually learn about the (now optional) callback terminate/2. That callback is described as the opposite of init/1 in the sense that, while init/1 is used to initialize your server and set up everything it needs to run, terminate/2 is called when the server dies and is therefore used to clean up and tear down whatever was started or created in init/1.
To give you an example, let’s create a very simple server…
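The server code itself didn’t survive in this copy of the post, so here is a minimal sketch of what such a server could look like. The module name maybe_server comes from the *DBG* line further down. Everything else is an assumption chosen to reproduce the log lines shown below: the noargs init argument, the nostate state, and logging through error_logger:info_msg/2.

```erlang
%% maybe_server.erl: a minimal gen_server that logs when it starts and
%% when terminate/2 runs. Hypothetical sketch; only the module name
%% comes from the post.
-module(maybe_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

%% Registered locally under the module name.
start_link() ->
  gen_server:start_link({local, ?MODULE}, ?MODULE, noargs, []).

init(noargs) ->
  error_logger:info_msg("Server ~p starting.~n", [self()]),
  {ok, nostate}.

handle_call(_Request, _From, State) ->
  {reply, ok, State}.

handle_cast(_Msg, State) ->
  {noreply, State}.

%% gen_server ignores the return value of terminate/2.
terminate(Reason, _State) ->
  error_logger:info_msg("Server ~p terminating with reason ~p~n",
                        [self(), Reason]).
```

To try it in a shell, run {ok, Pid} = maybe_server:start_link(). and a few seconds later gen_server:stop(Pid). This should log a starting line and then a terminating line with reason normal. On OTP 21 and later, the default logger level hides info reports, so calling logger:set_primary_config(level, info) first should make them visible; the report layout will also differ from the one shown here.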
The server logs something in init/1 and then logs something else when terminate/2 is called. Let’s see it in action…
=INFO REPORT==== 25-Sep-2017::23:08:00 ===
Server <0.1986.0> starting.
=INFO REPORT==== 25-Sep-2017::23:08:05 ===
Server <0.1986.0> terminating with reason normal
Perfect. So far, everything works as expected.
But gen_servers are usually not alone; they tend to live in supervision trees. So, let’s add a supervisor!
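The supervisor code is missing here as well. A minimal one_for_one supervisor with a single maybe_server child could look like the following. The module name maybe_sup and the restart intensity/period values are assumptions. The shutdown value of 5000 is the worker default mentioned later in the post. It is written out explicitly because it is exactly the setting the documentation’s conditions talk about.

```erlang
%% maybe_sup.erl: a one_for_one supervisor with a single maybe_server child.
%% Hypothetical sketch: the module name and restart settings are assumptions.
-module(maybe_sup).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1]).

start_link() ->
  supervisor:start_link({local, ?MODULE}, ?MODULE, noargs).

init(noargs) ->
  SupFlags = #{strategy => one_for_one, intensity => 1, period => 5},
  Child = #{id => maybe_server,
            start => {maybe_server, start_link, []},
            %% Integer shutdown: when stopping, the supervisor sends
            %% exit(Pid, shutdown) and waits up to 5000 ms before killing.
            shutdown => 5000,
            type => worker},
  {ok, {SupFlags, [Child]}}.
```

Starting it with {ok, Sup} = maybe_sup:start_link(). and stopping it with gen_server:stop(Sup). should reproduce the experiment, provided a maybe_server module exporting start_link/0 is compiled as well.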
I’m using a small trick here: since supervisor:stop/1 does not exist, I’m using gen_server:stop/1 to stop the supervisor, because I know supervisors are implemented as gen_servers. Trust me: it doesn’t affect anything I’m about to show you… or don’t trust me and stop your supervisor in your own way ;)
In any case, let’s see what happens when we run this…
=INFO REPORT==== 25-Sep-2017::23:26:00 ===
Server <0.3453.0> starting.
Where is our terminate message? As you can see, terminate/2 is not evaluated anymore.
What’s going on here?
As I stated in the intro, this time the documentation is pretty clear about what’s happening:
If the gen_server process is part of a supervision tree and is ordered by its supervisor to terminate, this function is called with Reason=shutdown if the following conditions apply:
· The gen_server process has been set to trap exit signals.
· The shutdown strategy as defined in the child specification of the supervisor is an integer time-out value, not brutal_kill.
Notice the first condition: if your server is in a supervision tree, it needs to be trapping exits for terminate/2 to be evaluated.
Let’s try that for ourselves…
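Below is a minimal sketch of the server with that change. It assumes a gen_server named maybe_server (the name comes from the debug output later in the post) that logs through error_logger. Apart from the module name, the only part taken from the post is the process_flag/2 call at the top of init/1.

```erlang
%% maybe_server.erl with exit trapping enabled. Hypothetical sketch; only
%% the module name and the process_flag/2 call come from the post.
-module(maybe_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link() ->
  gen_server:start_link({local, ?MODULE}, ?MODULE, noargs, []).

init(noargs) ->
  %% Exit signals now arrive as {'EXIT', From, Reason} messages; when
  %% From is the parent (the supervisor), gen_server reacts by calling
  %% terminate/2 with that Reason.
  process_flag(trap_exit, true),
  error_logger:info_msg("Server ~p starting.~n", [self()]),
  {ok, nostate}.

handle_call(_Request, _From, State) ->
  {reply, ok, State}.

handle_cast(_Msg, State) ->
  {noreply, State}.

terminate(Reason, _State) ->
  error_logger:info_msg("Server ~p terminating with reason ~p~n",
                        [self(), Reason]).
```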
I added process_flag(trap_exit, true) to init/1. If we recompile and start/stop the supervisor in the console now…
=INFO REPORT==== 25-Sep-2017::23:41:49 ===
Server <0.3711.0> starting.
=INFO REPORT==== 25-Sep-2017::23:41:51 ===
Server <0.3711.0> terminating with reason shutdown
There you have it. But now the question is: why? Why do we need to trap exits for terminate/2 to be evaluated?
And the answer to that question is a bit related to what happened with our old friend The Unstoppable Exception: exit signals travel faster than function evaluations. When a supervisor dies, the mechanism it uses to terminate its children is based on good old exit signals. The dying supervisor uses exit/2 to send an exit signal with reason shutdown to each of its children (or reason kill, if brutal_kill is the child’s shutdown strategy).
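To make that concrete, here is a simplified model of how a supervisor stops a single child. It is not the actual OTP supervisor code, which also deals with links and several corner cases. It does follow the same idea:

· Monitor the child.
· Send exit(Pid, kill) for brutal_kill, or exit(Pid, shutdown) for an integer timeout.
· Fall back to exit(Pid, kill) if the child isn’t gone when the timeout expires.

The module name shutdown_sketch is hypothetical.

```erlang
%% shutdown_sketch.erl: a simplified model of how a supervisor stops one
%% child. Hypothetical; not the real OTP supervisor implementation.
-module(shutdown_sketch).
-export([shutdown/2]).

%% brutal_kill: no chance to clean up, the child is killed outright.
shutdown(Pid, brutal_kill) ->
  Ref = erlang:monitor(process, Pid),
  exit(Pid, kill),
  receive {'DOWN', Ref, process, Pid, Reason} -> {killed, Reason} end;
%% Integer timeout: send reason shutdown, then kill the child if it is
%% still alive after Timeout milliseconds.
shutdown(Pid, Timeout) when is_integer(Timeout), Timeout >= 0 ->
  Ref = erlang:monitor(process, Pid),
  exit(Pid, shutdown),
  receive
    {'DOWN', Ref, process, Pid, Reason} -> {stopped, Reason}
  after Timeout ->
    exit(Pid, kill),
    receive {'DOWN', Ref, process, Pid, KillReason} -> {killed, KillReason} end
  end.
```

A child that doesn’t trap exits dies the moment the shutdown signal arrives. A child that traps exits but is stuck in a long computation ends up in the after Timeout branch instead.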
When processes are not trapping exit signals (and gen_servers are not, by default), as soon as they receive one with a reason other than normal, they die (i.e. there is no way the process will evaluate terminate/2 or any other function before it’s gone for good). That’s why the only way to allow a gen_server to evaluate terminate/2 is to trap exit signals and let it deal with them appropriately.
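The difference is easy to observe without any gen_server involved. The hypothetical trap_demo module below spawns a process that either traps exits or doesn’t, and sends it exit(Pid, shutdown). It then reports whether the process died or received the signal as an ordinary {'EXIT', From, Reason} message.

```erlang
%% trap_demo.erl: what exit(Pid, shutdown) does to a process that traps
%% exits versus one that doesn't. Hypothetical demo module.
-module(trap_demo).
-export([run/1]).

run(TrapExit) when is_boolean(TrapExit) ->
  Parent = self(),
  Pid = spawn(fun() ->
                process_flag(trap_exit, TrapExit),
                Parent ! {ready, self()},
                receive
                  %% Only reachable when trapping: the signal became a message.
                  {'EXIT', _From, SignalReason} ->
                    Parent ! {as_message, self(), SignalReason}
                end
              end),
  receive {ready, Pid} -> ok end,
  Ref = erlang:monitor(process, Pid),
  exit(Pid, shutdown),
  Result =
    receive
      {as_message, Pid, Reason} -> {handled_as_message, Reason};
      {'DOWN', Ref, process, Pid, Reason} -> {died, Reason}
    end,
  %% Discard any 'DOWN' message still pending (the trapping process exits
  %% normally right after replying).
  erlang:demonitor(Ref, [flush]),
  Result.
```

Calling trap_demo:run(false) yields {died, shutdown}: the process never got to run any code of its own. Calling trap_demo:run(true) yields {handled_as_message, shutdown}, because the process decided for itself what to do with the signal.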
A Warning Note
But be careful: if a process is trapping exit signals, those signals are converted to messages and added to the process message queue. That means, following gen_server logic, the message will only be read after all the messages ahead of it have been processed. In other words, the server won’t terminate instantaneously. Suppose your supervisor’s shutdown timeout for your server is shorter than the time the server needs to process every message queued before the exit message. In that case, the server will be brutally killed and terminate/2 will not be evaluated.
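The ordering problem can be reproduced with a bare receive loop that, like a gen_server, handles messages oldest first. In the hypothetical queue_demo module below, the process traps exits and is first sent a slow {work, 200} job, then an exit signal. The exit shows up as a message behind the job and is handled only after the 200 ms of work are done. This is a sketch; gen_server’s real loop is more involved.

```erlang
%% queue_demo.erl: a trapping process handles its mailbox in arrival order,
%% so an exit signal converted to a message waits behind earlier work.
%% Hypothetical demo module.
-module(queue_demo).
-export([run/0]).

run() ->
  Parent = self(),
  Pid = spawn(fun() ->
                process_flag(trap_exit, true),
                Parent ! {ready, self()},
                loop(Parent)
              end),
  receive {ready, Pid} -> ok end,
  Pid ! {work, 200},     % a slow job: about 200 ms of "processing"
  exit(Pid, shutdown),   % queued behind it as {'EXIT', Parent, shutdown}
  collect(Pid, []).

%% Both patterns match, so receive always takes the oldest message first.
loop(Parent) ->
  receive
    {work, Ms} ->
      timer:sleep(Ms),
      Parent ! {handled, self(), work},
      loop(Parent);
    {'EXIT', _From, Reason} ->
      Parent ! {handled, self(), {exit, Reason}}
  end.

%% Gathers what the process handled, in order, until it handles the exit.
collect(Pid, Acc) ->
  receive
    {handled, Pid, {exit, _} = Last} -> lists:reverse([Last | Acc]);
    {handled, Pid, Event} -> collect(Pid, [Event | Acc])
  end.
```

queue_demo:run() returns [work, {exit, shutdown}]. Suppose a supervisor’s shutdown timeout for such a process were shorter than the 200 ms job. The process would then be killed before ever reaching the exit message, and in a gen_server that means terminate/2 never runs.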
Want to see it for yourself?
I added a long-running function to handle_cast/2 so that the server stays busy for longer than the default shutdown of 5000 milliseconds that our supervisor is using. Now let’s see what happens…
=INFO REPORT==== 26-Sep-2017::00:04:13 ===
Server <0.3972.0> starting.
*DBG* maybe_server got cast something
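The server was still busy with the cast when the 5000 ms shutdown timeout expired, so it was brutally killed and terminate/2 was not evaluated.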
So, is terminate/2 worth implementing at all, now that it’s an optional callback? What do you think? Let me know in the comments below.
Off-Topic Shameless Plug
On October 14th I’ll be giving a talk about the stuff I write on this blog (but in Spanish :P) at EmprenDevs. So, if you happen to be around Rosario that day and you want to listen to me and other Argentinian devs/entrepreneurs, register yourself at the website or follow the conference team on Twitter or Facebook for more information.