Implementing Connection Draining for Phoenix (or any library that uses Ranch!)

This is a problem that many of us are familiar with. We would like to deploy our programs, and deploying a new version requires turning off the old version (unless you are using hot code swapping, but that’s another kettle of fish). So you send a SIGTERM to your server, and, since everything works well, it stops accepting new requests and waits for some period of time to give the in-flight requests a chance to finish. (Right?)

But this isn’t what happens. As you’ve probably already discovered, your server just stops, and all requests are terminated immediately. Not the soft landing we were hoping for. As the title of this article has already promised you, there is a solution.


Before we get into that solution, let’s just do a quick dive into the technologies upon which Phoenix is built. Phoenix, as you may know, is built on Plug. Plug has adapters for other libraries, but in the most common configuration, it’s built on Cowboy, which is an HTTP server. So both Plug and Phoenix are, for our purposes, sugar on top of Cowboy.

Cowboy itself relies on a library called Ranch (I think we can all appreciate the pun) to manage a socket acceptor pool. Ranch is the last stop before we get to the OS. Or, put another way, Ranch is the first stop a request has to make on its way into our web server.

If we could tell Ranch to stop accepting new connections, and then wait until the number of open connections drops to 0, we would have connection draining.

Now on to the solution! First, we will need a couple of functions from Ranch:

:ranch.suspend_listener/1 will tell Ranch to stop accepting new connections. Connections that are already open are left alone.

:ranch.wait_for_connections/3 will block until the number of open connections satisfies a comparison we specify (for example, :== and 0 mean "until there are none left").

If we combine these functions, we get this:

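```elixir
# Stop accepting new connections; connections that are already open keep running.
:ranch.suspend_listener(ranch_ref)
# Block until the number of open connections is exactly 0.
:ranch.wait_for_connections(ranch_ref, :==, 0)
```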

Now we have the kernel of a solution. This snippet tells Ranch to stop accepting new connections and then waits until there are no open connections left. At that point, it is safe to stop the server.

Next problem: how to get this code to execute when our program is shutting down. For this, we can use OTP to get the behaviour we want.

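First, we need to learn a bit about how OTP’s Supervisor handles startup and shutdown. A Supervisor starts its children in the order in which they are defined, and it shuts them down in reverse order! So if we want to run some code during shutdown, and delay the shutdown of our Phoenix endpoint until that code has finished, we can place a child process after our Phoenix endpoint and put the code in its terminate/2 callback. One catch: when a supervisor shuts a GenServer down, terminate/2 is only called if that process traps exits, which is why our process will call Process.flag(:trap_exit, true) in init/1.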
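The ordering claim is easy to check without Phoenix or Ranch. The following is a minimal, self-contained sketch; the modules OrderDemo and OrderDemo.Worker and the child ids :endpoint and :drainer are hypothetical stand-ins for the endpoint and the drainer. Each worker traps exits and sends a message from terminate/2, so the order of the messages shows the order of shutdown.

```elixir
defmodule OrderDemo.Worker do
  use GenServer

  # Hypothetical worker: reports to `report_to` when terminate/2 runs.
  def start_link({name, report_to}) do
    GenServer.start_link(__MODULE__, {name, report_to})
  end

  @impl true
  def init({name, report_to}) do
    # Trap exits so terminate/2 runs when the supervisor shuts us down.
    Process.flag(:trap_exit, true)
    {:ok, {name, report_to}}
  end

  @impl true
  def terminate(_reason, {name, report_to}) do
    send(report_to, {:terminated, name})
    :ok
  end
end

defmodule OrderDemo do
  # Starts :endpoint first and :drainer second, stops the supervisor,
  # and returns the names in the order their terminate/2 callbacks ran.
  def run do
    me = self()

    children = [
      Supervisor.child_spec({OrderDemo.Worker, {:endpoint, me}}, id: :endpoint),
      Supervisor.child_spec({OrderDemo.Worker, {:drainer, me}}, id: :drainer)
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
    :ok = Supervisor.stop(sup)

    for _ <- 1..2 do
      receive do
        {:terminated, name} -> name
      after
        1_000 -> :timeout
      end
    end
  end
end

IO.inspect(OrderDemo.run())  # prints [:drainer, :endpoint]
```

The child listed last is shut down first, and the supervisor waits for it to exit before moving on to the one listed before it.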

Second, we probably don’t want to wait forever for our remaining HTTP requests to finish. The shutdown value in a child spec tells the supervisor how many milliseconds to wait for the process to exit after asking it to stop; once that time is up, the supervisor kills the process outright. So by setting shutdown in the child spec of our process, we control how long we will wait for our remaining requests to finish.
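To see the timeout in action, here is a small self-contained sketch. ShutdownDemo and ShutdownDemo.SlowWorker are hypothetical names, and Process.sleep/1 stands in for requests that are still draining. The worker's terminate/2 sleeps for work_ms, and the function returns how long Supervisor.stop/1 actually blocked.

```elixir
defmodule ShutdownDemo.SlowWorker do
  use GenServer

  def start_link(work_ms), do: GenServer.start_link(__MODULE__, work_ms)

  @impl true
  def init(work_ms) do
    Process.flag(:trap_exit, true)
    {:ok, work_ms}
  end

  # Simulates cleanup work (e.g. draining) that takes work_ms milliseconds.
  @impl true
  def terminate(_reason, work_ms) do
    Process.sleep(work_ms)
    :ok
  end
end

defmodule ShutdownDemo do
  # Returns how many milliseconds Supervisor.stop/1 blocked.
  def run(shutdown_ms, work_ms) do
    child = %{
      id: ShutdownDemo.SlowWorker,
      start: {ShutdownDemo.SlowWorker, :start_link, [work_ms]},
      # Wait at most this long for the child to exit, then kill it.
      shutdown: shutdown_ms
    }

    {:ok, sup} = Supervisor.start_link([child], strategy: :one_for_one)
    {elapsed_us, :ok} = :timer.tc(fn -> Supervisor.stop(sup) end)
    div(elapsed_us, 1_000)
  end
end

# Work takes 5 s but the budget is 200 ms: stop returns after roughly 200 ms.
IO.inspect(ShutdownDemo.run(200, 5_000))
# Work takes 50 ms with a 1 s budget: stop returns after roughly 50 ms.
IO.inspect(ShutdownDemo.run(1_000, 50))
```

The shutdown value is an upper bound, not a fixed delay: if the connections drain quickly, shutdown proceeds as soon as terminate/2 returns.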

Let’s put this all together:

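```elixir
defmodule RanchConnectionDrainer do
  use GenServer

  def start_link(ranch_ref) do
    GenServer.start_link(__MODULE__, ranch_ref)
  end

  @impl true
  def init(ranch_ref) do
    # Trap exits so that terminate/2 runs when the supervisor shuts us down.
    Process.flag(:trap_exit, true)
    {:ok, ranch_ref}
  end

  @impl true
  def terminate(_reason, ranch_ref) do
    # Stop accepting new connections, then block until the open ones close.
    :ok = :ranch.suspend_listener(ranch_ref)
    :ok = :ranch.wait_for_connections(ranch_ref, :==, 0)
  end
end
```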
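The Process.flag(:trap_exit, true) line in init/1 is what makes this work. As a hedged illustration (TrapExitDemo and TrapExitDemo.Worker are hypothetical names, and no Ranch is involved), this sketch starts the same kind of worker with and without trapping exits and reports whether terminate/2 ran when the supervisor stopped it.

```elixir
defmodule TrapExitDemo.Worker do
  use GenServer

  def start_link({trap?, report_to}) do
    GenServer.start_link(__MODULE__, {trap?, report_to})
  end

  @impl true
  def init({trap?, report_to}) do
    Process.flag(:trap_exit, trap?)
    {:ok, report_to}
  end

  @impl true
  def terminate(_reason, report_to) do
    send(report_to, :terminate_called)
    :ok
  end
end

defmodule TrapExitDemo do
  # Returns true if terminate/2 ran when the supervisor shut the worker down.
  def terminate_called?(trap?) do
    children = [{TrapExitDemo.Worker, {trap?, self()}}]
    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
    :ok = Supervisor.stop(sup)

    receive do
      :terminate_called -> true
    after
      100 -> false
    end
  end
end

IO.inspect(TrapExitDemo.terminate_called?(true))   # prints true
IO.inspect(TrapExitDemo.terminate_called?(false))  # prints false
```

Without trapping exits, the :shutdown exit signal from the supervisor kills the process immediately, so the draining code in terminate/2 would silently never run.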

We can add this to our application supervisor right underneath our Phoenix endpoint:

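```elixir
children = [
  MyPhoenix.Endpoint,
  # Started after the endpoint, so it is shut down before the endpoint.
  %{
    id: RanchConnectionDrainer,
    start: {RanchConnectionDrainer, :start_link, [MyPhoenix.Endpoint.HTTP]},
    # Give open connections up to 10 seconds to finish.
    shutdown: 10_000
  }
]

# `opts` are your usual supervisor options, e.g. strategy: :one_for_one.
Supervisor.init(children, opts)
```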

Nobody likes writing out child specs like that in their supervisors, so let’s add RanchConnectionDrainer.child_spec/1 and have it make sure we actually specified a shutdown parameter (and a ranch_ref).

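```elixir
defmodule RanchConnectionDrainer do
  use GenServer

  # start_link/1, init/1 and terminate/2 as above

  def child_spec(options) when is_list(options) do
    # Keyword.fetch!/2 raises a KeyError if either option is missing.
    ranch_ref = Keyword.fetch!(options, :ranch_ref)
    shutdown = Keyword.fetch!(options, :shutdown)

    %{
      id: __MODULE__,
      start: {__MODULE__, :start_link, [ranch_ref]},
      shutdown: shutdown
    }
  end
end
```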

Now we can simply use {RanchConnectionDrainer, shutdown: 10_000, ranch_ref: MyPhoenix.Endpoint.HTTP} in our list of children. For completeness, that would look like this:

```elixir
children = [
  MyPhoenix.Endpoint,
  {RanchConnectionDrainer, shutdown: 10_000, ranch_ref: MyPhoenix.Endpoint.HTTP}
]

Supervisor.init(children, opts)
```
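If you want to see what the supervisor actually receives from that tuple, Supervisor.child_spec/2 performs the same expansion by calling the module's child_spec/1. The sketch below copies only the child_spec/1 from above (start_link/1 and the rest are left out because nothing is started here), and MyPhoenix.Endpoint.HTTP is just an atom in this context. It also shows that omitting :shutdown fails fast.

```elixir
defmodule RanchConnectionDrainer do
  # Only child_spec/1 from the article; this demo never starts the process.
  def child_spec(options) when is_list(options) do
    ranch_ref = Keyword.fetch!(options, :ranch_ref)
    shutdown = Keyword.fetch!(options, :shutdown)

    %{
      id: __MODULE__,
      start: {__MODULE__, :start_link, [ranch_ref]},
      shutdown: shutdown
    }
  end
end

# {Module, arg} is expanded by calling Module.child_spec(arg).
spec =
  Supervisor.child_spec(
    {RanchConnectionDrainer, shutdown: 10_000, ranch_ref: MyPhoenix.Endpoint.HTTP},
    []
  )

IO.inspect(spec)

# Forgetting :shutdown raises a KeyError instead of silently using a default.
missing =
  try do
    RanchConnectionDrainer.child_spec(ranch_ref: MyPhoenix.Endpoint.HTTP)
  rescue
    e in KeyError -> e.key
  end

IO.inspect(missing)  # prints :shutdown
```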

Before you start copy-pasting code out of this article: I have wrapped this up in a tidy little package for use in your projects! You can find it on GitHub and on hex.pm.

Or, for the truly lazy among us, simply add this line to your mix deps: {:ranch_connection_drainer, "~> 0.1"}, and don’t forget to add RanchConnectionDrainer to your supervisor!

That’s it for now! I hope you learned a little about how Phoenix works under the hood, and most importantly, found a solution for your problem!