Building Graceful Node Applications in Docker
So the good news is that you have a Node application that’s up and running and serving traffic. Congratulations! The bad news is that at some point, your application is going to do something it’s not supposed to. Maybe a database connection goes stale, the node runs out of memory, or a deploy forces your server to restart. The question you need to ask is what happens to any requests that are in the middle of being processed? Chances are, when your application dies unexpectedly, so do any in-flight requests.
Graceful exiting is a way of handling these situations by allowing the application to finish responding to requests and shut itself down before the process actually dies. Although it is relatively easy to add graceful shutdown to a Node application, the way that both Docker and npm
spawn child processes and handle signals leads to some unexpected differences between local and Dockerized implementations.
Graceless exit
To test this out, let’s create a very simple node application.
// package.json
{
"name": "simple_node_app",
"main": "server.js",
"scripts": {
"start": "node server.js"
},
"dependencies": {
"express": "^4.13.3"
}
}// server.js
'use strict';const express = require('express');const PORT = process.env.port || 8080;const app = express();app.get('/', function (req, res) {
res.send('Hello world\n');
});app.get('/wait', function (req, res) {
const timeout = 5;
console.log(`received request, waiting ${timeout} seconds`);
const delayedResponse = () => {
res.send('Hello belated world\n');
};
setTimeout(delayedResponse, timeout * 1000);
});app.listen(PORT);
As expected, when we run our application locally, it does not exit gracefully.
# Start the application
$ npm install && npm start> start simple_node_app
> node server.js
Make a request in another terminal
$ curl http://localhost:8080/wait
Send a SIGTERM signal to `npm` before the request has completed
# find the PID of the npm process
$ ps -falx | grep npm | grep -v grep
UID PID PPID CMD
502 68044 31496 npm
# send a SIGTERM (-15) signal to that process
$ kill -15 68044
See that npm
has terminated and the request aborts midway
$ npm start
> node server.jsRunning on http://localhost:8080
received request, waiting 5 seconds
Terminated: 15
$ curl http://localhost:8080/wait
curl: (52) Empty reply from server
Handle all the signals
To fix this, we need to add explicit signal handling in our server.js
file (signal handling borrowed from this great post by Grigoriy Chudnov)
const server = app.listen(PORT);// The signals we want to handle
// NOTE: although it is tempting, the SIGKILL signal (9) cannot be intercepted and handled
var signals = {
'SIGHUP': 1,
'SIGINT': 2,
'SIGTERM': 15
};// Do any necessary shutdown logic for our application here
const shutdown = (signal, value) => {
console.log("shutdown!");
server.close(() => {
console.log(`server stopped by ${signal} with value ${value}`);
process.exit(128 + value);
});
};// Create a listener for each of the signals that we want to handle
Object.keys(signals).forEach((signal) => {
process.on(signal, () => {
console.log(`process received a ${signal} signal`);
shutdown(signal, signals[signal]);
});
});
Now when we re-run our experiment from earlier, we can see that the server finishes processing the request and shuts down before exiting.
$ npm start> node server.jsRunning on http://localhost:8080
received request, waiting 5 seconds
process received a SIGTERM signal
shutdown!
sending response!
server stopped by SIGTERM with value 15
and the request completes
$ curl http://localhost:8080/wait
Hello belated world
Note: npm
actually complains here because it's not expecting node
to exit. However, since node is doing what it's supposed to do, this error isn't really meaningful.
npm ERR! simple_node_app@1.0.0 start: `node server.js`
npm ERR! Exit status 143
Dockerize all the things
Docker is a software containerization tool that offers the ability to efficiently package, deploy, and manage your application. I won’t go into all of the reasons that Docker is awesome, so just take my (and the internet’s) word for it.
Dockerizing our application is simple: we just add a Dockerfile
, and then build and run our new container.
# Dockerfile
FROM node:boron# Create app directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app# Install app dependencies
COPY package.json /usr/src/app/
RUN npm install --production --quiet# Bundle app source
COPY . /usr/src/appEXPOSE 8080CMD ["npm", "start"]
We can then build and run our Docker application
$ docker build -q -t grace . && docker run -p 1234:8080 --rm --name=grace grace> node server.js
Now we want to repeat our earlier experiment by sending a request to the dockerized application and shutting it down before the request completes. We do this by making a request to our new endpoint (Docker internally maps port 8080 to the external port 1234), and calling docker stop grace
(which sends a SIGTERM signal to the docker container named grace
).
$ curl http://localhost:1234/wait
curl: (52) Empty reply from server
What happened? Why are we seeing an aborted request, when the same code behaved gracefully when we ran it on our machine?
NPM guts
To understand this, let’s look a little deeper into how npm start
works.
When we run npm start
locally, it spawns the node process as a child. We know this because the Parent Process ID (PPID) of the node
process is the Process ID (PID) of the npm
process.
$ ps -falx | grep "node\|npm" | grep -v grep
UID PID PPID CMD
502 65378 31800 npm
502 65379 65378 node server.js
We can double-check that npm
only spawns one child process by searching for all processes within that Process Group ID (PGID)
$ ps xao uid,pid,ppid,pgid,comm | grep 65378
UID PID PPID PGID CMD
502 65378 31800 65378 npm
502 65379 65378 65378 node
However, we see something a bit different when we check the process graph on our docker container.
$ ps falx
UID PID PPID COMMAND
0 1 0 npm
0 16 1 sh -c node server.js
0 17 16 \_ node server.js
In Docker, the npm
process shawns a shell
process, which then spawns the node
process. This means that npm
does not spawn the node
process as a direct child.
Let’s investigate whether this discrepency is due to the way Docker executes CMD ["npm", "start"]
or a more general quirk of npm
running in Docker. To do this, we're going to ssh into a running Docker container and run npm start
manually to see how it spawns child processes.
# Add an extra port mapping to our container so that we can run two node servers
$ docker run -p 1234:8080 -p 5678:5000 --rm --name=grace grace# SSH into the container in another terminal and check the currently-running processes
$ docker exec -it grace /bin/sh
$ ps falx
UID PID PPID COMMAND
0 1 0 npm
0 15 1 sh -c node server.js
0 16 15 \_ node server.js# Start up a second node server on a different port
$ port=5000 npm start
> node server.js
Running on http://localhost:5000
Now we ssh into the container in another terminal and check to see what the process graph looks like
$ docker exec -it grace /bin/sh
$ ps falx
UID PID PPID COMMAND
0 22 0 /bin/sh
0 46 22 \_ npm
0 56 46 \_ sh -c node server.js
0 57 56 \_ node server.js
0 1 0 npm
0 15 1 sh -c node server.js
0 16 15 \_ node server.js
Here we can see that, regardless of how npm start
is called, it always spawns a shell
process, which then spawns a node
process. This is different than how npm
behaves locally, where it spawns the node
process directly.
The great signal hand-off
I’m not sure why npm
behaves differently in these two scenarios, but this spawning discrepancy is my best guess as to why the same code exits gracefully locally, but flops and dies in Docker. npm
knows how to pass signals to its child process, but the shell process doesn't know how to pass signals to its child node
process.
Note: there have been a number of great posts written about the way that the main docker process passes (or fails to pass) signals when it’s run as PID 1, including this one by Grigoriy Chudnov, this one by Brian DeHamer, and this one by Yelp. Many people have written solutions for this problem, including Yelp’s dumb-init library, the tini library and the docker run --init
flag. However, it's important to note that the problem we're seeing centers around the way that npm
passes signals, regardless of whether it's run as PID 1 or not.
The solution to this signal-passing problem is disappointingly simple: run node server.js
directly from the Dockerfile instead of npm start
.
# Dockerfile
EXPOSE 8080CMD ["node", "server.js"]
This is unfortunate, since npm start
offers a uniform entry point for starting your application. This is particularly useful if you're passing a number of flags or options to node
, but I think the ability to exit gracefully trumps the niceties of being able to call a single start
command.
With this change, we can see that calling docker stop
propogates the signal to the node process so that it can respond and shut down gracefully before exiting.
$ docker build -q --no-cache -t grace . && docker run -p 1234:8080 --rm --name=grace grace
Running on http://localhost:8080
received request, waiting 5 seconds
process received a SIGTERM signal
shutdown!
sending response!
server stopped by SIGTERM with value 15
Our request completes as expected.
$ curl http://localhost:1234/wait
Hello belated world
Really really long requests
You might notice something strange if you have a particularly long-running request.
app.get('/wait', function (req, res) {
// increase the timeout
const timeout = 15;
console.log(`received request, waiting ${timeout} seconds`);
const delayedResponse = () => {
console.log("sending response!");
res.send('Hello belated world\n');
};
setTimeout(delayedResponse, timeout * 1000);
});
When we repeat our usual setup of running the docker container, making a request, and then stopping the container, we’re back to gracelessness.
$ docker build -q --no-cache -t grace . && docker run -p 1234:8080 --rm --name=grace --init grace
Running on http://localhost:8080
received request, waiting 15 seconds
process received a SIGTERM signal
shutdown!
$ curl http://localhost:1234/wait
curl: (52) Empty reply from server
The request aborts because docker stop
has a default timeout of 10 seconds before it gives up and sends a kill signal. Recall that SIGKILL signals cannot be caught or ignored, which means we cannot exit gracefully once one has been sent. However, docker stop
does have a --time, -t
flag that you can use to increase the amount of time before the container gets killed. This is useful if you expect a single request to take more than 10 seconds.
A graceful conclusion
It is extremely important for a web application to be able to exit gracefully so that it may perform any cleanup work and finish serving in-flight request. This is easy to do in Node applications by adding explicit signal handling to the node
process; however, this may not be sufficient for Dockerized applications because of the way processes spawn children and pass signals.
Any intermediary that is used to start node
, such as a shell script or npm
, may not be able to pass signals down to the actual node
process. Because of this, it is better to start node
directly from the Dockerfile, so that it can receive signals properly. In addition, because docker sends a KILL signal after a timeout following docker stop
, applications with long-running requests may need to increase this timeout to allow requests to complete before the application shuts down.