How to Implement Docker Health Checks

How to tell if your app is running happily as it should

Nassos Michas
Jan 22 · 6 min read
Photo by Edu Lauton on Unsplash

Your Dockerfile is written, your image is built, and you’re ready to roll out to prod. After a quick docker run, your container is up, and you’re eagerly waiting for your app to start serving requests.

You wait a few seconds … docker ps reports your container running, but nothing gets served. What happened?


The Half-Truth of Container Up

Let’s start by creating the simplest Docker container using the following Dockerfile:

FROM nginx:1.17.7

Build the image, and start a container:

docker build -t docker-health .
docker run --rm --name docker-health -p 8080:80 docker-health

An NGINX container is now running and listening on local port 8080. Go right ahead and test it by issuing curl localhost:8080, or just open http://localhost:8080 in your browser:

NGINX running in Docker (image by author)

The container is reported as being Up, running for a few seconds:

Docker processes running (image by author)

But what is Docker checking to report your container as running? Let’s examine the processes running inside the container:

Processes running inside the container (image by author)

The process started by the ENTRYPOINT or CMD of the Dockerfile runs as PID 1. PID is short for process identifier, a number automatically assigned to each process when it’s created. Each running process has a unique PID on any Unix-like system (inside a container, unique within that container’s PID namespace). A process running as PID 1 inside a container is treated specially: signals such as SIGINT or SIGTERM are ignored unless the process installs a handler for them, so it won’t terminate on such a signal unless it’s coded to do so.
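
To illustrate the "coded to do so" part, here is a minimal sketch of a shell entrypoint that installs its own SIGTERM handler. The function name serve_until_stopped and the sleep loop are stand-ins invented for this example, not part of the NGINX image.

```shell
#!/bin/sh
# serve_until_stopped: hypothetical stand-in for a long-running application.
serve_until_stopped() {
  # Without this trap, a shell running as PID 1 would not act on SIGTERM.
  trap 'echo "shutting down"; exit 0' TERM
  while :; do
    # Sleep in the background and wait on it, so the trap can run
    # as soon as the signal arrives instead of after the sleep ends.
    sleep 1 &
    wait $!
  done
}

# As an entrypoint script, the last line would simply be:
# serve_until_stopped
```

With a handler like this in place, a docker stop should end the container promptly rather than after Docker falls back to SIGKILL.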

As long as PID 1 is up and running, the Docker engine will keep reporting the container as being up and running too. Even if you pause the container via docker pause, the container is still reported as Up but with a (Paused) flag:

A paused container (image by author)

The Docker engine neither knows nor cares what a containerised application is doing. But we do.

Let’s imagine our NGINX-based container was created to serve (among others) a static system-status.txt file with system-health information. This file might not be available immediately when the container is first brought up, as some other process might create it later. Let’s try to fetch this file shortly after having created the container:

The system-status health check file isn’t available just yet (image by author)

The file doesn’t exist, and we got back a 404 error. Does this mean the container is OK or not? After all, we did get a response back from NGINX.
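
One way to make the distinction explicit is to judge the response by its status code rather than by whether any response arrived. The sketch below assumes curl is installed on the host; the function name is_served is made up for this example.

```shell
#!/bin/sh
# is_served URL
# Succeeds only when the server answers with HTTP 200.
is_served() {
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' "$1")
  [ "$status" = "200" ]
}

# Example against the container started earlier:
# is_served http://localhost:8080/system-status.txt && echo "served" || echo "not served"
```

A 404 from NGINX makes the function fail even though the server itself is responding.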

Although a simple example with a static file is used here, your check could very well be, for example, about the availability of your application’s API. Considering that your containerised application might take several seconds to boot up and make its API available, your container will be reported as Up although nothing can be served just yet.
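
Without a health check, callers are left to poll the application themselves. A rough sketch of such a polling loop follows; wait_until_ready is a hypothetical helper, and the attempt count and delay in the example are arbitrary values.

```shell
#!/bin/sh
# wait_until_ready ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or until ATTEMPTS runs have failed.
wait_until_ready() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumes curl on the host): up to 10 probes, 2 seconds apart.
# wait_until_ready 10 2 curl -fsS http://localhost:8080/system-status.txt
```

Every client or deployment script would need its own copy of this logic, which is the gap a built-in health check closes.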

Similarly, if an application experiences a problem that renders its API endpoints inaccessible, the PID 1 process might still be running, but no client will be able to interact with it. You end up with a zombie container that docker ps reports as running fine.

Wouldn’t it be nicer if Docker provided a way to perform custom checks other than the process check it already does? A check that we, application developers, could define in a custom way, appropriate in nature for the application running inside the container?

Let’s see how to do exactly that and define a custom health check.


Container Health Checks

The Docker engine, starting with version 1.12, provides a way to define custom commands as health checks. Although this feature has been available since mid-2016, it’s surprising how many Docker images still don’t use it.

A custom health check is specified in a Dockerfile using the HEALTHCHECK instruction. The check command is executed inside the container, so make sure the tool it relies on is installed in the image. As for the type of check it performs, it can be anything you might think of as long as it returns an appropriate exit-status code. As per the official Docker documentation, those exit codes can be:

“0: success - the container is healthy and ready for use
1: unhealthy - the container is not working correctly
2: reserved - do not use this exit code”

In practice, any exit code greater than 0 seems to work and marks the check as failed. Since the documentation reserves 2, though, it’s safer to have the command return only 0 or 1.
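
If you’d rather stay within the documented codes, the probe’s exit status can be collapsed to 0 or 1. The wrapper below is a small sketch of that idea; run_check is a name invented for this example.

```shell
#!/bin/sh
# run_check COMMAND [ARGS...]
# Runs the probe and maps every failure code to 1, so a tool-specific
# code (or the reserved code 2) is never passed on to Docker.
run_check() {
  if "$@"; then
    return 0
  fi
  return 1
}

# Example:
# run_check wget -q --method=HEAD localhost/system-status.txt
```

In a Dockerfile, the same effect is usually achieved by appending || exit 1 to the health-check command, as the Docker documentation does in its own example.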

Let’s rewrite the Dockerfile used in the example above, this time implementing a health check:

FROM nginx:1.17.7
RUN apt-get update && apt-get install -y wget
HEALTHCHECK CMD wget -q --method=HEAD localhost/system-status.txt

First, the Dockerfile was extended with a RUN command that installs the Wget client used by the health check.

Second, the HEALTHCHECK was defined. The health check is a wget call that requests the specific URL of the custom system-status.txt file. If the file is found, the exit code of the command will be 0, whereas if it’s not, it’ll be 8 (indicating a “Server issued an error response,” as per Wget’s official exit codes page).

Testing the health check

Using the augmented Dockerfile above, let’s build a new image and rerun the container:

Building and running the container with health checks enabled (image by author)

Issuing a docker ps now, we can see the Docker engine reporting the health status of the container in addition to its Up status:

A starting container with no health checks executed yet (image by author)

With the default configuration for health checks, the health-check command doesn’t get executed immediately, so the initial status you see is (health: starting). This indicates that your container has started, but no health check has settled its state yet. The first check runs after 30 seconds and fails, since system-status.txt doesn’t exist. With the default of three retries, the container is reported as unhealthy once three consecutive checks have failed, roughly 90 seconds after start:
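
The status shown by docker ps can also be read directly with docker inspect, which is handy in scripts. The two helper functions below are a sketch; their names are made up for this example.

```shell
#!/bin/sh
# health_status CONTAINER
# Prints starting, healthy or unhealthy, or "none" when the container
# has no health check defined.
health_status() {
  docker inspect \
    --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
    "$1"
}

# health_details CONTAINER
# Prints the full health record (status, failing streak and the log of
# recent probe runs) as JSON.
health_details() {
  docker inspect --format '{{json .State.Health}}' "$1"
}

# Example:
# health_status docker-health
```

The JSON record includes the exit code and output of recent probe runs, which helps when a container turns unhealthy and you want to know why.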

An unhealthy container (image by author)

Let’s try to heal the container by creating a sample system-status.txt:

docker exec docker-health sh -c \
'echo OK > /usr/share/nginx/html/system-status.txt'

Within the next 30 seconds, once the next check succeeds, the container should be reported as healthy:

A healthy container (image by author)

Health-check parameters

The HEALTHCHECK instruction also accepts options, placed before CMD. This article covers four:

--interval=DURATION (default: 30s)
--timeout=DURATION (default: 30s)
--start-period=DURATION (default: 0s)
--retries=N (default: 3)
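
The same four settings can also be supplied at run time through docker run flags, without touching the Dockerfile. The sketch below wraps such a call in a function; run_with_health and all duration values are arbitrary choices for illustration, and the image is assumed to contain wget, as the one built above does.

```shell
#!/bin/sh
# run_with_health IMAGE NAME
# Starts a detached container with health-check settings given on the
# command line.
run_with_health() {
  docker run -d --rm --name "$2" -p 8080:80 \
    --health-cmd 'wget -q --method=HEAD localhost/system-status.txt || exit 1' \
    --health-interval 10s \
    --health-timeout 3s \
    --health-start-period 15s \
    --health-retries 3 \
    "$1"
}

# Example:
# run_with_health docker-health docker-health
```

Run-time flags are useful for images you don’t build yourself, or for tuning intervals per environment.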

The interval option specifies how long Docker waits before executing the first health check and, after that, how long it waits between the end of one check and the start of the next.

The timeout option specifies how long Docker waits for your health-check command to return an exit code before counting that run as failed. A timed-out run counts toward retries like any other failure.

The start-period option specifies how long your container needs to bootstrap. During this period, failed checks don’t count toward the retries limit. A check that returns 0, however, marks the container as healthy right away, and from then on failures are counted as usual.

The retries option specifies the number of consecutive health check failures required to declare the container as unhealthy.
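
Taken together, interval, timeout and retries bound how long a container that is broken from the start can go unnoticed. The function below is a back-of-the-envelope estimate, assuming a start period of 0 and ignoring scheduling overhead; seconds_to_unhealthy is a name invented for this example.

```shell
#!/bin/sh
# seconds_to_unhealthy INTERVAL TIMEOUT RETRIES   (whole seconds)
# Prints "FASTEST SLOWEST":
#   FASTEST assumes every check fails instantly,
#   SLOWEST assumes every check runs until it times out.
seconds_to_unhealthy() {
  interval=$1
  timeout=$2
  retries=$3
  fastest=$((interval * retries))
  slowest=$(((interval + timeout) * retries))
  echo "$fastest $slowest"
}

seconds_to_unhealthy 30 30 3   # prints: 90 180
```

With the defaults, then, it can take between a minute and a half and three minutes before the container is flagged, which is worth keeping in mind when choosing values for production.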

Docker Swarm behaviour

A useful feature of health checks when running containers in Docker Swarm is that for as long as the container is in an unhealthy or (health: starting) status, routing is disabled and no requests reach the container at all.


Conclusion

Having your container started doesn’t necessarily mean your application is up or that it behaves as designed. Docker health checks are easy to implement and can help you quickly identify erratic behaviour before it becomes a real problem.

Next time you create a Dockerfile, consider adding a health check.

Better Programming

Advice for programmers.

