Meaningful web service /health checks

Lorenzo Arribas
The Glovo Tech Blog
7 min read · Oct 3, 2019

About 10 years ago, I deployed my first web service. It was a nice, silly PHP application to store game cheat sheets. Interestingly, what made me really proud was that I could release new versions with a single command, using a weird mix of git hooks and rsync-powered bash scripts.

When I think about the massive transformation that our field has undergone in the last few years in terms of continuous delivery and service orchestration, I always come back to that memory, and I can’t help laughing a bit.

Platforms such as AWS, Heroku, Azure or Kubernetes have enabled us to use deployment strategies such as canary releases, staged rollouts or blue-green deployments, regardless of whether we’re deploying a side project or a critical enterprise service.

All of these strategies have but one goal: to minimise client-facing downtime. That brings me to an important (yet somehow easily forgotten) topic: health checks.

A health check is a way for a service to report whether it’s running properly or not. Web services usually expose this via a /health HTTP endpoint. Then orchestration components, such as load balancers or service discovery systems, poll that endpoint to monitor the health of a fleet of services and make some key decisions, such as:

  • Is a new version of the service ready to receive requests?
  • Should we roll back a deployment?
  • Should we restart an instance?
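To make the mechanics concrete, here is a minimal sketch of both sides of that conversation, using only the Python standard library. The handler exposes a /health endpoint, and `is_healthy` plays the role of the poller. The names `HealthHandler`, `is_healthy` and `start_server`, the JSON body, and the "any 2xx within the timeout means healthy" rule are assumptions made for illustration; real load balancers and service discovery systems each have their own configuration for this.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Serves GET /health; every other path gets a 404."""

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # Illustrative payload: the shape of the body is an assumption.
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def is_healthy(url, timeout=1.0):
    """Poller side: a 2xx answer within the timeout counts as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Non-2xx responses, refused connections and timeouts all land here.
        return False


def start_server():
    """Start the service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

An orchestrator would call something like `is_healthy` on a schedule for every instance, and use the sequence of answers to decide whether to route traffic to an instance, roll back, or restart it.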

Anemic health checks

In my short experience (I’ve been in the industry for about 6 years), I’ve seen a bunch of health checks for different services and I’ve realised that most of them are pretty basic. They attempt to establish a connection to their downstream dependencies (select something from the database, ping Redis…) and report that they’re okay as soon as they:

  • Can process a request to the /health endpoint (that must mean our application is fully loaded);
  • Can connect to their dependencies.

After that, we’re basically good to go.
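The pattern described above might look roughly like the following sketch. It is not taken from any particular service: `sqlite3` stands in for the real database, `ping` is a hypothetical callable standing in for a Redis PING, and answering 503 when a dependency is unreachable is an assumed convention.

```python
import sqlite3


def check_database(conn):
    """'Select something from the database': proves connectivity, nothing more."""
    try:
        conn.execute("SELECT 1").fetchone()
        return True
    except sqlite3.Error:
        return False


def check_cache(ping):
    """Call a ping function (hypothetical stand-in for a Redis PING)."""
    try:
        return bool(ping())
    except Exception:
        return False


def anemic_health(conn, ping):
    """Return (HTTP status code, body) for the /health endpoint."""
    checks = {
        "database": check_database(conn),
        "cache": check_cache(ping),
    }
    healthy = all(checks.values())
    body = {"status": "ok" if healthy else "unavailable", "checks": checks}
    # Assumption: 200 when every dependency answers, 503 otherwise.
    return (200 if healthy else 503), body
```

Note what this does and does not verify: it only shows that the process can answer a request and reach its dependencies, which is exactly the "basic" level of checking this section is about.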

So when I needed to write my own health check for a new service, I copied that pattern.

Lorenzo Arribas

Staff Software Engineer at Glovo. I specialize in event-driven architectures and Machine Learning Operations. My website is https://larribas.me
