Meaningful web service /health checks
--
About 10 years ago, I deployed my first web service. It was a nice, silly PHP application to store game cheat sheets. What made me really proud of it was that I could release new versions with a single command, thanks to a weird mix of Git hooks and rsync-powered Bash scripts.
When I think about the massive transformation that our field has undergone in the last few years in terms of continuous delivery and service orchestration, I always come back to that memory, and I can’t help laughing a bit.
Platforms such as AWS, Heroku, Azure, or Kubernetes have enabled us to use deployment strategies such as canary releases, staged rollouts, or blue-green deployments, regardless of whether we’re deploying a side project or a critical enterprise service.
All of these strategies have but one goal: to minimise client-facing downtime. That brings me to an important (yet somehow easily forgotten) topic: health checks.
A health check is a way for a service to report whether it’s running properly or not. Web services usually expose this via a /health HTTP endpoint. Then orchestration components, such as load balancers or service discovery systems, poll that endpoint to monitor the health of a fleet of services and make some key decisions, such as:
- Is a new version of the service ready to receive requests?
- Should we roll back a deployment?
- Should we restart an instance?
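To make the polling side a bit more concrete, here is a minimal sketch of how an orchestration component might turn /health responses into those decisions. It is not how any particular load balancer works: the names probe and InstanceMonitor, the threshold of three consecutive failures, and the decision labels are all assumptions made up for illustration.

```python
import http.client
import urllib.request
from dataclasses import dataclass, field


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts and non-2xx responses
        # (urllib raises HTTPError, an OSError subclass, for 4xx/5xx).
        return False


@dataclass
class InstanceMonitor:
    """Tracks one instance and maps probe results to a decision (hypothetical)."""

    failure_threshold: int = 3  # assumed value, tune per service
    healthy: bool = False       # a new instance starts out of rotation
    failures: int = field(default=0, init=False)

    def record(self, ok: bool) -> str:
        if ok:
            self.failures = 0
            if not self.healthy:
                # First success: the instance is ready to receive requests.
                self.healthy = True
                return "route_traffic"
            return "no_change"
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Too many consecutive failures: take it out and act on it.
            self.healthy = False
            self.failures = 0
            return "restart_or_roll_back"
        return "no_change"
```

A poller would then call something like monitor.record(probe("http://127.0.0.1:8080/health")) on a fixed interval (the URL is a placeholder) and act on the returned label. Requiring several consecutive failures is a design choice that keeps a single slow response from triggering a restart.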
Anemic health checks
In my short experience (I’ve been in the industry for about 6 years), I’ve seen a bunch of health checks for different services and I’ve realised that most of them are pretty basic. They attempt to establish a connection to their downstream dependencies (select something from the database, ping Redis…) and report that they’re okay as soon as they:
- Can process a request to the /health endpoint (that must mean our application is fully loaded);
- Can connect to their dependencies.
After that, we’re basically good to go.
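A rough sketch of that pattern, using only the Python standard library, could look like the following. The check_database and check_redis functions are hypothetical placeholders (a real service would run a trivial query or a ping there), and the JSON body shape and the 503 status for failures are assumptions rather than a standard.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


def check_database() -> bool:
    # Placeholder: a real check might run a trivial query here.
    return True


def check_redis() -> bool:
    # Placeholder: a real check might send a ping to Redis here.
    return True


CHECKS: Dict[str, Callable[[], bool]] = {
    "database": check_database,
    "redis": check_redis,
}


def run_health_check(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run every dependency check and return (HTTP status, response body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False
    healthy = all(results.values())
    body = {"status": "ok" if healthy else "unavailable", "dependencies": results}
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = run_health_check(CHECKS)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start a blocking HTTP server that answers GET /health."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling serve() and requesting /health returns 200 as long as the process can answer and every dependency check passes, which is exactly as much as this kind of health check can tell us.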
So when I needed to write my own health check for a new service, I copied that pattern.