Kubernetes: Liveness Checks

Jonathan Campos
Google Cloud - Community
6 min read · Jun 16, 2018

Recently I put together a quick article about the Kubernetes Readiness Probe and how important it is for your cluster. If Readiness Probes are a 9 out of 10 on the importance scale, Liveness Checks are a solid 10 out of 10. Much like the Readiness check, adding this feature is really easy. In this article we will look at the code your application needs, along with the YAML changes that make the check possible. Finally, we will test our code to ensure that things are working just as we expect.

If you haven’t gone through, or even read, the first part of this series, you might be lost or have questions about where the code is or what was done previously. Remember, this series assumes you’re using GCP and GKE. I will always provide the code and show how to test that it is working as intended.

Why Do We Need A Liveness Check?

At this point let’s assume you’ve already created a wonderful Kubernetes Cluster with many Pods replicating as necessary. If one of these Pods runs into an issue, we need Kubernetes to know about it so the problem can be fixed automatically. The Liveness Check does this by periodically hitting each of your Pods and ensuring they are still responding as expected. If a Pod fails the Liveness Check, Kubernetes restarts the failing container so a healthy one takes its place. If you skip the Liveness Check, it is highly likely that you would end up with unhealthy Pods in your cluster and no automated way to repair the problem.
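To make the mechanism concrete, here is a minimal sketch of how the probe sits inside a Pod spec; the container name and image are placeholders, while the port and path match the probe we define later in this article.

# a failed liveness probe causes the kubelet to restart the container,
# following the Pod's restartPolicy (which defaults to Always)
spec:
  restartPolicy: Always
  containers:
  - name: app                   # placeholder container name
    image: your-image:latest    # placeholder image
    livenessProbe:
      httpGet:                  # probe the container over HTTP
        path: /healthcheck
        port: 8080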

Service Code

For my example, and since my test application uses Node.js with Express, I am going to add the wonderful express-healthcheck NPM package to do the heavy lifting.

With this package we can respond to liveness requests by just adding one line of code.

// basic liveness check
app.use('/healthcheck', require('express-healthcheck')());

For our test case I’ve gone further and created a way not only to check liveness but also to set the Pod as unhealthy. The code below is never intended for production deployments. You can view all of the code here.

// setup
const healthCheck = require('express-healthcheck');
let healthy = true;

// sets the Pod status to `unhealthy`
app.use('/unhealthy', function (req, res, next) {
    healthy = false;
    res.status(200).json({ healthy });
});

// returns the liveness response
app.use('/healthcheck', healthyIntercept, healthCheck());

// function to `check` liveness
function healthyIntercept(req, res, next) {
    if (healthy) {
        next();
    } else {
        next(new Error('unhealthy'));
    }
}

With this code in our Pod we have all we need on the server to respond to our liveness probe.

Do I Have To Use HTTP Requests?

No. In the Pod definition you could use an HTTP request (probably the simplest option), but you could also set a TCP probe or even run a command inside the container to validate that your Pod is running. You have lots of options; I just like the easiest one. The other two are sketched below.
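For reference, here is a rough sketch of what the TCP and command-based variants look like; the port and the /tmp/healthy file are placeholder values rather than anything from this project.

# TCP probe: the kubelet simply tries to open a socket on the port
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 3

# Command probe: the kubelet runs a command inside the container;
# a zero exit code means the container is healthy
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 3
  periodSeconds: 3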

Pod Definition

The changes to the Pod definition are super minor and only take a moment to complete. We just need to state where and how to test for our Pod’s liveness. I’ve added notes in the code to show all the possible places you can customize your probe.

# the liveness probe details
livenessProbe:
  httpGet:                  # make an HTTP request
    port: 8080              # port to use
    path: /healthcheck      # endpoint to hit
    scheme: HTTP            # or HTTPS
  initialDelaySeconds: 3    # seconds to wait before the first check
  periodSeconds: 3          # seconds between checks
  successThreshold: 1       # successes needed to be considered healthy (must be 1 for liveness)
  failureThreshold: 1       # failures allowed before the container is restarted
  timeoutSeconds: 1         # seconds to wait for a response

This snippet slots into the Pod/Deployment definition I provided previously.
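As a reminder of where it lives, here is a rough sketch of the probe in context; the Deployment name, labels, container name, and image are placeholders rather than the exact values from the series repo.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment           # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example-container      # placeholder name
        image: gcr.io/your-project/your-image:latest  # placeholder image
        ports:
        - containerPort: 8080
        livenessProbe:               # the probe defined above
          httpGet:
            port: 8080
            path: /healthcheck
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 3
          successThreshold: 1
          failureThreshold: 1
          timeoutSeconds: 1

With these settings the kubelet checks every 3 seconds and restarts the container after a single failed check, so an unhealthy Pod is caught within a few seconds.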

Testing Our Liveness Probe

Now comes the fun part! We can now startup our Kubernetes Cluster, deploy our Container, and make our Cluster fail the liveness probe.

Starting Up Our Kubernetes Cluster

Following the steps from the Kubernetes: Day One article, you already know how to open the Google Cloud Platform Cloud Shell and run the following commands.

$ git clone https://github.com/jonbcampos/kubernetes-series.git
$ cd kubernetes-series/partone/scripts
$ sh startup.sh
$ sh deploy.sh
$ sh check-endpoint.sh endpoints

This will start up your Kubernetes Cluster and deploy the Container. When the startup script completes you’ll have everything set up, along with the external IP address you can hit to view the services. Start by just calling the liveness probe endpoint. For this you can use your browser and go to the following URL.

https://[your_external_ip_address]/healthcheck

The service response will let you know how long that Pod has been running.

healthcheck output

In your console you’re going to want to watch the next part live as it happens. We are going to put a watch on our Pods so we can see when a Pod becomes unhealthy and is restarted by the Cluster. To add this watch, just type in the following command.

$ watch kubectl get pods

With your watcher running you can see your Pods running and any changes as they happen.

kubectl get pods output

Make Your Pods Fail

We will now flip the switch and make one of our Pods unhealthy. We do this by hitting our /unhealthy endpoint.

https://[your_external_ip_address]/unhealthy

Kubernetes will route our request to one of our Pods (I will probably write later about how requests are routed between Pods), thus changing the state of that one Pod.

Watching our console output, we will see, after the liveness probe runs, that one of our Pods is failing. This may happen too fast for you to even notice, but you will see that one of the Pods has been restarted (its RESTARTS count increments) to fix the “outage”.

And just like that we have a healthy Pod again, ready to receive more service requests.

With our restarted Pod

Teardown

Before you leave, make sure to clean up your project so you aren’t charged for the VMs running your cluster. Return to the Cloud Shell and run the teardown script. This will delete your cluster and the containers that we’ve built.

$ cd ~/kubernetes-series/autoscaling/scripts # if necessary
$ sh teardown.sh

Closing

This article goes fully into liveness probes with Kubernetes. Having a self-monitoring Kubernetes Cluster that can respond to unhealthy Pods is a life-changing piece of technology. Well, life changing if you are the DevOps manager who is used to restarting VMs at 3am.

Jonathan Campos is an avid developer and fan of learning new things. I believe that we should always keep learning, growing, and failing. I am always a supporter of the development community and always willing to help. So if you have questions or comments on this story, please add them below. Connect with me on LinkedIn or Twitter and mention this story.
