Managing dependencies between Kubernetes Services
Last week I had the task of adding ElasticSearch to my project. Since my application now relies on a third-party service, I have to make sure that ElasticSearch is running. On startup, my application uses ElasticSearch to build up an index. This means ElasticSearch has to be started before my application; otherwise my application will fail at startup.
What are liveness, readiness and startup probes?
According to kubernetes.io:
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don’t interfere with the application startup. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.
For our use case, this means that we can define a liveness probe on our own application that checks whether a connection to ElasticSearch can be established. If the probe fails, the application is restarted; if it succeeds, the application keeps running.
Creating a Liveness-Probe with HTTP
There are different types of liveness probes. We can use one of the following:
- Liveness Command
- Liveness HTTP request
- Liveness gRPC request
For our example, we will use a simple HTTP request.
It is important to know that every HTTP status code greater than or equal to 200 and less than 400 results in a successful probe, while any other code, including 400 itself, results in a failed probe.
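The status code rule can be written down as a one-line check. The following is only a sketch to make the boundaries explicit; the class name HttpProbeRule is made up for this example and is not part of Kubernetes or of my application.

```java
// Sketch of the rule applied to the status code of an HTTP probe.
// HttpProbeRule is a hypothetical name used only for this illustration.
final class HttpProbeRule {
    private HttpProbeRule() {}

    /** Returns true if the status code counts as a successful probe (200 <= code < 400). */
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    public static void main(String[] args) {
        int[] samples = {199, 200, 302, 399, 400, 503};
        for (int code : samples) {
            System.out.println(code + " -> " + (isSuccess(code) ? "success" : "failure"));
        }
    }
}
```

Note that 399 still counts as a success, while 400 is already a failure.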
In our ElasticAdapter, we have a method that throws an exception if the cluster is not reachable. It simply makes a call to clusterHealth() and does nothing with the result if it gets one. If the call throws a ConnectionException, the method catches it and throws an exception of its own.
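The adapter method described above could look roughly like the following sketch. The names ElasticAdapter, isReachable(), clusterHealth() and ConnectionException come from the text; the ElasticClient interface, the stand-in ConnectionException class and the ElasticUnreachableException type are assumptions made so that the example compiles on its own, and the real client library will look different.

```java
/** Stand-in for the client library's exception named in the text (assumption). */
class ConnectionException extends RuntimeException {
    ConnectionException(String message) { super(message); }
}

/** Hypothetical exception type thrown by the adapter. */
class ElasticUnreachableException extends RuntimeException {
    ElasticUnreachableException(String message, Throwable cause) { super(message, cause); }
}

/** Hypothetical minimal view of the ElasticSearch client. */
interface ElasticClient {
    Object clusterHealth();
}

class ElasticAdapter {
    private final ElasticClient client;

    ElasticAdapter(ElasticClient client) {
        this.client = client;
    }

    /** Returns normally if the cluster answers, otherwise throws. */
    void isReachable() {
        try {
            client.clusterHealth(); // the result itself is not inspected
        } catch (ConnectionException e) {
            throw new ElasticUnreachableException("ElasticSearch cluster is not reachable", e);
        }
    }
}
```

Because the method only signals failure through an exception, the caller decides how to translate that into an HTTP response.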
To make this check available to the outside, we need to create a new REST endpoint.
The controller creates a GetMapping for the /liveness endpoint, which calls the isReachable() function of the elasticAdapter. If the check succeeds, no exception is thrown and HttpStatus.OK is returned. If it fails, we catch the exception and return HttpStatus.SERVICE_UNAVAILABLE, which has a status code of 503.
Since status code 503 is not below 400, it makes the probe fail.
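To show the mapping from "check passed or threw" to the status code without pulling in a web framework, here is a sketch that uses the HTTP server built into the JDK instead of the Spring controller from my project. The ReachabilityCheck interface, the LivenessEndpoint class and port 8080 are assumptions for this example only.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

/** Hypothetical hook for the adapter's check: returns normally or throws. */
interface ReachabilityCheck {
    void isReachable();
}

final class LivenessEndpoint {
    static final int OK = 200;
    static final int SERVICE_UNAVAILABLE = 503;

    private LivenessEndpoint() {}

    /** Maps the outcome of the check to the status code the probe will see. */
    static int statusFor(ReachabilityCheck check) {
        try {
            check.isReachable();
            return OK;
        } catch (RuntimeException e) {
            return SERVICE_UNAVAILABLE;
        }
    }

    /** Starts a server that answers requests to /liveness with an empty body. */
    static HttpServer start(int port, ReachabilityCheck check) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/liveness", exchange -> {
            exchange.sendResponseHeaders(statusFor(check), -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Assumed port; the check here always passes, so the endpoint returns 200.
        start(8080, () -> { });
    }
}
```

The probe only looks at the status code, so an empty response body is enough.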
Add Liveness probe to Kubernetes Deployment
Now we can use our REST endpoint to create a liveness probe in our deployment.
In our liveness probe, we have to define the port on which our application runs and the path on which the liveness probe should be checked. In our case, we defined it as /liveness. The initialDelaySeconds setting delays the first liveness probe by three seconds. This is needed to give our own application some time to start up and make the endpoint available. The periodSeconds setting is the time between two probes.
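A probe definition along these lines could look like the fragment below. The path and the initial delay of three seconds are taken from the description above; the container name, the image, port 8080 and the periodSeconds value are assumptions for illustration only.

```yaml
# Fragment of a Deployment's pod template; name, image, port and period are assumptions.
spec:
  template:
    spec:
      containers:
        - name: my-application            # hypothetical container name
          image: my-application:latest    # hypothetical image
          ports:
            - containerPort: 8080         # assumed application port
          livenessProbe:
            httpGet:
              path: /liveness
              port: 8080
            initialDelaySeconds: 3        # wait three seconds before the first probe
            periodSeconds: 5              # assumed interval; pick what fits your application
```

The kubelet does not restart the container on the first failed probe. It waits for failureThreshold consecutive failures, which defaults to 3 if the field is not set.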
This probe helps us in two scenarios:
- Our application has started, while ElasticSearch is still unavailable.
  - Restarts our application until ElasticSearch is reachable and lets it run.
- Both apps started successfully but ElasticSearch crashes.
  - Restarts our application until ElasticSearch is reachable again.
  - Ensures that we always have both applications running and don’t lose data.
What went well
In my opinion, the Kubernetes documentation about liveness probes was really good, and the probe was really easy to implement. Also, creating the REST endpoint itself was no problem for me, since I had already done that a few times.
What needs improvement
For me, the biggest problem was testing the liveness probe. Since we use OpenShift in production, I wanted to use OpenShift CRC. This did not work properly on my machine and I needed to restart the whole cluster a few times. After that, I switched to Minikube, which solved all my issues. It was blazing fast compared to the slow and resource-intensive CRC. Next time, I would use Minikube right away to test local deployments, or ask my team members how they would test something like this.