Kubernetes Readiness & Liveness Probes: Best Practices
In Kubernetes, pods are the smallest deployable units of computing that can be created and managed. A pod is a group of one or more containers (Docker, rkt, etc.), with shared storage/network, and a specification for how to run the containers.
Readiness and liveness probes are generically called health checks in the Kubernetes context. Container probes are diagnostics that the kubelet runs periodically against a container. The result of each probe (Success, Failure, or Unknown) determines how Kubernetes sees the container’s state. Based on these results, Kubernetes decides how to treat each container in order to maintain resilience and high availability.
Health checks are a requirement for any distributed system!
Kubernetes health checks:
This article covers two types of Kubernetes health checks, readiness and liveness, and each of them has its own purpose. In the context of this article we will use:
- /.well-known/live for the HTTP live probe
- /.well-known/ready for the HTTP ready probe
In a few words, HTTP probing means that Kubernetes performs HTTP GET requests on /.well-known/live and /.well-known/ready at regular intervals. The status code of the response decides what happens to the pod. If the status code is in the interval [200, 400), everything is OK. Otherwise:
- if the live probe keeps failing (for example with a 4xx or 5xx status code), the container is restarted
- if the ready probe keeps failing, the pod is marked as not ready and traffic is no longer routed to it through its Services until the probe succeeds again, which protects reliability and uptime.
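The rule above can be sketched as a small function. This is only an illustration of how a probe response maps to an outcome, not kubelet code: the function names and the returned strings are made up for this article. Kubernetes treats any status code from 200 up to, but not including, 400 as a success.

```python
def probe_succeeded(status_code: int) -> bool:
    """Kubernetes counts any status in [200, 400) as a successful HTTP probe."""
    return 200 <= status_code < 400


def reaction_to_probe(probe: str, status_code: int) -> str:
    """Describe what happens once a probe is considered failed.

    `probe` is "live" or "ready"; the returned strings are illustrative labels.
    """
    if probe not in ("live", "ready"):
        raise ValueError(f"unknown probe type: {probe!r}")
    if probe_succeeded(status_code):
        return "nothing to do"
    if probe == "live":
        return "restart the container"
    return "stop routing traffic to the pod"
```

For example, reaction_to_probe("ready", 503) describes the pod being taken out of rotation, while reaction_to_probe("live", 204) means no action is needed.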
If a container does not depend on any backing services, its liveness and readiness checks can be exposed on the same handler, and what follows does not apply.
Together with teammates from the Site Reliability Team at Metro Systems Romania, we have identified a list of best practices for health checks and have advised the application developers to follow them. These best practices are:
1. Live & Ready handlers need to be independent functions!
As mentioned before, for each product deployed in a Kubernetes context, two handlers that answer HTTP calls for “live” and “ready” should be implemented. The first best practice regarding these probes is that each handler needs to be implemented as its own function.
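A minimal sketch of this separation, using only the Python standard library, could look like the following. The function names, the route table, and port 8080 are assumptions made for the example; only the two paths come from this article.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple


def handle_live() -> Tuple[int, str]:
    """Liveness handler: its own function, with its own (trivial) logic."""
    return 200, "alive"


def handle_ready() -> Tuple[int, str]:
    """Readiness handler: a separate function that can evolve independently."""
    return 200, "ready"


# Each probe path maps to its own function; nothing is shared between them.
ROUTES = {
    "/.well-known/live": handle_live,
    "/.well-known/ready": handle_ready,
}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        handler = ROUTES.get(self.path)
        status, body = handler() if handler else (404, "not found")
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args) -> None:
        # Probes fire every few seconds; keep them out of the access log.
        pass


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Build (but do not start) the probe server; the port is an assumption."""
    return ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
```

Calling make_server().serve_forever() from the application starts answering both probes. Because the handlers are separate functions, changing what “ready” means later does not touch the “live” path.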
2. Do not decouple the logic for “live” / “ready” from your application!
This applies to job processing apps. Kubernetes needs to know whether the processing app itself is running or not. If the logic for live/ready is moved into a separate process, the probe result only describes that process and says nothing conclusive about the processing app.
3. Do not implement any logic on the “live” handler. It needs to return status 200 if the main thread is running and 5xx if it is not.
This probe lets Kubernetes know if the application is alive or dead. The decision is made by checking the status code for /.well-known/live, and if the application is declared dead, Kubernetes restarts the container. From the reliability point of view, the live probe needs to succeed if the main thread of the app is up and running and fail otherwise.
In this context, “logic” means implementing any sort of checks against the services the application depends on.
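As a rough illustration, a “live” function with no dependency checks can be as small as the sketch below. The function name and the optional thread argument are assumptions for the example; the argument exists only so the function can be exercised against a thread that has already finished.

```python
import threading
from typing import Optional


def live_status(main: Optional[threading.Thread] = None) -> int:
    """Return 200 while the main thread is running, 503 otherwise.

    No database, queue, or downstream service is consulted here on purpose.
    """
    thread = main if main is not None else threading.main_thread()
    return 200 if thread.is_alive() else 503
```

The design choice is that a failing database should never cause a restart: restarting the container does not fix a dependency, so dependency checks belong to the “ready” handler instead.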
4. Implement logic in the “ready” handler in order to offer a complete answer about the readiness of the app.
The ready probe lets Kubernetes know if the pod is ready to receive HTTP traffic. As a developer, it is important to implement logic here that checks the availability of all the backend dependencies of the application. When implementing the “ready” handler, it is very important to know clearly what ready really means for your application. In other words, the “ready” handler should run all the checks that determine whether the app is ready to receive and process HTTP requests. For example, if the application needs a database connection in order to process HTTP requests, the readiness probe must check that the connection to the database is established and ready to be used.
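One possible shape for such a handler is sketched below, assuming each dependency is represented by a small function that returns True when the dependency is usable. The names ready_status and checks are invented for this example, and what goes into each check is application specific.

```python
from typing import Callable, Dict, Tuple


def ready_status(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, bool]]:
    """Run every dependency check and report 200 only if all of them pass.

    A check that raises is treated as a failed check, so a broken dependency
    can never make the handler itself crash.
    """
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results
```

A call such as ready_status({"database": db_is_connected, "cache": cache_is_connected}), where both functions are hypothetical, returns 503 together with a per-dependency breakdown as soon as one of them reports False. Returning the breakdown in the response body makes it easier to see why a pod was taken out of rotation.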
Let’s consider a specific case: the application gets stuck doing some processing on a separate thread, but the main thread is working well. As a developer, you know that a restart will solve the issue, and you can have Kubernetes perform it automatically. All you need to do is make sure the application responds with 5xx on the ready probe and also returns 5xx on the live probe for enough consecutive calls to reach the probe’s failure threshold (failureThreshold, 3 by default). Kubernetes then considers the container dead and restarts it.
5. Do not try to re-establish the readiness status of the app on the ready handler. The handler exists only to check if the app is ready, not to make it ready.
The ready handler should not contain any logic that tries to re-establish the state of readiness. Such logic runs on every probe call and on every replica, which might be dangerous for some of the components of the system, for example a struggling database that is suddenly hit by reconnection attempts.
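To make the counting concrete, here is a simplified model of how consecutive failed live probes lead to a restart. It is a sketch for intuition, not kubelet code: the function name is made up, and the only Kubernetes facts it relies on are that a status in [200, 400) is a success and that the failureThreshold setting defaults to 3.

```python
from typing import Iterable, Optional


def probes_until_restart(status_codes: Iterable[int],
                         failure_threshold: int = 3) -> Optional[int]:
    """Return the 1-based index of the probe that triggers a restart.

    Failures must be consecutive: any successful probe resets the counter.
    Returns None if the threshold is never reached.
    """
    consecutive_failures = 0
    for count, status in enumerate(status_codes, start=1):
        if 200 <= status < 400:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return count
    return None
```

With the default threshold, a single 5xx answer is not enough. The live handler has to keep failing for three probes in a row before the restart happens.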
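One way to keep the two concerns apart is sketched below, under the assumption that the application already owns a background loop responsible for (re)connecting. The class and function names are hypothetical. The ready function only reads a flag, while the repair work lives entirely outside the handler.

```python
import threading
from typing import Callable


class ConnectionState:
    """Thread-safe flag recording whether the dependency is currently usable."""

    def __init__(self) -> None:
        self._up = threading.Event()

    def set_up(self, up: bool) -> None:
        if up:
            self._up.set()
        else:
            self._up.clear()

    def is_up(self) -> bool:
        return self._up.is_set()


def ready_status(state: ConnectionState) -> int:
    """Read-only readiness check: reports the state, never tries to repair it."""
    return 200 if state.is_up() else 503


def maintain_connection(connect: Callable[[], bool],
                        state: ConnectionState,
                        stop: threading.Event,
                        interval_seconds: float = 5.0) -> None:
    """Application-owned loop that (re)connects and records the outcome.

    `connect` is a hypothetical function returning True on success.
    """
    while not stop.is_set():
        try:
            state.set_up(bool(connect()))
        except Exception:
            state.set_up(False)
        stop.wait(interval_seconds)
```

The application would typically run maintain_connection in its own thread with a pacing it controls, so the number of reconnection attempts no longer depends on how often Kubernetes probes the pod.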
Live and ready probes are the heart and soul of applications deployed in Kubernetes. They are the standard way for an application to tell the orchestrator about its status and its problems. Live and ready probes are powerful tools that a developer needs in order to make sure that the application is reliable and resilient.
Special thanks to Ionut Ilie. Part of this material is the result of his research and wisdom.