Exec Probes: A Story About an Experiment and Its Pitfalls

netcracker_team · Published in netcracker · Feb 2, 2022 · 6 min read

Netcracker teams actively deploy their applications to Kubernetes and use HTTP probes to monitor the health of their services. However, we decided to experiment with exec probes and… ended up with the cluster down!

How so? Exec probes are the first ones covered in every Kubernetes manual and guideline, including the official documentation! So what is wrong with them?

Here you will find information that can save you from potential problems and rash decisions. At the very end, we will explain what is wrong with HTTP probes and why we experimented with exec probes in the first place.

Experiment parameters

Our cluster:

· Dev cluster used by the development and testing teams. No production load, so we can try something new there;

· About 120 vCPUs and 1 TB of RAM on worker nodes;

· About 1,800 running pods. Each uses HTTP probes for liveness/readiness/startup.

A set of probes is a required standard for all of our services. If there are no probes, the service will not be accepted even for internal testing.

Average cluster load before the experiment: CPU load is 45%, RAM load is about 50%. System processes (including systemd) consume about 10% of one CPU core.

We updated about 490 services to use exec probes, after which the CPU load reached 68% (+23%) and kept climbing as more probes were rolled out. systemd's CPU consumption rose to almost 100%, making the cluster virtually unresponsive.
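For reference, the change amounted to replacing the httpGet handler with an exec handler in each probe definition. Here is a minimal sketch using the client-go types; the path, port, and command are placeholders, not our actual settings:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Before: an HTTP probe — kubelet itself performs a single GET request.
	httpProbe := corev1.Probe{PeriodSeconds: 3}
	httpProbe.HTTPGet = &corev1.HTTPGetAction{
		Path: "/health", // placeholder endpoint
		Port: intstr.FromInt(8080),
	}

	// After: an exec probe — kubelet asks the container runtime to run a
	// command inside the container and checks its exit code.
	execProbe := corev1.Probe{PeriodSeconds: 3}
	execProbe.Exec = &corev1.ExecAction{
		Command: []string{"sh", "-c", "curl -sf http://localhost:8080/health"},
	}

	fmt.Printf("http: %+v\nexec: %+v\n", httpProbe, execProbe)
}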

So what went wrong? And why did we have to roll back everything?

How does the Kubernetes cluster work?

To understand the problem, let me briefly describe the roles of the processes in the cluster. Of course, this is a large topic worthy of a series of articles; if you want to know more about it, please write in the comments.

We will be interested in the major players:

· kube-scheduler

· kubelet

· containerd/runc

· systemd

Let’s assume that you have already removed Docker from your cluster and switched to a CRI implementation such as containerd or CRI-O, so we will not talk about Docker.

Simply put, it looks like this:

· kube-scheduler assigns a pod to a node. Think of it as the cluster’s team lead: it hands out pods to nodes the way a team lead assigns bugs and stories to team members.

· kubelet is an executor. It takes the tasks assigned by kube-scheduler, but it does not actually do the work itself. It passes high-level commands such as “run this container” to the CRI layer, which is containerd or CRI-O.

· containerd/runc deal with physically handling the container layers: mounting them on top of each other and other system tasks. However, they too do not perform all of this work themselves.

· systemd executes all system calls and low-level commands.

How the HTTP probe works

The HTTP probe is as simple as it gets. Its code lives in pkg/probe/http/http.go:91 and does the following:

1. Connects to the pod using its internal address;

2. Requests the specified page;

3. Determines the result from the HTTP status code of the response.

In short, it looks as follows:

func DoHTTPProbe(url *url.URL, headers http.Header, client GetHTTPInterface) (probe.Result, string, error) {
	// error handling and logging elided for brevity
	req, err := http.NewRequest("GET", url.String(), nil)
	res, err := client.Do(req)
	b, err := utilio.ReadAtMost(res.Body, maxRespBodyLength)
	body := string(b)
	if res.StatusCode >= http.StatusOK && res.StatusCode < http.StatusBadRequest {
		if res.StatusCode >= http.StatusMultipleChoices { // Redirect
			return probe.Warning, body, nil
		}
		return probe.Success, body, nil
	}
	return probe.Failure, fmt.Sprintf("HTTP probe failed with statuscode: %d", res.StatusCode), nil
}

As you can see, everything is very simple and clear.

There is a small problem with the last line, but we will come back to it later. For now, the point is that this is a simple, clear, and lightweight request that causes no significant overhead and can be used at any scale.

How the exec probe works

This is a much more complicated, interesting, and multi-layered process:

1. Kubelet hands the underlying CRI layer a task to execute a command within the existing pod sandbox — pkg/kubelet/prober/prober.go#L160.

2. Containerd performs a series of actions against the layers below it: finding the active sandbox, attaching to it, allocating the command and input-output buffers, the actual execution, and the subsequent cleanup. All this results in about 5–7 calls to the underlying layers and is described in pkg/cri/server/container_execsync.go#L196-L211.

3. Systemd executes all this jazz, passing the calls to the Linux kernel.

4. The kernel executes the commands and returns the results.

Thus, compared to a plain HTTP probe that performs a single request, an exec probe triggers a whole set of actions at both the systemd and kernel levels.
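To make step 1 more concrete, here is a rough sketch of the single CRI call the kubelet issues for an exec probe, using the CRI gRPC API directly. The socket path, container ID, and command are placeholders, and this illustrates only the interface, not the kubelet’s actual code:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Default containerd CRI socket; adjust for your runtime.
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := runtimeapi.NewRuntimeServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// ExecSync is the one request the kubelet sends; the runtime then fans it
	// out into the sandbox lookup, attach, I/O setup, execution, and cleanup
	// steps described above.
	resp, err := client.ExecSync(ctx, &runtimeapi.ExecSyncRequest{
		ContainerId: "<container-id>", // placeholder
		Cmd:         []string{"sh", "-c", "curl -sf http://localhost:8080/health"},
		Timeout:     1, // seconds
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("exit=%d stdout=%q stderr=%q\n", resp.ExitCode, resp.Stdout, resp.Stderr)
}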

What it led to

Simple math tells us:

· 490 probes executed every three seconds, each involving about 7 systemd calls on average => roughly 1,143 systemd calls per second across the cluster;

· For a cluster with 11 nodes, that is about 100 such calls per second on each node (the arithmetic is spelled out below).
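Spelled out as a trivial calculation (the 7 calls per probe is our rough average, as noted above):

package main

import "fmt"

func main() {
	probes := 490.0      // services switched to exec probes
	period := 3.0        // seconds between probe executions
	callsPerProbe := 7.0 // rough average of systemd-level calls per exec probe
	nodes := 11.0

	perCluster := probes / period * callsPerProbe // ≈ 1,143 calls per second
	perNode := perCluster / nodes                 // ≈ 104 calls per second

	fmt.Printf("~%.0f systemd calls/s cluster-wide, ~%.0f per node\n", perCluster, perNode)
}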

And here is the result we saw in our monitoring: systemd occupying about 70% of one CPU core. And that was not even our worst reading! Unfortunately, I have misplaced the screenshot showing a beautiful 100% CPU consumption.

Important: systemd is a single-threaded process. When it is at 100%, it does not just consume cluster resources; it also makes the cluster unresponsive even for simple routine tasks such as deploying new containers.

As a result, almost the entire workload ground to a halt. The cluster became impossible to use, since nearly all of systemd’s capacity was spent servicing probes.

Why did we even bother doing this

Now you may ask why we even tried exec probes if we already had working HTTP probes.

The answer is in this part of the Kubernetes code — pkg/probe/http/http.go#L138.

body := string(b)
if res.StatusCode >= http.StatusOK && res.StatusCode < http.StatusBadRequest {
	if res.StatusCode >= http.StatusMultipleChoices { // Redirect
		return probe.Warning, body, nil
	}
	return probe.Success, body, nil
}
return probe.Failure, fmt.Sprintf("HTTP probe failed with statuscode: %d", res.StatusCode), nil

As you can see, when the probe fails, the body that was just read is discarded and a non-informative message like “HTTP probe failed with statuscode: 500” is returned.

For troubleshooting, we want to see exactly why a particular probe failed. We can easily put this information into the response body, but we need Kubernetes to actually read it from there.
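For example, a typical health endpoint on the service side already writes the failure reason into the response body. A simplified sketch (checkDependencies stands in for real checks):

package main

import (
	"log"
	"net/http"
)

// checkDependencies stands in for real service checks (DB, queues, etc.).
func checkDependencies() error {
	return nil
}

func main() {
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		if err := checkDependencies(); err != nil {
			// The reason ends up in the response body — exactly the part
			// that DoHTTPProbe currently discards on failure.
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}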

Exec probes, on the other hand, pass through the complete output of the command, including what it writes to stderr. That looked more convenient to us.

In hindsight, a simple pull request to Kubernetes would have been a more appropriate solution.
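For illustration, the change we have in mind is essentially to keep the body in the failure message. A self-contained sketch of the desired message format (this is our proposal, not actual Kubernetes code; the length cap is an assumption):

package main

import (
	"fmt"
	"net/http"
)

// failureMessage sketches what we would like the probe failure message to
// look like: the status code plus the (truncated) response body.
func failureMessage(statusCode int, body string) string {
	const maxLen = 256 // hypothetical cap to keep events readable
	if len(body) > maxLen {
		body = body[:maxLen] + "..."
	}
	return fmt.Sprintf("HTTP probe failed with statuscode: %d, body: %s", statusCode, body)
}

func main() {
	fmt.Println(failureMessage(http.StatusInternalServerError, "database connection pool exhausted"))
}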

Conclusions

Exec probes are a well-documented Kubernetes tool; even the official “Configure Liveness, Readiness and Startup Probes” article uses them in its examples. By naively relying on them at scale, we got the very unexpected results I have shared with you, and the documentation says nothing about possible consequences such as a non-working cluster. But now you know. Forewarned is forearmed!

After rolling back to HTTP probes, the cluster returned to its normal operating state, with an average systemd load of about 10%.

Have you had a similar experience? Share in the comments!
