Running Go applications on Kubernetes

João Filipe Moura
Published in Inloco Tech Blog
4 min read · Jan 21, 2020

Here at In Loco, Go has become one of the main programming languages in our stack. It has proven itself to be a production-ready, fast, easy-to-maintain, and versatile tool. But when we deployed a fairly simple CPU-bound application with no I/O to our production Kubernetes environment, we noticed that something was wrong with our services. For some obscure reason, our simple gRPC application was showing a p99 latency of 70 ms, drastically higher than our expectations, so we started a siege of tests.

[Figure: p99 latency results of the fixes described below]

During our testing, we found three major problems. The first one was that the application was deployed with modest CPU resources: a request of only 200 millicpu and a limit of 400 millicpu (in Kubernetes, a millicpu is one thousandth of a CPU core). This means that, for each time unit, Kubernetes gives each pod only 20% of continuous processing time, since the standard Linux kernel scheduling strategy for threads is based on time slices.
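For reference, the resource spec we started with looks roughly like this fragment of the container spec in the deployment manifest (the values are the ones quoted above):

```yaml
resources:
  requests:
    cpu: 200m   # 20% of one core, guaranteed by the scheduler
  limits:
    cpu: 400m   # hard cap, enforced by the kernel's CFS bandwidth control
```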

In this strategy, the kernel gives each process a time slice to run, locking the CPU to that process during its slice and rotating between all processes competing for that resource. Each pod executes for 20% of the period (its request), then waits at least 80% of the period to execute again. Even if no other pod is competing for that CPU time, processing will only continue for up to 40% of the period (its limit). So in a context where the node runs only one instance of our Go app, the CPU stays idle rather than filling the computation time with the app.

Another thing we considered was the Kubernetes QoS (Quality of Service) class. The QoS class is derived from the resource requests and limits in the deployment manifest, and can be Guaranteed, Burstable, or BestEffort; the Kubernetes documentation explains it beautifully. In our tests, we specified Guaranteed QoS (setting both the CPU request and limit to 1 full CPU) in the deployment, and got almost the same results. But we struggled with request spikes, since our traffic has a signature behavior where this kind of load happens:

Within a few seconds, our requests/min increase by over fifty percent. Horizontal autoscaling does not handle this properly, since newly spawned pods take a couple of minutes to become ready, so we changed back to Burstable QoS, but this time without setting a CPU limit. This gives our pods the freedom to grow and handle this kind of spike. “Won’t it interfere with other pods’ performance?”, you might be thinking. Not that much, since Kubernetes ensures at least the requested CPU time for each pod scheduled on each node. The only possibility of interference is when two pods grow beyond their requested CPU resources at the same time. Even then, the pods get CPU time in proportion to their requests, avoiding starvation.
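The configuration we ended up with is therefore a request with no limit, which yields Burstable QoS; a minimal sketch of the manifest fragment:

```yaml
resources:
  requests:
    cpu: "1"   # guaranteed share; keeps at least one full core for the pod
  # no cpu limit: the pod may burst into any idle CPU time on the node
```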

Also, the Go runtime has a setting called GOMAXPROCS that controls the number of operating system threads that can execute Go code simultaneously, which in practice is the number of goroutines that can actually run in parallel. By default it equals the number of CPU cores the process can see. In Kubernetes, all the CPU cores available on the node are visible to its pods (not the limits configured in the manifest); you can check it yourself:

# Set your kubectl context to the desired namespace, then:
kubectl exec -it your_pod_name -- bash
# Inside your pod, count the visible CPUs:
nproc

This happens because setting the CPU request/limit in the deployment does not allocate actual CPUs to the pod. The pod receives CPU time, and Kubernetes manages how much time each pod gets. As a result, the Go runtime spawns one thread per node core by default, and all those threads end up competing for a small slice of CPU time. This can lead to starvation and make your application really slow.

Wrapping Up

Two things you must ensure to successfully run CPU-bound applications with Go on Kubernetes (both configurations can be specified in the manifest):

  • Set the CPU resource request to at least 1, while not setting a CPU resource limit. This way, the Go runtime can count on at least one full CPU’s worth of time;
  • Set the GOMAXPROCS environment variable to the number of requested CPUs in the manifest plus one. This tells the Go runtime how many system threads it should use, and the extra thread ensures the pod can properly use the additional CPU during bursts.
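In the manifest, the second point is just an environment variable on the container (the value below assumes a request of 1 CPU, as above):

```yaml
env:
  - name: GOMAXPROCS
    value: "2"  # requested CPUs (1) + 1, leaving headroom for bursts
```

As an aside, libraries such as go.uber.org/automaxprocs can set this automatically from the pod’s CPU quota, if you would rather not hard-code the value.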

Are you interested?

If you are interested in building context-aware products using a powerful location technology that genuinely cares about users’ privacy, then check out our opportunities.


Software Engineering Intern @ Inloco, focused on Site Reliability Engineering