How we choked our Kubernetes NodeJS services

Aivo Paas
Published in Pipedrive R&D Blog
7 min read · Feb 7, 2023

The days of counting bits and clock ticks are history. CPU and memory may be cheap, but not cheap enough to waste, especially since the waste would ultimately come at our users' expense.

In this blog post, I’ll explain how we run our services in Kubernetes and manage their resources, how to make the most of them and what to look out for.

Cover image by Jeremy Waterhouse at pexels.com

Kubernetes resources

When we deploy a service to a Kubernetes cluster, it usually creates multiple identical instances called pods. These pods are scheduled to run on a number of nodes (physical or virtual servers). All the services share the resources available on a node. Next, let’s have a closer look at the CPU and memory resources.

For Kubernetes to effectively manage resource allocation, we can configure the amount of resources we expect our service to use (resource requests) and define an upper limit for them (resource limits). Memory is measured in bytes and CPU in the number of cores. Fractional cores can be specified with 0.001-core precision (1 core = 1000m, or millicores).

Here’s how a reasonable resource definition for a NodeJS container might look:

resources:
  requests:
    memory: "100M"
    cpu: "250m"
  limits:
    memory: "200M"
    cpu: "5000m"

We define requests and limits per container. Multiple containers can run in a pod, and a container can run multiple processes, which, in turn, can run multiple threads. The resources of a pod are shared between all the threads it runs.

Memory

A pod can be scheduled on a node when there's enough memory available to fulfill the request. Once the pod is running, it can use more memory than it requested, as long as it stays below the defined memory limit. When the pod exceeds the limit, it gets Out-Of-Memory-killed (OOM-killed). There can be cluster-specific configuration for when exactly the killing happens, but we'll keep things simple here.

Memory limit vs. NodeJS heap size

The V8 engine manages the NodeJS JavaScript heap (used for dynamic allocations when running your code). If we don't specify a limit, V8 will use about half of the available memory for the JS heap. That default isn't ideal if the JS heap is where most of our memory goes, so we can set the limit ourselves with the --max-old-space-size flag when running NodeJS.

NodeJS will use the entire available heap when busy handling payloads or processing data. Garbage collection runs more aggressively once the heap starts to fill up or when the event loop is idle. If we set the heap size equal to the container memory limit, the pod would be OOM-killed before JS could fill the entire heap: V8 would assume there was more space available when, in fact, the container had already run out, because NodeJS needs to keep memory available for other things outside the JS heap.

Since the overhead of an empty NodeJS instance is about 40 MB, we should set at least that much aside when calculating the optimal heap size. For services with relatively stable memory usage, say around 200 MB, I would set the memory limit to 300M and max-old-space-size=260. For services with more fluctuating usage, such as message queue consumers, I would leave a wider gap between the memory limit and max-old-space-size.
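As a rough sketch, here is how that could be wired into a container spec. The container name and the NODE_OPTIONS approach are illustrative; you could just as well pass the flag directly to the node command:

containers:
  - name: node-service            # hypothetical container name
    image: node-service:1.0.0     # hypothetical image
    env:
      # Cap the V8 heap below the container limit to leave room for
      # memory used outside the JS heap (buffers, native code, stacks).
      - name: NODE_OPTIONS
        value: "--max-old-space-size=260"
    resources:
      requests:
        memory: "200M"
      limits:
        memory: "300M"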

CPU

If enough CPU cores are available to fulfill the request, pods can be scheduled on the node. However, limiting the CPU resource isn’t as simple as comparing two numbers and making a decision.

CPU usage is measured by the running time of all the threads in the pod. The pod doesn’t get killed when it uses all the CPU time set in the limit — it gets throttled.

The Linux kernel enforces the pods' CPU limits (through the CFS quota mechanism), and with the default configuration the limits are applied in 100ms periods. A CPU limit of 50m means that during each 100ms period, 5ms of CPU time can be consumed. If your service runs a CPU-heavy computation that needs multiple seconds of CPU time, it will only get 50 milliseconds of it per second. The process is throttled for the remaining 950 milliseconds, so the service runs 20 times slower than it could.
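To make the arithmetic concrete, here is what such a limit looks like in a container spec (the values are purely illustrative):

resources:
  limits:
    cpu: "50m"   # 5ms of CPU time per 100ms period;
                 # a task needing 1s of CPU time takes ~20s of wall-clock time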

But JavaScript is single-threaded, right?

If we don’t want to slow down our services, shouldn’t we set the CPU limit to 1?

Well, not quite.

You may have heard that JavaScript is single-threaded, but there's more to it. The JavaScript code inside NodeJS runs in a single thread (the event loop), but a lot in NodeJS runs outside that thread. NodeJS also has a libuv worker pool (four threads by default) that handles computation-heavy tasks and other work that would be impractical to run on the JS event loop.

What’s happening outside the event loop?

  • Crypto (encryption, random bytes)
  • Zlib (asynchronous compression algorithms)
  • File access
  • DNS lookup
  • Garbage collection
  • C++ add-ons that might be loaded by dependencies

Depending on the nature of a service, plenty of its operations might run in the worker pool rather than in the JS event loop. If you limit the CPU resource to one core, the worker pool and the event loop compete for that CPU time. Even if you use asynchronous APIs to handle other payloads while waiting on I/O for one (for example, DB or HTTP requests), your JS code might be blocked by the worker pool, or it might itself block an asynchronous call you made, say, to decompress a file.

Let’s look at an example of how a payload might be handled:

Illustration showing CPU work distribution between JS event loop and worker threads under different CPU limits.
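As a side note, the size of the libuv worker pool can be tuned with the UV_THREADPOOL_SIZE environment variable. Here is a minimal sketch of setting it in a container spec; whether more worker threads actually help depends on the workload and on the CPU limit discussed below:

containers:
  - name: node-service   # hypothetical container name
    env:
      # libuv worker pool size (default is 4); more threads let crypto,
      # zlib, file and DNS work run in parallel, but they all still
      # share the container's CPU limit.
      - name: UV_THREADPOOL_SIZE
        value: "8"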

So, what’s the magic number?

I’ve advised my fellow developers to set the CPU limit at 1.15. This magic number is based on our observations of CPU-bound services. With a limit of one core, CPU usage graphs would flatten out at 100%. When we raised the limit slightly, usage settled at around 115% (1.15 cores) and stayed roughly there even with a limit of 1.5 or 2 cores. The extra 15% represents work done outside the JS event loop. For services that make heavy use of encryption or compression, the overhead would be even higher.

It’s worth noting that the CPU usage graphs didn’t reveal the ongoing CPU throttling that was slowing the services down and hurting overall throughput and response times.

Taking these insights into account, I believe we should set CPU limits higher than 1 core for NodeJS services. A NodeJS process currently starts with 6 threads by default, and the number can be even higher. In our experience, a limit of 4 cores can mostly eliminate CPU throttling in more demanding services.
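Putting that into a container spec might look something like this; the request should come from your own usage metrics, and the numbers here are only illustrative:

resources:
  requests:
    cpu: "500m"    # measured usage at peak, plus some headroom
  limits:
    cpu: "4000m"   # well above 1 core, so the event loop, worker pool
                   # and garbage collector don't throttle each other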

But what about over-provisioning the resources?

Glad you asked.

When provisioning Kubernetes clusters, it’s useful to estimate the combined amount of resources all the services need. By a somewhat twisted logic, it might seem sensible to set both resource requests and limits equal to the amounts used at peak hours: there would always be enough resources available for every service. But limits don’t have to match requests, and as we’ve learned, it’s often inadvisable to set them equal.

The resource requests determine if and how pods can be scheduled on the nodes. We should set the requests based on the actual resource usage during peak hours, with a little overhead to handle fluctuations.

The limits can be higher than the requests. That doesn’t change the total amount of work to be done, only the speed at which it can be done. Under CPU congestion, pods get CPU time in proportion to their requests, so the work still gets done, just a bit slower than under normal operation.

There are three ways to deal with a service that misbehaves by using more CPU on average than it requests: we can fix the service, introduce appropriate rate limiting, or accept the new reality. If higher usage, new features or changes elsewhere in the system cause frequent high-CPU payloads, we might prefer to simply raise the CPU request.

Bonus content: Redis

Though Redis’ key-value store is famously blazing-fast, it’s affected by CPU limits all the same. For Redis containers, we should set the limit at 1 core, which is plenty since Redis runs its core functionality in a single thread. The request should still be set based on actual CPU utilization, with a small buffer for emergencies.
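A sketch of what that could look like for a Redis container; the request value is illustrative and should come from your own metrics:

resources:
  requests:
    cpu: "200m"    # based on observed utilization, plus a small buffer
  limits:
    cpu: "1000m"   # one core is enough for Redis's single-threaded core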

Conclusion

Managing resources in Kubernetes for NodeJS services requires setting resource requests and limits for CPU and memory. A pod can be scheduled on a node when enough memory and CPU are available to fulfill the requests.

If the pod uses more memory than the limit, it gets killed. However, limiting CPU usage isn’t as straightforward and can slow down the service if set too low. Additionally, when running NodeJS, it’s important to consider a limit for the JavaScript heap memory (max-old-space-size) and the overhead and nature of the service to determine the optimal resource allocation.

As always, the values should be tuned per service based on real metrics. There is no silver bullet configuration to handle all the different services equally well. No automation can reliably make decisions for us by considering all the variables.

Overall, properly managing resources is crucial to avoid waste and ensure a smooth user experience.
