Not long after taking a job at Splunk, a co-worker (Hi Tameem!) hit me up over Slack and asked about one of my Kubernetes metrics blog posts.
His question was about which container “memory usage” metric the OOMKiller uses to determine if a container should be killed. The assertion I made in that post was:
You might think that memory utilization is easily tracked with container_memory_usage_bytes; however, this metric also includes cached (think filesystem cache) items that can be evicted under memory pressure. The better metric is container_memory_working_set_bytes, as this is what the OOM killer is watching for.
This is one of the more highlighted sections of that post, so I decided I needed to see it in action. Let’s see which of these metrics the OOMKiller is actually watching.
I threw together a toy program that would continuously allocate memory until the OOMKiller got involved and killed the container in the pod.
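The original program isn’t shown here, but a minimal sketch of that kind of allocator might look like this (the chunk size and sleep interval are my assumptions; the `max_steps` parameter exists only so the sketch can be run safely outside a memory-limited container):

```python
import time

CHUNK = 10 * 1024 * 1024  # allocate 10 MiB per step


def allocate_until_killed(max_steps=None):
    """Keep allocating touched memory so usage climbs steadily.

    Inside a container with a memory limit, this loop runs until the
    OOMKiller terminates the process.
    """
    chunks = []
    steps = 0
    while max_steps is None or steps < max_steps:
        # bytearray zero-fills, so the pages are actually written and
        # count toward the cgroup's memory usage
        chunks.append(bytearray(CHUNK))
        steps += 1
        time.sleep(0.1)  # slow enough to watch the metrics climb
    return chunks
```

Holding every chunk in a list is what keeps the memory from being freed between iterations.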
Running this in minikube with memory requests and limits both set to 128MB, we see that container_memory_usage_bytes and container_memory_working_set_bytes track almost 1:1 with each other. When they both reach the limit set on the container, the OOMKiller kills the container and the process starts over.
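For reference, the relevant part of the pod spec looked something like this (the names and image are illustrative; the point is that memory requests and limits are both 128Mi):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-hog
spec:
  containers:
    - name: allocator
      image: memory-hog:latest   # hypothetical image for the toy program
      resources:
        requests:
          memory: "128Mi"
        limits:
          memory: "128Mi"
```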
To show that container_memory_usage_bytes also tracks the amount of filesystem cache that the process uses, I extended the toy program to also write bytes to a file on the filesystem.
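A sketch of that extension, again with hypothetical names and sizes, might write the file in fixed chunks so the page cache fills gradually:

```python
import os

CHUNK = 1024 * 1024  # write 1 MiB at a time


def write_chunks(path, total_bytes):
    """Write a file in chunks; the kernel keeps the written pages in the
    filesystem cache, which shows up in container_memory_usage_bytes
    (and container_memory_cache) but is evictable under memory pressure.
    """
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(CHUNK, total_bytes - written)
            f.write(b"\0" * n)
            written += n
    return os.path.getsize(path)
```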
After introducing the filesystem cache into the picture, we start to see container_memory_usage_bytes and container_memory_working_set_bytes diverge.
Now what’s interesting is that the container is still not allowed to use more memory than its limit, but the OOMKiller does not kill the container until container_memory_working_set_bytes reaches that limit.
Another interesting aspect of this behavior is that
container_memory_usage_bytes tops out at the memory limit of the container, even though bytes continue to be written to disk.
If we take a look at
container_memory_cache, we see that the amount of cache used continues to increase until
container_memory_usage_bytes hits the limit, then it starts to decrease. Very interesting.
We can see from this experiment that container_memory_usage_bytes does account for some filesystem pages that are being cached. We can also see that OOMKiller is tracking container_memory_working_set_bytes. This makes sense as shared filesystem cache pages can be evicted from memory at any time. There’s no point in killing the process just for using disk I/O.
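This lines up with how cAdvisor derives the metric. To my understanding (an assumption worth verifying against the cAdvisor source), the working set is the usage counter minus inactive file-backed pages, floored at zero:

```python
def working_set_bytes(usage_bytes, inactive_file_bytes):
    # working set = total usage minus evictable (inactive) file cache,
    # never reported below zero
    return max(0, usage_bytes - inactive_file_bytes)


# 128 MiB of usage with 40 MiB of inactive file cache leaves an
# 88 MiB working set
print(working_set_bytes(128 * 1024**2, 40 * 1024**2))  # 92274688 (88 MiB)
```

That subtraction is exactly why the two metrics track 1:1 when the toy program only allocates anonymous memory, and diverge once it starts filling the page cache.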
Hope this helps.
Thanks to Stephen Coles for the album image.