In my professional journey, I’ve come to understand that the more resources you have at your disposal, the more efficiently you should utilize them. The effectiveness of resource utilization is key to improving your performance. Let’s explore together the possibilities of utilizing NUMA in Kubernetes for efficient resource management on larger servers.
1. Stop, stop, stop. NU-what-MA?
Non-uniform memory access (NUMA) is a memory design used in multiprocessing systems: each processor has its own local memory, but can also reach memory attached to the other processors, at a higher latency. Keeping workloads close to their local memory reduces access latency and improves overall system efficiency.
Imagine your application is running on core1 and core11 and uses RAM 2. If core1 belongs to a CPU that is not directly attached to RAM 2, every memory access from core1 has to cross the interconnect between the sockets, which adds latency and wastes resources.
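You can check how many NUMA nodes your machine exposes by reading the Linux sysfs directly (a minimal sketch, assuming a Linux host; `numactl --hardware` and `lscpu` report the same information):

```python
import os
import re

def list_numa_nodes(base="/sys/devices/system/node"):
    """Return the NUMA node IDs the kernel exposes, e.g. [0, 1]."""
    if not os.path.isdir(base):
        return []  # non-Linux host or sysfs not mounted
    nodes = []
    for entry in os.listdir(base):
        # NUMA nodes appear as directories named node0, node1, ...
        m = re.fullmatch(r"node(\d+)", entry)
        if m:
            nodes.append(int(m.group(1)))
    return sorted(nodes)

if __name__ == "__main__":
    print("NUMA nodes:", list_numa_nodes())
```

On a typical two-socket server this prints `NUMA nodes: [0, 1]`; a single-socket machine usually has just node 0.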
2. Kubernetes and Topology Manager
In Kubernetes, assigning pods to a single NUMA node isn’t automatic. We need to configure the kubelet to use the Topology Manager, together with the Memory Manager and CPU Manager, to achieve this:
topology-manager-policy=single-numa-node
topology-manager-scope=pod
cpu-manager-policy=static
memory-manager-policy=Static
reserved-memory=0:memory=2Gi
kube-reserved=cpu=1,memory=1Gi
system-reserved=memory=1Gi
— topology-manager-policy sets the Topology Manager policy. `single-numa-node` admits a pod only when its requested CPU and memory fit in a single NUMA node.
— topology-manager-scope sets the scope at which the Topology Manager requests topology hints and hint providers generate them. It can be `container` or `pod`.
— cpu-manager-policy sets the CPU Manager policy. `static` allows containers with exclusive CPU requests to be pinned to specific CPUs, which makes single-NUMA CPU placement possible.
— memory-manager-policy sets the Memory Manager policy. `Static` enables single-NUMA reservation of memory.
— reserved-memory tells the Memory Manager which memory it must not allocate to container workloads. `0:memory=2Gi` reserves 2Gi of memory on NUMA node 0 (you can reserve memory on every NUMA node by passing a list of such entries).
— kube-reserved specifies how much of the reserved CPU and memory Kubernetes itself (kubelet, container runtime) will use.
— system-reserved specifies how much of the reserved resources the operating system will use. The per-NUMA reservations in reserved-memory must add up to kube-reserved plus system-reserved (plus the hard eviction threshold, if set): for example, when kube-reserved memory is 1Gi and system-reserved memory is 1Gi, then reserved-memory must total 2Gi.
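The same settings can also be expressed in a kubelet configuration file instead of command-line flags. A sketch using the KubeletConfiguration API (field names can vary between Kubernetes versions, so verify against the docs for your cluster):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: single-numa-node
topologyManagerScope: pod
cpuManagerPolicy: static
memoryManagerPolicy: Static
reservedMemory:
  - numaNode: 0
    limits:
      memory: 2Gi
kubeReserved:
  cpu: "1"
  memory: 1Gi
systemReserved:
  memory: 1Gi
```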
To run a pod or container within one NUMA node, it must be in the Guaranteed QoS class: every container must specify resource limits equal to its resource requests. Example of a Guaranteed pod:
apiVersion: v1
kind: Pod
metadata:
  name: numarunner
  namespace: default
spec:
  containers:
  - name: numarunner-0
    image: numarunner
    resources:
      limits:
        memory: "200Mi"
        cpu: "700m"
      requests:
        memory: "200Mi"
        cpu: "700m"

3. Let’s configure NUMA in RKE2
It’s easy to configure the kubelet in RKE2. All you need is to set the Ansible variable rke2_server_options. Don’t forget: the kube-reserved and system-reserved memory must add up to the reserved-memory total.
rke2_server_options:
  - "kubelet-arg: ['topology-manager-policy=single-numa-node', 'topology-manager-scope=pod', 'cpu-manager-policy=static', 'memory-manager-policy=Static', 'reserved-memory=0:memory=2Gi', 'kube-reserved=cpu=1,memory=1Gi', 'system-reserved=memory=1Gi']"

4. Let’s cry with NUMA in AWS
On AWS managed nodes, each node has a predefined amount of reserved memory, and this value must match the --reserved-memory value! It’s complicated, because the exact predefined value differs per instance type. Yes, it’s documented in various places, including how to calculate it (based on the maximum number of pods for the instance type), but at the time of writing this article the published numbers didn’t always match reality.
If you configure the wrong amount of memory, the kubelet on the EKS node will log (journalctl -u kubelet):
Oct 02 08:19:20 ip-172-31-25-233.ec2.internal kubelet[8243]: Error: failed to run Kubelet: the total amount "0" of type "memory" is not equal to the value "8462Mi" determined by Node

With this error, we have all the information needed to configure NUMA for exactly your instance type:
export KUBELET_EXTRA_ARGS='--topology-manager-policy=single-numa-node --topology-manager-scope=pod --cpu-manager-policy=static --memory-manager-policy=Static --reserved-memory="0:memory=8462Mi" --kube-reserved="cpu=1"'

5. It’s probably not for everyday use…?
At KubeCon 2024 in Paris, we stumbled upon a presentation about leveraging NUMA in Kubernetes. A new technology was introduced, making configuration so seamless that I found myself asking why I hadn’t started using it in my work yet: https://intel.github.io/cri-resource-manager/stable/docs/policy/balloons.html
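For the curious, here is a rough sketch of what a balloons policy configuration can look like. The field names below are written from memory of the linked documentation, so treat all of them as assumptions and verify them against the docs for your cri-resource-manager version:

```yaml
# Sketch only: all key names are assumptions based on the balloons
# policy documentation; check them against your installed version.
policy:
  Active: balloons
  ReservedResources:
    CPU: 1000m
  balloons:
    PinCPU: true      # pin containers to their balloon's CPUs
    PinMemory: true   # pin containers to the closest NUMA memory
    BalloonTypes:
      - Name: "numa-aware"        # hypothetical balloon type name
        MinCPUs: 2
        MaxCPUs: 8
        Namespaces:
          - "numa-workloads"      # hypothetical namespace
```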
Notes:
— Let me know if the article helped you!
