Disabling Transparent Huge Pages in Kubernetes

Allan Lei
Sep 15, 2018 · 2 min read

I recently needed to revisit some of our deployments, which were created in the earlier days of GKE when some useful features were not yet available. One component I revisited was the disabling of the Transparent Huge Pages (THP) kernel setting, as recommended for mongo and redis.

The solution at the time was to use a DaemonSet running a startup script with gcr.io/google-containers/startup-script:v1.

There are several areas that could be improved:

  • hostPID and securityContext seemed excessive
  • No checks if the setting actually changed
  • gcr.io/google-containers/startup-script:v1 is a relatively large image
  • Timing conflicts with pod scheduling

hostPID and securityContext

volumes:
- name: sys
  hostPath:
    path: /sys
volumeMounts:
- name: sys
  mountPath: /rootfs/sys

Checking if settings applied

grep -q -F '[never]' /sys/kernel/mm/transparent_hugepage/enabled
grep -q -F '[never]' /sys/kernel/mm/transparent_hugepage/defrag
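
As a rough sketch of how the write and the check could be combined in the startup script, the helpers below write never to both files and then confirm that the kernel reports [never] as the active value. The function names thp_write_never and thp_is_never are made up for illustration, and the default root of /rootfs/sys assumes the hostPath mount shown earlier; neither comes from the original script.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original startup script.
# The optional first argument is the sysfs root; /rootfs/sys is an assumed default.

# Write "never" into both THP control files under the given sysfs root.
thp_write_never() {
  root="${1:-/rootfs/sys}"
  for f in enabled defrag; do
    echo never > "$root/kernel/mm/transparent_hugepage/$f" || return 1
  done
}

# Succeed only if both files report [never] as the active value.
# The pattern is quoted so the shell cannot treat it as a glob.
thp_is_never() {
  root="${1:-/rootfs/sys}"
  for f in enabled defrag; do
    grep -q -F '[never]' "$root/kernel/mm/transparent_hugepage/$f" || return 1
  done
}
```

Running thp_write_never && thp_is_never as the container command would make the container exit non-zero when the setting did not stick, so the failure shows up in the pod status instead of passing silently.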

Large Images

Pod Scheduling Conflicts

For this problem, we can use labels on nodes in conjunction with nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution. With this in place, a pod stays Pending until a node with the proper labels exists.

To label a node from within a pod, there are some prerequisites.

  • kubectl label node needs RBAC permission (skip if it is not required). In my case, I used the node-controller service account, which is created by default in the kube-system namespace on GKE, by setting serviceAccountName: node-controller
  • The pod needs to know the name of the node it runs on, via the Downward API

initContainers:
- name: label-node
  image: swaglive/kubectl:1.11
  command: ["kubectl"]
  args: ["label", "node", "--overwrite", "$(NODE_NAME)", "sysctl/mm.transparent_hugepage.enabled=never", "sysctl/mm.transparent_hugepage.defrag=never"]
  env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

Now to add the label restriction to pods that need it.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: sysctl/mm.transparent_hugepage.enabled
          operator: In
          values:
          - "never"
        - key: sysctl/mm.transparent_hugepage.defrag
          operator: In
          values:
          - "never"
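
If a pod stays Pending longer than expected, one way to see whether any node has been labeled yet is to query with the same keys. This is only a sketch: thp_selector and list_thp_nodes are hypothetical helper names, and it assumes kubectl is already configured for the cluster.

```shell
#!/bin/sh
# Hypothetical helpers for debugging; label keys and values are copied
# from the affinity rule above.

# Build a comma-separated equality selector matching both labels.
thp_selector() {
  printf '%s,%s' \
    'sysctl/mm.transparent_hugepage.enabled=never' \
    'sysctl/mm.transparent_hugepage.defrag=never'
}

# List nodes that already carry both labels (requires cluster access).
list_thp_nodes() {
  kubectl get nodes -l "$(thp_selector)"
}
```

An empty result from list_thp_nodes would suggest that the labeling init container has not yet completed on any node, which is worth checking before looking at the pod itself.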

Putting it all together

Using it with a redis deployment
