Disabling Transparent Huge Pages in Kubernetes

  • hostPID and securityContext seemed excessive
  • No checks if the setting actually changed
  • gcr.io/google-containers/startup-script:v1 is a relatively large image
  • Timing conflicts with pod scheduling

hostPID and securityContext

Instead of using hostPID and privileged: true, we can mount the host’s /sys into the pod as a volume.

volumes:
- name: sys
  hostPath:
    path: /sys
volumeMounts:
- name: sys
  mountPath: /rootfs/sys

Checking if settings applied

This part is straightforward. We simply grep each property and let grep's exit code signal whether the setting was applied.

grep -q -F '[never]' /sys/kernel/mm/transparent_hugepage/enabled
grep -q -F '[never]' /sys/kernel/mm/transparent_hugepage/defrag
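The check can be combined with the write into one small script. The following is only a sketch: the function names thp_disabled and apply_thp are hypothetical, and it assumes the directory passed in is the host's transparent_hugepage directory, mounted writable (for example under /rootfs/sys as in the volume mount above).

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the original manifest.

# thp_disabled DIR: succeed only if both knobs under DIR report [never].
# The kernel marks the active value with brackets, e.g. "always madvise [never]".
thp_disabled() {
    dir="$1"
    for knob in enabled defrag; do
        grep -q -F '[never]' "$dir/$knob" || return 1
    done
    return 0
}

# apply_thp DIR: write "never" to both knobs, then verify the result.
# Returns non-zero if a write fails or the kernel did not accept the value.
apply_thp() {
    dir="$1"
    for knob in enabled defrag; do
        echo never > "$dir/$knob" || return 1
    done
    thp_disabled "$dir"
}
```

In the container this could be invoked as apply_thp /rootfs/sys/kernel/mm/transparent_hugepage, so that a non-zero exit code fails the container when the setting did not stick.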

Large Images

This one is not a critical problem. gcr.io/google-containers/startup-script is 12.5MB, but since we are essentially just running a shell script, we can switch to a slimmer image such as busybox, which is only 1.15MB. Of course, busybox lacks the startup functionality of gcr.io/google-containers/startup-script. For this we can use initContainers, which were unavailable at the time.

Pod Scheduling Conflicts

This problem is a dependency conflict: redis or mongo can be scheduled on a node where the kernel-tuner has not yet completed. Because the process started before the setting was applied, it does not pick up the updated kernel settings and would need a restart. One way around this is to have the kernel-tuner label the node once it has finished, and to make dependent pods require that label through node affinity.

  • kubectl label node needs RBAC permission (skip this if it is not required). In my case, I used the node-controller service account, which is created by default in the kube-system namespace on GKE, by setting serviceAccountName: node-controller
  • The pod needs to know the name of the node it lives on, which it can get via the Downward API
initContainers:
- name: label-node
  image: swaglive/kubectl:1.11
  command: ["kubectl"]
  args: ["label", "node", "--overwrite", "$(NODE_NAME)", "sysctl/mm.transparent_hugepage.enabled=never", "sysctl/mm.transparent_hugepage.defrag=never"]
  env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: sysctl/mm.transparent_hugepage.enabled
          operator: In
          values:
          - "never"
        - key: sysctl/mm.transparent_hugepage.defrag
          operator: In
          values:
          - "never"

Putting it all together

The Adventures of Me
