Published in 01001101

Azure Kubernetes Service — why you should take care of your nodes

Azure Kubernetes Service (AKS) is a managed Kubernetes offering from Azure. In principle, this means you don’t have to worry about the Kubernetes infrastructure and can focus on the apps you deploy on it. Unfortunately, that is not entirely true for your worker nodes, as the documentation points out:

To protect your clusters, security updates are automatically applied to Linux nodes in AKS. These updates include OS security fixes or kernel updates. Some of these updates require a node reboot to complete the process. AKS doesn’t automatically reboot these Linux nodes to complete the update process.

In other words, Azure installs all required updates and security patches on its own, but you have to decide when to restart your nodes if a patch requires it. This should be automated to make sure all of your worker nodes stay secure and up to date. The guide below applies to Linux nodes only; Windows nodes are not supported. It is also only needed for the worker nodes, not for the master nodes, which are managed by Azure.

Ubuntu-based AKS nodes create a file called /var/run/reboot-required as soon as an installed patch requires a reboot. This means we can use the file as an indicator to find nodes that need a reboot. The easiest way to detect the file and to initiate the restart is an open-source project called kured (KUbernetes REboot Daemon) by Weaveworks.
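
The sentinel-file idea boils down to a plain file-existence test on the node. The following is a minimal Python sketch of that check, not kured's actual implementation (kured is written in Go); the function name `reboot_required` is hypothetical, and the default path is the one passed to `--reboot-sentinel` in the DaemonSet further down.

```python
from pathlib import Path

# Sentinel path as used by the --reboot-sentinel flag in this article.
DEFAULT_SENTINEL = Path("/var/run/reboot-required")


def reboot_required(sentinel: Path = DEFAULT_SENTINEL) -> bool:
    """Return True if the sentinel file exists, i.e. a reboot is pending."""
    return sentinel.exists()


if __name__ == "__main__":
    # Run this on a node (or any Ubuntu machine) to see its current state.
    print("reboot required:", reboot_required())
```

Passing the path as a parameter keeps the check easy to try out against a temporary file instead of the real one.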

Kured runs as a DaemonSet, which schedules a Pod on every worker node to check whether the reboot-required file exists. Because it is a DaemonSet, all nodes are checked, including newly created ones. As soon as kured finds a node that needs to be restarted, it schedules the reboot according to your settings:

  • start and end hours (e.g. only from 2–5 am)
  • defined days (e.g. only on weekends)
  • prevent reboots while pods with certain labels are running
  • skip reboots while Prometheus alerts are active
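
To make the interplay of these settings concrete, here is a rough Python sketch of the decision they describe. It is an illustration under simplifying assumptions, not kured's real logic: `RebootPolicy`, `in_window` and `may_reboot` are hypothetical names, the window bounds are treated as inclusive, windows that wrap around midnight are not handled, and the caller is expected to pass `now` already converted to the configured time zone.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class RebootPolicy:
    """Hypothetical container for the scheduling options listed above."""
    # datetime.weekday() numbering: Monday=0 ... Saturday=5, Sunday=6
    reboot_days: frozenset = frozenset({5, 6})
    start: time = time(2, 0)
    end: time = time(5, 0)


def in_window(policy: RebootPolicy, now: datetime) -> bool:
    """True if `now` falls on an allowed day and inside the time window."""
    return (
        now.weekday() in policy.reboot_days
        and policy.start <= now.time() <= policy.end
    )


def may_reboot(
    policy: RebootPolicy,
    now: datetime,
    sentinel_present: bool,
    blocking_pods: int = 0,
    active_alerts: int = 0,
) -> bool:
    """Combine the sentinel, the schedule and the two blocking conditions."""
    if not sentinel_present:
        return False  # nothing to do, no patch is waiting for a reboot
    if not in_window(policy, now):
        return False  # outside the allowed days or hours
    # Any labeled pod or firing alert postpones the reboot.
    return blocking_pods == 0 and active_alerts == 0


if __name__ == "__main__":
    policy = RebootPolicy()
    saturday_3am = datetime(2024, 1, 6, 3, 0)
    print(may_reboot(policy, saturday_3am, sentinel_present=True))
```

The point of the sketch is the ordering: a node is only rebooted when a reboot is pending, the schedule allows it, and nothing is blocking it.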

Kured also exposes metrics that Prometheus can scrape, and it can send Slack notifications via a Slack hook.

How to start?

First of all, you need to create a Service Account plus some Roles and Bindings to grant the required privileges. You can skip this part if RBAC is not enabled on your cluster, although running without RBAC is not something you should do:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kured
rules:
# Allow kured to read nodes and to cordon/uncordon them
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "patch"]
# Allow kured to drain a node by listing and removing its pods
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list","delete","get"]
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kured
subjects:
- kind: ServiceAccount
  name: kured
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kured
rules:
# Allow kured to update its own DaemonSet, where it keeps its reboot lock
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  resourceNames: ["kured"]
  verbs: ["update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: kured
subjects:
- kind: ServiceAccount
  namespace: kube-system
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kured
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kured
  namespace: kube-system
---

Now you can create the DaemonSet:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured            # Must match `--ds-name`
  namespace: kube-system # Must match `--ds-namespace`
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: kured
    spec:
      serviceAccountName: kured
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      hostPID: true # Needed to enter the host's namespaces for the reboot
      restartPolicy: Always
      nodeSelector:
        beta.kubernetes.io/os: linux # Linux nodes only
      containers:
      - name: kured
        image: docker.io/weaveworks/kured:1.2.0
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true # Required to trigger the reboot on the host
        env:
        # Name of the node this pod runs on
        - name: KURED_NODE_ID
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        command:
        - /usr/bin/kured
        - --ds-name=kured
        - --ds-namespace=kube-system
        - --reboot-days=su,sa          # Only reboot on Sundays and Saturdays
        - --start-time=2:00            # Start of the reboot window
        - --end-time=5:00              # End of the reboot window
        - --time-zone=Europe/Berlin    # Time zone for start and end time
        - --period=1h                  # How often to check for the sentinel
        - --reboot-sentinel=/var/run/reboot-required
---

With the example above, kured restarts my nodes when needed, but only on Saturdays and Sundays between 2 and 5 am (Europe/Berlin time). Before a worker node is restarted, kured cordons and drains it, so the entire workload is shifted to other nodes within the cluster.
