BACKUP AND RESTORE ON KUBERNETES HA CLUSTER

Murat Bilal
Nov 10, 2023


All Kubernetes objects are stored in etcd. Periodically backing up the etcd cluster data is important for recovering Kubernetes clusters under disaster scenarios, such as losing all control plane nodes. The snapshot file contains the entire cluster state and other critical information.

In this Kubernetes tutorial, you will learn how to back up and restore etcd on a Kubernetes cluster using an etcd snapshot.

etcd has a built-in snapshot mechanism. etcdctl is the command-line utility that interacts with etcd to take snapshots.

You can explore the various options that etcdctl provides through its built-in help.

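Running the client with --help, or any subcommand with --help, lists the available commands and flags:

# List all etcdctl commands, then the flags accepted by the snapshot subcommand
ETCDCTL_API=3 etcdctl --help
ETCDCTL_API=3 etcdctl snapshot --help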

For durability and high availability, run etcd as a multi-node cluster in production and back it up periodically. A five-member cluster is recommended for production use.

Now it is time to take a snapshot using etcd-client:

1. Install etcd-client on the Ubuntu machine. You can find more information about the OS and the cluster installation steps for this environment from this link.

sudo apt install etcd-client
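You can confirm the installation and see which client version the package shipped with (the exact version depends on your Ubuntu release):

# Print the etcdctl client version and the API version it speaks
ETCDCTL_API=3 etcdctl version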

2. Check the etcd member list:

sudo ETCDCTL_API=3 etcdctl --endpoints localhost:2379 --cert=/etc/kubernetes/pki/etcd/server.crt  --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt  member list
7298d0ffe808f928, started, controller2, https://10.252.54.133:2380, https://10.252.54.133:2379
97548e4644ef8b94, started, controller1, https://10.252.54.45:2380, https://10.252.54.45:2379
ac50bf4ccc6f21b0, started, controller3, https://10.252.54.88:2380, https://10.252.54.88:2379
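
Optionally, before taking the snapshot, confirm that the endpoint you are about to back up from is healthy, using the same TLS flags as above:

sudo ETCDCTL_API=3 etcdctl --endpoints localhost:2379 \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  endpoint health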

3. Take a snapshot to /tmp on any controller host (in this scenario, controller1):

 sudo ETCDCTL_API=3 etcdctl --endpoints localhost:2379   --cert=/etc/kubernetes/pki/etcd/server.crt   --key=/etc/kubernetes/pki/etcd/server.key   --cacert=/etc/kubernetes/pki/etcd/ca.crt   snapshot save /tmp/etcdbackup
2023-11-10 13:06:18.995931 I | clientv3: opened snapshot stream; downloading
2023-11-10 13:06:19.750636 I | clientv3: completed snapshot read; closing
Snapshot saved at /tmp/etcdbackup
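
Before relying on the snapshot, it is worth verifying it. The snapshot status subcommand reads the file locally (no TLS flags needed) and reports its hash, revision, total key count and size:

# Inspect the snapshot file; --write-out=table prints a readable summary
sudo ETCDCTL_API=3 etcdctl --write-out=table snapshot status /tmp/etcdbackup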

4. Create a test pod called testnginx. The current output of kubectl get pods:

alcalab@controller1:~$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
nginx-app-5777b5f95-2m8fz   1/1     Running   0          43s
nginx-app-5777b5f95-mh4wb   1/1     Running   0          43s

alcalab@controller1:~$ kubectl run testnginx --image=nginx
pod/testnginx created

alcalab@controller1:~$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
nginx-app-5777b5f95-2m8fz   1/1     Running   0          113s
nginx-app-5777b5f95-mh4wb   1/1     Running   0          113s
testnginx                   1/1     Running   0          86s

5. Go to the /etc/kubernetes/manifests directory and move all of its contents somewhere else, on all controller nodes. With the manifests gone, the kubelet stops the control plane static pods, so etcd can be restored safely:

alcalab@controller1:~$ cd /etc/kubernetes/manifests
alcalab@controller1:/etc/kubernetes/manifests$ sudo ls ..
admin.conf controller-manager.conf kubelet.conf manifests pki scheduler.conf tmp
alcalab@controller1:/etc/kubernetes/manifests$ sudo mv * ..
alcalab@controller1:/etc/kubernetes/manifests$ ls
alcalab@controller1:/etc/kubernetes/manifests$ ls ../
admin.conf etcd.yaml kube-controller-manager.yaml kube-scheduler.yaml pki tmp
controller-manager.conf kube-apiserver.yaml kubelet.conf manifests scheduler.conf
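
Once the manifests are removed, the kubelet stops the corresponding static pods after a short delay, and kubectl stops responding because kube-apiserver is down. A quick way to confirm this on each node, assuming a containerd runtime with crictl installed:

# No etcd or kube-apiserver container should remain in the running list
sudo crictl ps | grep -E 'etcd|kube-apiserver' \
  || echo "control plane static pods are stopped"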

6. Restore the backup on controller1, where the snapshot was taken:

On controller1, restore /tmp/etcdbackup to /var/lib/etcd-new. Do not restore into the original data directory, /var/lib/etcd/.

alcalab@controller1:/etc/kubernetes/manifests$ sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcdbackup \
--name controller1 \
--initial-cluster controller1=https://10.252.54.45:2380,controller2=https://10.252.54.133:2380,controller3=https://10.252.54.88:2380 \
--data-dir /var/lib/etcd-new \
--initial-advertise-peer-urls https://10.252.54.45:2380 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt
2023-11-10 16:12:03.753424 I | mvcc: restore compact to 4485
2023-11-10 16:12:03.774471 I | etcdserver/membership: added member 2cf569c03fd68935 [https://10.252.54.133:2380] to cluster 6225e4007f2763e7
2023-11-10 16:12:03.774668 I | etcdserver/membership: added member 55529448cf60e3aa [https://10.252.54.88:2380] to cluster 6225e4007f2763e7
2023-11-10 16:12:03.774743 I | etcdserver/membership: added member 97548e4644ef8b94 [https://10.252.54.45:2380] to cluster 6225e4007f2763e7
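
Because snapshot restore only writes a local data directory, the same snapshot must also be restored on controller2 and controller3, each with its own --name and --initial-advertise-peer-urls but the same --initial-cluster string. A sketch for controller2, using the IP addresses from the member list above (copy /tmp/etcdbackup to the node first; controller3 is analogous):

sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcdbackup \
  --name controller2 \
  --initial-cluster controller1=https://10.252.54.45:2380,controller2=https://10.252.54.133:2380,controller3=https://10.252.54.88:2380 \
  --data-dir /var/lib/etcd-new \
  --initial-advertise-peer-urls https://10.252.54.133:2380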

After running the restore on each controller, check from controller2 or controller3 that the new data directory has been created:

root@controller2:/etc/kubernetes/manifests# ls /var/lib/etcd-new/member/
snap wal

7. Check the new path on controller1:

alcalab@controller1:/etc/kubernetes/manifests$ sudo ls /var/lib/etcd-new/member
snap wal

8. Open the etcd manifest with sudo vi /etc/kubernetes/etcd.yaml, go to the hostPath volume, and change the path from /var/lib/etcd to /var/lib/etcd-new (on all controller nodes).
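
If you prefer to script this edit rather than doing it by hand, something along these lines can work. It is a sketch that assumes the default kubeadm manifest layout, where the etcd-data volume's hostPath line ends exactly with /var/lib/etcd, so double-check the file afterwards:

# Change the etcd data hostPath; the end-of-line anchor makes the edit idempotent
# (re-running it will not touch the already-updated path). Run on every controller.
sudo sed -i 's|path: /var/lib/etcd$|path: /var/lib/etcd-new|' /etc/kubernetes/etcd.yaml
sudo grep -n '/var/lib/etcd-new' /etc/kubernetes/etcd.yaml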

9. Move the files from step 5 back to their original location (on all controller nodes):

alcalab@controller1:/etc/kubernetes/manifests$ sudo mv ../*.yaml .
alcalab@controller1:/etc/kubernetes/manifests$ ls
etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
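
After the kubelet picks the manifests up again it recreates the static pods, and it can take a minute or two before the API server answers. A small wait loop (hypothetical, adjust the sleep to taste) makes the check less tedious:

# Poll until kube-apiserver responds, then list the control plane pods
until kubectl get nodes >/dev/null 2>&1; do sleep 5; done
kubectl get pods -n kube-system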

10. Now you should not see the testnginx pod, because it was created after the backup was taken:

alcalab@controller1:/etc/kubernetes/manifests$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
nginx-app-5c64488cdf-27tgm   1/1     Running   0          10d
nginx-app-5c64488cdf-p2rks   1/1     Running   0          10d

Conclusion:

The etcd database is an integral part of the system, holding the information about the cluster state. It can be backed up either manually or automatically, depending on your backup solution. The manual method is the etcdctl snapshot save command, which writes the entire keyspace to a single snapshot file. etcd backup and restore are essential tasks in Kubernetes cluster administration.
