BACKUP AND RESTORE ON KUBERNETES HA CLUSTER

Murat Bilal
Nov 10, 2023


All Kubernetes objects are stored in etcd. Periodically backing up the etcd cluster data is important for recovering Kubernetes clusters under disaster scenarios, such as losing all control plane nodes. The snapshot file contains the entire cluster state and other critical information.

In this Kubernetes tutorial, you will learn how to back up and restore etcd on a Kubernetes cluster using an etcd snapshot.

etcd has a built-in snapshot mechanism. etcdctl is the command-line utility that interacts with etcd to take snapshots.

You can explore the various options that etcdctl provides through its built-in help.

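Running the client with --help, or any subcommand with --help, lists the available commands and flags:

# List all etcdctl commands, then the flags accepted by the snapshot subcommand
ETCDCTL_API=3 etcdctl --help
ETCDCTL_API=3 etcdctl snapshot --help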

For durability and high availability, run etcd as a multi-node cluster in production and back it up periodically. A five-member cluster is recommended for production use.

Now it is time to take a snapshot using etcd-client:

1. Install etcd-client on the Ubuntu machine. You can find more information about the OS and the cluster installation steps for this environment from this link.

sudo apt install etcd-client
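You can confirm the installation and see which client version the package shipped with (the exact version depends on your Ubuntu release):

# Print the etcdctl client version and the API version it speaks
ETCDCTL_API=3 etcdctl version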

2. Check the etcd member list:

sudo ETCDCTL_API=3 etcdctl --endpoints localhost:2379 --cert=/etc/kubernetes/pki/etcd/server.crt  --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt  member list
7298d0ffe808f928, started, controller2, https://10.252.54.133:2380, https://10.252.54.133:2379
97548e4644ef8b94, started, controller1, https://10.252.54.45:2380, https://10.252.54.45:2379
ac50bf4ccc6f21b0, started, controller3, https://10.252.54.88:2380, https://10.252.54.88:2379
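
Optionally, before taking the snapshot, confirm that the endpoint you are about to back up from is healthy, using the same TLS flags as above:

sudo ETCDCTL_API=3 etcdctl --endpoints localhost:2379 \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  endpoint health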

3. Take a snapshot to /tmp on any controller host (in this scenario, controller1):

 sudo ETCDCTL_API=3 etcdctl --endpoints localhost:2379   --cert=/etc/kubernetes/pki/etcd/server.crt   --key=/etc/kubernetes/pki/etcd/server.key   --cacert=/etc/kubernetes/pki/etcd/ca.crt   snapshot save /tmp/etcdbackup
2023-11-10 13:06:18.995931 I | clientv3: opened snapshot stream; downloading
2023-11-10 13:06:19.750636 I | clientv3: completed snapshot read; closing
Snapshot saved at /tmp/etcdbackup
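
Before relying on the snapshot, it is worth verifying it. The snapshot status subcommand reads the file locally (no TLS flags needed) and reports its hash, revision, total key count and size:

# Inspect the snapshot file; --write-out=table prints a readable summary
sudo ETCDCTL_API=3 etcdctl --write-out=table snapshot status /tmp/etcdbackup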

4. Create a test pod called testnginx. The current output of kubectl get pods:

alcalab@controller1:~$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
nginx-app-5777b5f95-2m8fz   1/1     Running   0          43s
nginx-app-5777b5f95-mh4wb   1/1     Running   0          43s

alcalab@controller1:~$ kubectl run testnginx --image=nginx
pod/testnginx created

alcalab@controller1:~$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
nginx-app-5777b5f95-2m8fz   1/1     Running   0          113s
nginx-app-5777b5f95-mh4wb   1/1     Running   0          113s
testnginx                   1/1     Running   0          86s

5. Go to the /etc/kubernetes/manifests directory and move all of its contents somewhere else, on all controller nodes. With the manifests gone, the kubelet stops the control plane static pods, so etcd can be restored safely:

alcalab@controller1:~$ cd /etc/kubernetes/manifests
alcalab@controller1:/etc/kubernetes/manifests$ sudo ls ..
admin.conf controller-manager.conf kubelet.conf manifests pki scheduler.conf tmp
alcalab@controller1:/etc/kubernetes/manifests$ sudo mv * ..
alcalab@controller1:/etc/kubernetes/manifests$ ls
alcalab@controller1:/etc/kubernetes/manifests$ ls ../
admin.conf etcd.yaml kube-controller-manager.yaml kube-scheduler.yaml pki tmp
controller-manager.conf kube-apiserver.yaml kubelet.conf manifests scheduler.conf
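
Once the manifests are removed, the kubelet stops the corresponding static pods after a short delay, and kubectl stops responding because kube-apiserver is down. A quick way to confirm this on each node, assuming a containerd runtime with crictl installed:

# No etcd or kube-apiserver container should remain in the running list
sudo crictl ps | grep -E 'etcd|kube-apiserver' \
  || echo "control plane static pods are stopped"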

6. Restore the backup on controller1, where the snapshot was taken:

On controller1, restore /tmp/etcdbackup to /var/lib/etcd-new. Do not restore into the original data directory, /var/lib/etcd/.

alcalab@controller1:/etc/kubernetes/manifests$ sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcdbackup \
--name controller1 \
--initial-cluster controller1=https://10.252.54.45:2380,controller2=https://10.252.54.133:2380,controller3=https://10.252.54.88:2380 \
--data-dir /var/lib/etcd-new \
--initial-advertise-peer-urls https://10.252.54.45:2380 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt
2023-11-10 16:12:03.753424 I | mvcc: restore compact to 4485
2023-11-10 16:12:03.774471 I | etcdserver/membership: added member 2cf569c03fd68935 [https://10.252.54.133:2380] to cluster 6225e4007f2763e7
2023-11-10 16:12:03.774668 I | etcdserver/membership: added member 55529448cf60e3aa [https://10.252.54.88:2380] to cluster 6225e4007f2763e7
2023-11-10 16:12:03.774743 I | etcdserver/membership: added member 97548e4644ef8b94 [https://10.252.54.45:2380] to cluster 6225e4007f2763e7
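
Because snapshot restore only writes a local data directory, the same snapshot must also be restored on controller2 and controller3, each with its own --name and --initial-advertise-peer-urls but the same --initial-cluster string. A sketch for controller2, using the IP addresses from the member list above (copy /tmp/etcdbackup to the node first; controller3 is analogous):

sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcdbackup \
  --name controller2 \
  --initial-cluster controller1=https://10.252.54.45:2380,controller2=https://10.252.54.133:2380,controller3=https://10.252.54.88:2380 \
  --data-dir /var/lib/etcd-new \
  --initial-advertise-peer-urls https://10.252.54.133:2380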

After running the restore on each controller, check from controller2 or controller3 that the new data directory has been created:

root@controller2:/etc/kubernetes/manifests# ls /var/lib/etcd-new/member/
snap wal

7. Check the new path on controller1:

alcalab@controller1:/etc/kubernetes/manifests$ sudo ls /var/lib/etcd-new/member
snap wal

8. Open the etcd manifest with sudo vi /etc/kubernetes/etcd.yaml, go to the hostPath volume, and change the path from /var/lib/etcd to /var/lib/etcd-new (on all controller nodes).
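
If you prefer to script this edit rather than doing it by hand, something along these lines can work. It is a sketch that assumes the default kubeadm manifest layout, where the etcd-data volume's hostPath line ends exactly with /var/lib/etcd, so double-check the file afterwards:

# Change the etcd data hostPath; the end-of-line anchor makes the edit idempotent
# (re-running it will not touch the already-updated path). Run on every controller.
sudo sed -i 's|path: /var/lib/etcd$|path: /var/lib/etcd-new|' /etc/kubernetes/etcd.yaml
sudo grep -n '/var/lib/etcd-new' /etc/kubernetes/etcd.yaml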

9. Move the files from step 5 back to their original location (on all controller nodes):

alcalab@controller1:/etc/kubernetes/manifests$ sudo mv ../*.yaml .
alcalab@controller1:/etc/kubernetes/manifests$ ls
etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
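
After the kubelet picks the manifests up again it recreates the static pods, and it can take a minute or two before the API server answers. A small wait loop (hypothetical, adjust the sleep to taste) makes the check less tedious:

# Poll until kube-apiserver responds, then list the control plane pods
until kubectl get nodes >/dev/null 2>&1; do sleep 5; done
kubectl get pods -n kube-system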

10. Now you should not see the testnginx pod, because it was created after the backup was taken:

alcalab@controller1:/etc/kubernetes/manifests$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
nginx-app-5c64488cdf-27tgm   1/1     Running   0          10d
nginx-app-5c64488cdf-p2rks   1/1     Running   0          10d

Conclusion:

The etcd database is an integral part of the system, holding the information about the cluster state. It can be backed up either manually or automatically, depending on your backup solution. The manual method is the etcdctl snapshot save command, which writes the entire keyspace to a single snapshot file. etcd backup and restore are essential tasks in Kubernetes cluster administration.
