Kubernetes: Backups and recovery

If you work with Kubernetes, you know you need to set up backups sooner rather than later. This article explains why to back up, what to back up, and how to do it.

Why?

There are several reasons to back up the state of your cluster:

  • To recover from disasters, such as someone accidentally deleting a namespace.
  • To test upgrades: you can quickly spin up a new cluster from backups and test scenarios such as Kubernetes upgrades or, if you use a service mesh such as Istio, mesh upgrades.
  • To replicate your environment: once you have a snapshot of the k8s state, you can selectively choose what to re-create and replicate it into your other environments, such as staging.
  • To run planned migrations: if you want to migrate from one cloud provider to another, creating a new cluster and replaying the Kubernetes state from backups shouldn’t be a big deal.

What?

Now, what should you back up?

  • Every resource on the Kubernetes platform is an API object, so you need to back up all k8s resources as configs.
  • If you run stateful containers, you should also back up your persistent volumes.
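As a rough illustration of the first point, resources can be dumped to YAML files with kubectl. This is only a sketch, not how any particular tool does it: the function name `export_resources`, the output layout, and the short list of resource types are all assumptions made for this example.

```shell
# Sketch: dump selected resource types from every namespace into
# <out_dir>/<namespace>/<type>.yaml. Names and layout are illustrative.

# Resource types to export (an assumed, incomplete list; adjust as needed).
RESOURCE_TYPES="deployment service configmap ingress"

export_resources() {
  local out_dir="$1"
  local namespaces ns type

  # Space-separated list of namespace names.
  namespaces="$(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}')" || return 1

  for ns in ${namespaces}; do
    mkdir -p "${out_dir}/${ns}" || return 1
    for type in ${RESOURCE_TYPES}; do
      kubectl -n "${ns}" get "${type}" -o yaml > "${out_dir}/${ns}/${type}.yaml" || return 1
    done
  done
}

# Example: export_resources ./cluster-backup
```

Persistent volume data is not covered by an export like this; only the API objects describing the volumes would be.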

How?

I’ve investigated a couple of tools:

  1. Ark from Heptio: This looks like a promising project with its own set of utilities for backup and restore. It can also call the underlying cloud provider, for example to take snapshots of persistent volumes.
  2. kube-backup: All it does is export the configured k8s resources using kubectl and push them to a git repository of your choice. It can also store k8s secrets in the repository, encrypted with git-crypt. However, you can’t use it to snapshot persistent volumes.
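The idea behind the second approach, versioning exported configs in git, can be sketched roughly as follows. This is an illustration only and not kube-backup’s actual code; the function name `commit_backup` and the commit message format are assumptions.

```shell
# Sketch: commit an export directory to git only when something changed.
# Assumes backup_dir is already a git working tree with user.name and
# user.email configured.
commit_backup() {
  local backup_dir="$1"

  git -C "${backup_dir}" add --all || return 1

  # "git diff --cached --quiet" exits 0 when nothing is staged.
  if git -C "${backup_dir}" diff --cached --quiet; then
    echo "no changes"
    return 0
  fi

  git -C "${backup_dir}" commit --quiet \
    -m "Cluster backup $(date -u +%Y-%m-%dT%H:%M:%SZ)" || return 1
  echo "committed"
}

# Example (pushing assumes a configured remote named "origin"):
#   commit_backup ./cluster-backup && git -C ./cluster-backup push origin HEAD
```

Committing only on change keeps the history readable: each commit then corresponds to an actual change in the cluster.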

If all the apps in your k8s cluster are stateless, you can choose solution #2. If you also need to recover state, i.e. if you use persistent volumes, consider #1.

Personally, I liked #2 because you get to see everything that happened in your cluster in one place: your git repository. I’ve made some fixes and minor improvements to kube-backup, and you can check out my repo here: https://github.com/krishnapmv/kube-backup

While the repo has instructions on how to set up backups, it doesn’t explain how to recover from them. So the next section shows how to use these backups to create a new cluster.

How to use backups (from kube-backup) to create a new cluster?

In this example I use GKE, but you can use pretty much any hosted container orchestration platform, such as AKS.

It assumes that:

  • you have the required access to your GCP project
  • you have already configured your gcloud CLI
  • you have configured your VPC

Create a cluster using GKE. Note the pod network range: 10.60.0.0/15

gcloud container clusters create test-cluster --additional-zones=asia-south1-c --cluster-ipv4-cidr=10.60.0.0/15 --cluster-version=1.9.6-gke.1 --disk-size=20 --zone=asia-south1-b --subnetwork=<> --network=<>

This creates a new k8s cluster named “test-cluster” with a default node pool. A node pool is a subset of the k8s worker nodes in the cluster that share the same configuration.

You can also create additional node pools of your choice, labeling and tainting them appropriately. For example, the command below creates a node pool named “service-pool-1” and attaches the label “nodepool=service”.

gcloud container node-pools create service-pool-1 --cluster=test-cluster --disk-size=100 --enable-autorepair --image-type=COS --machine-type=custom-32-65536 --node-labels=nodepool=service --enable-autoscaling --max-nodes=25 --min-nodes=1

Once your k8s cluster is ready, set up kubectl and confirm that the cluster is indeed ready by running “kubectl get nodes”. It should list your worker nodes and their state.
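On GKE, setting up kubectl usually means fetching cluster credentials with gcloud. Here is a minimal sketch, assuming the cluster name and zone from the create command above; the function name `setup_kubectl` and the 300-second timeout are choices made for this example.

```shell
# Sketch: fetch kubectl credentials for a GKE cluster, wait until every
# node reports Ready, then list the nodes.
setup_kubectl() {
  local cluster="$1"
  local zone="$2"

  gcloud container clusters get-credentials "${cluster}" --zone "${zone}" || return 1

  # Needs a kubectl client recent enough to provide "kubectl wait".
  kubectl wait --for=condition=Ready nodes --all --timeout=300s || return 1

  kubectl get nodes
}

# Example: setup_kubectl test-cluster asia-south1-b
```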

Now that your cluster is ready, you can start recovering from backups. First you need to grant yourself admin permissions:

kubectl create clusterrolebinding <user>-admin --clusterrole cluster-admin --user <your_email>
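If you would rather not type the email by hand, it can be read from the active gcloud configuration. This is a sketch under the assumption that the gcloud account is the identity you use against the cluster; the function name `grant_cluster_admin` and the way the binding name is derived are illustrative.

```shell
# Sketch: bind the active gcloud account to the cluster-admin role.
grant_cluster_admin() {
  local email

  email="$(gcloud config get-value account)" || return 1
  [ -n "${email}" ] || return 1

  # Binding name is derived from the part of the address before "@".
  kubectl create clusterrolebinding "${email%%@*}-admin" \
    --clusterrole=cluster-admin --user="${email}"
}

# Example: grant_cluster_admin
```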

Now, recovering from backups is as simple as running:

kubectl apply -f <backup_dir or backup_file>

However, the order is important. First, you need to restore global resources such as namespaces and customresourcedefinitions.

After that, you can go namespace by namespace, starting with kube-system, then default, and then your custom namespaces. For example, to restore the kube-system configs into the same namespace on the new cluster, run:

kubectl -n kube-system apply -f kube-system/
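The ordering described above can be sketched as a small script. The directory layout is an assumption and should be checked against your own backup repository: here, global resources are taken to be `*.yaml` files at the top of the backup directory, with one subdirectory per namespace. The function name `restore_backup` is hypothetical.

```shell
# Sketch: restore global resources first, then kube-system and default,
# then every other namespace. The layout described above is assumed.
restore_backup() {
  local backup_dir="$1"
  local f ns dir

  # 1. Global resources (namespaces, customresourcedefinitions, ...).
  for f in "${backup_dir}"/*.yaml; do
    [ -e "${f}" ] || continue
    kubectl apply -f "${f}" || return 1
  done

  # 2. kube-system and default first, if they were backed up.
  for ns in kube-system default; do
    [ -d "${backup_dir}/${ns}" ] || continue
    kubectl -n "${ns}" apply -f "${backup_dir}/${ns}/" || return 1
  done

  # 3. Every remaining namespace directory.
  for dir in "${backup_dir}"/*/; do
    [ -d "${dir}" ] || continue
    ns="$(basename "${dir}")"
    case "${ns}" in
      kube-system|default) continue ;;
    esac
    kubectl -n "${ns}" apply -f "${dir}" || return 1
  done
}

# Example: restore_backup ./cluster-backup
```

The function stops at the first failed apply, so a problem with a global resource is noticed before anything that depends on it is applied.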

That’s it! It’s as simple as that. I hope this post is helpful; if it is, please share it. Here is the link to my GitHub repo for this project: https://github.com/krishnapmv/kube-backup