Kubernetes StatefulSet Recovery from AWS Snapshots

Rosemary Wang
May 23, 2018

It’s been a while! I’ve been on the road a lot lately. Meanwhile, my thoughts have been occupied with the nuances of stateful applications and the question of persistence. A colleague provoked an interesting thought experiment about Kubernetes StatefulSets: what would we do if we had to perform disaster recovery on one? In the worst-case scenario where everything is deleted, what is the minimum backup we need to recover that stateful application? I pondered for quite some time…

Red Rock Canyon, a neat place to contemplate the nuances of persistence.

I tried it and tested the bounds of what is possible. I ended up not sticking to my main tenet of deploying everything locally because I wanted to try a more accurate production scenario on a cloud provider. Here, I’ll walk through the manual steps I took to recover a Kubernetes StatefulSet using a snapshot of an Elastic Block Store (EBS) volume on Amazon Web Services (AWS).

What’s a Kubernetes StatefulSet?

StatefulSets allow you to deploy stateful applications in Kubernetes by scheduling a pod backed by a storage volume of your choice. A StatefulSet neatly handles mounting and lifecycle tasks, which is particularly useful when you want to persist some data. StatefulSets are generally available in Kubernetes 1.9, so a lot of what you’ll see here requires version 1.9 or higher. You can choose from many backend storage classes, some of which dynamically provision the storage volume for you. This is handled through a PersistentVolumeClaim (PVC), which creates the backend PersistentVolume (PV). A PersistentVolumeClaim is a manifest that describes a request for storage. I discovered two important things to keep in mind:

  • Once I create a PersistentVolumeClaim and PersistentVolume, their specifications are immutable. I can’t edit them in place; I have to delete and re-create them.

  • Deleting a StatefulSet does not delete its PersistentVolumeClaim or PersistentVolume. The claim and volume stick around until I remove them myself.

Starting with Minikube

For my own edification, I tracked what happens when I create a StatefulSet in Kubernetes. Playing with it locally in Minikube, I used the example from the Kubernetes StatefulSets documentation with some modifications. The example uses a standard nginx container that reads from a volume containing its HTML pages.

In the manifest, I noticed a volumeClaimTemplates directive, which dynamically provisioned a storage backend to that specification. I added a storageClassName directive because I wanted to be explicit about which storage backend I was using. For a list of valid storage classes that were available in my cluster, I ran:

$ kubectl get storageclass
NAME                 PROVISIONER                AGE
standard (default)   k8s.io/minikube-hostpath   9m

You can also see the Kubernetes reference for a full list of StorageClasses. These will differ depending on where your Kubernetes cluster is hosted.
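For reference, the volumeClaimTemplates section of my modified manifest looked roughly like this (a sketch following the nginx example from the Kubernetes documentation, which names its claim www):

volumeClaimTemplates:
- metadata:
    name: www
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: standard      # being explicit about the Minikube storage class
    resources:
      requests:
        storage: 1Gi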

When I created it in Minikube, I retrieved the StatefulSet and its corresponding pod. When I examined the pod details, I saw that a volume had been mounted to the container with a PersistentVolumeClaim.

$ kubectl get statefulset
NAME      DESIRED   CURRENT   AGE
web       1         1         23h
$ kubectl get pods -o yaml
...
    volumes:
    - name: www
      persistentVolumeClaim:
        claimName: www-web-0
...

The claim didn’t reveal too much new information to me. I looked at the specification for the PersistentVolume and found more useful information about the hostPath, status, and claim reference.

$ kubectl get pv pvc-6611a9f6-4d35-11e8-8278-08002763d4fe -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
...
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: www-web-0
    namespace: default
    resourceVersion: "306"
    uid: 6611a9f6-4d35-11e8-8278-08002763d4fe
  hostPath:
    path: /tmp/hostpath-provisioner/pvc-6611a9f6-4d35-11e8-8278-08002763d4fe
    type: ""
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard
status:
  phase: Bound

The hostPath interested me the most. This is the mapping of the PersistentVolume to an actual directory on the host. When I logged into my Minikube machine using minikube ssh, I noticed that the directory in Minikube is named after the PersistentVolume. I inserted my own index.html file into the directory to see if nginx would serve it.

$ ssh -i ~/.minikube/machines/minikube/id_rsa docker@$(minikube ip) sudo 'ls /tmp/hostpath-provisioner'
pvc-6611a9f6-4d35-11e8-8278-08002763d4fe
$ ssh -i ~/.minikube/machines/minikube/id_rsa docker@$(minikube ip) sudo 'echo Hello! > /tmp/hostpath-provisioner/pvc-6611a9f6-4d35-11e8-8278-08002763d4fe/index.html'

When I made a call to the service, it returned “Hello!”, exactly what I wanted to see.

$ curl -k -H "Authorization:Bearer ${token}" https://$(minikube ip):8443/api/v1/namespaces/default/services/nginx:web/proxy/
Hello!

If you’re curious to learn more about how I played around with Minikube to learn about StatefulSets, PersistentVolumes, and PersistentVolumeClaims, check out this Github repository with a demo script for reference.

Try it on a Public Cloud

When I deleted a StatefulSet, Kubernetes did not delete the PersistentVolume or PersistentVolumeClaim. When I re-created the StatefulSet, it found the existing PersistentVolumeClaim by name and attached the same volume to the new pod. However, what happens when the actual volume is deleted entirely?

In the case of my thought experiment, I had a hypothesis:

If I delete the persistent volume entirely but I have a snapshot of it, I can create a StatefulSet that re-attaches to a persistent volume restored from that snapshot.

Let’s try this. I wanted to do this outside of Minikube to better reflect what might happen in a production cluster, so I chose AWS since I already have templates to deploy to it.

Keep in mind there are many, many ways to deploy Kubernetes. I use kops but there are many other options. Templates I use for rapid testing are located in this Github repository with a brief README on how to run it.

Create the StatefulSet

My StatefulSet leveraged a sample application I wrote called hello-stateful. Its sole purpose is to write the current time and date to /usr/share/hello/stateful.log. This directory was mounted as a persistent volume. When deploying a StatefulSet to a cluster with kops, it defaults the backend storage volume to the default cloud provider — in this case, an AWS Elastic Block Store (EBS) device. The statefulset.yaml manifest I used is below.
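The embedded manifest isn’t reproduced in this copy, so the sketch below reconstructs roughly what it contained, based on the names that appear later in this post: a hello-stateful Service and StatefulSet, a volume claim template named log, and a 1Gi volume. The container image name is a placeholder, and the exact fields are assumptions.

apiVersion: v1
kind: Service
metadata:
  name: hello-stateful
  labels:
    app: hello-stateful
spec:
  clusterIP: None        # headless service to govern the StatefulSet
  selector:
    app: hello-stateful
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hello-stateful
spec:
  serviceName: hello-stateful
  replicas: 1
  selector:
    matchLabels:
      app: hello-stateful
  template:
    metadata:
      labels:
        app: hello-stateful
    spec:
      containers:
      - name: hello-stateful
        image: joatmon08/hello-stateful   # image name is a placeholder
        volumeMounts:
        - name: log
          mountPath: /usr/share/hello     # directory backed by the persistent volume
  volumeClaimTemplates:
  - metadata:
      name: log
    spec:
      accessModes: [ "ReadWriteOnce" ]    # no storageClassName: kops defaults to gp2 on AWS
      resources:
        requests:
          storage: 1Gi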

I applied this using kubectl apply -f statefulset.yaml. When I retrieved all of the volume, pod, and PersistentVolumeClaim information, I could see what had been written to my stateful.log file.

$ volume=$(kubectl get pvc | grep hello | tail -n 1 | cut -d' ' -f9)
$ pvc=$(kubectl get pvc | grep hello | tail -n 1 | cut -d' ' -f1)
$ pod=$(kubectl get pods | grep hello | cut -d' ' -f1)
$ kubectl exec ${pod} -- cat /usr/share/hello/stateful.log
Sun May 20 11:32:35 UTC 2018
Sun May 20 11:32:45 UTC 2018
Sun May 20 11:32:55 UTC 2018
Sun May 20 11:33:05 UTC 2018

Take the Snapshot

The stateful.log file contained a date and time written every ten seconds. I retrieved the volume ID information so I could pass it to AWS and create a snapshot. Snapshots back up the volume in AWS.

$ volumeID=$(kubectl get pv ${volume} -o jsonpath='{.spec.awsElasticBlockStore.volumeID}' | cut -d'/' -f4)
$ snapshotID=$(aws ec2 create-snapshot --volume-id ${volumeID} --region=${REGION} | jq -r .SnapshotId)
$ echo $snapshotID
snap-0cb9ddd144d8c047b

I saved the last line that had been written before I took the snapshot for my own reference. It wasn’t perfectly accurate, since the application wrote a new line every ten seconds, but I wanted to track the approximate date and time that I backed up my volume.

$ lastline=$(kubectl exec ${pod} -- cat /usr/share/hello/stateful.log | tail -n 1)
$ echo $lastline
Sun May 20 11:40:55 UTC 2018

Disaster! Delete the Backend Volume.

I imagined the worst-case scenario where my backend volume was completely deleted. First, I deleted the StatefulSet to make sure the backend volume wouldn’t get re-created. Then, I deleted the backend volume without deleting the PersistentVolume or PersistentVolumeClaim!

$ kubectl delete -f statefulset.yaml
service "hello-stateful" deleted
statefulset "hello-stateful" deleted$ aws ec2 delete-volume --volume-id ${volumeID} --region=${REGION}$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIMSTORAGECLASS   REASON    AGE
pvc-70bb3b45-5c21-11e8-8495-0261cf26af3a   1Gi        RWO            Delete           Bound     default/log-hello-stateful-0gp2                      9m$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIMSTORAGECLASS   REASON    AGE
pvc-70bb3b45-5c21-11e8-8495-0261cf26af3a   1Gi        RWO            Delete           Bound     default/log-hello-stateful-0gp2                      10m

What happened when I re-created the StatefulSet and tried to access my stateful.log file again? Did I see my last line?

$ kubectl apply -f statefulset.yaml
service "hello-stateful" created
statefulset "hello-stateful" created$ kubectl exec ${pod} -- cat /usr/share/hello/stateful.log
error: unable to upgrade connection: container not found ("hello-stateful")

No, I didn’t! My PersistentVolume and its PersistentVolumeClaim remained, but they referenced a backend EBS volume that no longer existed.

Well, now what? How do I recover my data into a new volume and re-attach it to a StatefulSet?

Recover the StatefulSet

Remember, I had a snapshot of the original volume. This backed up some of my data, so I created a new volume from that back-up.

You’ll notice an AWS tag on the command below, labeled KubernetesCluster. In order for your cluster to recognize that it has a new volume, you must tag it with the cluster name. Otherwise, the cluster ignores the volume.

$ aws ec2 create-volume --availability-zone=${REGION}a --snapshot-id ${snapshotID} --region=${REGION} --volume-type gp2 --tag-specifications 'ResourceType=volume,Tags=[{Key=KubernetesCluster,Value=joatmon08.k8s.local}]'
{
    "AvailabilityZone": "us-west-2a",
    "CreateTime": "2018-05-20T12:05:23.000Z",
    "Encrypted": false,
    "Size": 1,
    "SnapshotId": "snap-0cb9ddd144d8c047b",
    "State": "creating",
    "VolumeId": "vol-0e65bfd7da63a06b3",
    "Iops": 100,
    "Tags": [
        {
            "Key": "KubernetesCluster",
            "Value": "joatmon08.k8s.local"
        }
    ],
    "VolumeType": "gp2"
}

To be honest, I didn’t find an elegant way to re-attach the backend volume to the existing PersistentVolume and PersistentVolumeClaim because they are immutable. Instead, I had to create my own PersistentVolume and PersistentVolumeClaim manifests with the volume information.

Here is the PersistentVolume template that I used.
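The embedded template isn’t reproduced in this copy; reconstructed from the kubectl output below, it looked roughly like this sketch (the fsType and the exact volumeID format are assumptions):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: joatmon08-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  persistentVolumeReclaimPolicy: Delete
  awsElasticBlockStore:
    fsType: ext4                                      # filesystem type is an assumption
    volumeID: aws://{{ AWS region }}a/{{ volume ID }} # availability zone assumed to match create-volume above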

Note that you must substitute {{ AWS region }} (e.g., us-east-1) and {{ volume ID }} with the values from the new volume you created from the snapshot (see VolumeId in the output of aws ec2 create-volume).

$ kubectl apply -f pv.yaml
persistentvolume "joatmon08-pv" created

$ kubectl get pv
NAME           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM     STORAGECLASS   REASON    AGE
joatmon08-pv   1Gi        RWO            Delete           Available             gp2                      6s

Then, I created the PersistentVolumeClaim using the manifest below.

The metadata directive must contain a label for the app that requires the PersistentVolume.
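A sketch of that claim, based on the output below (the explicit volumeName binding is an assumption I’ve added to make sure the claim binds to the restored volume rather than dynamically provisioning a new one):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: joatmon08-pvc
  labels:
    app: hello-stateful       # label for the app that requires the PersistentVolume
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  volumeName: joatmon08-pv    # bind directly to the restored PersistentVolume
  resources:
    requests:
      storage: 1Gi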

$ kubectl apply -f pv-claim.yaml
persistentvolumeclaim "joatmon08-pvc" created$ kubectl get pvc
NAME            STATUS    VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
joatmon08-pvc   Bound     joatmon08-pv   1Gi        RWO            gp2            8s

Now, I created the StatefulSet and attached it to my new volume. Unfortunately, this means I had to create a new StatefulSet manifest. The one I created previously would just have provisioned a new PersistentVolume and PersistentVolumeClaim because of its volumeClaimTemplates directive.

The main difference between this manifest and the previous one is that it doesn’t use a volumeClaimTemplates section to create the PersistentVolumeClaim. Instead, it explicitly attaches to an existing claim via claimName, as in the excerpt below.
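A sketch of the relevant part of the new pod template (reconstructed; the image name is the same placeholder as before):

    spec:
      containers:
      - name: hello-stateful
        image: joatmon08/hello-stateful   # placeholder image name
        volumeMounts:
        - name: log
          mountPath: /usr/share/hello
      volumes:
      - name: log
        persistentVolumeClaim:
          claimName: joatmon08-pvc        # the claim created above, instead of a volumeClaimTemplate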

I applied my new StatefulSet and watched it start.

$ kubectl apply -f statefulset-restore.yaml
service "hello-stateful" created
statefulset "hello-stateful" created$ kubectl get pods
NAME               READY     STATUS    RESTARTS   AGE
hello-stateful-0   1/1       Running   0          23s

Validate the Persistent Data

Once I had a new StatefulSet with my old data, I wanted to validate that it indeed contained the last time and date I saved the volume. I hypothesized that I would see a stream of logs from the last time the volume was backed up (11:40AM), a break in the timing of the log, and the log starting up with a new time. I checked stateful.log to confirm.

$ kubectl exec hello-stateful-0 -- cat /usr/share/hello/stateful.log
...
Sun May 20 11:38:45 UTC 2018
Sun May 20 11:38:55 UTC 2018
Sun May 20 12:16:53 UTC 2018
Sun May 20 12:17:03 UTC 2018

My hypothesis was proven correct! I expected some lag between when the snapshot actually captured the data and when I recorded the last line, so the log above is exactly what I was hoping to see.

Summary

I learned a lot from this exercise.

  • Kubernetes abstractions are powerful: they let me apply the same patterns of deployment and creation whether I’m working locally or on a public cloud.

To see the code I used for learning the patterns locally with Minikube, take a look at this script which walks through some of the commands I used. I created a similar script for AWS that walks through this blog post in greater detail.
