(An Attempt at) Learning Kubernetes Operators for StatefulSet Recovery

Rosemary Wang
Oct 9, 2018 · 14 min read


I got a comment on Kubernetes StatefulSet Recovery from AWS Snapshots asking…

How do we do this for Kubernetes Operators? One example will really help..

  • My first question was, “What’s an Operator?”
  • My second question was, “How is this relevant to disaster recovery?”

I love finding something new. I dug into it and found the world of Kubernetes Operators — and renewed my coding chops to write my own. It was tricky and I learned a lot — here is a bit of the journey and some of the lessons I took away when I created my own operator to back up and recover a StatefulSet on Minikube.

Before we begin…

This post assumes some familiarity with Kubernetes. At minimum, read a little bit about:

  • How to spin up an application in Kubernetes
  • How to mount volumes to Kubernetes pods
  • Kubernetes terminology
  • Golang fundamentals (the Operator SDK is written in Go)

For more information on manually re-creating StatefulSet recovery, check out Kubernetes StatefulSet Recovery from AWS Snapshots.

As of the time of this blog post, the Operator SDK is still under development. APIs and more are subject to change! The example may or may not work as development on the Operator framework and SDK continues. Around the time this post was released, Kubernetes 1.12 even introduced a Volume Snapshot capability. Check it out if you’re curious!

If you want to go straight to the example, see the GitHub repository here.

What’s an Operator?

As per CoreOS’s documentation:

An Operator is a method of packaging, deploying and managing a Kubernetes application. A Kubernetes application is an application that is both deployed on Kubernetes and managed using the Kubernetes APIs and kubectl tooling.

This is pretty exciting for me. As an individual who tends to support applications that are pretty much black boxes to me, it’s not always easy to know what to do when an application goes down (or whether I can do anything about it at all). A great example: the other day, I was helping with a performance test, and we had a service that consumed from a queue and inserted the data into a database. Pretty simple, right?

Not as simple as I thought. What happens when the application goes down? I think it’s stateless…but is it? What happens when the connection to the database is broken? How do I purge my queues? All of these are “operator” actions that are specific to an application, things that I might not know about but still need some automation to handle. The “Operator” framework encapsulates the lifecycle tasks associated with my application, all of the other “stuff” that we forget to automate.

How is this relevant to Disaster Recovery?

Initially, I wanted to determine how I could recover a stateful application that writes to AWS Elastic Block Store (EBS) volumes if it ever went down (or was catastrophically deleted). In the end, I had a pretty complex sequence of staging AWS volumes, attaching PersistentVolumes, and generating PersistentVolumeClaims. My first instinct for automating it was to create a special deployment pipeline to handle everything.

Looking into the Operator framework, this sort of situation lends itself well to an Operator. An Operator can handle snapshotting the volume, setting up the PersistentVolume and PersistentVolumeClaim, and eventually re-creating the StatefulSet. I looked into this further and tried to automate a Kubernetes StatefulSet recovery from a backup, this time using the Operator framework. Rather than try it directly on AWS, I opted to make it work with Minikube.

Choose an Application to Add as Operator

I had an existing application called hello-stateful, which simply wrote the date and time every ten seconds to a log file under /usr/share/hello/stateful.log. I had deployed it before and knew that if I could properly back up and restore this application, I would see the different timestamps noting when it was backed up and then restored.
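
For context, the application itself boils down to something like the following loop. This is an approximation, not the actual image's entrypoint:

# Rough approximation of what hello-stateful does: append a timestamp
# to the log file every ten seconds.
while true; do
  date >> /usr/share/hello/stateful.log
  sleep 10
done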

In retrospect, this application probably would have been better deployed as a Deployment backed by a PersistentVolume. It didn’t need pod identity stickiness (it didn’t need to resolve to hello-stateful-0), but at the time, I was more focused on reverse engineering the persistence aspect of the problem. I ran into two considerations:

  1. If this were an actual StatefulSet like Consul or etcd, I wouldn’t statically define my PersistentVolume. Instead, I’d want it dynamically created via a volumeClaimTemplate and restore it using etcd- or Consul-approved flows.
  2. If I statically define the PersistentVolume as part of a StatefulSet, multiple pods will write to a single volume.

It was still a valid exercise in writing an operator. I called my operator hello-stateful-operator, representing an operator that spins up different kinds of HelloStateful resources.

Examine the Sample Operator

Before I could do anything with an operator, I wanted to go through a quick sample of how to create and use one. The operator-framework/operator-sdk GitHub repository contains a good Quick Start, which I followed to reverse engineer all of the bootstrapping.

Running the operator-sdk command generates a simple outline of an Operator with all of the configurations and code necessary. However, I got a little stuck at this point interpreting all of the files and configurations. I referred to the operator-sdk user guide, which guided me through the basics. I’ll break down the Quick Start below with some of the important files. Let’s start with cmd/sample-operator/main.go.

func main() {
    ...
    resource := "app.example.com/v1alpha1"
    kind := "App"
    namespace, err := k8sutil.GetWatchNamespace()
    ...
    resyncPeriod := 5
    sdk.Watch(resource, kind, namespace, resyncPeriod)
    ...
}

This is a pretty important declaration — this is where the operator is being told what to watch. In this case, it is watching the App resource in the namespace specified by the environment variable, WATCH_NAMESPACE.

Next, I checked out the pkg/apis/app/v1alpha1/types.go.

type AppSpec struct {
    // Fill me
}

type AppStatus struct {
    // Fill me
}

According to the user guide, this section modifies the CustomResource for the kind. In my case, this will modify the App CustomResource. A Kubernetes CustomResource is a declaration of an endpoint that stores API objects of a custom kind. For example, I am declaring a CustomResource for App that will contain a set of API objects having to do with App. Based on the CustomResource reference, the Operator framework acts as a controller — a component that constantly updates, creates, or deletes resources to match the desired state I declare as part of my CustomResource.

In reverse engineering this, I discovered that the Fill me comments above allow me to add in resources or properties that I might need for my kind. For example, my AppSpec might contain a property like the size of my volume or the volume type. My AppStatus might contain a backend volume path, volume ID, or snapshot ID. For now, I left it blank.
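
To make that concrete, here is a sketch of how the Fill me sections might eventually be filled in. The field names (Size, VolumeType, BackendVolumePath, SnapshotID) are purely illustrative, not part of the generated sample:

type AppSpec struct {
    // Size of the backing volume, for example "1Gi" (illustrative field).
    Size string `json:"size"`
    // VolumeType selects the kind of volume to provision (illustrative field).
    VolumeType string `json:"volumeType"`
}

type AppStatus struct {
    // BackendVolumePath records where the data lives on the host.
    BackendVolumePath string `json:"backendVolumePath"`
    // SnapshotID identifies the most recent backup, if any.
    SnapshotID string `json:"snapshotId"`
}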

I built an image for my sample-operator pod and pushed it to my image repository.

$ operator-sdk build joatmon08/sample-operator
$ docker push joatmon08/sample-operator

I noticed that a new YAML configuration popped up under the deploy directory, called operator.yaml. It contains a Kubernetes Deployment that I can use to deploy my operator! There is also a cr.yaml to create the CustomResource that triggers the reconciliation loop in the operator. Digging around further, I found that the Operator SDK also has boilerplate functions for testing! I could run my tests end-to-end and figure out if my Operator was working properly.

There were so many pieces to be created that I made a diagram to remember which commands generated which files and why.

Diagram for generated files and a quick note on why.

Create My HelloStateful Resource

Where do I even start?! I knew I needed the PersistentVolume, PersistentVolumeClaim, and StatefulSet for my hello-stateful application. What I realized very quickly was that I needed to know how to work with the Kubernetes API in order to even create them. In an effort to bypass this, I thought:

Maybe if I just pass the entire declaration of the StatefulSet, PersistentVolume, & PersistentVolumeClaim using the cr.yaml, which is my example definition of my resource, I won’t have to declare a bunch of structs in my code.

In practice, the cr.yaml for my resource ended up carrying the full StatefulSet, PersistentVolume, and PersistentVolumeClaim definitions inside its spec.

Awesome! I just needed to edit types.go to grab these from cr.yaml. I defined some properties under HelloStatefulSpec and ran operator-sdk generate k8s to create my definitions under zz_generated.deepcopy.go.
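
As a rough sketch (the field names are my own approximation, not the final repository code), HelloStatefulSpec at that stage embedded the upstream Kubernetes types directly:

import (
    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
)

// HelloStatefulSpec passes entire Kubernetes specs straight through
// from cr.yaml. Field names here are illustrative assumptions.
type HelloStatefulSpec struct {
    StatefulSet           appsv1.StatefulSetSpec           `json:"statefulSet"`
    PersistentVolume      corev1.PersistentVolumeSpec      `json:"persistentVolume"`
    PersistentVolumeClaim corev1.PersistentVolumeClaimSpec `json:"persistentVolumeClaim"`
}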

Did it work? Sure! But then I realized something I had missed — how exactly was I going to write specific code to actually handle a backup and restore? When I back up and restore something, the process is going to be different depending on whether it runs in Minikube, Amazon Web Services, Google Cloud Platform, or elsewhere. Sure, the above works, but now I’m asking a user to define a bunch of properties. I know:

  • how my hello-stateful application works
  • how it should be defined
  • how it behaves in Minikube

If I hand this off to a user, why am I asking them to define everything for me? Instead, I should encapsulate everything into my own custom resource type.

(Actually) Write the Operator

I started to encapsulate the specification of my application’s StatefulSet, PersistentVolume, and PersistentVolumeClaim into my hello-stateful-operator. I learned a lot during this process, including that I had to practice writing cleaner code! Everything I wrote went under pkg/hellostateful in my operator.

One of the first things I discovered was that I needed to learn how to use the Kubernetes API more fluidly. The Kubernetes objects generally follow the same structure as the YAML or JSON manifests, but it took me a while to figure out how to express them in Golang. Using an IDE that could jump to definitions was vastly helpful.
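
For instance, a PersistentVolumeClaim that takes a few lines of YAML looks something like this in Go. This is a sketch using the client-go API of that era, with values roughly matching the 1G ReadWriteOnce claim that shows up later:

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newPersistentVolumeClaim mirrors the structure of the YAML manifest:
// TypeMeta, ObjectMeta, then the Spec.
func newPersistentVolumeClaim(name, namespace string) *corev1.PersistentVolumeClaim {
    return &corev1.PersistentVolumeClaim{
        TypeMeta: metav1.TypeMeta{
            Kind:       "PersistentVolumeClaim",
            APIVersion: "v1",
        },
        ObjectMeta: metav1.ObjectMeta{
            Name:      name,
            Namespace: namespace,
        },
        Spec: corev1.PersistentVolumeClaimSpec{
            AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
            Resources: corev1.ResourceRequirements{
                Requests: corev1.ResourceList{
                    corev1.ResourceStorage: resource.MustParse("1G"),
                },
            },
        },
    }
}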

Second, I finally ascertained at this point that the operator builds everything for each custom resource instance you create. The ramifications of this were not particularly clear to me during my reverse engineering and didn’t make any sense until I tried to run kubectl create -f deploy/cr.yaml. Basically, just like I would declare a Deployment specification with a unique application name and labels, a CustomResource instance also has a unique name and labels. I realized that statically defining a bunch of metadata wasn’t really going to cut it, so I looked into the memcached operator. At this point, I wrote some functions to apply labels to pods and volumes. I also wrote a function to assign an owner reference, so I could tell which custom resource actually created everything. It made it infinitely easier to identify my resources. As a result, I could spin up one CustomResource leveraging my hello-stateful stack called joatmon08 and another called minikube-test, both of which use hello-stateful as part of their StatefulSets, with separate backend PersistentVolumes I could back up and restore.
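
A minimal sketch of those two helpers, loosely modeled on the memcached operator of that era. The function names, label keys, and the v1alpha1 import path are my assumptions, not necessarily what the repository uses:

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    // Assumed import path for the operator's own API types.
    "github.com/joatmon08/hello-stateful-operator/pkg/apis/hellostateful/v1alpha1"
)

// labelsForHelloStateful returns the labels applied to every object
// owned by a HelloStateful instance (label keys are illustrative).
func labelsForHelloStateful(name string) map[string]string {
    return map[string]string{
        "app":              "hello-stateful",
        "hellostateful_cr": name,
    }
}

// asOwner builds the OwnerReference that ties generated objects back to
// the HelloStateful custom resource that created them.
func asOwner(cr *v1alpha1.HelloStateful) metav1.OwnerReference {
    controller := true
    return metav1.OwnerReference{
        APIVersion: cr.APIVersion,
        Kind:       cr.Kind,
        Name:       cr.Name,
        UID:        cr.UID,
        Controller: &controller,
    }
}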

Third, I realized I made some strong assumptions about how my application works and they had to be reflected in my code. These assumptions included:

  • Everything would run in Minikube.
  • Minikube mounts PersistentVolumes to /tmp/hostpath-provisioner (on the Minikube VM).
  • I would back up the PersistentVolume at /tmp/hostpath-provisioner/<volume> to /tmp/backup (on my Minikube VM) to test it (a sketch of an equivalent backup job follows this list).
  • PersistentVolumes are immutable. In order to restore the PersistentVolume and PersistentVolumeClaim, I have to statically define new ones that read from /tmp/backup and bind everything.
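
To make the backup assumption concrete, its behavior is roughly what a CronJob like the following would do. This is a hedged sketch, not the manifest the operator actually generates; the busybox image and the copy command are assumptions:

# Hedged sketch of a backup CronJob; only the schedule, name, and
# host paths are taken from the rest of this post.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: backup-new-hello-stateful
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: busybox
            command: ["/bin/sh", "-c", "cp -r /source/. /backup/"]
            volumeMounts:
            - name: source
              mountPath: /source
            - name: backup
              mountPath: /backup
          volumes:
          - name: source
            hostPath:
              path: /tmp/hostpath-provisioner/new-hello-stateful
          - name: backup
            hostPath:
              path: /tmp/backup/new-hello-stateful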

Finally, it occurred to me that I sort of messed up how to pass namespace and name to my HelloStateful CustomResource. Like I mentioned before, I had to update the labels and owner reference to properly find my HelloStateful instance but I also needed to make sure that I passed my namespace and name to every single component created as part of that resource. All of my ObjectMeta properties in every object, from StatefulSet to CronJob, had to have the following:

ObjectMeta: metav1.ObjectMeta{
    Name:      cr.ObjectMeta.Name,
    Namespace: cr.Namespace,
    Labels:    labels,
},

The only exception was PersistentVolumes, which are cluster-scoped and therefore don’t take a namespace.

Get Status Information

As it turns out, backing up and restoring a volume requires a bit of metadata. I ended up using HelloStatefulStatus to store information about the path of each backend PersistentVolume on my host, in a slice of strings called BackendVolumes.

type HelloStatefulStatus struct {
    BackendVolumes []string `json:"backendVolumes"`
    IsRestored     bool     `json:"isRestored"`
}

This slice of strings puzzled me. How was I going to get the backend HostPath information for all of the PersistentVolumes spun up as part of HelloStateful? There wasn’t a neat Operator SDK library that would automatically connect me to the Kubernetes API…from within Kubernetes. I had to use the Kubernetes client and configure it for two modes:

  1. When I run the operator in local mode, a la operator-sdk run local. This meant I had to pass my default Kubernetes configuration.
  2. When I run the operator in its own deployment, as it is meant to run. There is a neat InClusterConfig that will handle it for you (see the sketch after this list).
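
Here is a minimal sketch of how those two modes can be wired up with client-go. The helper name is mine, and this is not necessarily how the repository does it:

import (
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/clientcmd"
)

// newClientset builds a Kubernetes client for either mode: local mode
// reads a kubeconfig file, in-cluster mode uses the pod's service account.
func newClientset(kubeconfigPath string) (*kubernetes.Clientset, error) {
    var config *rest.Config
    var err error
    if kubeconfigPath != "" {
        // Local mode: read ~/.kube/config (or wherever the path points).
        config, err = clientcmd.BuildConfigFromFlags("", kubeconfigPath)
    } else {
        // In-cluster mode: use the mounted service account token.
        config, err = rest.InClusterConfig()
    }
    if err != nil {
        return nil, err
    }
    return kubernetes.NewForConfig(config)
}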

This did the trick. I was able to call the Kubernetes API server in my cluster and retrieve the HostPath information for all of the PersistentVolumes in my StatefulSet.
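
The retrieval itself can look something like the sketch below. The label selector is an assumption (it reuses the illustrative label key from earlier), and it uses the List signature from the client-go of that era; newer versions also take a context argument:

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// listBackendVolumePaths collects the HostPath of every PersistentVolume
// labeled for a given HelloStateful instance.
func listBackendVolumePaths(clientset kubernetes.Interface, crName string) ([]string, error) {
    pvs, err := clientset.CoreV1().PersistentVolumes().List(metav1.ListOptions{
        LabelSelector: "hellostateful_cr=" + crName,
    })
    if err != nil {
        return nil, err
    }
    var paths []string
    for _, pv := range pvs.Items {
        if pv.Spec.HostPath != nil {
            paths = append(paths, pv.Spec.HostPath.Path)
        }
    }
    return paths, nil
}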

RBAC Broke My Tests

I got it working locally using the operator-sdk run local configuration, but midway through development, the Operator SDK came out with code to facilitate testing! At this point, I hadn’t tried the operator in its own container yet. I wrote the tests to check whether the StatefulSet had been created properly and gave it a shot.

The tests failed.

$ operator-sdk test -t ./test/e2e
hellostateful_test.go:110: timed out waiting for the condition
FAIL
FAIL github.com/joatmon08/hello-stateful-operator/test/e2e 96.276s
Error: failed to exec go []string{"test", "./test/e2e/...", "-kubeconfig", "/Users/rwang/.kube/config", "-namespacedMan", "deploy/test/namespace-manifests.yaml", "-globalMan", "deploy/crd.yaml", "-root", "/Users/rwang/go/src/github.com/joatmon08/hello-stateful-operator", ""}: exit status 1

I wondered how to get the tests passing. First, I had to figure out how the tests worked. In summary…

  1. Tests spin up their own namespace, which I found by checking kubectl get ns.
  2. An operator pod is created within that namespace. The pod pulled my operator image from a registry (it doesn’t just build it). I found it by issuing kubectl get pods -n <namespace>.
  3. The tests run by creating a sample CustomResource (in my case, HelloStateful).

When I examined the operator’s logs, they came back with the following:

$ kubectl logs <operator pod name> -n <namespace>
time="2018-10-07T15:31:55Z" level=error msg="Failed to create persistentVolume: persistentvolumes is forbidden: User \"system:serviceaccount:hellostateful-hello-stateful-group-instance-1538926444:default\" cannot create persistentvolumes at the cluster scope"

Somehow, the default RBAC that was generated with the operator needed to be updated. Most importantly, since I was statically defining a PersistentVolume, I needed to add a ClusterRole that allowed the operator’s service account to create them.

A little trip-up here: the test itself dynamically creates a namespace so I had a difficult time binding the specific service account access to a namespace, considering it was renamed every time. I ended up applying the access to any service account in the cluster. If someone knows of a way to do this by regular expression, I’d love to know!

After a bit of trial and error, I created my final RBAC manifest to allow my operator to work properly. I had to add additional access to create CronJobs (for backup) and Jobs (for restore).
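
The cluster-scoped portion of that manifest looked roughly like the following sketch (abbreviated; the exact verbs and resource lists in the repository may differ):

# Hedged sketch of the cluster-scoped RBAC, not the exact repository manifest.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hello-stateful-operator
rules:
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: ["batch"]
  resources: ["jobs", "cronjobs"]
  verbs: ["get", "list", "watch", "create", "delete"]
---
# Binding to the system:serviceaccounts group grants the access to any
# service account in the cluster, as described above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hello-stateful-operator
subjects:
- kind: Group
  name: system:serviceaccounts
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: hello-stateful-operator
  apiGroup: rbac.authorization.k8s.io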

When I added the new access rules, the /tests directory was re-generated with my new RBAC manifests and the tests passed!

Putting it All Together

I won’t go into the Operator much further — it was a lot of struggle, and while it’s still not perfect, I learned a lot. When I put it all together, the code (see the GitHub repository) works something like this.

Let’s say I want to deploy a new HelloStateful resource, one that doesn’t have any data and is starting completely new.

  1. I create my CustomResourceDefinition for HelloStateful in my Kubernetes cluster. kubectl create -f deploy/crd.yaml
  2. I apply the RBAC definition for HelloStateful in my Kubernetes cluster. kubectl create -f deploy/rbac.yaml
  3. I create my operator (after it’s been built and pushed to my container registry). kubectl create -f deploy/operator.yaml
  4. I create a CustomResource definition that doesn’t require a restore from an existing backup (see below).
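
That definition looks roughly like the following sketch. Other spec fields are omitted here, and the exact apiVersion group is an assumption; restoreFromExisting is the flag referenced later in this post:

# Hedged sketch of deploy/cr.yaml for a brand-new deployment.
apiVersion: hello-stateful.example.com/v1alpha1
kind: HelloStateful
metadata:
  name: new-hello-stateful
spec:
  # No existing backup to restore from.
  restoreFromExisting: false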

5. I create my HelloStateful called new-hello-stateful. kubectl create -f deploy/cr.yaml

All of my constructs are created for me — StatefulSets, PersistentVolumes, a CronJob for backups, and more.

$ kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
hello-stateful-operator-665c9fd5b7-2p5rc   1/1     Running   0          8m
new-hello-stateful-0                       1/1     Running   0          6s
$ kubectl get pv
NAME                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS   REASON   AGE
new-hello-stateful   1G         RWO            Retain           Bound    default/new-hello-stateful   standard                9s
$ kubectl get cronjob
NAME                        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
backup-new-hello-stateful   */5 * * * *   False     0        <none>          13s
$ kubectl exec -it new-hello-stateful-0 -- tail /usr/share/hello/stateful.log
Sun Oct 7 15:57:30 UTC 2018
Sun Oct 7 15:57:40 UTC 2018
Sun Oct 7 15:57:50 UTC 2018
Sun Oct 7 15:58:00 UTC 2018

When I check my Minikube host, I should see the stateful.log under the /tmp/hostpath-provisioner directory.

$ minikube ssh
$ ls /tmp/hostpath-provisioner/new-hello-stateful/
stateful.log

Perfect. In five minutes, I should see a backup created under /tmp/backup.

$ ls /tmp/backup/new-hello-stateful/
stateful.log
$ tail /tmp/backup/new-hello-stateful/stateful.log
Sun Oct 7 15:58:40 UTC 2018
Sun Oct 7 15:58:50 UTC 2018
Sun Oct 7 15:59:00 UTC 2018
Sun Oct 7 15:59:10 UTC 2018
Sun Oct 7 15:59:20 UTC 2018
Sun Oct 7 15:59:30 UTC 2018
Sun Oct 7 15:59:40 UTC 2018
Sun Oct 7 15:59:50 UTC 2018
Sun Oct 7 16:00:00 UTC 2018
Sun Oct 7 16:00:10 UTC 2018

Awesome, I’ve now backed up my PersistentVolume. Let’s say I delete my custom resource and everything to do with it, including its backend directory on my Minikube host.

$ kubectl delete -f deploy/cr.yaml
hellostateful.hello-stateful.example.com "new-hello-stateful" deleted
$ sudo rm -rf /tmp/hostpath-provisioner/*

OH NO! WHAT EVER SHALL I DO?! Never fear. I re-deploy my new-hello-stateful HelloStateful resource with the restoreFromExisting parameter set to true.

Let me re-deploy and see if it restores everything using kubectl create -f deploy/cr.yaml.

$ kubectl get pods
NAME                                       READY   STATUS      RESTARTS   AGE
hello-stateful-operator-665c9fd5b7-2p5rc   1/1     Running     0          18m
new-hello-stateful-0                       0/1     Pending     0          3s
restore-new-hello-stateful-rrwt9           0/1     Completed   0          3s

I see that my restore job has kicked off and my new StatefulSet is coming up. Let’s see if my stateful.log file has been restored.

$ kubectl exec -it new-hello-stateful-0 -- tail /usr/share/hello/stateful.log
...
Sun Oct 7 16:00:00 UTC 2018
Sun Oct 7 16:00:10 UTC 2018
Sun Oct 7 16:06:55 UTC 2018
Sun Oct 7 16:07:05 UTC 2018
Sun Oct 7 16:07:15 UTC 2018
Sun Oct 7 16:07:25 UTC 2018

It’s restored! Notice that the timestamps above show the last time it was backed up (16:00:10 UTC) and when it resumed writing after the restoration (16:06:55).

Summary

Phew! Made it to the end of this. Overall, I was really excited to see that the Operator SDK made it much easier to extend the Kubernetes API without having to figure out all of the files needed to write my own CustomResource. I had a tricky time figuring it out, but it was certainly insightful. I have a new appreciation for anyone who wrote the example operators! There were a few gotchas, probably because I was unfamiliar with certain parts of Kubernetes, but working through them taught me a lot.

This was a learning example, definitely not perfect or thoroughly tested. I will probably go back and extend it for my own purposes to cloud providers, like making API calls to grab the backend volume identities and taking snapshots. The restore process is much trickier, but it was cool to figure out a pattern for it. In the time that I fiddled with my own Operator, the SDK changed twice with more features and different interfaces. I’m sure that by the time I learn more about it, the community will have made it even easier!
