Deploying an etcd cluster on OpenShift using the etcd operator

In this article, we’ll see how to deploy an etcd cluster on Minishift (a local, single-node OpenShift cluster) using the etcd operator, an open source tool designed to deploy, scale, upgrade and back up an etcd cluster on Kubernetes with ease.

A quick introduction to the etcd operator

The etcd operator is a tool to create, configure and manage an etcd cluster using a declarative configuration. It is able to create, resize, back up and restore a cluster transparently.

The etcd operator behaves like a human operator: it observes the state of the etcd cluster using the Kubernetes API, it analyzes the differences with the desired state and then acts accordingly, using the Kubernetes API or the etcd management API.
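
To make the "observe, analyze, act" loop more concrete, here is a minimal sketch of the "analyze" step only. It is purely illustrative: the function name plan_reconcile is hypothetical, and the real operator does far more than compare two numbers.

```shell
# Illustrative sketch, not the operator's actual code.
# plan_reconcile DESIRED CURRENT
# Compares the desired member count with the observed one and prints
# the action a controller would take to converge.
plan_reconcile() {
  desired=$1
  current=$2
  if [ "$current" -lt "$desired" ]; then
    echo "add $((desired - current)) member(s)"
  elif [ "$current" -gt "$desired" ]; then
    echo "remove $((current - desired)) member(s)"
  else
    echo "nothing to do"
  fi
}

# Example: 3 members desired, 2 observed
plan_reconcile 3 2   # prints: add 1 member(s)
```

A controller repeats this comparison every time the observed state changes, which is why the cluster converges back to the declared size without manual intervention.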

Installing the etcd operator

RBAC setup

OpenShift uses RBAC from the get-go to manage permissions (RBAC was in fact contributed to the upstream Kubernetes project). We therefore need to create a cluster-wide role and a role binding associated with the namespace’s default service account. This is the account that the operator’s pod uses to create the Custom Resource Definition and to perform operations on several types of resources, including pods, services, endpoints, persistent volume claims, secrets and events.

# log in to the Minishift cluster with an admin account
$ oc login --insecure-skip-tls-verify=true -u system:admin
# use the `sandbox` project
# (or use `oc new-project` if it does not exist yet)
$ oc project sandbox
# define the role and role binding names
$ export ROLE_NAME=etcd-operator
$ export ROLE_BINDING_NAME=etcd-operator
$ export NAMESPACE=sandbox
# create the cluster-wide role
$ curl | sed -e "s/<ROLE_NAME>/${ROLE_NAME}/g" | oc apply -f -
# create the cluster-wide role-binding
$ curl | \
sed -e "s/<ROLE_NAME>/${ROLE_NAME}/g" \
| oc apply -f -
# verify that the default service account in the sandbox namespace
# has the role
$ oc describe clusterrolebinding etcd-operator
Name: etcd-operator
Role: /etcd-operator
ServiceAccounts: sandbox/default
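
Beyond reading the binding, we can ask the API server directly whether the service account is allowed to act on the resources listed above. The sketch below relies on `oc auth can-i` with impersonation (`--as`), which assumes a reasonably recent `oc` client and a user allowed to impersonate, such as system:admin. The function names are hypothetical, and the `get` verb is only an assumption: the verbs actually granted depend on the role template that was applied.

```shell
# can_operator NAMESPACE VERB RESOURCE
# Asks whether the default service account of NAMESPACE may perform
# VERB on RESOURCE. Prints "yes" or "no".
can_operator() {
  oc auth can-i "$2" "$3" -n "$1" \
    --as="system:serviceaccount:$1:default"
}

# check_operator_permissions NAMESPACE
# Prints one line per resource and returns 0 only if every check passes.
check_operator_permissions() {
  ns=$1
  failed=0
  for resource in pods services endpoints persistentvolumeclaims secrets events; do
    if [ "$(can_operator "$ns" get "$resource" 2>/dev/null)" = "yes" ]; then
      echo "ok: get $resource"
    else
      echo "missing: get $resource"
      failed=1
    fi
  done
  return "$failed"
}
```

With these functions loaded in the shell, `check_operator_permissions sandbox` reports any permission that is still missing before the operator is deployed.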

Deploying the etcd operator

Let’s now deploy the etcd operator, which starts the operator’s replicaset and pod and creates a Custom Resource Definition:

$ oc create -f
$ oc get all
deploy/etcd-operator 1 1 1 1
NAME                          DESIRED   CURRENT   READY
rs/etcd-operator-768dc99865 1 1 1
NAME                                READY     STATUS    RESTARTS
po/etcd-operator-768dc99865-gml7w 1/1 Running 0
$ oc get crd

Note that the registered Custom Resource Definition declares its resources as namespaced (Scope: Namespaced), so each EtcdCluster object belongs to a namespace:

$ oc get crd/ -o yaml
API Version:
Kind: CustomResourceDefinition
Kind: EtcdCluster
List Kind: EtcdClusterList
Plural: etcdclusters
Short Names:
Singular: etcdcluster
Scope: Namespaced
Version: v1beta2

We’ll see later in the article how to query this new kind of resource.

Initializing the etcd cluster

Having the operator in place, we can now proceed with the initialization of a cluster of 3 nodes with the following, simple manifest file:

$ cat templates/etcd-cluster.yml
apiVersion: ""
kind: "EtcdCluster"
metadata:
  name: "url-shortener-etcd-cluster"
spec:
  size: 3
  version: "3.2.13"
$ oc apply -f templates/etcd-cluster.yml
etcdcluster "url-shortener-etcd-cluster" created

It only takes a few seconds (or a few minutes if the Docker image needs to be downloaded) for the cluster to become available. Once it is, the 3 requested pods can be listed using the etcd_cluster=url-shortener-etcd-cluster label that the operator applied to them:

$ oc get pods -l etcd_cluster=url-shortener-etcd-cluster
NAME READY STATUS RESTARTS
url-shortener-etcd-cluster-2jzs9wnzfq 1/1 Running 0
url-shortener-etcd-cluster-4s2dcndghh 1/1 Running 0
url-shortener-etcd-cluster-xq6b4g8nqn 1/1 Running 0
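
Rather than re-running the command by hand until all pods are up, the same label selector can drive a small polling loop. This is a minimal sketch: the function names, the 5-second interval and the default 300-second timeout are arbitrary choices for illustration.

```shell
# count_running_members CLUSTER_NAME
# Prints how many pods labelled etcd_cluster=CLUSTER_NAME report 1/1 Running.
count_running_members() {
  oc get pods -l "etcd_cluster=$1" --no-headers 2>/dev/null \
    | awk '$2 == "1/1" && $3 == "Running" { n++ } END { print n + 0 }'
}

# wait_for_members CLUSTER_NAME EXPECTED [TIMEOUT_SECONDS]
# Polls every 5 seconds until at least EXPECTED members are running,
# or gives up once the timeout (default: 300 seconds) has elapsed.
wait_for_members() {
  name=$1
  expected=$2
  timeout=${3:-300}
  elapsed=0
  while [ "$elapsed" -le "$timeout" ]; do
    running=$(count_running_members "$name")
    if [ "$running" -ge "$expected" ]; then
      echo "$running/$expected members running"
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  echo "timed out: $running/$expected members running" >&2
  return 1
}
```

For the cluster created above, the call would be `wait_for_members url-shortener-etcd-cluster 3`.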

Also, the cluster initialization triggered the creation of 2 services, url-shortener-etcd-cluster and url-shortener-etcd-cluster-client, the latter being the one to use to access the cluster.

$ oc get svc
url-shortener-etcd-cluster ClusterIP None <none> 2379/TCP,2380/TCP
url-shortener-etcd-cluster-client ClusterIP <none> 2379/TCP

With the Custom Resource Definition installed by the etcd operator, the etcd cluster can also be described by its custom type:

$ oc describe etcdclusters url-shortener-etcd-cluster
Name: url-shortener-etcd-cluster
Namespace: sandbox
Kind: EtcdCluster
Size: 3
Version: 3.2.13
Client Port: 2379
Reason: Cluster available
Status: True
Type: Available
Current Version: 3.2.13

The output of the oc describe command provides information about the cluster, including the status of the cluster, the number and the names of the pods running the etcd nodes, and the version of etcd currently deployed.

Operating on the etcd cluster

From now on, adding nodes to or removing nodes from the cluster is as simple as changing the size value in the etcd-cluster.yml manifest:

# increase the desired size of the cluster to 5 nodes
$ cat templates/etcd-cluster.yml
size: 5
# apply the cluster config change
$ oc apply -f templates/etcd-cluster.yml
etcdcluster "url-shortener-etcd-cluster" configured
# see the changes in the number of pods
$ oc get pods -l etcd_cluster=url-shortener-etcd-cluster
url-shortener-etcd-cluster-2jzs9wnzfq 1/1 Running 0
url-shortener-etcd-cluster-4s2dcndghh 1/1 Running 0
url-shortener-etcd-cluster-5g8hf2prf2 1/1 Running 0
url-shortener-etcd-cluster-xq6b4g8nqn 1/1 Running 0
url-shortener-etcd-cluster-z4tqpm9z2t 1/1 Running 0
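
As an alternative to editing the manifest, the size can probably also be changed in place with `oc patch`. The sketch below assumes that the size field lives under `spec` in the EtcdCluster resource and that the server accepts a JSON merge patch for it. The function name scale_etcd_cluster is hypothetical.

```shell
# scale_etcd_cluster CLUSTER_NAME SIZE
# Sets spec.size on the EtcdCluster resource with a JSON merge patch.
scale_etcd_cluster() {
  name=$1
  size=$2
  # Reject empty values, non-digits, zero and numbers with a leading zero.
  case "$size" in
    ''|*[!0-9]*|0*)
      echo "size must be a positive integer, got '$size'" >&2
      return 2
      ;;
  esac
  # An even member count tolerates no more failures than the odd count
  # just below it, so warn without refusing.
  if [ $((size % 2)) -eq 0 ]; then
    echo "warning: even cluster sizes add no extra fault tolerance" >&2
  fi
  oc patch etcdcluster "$name" --type merge -p "{\"spec\":{\"size\":$size}}"
}
```

For example, `scale_etcd_cluster url-shortener-etcd-cluster 5` requests 5 members. Keep in mind that a later `oc apply` of a manifest with a different size would override the patched value, so the manifest remains the source of truth.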

Also, as you would expect with an operator on Kubernetes, if a pod is removed, the operator detects the mismatch between the actual state and the expected state of the cluster, and it triggers the initialization of a new etcd node for the cluster:

# delete a pod and monitor the changes
$ oc delete pod url-shortener-etcd-cluster-z4tqpm9z2t && oc get pods -w
pod "url-shortener-etcd-cluster-z4tqpm9z2t" deleted
etcd-operator-768dc99865-2lh7p 1/1 Running
url-shortener-etcd-cluster-2jzs9wnzfq 1/1 Running
url-shortener-etcd-cluster-4s2dcndghh 1/1 Running
url-shortener-etcd-cluster-5g8hf2prf2 1/1 Running
url-shortener-etcd-cluster-xq6b4g8nqn 1/1 Running
url-shortener-etcd-cluster-z4tqpm9z2t 1/1 Terminating
url-shortener-etcd-cluster-z4tqpm9z2t 0/1 Terminating
url-shortener-etcd-cluster-gjj9dxqf5j 0/1 Pending
url-shortener-etcd-cluster-gjj9dxqf5j 0/1 Init:0/1
url-shortener-etcd-cluster-z4tqpm9z2t 0/1 Terminating
url-shortener-etcd-cluster-gjj9dxqf5j 0/1 PodInitializing
url-shortener-etcd-cluster-gjj9dxqf5j 0/1 Running 0
url-shortener-etcd-cluster-gjj9dxqf5j 1/1 Running 0

The operator behaves like a replicaset controller by creating a new pod after the deletion, but it also has the application domain knowledge (here, etcd) to make sure that the etcd node running on the new pod joins the current cluster.

Accessing the etcd cluster from the command line

Now that the cluster is ready, it’s time to see how to use it. As explained above, the etcd operator created 2 services when deploying the etcd cluster node pods:

$ oc get svc -l etcd_cluster=url-shortener-etcd-cluster
url-shortener-etcd-cluster None 2379/TCP,2380/TCP
url-shortener-etcd-cluster-client 2379/TCP

The url-shortener-etcd-cluster-client is the service to use to connect to the etcd cluster as a client. The simplest way to proceed is to run the etcdctl command from within an ephemeral pod in the OpenShift cluster:

$ oc run --rm -i --tty fun --image --restart=Never -- /bin/sh 
If you don't see a command prompt, try pressing enter.
/ # ETCDCTL_API=3 etcdctl --endpoints http://url-shortener-etcd-cluster-client:2379 put foo bar
/ # ETCDCTL_API=3 etcdctl --endpoints http://url-shortener-etcd-cluster-client:2379 get foo
/ # exit
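
The put/get pair above can be wrapped into a small smoke test, together with a health check of the endpoint. This is a sketch meant to be run from a shell where `etcdctl` is available, such as the ephemeral pod used above. The function names are hypothetical; `endpoint health` and `get --print-value-only` are etcdctl v3 commands and flags.

```shell
# Use the v3 API for every etcdctl call in this shell.
export ETCDCTL_API=3

# etcd_health ENDPOINTS
# Reports the health of each endpoint in the comma-separated ENDPOINTS list.
etcd_health() {
  etcdctl --endpoints "$1" endpoint health
}

# etcd_roundtrip ENDPOINTS KEY VALUE
# Writes KEY=VALUE, reads KEY back, and succeeds only if the values match.
etcd_roundtrip() {
  etcdctl --endpoints "$1" put "$2" "$3" >/dev/null || return 1
  got=$(etcdctl --endpoints "$1" get "$2" --print-value-only) || return 1
  [ "$got" = "$3" ]
}
```

For example, `etcd_roundtrip http://url-shortener-etcd-cluster-client:2379 foo bar && echo "cluster is usable"` confirms that a write is visible to a subsequent read through the client service.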

Et voilà! Our etcd cluster is now ready to use!

For more information about the etcd operator, visit the project on GitHub. Also, Red Hat/CoreOS recently announced the Operator SDK, a toolkit designed for building Kubernetes native operators such as the etcd one.