Deploying an etcd cluster on OpenShift using the etcd operator

In this article, we’ll see how to deploy an etcd cluster on Minishift using the etcd operator, an open source tool designed to deploy, scale, upgrade and back up an etcd cluster on Kubernetes with ease.

A quick introduction to the etcd operator

The etcd operator is a tool to create, configure and manage etcd clusters using a declarative configuration. It can create, resize, back up and restore a cluster transparently.

The etcd operator behaves like a human operator: it observes the state of the etcd cluster using the Kubernetes API, it analyzes the differences with the desired state and then acts accordingly, using the Kubernetes API or the etcd management API.
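
The observe, analyze and act cycle described above can be sketched in a few lines of shell. This is a simplified illustration only, not the operator's actual code; the function name `reconcile` and the messages it prints are hypothetical.

```shell
# Hypothetical sketch of a single reconcile step: compare the desired size
# with the observed size and decide which action to take.
reconcile() {
  desired=$1
  actual=$2
  if [ "$actual" -lt "$desired" ]; then
    echo "add $((desired - actual)) member(s)"
  elif [ "$actual" -gt "$desired" ]; then
    echo "remove $((actual - desired)) member(s)"
  else
    echo "nothing to do"
  fi
}

# desired size 3, observed size 2
reconcile 3 2
```

A real operator runs this kind of comparison continuously, each time the observed state changes.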

Installing the etcd operator

RBAC setup

OpenShift uses RBAC to manage permissions from the get-go (RBAC was in fact contributed to the upstream Kubernetes project), so we need to create a cluster-wide role and a role binding that associates it with the namespace’s default service account. This is the account that the operator’s pod uses to create the Custom Resource Definition and to perform operations on different types of resources, including pods, services, endpoints, persistent volume claims, secrets and events.

# log in to the Minishift cluster with an admin account
$ oc login https://192.168.99.100:8443 --insecure-skip-tls-verify=true -u system:admin
# use the `sandbox` project
# (or use `oc new-project` if it does not exist yet)
$ oc project sandbox
# define the role and role binding names
$ export ROLE_NAME=etcd-operator
$ export ROLE_BINDING_NAME=etcd-operator
$ export NAMESPACE=sandbox
# create the cluster-wide role
$ curl https://raw.githubusercontent.com/coreos/etcd-operator/master/example/rbac/cluster-role-template.yaml | sed -e "s/<ROLE_NAME>/${ROLE_NAME}/g" | oc apply -f -
# create the cluster-wide role-binding
$ curl https://raw.githubusercontent.com/coreos/etcd-operator/master/example/rbac/cluster-role-binding-template.yaml | \
sed -e "s/<ROLE_NAME>/${ROLE_NAME}/g" \
-e "s/<ROLE_BINDING_NAME>/${ROLE_BINDING_NAME}/g" \
-e "s/<NAMESPACE>/${NAMESPACE}/g" \
| oc apply -f -
# verify that the default service account in the sandbox namespace
# has the role
$ oc describe clusterrolebinding etcd-operator
Name: etcd-operator
Role: /etcd-operator
...
ServiceAccounts: sandbox/default
...
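
To check what the `sed` substitutions produce before sending anything to the cluster, the same kind of pipeline can be run against a local template. The three-line template below is a made-up fragment for illustration, not the actual content of the files in the etcd-operator repository, and the helper name `render_rbac` is hypothetical.

```shell
# Replace the <ROLE_NAME>, <ROLE_BINDING_NAME> and <NAMESPACE> placeholders
# read from stdin, the same way the curl | sed pipelines do.
render_rbac() {
  sed -e "s/<ROLE_NAME>/${ROLE_NAME}/g" \
      -e "s/<ROLE_BINDING_NAME>/${ROLE_BINDING_NAME}/g" \
      -e "s/<NAMESPACE>/${NAMESPACE}/g"
}

ROLE_NAME=etcd-operator
ROLE_BINDING_NAME=etcd-operator
NAMESPACE=sandbox

# Illustrative template fragment (an assumption, not the upstream file).
printf '%s\n' \
  'name: <ROLE_BINDING_NAME>' \
  'role: <ROLE_NAME>' \
  'namespace: <NAMESPACE>' | render_rbac
```

Once the rendered output looks right, the same function can sit between `curl` and `oc apply -f -`.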

Deploying the etcd operator

Let’s now deploy the etcd operator. The deployment starts the operator’s replicaset and pod, and the operator in turn creates a Custom Resource Definition:

$ oc create -f https://raw.githubusercontent.com/coreos/etcd-operator/master/example/deployment.yaml
$ oc get all
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
deploy/etcd-operator   1         1         1            1
NAME                          DESIRED   CURRENT   READY
rs/etcd-operator-768dc99865   1         1         1
NAME                                READY     STATUS    RESTARTS
po/etcd-operator-768dc99865-gml7w   1/1       Running   0
$ oc get crd
NAME
etcdclusters.etcd.database.coreos.com

Note that the resources defined by the registered Custom Resource Definition are scoped to a namespace:

$ oc describe crd/etcdclusters.etcd.database.coreos.com
Name: etcdclusters.etcd.database.coreos.com
...
API Version: apiextensions.k8s.io/v1beta1
Kind: CustomResourceDefinition
Metadata:
  ...
Spec:
  Group: etcd.database.coreos.com
  Names:
    Kind: EtcdCluster
    List Kind: EtcdClusterList
    Plural: etcdclusters
    Short Names:
      etcd
    Singular: etcdcluster
  Scope: Namespaced
  Version: v1beta2
...

We’ll see later in the article how to query this new kind of resource.

Initializing the etcd cluster

With the operator in place, we can now initialize a 3-node cluster with the following simple manifest file:

$ cat templates/etcd-cluster.yml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "url-shortener-etcd-cluster"
spec:
  size: 3
  version: "3.2.13"
$ oc apply -f templates/etcd-cluster.yml
etcdcluster "url-shortener-etcd-cluster" created
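
When more than one cluster is needed, the manifest can be generated instead of copied by hand. A minimal sketch, assuming a hypothetical helper named `etcd_cluster_manifest` that takes the cluster name, size and etcd version as arguments and prints the same fields as the file above:

```shell
# Print an EtcdCluster manifest for the given name, size and etcd version.
# The helper name and its argument order are illustrative choices.
etcd_cluster_manifest() {
  cat <<EOF
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "$1"
spec:
  size: $2
  version: "$3"
EOF
}

etcd_cluster_manifest url-shortener-etcd-cluster 3 3.2.13
```

The output can be reviewed first, then piped to `oc apply -f -`.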

It only takes a few seconds (or a few minutes if the Docker image needs to be downloaded) for the cluster to become available. Once it’s done, the 3 requested pods can be listed using the etcd_cluster=url-shortener-etcd-cluster label that the operator applied to them:

$ oc get pods -l etcd_cluster=url-shortener-etcd-cluster
NAME READY STATUS RESTARTS
url-shortener-etcd-cluster-2jzs9wnzfq 1/1 Running 0
url-shortener-etcd-cluster-4s2dcndghh 1/1 Running 0
url-shortener-etcd-cluster-xq6b4g8nqn 1/1 Running 0

Also, the cluster initialization triggered the creation of 2 services, url-shortener-etcd-cluster and url-shortener-etcd-cluster-client, the latter being the one to use to access the cluster.

$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
url-shortener-etcd-cluster ClusterIP None <none> 2379/TCP,2380/TCP
url-shortener-etcd-cluster-client ClusterIP 172.30.3.47 <none> 2379/TCP

Since the etcd operator installed the Custom Resource Definition, the etcd cluster can also be described by its custom type:

$ oc describe etcdclusters url-shortener-etcd-cluster
Name: url-shortener-etcd-cluster
Namespace: sandbox
Kind: EtcdCluster
...
Spec:
  Repository: quay.io/coreos/etcd
  Size: 3
  Version: 3.2.13
Status:
  Client Port: 2379
  Conditions:
    Reason: Cluster available
    Status: True
    Type: Available
  Current Version: 3.2.13
  Members:
    Ready:
      url-shortener-etcd-cluster-2jzs9wnzfq
      url-shortener-etcd-cluster-4s2dcndghh
      url-shortener-etcd-cluster-xq6b4g8nqn
...

The output of the oc describe command provides information about the cluster, including the status of the cluster, the number and the names of the pods running the etcd nodes, and the version of etcd currently deployed.
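
For scripting, the same information can be read field by field instead of parsing the describe output. A sketch, assuming that the "Members: Ready:" entry shown above corresponds to a `.status.members.ready` list on the EtcdCluster resource (inferred from the output, so worth checking with `-o yaml` first); the helper name `ready_members` is hypothetical.

```shell
# Count the members that the operator reports as ready.
# Assumption: the ready member names live under .status.members.ready.
ready_members() {
  oc get etcdcluster "$1" -o jsonpath='{.status.members.ready[*]}' \
    | wc -w | tr -d ' '
}

# Usage: ready_members url-shortener-etcd-cluster
```

The count can then be compared with the size requested in the manifest.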

Operating on the etcd cluster

From now on, adding nodes to or removing nodes from the cluster is as simple as changing the size value in the etcd-cluster.yml manifest:

# increase the desired size of the cluster to 5 nodes
$ cat templates/etcd-cluster.yml
...
spec:
  size: 5
...
# apply the cluster config change
$ oc apply -f templates/etcd-cluster.yml
etcdcluster "url-shortener-etcd-cluster" configured
# see the changes in the number of pods
$ oc get pods -l etcd_cluster=url-shortener-etcd-cluster
NAME READY STATUS RESTARTS
url-shortener-etcd-cluster-2jzs9wnzfq 1/1 Running 0
url-shortener-etcd-cluster-4s2dcndghh 1/1 Running 0
url-shortener-etcd-cluster-5g8hf2prf2 1/1 Running 0
url-shortener-etcd-cluster-xq6b4g8nqn 1/1 Running 0
url-shortener-etcd-cluster-z4tqpm9z2t 1/1 Running 0
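
The size can also be changed without editing the file, by patching the resource directly. A sketch, assuming a JSON merge patch is accepted for this custom resource (hence `--type merge`); the helper names `size_patch` and `scale_etcd` are hypothetical.

```shell
# Build the JSON merge patch that sets spec.size.
size_patch() {
  printf '{"spec":{"size":%d}}' "$1"
}

# Apply the patch to the named EtcdCluster resource.
scale_etcd() {
  oc patch etcdcluster "$1" --type merge -p "$(size_patch "$2")"
}

# Usage: scale_etcd url-shortener-etcd-cluster 5
```

Keep in mind that the manifest file on disk no longer matches the cluster after such a patch, so the next `oc apply` of that file would revert the size.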

Also, as you would expect with an operator on Kubernetes, if a pod is removed, the operator detects the mismatch between the actual state and the expected state of the cluster, and it triggers the initialization of a new etcd node for the cluster:

# delete a pod and monitor the changes
$ oc delete pod url-shortener-etcd-cluster-z4tqpm9z2t && oc get pods -w
pod "url-shortener-etcd-cluster-z4tqpm9z2t" deleted
NAME READY STATUS RESTARTS AGE
etcd-operator-768dc99865-2lh7p 1/1 Running
url-shortener-etcd-cluster-2jzs9wnzfq 1/1 Running
url-shortener-etcd-cluster-4s2dcndghh 1/1 Running
url-shortener-etcd-cluster-5g8hf2prf2 1/1 Running
url-shortener-etcd-cluster-xq6b4g8nqn 1/1 Running
url-shortener-etcd-cluster-z4tqpm9z2t 1/1 Terminating
url-shortener-etcd-cluster-z4tqpm9z2t 0/1 Terminating
url-shortener-etcd-cluster-gjj9dxqf5j 0/1 Pending
url-shortener-etcd-cluster-gjj9dxqf5j 0/1 Init:0/1
url-shortener-etcd-cluster-z4tqpm9z2t 0/1 Terminating
url-shortener-etcd-cluster-gjj9dxqf5j 0/1 PodInitializing
url-shortener-etcd-cluster-gjj9dxqf5j 0/1 Running 0
url-shortener-etcd-cluster-gjj9dxqf5j 1/1 Running 0

The operator behaves like a replicaset controller by creating a new pod after the deletion, but it also has the application domain knowledge (here, etcd) to make sure that the etcd node running on the new pod joins the current cluster.
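
Rather than watching the pod list by eye, a script can poll until the cluster is back to its expected size. A minimal sketch, assuming the default column layout of `oc get pods` (NAME, READY, STATUS, ...) and one container per etcd pod; the helper name `wait_for_members` and its default of 30 attempts, 5 seconds apart, are arbitrary choices.

```shell
# Poll until the expected number of pods carrying the cluster label are
# ready and Running; return non-zero after too many attempts.
wait_for_members() {
  cluster=$1
  expected=$2
  attempts=${3:-30}
  delay=${4:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    running=$(oc get pods -l "etcd_cluster=${cluster}" --no-headers 2>/dev/null \
      | awk '$2 == "1/1" && $3 == "Running"' | wc -l | tr -d ' ')
    if [ "$running" -eq "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: wait_for_members url-shortener-etcd-cluster 5
```

This only checks that the pods are running; it does not verify that the new etcd member has actually joined the cluster.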

Accessing the etcd cluster from the command line

Now that the cluster is ready, it’s time to see how to use it. As explained above, the etcd operator created 2 services when deploying the etcd cluster node pods:

$ oc get svc -l etcd_cluster=url-shortener-etcd-cluster
NAME CLUSTER-IP PORT(S)
url-shortener-etcd-cluster None 2379/TCP,2380/TCP
url-shortener-etcd-cluster-client 172.30.3.47 2379/TCP

The url-shortener-etcd-cluster-client is the service to use to connect to the etcd cluster as a client. The simplest way to proceed is to run the etcdctl command from within an ephemeral pod in the OpenShift cluster:

$ oc run --rm -i --tty fun --image quay.io/coreos/etcd --restart=Never -- /bin/sh 
If you don't see a command prompt, try pressing enter.
/ # ETCDCTL_API=3 etcdctl --endpoints http://url-shortener-etcd-cluster-client:2379 put foo bar
OK
/ # ETCDCTL_API=3 etcdctl --endpoints http://url-shortener-etcd-cluster-client:2379 get foo
foo
bar
/ # exit

Et voilà! Our etcd cluster is now ready to use!


For more information about the etcd operator, visit the project on GitHub. Also, Red Hat/CoreOS recently announced the Operator SDK, a toolkit designed for building Kubernetes-native operators such as the etcd one.