Elassandra in a Multi-Cloud Kubernetes world — Part One.

Vincent Royer
8 min read · Jul 3, 2020

Kubernetes adoption keeps growing among IT departments, and that is definitely a game changer; running databases on Kubernetes is the next challenge in order to run your microservices on any cloud provider. Elassandra offers a nice solution here, since it combines a distributed database, Apache Cassandra, with embedded Elasticsearch. With the Kubernetes Elassandra Operator, let's see how to deploy an Elassandra cluster spanning several Kubernetes clusters, using Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE).

Summary:

  • Overview
  • Create an AKS cluster
  • Create a GKE cluster
  • Deploy and configure additional services
  • Deploy Elassandra DC1
  • Deploy Elassandra DC2

Overview

The Elassandra Operator creates one Kubernetes statefulset per availability zone mapped to a Cassandra rack. Thus, in case of a zone failure, data is properly distributed across replicas and remains available.

The Elassandra Operator watches the Kubernetes nodes and identifies availability zones through the node label failure-domain.beta.kubernetes.io/zone. Each statefulset is named with a zone index rather than a zone name, so the naming stays consistent whatever the zone names are. Here is a 9-node Elassandra datacenter running across 3 availability zones in a Kubernetes cluster.

By default, for each Elassandra/Cassandra cluster, you can have only one Elassandra node per Kubernetes node (enforced by an anti-affinity rule).
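Such a rule typically looks like the following pod anti-affinity snippet (illustrative only, not the exact manifest generated by the operator; the elassandra.strapdata.com/cluster label is an assumption):

# Illustrative podAntiAffinity: at most one Elassandra pod of cluster cl1 per Kubernetes node
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: elassandra
          elassandra.strapdata.com/cluster: cl1
      topologyKey: kubernetes.io/hostname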

To keep each node's data in its own zone, Persistent Volume Claims are allocated in the same zone as the associated statefulset (or Cassandra rack).

For the purpose of the demonstration, we are deploying an Elassandra cluster using public IP addresses to connect the datacenter DC1 running on AKS to another Elassandra datacenter DC2 running on GKE. Of course, for security reasons, it would be better to run such a cluster on a private network interconnected through VPC peering or a VPN, but if it works over the public internet, the simpler private setup will work too.

AKS Setup

We are going to use Azure VM scale sets with public IP addresses on the Kubernetes nodes, which requires the Azure CLI aks-preview extension:

az extension add --name aks-preview
az extension update --name aks-preview
az feature register --name NodePublicIPPreview --namespace Microsoft.ContainerService

Create an Azure resource group and an AKS regional cluster running on 3 zones with public IP addresses on Kubernetes nodes and the Azure network plugin:
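For example (resource names, region, node size and node count are illustrative assumptions; adjust to your subscription):

AZURE_REGION=westeurope
RESOURCE_GROUP_NAME=elassandra-rg
K8S_CLUSTER_NAME=kube1

# Resource group for the AKS cluster
az group create --name $RESOURCE_GROUP_NAME --location $AZURE_REGION

# Regional AKS cluster spread over 3 zones, VMSS-based, with public IPs on the nodes
az aks create --resource-group $RESOURCE_GROUP_NAME --name $K8S_CLUSTER_NAME \
  --location $AZURE_REGION \
  --zones 1 2 3 \
  --node-count 3 \
  --node-vm-size Standard_D2s_v3 \
  --vm-set-type VirtualMachineScaleSets \
  --enable-node-public-ip \
  --network-plugin azure \
  --generate-ssh-keys

az aks get-credentials --resource-group $RESOURCE_GROUP_NAME --name $K8S_CLUSTER_NAME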

Label the k8s nodes

Unfortunately, AKS does not map the VMs' public IP addresses to the Kubernetes node external IP addresses, so the trick is to add these public IP addresses as a Kubernetes custom label elassandra.strapdata.com/public-ip on each node.
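A rough sketch of the idea, assuming a single VM scale set and that the VMSS instance computer names match the Kubernetes node names (which is the case on AKS); the variable names are assumptions:

# Resource group and VM scale set backing the AKS node pool
NODE_RG=$(az aks show -g $RESOURCE_GROUP_NAME -n $K8S_CLUSTER_NAME --query nodeResourceGroup -o tsv)
VMSS_NAME=$(az vmss list -g $NODE_RG --query '[0].name' -o tsv)

# Label each Kubernetes node with the public IP of its underlying VMSS instance
for INSTANCE_ID in $(az vmss list-instances -g $NODE_RG -n $VMSS_NAME --query '[].instanceId' -o tsv); do
  NODE_NAME=$(az vmss list-instances -g $NODE_RG -n $VMSS_NAME \
    --query "[?instanceId=='$INSTANCE_ID'].osProfile.computerName" -o tsv)
  PUBLIC_IP=$(az vmss list-instance-public-ips -g $NODE_RG -n $VMSS_NAME \
    --query "[?contains(id, '/virtualMachines/$INSTANCE_ID/')].ipAddress" -o tsv)
  kubectl label nodes --overwrite $NODE_NAME elassandra.strapdata.com/public-ip=$PUBLIC_IP
done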

You can then check that each node carries the expected label.
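For example, kubectl can display the label as an extra column (output omitted here):

kubectl get nodes -o wide -L elassandra.strapdata.com/public-ip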

Install HELM 2

Install HELM 2 and add the strapdata repo:
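A minimal sketch for a HELM 2 installation on an RBAC-enabled cluster (the strapdata repository URL is an assumption, check the Elassandra Operator documentation):

# Service account and cluster role binding for Tiller
kubectl create serviceaccount tiller --namespace kube-system
kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller

# Initialise HELM 2
helm init --service-account tiller --wait

# Add the strapdata HELM repository (URL assumed)
helm repo add strapdata https://charts.strapdata.com
helm repo update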

AKS StorageClass

Azure persistent volumes are bound to an availability zone, so we need to define one storageClass per zone in our AKS cluster; each Elassandra rack (or statefulSet) is then bound to the corresponding storageClass. This is done here using the HELM chart strapdata/storageclass.
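A sketch modelled on the GKE example further below, assuming the AKS zone names follow the <region>-<index> convention and that the strapdata/storageclass chart forwards the parameters.* values to the azure-disk provisioner:

for z in 1 2 3; do
helm install --name ssd-$AZURE_REGION-$z --namespace kube-system \
--set provisioner="kubernetes.io/azure-disk" \
--set parameters.storageaccounttype="Premium_LRS" \
--set parameters.kind="Managed" \
--set zone="$AZURE_REGION-$z",nameOverride="ssd-$AZURE_REGION-$z" \
strapdata/storageclass
done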

AKS Firewall rules

Finally, you may need to authorise inbound Elassandra connections on the following TCP ports:

  • Cassandra storage port (usually 7000 or 7001) for internode connections
  • Cassandra native CQL port (usually 9042) for client to node connections.
  • Elasticsearch HTTP port (usually 9200) for the Elasticsearch REST API.

Assuming you deploy an Elassandra datacenter exposing ports 39000 (storage), 39001 (CQL), and 39002 (Elasticsearch HTTP) to the internet, with no source IP address restrictions, you need to open these ports on the AKS nodes.
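A sketch using a rule on the network security group created by AKS (the NSG discovery below assumes a single NSG in the node resource group):

NODE_RG=$(az aks show -g $RESOURCE_GROUP_NAME -n $K8S_CLUSTER_NAME --query nodeResourceGroup -o tsv)
NSG_NAME=$(az network nsg list -g $NODE_RG --query '[0].name' -o tsv)

# Allow inbound Elassandra traffic on the exposed ports
az network nsg rule create -g $NODE_RG --nsg-name $NSG_NAME \
  --name allow-elassandra-inbound \
  --priority 2000 --direction Inbound --access Allow --protocol Tcp \
  --destination-port-ranges 39000-39002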

GKE Setup

Create a Regional Kubernetes cluster on GCP, with RBAC enabled:
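For example (machine type and node count are assumptions; one node per zone gives 3 nodes in total, and RBAC is enabled by default on GKE):

GCLOUD_REGION=europe-west1
K8S_CLUSTER_NAME=kube2

gcloud container clusters create $K8S_CLUSTER_NAME \
  --region $GCLOUD_REGION \
  --node-locations europe-west1-b,europe-west1-c,europe-west1-d \
  --num-nodes 1 \
  --machine-type n1-standard-4 \
  --no-enable-legacy-authorization

gcloud container clusters get-credentials $K8S_CLUSTER_NAME --region $GCLOUD_REGION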

Install HELM 2 (like on AKS).

GKE StorageClass

Google Cloud persistent volumes are bound to an availability zone, so we need to define one storageClass per zone in our Kubernetes cluster; each Elassandra rack (or statefulSet) is then bound to the corresponding storageClass. This is done here using the HELM chart strapdata/storageclass.

for z in europe-west1-b europe-west1-c europe-west1-d; do
helm install --name ssd-$z --namespace kube-system \
--set parameters.type="pd-ssd" \
--set provisioner="kubernetes.io/gce-pd" \
--set zone=$z,nameOverride=ssd-$z \
strapdata/storageclass
done

GKE Firewall rules

Assuming you deploy an Elassandra datacenter exposing ports 39000, 39001, and 39002 to the internet, with no source IP address restrictions, and that the Kubernetes nodes are properly tagged with the k8s cluster name, you can create an inbound firewall rule like this:

VPC_NETWORK=$(gcloud container clusters describe $K8S_CLUSTER_NAME --region $GCLOUD_REGION --format='value(network)')
NODE_POOLS_TARGET_TAGS=$(gcloud container clusters describe $K8S_CLUSTER_NAME --region $GCLOUD_REGION --format='value[terminator=","](nodePools.config.tags)' --flatten='nodePools[].config.tags[]' | sed 's/,\{2,\}//g')
gcloud compute firewall-rules create "allow-elassandra-inbound" \
--allow tcp:39000-39002 \
--network="$VPC_NETWORK" \
--target-tags="$NODE_POOLS_TARGET_TAGS" \
--description="Allow elassandra inbound" \
--direction INGRESS

GKE CoreDNS installation

GKE ships with KubeDNS by default, which does not allow configuring host aliases to resolve public IP addresses to internal Kubernetes node IP addresses (required by the Cassandra AddressTranslator to connect to Elassandra nodes through their internal IP addresses). So we need to install CoreDNS configured to import custom configuration (see the CoreDNS import plugin), and configure KubeDNS with a stub domain forwarding to CoreDNS.

helm install --name coredns --namespace=kube-system -f coredns-values.yaml stable/coredns

The coredns-values.yaml used in this example is available here.

Once CoreDNS is installed, add a stub domain to forward requests for the domain internal.strapdata.com to the CoreDNS service, and restart the KubeDNS pods. The internal.strapdata.com domain is just a dummy DNS domain used to resolve public IP addresses to the Kubernetes nodes' internal IP addresses.
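A minimal sketch, assuming the CoreDNS service created by the coredns release above is named coredns-coredns (check the actual name with kubectl get svc -n kube-system):

# ClusterIP of the CoreDNS service (service name assumed)
COREDNS_IP=$(kubectl get svc --namespace kube-system coredns-coredns -o jsonpath='{.spec.clusterIP}')

# Point the KubeDNS stub domain internal.strapdata.com to CoreDNS
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"internal.strapdata.com": ["$COREDNS_IP"]}
EOF

# Restart the KubeDNS pods to pick up the stub domain
kubectl delete pod --namespace kube-system -l k8s-app=kube-dns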

Prepare your Kubernetes clusters

Once your AKS and GKE clusters are running, we need to deploy and configure additional services in these two clusters.

Elassandra Operator

Install the Elassandra operator in the default namespace:

helm install --namespace default --name elassop --wait strapdata/elassandra-operator

Configure CoreDNS

The Kubernetes CoreDNS is used for two reasons:

  • Resolve DNS names of your DNS zone from inside the Kubernetes cluster, using DNS forwarders pointing to your DNS zone's name servers.
  • Reverse-resolve the broadcast Elassandra public IP addresses to the Kubernetes nodes' private IP addresses.

You can deploy the CoreDNS custom configuration with the strapdata coredns-forwarder HELM chart, which basically installs (or replaces) the coredns-custom configmap; the CoreDNS pods then need to be restarted.

If your Kubernetes nodes do not have an ExternalIP set (as on AKS), the public node IP addresses should be available through the custom label elassandra.strapdata.com/public-ip.
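As an illustration, here is one way to build the $HOST_ALIASES string used in the command below; the nodes.hosts[i].name / nodes.hosts[i].ip value keys are hypothetical, check the strapdata/coredns-forwarder chart values for the actual schema:

# Hypothetical sketch: map each node's public-ip label to its internal IP address
HOST_ALIASES=""
i=0
for NODE in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  PUBLIC_IP=$(kubectl get node $NODE -o jsonpath='{.metadata.labels.elassandra\.strapdata\.com/public-ip}')
  INTERNAL_IP=$(kubectl get node $NODE -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
  HOST_ALIASES="${HOST_ALIASES:+$HOST_ALIASES,}nodes.hosts[$i].name=$PUBLIC_IP,nodes.hosts[$i].ip=$INTERNAL_IP"
  i=$((i+1))
done
echo $HOST_ALIASES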

Then configure the CoreDNS custom configmap with your DNS name servers and host aliases. The following example uses the Azure DNS name servers:

kubectl delete configmap --namespace kube-system coredns-custom
helm install --name coredns-forwarder --namespace kube-system \
--set forwarders.domain="${DNS_DOMAIN}" \
--set forwarders.hosts[0]="40.90.4.8" \
--set forwarders.hosts[1]="64.4.48.8" \
--set forwarders.hosts[2]="13.107.24.8" \
--set forwarders.hosts[3]="13.107.160.8" \
--set nodes.domain=internal.strapdata.com \
--set $HOST_ALIASES \
strapdata/coredns-forwarder

Then restart the CoreDNS pods to reload the configuration; the label selector depends on the CoreDNS deployment:

On AKS:

kubectl delete pod --namespace kube-system -l k8s-app=kube-dns

On GKE:

kubectl delete pod --namespace kube-system -l k8s-app=coredns

Check the CoreDNS custom configuration:

kubectl get configmap -n kube-system coredns-custom -o yaml
apiVersion: v1
data:
  dns.server: |
    test.strapkube.com:53 {
        errors
        cache 30
        forward $DNS_DOMAIN 40.90.4.8 64.4.48.8 13.107.24.8 13.107.160.8
    }
  hosts.override: |
    hosts nodes.hosts internal.strapdata.com {
        10.132.0.57 146-148-117-125.internal.strapdata.com 146-148-117-125
        10.132.0.58 35-240-56-87.internal.strapdata.com 35-240-56-87
        10.132.0.56 34-76-40-251.internal.strapdata.com 34-76-40-251
        fallthrough
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-06-26T16:45:52Z"
  name: coredns-custom
  namespace: kube-system
  resourceVersion: "6632"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns-custom
  uid: dca59c7d-6503-48c1-864f-28ae46319725

Deploy a dnsutil pod:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF

Test resolution of public IP names to internal Kubernetes node IP address:

kubectl exec -ti dnsutils -- nslookup 146-148-117-125.internal.strapdata.com
Server: 10.19.240.10
Address: 10.19.240.10#53
Name: 146-148-117-125.internal.strapdata.com
Address: 10.132.0.57

ExternalDNS

ExternalDNS is used to automatically update your DNS zone and create A records for the Cassandra broadcast IP addresses. You can use it with a public or a private DNS zone, and with any DNS provider supported by ExternalDNS. In the following setup, we use a DNS zone hosted on Azure.
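As a sketch only (the chart and value names below are assumptions, and an Azure credentials secret must exist beforehand; refer to the documentation of the ExternalDNS chart you use):

# Hypothetical ExternalDNS deployment for an Azure-hosted DNS zone
helm install --name externaldns --namespace default \
  --set provider=azure \
  --set domainFilters[0]=$DNS_DOMAIN \
  --set txtOwnerId=$K8S_CLUSTER_NAME \
  --set azure.secretName=external-dns-azure \
  stable/external-dns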

Deploy Elassandra DC1

Deploy the first datacenter dc1 of the Elassandra cluster cl1 in the Kubernetes cluster kube1, with Kibana and Cassandra Reaper available through the Traefik ingress controller.
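As an illustrative sketch only: the release name and the chart value keys below are assumptions rather than the documented values of the strapdata/elassandra-datacenter chart; only the port numbers and the feature list come from this article.

# Hypothetical values: names and keys are assumptions, check the operator documentation
helm install --namespace default --name cl1-dc1 --wait \
  --set cassandra.sslStoragePort=39000 \
  --set cassandra.nativePort=39001 \
  --set elasticsearch.enabled=true,elasticsearch.httpPort=39002 \
  --set externalDns.enabled=true,externalDns.domain=$DNS_DOMAIN \
  --set kibana.enabled=true \
  --set reaper.enabled=true \
  --set ingress.host=$TRAEFIK_FQDN \
  strapdata/elassandra-datacenter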

Once the Elassandra datacenter is deployed, you get 3 Elassandra pods from 3 StatefulSets:
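For example (the app=elassandra label selector is the one used by the cleanup commands later in this article):

kubectl get statefulsets -n default
kubectl get pods -n default -l app=elassandra -o wide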

Once the datacenter is ready, check the cluster status:
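For example, you can run nodetool inside one of the dc1 pods (the datacenter label is the one used later in this article; depending on the pod layout you may need to target a specific container with -c):

POD=$(kubectl get pods -n default -l app=elassandra,elassandra.strapdata.com/datacenter=dc1 -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it $POD -n default -- nodetool status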

Then get the generated TLS certificates and the Cassandra admin password (because using the default cassandra superuser is not recommended, the Elassandra Operator automatically creates an admin superuser role):
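A sketch, assuming the secret names copied to kube2 later in this article and hypothetical key names (check the actual keys with kubectl describe secret):

# CA certificate used by cqlsh (key name assumed)
kubectl get secret elassandra-cl1-ca-pub -n default -o jsonpath='{.data.cacert\.pem}' | base64 -d > cl1-cacert.pem

# Generated admin password (key name assumed)
CASSANDRA_ADMIN_PASSWORD=$(kubectl get secret elassandra-cl1 -n default -o jsonpath='{.data.cassandra\.admin_password}' | base64 -d)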

Connect to the Elassandra/Cassandra node from the internet:

SSL_CERTFILE=cl1-cacert.pem bin/cqlsh --ssl -u admin -p $CASSANDRA_ADMIN_PASSWORD cassandra-cl1-dc1-0-0.$DNS_DOMAIN 39001
Connected to cl1 at cassandra-cl1-dc1-0-0.test.strapkube.com:39001.
[cqlsh 5.0.1 | Cassandra 3.11.6.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
admin@cqlsh>

Finally, you can check the Elassandra datacenter status (The CRD managed by the Elassandra Operator):
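For example (the resource name registered by the operator is an assumption; kubectl api-resources lists the exact name):

kubectl get elassandradatacenters -n default -o yaml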

Deploy Elassandra DC2

First, we need to copy the cluster secrets from the Elassandra datacenter dc1 into the Kubernetes cluster kube2 running on GKE.

for s in elassandra-cl1 elassandra-cl1-ca-pub elassandra-cl1-ca-key elassandra-cl1-kibana; do
  kubectl get secret $s --context kube1 --export -n default -o yaml | kubectl apply --context kube2 -n default -f -
done

Then deploy the Elassandra datacenter dc2 into the GKE cluster kube2, using the same ports (see the sketch after the notes below).

  • The TRAEFIK_FQDN should be something like traefik-cluster2.$DNS_DOMAIN.
  • The cassandra.remoteSeeds must include the DNS names of dc1 seed nodes, the first node of each rack StatefulSet with index 0.
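A sketch of the dc2 deployment, under the same assumptions as the dc1 sketch above; only cassandra.remoteSeeds comes from the notes above, the remaining value keys are hypothetical:

# Hypothetical values mirroring the dc1 sketch, plus the dc1 seed nodes
helm install --namespace default --name cl1-dc2 --wait \
  --set cassandra.sslStoragePort=39000 \
  --set cassandra.nativePort=39001 \
  --set elasticsearch.enabled=true,elasticsearch.httpPort=39002 \
  --set externalDns.enabled=true,externalDns.domain=$DNS_DOMAIN \
  --set kibana.enabled=true \
  --set reaper.enabled=true \
  --set ingress.host=$TRAEFIK_FQDN \
  --set cassandra.remoteSeeds[0]=cassandra-cl1-dc1-0-0.$DNS_DOMAIN \
  --set cassandra.remoteSeeds[1]=cassandra-cl1-dc1-1-0.$DNS_DOMAIN \
  --set cassandra.remoteSeeds[2]=cassandra-cl1-dc1-2-0.$DNS_DOMAIN \
  strapdata/elassandra-datacenter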

Once dc2 Elassandra pods are started, you get a running Elassandra cluster in AKS and GKE.

The datacenter dc2 started without streaming any data, so we now set up keyspace replication before rebuilding the datacenter from dc1 using an Elassandra task CRD. This task automatically includes the Cassandra system keyspaces (system_auth, system_distributed, system_traces, and elastic_admin if Elasticsearch is enabled).

The edctl utility allows waiting for conditions on Elassandra datacenters or tasks. We now rebuild dc2 from dc1 by streaming the data:
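As a rough illustration only: the apiVersion, kind and field names of the task manifest below are assumptions about the Elassandra task CRD, not taken from the operator documentation:

# Hypothetical ElassandraTask rebuilding dc2 from dc1
cat <<EOF | kubectl apply -f -
apiVersion: elassandra.strapdata.com/v1beta1
kind: ElassandraTask
metadata:
  name: rebuild-dc2
  namespace: default
spec:
  cluster: cl1
  datacenter: dc2
  rebuild:
    srcDcName: dc1
EOF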

If Elasticsearch is enabled in dc2, you need to restart the Elassandra pods to update the Elasticsearch cluster state, since the data have been populated by streaming from dc1.

kubectl delete pod --namespace default -l app=elassandra,elassandra.strapdata.com/datacenter=dc2

Finally, check that you can connect to dc2:

SSL_CERTFILE=cl1-cacert.pem bin/cqlsh --ssl -u admin -p $CASSANDRA_ADMIN_PASSWORD cassandra-cl1-dc2-0-0.$DNS_DOMAIN 39001
Connected to cl1 at cassandra-cl1-dc2-0-0.test.strapkube.com:39001.
[cqlsh 5.0.1 | Cassandra 3.11.6.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
admin@cqlsh>

Check the Elasticsearch cluster status on dc2; the kibana index was automatically created by the Kibana pod deployed in the Kubernetes cluster kube2:
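For example, assuming the Elasticsearch HTTP endpoint is protected with the same admin credentials and CA certificate as CQL (an assumption; adjust to your security settings):

curl --cacert cl1-cacert.pem -u admin:$CASSANDRA_ADMIN_PASSWORD \
  "https://cassandra-cl1-dc2-0-0.$DNS_DOMAIN:39002/_cat/indices?v"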

Conclusion

Here you have a multi-cloud Elassandra cluster running in multiple Kubernetes clusters. The Elassandra Operator gives you the flexibility to deploy in the cloud or on premises, in a public or private network. You can scale up or down, park and unpark your datacenters, and you can lose a Kubernetes node, a persistent volume or even a zone: the Elassandra datacenter remains up and running, and you don't have to manage any synchronisation issue between your database and your Elasticsearch cluster.

In the next articles, we'll see how the Elassandra Operator deploys Kibana for data visualisation and Cassandra Reaper to manage continuous Cassandra repairs. We'll also see how to set up the Prometheus Operator with Grafana dashboards to monitor the Elassandra Operator, the Elassandra nodes and the Kubernetes resources.

Have fun with the Elassandra Operator, and thanks in advance for your feedback!
