Set up Multi-Datacenter Cassandra Clusters in GKE with K8ssandra and Cloud DNS
Author: Jeff Carpenter
This is the second post in a series exploring different schemes for using K8ssandra to set up Cassandra clusters with varying deployment topologies. In this post, we focus on building a multi-data center Cassandra cluster that runs on Kubernetes clusters in multiple regions. We’ll be using Google Cloud DNS instead of hardcoded IPs for better scalability and ease of use.
Recently, we’ve been examining different patterns for deploying Apache Cassandra® clusters on Kubernetes (K8s) using K8ssandra with different deployment topologies. We’ve explored how to deploy a Cassandra cluster with two data centers in a single cloud region, using separate K8s namespaces to isolate operational and analytics workloads.
We also deployed a Cassandra cluster across K8s clusters in multiple Google Cloud Platform (GCP) regions. One of the shortcomings of that approach was how we handled the networking. Specifically, we used hardcoded IPs of Cassandra nodes in the first data center to bootstrap the second data center using what Cassandra calls “seed nodes”. Hardcoding IPs is labor-intensive and doesn’t scale well. What we really need is to be able to locate seed nodes through the Domain Name System (DNS).
Here, you’ll learn how to create a Cassandra cluster spanning Google Kubernetes Engine (GKE) clusters in two Google Cloud regions using Google’s Cloud DNS Service to provide name resolution between the two GKE clusters. For the purpose of this exercise, we want to use the same network. So, we’ll create GKE clusters in two separate regions under the same Google Cloud project.
A quick note on terminology. The word “cluster” is used a lot in distributed technology and can get confusing when we have two different distributed technologies working together.
- A Cassandra cluster can span multiple data centers and presents itself as a single entity to the external user, no matter which data center is being accessed.
- A Kubernetes cluster typically corresponds to a single local data center; spanning multiple data centers requires a network interconnect between the Kubernetes clusters.
Combining these two concepts, a single Cassandra cluster will span multiple Kubernetes clusters. Hopefully, that helps clarify some of the terminology. With that, let’s get started.
Preparing the first GKE cluster
First, you’re going to need a K8s cluster in which you can create the first Cassandra data center. We’ll show how to do this with the gcloud command line. For the commands in this post, we’ll assume your context is set to have a current project, region, and zone. You can check what values are set by running something like this:
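For example:

```shell
# Show the active project, region, and zone for the gcloud CLI
gcloud config list
```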
First, you’ll need a network:
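A sketch of the network creation, assuming a custom-mode VPC; the network name dev-k8ssandra-network is an assumption to adapt:

```shell
# Create a custom-mode VPC network to be shared by both GKE clusters
gcloud compute networks create dev-k8ssandra-network --subnet-mode=custom
```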
Then, to create your first GKE cluster, you’ll need a subnet in the region you plan to use:
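For instance, a /20 subnet matching the 10.0.0.0/20 range referenced later in this post; the us-west4 region and the subnet name are assumptions to adapt:

```shell
# Create a subnet for the first GKE cluster in the chosen region
gcloud compute networks subnets create dev-k8ssandra-subnet \
  --network=dev-k8ssandra-network \
  --region=us-west4 \
  --range=10.0.0.0/20
```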
Now you can create a GKE cluster using that subnet and compute specs that meet the K8ssandra minimum requirements. Here’s what my command looked like using zones in my chosen region:
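A sketch of such a command; the cluster name, region, zones, and machine type are assumptions to adapt, while the --cluster-dns* flags are the gcloud options for enabling Cloud DNS:

```shell
# Create the first GKE cluster, using Cloud DNS with VPC scope
# and the custom domain "cluster1"
gcloud container clusters create cluster1 \
  --region us-west4 \
  --node-locations us-west4-a,us-west4-b,us-west4-c \
  --machine-type e2-standard-8 \
  --num-nodes 1 \
  --network dev-k8ssandra-network \
  --subnetwork dev-k8ssandra-subnet \
  --cluster-dns clouddns \
  --cluster-dns-scope vpc \
  --cluster-dns-domain cluster1
```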
Note the --cluster-dns* attributes, which configure the new cluster to use the Cloud DNS service with virtual private cloud (VPC) level scoping and the domain cluster1, which we’ll be able to use below for name resolution. For more information on this configuration, see the documentation for using Cloud DNS for GKE.
This should change your kubectl context to the new cluster, but you can make sure by checking the output of kubectl config current-context.
Note: If you’d rather use Terraform scripts to create your GKE cluster, the K8ssandra project documentation includes instructions for K8ssandra on Google Kubernetes Engine (GKE), which reference sample scripts provided as part of the K8ssandra GCP Terraform Example.
Creating the first Cassandra data center
Now you’re ready to create the first Cassandra data center. First, you’ll create Cassandra administrator credentials. Create a namespace for the first data center and add a secret within the namespace:
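A sketch of those commands, assuming the namespace k8ssandra (matching the seed service name resolution used later in this post) and placeholder credentials you should replace:

```shell
# Create a namespace for the first data center
kubectl create namespace k8ssandra

# Create the Cassandra superuser credentials in that namespace
kubectl create secret generic cassandra-admin-secret \
  --from-literal=username=cassandra-admin \
  --from-literal=password=cassandra-admin-password \
  -n k8ssandra
```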
The next step is to create a K8ssandra deployment for the first data center. You’ll need Helm installed for this step, as described on the K8ssandra GKE docs page. Create the configuration for the first data center in a file called dc1.yaml, making sure to change the affinity labels to match the zones used in your GKE cluster:
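A minimal sketch of dc1.yaml in the values format of the K8ssandra 1.x Helm chart; the cluster name multi-region, the secret name, and the us-west4 zones are assumptions to adapt:

```yaml
cassandra:
  clusterName: multi-region
  auth:
    superuser:
      secret: cassandra-admin-secret
  cassandraLibDirVolume:
    storageClass: standard-rwo
  datacenters:
    - name: dc1
      size: 3
      racks:
        - name: rack1
          affinityLabels:
            topology.kubernetes.io/zone: us-west4-a
        - name: rack2
          affinityLabels:
            topology.kubernetes.io/zone: us-west4-b
        - name: rack3
          affinityLabels:
            topology.kubernetes.io/zone: us-west4-c
```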
In addition to requesting three nodes in the data center, this configuration specifies an appropriate storage class for the GKE environment (
standard-rwo), and uses affinity to specify how the racks are mapped to GCP zones. Make sure to change the referenced zones to match your configuration. For more details, please refer to the first blog in this series.
Now, deploy the release using this command:
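A sketch of the install command, assuming the release name k8ssandra, the k8ssandra/k8ssandra Helm chart, and the namespace created above:

```shell
# Install the K8ssandra release for the first data center
helm install k8ssandra k8ssandra/k8ssandra -f dc1.yaml -n k8ssandra
```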
This causes the K8ssandra release named k8ssandra to be installed in the namespace you created above.
As would be the case for any Cassandra cluster deployment, you’ll want to wait for the first data center to be completely up before adding a second data center. Since your next steps create infrastructure for the second data center, you don’t need to wait right now. But if you’re interested, one simple way to make sure the data center is up is to watch until the Stargate pod shows as initialized, since it depends on Cassandra being ready:
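For example (the namespace is an assumption, and the exact pod names depend on your deployment):

```shell
# Watch pods until the Stargate pod reports 1/1 READY
kubectl get pods -n k8ssandra --watch
```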
This is a great point to get some information you’ll need below to configure the second Cassandra data center: seeds. Conveniently for us, K8ssandra creates a headless K8s service called the seed service, which points to a couple of the Cassandra nodes that can be used to bootstrap new nodes or data centers into a Cassandra cluster:
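You can locate the seed service with a command like this (the namespace is an assumption):

```shell
# The seed service is a headless service whose name ends in "seed-service"
kubectl get services -n k8ssandra | grep seed-service
```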
If you’re curious, you can get a quick look at the IP addresses of the pods behind this service that are labeled as seed nodes using the same selector that the service uses:
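A sketch of that lookup; it assumes the cass-operator label cassandra.datastax.com/seed-node=true, which is the selector the seed service uses:

```shell
# List the pod IPs of the nodes currently labeled as seeds
kubectl get pods -n k8ssandra \
  -l cassandra.datastax.com/seed-node=true \
  -o jsonpath='{.items[*].status.podIP}'
```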
Which produces output that looks like this:
10.240.0.8 10.240.2.6 10.240.1.10
Preparing the second GKE cluster
Now you’ll need a second K8s cluster that will be used to host the second Cassandra data center in a different region. For example, I chose the us-central1 region for my second cluster. First, I explicitly created a subnet in that region as part of the same network:
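A sketch of that subnet command, using the 10.2.0.0/20 range referenced later in this post (the subnet name is an assumption):

```shell
# Create a subnet for the second GKE cluster in us-central1
gcloud compute networks subnets create dev-k8ssandra-subnet2 \
  --network=dev-k8ssandra-network \
  --region=us-central1 \
  --range=10.2.0.0/20
```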
Then, I created the second GKE cluster using that network and the same compute specs as the first cluster:
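A sketch of the second cluster creation; the cluster name, zones, machine type, and the DNS domain cluster2 are assumptions to adapt:

```shell
# Create the second GKE cluster on the same network,
# with its own Cloud DNS domain
gcloud container clusters create cluster2 \
  --region us-central1 \
  --node-locations us-central1-a,us-central1-b,us-central1-c \
  --machine-type e2-standard-8 \
  --num-nodes 1 \
  --network dev-k8ssandra-network \
  --subnetwork dev-k8ssandra-subnet2 \
  --cluster-dns clouddns \
  --cluster-dns-scope vpc \
  --cluster-dns-domain cluster2
```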
Make sure the kubectl context has changed to the second data center.
Enabling traffic between Kubernetes clusters
Next, you’ll need to create a firewall rule to allow traffic between the two clusters. Recall the IP space of the subnets you defined above (I used 10.0.0.0/20 and 10.2.0.0/20), and then obtain the IP spaces of each GKE cluster. For example:
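The pod IP ranges can be read like this (the cluster names and regions are the assumptions used throughout this post):

```shell
# Print the pod IP range of each GKE cluster
gcloud container clusters describe cluster1 --region us-west4 \
  --format="value(clusterIpv4Cidr)"
gcloud container clusters describe cluster2 --region us-central1 \
  --format="value(clusterIpv4Cidr)"
```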
Use these IP spaces to create a rule to allow all traffic:
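A sketch of the rule; the rule name is an assumption, and the last two ranges stand in for whatever pod IP ranges the previous step reported:

```shell
# Allow all traffic between the subnets and pod ranges of both clusters
gcloud compute firewall-rules create k8ssandra-multi-region-rule \
  --network dev-k8ssandra-network \
  --allow all \
  --source-ranges "10.0.0.0/20,10.2.0.0/20,10.40.0.0/14,10.44.0.0/14"
```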
If you want, you can create a more targeted rule that allows only transmission control protocol (TCP) traffic on the ports used by Cassandra.
Adding a second Cassandra data center
Let’s start by creating a namespace for the new data center matching the GCP region name. We also need to create administrator credentials to match those created for the first data center, since the secrets are not automatically replicated between clusters.
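A sketch of those commands, with the same placeholder credentials as before:

```shell
# Namespace named after the region hosting dc2
kubectl create namespace us-central1

# Recreate the same superuser credentials in the new cluster
kubectl create secret generic cassandra-admin-secret \
  --from-literal=username=cassandra-admin \
  --from-literal=password=cassandra-admin-password \
  -n us-central1
```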
Now you’ll create a configuration to deploy an additional Cassandra data center dc2 in the new GKE cluster. For the nodes in dc2 to be able to join the Cassandra cluster, a few steps are required:
- The first is one you’ve already taken care of: using the same Google Cloud network for both GKE clusters means the nodes in the new data center will be able to communicate with nodes in the original data center.
- Second, make sure to use the same Cassandra cluster name as for the first data center.
- Finally, you’ll need to provide the fully qualified name of the seed service so that nodes in the new data center know how to contact nodes in the first data center to join the cluster.
This last step is where Google Cloud DNS does the work for us. According to the Kubernetes documentation, the fully qualified domain name (FQDN) of a service follows the pattern <service-name>.<namespace>.svc.<cluster-domain>.
For our purposes, the service name is multi-region-seed-service, the namespace is k8ssandra, and the domain is the same DNS domain we assigned the GKE cluster in Google Cloud DNS: cluster1. So, the FQDN we need is multi-region-seed-service.k8ssandra.svc.cluster1.
Now create a configuration in a file called dc2.yaml. Here’s what my file looked like with the FQDN for the seed service. Make sure to change the affinity labels appropriately for your chosen region and zones:
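A minimal sketch of dc2.yaml in the same K8ssandra 1.x values format; note the matching cluster name and the additionalSeeds entry pointing at the seed service FQDN (the zone names are assumptions to adapt):

```yaml
cassandra:
  clusterName: multi-region
  auth:
    superuser:
      secret: cassandra-admin-secret
  additionalSeeds:
    - multi-region-seed-service.k8ssandra.svc.cluster1
  cassandraLibDirVolume:
    storageClass: standard-rwo
  datacenters:
    - name: dc2
      size: 3
      racks:
        - name: rack1
          affinityLabels:
            topology.kubernetes.io/zone: us-central1-a
        - name: rack2
          affinityLabels:
            topology.kubernetes.io/zone: us-central1-b
        - name: rack3
          affinityLabels:
            topology.kubernetes.io/zone: us-central1-c
```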
Similar to the configuration for dc1, this configuration also uses affinity, with a similar allocation of racks to make sure Cassandra nodes are evenly spread across the workers. Deploy the release using a command like this:
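For example, assuming the release name and namespace used above:

```shell
# Install the K8ssandra release for the second data center
helm install k8ssandra k8ssandra/k8ssandra -f dc2.yaml -n us-central1
```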
If you look at the resources in this namespace, using a command such as kubectl get services,pods, you’ll note that there is a similar set of pods and services as for dc1, including Stargate, Prometheus, Grafana, and Reaper. Depending on how you wish to manage your application, this may or may not be to your liking, but you’re free to tailor the configuration to disable any components you don’t need.
Configuring Cassandra Keyspaces
Once the second data center comes online, you’ll want to configure Cassandra keyspaces to replicate across both clusters.
Important: You’ll likely need to first change your kubectl context back to the first GKE cluster, for example, using the kubectl config use-context command. You can list existing contexts using kubectl config get-contexts.
To update keyspaces, connect to a node in the first data center and execute DESCRIBE KEYSPACES to list the keyspaces, then use the DESCRIBE KEYSPACE &lt;name&gt; command to identify those using the NetworkTopologyStrategy. For example:
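A sketch of such a session; the pod name follows the &lt;clusterName&gt;-&lt;dc&gt;-&lt;rack&gt;-sts-0 pattern used by cass-operator, and the credentials are the placeholders from earlier:

```shell
# Inspect a keyspace definition from a node in dc1
kubectl exec -it multi-region-dc1-rack1-sts-0 -n k8ssandra -c cassandra -- \
  cqlsh -u cassandra-admin -p cassandra-admin-password \
  -e "DESCRIBE KEYSPACE system_auth"
```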
Typically you’ll find that the system_auth, system_traces, and system_distributed keyspaces use NetworkTopologyStrategy, as well as data_endpoint_auth if you’ve enabled Stargate. You can then update the replication strategy to ensure data is replicated to the new data center. You’ll execute something like the following for each of these keyspaces:
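For example, to replicate system_auth three ways in each data center (adjust the replication factors to your needs):

```cql
ALTER KEYSPACE system_auth
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
```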
Important: Remember to create or alter the replication strategy for any keyspaces you need for your application so you have the desired number of replicas in each data center.
After exiting cqlsh, make sure existing data is properly replicated to the new data center with the nodetool rebuild command.
Important: Remember to change your kubectl context back to the second GKE cluster.
Rebuild needs to be run on each node in the new data center. For example:
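A sketch for one node, using the same hypothetical pod naming; repeat it with the pod name of each node in dc2:

```shell
# Stream existing data from dc1 onto this dc2 node
kubectl exec multi-region-dc2-rack1-sts-0 -n us-central1 -c cassandra -- \
  nodetool --username cassandra-admin --password cassandra-admin-password \
  rebuild dc1
```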
Repeat for the other nodes in the new data center.
Testing the configuration
Let’s verify the second data center has joined the cluster. To do this, pick a Cassandra node in either data center and execute the nodetool status command against it:
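For example, against the first node in dc1 (pod name and credentials are the placeholders used throughout this post):

```shell
# Check cluster membership from a dc1 node
kubectl exec multi-region-dc1-rack1-sts-0 -n k8ssandra -c cassandra -- \
  nodetool --username cassandra-admin --password cassandra-admin-password status
```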
This will produce output similar to the following:
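As an illustration only (loads and host IDs are placeholders, and the dc2 addresses are hypothetical; the dc1 addresses match the seed IPs listed earlier):

```
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns  Host ID    Rack
UN  10.240.0.8   1.1 MiB   256     ?     <uuid>     rack1
UN  10.240.1.10  1.0 MiB   256     ?     <uuid>     rack2
UN  10.240.2.6   1.2 MiB   256     ?     <uuid>     rack3

Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns  Host ID    Rack
UN  10.56.0.7    0.9 MiB   256     ?     <uuid>     rack1
UN  10.56.1.5    0.9 MiB   256     ?     <uuid>     rack2
UN  10.56.2.14   0.9 MiB   256     ?     <uuid>     rack3
```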
If everything has been configured correctly, you’ll be able to see both data centers in the cluster output. Here’s a picture that depicts what you’ve just deployed, focusing on the Cassandra nodes and networking:
We’re hard at work building a K8s operator for K8ssandra, which will help support multiple topologies beyond the one we’ve described here, and to do so more simply. If you’re interested in learning more about deploying Cassandra on K8s, or getting involved in the project, we encourage you to check out the K8ssandra project on GitHub, or the K8ssandra blog for other tutorials. Also, feel free to reach out with any questions you have on the forum or our Discord channel.
- Apache Cassandra
- Using Cloud DNS for GKE
- K8ssandra on Google Kubernetes Engine (GKE)
- Kubernetes Documentation: DNS for Services and Pods
- Apache Cassandra Documentation: What are seeds?
- K8ssandra: Deploy a multi-data center Apache Cassandra cluster in Kubernetes
- K8ssandra: Why we decided to build a K8ssandra operator
- GitHub Documentation: K8ssandra GCP Terraform Example