GKE Multi-Cluster Services (MCS): Feels like magic — at first

Daniel Strebel
Google Cloud - Community
12 min read · Feb 5, 2024

Multi-Cluster Services (MCS) solve the common problem of allowing workloads on one GKE cluster to communicate with a service that is backed by pods running on one or more other GKE clusters. In this post we take a look at how Multi-Cluster Services are configured, how they work under the hood, and which components are involved at the cluster and surrounding infrastructure levels. Ultimately this should help us gain a deeper understanding of the feature and increase our confidence when troubleshooting it.

If you are looking for a general introduction to Fleets and MCS, I recommend the excellent two-part blog post by Kishore Jagannath:

In this blog post we are going to do things slightly differently and instead dissect the building blocks of MCS. If we’re lucky this might even bring back childhood memories of technology feeling like magic until you take it apart, break a few things here and there and put it back together. The good news is that this time around your parents won’t be mad at you for breaking the family’s calculator.

Image credit: https://unsplash.com/photos/a-calculator-sitting-on-top-of-a-white-table-FUdBQeW209Y

First Things First — When to use MCS

Before we dive deeper into the inner workings of MCS, let us first take a step back and explore the space of cross-cluster communication. As shown in the diagram below, MCS is only one of several possible solutions for consuming services that run in a Kubernetes cluster, or more specifically in a GKE cluster.

MCS is a relatively simple solution that focuses on the task of allowing workloads in one GKE cluster to talk to services that are backed by pods running in another cluster of the same fleet. The same cross-cluster communication can also be achieved with the multi-cluster features of a service mesh like Service Mesh on GKE or Istio, or with network-level tooling like Cilium. If you are already using one of these approaches, or plan to use capabilities like traffic management, transparent authentication, or telemetry on top of the multi-cluster communication, then MCS is most likely too simplistic for your use case and you’ll want to look at using a service mesh instead.

Preparing our MCS Demo

If you want to follow along with the practical explorations in this blog post, you can perform the following steps, which enable the required APIs and create two GKE Autopilot clusters for us to play with MCS. If you prefer to use GKE Standard clusters instead, the examples provided here will work just as well.

export PROJECT_ID=<your project id here>

gcloud services enable \
compute.googleapis.com \
container.googleapis.com \
multiclusterservicediscovery.googleapis.com \
gkehub.googleapis.com \
cloudresourcemanager.googleapis.com \
trafficdirector.googleapis.com \
dns.googleapis.com \
--project=$PROJECT_ID

gcloud container clusters create-auto "test-us-cluster" \
--region "us-central1" --enable-master-authorized-networks \
--network "default" --subnetwork "default" \
--services-ipv4-cidr 10.99.0.0/20 \
--async --project "$PROJECT_ID"

gcloud container clusters create-auto "test-eu-cluster" \
--region "europe-west1" --enable-master-authorized-networks \
--network "default" --subnetwork "default" \
--services-ipv4-cidr 10.99.16.0/20 \
--async --project "$PROJECT_ID"
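
Because we created the clusters with --async, the commands return immediately while provisioning continues in the background. One way to check on the progress is to list the clusters and wait for both to report a RUNNING status:

gcloud container clusters list --project "$PROJECT_ID" # Wait until both test clusters show STATUS: RUNNING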

When the clusters are ready, we can enable the multi-cluster features on our fleet and add our newly created clusters to the fleet.

gcloud container fleet multi-cluster-services enable --project $PROJECT_ID

gcloud container fleet memberships register test-us-cluster \
--gke-cluster us-central1/test-us-cluster \
--enable-workload-identity \
--project $PROJECT_ID

gcloud container fleet memberships register test-eu-cluster \
--gke-cluster europe-west1/test-eu-cluster \
--enable-workload-identity \
--project $PROJECT_ID
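
Once both registrations have completed, we can list the fleet memberships to confirm that the two clusters have joined the fleet:

gcloud container fleet memberships list --project $PROJECT_ID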

Let’s look at our cluster and see what has happened so far. For this we connect to our cluster in europe-west1:

gcloud container clusters get-credentials test-eu-cluster --region europe-west1 --project $PROJECT_ID

And list the namespaces to see if we can already find any traces of MCS:

kubectl get ns

The list of namespaces should include a newly created namespace called “gke-mcs”. The name already hints that it is most likely related to the MCS feature we enabled on our fleet, and the creation time of the namespace matches the moment we registered our cluster with the fleet.
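
If you want to verify the timing for yourself, the creation timestamp of the namespace is easy to check. This is just a small optional sanity check against the EU cluster we are currently connected to:

kubectl get ns gke-mcs -o jsonpath='{.metadata.creationTimestamp}{"\n"}' # Compare with the time of the fleet registration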

Let’s explore the gke-mcs namespace in a bit more detail and find out if something is already running in there:

kubectl get all -n gke-mcs # This shows a deployment for gke-mcs-importer

kubectl logs -n gke-mcs -l k8s-app=gke-mcs-importer --tail=-1 # To get the logs of the importer

In the logs we will see a permission error, because the importer is not yet authorized to access the Traffic Director API:

Handler error: receiving ADS response over stream: permission denied: 
rpc error: code = PermissionDenied desc = Permission
'trafficdirector.networks.getConfigs' denied on resource
'//trafficdirector.googleapis.com/projects/...' (or it may not exist).

We can fix this by using the workload identity of the importer to grant it the Network Viewer role, which it needs in order to read the Traffic Director configuration:

gcloud projects add-iam-policy-binding $PROJECT_ID \
--member "serviceAccount:$PROJECT_ID.svc.id.goog[gke-mcs/gke-mcs-importer]" \
--role "roles/compute.networkViewer"

When we run the logs command from above again, we should see that the polling now works as expected, even though no zones have been created for us yet.
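
Besides the importer logs, we can also look at the fleet-level view of the feature. The same gcloud command group that we used to enable MCS also offers a describe command that reports a per-membership state, which makes it a useful first stop when troubleshooting:

gcloud container fleet multi-cluster-services describe --project $PROJECT_ID

With that, our clusters are ready for the deployment of our multi-cluster service.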

Deploy our Demo Application

To make switching between our two GKE clusters a bit easier, we first rename our cluster contexts. Alternatively, we could also use the kubectx shorthand that comes pre-installed with Cloud Shell.

gcloud container clusters get-credentials test-us-cluster --region us-central1 --project $PROJECT_ID
kubectl config rename-context "$(kubectl config current-context)" mcs-us

gcloud container clusters get-credentials test-eu-cluster --region europe-west1 --project $PROJECT_ID
kubectl config rename-context "$(kubectl config current-context)" mcs-eu

In this example we don’t need an overly fancy application; after all, we only want to show when we can or cannot reach a specific service running on another GKE cluster. For demo purposes we therefore deploy the existing hello-web example on our EU cluster and expose it as a regular ClusterIP service within the cluster.

kubectl create ns shared-services --context mcs-eu 

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/kubernetes-engine-samples/main/quickstarts/hello-app/manifests/helloweb-deployment.yaml -n shared-services --context mcs-eu

kubectl expose deployment/helloweb --port 8080 -n shared-services --context mcs-eu

Once the service is created, we can call it from within the same cluster as expected, using the automatically created Kubernetes cluster.local DNS name:

kubectl run test-curl --image=curlimages/curl -it --rm --pod-running-timeout=4m --context mcs-eu -- curl -v http://helloweb.shared-services.svc.cluster.local:8080

An interesting side note, and an important detail for later, is that our GKE cluster uses Cloud DNS. We can therefore also see the automatically created A record for our service in the Cloud DNS UI in the Google Cloud Console. The zone details explicitly indicate, though, that this Cloud DNS zone is only available within a specific GKE cluster. It is not a typical private zone, because it is not attached to any VPC.
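
If you prefer the CLI over the console, the same cluster-scoped zone can be inspected with the zonal Cloud DNS commands that we will also use later in this post. Note that the --location value below is an assumption on my part; the cluster-scoped zone of our regional EU cluster may live in a different europe-west1 zone, so try the other zones if the list comes back empty:

gcloud dns managed-zones list --location=europe-west1-b --project $PROJECT_ID

gcloud dns record-sets list --location=europe-west1-b --zone <name of the zone from above> --project $PROJECT_ID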

The goal of MCS is to allow consumption of this service from the US cluster. Before we export our service, let’s explore what happens by default. As you would probably expect, the DNS record that we used on the EU cluster isn’t available on the US cluster, and even the Service IP can’t be reached from there: although we used RFC 1918 ranges when we created the clusters, a ClusterIP is only routable from within its own cluster.

# Fails with an error that the hostname can't be resolved
kubectl run test-curl --image=curlimages/curl -it --rm --pod-running-timeout=4m --context mcs-us -- curl -v http://helloweb.shared-services.svc.cluster.local:8080

# Fails with a timeout
EU_SERVICE_IP="$(kubectl get svc -l app=hello -n shared-services --context mcs-eu -ojsonpath='{.items[*].spec.clusterIP}')"
kubectl run test-curl --image=curlimages/curl -it --rm --pod-running-timeout=4m --context mcs-us -- curl -v "http://$EU_SERVICE_IP:8080"

The problem isn’t that our workload in the US cluster can’t reach the workload in the EU cluster, as we can demonstrate below by calling the Pod IP directly. At the network level this works because we use VPC-native cluster networking, where the Pod IPs are routable from within the VPC. In the example below we take advantage of the fact that we can talk to the API servers of both clusters and use it to demonstrate cross-cluster communication between them.

EU_POD_IP="$(kubectl get po -l app=hello -n shared-services --context mcs-eu -ojsonpath='{.items[*].status.podIP}')"

kubectl run test-curl --image=curlimages/curl -it --rm --pod-running-timeout=4m --context mcs-us -- curl -v http://$EU_POD_IP:8080

Of course, explicitly querying the Kubernetes API for a pod IP is not a scalable solution. This is why we turn to MCS to automate the service discovery across clusters.

Export the Service with MCS

To export the service that is running in the EU cluster for MCS, we create a ServiceExport resource on the EU cluster. We also create the corresponding namespace on the US cluster where we ultimately want the service to be imported:

kubectl create ns shared-services --context mcs-us 

kubectl apply --context mcs-eu -f - <<EOF
apiVersion: net.gke.io/v1
kind: ServiceExport
metadata:
  namespace: shared-services
  name: helloweb
EOF
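
Before we switch over to the US side, it is worth taking a quick look at the EU cluster itself. The export shows up there as the ServiceExport resource we just created, and after a few minutes a corresponding ServiceImport should appear on the exporting cluster as well, which is what will later allow us to call the clusterset.local hostname from the EU cluster too:

kubectl get serviceexport -n shared-services --context mcs-eu

kubectl get serviceimport -n shared-services --context mcs-eu # May take a few minutes to appear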

We can now explore the MCS importer logs on the US cluster where we should see log entries related to the service we just exported:

kubectl logs -n gke-mcs -l k8s-app=gke-mcs-importer --tail=25 --context mcs-us

This should indicate that the US cluster received information about the workload running in the EU cluster and that an Endpoints resource was automatically created for it:

ADS response received (europe-west1-d), type: type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
Update from europe-west1-d with 1 negs
Creating Endpoints "gke-mcs-..."

Let’s explore the resources that the MCS importer automatically created in the US cluster:

kubectl get ServiceImport -n shared-services --context mcs-us

kubectl get Service -n shared-services --context mcs-us

kubectl get Endpoints -n shared-services --context mcs-us

As you can see, the Endpoints listed by the last command contain the Pod IP of the workload running in our EU cluster. This means that the ServiceExport resource essentially automated the cross-cluster endpoint discovery that we performed manually in the previous section. Let’s test the service and endpoints that we found above and call the service from within our US cluster:

SVC_NAME=$(kubectl get service -o=jsonpath='{.items[?(@.metadata.annotations.net\.gke\.io/service-import=="helloweb")].metadata.name}' -n shared-services --context mcs-us)


kubectl run test-curl --image=curlimages/curl -it --rm \
--pod-running-timeout=4m --context mcs-us -- \
curl -v http://$SVC_NAME.shared-services.svc.cluster.local:8080

This worked great. This time around we didn’t have to talk to the API server on the EU cluster to figure out the IP of the pod running our workload, because the ServiceImport already synchronized that information. The only remaining challenge is that the service automatically created by the MCS importer has a name of the form “gke-mcs-<hash>” that can’t easily be known beforehand. In the example above we again used the API server to retrieve the correct service name. In a real-world use case we obviously don’t want workloads to query the Kubernetes API server before they can call the remote service. This would break the abstraction and require unnecessary permissions on the Pod’s service account.

The curious case of the clusterset.local hostname

To solve the problem of the auto-generated, non-memorable name of the imported service, MCS provides a handy DNS-based solution. For every imported service it creates a DNS entry of the form “SERVICE_EXPORT_NAME.NAMESPACE.svc.clusterset.local”. With this hostname we now have a deterministic way of calling our service without the extra step of identifying the generated service name. We can simply use the values from our ServiceExport to compose the hostname of the imported service and call it from within a pod on either one of our clusters:

kubectl run test-curl --image=curlimages/curl -it --rm \
--pod-running-timeout=4m --context mcs-eu -- \
curl http://helloweb.shared-services.svc.clusterset.local:8080

kubectl run test-curl --image=curlimages/curl -it --rm \
--pod-running-timeout=4m --context mcs-us -- \
curl http://helloweb.shared-services.svc.clusterset.local:8080

Note: If you get an error for either of the requests above, this might be due to DNS caching. If the managed DNS zone lists your clusterset.local A record, it will eventually be picked up by your pod.
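
To see what your pod actually resolves while you wait, you can run a quick DNS lookup from a throwaway pod. The busybox image is an arbitrary choice here; any image that ships nslookup will do:

kubectl run test-dns --image=busybox:stable -it --rm --pod-running-timeout=4m --context mcs-us -- \
nslookup helloweb.shared-services.svc.clusterset.local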

One thing that might be unexpected at first is that the clusterset.local DNS zone is not exposed in the Cloud DNS UI, even though there is an underlying managed DNS zone with a record set for our hostname, as you can see by running the following commands:

gcloud dns managed-zones list --location=us-central1-b

gcloud dns managed-zones describe <name of the zone from above> --location=us-central1-b

gcloud dns record-sets list --location=us-central1-b --zone <name of the zone from above>

If we want to see the clusterset.local hostname in the Cloud Console, we can go to Traffic Director. In the routing rule maps tab we can see a routing rule that matches our fleet and lists our helloweb service as an associated service, together with the forwarding rule that is associated with it.

If we click on the name of the routing rule, we can see the forwarding rule and the list of hostnames that are used in routing rules to forward traffic to a backend that is backed by a Network Endpoint Group (NEG).

[Optional Part] Breaking the Calculator and Putting it Back together

Given this, we could be tempted to assume that our hostname is handled entirely by the forwarding rule and the traffic is then sent to the associated NEG, just like for an internal load balancer. If that were true, we shouldn’t need the Kubernetes Service and Endpoints resources in our US cluster to talk to the EU service. Here is where we come back to our starting point and the analogy of breaking the calculator to validate our understanding.

To verify this assumption, we delete the shared-services namespace on the US cluster, which removes all of its resources, including the Service and the Endpoints that we looked at before. Then we run the curl from a pod in the US cluster again.

kubectl delete ns shared-services --context mcs-us

kubectl run test-curl --image=curlimages/curl -it --rm \
--pod-running-timeout=4m --context mcs-us -- \
curl http://helloweb.shared-services.svc.clusterset.local:8080

The above curl fails, which confirms that the resources the MCS importer created in the shared-services namespace are actually required, despite the fact that the hostname is configured in the routing rules outside the cluster. Let’s put the calculator back together and re-create the namespace so that the MCS importer can re-create the resources.

kubectl create ns shared-services --context mcs-us

Once the importer has re-created the Service and Endpoints resources, we run the curl command again in verbose mode. The output shows why the call failed before, when we had deleted the resources.

kubectl run test-curl --image=curlimages/curl -it --rm \
--pod-running-timeout=4m --context mcs-us -- \
curl -v http://helloweb.shared-services.svc.clusterset.local:8080

In the verbose output we can see that the hostname resolves to the ClusterIP of the imported service, so the Service resource is still required for accessing our multi-cluster service.
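
As a final check, we can compare the address that curl resolved with the ClusterIP of the auto-created gke-mcs-<hash> service, reusing the same jsonpath lookup from earlier. The two values should match:

SVC_NAME=$(kubectl get service -o=jsonpath='{.items[?(@.metadata.annotations.net\.gke\.io/service-import=="helloweb")].metadata.name}' -n shared-services --context mcs-us)

kubectl get svc "$SVC_NAME" -n shared-services --context mcs-us -o jsonpath='{.spec.clusterIP}{"\n"}' # Should match the address in the curl output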

Conclusion

With all of these experiments and explorations we can complete our mental picture of MCS. We now understand the components involved and have a better idea of what enables the functionality that seemed somewhat magical at first.

In this post we explored MCS not only by doing but also by breaking things and putting them back together. We created a service on one cluster and consumed it from a workload running in another cluster, building up MCS step by step and looking under the hood at the resources that enable the communication between the workloads.

If you’re interested in continuing your own multi-cluster services journey then take a look at the MCS examples in the documentation and make sure you also consider alternative implementations like Service Mesh that offer additional features on top of the multi-cluster communication.
