Multi-Kubernetes cluster connectivity with OKE and Cilium for stateful workloads on Oracle Cloud

Ali Mukadam
Published in Oracle Developers
16 min read · Jan 19, 2024

In a previous article, I described how you can switch your CNI from the default flannel to Cilium. Cilium has many interesting features and capabilities built in, including Network Policy, its use of eBPF, enhanced performance, observability, and security. We also added it as a community enhancement to the Terraform OKE module.

As I mentioned in my recent posts on the Terraform OKE module series, our users keep bringing us interesting challenges. Among these is running multiple, connected Kubernetes clusters (both OKE and OCNE) in geographically separate regions for various reasons, particularly for high availability or for migration from on-premises clusters. You can connect your Kubernetes clusters using a service mesh like Istio or Linkerd, but Istio seems to have a problem when dealing with stateful workloads that require headless Kubernetes services.

Now, I previously demonstrated how to address this by using Submariner and deploying Oracle Coherence, a distributed data grid, on OKE. You can go through the whole series for the details.

Submariner is fantastic, and you can combine it with Istio to handle the cases that Istio cannot handle on its own. However, Kubernetes is hard enough to maintain by itself, even though we do the heavy lifting for you in OKE. Throw in Istio and Submariner and this quickly becomes tool-heavy, and I have not even come to compatibility with the various Kubernetes versions. I haven't seen any particular issue, but they all have different release cadences and there's a possibility, however unlikely, that you need to upgrade, say, your Kubernetes cluster while one of Submariner or Istio doesn't support that version yet.

So, ever since I found out that Cilium can also handle multi-cluster networking, I've been eyeing it rather lustfully for exactly that purpose.

To see if it works, we’ll have 2 use cases that we want to test:

  1. a stateless workload that can be invoked cross-cluster. We’ll use the example from Cilium’s documentation.
  2. a stateful workload that can be invoked cross-cluster. In this case, we’ll use Coherence.

Let’s get started.

Create your OKE clusters

We’ll use the Terraform OKE module to create the clusters. I’ve published a separate post to illustrate how you can use the Terraform OKE module to create multiple clusters in different regions and peer their VCNs. Create 2 OKE clusters, enable their DRGs and peer their RPCs. The Terraform OKE module can create all of this for you; all you need to do is establish a connection between the RPCs. We’ll need a bastion and the operator host in one region only. The diagram below illustrates what we want to achieve in terms of infrastructure:

2 OKE clusters in different OCI regions

If you are not sure how to do that, follow this article’s section on establishing connectivity with RPCs using the OCI console.
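If the operator host does not already have a kubeconfig context for each cluster, you can generate one per region with the OCI CLI. The sketch below is illustrative: the cluster OCIDs are placeholders and the regions assume the Paris/Amsterdam pair used in this article.

# c1 - paris (replace the cluster OCID with your own)
oci ce cluster create-kubeconfig \
  --cluster-id <paris-cluster-ocid> \
  --region eu-paris-1 \
  --file $HOME/.kube/config \
  --kube-endpoint PRIVATE_ENDPOINT \
  --token-version 2.0.0

# c2 - amsterdam (replace the cluster OCID with your own)
oci ce cluster create-kubeconfig \
  --cluster-id <amsterdam-cluster-ocid> \
  --region eu-amsterdam-1 \
  --file $HOME/.kube/config \
  --kube-endpoint PRIVATE_ENDPOINT \
  --token-version 2.0.0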

Verify you have connectivity from the operator host to both clusters:

## cluster 1 - paris
$ kubectl get nodes
NAME           STATUS   ROLES   AGE   VERSION
10.1.124.240   Ready    node    13m   v1.27.2
10.1.68.8      Ready    node    13m   v1.27.2

## cluster 2 - amsterdam
$ kubectl get nodes
NAME          STATUS   ROLES   AGE   VERSION
10.2.73.222   Ready    node    13m   v1.27.2
10.2.91.83    Ready    node    13m   v1.27.2

For convenience, I use kubectx to rename the Kubernetes contexts to something more user-friendly:

kubectx c1=<replace-with-paris-cluster-context>
kubectx c2=<replace-with-amsterdam-cluster-context>

Add the Cilium helm repo and generate the manifest locally for editing:

helm repo add cilium https://helm.cilium.io/
helm show values cilium/cilium > c1-cilium.yaml

When using Cilium multi-cluster, it’s important that each cluster has its own name and id. Modify c1-cilium.yaml and change the defaults to the following values:

cluster:
  name: c1
  id: 1
containerRuntime:
  integration: crio
hubble:
  tls:
    enabled: false
  relay:
    enabled: true
  ui:
    enabled: true
ipam:
  mode: "kubernetes"

Copy the c1-cilium.yaml to c2-cilium.yaml and change the name and id:

cluster:
  name: c2
  id: 2
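Alternatively, if you have yq (v4, the mikefarah flavour) on the operator host, you can script the copy and the two value changes; this is just a convenience and not required:

cp c1-cilium.yaml c2-cilium.yaml
# Set the per-cluster identity in the copied values file
yq -i '.cluster.name = "c2" | .cluster.id = 2' c2-cilium.yaml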

We can now deploy Cilium in both OKE clusters:

for c in c1 c2; do
kubectx $c
helm install cilium cilium/cilium --namespace=kube-system -f $c-cilium.yaml
done

While Cilium is being deployed, install the Cilium CLI on the operator host:

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/master/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

Check Cilium’s status in both clusters:

for c in c1 c2; do
kubectx $c
cilium status
done
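If you prefer the command to block until the agents and operator report ready, rather than re-running it, cilium status also accepts a --wait flag in recent versions of the cilium-cli:

for c in c1 c2; do
kubectx $c
cilium status --wait
done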

Delete the pods unmanaged by Cilium:

for c in c1 c2; do
kubectx $c
kubectl delete pod --namespace kube-system -l k8s-app=kube-dns
kubectl delete pod --namespace kube-system -l k8s-app=hubble-relay
kubectl delete pod --namespace kube-system -l k8s-app=hubble-ui
kubectl delete pod --namespace kube-system -l k8s-app=kube-dns-autoscaler
done
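The label selectors above cover the pods deployed by OKE out of the box. If you already have other workloads running, a more generic approach is to restart every pod that is not host-networked so that it gets re-created under Cilium; a sketch of one commonly used filter:

for c in c1 c2; do
kubectx $c
# List every pod whose spec.hostNetwork is unset (shown as <none>) and delete it
kubectl get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork --no-headers=true \
| grep '<none>' \
| awk '{print "-n "$1" "$2}' \
| xargs -L 1 -r kubectl delete pod
done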

And finally delete the flannel DaemonSet:

for c in c1 c2; do
kubectx $c
kubectl delete -n kube-system daemonset kube-flannel-ds
done

At this point, both clusters are using Cilium as CNI, albeit disconnected from each other.

OKE clusters with Cilium, in disconnected mode

Connecting the clusters using Cilium’s clustermesh

When using clustermesh, there are a few requirements, but mainly:

  1. All clusters must have the same datapath. In our case, we have chosen Encapsulation for both.
  2. The Pod CIDR ranges in all clusters and on all nodes must be unique and non-conflicting.
  3. Nodes in all clusters must have IP connectivity between each other using the configured InternalIP for each node. This requirement is typically met by establishing peering or VPN tunnels between the networks of the nodes of each cluster. We have already achieved this by creating DRGs, RPCs and the necessary routing rules.
  4. The network between clusters must allow the inter-cluster communication. The exact ports are documented in the Firewall Rules section.

For the purpose of this exercise, add ingress and egress rules to the worker NSGs in both clusters to allow all protocols. In an actual deployment, you should have a tighter set of rules; we’ll address this in a future post too.
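For reference, an allow-all ingress rule from the peer VCN’s CIDR could be added to a worker NSG with the OCI CLI along these lines. The NSG OCID and the 10.2.0.0/16 CIDR are placeholders, and you would add the mirror rule (and the corresponding egress rules) in the other region:

# c1 worker NSG: allow all protocols from c2's VCN CIDR (placeholder values)
oci network nsg rules add \
  --nsg-id <c1-worker-nsg-ocid> \
  --security-rules '[{"direction": "INGRESS", "protocol": "all", "source": "10.2.0.0/16", "sourceType": "CIDR_BLOCK", "isStateless": false}]'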

Modify the Cilium Helm values files and set the following:

clustermesh:
  useAPIServer: true

and redeploy Cilium in both clusters:

for c in c1 c2; do
kubectx $c
helm upgrade cilium cilium/cilium --namespace=kube-system -f $c-cilium.yaml
done

Check cluster mesh:

for c in c1 c2; do
kubectx $c
cilium clustermesh status
done

Switched to context "c1".
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
- 10.1.2.22:2379
✅ Deployment clustermesh-apiserver is ready
🔌 No cluster connected
🔀 Global services: [ min:0 / avg:0.0 / max:0 ]
Switched to context "c2".
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
- 10.2.2.29:2379
✅ Deployment clustermesh-apiserver is ready
🔌 No cluster connected
🔀 Global services: [ min:0 / avg:0.0 / max:0 ]

You can now connect the clusters:

cilium clustermesh connect --context c1 --destination-context c2

✅ Detected Helm release with Cilium version 1.14.6
✨ Extracting access information of cluster c2...
🔑 Extracting secrets from cluster c2...
ℹ️ Found ClusterMesh service IPs: [10.2.2.29]
✨ Extracting access information of cluster c1...
🔑 Extracting secrets from cluster c1...
ℹ️ Found ClusterMesh service IPs: [10.1.2.22]
⚠️ Cilium CA certificates do not match between clusters. Multicluster features will be limited!
ℹ️ Configuring Cilium in cluster 'cluster-cwy5kuwgblq' to connect to cluster 'cluster-c73jeogg4ha'
ℹ️ Configuring Cilium in cluster 'cluster-c73jeogg4ha' to connect to cluster 'cluster-cwy5kuwgblq'
✅ Connected cluster cluster-cwy5kuwgblq and cluster-c73jeogg4ha!
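Note the warning about the Cilium CA certificates not matching. Cilium’s clustermesh documentation recommends that both clusters share a common CA so that cross-cluster mTLS and Hubble features work fully. A hedged sketch of one way to do that, assuming the default cilium-ca secret name and that yq is available, is to copy the CA secret from c1 into c2, restart Cilium in c2 and then re-run the connect:

# Copy c1's CA secret into c2, stripping the cluster-specific metadata
kubectl --context c1 -n kube-system get secret cilium-ca -o yaml \
| yq 'del(.metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp)' \
| kubectl --context c2 -n kube-system apply -f -

# Roll the agents and the clustermesh-apiserver in c2 so they pick up the shared CA
kubectl --context c2 -n kube-system rollout restart daemonset/cilium deployment/clustermesh-apiserver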

Check the cluster mesh again:

for c in c1 c2; do
kubectx $c
cilium clustermesh status
done

Switched to context "c1".
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
- 10.1.2.22:2379
✅ Deployment clustermesh-apiserver is ready
✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
- c2: 2/2 configured, 2/2 connected
🔀 Global services: [ min:0 / avg:0.0 / max:0 ]
Switched to context "c2".
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
- 10.2.2.29:2379
✅ Deployment clustermesh-apiserver is ready
✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
- c1: 2/2 configured, 2/2 connected
🔀 Global services: [ min:0 / avg:0.0 / max:0 ]

Run the pod connectivity test between the clusters:

cilium connectivity test --context c1 --multi-cluster c2

........

✅ All 45 tests (292 actions) successful, 19 tests skipped, 1 scenarios skipped.

The connectivity test takes some time to run. At this point, our clusters are connected.

2 OKE clusters connected via Cilium

Stateless workload load balancing across clusters

Follow the example in Cilium’s documentation and you should be able to reach the service in both the local and the remote cluster, from the Paris as well as the Amsterdam cluster:

kubectx c1
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.14.6/examples/kubernetes/clustermesh/global-service-example/cluster1.yaml

kubectx c2
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.14.6/examples/kubernetes/clustermesh/global-service-example/cluster2.yaml
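What makes the load balancing work is Cilium’s notion of a global service: Services that have the same name and namespace in both clusters and carry the service.cilium.io/global annotation are merged into a single, mesh-wide endpoint set. A trimmed, illustrative sketch of such a Service (the port and selector here are just for illustration; the full manifests are in the URLs above):

apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    service.cilium.io/global: "true"
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    name: rebel-base

The same annotation is what we’ll put on the Coherence WKA service later in this article.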

From either cluster, you should see replies coming from both clusters:

kubectx c1
kubectl exec -ti deployment/x-wing -- curl rebel-base
{"Galaxy": "Alderaan", "Cluster": "Cluster-2"}

kubectl exec -ti deployment/x-wing -- curl rebel-base
{"Galaxy": "Alderaan", "Cluster": "Cluster-1"}


kubectx c2
kubectl exec -ti deployment/x-wing -- curl rebel-base
{"Galaxy": "Alderaan", "Cluster": "Cluster-1"}
kubectl exec -ti deployment/x-wing -- curl rebel-base
{"Galaxy": "Alderaan", "Cluster": "Cluster-2"}

At this point, clients can invoke services across clusters.

Invoking stateless services within and cross-cluster

Testing a headless service using Coherence across clusters

This part is a lot more challenging, as there are several layers of software and protocols involved.

You might want to go through the primer on Coherence and running it on Kubernetes first with the Oracle Coherence Operator. The main challenge is that Coherence uses a headless service and headless services do not have an IP address. Instead, they return the IP addresses of the pods that match their selectors.

Now, when we used Submariner, the latter would change the default CoreDNS configuration. When running the soak test, we found that OKE would at some point reconcile the default CoreDNS configuration, and the Coherence pods in the different OKE clusters would thus lose connection with each other. We also mentioned that OKE makes it possible to provide and persist additional CoreDNS configuration. But when using Submariner, service discovery was handled by the Lighthouse DNS server. Essentially, in each OKE cluster, the Coherence cluster would be given the other region’s Well Known Address service. It would look that up using DNS and, since it is a headless service, the Coherence pods’ IP addresses would be returned. How do we achieve the same thing with Cilium?

Another user was looking to do the same thing with Kubernetes, vSphere and CockroachDB, and his method basically consisted of chaining the DNS between the 2 clusters. In order to achieve this, kube-dns must be exposed via a load balancer. We’ll adopt the same approach, but with a few variations that OKE and OCI allow us:

  1. We’ll use the OCI Network Load Balancer since DNS uses UDP traffic. Another user doing a similar experiment also used NodePort and DNS over TCP. I wasn’t too fond of that. Also, we don’t need to reconfigure the cluster’s own internal DNS. However, his rewrite suggestion would prove to be useful for CoreDNS configuration and Coherence’s Well Known Address service.
  2. We’ll use an internal load balancer. This ensures that DNS traffic does not traverse the public internet; instead, it travels over the Remote Peering Connection established between the 2 VCNs.
  3. We can place our network load balancer in a private subnet and choose the private load balancer NSG for an enhanced security posture.

To achieve this, we’ll use OCI CCM annotations, for both c1 and c2. Create 2 files called kubedns-c1.yaml and kubedns-c2.yaml and add the following:

# c1
apiVersion: v1
kind: Service
metadata:
  annotations:
    oci.oraclecloud.com/load-balancer-type: "nlb"
    oci-network-load-balancer.oraclecloud.com/internal: "true"
    oci-network-load-balancer.oraclecloud.com/subnet: "ocid1.subnet.oc1.eu-paris-1.aaaaaaaauchf2ivo4fixwffqich7u2wxp3j466y4ovhvwwfua7dev5jsalya"
    oci-network-load-balancer.oraclecloud.com/oci-network-security-groups: "ocid1.networksecuritygroup.oc1.eu-paris-1.aaaaaaaatu7cubdluvvru425bi6j32udxp22osz3f4rmkvlevg2oy3bjohea"
  labels:
    k8s-app: kube-dns
  name: kube-dns-lb
  namespace: kube-system
spec:
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  selector:
    k8s-app: kube-dns
  sessionAffinity: None
  type: LoadBalancer

Make sure you use appropriate subnet and NSG OCIDs for both regions. Once done, you can deploy them:

for c in c1 c2; do
kubectx $c
kubectl apply -f kubedns-$c.yaml
done

Now that our NLBs are created, note their private IP addresses. We are interested in their External-IPs:

for c in c1 c2; do
kubectx $c
kubectl -n kube-system get svc kube-dns-lb
done

Switched to context "c1".
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kube-dns-lb   LoadBalancer   10.101.72.168   10.1.2.11     53:30660/UDP   35s
Switched to context "c2".
NAME          TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kube-dns-lb   LoadBalancer   10.102.131.206   10.2.2.11     53:30047/UDP   31s

We’ll now use these IP addresses to provide persistent CoreDNS configuration to each cluster. c1’s External-IP will go into c2’s CoreDNS configuration and vice-versa.

Create CoreDNS configuration for c1:

# coredns-c1.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  c1.server: | # All custom server files must have a ".server" file extension.
    cluster.c2:53 {
      rewrite name substring cluster.c2 svc.cluster.local
      # Change to c2 load balancer private ip
      forward . 10.2.2.11
    }

Create CoreDNS config for c2:

# coredns-c2.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  c2.server: | # All custom server files must have a ".server" file extension.
    cluster.c1:53 {
      rewrite name substring cluster.c1 svc.cluster.local
      # Change to c1 load balancer private ip
      forward . 10.1.2.11
    }

We can now create the CoreDNS ConfigMap for each cluster:

for c in c1 c2; do
kubectx $c
kubectl apply -f coredns-$c.yaml
done

By default, OKE’s CoreDNS automatically looks for a ConfigMap called “coredns-custom”. If you haven’t defined it, that’s not a big deal; CoreDNS will assume you don’t have any additional configuration. But if it finds one, it will merge it with its own default configuration.

To apply the additional DNS configuration, we must reload the CoreDNS pods:

for c in c1 c2; do
kubectx $c
kubectl delete pod --namespace kube-system -l k8s-app=kube-dns
done

Cross-cluster DNS lookup via private NLBs
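Before moving on to Coherence, it’s worth a quick sanity check that the DNS chaining works. Here is a hedged example using a throwaway busybox pod; the image tag is arbitrary, and kube-dns.kube-system is just a convenient service to resolve in the remote cluster:

# From c1, resolve a c2 service through the custom "cluster.c2" zone
kubectx c1
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kube-dns.kube-system.cluster.c2

If the rewrite and forward are working, the answer should be the ClusterIP of kube-dns in the Amsterdam cluster.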

For good measure, let’s check cilium multi-cluster status again:

for c in c1 c2; do
kubectx $c
cilium clustermesh status
done

Switched to context "c1".
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
- 10.1.2.22:2379
✅ Deployment clustermesh-apiserver is ready
✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
- c2: 2/2 configured, 2/2 connected
🔀 Global services: [ min:2 / avg:2.0 / max:2 ]
Switched to context "c2".
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
- 10.2.2.29:2379
✅ Deployment clustermesh-apiserver is ready
✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
- c1: 2/2 configured, 2/2 connected
🔀 Global services: [ min:2 / avg:2.0 / max:2 ]

We are now ready to deploy Coherence.

Deploying a stretched Coherence cluster

Let’s first install the Coherence operator in both clusters:

helm repo add coherence https://oracle.github.io/coherence-operator/charts
helm repo update

for cluster in c1 c2; do
kubectx $cluster
helm install coherence-operator --namespace coherence-operator coherence/coherence-operator --create-namespace
done

And create the namespace for Coherence in both OKE clusters:

for cluster in c1 c2; do
kubectx $cluster
kubectl create ns storage
done

Then, let’s create a Coherence Well Known Address (WKA) service for both clusters:

# coherence-wka.yaml
apiVersion: v1
kind: Service
metadata:
  name: coherence-wka
  namespace: storage
  annotations:
    service.cilium.io/global: "true"
  labels:
    coherenceCluster: australis
    coherenceComponent: coherenceWkaService
    coherenceDeployment: storage
    coherenceRole: storage
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: tcp-coherence
    port: 7
    protocol: TCP
    targetPort: 7
  - name: coh-local
    port: 7575
    protocol: TCP
    targetPort: 7575
  - name: coh-cluster
    port: 7574
    protocol: TCP
    targetPort: 7574
  - name: health
    port: 6676
    protocol: TCP
    targetPort: 6676
  publishNotReadyAddresses: true
  selector:
    coherenceCluster: australis
    coherenceComponent: coherencePod
    coherenceWKAMember: "true"

Apply the manifest to create the WKA service in each cluster:

for c in c1 c2; do
kubectx $c
kubectl apply -f coherence-wka.yaml
done

Let’s now define our Coherence cluster in c1:

# coherence-cluster-c1.yaml
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage-c1
  namespace: storage
spec:
  cluster: australis
  replicas: 3
  readinessProbe:
    initialDelaySeconds: 30
  jvm:
    args:
      - "-Dcoherence.wka=coherence-wka,coherence-wka.storage.cluster.c2"

and create it:

kubectx c1
kubectl apply -f coherence-cluster-c1.yaml

Check the logs:

kubectl -n storage logs -f storage-c1-0

In the logs, you should find this:

2024-01-19 06:19:26.787/1.779 Oracle Coherence CE 22.06.6 <Info> (thread=Coherence, member=n/a): The ConfigurableAddressProvider is skipping the unresolvable address "coherence-wka.storage.cluster.c2:0".
2024-01-19 06:19:26.790/1.782 Oracle Coherence CE 22.06.6 <Info> (thread=Coherence, member=n/a): The ConfigurableAddressProvider is skipping the unresolvable address "coherence-wka.storage.cluster.c2:0".

That’s because, at this point in time, although the coherence-wka service exists in c2, coherence-wka.storage.cluster.c2 does not resolve to any pod IP yet since no Coherence pods are running there. But the Coherence cluster in c1 should still form without any problem:

WellKnownAddressList(
10.201.0.117
10.201.0.237
10.201.0.95
)

MasterMemberSet(
ThisMember=Member(Id=3, Timestamp=2024-01-19 06:19:28.906, Address=10.201.0.117:7575, MachineId=58919, Location=site:EU-PARIS-1-AD-1,rack:EU-PARIS-1-AD-1-FAULT-DOMAIN-2,machine:10.1.124.240,process:48,member:storage-c1-0, Role=storage-c1)
OldestMember=Member(Id=1, Timestamp=2024-01-19 06:19:25.53, Address=10.201.0.237:7575, MachineId=25176, Location=site:EU-PARIS-1-AD-1,rack:EU-PARIS-1-AD-1-FAULT-DOMAIN-1,machine:10.1.68.8,process:48,member:storage-c1-1, Role=storage-c1)
ActualMemberSet=MemberSet(Size=3
Member(Id=1, Timestamp=2024-01-19 06:19:25.53, Address=10.201.0.237:7575, MachineId=25176, Location=site:EU-PARIS-1-AD-1,rack:EU-PARIS-1-AD-1-FAULT-DOMAIN-1,machine:10.1.68.8,process:48,member:storage-c1-1, Role=storage-c1)
Member(Id=2, Timestamp=2024-01-19 06:19:28.847, Address=10.201.0.95:7575, MachineId=58919, Location=site:EU-PARIS-1-AD-1,rack:EU-PARIS-1-AD-1-FAULT-DOMAIN-2,machine:10.1.124.240,process:48,member:storage-c1-2, Role=storage-c1)
Member(Id=3, Timestamp=2024-01-19 06:19:28.906, Address=10.201.0.117:7575, MachineId=58919, Location=site:EU-PARIS-1-AD-1,rack:EU-PARIS-1-AD-1-FAULT-DOMAIN-2,machine:10.1.124.240,process:48,member:storage-c1-0, Role=storage-c1)
)
MemberId|ServiceJoined|MemberState|Version|Edition
1|2024-01-19 06:19:25.53|JOINED|22.06.6|CE,
2|2024-01-19 06:19:28.847|JOINED|22.06.6|CE,
3|2024-01-19 06:19:28.906|JOINED|22.06.6|CE
RecycleMillis=1200000
RecycleSet=MemberSet(Size=0
)
)

Let’s now define our cluster in c2:

# coherence-cluster-c2.yaml
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage-c2
  namespace: storage
spec:
  cluster: australis
  replicas: 3
  readinessProbe:
    initialDelaySeconds: 30
  jvm:
    args:
      - "-Dcoherence.wka=coherence-wka,coherence-wka.storage.cluster.c1"

And create it:

kubectx c2
kubectl apply -f coherence-cluster-c2.yaml

When the Coherence pods have started in c2, you should now be able to see 3 more members for a total of 6 members in the Coherence cluster:

WellKnownAddressList(
10.201.0.117
10.202.0.194
10.202.0.86
10.201.0.237
10.202.0.155
10.201.0.95
)

MasterMemberSet(
ThisMember=Member(Id=4, Timestamp=2024-01-19 06:20:51.752, Address=10.202.0.86:7575, MachineId=44043, Location=site:eu-amsterdam-1-AD-1,rack:eu-amsterdam-1-AD-1-FAULT-DOMAIN-1,machine:10.2.73.222,process:48,member:storage-c2-0, Role=storage-c2)
OldestMember=Member(Id=1, Timestamp=2024-01-19 06:19:25.53, Address=10.201.0.237:7575, MachineId=25176, Location=site:EU-PARIS-1-AD-1,rack:EU-PARIS-1-AD-1-FAULT-DOMAIN-1,machine:10.1.68.8,process:48,member:storage-c1-1, Role=storage-c1)
ActualMemberSet=MemberSet(Size=4
Member(Id=1, Timestamp=2024-01-19 06:19:25.53, Address=10.201.0.237:7575, MachineId=25176, Location=site:EU-PARIS-1-AD-1,rack:EU-PARIS-1-AD-1-FAULT-DOMAIN-1,machine:10.1.68.8,process:48,member:storage-c1-1, Role=storage-c1)
Member(Id=2, Timestamp=2024-01-19 06:19:28.847, Address=10.201.0.95:7575, MachineId=58919, Location=site:EU-PARIS-1-AD-1,rack:EU-PARIS-1-AD-1-FAULT-DOMAIN-2,machine:10.1.124.240,process:48,member:storage-c1-2, Role=storage-c1)
Member(Id=3, Timestamp=2024-01-19 06:19:28.906, Address=10.201.0.117:7575, MachineId=58919, Location=site:EU-PARIS-1-AD-1,rack:EU-PARIS-1-AD-1-FAULT-DOMAIN-2,machine:10.1.124.240,process:48,member:storage-c1-0, Role=storage-c1)
Member(Id=4, Timestamp=2024-01-19 06:20:51.752, Address=10.202.0.86:7575, MachineId=44043, Location=site:eu-amsterdam-1-AD-1,rack:eu-amsterdam-1-AD-1-FAULT-DOMAIN-1,machine:10.2.73.222,process:48,member:storage-c2-0, Role=storage-c2)
)
MemberId|ServiceJoined|MemberState|Version|Edition
1|2024-01-19 06:19:25.53|JOINED|22.06.6|CE,
2|2024-01-19 06:19:28.847|JOINED|22.06.6|CE,
3|2024-01-19 06:19:28.906|JOINED|22.06.6|CE,
4|2024-01-19 06:20:51.752|JOINED|22.06.6|CE
RecycleMillis=1200000
RecycleSet=MemberSet(Size=0
)
)

....
2024-01-19 06:20:59.289/9.410 Oracle Coherence CE 22.06.6 <Info> (thread=Proxy:$SYS:ConcurrentProxy, member=4): Member 6 joined Service $SYS:ConcurrentProxy with senior member 2
.
.
.
2024-01-19 06:21:01.288/11.409 Oracle Coherence CE 22.06.6 <Info> (thread=DistributedCache:$SYS:Concurrent, member=4): Partition ownership has stabilized with 6 nodes

That’s because when we started the c2 cluster, we instructed it to join the existing cluster “australis”, and it finds the existing Coherence members at the WKA addresses:

cluster: australis
args:
- "-Dcoherence.wka=coherence-wka,coherence-wka.storage.cluster.c1"

“coherence-wka” points to the headless service defined in the pod’s own namespace and cluster. When this service is looked up, it resolves to the following pod IP addresses: 10.202.0.194, 10.202.0.86, 10.202.0.155.

With “coherence-wka.storage.cluster.c1”, a few things happen. Recall the custom CoreDNS server block we configured in c2:

cluster.c1:53 {
  rewrite name substring cluster.c1 svc.cluster.local
  # Change to c1 load balancer private ip
  forward . 10.1.2.11
}
  1. CoreDNS applies the rewrite directive. Thus, “coherence-wka.storage.cluster.c1” becomes “coherence-wka.storage.svc.cluster.local”.
  2. The query to look up “coherence-wka.storage.svc.cluster.local” is then forwarded to 10.1.2.11, which is the Network Load Balancer exposing the CoreDNS service in c1.

This query then returns the following IP addresses: 10.201.0.117, 10.201.0.237, 10.201.0.95. If you are interested in the full gory mechanics, here is how it works:

Many folks tend to prefer simplified diagrams, and there’s certainly a place for them, e.g. when time is short or you want to focus only on certain aspects of a problem. However, it is also critical that we understand all the layers in play, and how they interact and depend on each other. An eon ago, when I did Newtonian mechanics in high school, my Maths teachers insisted I show and explain all the forces in play with a diagram before working out the actual answer. Not only would it make solving the problem simpler, but it would also smoke out any flaw in the solution. With the detailed diagram, we similarly smoke out the challenges at the different software layers in play and how to effectively solve them. So, hat tip to my Maths teachers.

Combined, this gives us the full list of Coherence members in both clusters:

WellKnownAddressList(
10.201.0.117
10.202.0.194
10.202.0.86
10.201.0.237
10.202.0.155
10.201.0.95
)

And we can now see 6 members in total in the cluster:

....
2024-01-19 06:20:59.289/9.410 Oracle Coherence CE 22.06.6 <Info> (thread=Proxy:$SYS:ConcurrentProxy, member=4): Member 6 joined Service $SYS:ConcurrentProxy with senior member 2
.
.
.
2024-01-19 06:21:01.288/11.409 Oracle Coherence CE 22.06.6 <Info> (thread=DistributedCache:$SYS:Concurrent, member=4): Partition ownership has stabilized with 6 nodes

Effectively, the c2 Coherence members join the existing “australis” cluster consisting of members running in different Kubernetes clusters in different regions:

Simplified stretched Coherence cluster architecture in multiple OCI regions

From here, it would be fascinating to run a performance test similar to what we did before when we used Flannel+Submariner and compare Coherence’s latency in a multi-cluster scenario when using Cilium.

Another area that would be interesting to investigate with Cilium’s Clustermesh is what happens if we want to add a 3rd OKE cluster, say, in Frankfurt (c3). Do we need to connect only c1-c3 (c1-c2 is already done in this article)? Or do we also need to connect c2 and c3? We’ll explore all of this in future posts too.

Conclusion

In this article, we used Cilium’s multi-cluster capability together with OKE and OCI’s Network Load Balancer to achieve cross-cluster high availability for cloud native workloads, including stateful ones. In a future post, we’ll look into cross-region performance and compare it with Submariner’s results.

I would like to thank my colleagues Shaun Levey, Jared Greenwald, Julian Ortiz, Tim Middleton and Jonathan Knight for their contributions to this article.

I would also like to acknowledge the articles that were very helpful in understanding Cilium’s clustermesh better, especially its behaviour with stateful workloads.
