Deploying a stretched Coherence Cluster across multiple OCI regions with OKE

Ali Mukadam
Published in Oracle Developers · 15 min read · Mar 31, 2023

I have mentioned before that there is very strong momentum behind OKE, Oracle’s managed Kubernetes service in the cloud. Recent enhancements to support this momentum include serverless Kubernetes, workload identity, support for up to 2,000 nodes (twice as many as before) as well as integration with OCI KMS via Secrets Store. Each of these deserves a post of its own and I’ll tackle them in due time. What is driving many of these enhancements is the continued and strong adoption of OKE as a managed Kubernetes service as well as the increasingly challenging problems our customers are trying to solve with OKE.

One of these is whether it is possible to distribute a stateful application across multiple OCI regions and clusters. The applications in question are Oracle’s Coherence (formerly Tangosol) and HashiCorp’s Vault. For the purpose of this article, I’ll use Coherence but you can apply the same approach to your stateful application.

A quick primer on Coherence

Coherence is an in-memory data grid that allows you to store data as serialized key-value pairs. It supports a number of caching strategies (e.g. read-through, write-through, write-behind and refresh-ahead) as well as different types of caches (distributed, replicated, optimistic, near, view, local, remote). It’s scalable, supports asynchronous event streaming to help you build event-driven architectures using different programming languages (Java, C++, C#, JavaScript) and multi-site data federation, but above all, it’s fast. Really fast. You can use it as a fast data access layer in your micro-services application with any of the above languages or via REST, as well as for in-memory session management.

It’s a little jewel in Oracle’s middleware stack. Customers, especially those in the financial and telecommunications industry where latency is paramount, swear by it. You can read this article by the Coherence team for a more detailed introduction.

Coherence on Kubernetes

A couple of years ago, Oracle also open-sourced it by releasing a Community Edition as well as a Kubernetes Operator for it. Deploying Coherence via its operator on Kubernetes has several advantages. Besides not having to figure out the mechanics required to make it work and scale in Kubernetes, it also comes with a few cloud native bells and whistles including monitoring via Prometheus and Grafana, logging via fluentd and OpenSearch as well as tracing via OpenTracing and Jaeger.

Additionally, you can configure much of Coherence’s behaviour via plain YAML manifests and use all of Kubernetes’ scaling features, including the Horizontal Pod Autoscaler (HPA) to autoscale your Coherence clusters, which means you can include it as part of your deployment pipeline. The Coherence Operator is also packaged as part of the Verrazzano Container Platform. We make it Istio-ready and include a few neat Grafana dashboards too that tell you how well your Coherence cluster is performing. All you need to do is create a Coherence cluster and start using it to give your micro-services a speed boost. There are a couple of examples of how to do this on Oracle’s GitHub page, including examples for Spring Boot, Micronaut and Helidon.
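To illustrate the HPA point above, here is a minimal sketch of an autoscaling/v2 HorizontalPodAutoscaler targeting a Coherence deployment. It assumes the deployment is named storage in the coherence-test namespace (the names we use later in this article), that CPU requests are set on the Coherence pods, that a metrics source such as metrics-server is installed, and that the Coherence CRD exposes the scale subresource so the HPA can target it directly:

kubectl apply -f - <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: storage-hpa
  namespace: coherence-test
spec:
  # Target the Coherence custom resource rather than the underlying StatefulSet
  scaleTargetRef:
    apiVersion: coherence.oracle.com/v1
    kind: Coherence
    name: storage
  minReplicas: 3
  maxReplicas: 9
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # assumption: tune to your workload
EOF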

As you probably want your data to persist beyond the lifetime of your pods, you can use PersistentVolumes for Coherence’s data persistence. Given the need to maintain state and to discover peer members of the Coherence cluster, we deploy Coherence in Kubernetes using a StatefulSet and access it via a Headless service.
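If you want the operator to manage that persistence for you, here is a minimal sketch of a Coherence deployment with active persistence backed by a PersistentVolumeClaim. The deployment name is hypothetical, and the field names (coherence.persistence.mode, persistentVolumeClaim) and storage size are assumptions you should verify against the Coherence Operator CRD reference:

kubectl apply -f - <<EOF
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage-persistent
  namespace: coherence-test
spec:
  replicas: 3
  coherence:
    persistence:
      mode: active                 # persist cache data to disk
      persistentVolumeClaim:       # PVC template used by the underlying StatefulSet
        resources:
          requests:
            storage: 10Gi          # assumption: size to your data set
EOF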

Coming back to our customers’ question: we want to use Kubernetes to deploy a Coherence cluster but we also want to make the Coherence cluster span multiple Kubernetes clusters, preferably in different geographical locations and maybe even cloud providers.

In order to achieve this, we need to:

  • deploy Coherence members in different Kubernetes clusters. As you have probably figured by now, I have a soft spot for OKE, so for the purpose of this article, we’ll use an all-OKE set of clusters in different regions.
  • make the new members all join the same Coherence cluster
  • make the Coherence services discoverable across clusters

We evaluated using Istio in the multi-primary, multi-network model to make the Coherence services discoverable across clusters. While this model deploys and works well with OKE across multiple clusters for normal ClusterIP services, it unfortunately doesn’t work as well when headless services are used.

So, to achieve service discovery across clusters, we are going to use the Submariner project.

Quick primer on Submariner

From the horse’s mouth:

Submariner enables direct networking between Pods and Services in different Kubernetes clusters, either on-premises or in the cloud.

Very conveniently, Submariner is also open source, a CNCF project (albeit still at the sandbox level) and is CNI-agnostic. It provides Layer 3 connectivity, service discovery across clusters, encryption with either IPsec or WireGuard, and can even connect clusters with overlapping CIDRs. It includes an automated verification suite that tests both regular ClusterIP and Headless services. For monitoring, it exports Prometheus metrics, which you can then visualize in Grafana. In the list of CNCF projects, it’s categorized as a CNI. After evaluating it, I tend to think of it more as a kind of service mesh for networks.

Let’s set it up.

Creating 3 OKE clusters and peering the VCNs

For the purpose of our test, we are going to set up 3 OKE Clusters in 3 different regions with the following network parameters:

Infrastructure

Specify 2 node pools for each cluster:

  • 1 node pool for the Submariner gateways
  • 1 node pool for the Coherence workload

You can use the Terraform OKE module to set up your clusters.
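One way of driving the module, sketched below, is to apply it once per region using Terraform workspaces. The per-region variable files and their contents (region, VCN CIDRs, the Submariner and Coherence node pools) are assumptions you would adapt to your own tenancy:

git clone https://github.com/oracle-terraform-modules/terraform-oci-oke.git
cd terraform-oci-oke
terraform init

# Assumption: london.tfvars, newport.tfvars and amsterdam.tfvars each define the
# region, VCN CIDRs and the 2 node pools (Submariner gateways + Coherence workload)
for region in london newport amsterdam; do
terraform workspace new $region || terraform workspace select $region
terraform apply -var-file=$region.tfvars -auto-approve
done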

To further secure our inter-cluster traffic and not rely on just encryption, we’ll also peer the VCNs in a mesh architecture using OCI Remote Peering Connection (RPC). This means we need to create 2 RPCs for each DRG and peer them e.g. in London:

In order to route traffic to each cluster, we also need to add routing rules so that the clusters can route to each other. You can use the OKE module to add the custom routing. However, you currently need to create the RPCs manually; that’s something we are working to add to the DRG and OKE modules soon.

Routing rules in London

Once the VCNs are peered, we also need to add a few security rules to each Default security list to allow Submariner traffic. I’m being a bit liberal here for the purpose of this article but you should tighten your rules, especially with respect to Source and Destination CIDRs:
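If you prefer to script the security rules rather than use the console, here is a hedged sketch using the OCI CLI. The security list OCID is a placeholder and the port list is based on Submariner’s documented prerequisites (typically UDP 500 and 4500 for IPsec plus UDP 4800 for the intra-cluster VXLAN tunnel); check the Submariner docs for your cable driver, and note that the update call replaces the existing ingress rules, so merge these with the rules you already have:

# Placeholder OCID; repeat per region. WARNING: --ingress-security-rules replaces
# the whole list, so include your existing rules in the JSON as well.
SECLIST_OCID=ocid1.securitylist.oc1..example
oci network security-list update --security-list-id $SECLIST_OCID \
  --ingress-security-rules '[
    {"protocol": "17", "source": "10.0.0.0/8",
     "udpOptions": {"destinationPortRange": {"min": 500, "max": 500}}},
    {"protocol": "17", "source": "10.0.0.0/8",
     "udpOptions": {"destinationPortRange": {"min": 4500, "max": 4500}}},
    {"protocol": "17", "source": "10.0.0.0/8",
     "udpOptions": {"destinationPortRange": {"min": 4800, "max": 4800}}}
  ]'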

Obtain the kubeconfigs of each cluster and combine them. If you use the Terraform OKE module, you only need to enable the bastion and operator in 1 region. You can then use the OCI console to obtain the command to retrieve the kubeconfig.

Once you’ve retrieved the kubeconfigs, take the opportunity to rename the contexts and clusters to simplify usage. You’ll then be able to refer to them more easily as well as use tools such as kubectx:

$ kubectx
amsterdam
london
newport
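If you’re wondering how to do the renaming and merging, here is a quick sketch (the generated context names are placeholders):

# Assumption: each region's kubeconfig was saved to its own file
export KUBECONFIG=~/.kube/london:~/.kube/newport:~/.kube/amsterdam

# Rename the auto-generated context names to something memorable
kubectl config rename-context <generated-london-context> london
kubectl config rename-context <generated-newport-context> newport
kubectl config rename-context <generated-amsterdam-context> amsterdam

# Optionally flatten everything into a single kubeconfig
kubectl config view --flatten > ~/.kube/config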

We can now install Submariner.

Installing Submariner

Install the Submariner CLI:

curl -Ls https://get.submariner.io | bash
export PATH=$PATH:~/.local/bin
echo export PATH=\$PATH:~/.local/bin >> ~/.profile

Deploy the Submariner broker in 1 of the clusters:

subctl --context london deploy-broker

Join the clusters:

for cluster in london newport amsterdam; do
subctl join --context $cluster broker-info.subm --cluster $cluster --clusterid $cluster --air-gapped
done

Let’s verify the connectivity between the clusters:

subctl show connections
Cluster "london"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
oke-c37pum5rb4a-nd5uryg3qya-sn newport 10.15.114.98 no libreswan 10.115.0.0/16, 10.215.0.0/16 connected 4.209608ms
oke-c7rif3blr5q-n2pknt5pjlq-sf amsterdam 10.9.124.98 no libreswan 10.109.0.0/16, 10.209.0.0/16 connected 6.249836ms

Cluster "newport"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
oke-coacvk6yuha-ni5mc52dlba-sm london 10.0.100.212 no libreswan 10.100.0.0/16, 10.200.0.0/16 connected 4.349712ms
oke-c7rif3blr5q-n2pknt5pjlq-sf amsterdam 10.9.124.98 no libreswan 10.109.0.0/16, 10.209.0.0/16 connected 11.295183ms

Cluster "amsterdam"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
oke-coacvk6yuha-ni5mc52dlba-sm london 10.0.100.212 no libreswan 10.100.0.0/16, 10.200.0.0/16 connected 6.191235ms
oke-c37pum5rb4a-nd5uryg3qya-sn newport 10.15.114.98 no libreswan 10.115.0.0/16, 10.215.0.0/16 connected 11.315007ms

$ subctl show gateways
Cluster "amsterdam"
✓ Showing Gateways
NODE HA STATUS SUMMARY
oke-c7rif3blr5q-n2pknt5pjlq-sf active All connections (2) are established

Cluster "london"
✓ Showing Gateways
NODE HA STATUS SUMMARY
oke-coacvk6yuha-ni5mc52dlba-sm active All connections (2) are established

Cluster "newport"
✓ Showing Gateways
NODE HA STATUS SUMMARY
oke-c37pum5rb4a-nd5uryg3qya-sn active All connections (2) are established

We can also verify the endpoints and networks:

$ subctl show endpoints
Cluster "amsterdam"
✓ Showing Endpoints
CLUSTER ENDPOINT IP PUBLIC IP CABLE DRIVER TYPE
amsterdam 10.9.124.98 libreswan local
london 10.0.100.212 libreswan remote
newport 10.15.114.98 libreswan remote

Cluster "london"
✓ Showing Endpoints
CLUSTER ENDPOINT IP PUBLIC IP CABLE DRIVER TYPE
london 10.0.100.212 libreswan local
newport 10.15.114.98 libreswan remote
amsterdam 10.9.124.98 libreswan remote

Cluster "newport"
✓ Showing Endpoints
CLUSTER ENDPOINT IP PUBLIC IP CABLE DRIVER TYPE
newport 10.15.114.98 libreswan local
london 10.0.100.212 libreswan remote
amsterdam 10.9.124.98 libreswan remote

$ subctl show networks
Cluster "amsterdam"
✓ Showing Network details
Discovered network details via Submariner:
Network plugin: flannel
Service CIDRs: [10.109.0.0/16]
Cluster CIDRs: [10.209.0.0/16]

Cluster "london"
✓ Showing Network details
Discovered network details via Submariner:
Network plugin: flannel
Service CIDRs: [10.100.0.0/16]
Cluster CIDRs: [10.200.0.0/16]

Cluster "newport"
✓ Showing Network details
Discovered network details via Submariner:
Network plugin: flannel
Service CIDRs: [10.115.0.0/16]
Cluster CIDRs: [10.215.0.0/16]

Submariner also includes a battery of verification tests that check the ability to discover and access the different service types across clusters. One issue is that the verification command doesn’t seem to work with a combined kubeconfig file. So, you’ll need to download the kubeconfigs again, keep them in separate files, rename the contexts and set the KUBECONFIG variable accordingly:

export KUBECONFIG=.kube/london:.kube/amsterdam:.kube/newport

So, let’s use them to verify:

subctl verify --context london --tocontext newport --tocontext amsterdam --only connectivity,service-discovery --verbose
.
.
.
Ran 25 of 44 Specs in 637.649 seconds
SUCCESS! -- 25 Passed | 0 Failed | 0 Pending | 19 Skipped

This will take a bit of time. Grab a beverage of your choice and relax. Once all the tests have completed successfully, we can deploy Coherence.

Deploying Coherence in 3 OKE clusters

We’ll use the Coherence Operator to deploy our clusters, so let’s first install it in each cluster using its Helm chart:

export KUBECONFIG=~/.kube/config
helm repo add coherence https://oracle.github.io/coherence-operator/charts
helm repo update

for cluster in london newport amsterdam; do
kubectx $cluster
helm install coherence-operator --namespace coherence-operator coherence/coherence-operator --create-namespace
done
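Before moving on, it’s worth checking that the operator pods are running in each cluster:

for cluster in london newport amsterdam; do
kubectx $cluster
# wait until all operator pods report Ready (times out after 5 minutes)
kubectl -n coherence-operator wait --for=condition=Ready pod --all --timeout=300s
kubectl -n coherence-operator get pods
done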

Once the Coherence Operator is successfully deployed, create a namespace for our Coherence cluster deployment in each cluster:

for cluster in london newport amsterdam; do
kubectx $cluster
kubectl create ns coherence-test
done

We are now ready to deploy our cluster. Since we also want the Coherence service to be discoverable across the different clusters, we’ll create the Coherence Well-Known Address (WKA) service first and “export” it using Submariner. Exporting a service makes it discoverable from the other clusters.

apiVersion: v1
kind: Service
metadata:
  name: submariner-wka-${CLUSTER}
  namespace: coherence-test
  labels:
    coherenceCluster: storage
    coherenceComponent: coherenceWkaService
    coherenceDeployment: storage
    coherenceRole: storage
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: tcp-coherence
      port: 7
      protocol: TCP
      targetPort: 7
  publishNotReadyAddresses: true
  selector:
    coherenceCluster: storage
    coherenceComponent: coherencePod
    coherenceWKAMember: "true"

Create the WKA service in each cluster:

for cluster in london newport amsterdam; do
kubectx $cluster
CLUSTER=$cluster envsubst < submariner-wka-svc.yaml | kubectl apply -f -
done

And export them to make the Coherence service discoverable:

for cluster in london newport amsterdam; do
subctl --context $cluster export service --namespace coherence-test submariner-wka-$cluster
done
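To confirm the export took effect in each cluster, you can also look at the ServiceExport resources that Submariner creates alongside the services:

for cluster in london newport amsterdam; do
kubectx $cluster
kubectl -n coherence-test get serviceexports
done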

We can now verify that all the Coherence WKA services have been imported (i.e. are discoverable) from the other clusters:

for cluster in london newport amsterdam; do
kubectx $cluster
kubectl -n submariner-operator get serviceimports
done

✔ Switched to context "london".
NAME TYPE IP AGE
submariner-wka-amsterdam-coherence-test-amsterdam Headless 1h
submariner-wka-coherence-test-london Headless 1h
submariner-wka-london-coherence-test-london Headless 1h
submariner-wka-newport-coherence-test-newport Headless 1h
✔ Switched to context "newport".
NAME TYPE IP AGE
submariner-wka-amsterdam-coherence-test-amsterdam Headless 1h
submariner-wka-coherence-test-london Headless 1h
submariner-wka-london-coherence-test-london Headless 1h
submariner-wka-newport-coherence-test-newport Headless 1h
✔ Switched to context "amsterdam".
NAME TYPE IP AGE
submariner-wka-amsterdam-coherence-test-amsterdam Headless 1h
submariner-wka-coherence-test-london Headless 1h
submariner-wka-london-coherence-test-london Headless 1h
submariner-wka-newport-coherence-test-newport Headless 1h

Now that all the WKA services have been created and are discoverable, we can finally create the Coherence clusters. Let’s define them first, passing the exported WKA services as a JVM argument:

apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
  namespace: coherence-test
spec:
  replicas: 3
  readinessProbe:
    initialDelaySeconds: 30
  jvm:
    args:
      - "-Dcoherence.wka=submariner-wka-london.coherence-test.svc.clusterset.local,submariner-wka-newport.coherence-test.svc.clusterset.local,submariner-wka-amsterdam.coherence-test.svc.clusterset.local"

And create the first cluster in London:

kubectx london
kubectl apply -f coherence-cluster.yaml

Wait for all the pods to become ready:

$ kubectl -n coherence-test get pods -w

NAME READY STATUS RESTARTS AGE
storage-0 0/1 Running 0 40s
storage-1 0/1 Running 0 40s
storage-2 0/1 Running 0 40s

And look at the logs:

kubectl -n coherence-test logs -f storage-0

Notice how it says 3 members and they are all based in London:

MasterMemberSet(
ThisMember=Member(Id=3, Timestamp=2023-03-28 13:29:50.443, Address=10.200.0.35:7575, MachineId=26505, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.100.212,process:36,member:storage-0, Role=storage)
OldestMember=Member(Id=1, Timestamp=2023-03-28 13:29:47.019, Address=10.200.0.190:7575, MachineId=48457, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.66.192,process:37,member:storage-1, Role=storage)
ActualMemberSet=MemberSet(Size=3
Member(Id=1, Timestamp=2023-03-28 13:29:47.019, Address=10.200.0.190:7575, MachineId=48457, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.66.192,process:37,member:storage-1, Role=storage)
Member(Id=2, Timestamp=2023-03-28 13:29:50.419, Address=10.200.0.36:7575, MachineId=26505, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.100.212,process:37,member:storage-2, Role=storage)
Member(Id=3, Timestamp=2023-03-28 13:29:50.443, Address=10.200.0.35:7575, MachineId=26505, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.100.212,process:36,member:storage-0, Role=storage)
)
MemberId|ServiceJoined|MemberState|Version|Edition
1|2023-03-28 13:29:47.019|JOINED|22.06.3|CE,
2|2023-03-28 13:29:50.419|JOINED|22.06.3|CE,
3|2023-03-28 13:29:50.443|JOINED|22.06.3|CE
RecycleMillis=1200000
RecycleSet=MemberSet(Size=0
)
)
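Before adding the next cluster, you can also sanity-check that the London WKA service resolves from another cluster via Submariner’s clusterset.local domain. A quick sketch, run from Amsterdam (the busybox image and pod name are just for illustration; it should return the IPs of the London storage pods):

kubectx amsterdam
kubectl -n coherence-test run dns-check --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup submariner-wka-london.coherence-test.svc.clusterset.local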

Let’s now deploy a second Coherence cluster in Cardiff, wait for it to become ready and then look at the logs:

kubectx newport
✔ Switched to context "newport".

kubectl apply -f coherence-cluster.yaml
kubectl -n coherence-test get pods -w
kubectl -n coherence-test logs -f storage-0

We should now see that 3 more members have joined:

MasterMemberSet(
ThisMember=Member(Id=6, Timestamp=2023-03-28 13:32:20.576, Address=10.215.0.148:7575, MachineId=64308, Location=site:UK-CARDIFF-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.15.114.98,process:37,member:storage-0, Role=storage)
OldestMember=Member(Id=1, Timestamp=2023-03-28 13:29:47.019, Address=10.200.0.190:7575, MachineId=48457, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.66.192,process:37,member:storage-1, Role=storage)
ActualMemberSet=MemberSet(Size=6
Member(Id=1, Timestamp=2023-03-28 13:29:47.019, Address=10.200.0.190:7575, MachineId=48457, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.66.192,process:37,member:storage-1, Role=storage)
Member(Id=2, Timestamp=2023-03-28 13:29:50.419, Address=10.200.0.36:7575, MachineId=26505, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.100.212,process:37,member:storage-2, Role=storage)
Member(Id=3, Timestamp=2023-03-28 13:29:50.443, Address=10.200.0.35:7575, MachineId=26505, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.100.212,process:36,member:storage-0, Role=storage)
Member(Id=4, Timestamp=2023-03-28 13:32:18.729, Address=10.215.0.33:7575, MachineId=35321, Location=site:UK-CARDIFF-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.15.98.16,process:37,member:storage-1, Role=storage)
Member(Id=5, Timestamp=2023-03-28 13:32:20.509, Address=10.215.0.149:7575, MachineId=64308, Location=site:UK-CARDIFF-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.15.114.98,process:37,member:storage-2, Role=storage)
Member(Id=6, Timestamp=2023-03-28 13:32:20.576, Address=10.215.0.148:7575, MachineId=64308, Location=site:UK-CARDIFF-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.15.114.98,process:37,member:storage-0, Role=storage)
)
MemberId|ServiceJoined|MemberState|Version|Edition
1|2023-03-28 13:29:47.019|JOINED|22.06.3|CE,
2|2023-03-28 13:29:50.419|JOINED|22.06.3|CE,
3|2023-03-28 13:29:50.443|JOINED|22.06.3|CE,
4|2023-03-28 13:32:18.729|JOINED|22.06.3|CE,
5|2023-03-28 13:32:20.509|JOINED|22.06.3|CE,
6|2023-03-28 13:32:20.576|JOINED|22.06.3|CE
RecycleMillis=1200000
RecycleSet=MemberSet(Size=0
)
)

Finally, let’s create the 3rd Coherence cluster in Amsterdam:

kubectx amsterdam
✔ Switched to context "amsterdam".

kubectl apply -f coherence-cluster.yaml
kubectl -n coherence-test get pods -w
kubectl -n coherence-test logs -f storage-0

We should now have 9 members:

 MasterMemberSet(
ThisMember=Member(Id=9, Timestamp=2023-03-28 13:33:31.239, Address=10.209.0.148:7575, MachineId=26366, Location=site:eu-amsterdam-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.9.124.98,process:36,member:storage-0, Role=storage)
OldestMember=Member(Id=1, Timestamp=2023-03-28 13:29:47.019, Address=10.200.0.190:7575, MachineId=48457, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.66.192,process:37,member:storage-1, Role=storage)
ActualMemberSet=MemberSet(Size=9
Member(Id=1, Timestamp=2023-03-28 13:29:47.019, Address=10.200.0.190:7575, MachineId=48457, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.66.192,process:37,member:storage-1, Role=storage)
Member(Id=2, Timestamp=2023-03-28 13:29:50.419, Address=10.200.0.36:7575, MachineId=26505, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.100.212,process:37,member:storage-2, Role=storage)
Member(Id=3, Timestamp=2023-03-28 13:29:50.443, Address=10.200.0.35:7575, MachineId=26505, Location=site:UK-LONDON-1-AD-2,rack:FAULT-DOMAIN-1,machine:10.0.100.212,process:36,member:storage-0, Role=storage)
Member(Id=4, Timestamp=2023-03-28 13:32:18.729, Address=10.215.0.33:7575, MachineId=35321, Location=site:UK-CARDIFF-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.15.98.16,process:37,member:storage-1, Role=storage)
Member(Id=5, Timestamp=2023-03-28 13:32:20.509, Address=10.215.0.149:7575, MachineId=64308, Location=site:UK-CARDIFF-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.15.114.98,process:37,member:storage-2, Role=storage)
Member(Id=6, Timestamp=2023-03-28 13:32:20.576, Address=10.215.0.148:7575, MachineId=64308, Location=site:UK-CARDIFF-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.15.114.98,process:37,member:storage-0, Role=storage)
Member(Id=7, Timestamp=2023-03-28 13:33:29.57, Address=10.209.0.33:7575, MachineId=2499, Location=site:eu-amsterdam-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.9.98.174,process:36,member:storage-1, Role=storage)
Member(Id=8, Timestamp=2023-03-28 13:33:31.157, Address=10.209.0.149:7575, MachineId=26366, Location=site:eu-amsterdam-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.9.124.98,process:36,member:storage-2, Role=storage)
Member(Id=9, Timestamp=2023-03-28 13:33:31.239, Address=10.209.0.148:7575, MachineId=26366, Location=site:eu-amsterdam-1-AD-1,rack:FAULT-DOMAIN-1,machine:10.9.124.98,process:36,member:storage-0, Role=storage)
)
MemberId|ServiceJoined|MemberState|Version|Edition
1|2023-03-28 13:29:47.019|JOINED|22.06.3|CE,
2|2023-03-28 13:29:50.419|JOINED|22.06.3|CE,
3|2023-03-28 13:29:50.443|JOINED|22.06.3|CE,
4|2023-03-28 13:32:18.729|JOINED|22.06.3|CE,
5|2023-03-28 13:32:20.509|JOINED|22.06.3|CE,
6|2023-03-28 13:32:20.576|JOINED|22.06.3|CE,
7|2023-03-28 13:33:29.57|JOINED|22.06.3|CE,
8|2023-03-28 13:33:31.157|JOINED|22.06.3|CE,
9|2023-03-28 13:33:31.239|JOINED|22.06.3|CE
RecycleMillis=1200000
RecycleSet=MemberSet(Size=0
)
)

More importantly, we can also see their locations and their pod IP addresses:

  • 3 in London
  • 3 in Cardiff
  • and 3 in Amsterdam

You might have noticed that in every region/OKE cluster, we use the Coherence Operator to create a new Coherence deployment, yet effectively we have deployed them as one global cluster that stretches over 3 Kubernetes clusters on OCI in 3 different regions. It’s helpful to think of this combination of Kubernetes and Coherence as being like a pod in Kubernetes. This has several theoretical advantages:

  • In Kubernetes, you can set compute and memory limits on pods. Likewise, you can set how big each cluster (I’ll refer to the combination of Kubernetes and Coherence as a cluster, in italics) can be.
  • In Kubernetes, you can upgrade your application in the pod by deploying a newer version of the image. Likewise, you don’t need to do in-place upgrades of your cluster if you have to upgrade either the underlying Kubernetes or Coherence versions. All you need to do is deploy a new version of your cluster, let the data sync and retire the old version. Effectively, your upgrade path looks like the following, without any downtime:
Upgrading your Kubernetes and Coherence clusters
  • In Kubernetes, you can scale out by adding more pods to a cluster. Likewise, you can scale out by adding more clusters, e.g. if your company is expanding internationally and you need to have your service closer to your new customers, you can just add more clusters (see the sketch after this list).
Scaling out clusters
  • In Kubernetes, when a pod dies, another one replaces it. Likewise, it’s not a big deal if you lose your cluster. The only difference is that in Kubernetes, this is handled automatically for you, whereas for your clusters, you’ll need some higher-level constructs to monitor the health of each cluster and provision new ones when necessary.
Handling failures
  • If you have several instances of the application globally, data reads and writes will be considerably faster as your application will use the local Coherence members.
  • Potentially, this approach also lends itself well to multi-cloud and hybrid deployments. Since there’s a defined method to deploy Coherence into Kubernetes, the only thing you need to figure out from a multi-cloud perspective is the connectivity. There’s no need to fear an outage in your AWS region anymore.
Multi-cloud deployment
  • Different deployment architectures are also possible, e.g. 1 cluster per region or multiple clusters per OCI region, depending on your requirements. If certain regions have higher traffic, you can deploy either more clusters or bigger clusters.
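As referenced in the scale-out point above, adding a region essentially repeats the steps of this article for a new cluster. A rough sketch, assuming a hypothetical frankfurt cluster whose VCN has already been peered and routed as above and which already has the Coherence Operator installed:

# 1. Join the new cluster to the Submariner clusterset
subctl join --context frankfurt broker-info.subm --clusterid frankfurt --air-gapped

# 2. Create and export its WKA service
kubectx frankfurt
kubectl create ns coherence-test
CLUSTER=frankfurt envsubst < submariner-wka-svc.yaml | kubectl apply -f -
subctl --context frankfurt export service --namespace coherence-test submariner-wka-frankfurt

# 3. Add submariner-wka-frankfurt.coherence-test.svc.clusterset.local to the
#    -Dcoherence.wka list in coherence-cluster.yaml, then apply it in every cluster
for cluster in london newport amsterdam frankfurt; do
kubectx $cluster
kubectl apply -f coherence-cluster.yaml
done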

I did say theoretical because we haven’t formally tested all these scenarios yet to truly ascertain said advantages, and I have a particular aversion to recommending something without having actually tested it. But functionally, this works well.

We also haven’t thrown any kind of stress or soak tests at it yet to see how this type of distributed system behaves under duress. However, this post is long enough already. So, if you are interested in monitoring the environment, hop over to Part 2.

Summary

Customers are bringing increasingly challenging workloads to run on OCI and OKE. This article looks at how to deploy a stateful application such as Coherence using StatefulSets and Headless services on OKE, stretched across different OCI regions and Kubernetes clusters to improve high availability and disaster recovery capabilities.

With this, I would like to thank my colleagues, including Jonathan Knight, for their contributions to this article.

Join the conversation on our Developer Slack! Or give OCI a spin with our Free Tier!
