Deploying multiple Kubernetes clusters on Oracle Cloud

Ali Mukadam
Oracle Developers
Jan 25, 2024



In my previous article on deploying Cilium Cluster Mesh on OKE, I mentioned that I would write about using the Terraform OKE module to create the necessary infrastructure for multiple connected clusters.

Broadly speaking, there are two topologies you can employ when connecting Kubernetes clusters:

  1. Star topology
  2. Mesh topology

A star topology consists of a central cluster acting as a hub. All the other clusters are connected to the central cluster but not to each other.

A mesh topology is one where the clusters are connected to each other.

You can also have several permutations and combinations of the above, e.g.:

  • A hub with a VCN, bastion, and operator host only. The clusters are isolated from each other and there is no ClusterAPI-style management cluster in the hub.
  • A hub with a VCN, bastion, and operator host only. The clusters are connected to each other in a mesh and there is no ClusterAPI-style management cluster in the hub.
  • A ClusterAPI-style cluster in the hub, with the workload clusters isolated.
  • A ClusterAPI-style cluster in the hub, with the workload clusters connected in a mesh.

To start with, let’s outline our needs in anticipation of the next Cilium Cluster Mesh installment, where we’ll compare the performance of Submariner + Flannel against Cilium’s Cluster Mesh:

  1. Three OKE clusters whose pods and worker nodes are able to communicate with each other. This means the clusters must be connected in a mesh.
  2. One bastion and one operator host from which we can configure and deploy all three clusters. For efficiency, we don’t want to run duplicate bastion and operator hosts, so we’ll use one of the clusters as a de facto hub.
  3. Public and private load balancers. Public load balancers will be used to make our workloads accessible externally, whereas private load balancers will be used to access cross-cluster services, e.g. DNS, Thanos, etc.

The diagram below illustrates our three clusters connected in a mesh architecture.

Connecting the underlying infrastructure of 3 regions

To begin with, let’s create a map of clusters:

# variables.tf
variable "clusters" {
  description = "A map of cidrs for vcns, pods and services for each region"
  type        = map(any)
  default = {
    c1 = { region = "paris", vcn = "10.1.0.0/16", pods = "10.201.0.0/16", services = "10.101.0.0/16", create_drg = true }
    c2 = { region = "amsterdam", vcn = "10.2.0.0/16", pods = "10.202.0.0/16", services = "10.102.0.0/16", create_drg = true }
    c3 = { region = "frankfurt", vcn = "10.3.0.0/16", pods = "10.203.0.0/16", services = "10.103.0.0/16", create_drg = true }
  }
}
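
Individual fields can then be retrieved with nested lookups. For example, the expression below (used later when wiring up the c1 module) resolves to "10.1.0.0/16":

# the VCN CIDR of cluster c1
lookup(lookup(var.clusters, "c1"), "vcn")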

To avoid hardcoding regions, let’s define a local variable that we can then use to look up the OCI region identifiers:

# locals.tf
locals {
  regions = {

    # Europe
    amsterdam = "eu-amsterdam-1"
    frankfurt = "eu-frankfurt-1"
    london    = "uk-london-1"
    madrid    = "eu-madrid-1"
    marseille = "eu-marseille-1"
    milan     = "eu-milan-1"
    newport   = "uk-cardiff-1"
    paris     = "eu-paris-1"
    stockholm = "eu-stockholm-1"
    zurich    = "eu-zurich-1"

    # Middle East

  }
}

We can now also define our list of providers:

# home region needed for IAM
provider "oci" {
  fingerprint         = var.api_fingerprint
  private_key_path    = var.api_private_key_path
  region              = var.home_region
  tenancy_ocid        = var.tenancy_id
  user_ocid           = var.user_id
  alias               = "home"
  ignore_defined_tags = ["Oracle-Tags.CreatedBy", "Oracle-Tags.CreatedOn"]
}

# other regions loaded by looking up the region identifier
provider "oci" {
  fingerprint         = var.api_fingerprint
  private_key_path    = var.api_private_key_path
  region              = local.regions["paris"]
  tenancy_ocid        = var.tenancy_id
  user_ocid           = var.user_id
  alias               = "paris"
  ignore_defined_tags = ["Oracle-Tags.CreatedBy", "Oracle-Tags.CreatedOn"]
}

provider "oci" {
  fingerprint         = var.api_fingerprint
  private_key_path    = var.api_private_key_path
  region              = local.regions["amsterdam"]
  tenancy_ocid        = var.tenancy_id
  user_ocid           = var.user_id
  alias               = "amsterdam"
  ignore_defined_tags = ["Oracle-Tags.CreatedBy", "Oracle-Tags.CreatedOn"]
}

We can now create our first cluster (“c1”) in Paris:

# Copyright (c) 2024 Oracle Corporation and/or its affiliates.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl

module "c1" {

  source  = "oracle-terraform-modules/oke/oci"
  version = "5.1.1"

  home_region = var.home_region
  region      = lookup(local.regions, lookup(lookup(var.clusters, "c1"), "region"))

  tenancy_id = var.tenancy_id

  # general oci parameters
  compartment_id = var.compartment_id

  # ssh keys
  ssh_private_key_path = var.ssh_private_key_path
  ssh_public_key_path  = var.ssh_public_key_path

  # networking
  create_drg       = true
  drg_display_name = "c1"

  remote_peering_connections = {
    for k, v in var.clusters : "rpc-to-${k}" => {} if k != "c1"
  }

  nat_gateway_route_rules = [
    for k, v in var.clusters :
    {
      destination       = lookup(v, "vcn")
      destination_type  = "CIDR_BLOCK"
      network_entity_id = "drg"
      description       = "Routing to allow connectivity to ${title(k)} cluster"
    } if k != "c1"
  ]

  vcn_cidrs     = [lookup(lookup(var.clusters, "c1"), "vcn")]
  vcn_dns_label = "c1"
  vcn_name      = "c1"

  # subnets
  subnets = {
    bastion  = { newbits = 13, netnum = 0, dns_label = "bastion" }
    operator = { newbits = 13, netnum = 1, dns_label = "operator" }
    cp       = { newbits = 13, netnum = 2, dns_label = "cp" }
    int_lb   = { newbits = 11, netnum = 16, dns_label = "ilb" }
    pub_lb   = { newbits = 11, netnum = 17, dns_label = "plb" }
    workers  = { newbits = 2, netnum = 1, dns_label = "workers" }
    pods     = { newbits = 2, netnum = 2, dns_label = "pods" }
  }

  # bastion host
  create_bastion        = true
  bastion_allowed_cidrs = ["0.0.0.0/0"]
  bastion_upgrade       = false

  # operator host
  create_operator            = true
  operator_upgrade           = false
  create_iam_resources       = true
  create_iam_operator_policy = "always"
  operator_install_k9s       = true

  # oke cluster options
  cluster_name                = "c1"
  cluster_type                = var.cluster_type
  cni_type                    = var.preferred_cni
  control_plane_is_public     = var.oke_control_plane == "public"
  control_plane_allowed_cidrs = [local.anywhere]
  kubernetes_version          = var.kubernetes_version
  pods_cidr                   = lookup(lookup(var.clusters, "c1"), "pods")
  services_cidr               = lookup(lookup(var.clusters, "c1"), "services")

  # node pools
  allow_worker_ssh_access = true
  kubeproxy_mode          = "iptables"
  worker_pool_mode        = "node-pool"
  worker_pools            = var.nodepools
  worker_cloud_init       = var.worker_cloud_init
  worker_image_type       = "oke"

  # oke load balancers
  load_balancers          = "both"
  preferred_load_balancer = "public"

  allow_rules_internal_lb = merge(
    {
      for k, v in var.clusters : format("Allow TCP ingress from cluster %v", k) => {
        protocol = local.tcp_protocol, port = -1, source = lookup(v, "vcn"), source_type = local.rule_type_cidr,
      } if k != "c1"
    },
    {
      for k, v in var.clusters : format("Allow UDP ingress from cluster %v", k) => {
        protocol = local.udp_protocol, port = 53, source = lookup(v, "vcn"), source_type = local.rule_type_cidr,
      } if k != "c1"
    }
  )

  allow_rules_public_lb = {
    for p in local.public_lb_allowed_ports :
    format("Allow ingress to port %v", p) => {
      protocol = local.tcp_protocol, port = p, source = "0.0.0.0/0", source_type = local.rule_type_cidr,
    }
  }

  user_id = var.user_id

  providers = {
    oci      = oci.paris
    oci.home = oci.home
  }
}
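
The module above also references a few helper locals (local.anywhere, local.tcp_protocol, local.udp_protocol, local.rule_type_cidr, local.public_lb_allowed_ports) that are not shown in this article. Here is a minimal sketch of what they might look like, alongside the regions map; the values, and the public port list in particular, are assumptions to adapt to your own workloads:

# locals.tf (assumed values, not shown above)
locals {
  anywhere       = "0.0.0.0/0"
  rule_type_cidr = "CIDR_BLOCK"

  # OCI security rules identify protocols by their IANA numbers
  tcp_protocol = "6"
  udp_protocol = "17"

  # ports to open on the public load balancers (assumption)
  public_lb_allowed_ports = [80, 443]
}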

Let’s break the above down, with particular attention to a couple of new features we recently added:

create_drg       = true
drg_display_name = "c1"

remote_peering_connections = {
  for k, v in var.clusters : "rpc-to-${k}" => {} if k != "c1"
}

nat_gateway_route_rules = [
  for k, v in var.clusters :
  {
    destination       = lookup(v, "vcn")
    destination_type  = "CIDR_BLOCK"
    network_entity_id = "drg"
    description       = "Routing to allow connectivity to ${title(k)} cluster"
  } if k != "c1"
]

We implement each cluster by reusing the entire module, and since we want the clusters connected privately, we create a Dynamic Routing Gateway (DRG) in each VCN. We want the clusters connected in a mesh topology rather than a star, so we loop over the clusters variable and create a Remote Peering Connection (RPC) for every cluster except the current one. Similarly, we add NAT gateway route rules to establish routing between worker nodes and pods in different regions.
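
For c1, with the clusters map defined earlier, these two expressions evaluate to roughly the following (shown only to make the loops concrete; you never write this by hand):

remote_peering_connections = {
  "rpc-to-c2" = {}
  "rpc-to-c3" = {}
}

nat_gateway_route_rules = [
  {
    destination       = "10.2.0.0/16" # Amsterdam VCN
    destination_type  = "CIDR_BLOCK"
    network_entity_id = "drg"
    description       = "Routing to allow connectivity to C2 cluster"
  },
  {
    destination       = "10.3.0.0/16" # Frankfurt VCN
    destination_type  = "CIDR_BLOCK"
    network_entity_id = "drg"
    description       = "Routing to allow connectivity to C3 cluster"
  },
]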

As we want to use Paris as the de facto hub, we also create the bastion and operator host there:

# bastion host
create_bastion = true

# operator host
create_operator            = true
create_iam_resources       = true
create_iam_operator_policy = "always"
operator_install_k9s       = true

We also want pods to communicate over TCP and, in the case of headless services, perform cross-cluster DNS lookups:

allow_rules_internal_lb = merge(
  {
    for k, v in var.clusters : format("Allow TCP ingress from cluster %v", k) => {
      protocol = local.tcp_protocol, port = -1, source = lookup(v, "vcn"), source_type = local.rule_type_cidr,
    } if k != "c1"
  },
  {
    for k, v in var.clusters : format("Allow UDP ingress from cluster %v", k) => {
      protocol = local.udp_protocol, port = 53, source = lookup(v, "vcn"), source_type = local.rule_type_cidr,
    } if k != "c1"
  }
)
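
For c1, the merge above evaluates to four rules: TCP on all ports (port = -1) and UDP on port 53 (DNS) from each of the other two VCNs, roughly:

allow_rules_internal_lb = {
  "Allow TCP ingress from cluster c2" = {
    protocol = local.tcp_protocol, port = -1, source = "10.2.0.0/16", source_type = local.rule_type_cidr,
  }
  "Allow TCP ingress from cluster c3" = {
    protocol = local.tcp_protocol, port = -1, source = "10.3.0.0/16", source_type = local.rule_type_cidr,
  }
  "Allow UDP ingress from cluster c2" = {
    protocol = local.udp_protocol, port = 53, source = "10.2.0.0/16", source_type = local.rule_type_cidr,
  }
  "Allow UDP ingress from cluster c3" = {
    protocol = local.udp_protocol, port = 53, source = "10.3.0.0/16", source_type = local.rule_type_cidr,
  }
}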

Lastly, we need to let the c1 module know which providers it’s going to use:

providers = {
  oci      = oci.paris
  oci.home = oci.home
}

Dynamically selecting providers is unfortunately not yet possible in Terraform, so we have to pass them in explicitly.

We can now create the c2 (Amsterdam) and c3 (Frankfurt) clusters in a similar way with just a few minimal changes. We don’t need a bastion or operator host in c2 and c3, and we only need to change references from c1 to c2 and c3 respectively, as sketched below.
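
As a rough sketch, abridged to the arguments that differ from c1 (everything elided stays the same), the c2 module could look like this. Note that c3 would similarly need an oci provider aliased frankfurt, defined like the paris and amsterdam ones above:

module "c2" {

  source  = "oracle-terraform-modules/oke/oci"
  version = "5.1.1"

  home_region = var.home_region
  region      = lookup(local.regions, lookup(lookup(var.clusters, "c2"), "region"))

  # networking
  create_drg       = true
  drg_display_name = "c2"

  remote_peering_connections = {
    for k, v in var.clusters : "rpc-to-${k}" => {} if k != "c2"
  }

  nat_gateway_route_rules = [
    for k, v in var.clusters :
    {
      destination       = lookup(v, "vcn")
      destination_type  = "CIDR_BLOCK"
      network_entity_id = "drg"
      description       = "Routing to allow connectivity to ${title(k)} cluster"
    } if k != "c2"
  ]

  vcn_cidrs     = [lookup(lookup(var.clusters, "c2"), "vcn")]
  vcn_dns_label = "c2"
  vcn_name      = "c2"

  # c1 is the hub, so no bastion or operator host here
  create_bastion  = false
  create_operator = false

  # oke cluster options
  cluster_name  = "c2"
  pods_cidr     = lookup(lookup(var.clusters, "c2"), "pods")
  services_cidr = lookup(lookup(var.clusters, "c2"), "services")

  # ... all other arguments as in the c1 module, with "c1" replaced by "c2" ...

  providers = {
    oci      = oci.amsterdam
    oci.home = oci.home
  }
}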

Run terraform apply and the 3 OKE clusters, along with the necessary resources (VCNs, subnets, DRGs, RPCs, etc.), will be created.
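
Nothing special is required here; the standard Terraform workflow does the job:

terraform init
terraform plan
terraform apply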

Once the Terraform run is complete, all you need to do is log into the OCI Console and establish the connections between the various RPCs. At the beginning, in each region, you’ll see two unpeered RPCs:

Copy the OCID of rpc-to-c2 and switch to c2’s region, in this case Amsterdam. The OCI Console will switch back to the DRG page. Navigate to the DRG > rpc-to-c1 and establish a connection:

Repeat the above steps to establish connections between c2 and c3, and between c3 and c1. In the OCI Network Visualizer, you should then be able to see the VCNs in each region connected:
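
If you’d rather script this step than click through the Console, the OCI CLI exposes the same operation via oci network remote-peering-connection connect. A rough sketch for the Paris–Amsterdam pair, with placeholder OCIDs (double-check the exact parameters with --help before running):

# peer the Paris-side RPC with its Amsterdam counterpart
oci network remote-peering-connection connect \
  --region eu-paris-1 \
  --remote-peering-connection-id <ocid-of-rpc-to-c2-in-paris> \
  --peer-id <ocid-of-rpc-to-c1-in-amsterdam> \
  --peer-region-name eu-amsterdam-1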

When all the RPC connections have been established, SSH to the operator host and check that you have connectivity:

for c in c1 c2 c3; do
  kubectx $c
  kubectl get nodes
done

You should now be able to see the following:

Switched to context "c1".
NAME           STATUS   ROLES   AGE   VERSION
10.1.103.134   Ready    node    43m   v1.27.2
10.1.65.39     Ready    node    43m   v1.27.2
Switched to context "c2".
NAME           STATUS   ROLES   AGE   VERSION
10.2.102.41    Ready    node    43m   v1.27.2
10.2.125.168   Ready    node    43m   v1.27.2
Switched to context "c3".
NAME           STATUS   ROLES   AGE   VERSION
10.3.118.128   Ready    node    38m   v1.27.2
10.3.75.61     Ready    node    38m   v1.27.2

Conclusion

In this article, we showed how the Terraform module for OKE can be used and reused to simultaneously create multiple clusters in different regions, and how they can all be controlled from a single operator host. We also showed how the infrastructure heavy lifting has been done for you, and how you can quickly connect the VCNs of these clusters in either a star or a mesh topology, with the required routing and security rules set up to ensure connectivity.

I hope you find this article useful.
