Syncing Objects from one Kubernetes cluster to another Kubernetes cluster

Takumi Yanagawa
11 min read · May 1, 2023


About the KubeStellar Syncer developed in the KubeStellar project

By Takumi Yanagawa, Yuji Watanabe

Introduction

Edge computing is becoming an increasingly popular way for organizations to deploy and manage their IT infrastructure. As the number of edge gateways and edge compute nodes grows, managing them becomes more complex due to the following restrictions unique to the edge (see the blog post by Andy Anderson):

  1. Disconnected Operations
  2. Constrained Edge Clusters
  3. Scalability with Large Numbers of Small Edge Clusters
  4. Interoperability
  5. Complexity of Deployments
  6. Consistent Deployment Interfaces

In the KubeStellar project, we are tackling the challenge of efficiently managing a large number of edge gateways and edge compute nodes while addressing these restrictions. In particular, we focus on edge clusters that run Kubernetes, and we call this kind of edge an “edge location” in this blog.

KubeStellar project

The KubeStellar project consists of Edge-MultiCluster (Edge-MC), the platform code and module sets that support managing a large number of edge locations. In Edge-MC, we have adopted Kubernetes-based technologies because the Kubernetes API machinery is a good foundation for building reliable distributed systems. To manage the edge locations, we use Kubernetes-like Control Plane (KCP), a pure Kubernetes control plane that has no container runtime or deployment controller.

KCP can host multiple Kubernetes control planes, and each control plane is called a “workspace”. First, we would like to show a rough schematic diagram in Figure 1; the full picture comes in a later section. Users define workloads, each of which is a collection of Kubernetes API objects, and create additional API objects that describe which workloads go where. There is a workspace for each edge cluster, and the workloads are projected into the relevant workspaces. The workload in a workspace is synced to the associated edge location by the KubeStellar Syncer. The KubeStellar Syncer also syncs Kubernetes objects generated in the edge location back to the workspace. The Edge-MC service provides a mechanism for users to collect these synced-back objects.

Figure 1. Schematic diagram of Edge-MC. A workload is a set of Kubernetes objects such as deployments and configmaps. A result is a set of Kubernetes objects created in the edge location.

KubeStellar Syncer

To sync Kubernetes objects between a workspace and an edge location, we developed a new type of syncing agent, the “KubeStellar Syncer”. Since the edge location is a Kubernetes cluster and the workspace presents the same interface as one, the KubeStellar Syncer can be generalized: it is not specific to KCP and can be used to sync Kubernetes objects between any two Kubernetes clusters.

Figure 2 shows the operations required of the KubeStellar Syncer: downsync workloads such as deployments and configmaps from the workspace to the downstream edge location, upsync objects generated in the edge location to the workspace, and update workload statuses in the workspace.

Figure 2. Required operations in KubeStellar Syncer

How to use KubeStellar Syncer

Let’s briefly introduce the KubeStellar Syncer we developed. It runs as a regular Kubernetes Deployment in the edge location. The minimum required parameters are the connection information (kubeconfig) for both Kubernetes clusters. We developed a command line tool, available at https://github.com/yana1205/kcp, that generates the manifest YAML to bootstrap the KubeStellar Syncer.

The command line tool creates the identity and authorization in the workspace you want to sync with, and produces YAML manifests to install the KubeStellar Syncer on the downstream Kubernetes cluster. The tool runs with a kubeconfig that can manipulate the workspace, as follows.

KUBECONFIG=workspace.kubeconfig kubectl kcp workload edge-sync cluster1 --syncer-image quay.io/kcpedge/syncer:dev-2023-04-18 -o edge-syncer.yaml

KubeStellar Syncer is installed by applying the generated manifests (edge-syncer.yaml) to the downstream Kubernetes cluster.

KUBECONFIG=edge-location.kubeconfig kubectl apply -f edge-syncer.yaml

The KubeStellar Syncer watches the SyncerConfig object in the workspace, which describes which resources are downsynced and which resources are upsynced. Here is a sample SyncerConfig:

apiVersion: edge.kcp.io/v1alpha1
kind: SyncerConfig
metadata:
  name: syncer-config
spec:
  namespaceScope:
    namespaces:
    - ns1
    resources:
    - apiVersion: v1
      group: ""
      resource: configmaps
    - apiVersion: v1
      group: apps
      resource: deployments
  clusterScope:
  - apiVersion: v1
    group: apiextensions.k8s.io
    resource: customresourcedefinitions
    objects:
    - samples.my.domain
  upsync:
  - apiGroup: my.domain
    resources:
    - samples
    namespaces:
    - ns1

KubeStellar Syncer interprets it as follows:

  • Downsync ConfigMaps and Deployments in namespace ns1, and the CRD named samples.my.domain, from upstream to downstream
  • Upsync Samples (samples.my.domain) in namespace ns1 from downstream to upstream

Once this SyncerConfig is deployed to the upstream cluster, these resource objects will be synced.
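For instance, assuming the SyncerConfig above is saved to a file named syncer-config.yaml (the file name here is just an example), it can be applied to the upstream workspace with the same kubeconfig used in the next step:

KUBECONFIG=upstream.kubeconfig kubectl apply -f syncer-config.yaml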

For example, let’s try to deploy this workload manifest to the upstream cluster.

KUBECONFIG=upstream.kubeconfig kubectl apply -f - << EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ns1
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-configmap
  namespace: ns1
data:
  id: abc
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-deploy
  namespace: ns1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox
        args: ["tail", "-f", "/dev/null"]
      terminationGracePeriodSeconds: 3
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: samples.my.domain
spec:
  group: my.domain
  names:
    kind: Sample
    listKind: SampleList
    plural: samples
    singular: sample
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            properties:
              foo:
                type: string
            type: object
        type: object
    served: true
    storage: true
---
EOF

Let’s check that they have been downsynced to the downstream cluster.

$ KUBECONFIG=downstream.kubeconfig kubectl get cm,deploy,crd,sample -n ns1

NAME                         DATA   AGE
configmap/kube-root-ca.crt   1      1s
configmap/sample-configmap   1      1s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sample-deploy   0/1     1            0           1s

NAME                                                              CREATED AT
customresourcedefinition.apiextensions.k8s.io/samples.my.domain   2023-04-21T06:55:22Z

Next, let’s try out the upsync feature by deploying a Sample CR to the downstream cluster.

KUBECONFIG=downstream.kubeconfig kubectl apply -f - << EOL
apiVersion: my.domain/v1alpha1
kind: Sample
metadata:
  name: sample
  namespace: ns1
spec:
  foo: bar
EOL

The sample CR will be upsynced to upstream.

$ KUBECONFIG=upstream.kubeconfig kubectl get sample -n ns1
NAME                      AGE
sample.my.domain/sample   50s

KubeStellar Syncer in Edge-MC

So far we have talked about one-to-one sync. This can be extended to one-to-many sync by the Edge-MC platform, an infrastructure for managing a large number of edge locations. Using the Edge-MC platform, you can easily distribute workloads to multiple clusters and collect the results from them.

In Edge-MC, an EdgePlacement object is used instead of the SyncerConfig created in the previous section. An EdgePlacement describes which Kubernetes resources are downsynced and upsynced, and which edge locations they should go to. Once a user deploys workloads together with an EdgePlacement object to a workspace, Edge-MC automatically distributes the workloads to multiple edge locations.
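As a rough, minimal sketch of what such an object looks like (the field names follow the full example shown later in this post; the object name and label values here are hypothetical):

apiVersion: edge.kcp.io/v1alpha1
kind: EdgePlacement
metadata:
  name: sample-placement
spec:
  locationSelectors:             # which edge locations receive the workload
  - matchLabels: {"env": "prod"}
  namespaceSelector:             # which namespaces are downsynced
    matchLabels: {"name": "ns1"}
  upsync:                        # which objects are synced back from the edge locations
  - apiGroup: my.domain
    resources:
    - samples
    namespaces:
    - ns1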

We’ll briefly introduce what Edge-MC does behind the scenes. Figure 3 is a detailed diagram of how Edge-MC distributes workloads to each workspace with the KubeStellar Syncer. In Edge-MC, there are two major kinds of workspaces: the workload management workspace and the mailbox workspace. Users deploy workloads and EdgePlacement objects to a workload management workspace. Edge-MC maintains SyncerConfig objects derived from the EdgePlacement objects and the workload objects, and finally distributes the selected workloads, together with the SyncerConfig, to the workspace that the KubeStellar Syncer keeps synced with an edge location. That workspace is the mailbox workspace.

Figure 3. Detailed diagram of how Edge-MC distributes workloads to each workspace

Edge-MC use case: Automating compliance checks on multiple clusters

Leveraging Edge-MC, you can automate compliance checks across multiple clusters. In the Kubernetes world, there are tools like Kyverno, OPA Gatekeeper, and Compliance Operator. The basic usage of these tools is to install them with Helm and deploy their own Kubernetes resource objects, called policies, in the cluster. The tool then scans and continually audits the Kubernetes cluster and generates reports. Isn’t Edge-MC a good fit for this use case? Here is a short introduction.

Now let’s assume that we have two edge locations, both of which have the KubeStellar Syncer installed and are associated with mailbox workspaces, following the Edge-MC tutorial. In the shell commands in all the following steps, it is assumed that kcp is running and $KUBECONFIG is set to the .kcp/admin.kubeconfig that kcp produces, except where it is explicitly noted that a physical cluster is being accessed. We want to install Kyverno on both edge locations and deploy Kyverno policies to both. Figure 4 shows this setup.

Figure 4. Enabling Kyverno and deploying Kyverno policies on multiple edge locations with Edge-MC

First, we deploy the Kyverno install manifests and Kyverno policies as a workload to the workload management workspace. The Kyverno install manifests are deployed with Helm, so we use it as follows.
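If the Kyverno chart repository has not been added to your Helm client yet, you may first need to register it (this is the standard public Kyverno chart repository; adjust if you mirror charts elsewhere):

helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update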

$ kubectl ws root:wmw
Current workspace is "root:wmw".
$ helm install kyverno --set replicaCount=1 --namespace kyverno --create-namespace kyverno/kyverno
NAME: kyverno
LAST DEPLOYED: Fri Apr 21 17:11:12 2023
NAMESPACE: kyverno
STATUS: deployed
REVISION: 1
NOTES:
Chart version: 2.6.5
Kyverno version: v1.8.5

You can check that the Kubernetes objects have been deployed to the workspace.

$ kubectl get ns,deploy,crd -n kyverno
NAME                STATUS   AGE
namespace/default   Active   55s
namespace/kyverno   Active   28s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kyverno   0/1     0            0           28s

NAME                                                                                 CREATED AT
customresourcedefinition.apiextensions.k8s.io/admissionreports.kyverno.io            2023-04-24T09:19:07Z
customresourcedefinition.apiextensions.k8s.io/backgroundscanreports.kyverno.io       2023-04-24T09:19:07Z
customresourcedefinition.apiextensions.k8s.io/clusteradmissionreports.kyverno.io     2023-04-24T09:19:07Z
customresourcedefinition.apiextensions.k8s.io/clusterbackgroundscanreports.kyverno.io   2023-04-24T09:19:07Z
customresourcedefinition.apiextensions.k8s.io/clusterpolicies.kyverno.io             2023-04-24T09:19:07Z
customresourcedefinition.apiextensions.k8s.io/clusterpolicyreports.wgpolicyk8s.io    2023-04-24T09:19:07Z
customresourcedefinition.apiextensions.k8s.io/generaterequests.kyverno.io            2023-04-24T09:19:07Z
customresourcedefinition.apiextensions.k8s.io/policies.kyverno.io                    2023-04-24T09:19:07Z
customresourcedefinition.apiextensions.k8s.io/policyreports.wgpolicyk8s.io           2023-04-24T09:19:07Z
customresourcedefinition.apiextensions.k8s.io/updaterequests.kyverno.io              2023-04-24T09:19:07Z

Second, we have to create an EdgePlacement so that these resources get delivered to the edge locations. All the resources deployed by Helm can be obtained with the “--dry-run” option of the Helm command. We use a Go program to extract the resource lists and create an EdgePlacement from the “--dry-run” output, as follows.

git clone -b helmconverter https://github.com/yana1205/edge-mc.git
cd edge-mc
helm template kyverno --set replicaCount=1 --namespace kyverno --create-namespace kyverno/kyverno --dry-run --debug > /tmp/kyverno-helm-install.yaml
go run cmd/syncer/helm/converter.go --path-to-helm-template /tmp/kyverno-helm-install.yaml
apiVersion: edge.kcp.io/v1alpha1
kind: EdgePlacement
metadata:
  name: edge-placement
spec:
  locationSelectors:
  - matchLabels: {"env":"prod"}
  namespaceSelector:
    matchLabels: {"name":"kyverno"}
  nonNamespacedObjects:
  - apiGroup: apiextensions.k8s.io
    resources:
    - customresourcedefinitions
    resourceNames:
    - admissionreports.kyverno.io
    - backgroundscanreports.kyverno.io
    - clusteradmissionreports.kyverno.io
    - clusterbackgroundscanreports.kyverno.io
    - clusterpolicies.kyverno.io
    - clusterpolicyreports.wgpolicyk8s.io
    - generaterequests.kyverno.io
    - policies.kyverno.io
    - policyreports.wgpolicyk8s.io
    - updaterequests.kyverno.io
  - apiGroup: rbac.authorization.k8s.io
    resources:
    - clusterroles
    resourceNames:
    - kyverno:admin-policies
    - kyverno:admin-policyreport
    - kyverno:admin-reports
    - kyverno:admin-generaterequest
    - kyverno:admin-updaterequest
    - kyverno
    - kyverno:userinfo
    - kyverno:policies
    - kyverno:view
    - kyverno:generate
    - kyverno:events
    - kyverno:webhook
  - apiGroup: rbac.authorization.k8s.io
    resources:
    - clusterrolebindings
    resourceNames:
    - kyverno
  - apiGroup: kyverno.io
    resources:
    - clusterpolicies
    resourceNames:
    - "*"
  - apiGroup: apis.kcp.io
    resources:
    - apibindings
    resourceNames:
    - "bind-kube"
  upsync:
  - apiGroup: wgpolicyk8s.io
    resources:
    - policyreports
    - clusterpolicyreports
    namespaces:
    - "*"
    names:
    - "*"

This covers all the resources for Kyverno. The detailed spec is described in the EdgePlacement schema.

Once it is deployed to the workload management workspace, the workloads start to be distributed.
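For example, the converter output can be saved to a file and applied while the current workspace is root:wmw (the output path here is just an example):

go run cmd/syncer/helm/converter.go --path-to-helm-template /tmp/kyverno-helm-install.yaml > /tmp/edge-placement.yaml
kubectl apply -f /tmp/edge-placement.yaml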

Let’s check whether Kyverno is installed in each edge location.

$ KUBECONFIG=~/.kube/config kubectl get --context kind-florin pod -n kyverno
NAME                       READY   STATUS    RESTARTS   AGE
kyverno-7c444878f7-zlzdp   1/1     Running   0          93s
$ KUBECONFIG=~/.kube/config kubectl get --context kind-guilder pod -n kyverno
NAME                       READY   STATUS    RESTARTS   AGE
kyverno-7c444878f7-rlsp8   1/1     Running   0          2m18s

Now Kyverno is enabled in each edge location. Let’s deploy a Kyverno policy to the workload management workspace so that it is applied to each edge location.

kubectl apply -f - << EOL
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: sample-cluster-policy
spec:
  background: true
  validationFailureAction: enforce
  rules:
  - name: sample-cluster-check-for-labels-in-configmap
    match:
      any:
      - resources:
          kinds:
          - ConfigMap
    validate:
      message: "label 'app.kubernetes.io/name' is required"
      pattern:
        metadata:
          labels:
            app.kubernetes.io/name: "?*"
EOL

Let’s check if the policy is deployed to edge locations.

$ KUBECONFIG=~/.kube/config kubectl get --context kind-florin clusterpolicies 
NAME                    BACKGROUND   VALIDATE ACTION   READY
sample-cluster-policy   true         enforce           true
$ KUBECONFIG=~/.kube/config kubectl get --context kind-guilder clusterpolicies
NAME                    BACKGROUND   VALIDATE ACTION   READY
sample-cluster-policy   true         enforce           true

It’s done. Let’s see the reports generated by Kyverno too.

$ KUBECONFIG=~/.kube/config kubectl get --context kind-florin policyreports -A
NAMESPACE                         NAME                         PASS   FAIL   WARN   ERROR   SKIP   AGE
default                           cpol-sample-cluster-policy   0      1      0      0       0      3m16s
kcp-edge-syncer-florin-1zcapm34   cpol-sample-cluster-policy   0      1      0      0       0      3m16s
kube-node-lease                   cpol-sample-cluster-policy   0      1      0      0       0      3m16s
kube-public                       cpol-sample-cluster-policy   0      2      0      0       0      3m16s
kube-system                       cpol-sample-cluster-policy   0      6      0      0       0      3m16s
kyverno                           cpol-sample-cluster-policy   2      1      0      0       0      3m16s
local-path-storage                cpol-sample-cluster-policy   0      2      0      0       0      3m16s
$ KUBECONFIG=~/.kube/config kubectl get --context kind-guilder policyreports -A
NAMESPACE                          NAME                         PASS   FAIL   WARN   ERROR   SKIP   AGE
default                            cpol-sample-cluster-policy   0      1      0      0       0      3m36s
kcp-edge-syncer-guilder-1yenui3z   cpol-sample-cluster-policy   0      1      0      0       0      3m36s
kube-node-lease                    cpol-sample-cluster-policy   0      1      0      0       0      3m36s
kube-public                        cpol-sample-cluster-policy   0      2      0      0       0      3m36s
kube-system                        cpol-sample-cluster-policy   0      6      0      0       0      3m36s
kyverno                            cpol-sample-cluster-policy   2      1      0      0       0      3m36s
local-path-storage                 cpol-sample-cluster-policy   0      2      0      0       0      3m36s

Let’s check if these reports are upsynced to mailbox workspaces. This needs several steps.

1. Find the mailbox workspaces

$ kubectl ws root:espw
Current workspace is "root:espw".
$ kubectl get Workspace -o "custom-columns=NAME:.metadata.name,SYNCTARGET:.metadata.annotations['edge\.kcp\.io/sync-target-name']"
NAME                                           SYNCTARGET
root-mb-89b4f0d7-b6fc-4ff1-89c2-2ee59d790df6   florin
root-mb-9ad9315f-4975-4f80-a261-19b49acef2e0   guilder

2. Go to either workspace

$ kubectl ws root-mb-89b4f0d7-b6fc-4ff1-89c2-2ee59d790df6
Current workspace is "root:espw:root-mb-89b4f0d7-b6fc-4ff1-89c2-2ee59d790df6" (type root:universal).

3. View policy reports

$ kubectl get policyreports -A
NAMESPACE                         NAME                         PASS   FAIL   WARN   ERROR   SKIP   AGE
default                           cpol-sample-cluster-policy   0      1      0      0       0      5m10s
kcp-edge-syncer-florin-1zcapm34   cpol-sample-cluster-policy   0      1      0      0       0      5m10s
kube-node-lease                   cpol-sample-cluster-policy   0      1      0      0       0      5m10s
kube-public                       cpol-sample-cluster-policy   0      2      0      0       0      5m10s
kube-system                       cpol-sample-cluster-policy   0      6      0      0       0      5m10s
kyverno                           cpol-sample-cluster-policy   2      1      0      0       0      5m10s
local-path-storage                cpol-sample-cluster-policy   0      2      0      0       0      5m10s

4. They are there. Let’s check the other mailbox too.

$ kubectl ws root:espw
Current workspace is "root:espw".
$ kubectl ws root-mb-9ad9315f-4975-4f80-a261-19b49acef2e0
Current workspace is "root:espw:root-mb-9ad9315f-4975-4f80-a261-19b49acef2e0" (type root:universal).
$ kubectl get policyreports -A
NAMESPACE                          NAME                         PASS   FAIL   WARN   ERROR   SKIP   AGE
default                            cpol-sample-cluster-policy   0      1      0      0       0      5m30s
kcp-edge-syncer-guilder-1yenui3z   cpol-sample-cluster-policy   0      1      0      0       0      5m30s
kube-node-lease                    cpol-sample-cluster-policy   0      1      0      0       0      5m30s
kube-public                        cpol-sample-cluster-policy   0      2      0      0       0      5m30s
kube-system                        cpol-sample-cluster-policy   0      6      0      0       0      5m30s
kyverno                            cpol-sample-cluster-policy   2      1      0      0       0      5m30s
local-path-storage                 cpol-sample-cluster-policy   0      2      0      0       0      5m30s

Perfect!

This is one of the Compliance-To-Policy project activities, in which the processes above are automated by the C2P operator. If you are interested, feel free to contact me or check out Compliance-To-Policy. We have also prepared a video of an end-to-end demonstration, and I highly encourage you to check it out!

Continuous compliance checking by Kyverno seamlessly spanning multi-clusters with KubeStellar

Conclusion

In this blog post, we discussed what the KubeStellar Syncer is and how it is used behind the scenes of the Edge-MC platform to manage a large number of edge locations.

The KubeStellar Syncer is under active development, and we will continue to share our activities and results in future blog posts.

This blog is part of a series of posts from the KubeStellar community regarding challenges related to multi-cloud and edge. You can learn more about the challenges and read posts from other members of the KubeStellar community on edge and multi-cloud topics: Navigating the Edge: Overcoming Multi-Cloud Challenges in Edge Computing with KubeStellar (by Andy Anderson); Seven Ways to Stub Your Toes on The Edge. (by Mike Spreitzer); Toward Building a Kubernetes Control Plane for the Edge (by Paolo Dettori). You can also join our community by attending our bi-weekly meetings.

Takumi Yanagawa

Software Developer at IBM Research - Tokyo. Views are my own.