Cluster-Ception: Integration Testing Cluster Interconnection with Cluster API

Alessandro Olivero
Published in The Liqo Blog · May 4, 2023 · 4 min read

What is Liqo?

Liqo is an open-source project that enables seamless Kubernetes multi-cluster infrastructures. It allows users to create a single Kubernetes (virtual) cluster spanning multiple environments, such as public clouds, on-premises data centers, and edge environments. Liqo simplifies the management of workloads across multiple clusters and provides a secure, reliable way to access resources across different domains.

Why integration testing?

Integration testing is essential for Liqo, ensuring that the distributed Kubernetes clusters are interoperating correctly and that the user experience is consistent across all environments. Integration tests can help identify potential issues with the system, such as regressions during feature development or incompatibility with different Kubernetes distributions or CNIs.

The problem

At Liqo, we have several things to test to achieve our goal: that we are compatible with the Kubernetes APIs at the control-plane level, that our VPN solution is correctly set up at the cross-cluster level, and that we can correctly route data-plane traffic at the in-cluster level. For these reasons, the tests are resource-hungry, since they require multiple full Kubernetes clusters with multiple nodes each.

Previously, integration testing had been performed on large virtual machines running KinD (see next section), which required constant processing power. This approach wastes resources: the VMs consume a significant amount of energy and computing power, and they must stay allocated around the clock. Fortunately, more efficient and cost-effective ways exist to perform integration testing, such as containerization or cloud-based testing solutions.

How we solved the problem

GitHub Actions Runner Controller

The GitHub Actions Runner Controller is a great way to limit resource consumption when executing continuous integration (CI) tests, by running them inside Kubernetes Pods. Depending on the workload, these runners can scale their number, and the resources they use, up or down. This helps ensure that resources are used effectively and that tests complete on time.
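As an illustration, an autoscaling runner pool with the Actions Runner Controller can be sketched roughly as follows. This is a hedged example: the names, repository, and replica counts are hypothetical, not Liqo's actual configuration.

```yaml
# Illustrative actions-runner-controller setup (hypothetical names/values).
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: liqo-ci-runners            # hypothetical name
spec:
  template:
    spec:
      repository: liqotech/liqo
      labels:
        - integration-tests        # referenced by `runs-on` in workflows
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: liqo-ci-autoscaler         # hypothetical name
spec:
  scaleTargetRef:
    name: liqo-ci-runners
  minReplicas: 0                   # no idle runners: nothing allocated between runs
  maxReplicas: 3
```

The key point is `minReplicas: 0`: runner Pods (and the resources they hold) exist only while there is CI work to do.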

But… How to create our Kubernetes clusters? We explored two possible solutions: KinD and Cluster API.

KinD

Kubernetes in Docker (KinD) is a tool that deploys and runs Kubernetes clusters inside Docker containers, using them as nodes, and it was, of course, our first choice for creating clusters quickly in CI jobs. But we hit the first big blocker: KinD requires privileged, node-level permissions to bootstrap its clusters. And second, even after granting those permissions, we ran into hard-to-diagnose issues where our CI Pods were randomly killed by the out-of-memory (OOM) killer, despite plenty of free memory on the physical nodes. These two reasons forced us to find a different way to go.
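For context, the appeal of KinD is that a multi-node cluster can be described in a few lines of configuration (an illustrative sketch, not our exact setup):

```yaml
# Minimal KinD configuration for a two-node cluster.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
```

Creating such a cluster from inside a CI Pod, however, means running Docker-in-Docker with a privileged container, which is exactly the permission problem described above.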

Cluster API

Cluster API is an open-source Kubernetes project that provides declarative APIs and tools to simplify the deployment, management, and operations of Kubernetes clusters. Cluster API is designed to be cloud-agnostic, allowing users to deploy, manage, and operate clusters on different cloud providers. However, Cluster API is not enough on its own: we also need a component that provides the infrastructure on which to create our clusters.

We identified this component in KubeVirt, an open-source project that enables users to run virtual machines on top of Kubernetes clusters, and provides an API for managing those machines and their storage. Hence, Cluster API can be used with KubeVirt to create, manage, and operate virtual machines on our bare-metal Kubernetes cluster. Since Cluster API defines the desired state of the virtual machines and KubeVirt ensures that this state is maintained, we were able to use those VMs to set up full Kubernetes clusters with kubeadm in a few seconds, ready to be used for our integration tests.
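A rough sketch of what this declarative definition looks like with the KubeVirt infrastructure provider is shown below. This is a simplified, hypothetical fragment: in practice, complete manifests (including the KubeadmControlPlane and machine templates) are typically rendered from a provider template with `clusterctl generate cluster`, and API versions may differ across provider releases.

```yaml
# Simplified Cluster API cluster backed by KubeVirt (illustrative names).
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: ci-cluster                 # hypothetical name
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane      # bootstraps the control plane with kubeadm
    name: ci-cluster-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: KubevirtCluster          # machines are KubeVirt VirtualMachines
    name: ci-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: KubevirtCluster
metadata:
  name: ci-cluster
```

Applying such manifests is all it takes to get a cluster: the Cluster API controllers create the KubeVirt VMs and reconcile them until the kubeadm-bootstrapped cluster is ready.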

With this new environment, our clusters are made of real virtual machines that behave very similarly to the clusters you find in any other cloud environment. More importantly, we no longer had Docker-in-Docker-related issues!

Final solution including ClusterAPI-provided clusters with KubeVirt

Conclusion

Liqo's integration testing with Cluster API provides a robust and cost-effective way to test Liqo's features: together, they offer a powerful combination for efficiently creating, testing, and managing Kubernetes clusters.

Our Grafana dashboards show how CPU and memory utilization changed with this new testing environment.

Daily consumption of memory for three runners. Each one runs two Kubernetes clusters. (6 k8s clusters in total)
Daily consumption of CPU for three runners. Each one runs two Kubernetes clusters. (6 k8s clusters in total)

As expected, the graphs show resource consumption only while we are performing tests; for the rest of the day, consumption is zero. On average over the day, memory consumption dropped by more than 50% and CPU consumption by more than 70%, even though we moved from two-node KinD clusters to three-node, full-fledged Kubernetes clusters.

Check out the Liqo GitHub repository to learn more about the project!
