Mastering Kubernetes: Journey with Cluster API

Ali Yetkin
Published in hepsiburadatech
Jan 17, 2024

Introduction

Although Kubernetes offers a strong and flexible platform for managing containers, it also presents specific challenges, particularly in on-premise environments and when you’re tasked with managing hundreds of clusters. Let’s talk about how, at Hepsiburada, we efficiently manage hundreds of Kubernetes clusters that directly serve about 95% of the traffic from our more than 100 million monthly visitors. We’ll delve into the complexities of managing multiple clusters and discuss the strategies we employ to tackle these challenges.

We looked closely at how much work and operational effort it took to keep up with the hundreds of Kubernetes clusters we had. We realized we needed to change our approach and find a solution using our engineering skills and open-source projects.

This led us on a challenging but fun journey, which ended with us creating our own Kubernetes Engine.

Is Multi-Cluster Management a Nightmare?

Let’s explain what multi-cluster management is and its challenges.

Setup: New teams, squads, and projects constantly appear in our developer groups. Setting up a new Kubernetes cluster by provisioning instances with Terraform and running Kubespray playbooks takes a lot of time.

Upgrade: As the number of clusters grows, we need to update them constantly to stay current. At one point, even if we upgraded one cluster per day, by the time we reached the last one, the first cluster we had upgraded would already be more than a year old!

Scalability: We use GKE clusters for some projects. GKE’s autoscaling feature works great for handling large traffic while keeping costs down, but this is not possible out of the box with on-premise vanilla Kubernetes. Especially during big sales periods like Black Friday, we need to scale up and down all the time. In short, managing resources in large environments gets complex.

Configuration and Version Management: Keeping installations, versions, and configurations of standard apps in clusters consistent and up to date is crucial.

Error Management and Sustainability: Keeping problems in one cluster from affecting others and maintaining overall system health is a big challenge. In large environments, an error can happen in any cluster at any time, and we have to be ready for that.

These are just some of the many challenges you might face. So yes, managing hundreds of clusters can become a nightmare if you don’t shift your strategy toward multi-cluster management tooling.

Solution With Cluster API ❤

Our main focus has always been on solutions that are declarative, Kubernetes-style, and open-source. It was at this point that we came across Cluster API.

Cluster API is a project within the Kubernetes ecosystem that provides a standard for managing Kubernetes clusters using Kubernetes itself. This project aims to automate cluster lifecycle management processes using Kubernetes’ own APIs and model. Cluster API extends Kubernetes’ “infrastructure as code” approach by simplifying and standardizing processes like creating, updating, upgrading, and deleting clusters.

In essence, Cluster API runs as an operator on Kubernetes. It works with various providers. We have set up a cross-platform, multi-cluster management infrastructure using the OpenStack (CAPO), vSphere (CAPV), and Kubeadm providers.
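To make this concrete, here is a minimal sketch of the kind of manifest Cluster API is driven by: a Cluster object that ties a control plane definition and an infrastructure definition together. The names and CIDR below are placeholders, and the exact infrastructure API version depends on the provider release you run.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: demo-cluster          # placeholder name
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: demo-cluster-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # depends on the CAPO release
    kind: OpenStackCluster
    name: demo-cluster
```

Applying (or deleting) objects like this on the management cluster is what creates, reconciles, and tears down entire workload clusters.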

Hepsiburada Kubernetes Engine (HKE)

Cluster API provides a standard for managing Kubernetes clusters. However, this doesn’t mean it can be used in its raw form in an environment where you manage hundreds or thousands of clusters. Therefore, by combining a total of six open-source projects, including Cluster API itself, and adding software we developed in-house, we created a platform and named it Hepsiburada Kubernetes Engine. Here are its main features:

  • Declarative Management: Those who have previously worked with Cluster API know that standard workload clusters can be created on the management cluster with a command like “clusterctl generate cluster”. However, to make this process declarative and manage it according to the Infrastructure as Code (IaC) philosophy, we wrote a Helm Chart for all the Cluster API resource templates. Using ArgoCD, operations such as setting up a new cluster, scaling, and adding or removing node pools are executed in the IaC fashion with just a commit to a repository (an example ArgoCD Application is sketched after this list).
  • Auto Provisioning: Using the Cluster API provider OpenStack (CAPO) and Cluster API provider vSphere (CAPV) projects, we have gained the ability to automatically create and delete instances in virtualization environments (vSphere, OpenStack) and continuously monitor their health status.
  • Auto Upgrade: Using a structure similar to Kubernetes’ own rolling update mechanism, an upgrade first adds a new node running the target version to the cluster and then drains and replaces the existing nodes one by one, giving us zero-downtime automatic upgrades (see the KubeadmControlPlane sketch below).
  • Auto Healing: When something goes wrong on a control plane or worker node, the platform automatically intervenes by adding a new node with the same configuration and removing the old one (see the MachineHealthCheck sketch below).
  • Auto Scaling: Using the Cluster Autoscaler project, when cluster resources become insufficient, the required number of nodes is calculated and automatically added to the cluster within seconds (see the MachineDeployment annotations below).
  • Auto Discovering: Through our self-developed and open-sourced software, which we’ve integrated into Kubernetes components, we dynamically learn about Kubernetes resources. This includes discovering their health status, creating our inventory, and ensuring the IPs of worker nodes in the upstreams of Envoy load balancers are updated in response to potential autoscaling scenarios. In summary, by developing our own internal Load Balancer platform, we enable auto scalable Kubernetes nodes to be automatically added behind the Load Balancer, functioning similarly to Google Cloud Load Balancer or Amazon Elastic Load Balancing.
  • Native Networking: We’ve replaced kube-proxy, Kubernetes’ default component for handling internal and external service traffic (which is based on iptables/IPVS rules), by adopting the “Kubernetes without kube-proxy” approach. By leveraging eBPF, network traffic is handled down at the Linux kernel level, enabling enhancements such as a sidecar-less service mesh, network monitoring, troubleshooting, security, and performance improvements.
  • Event Logging: Using the Kubernetes Event Exporter project, we automatically gather events from all Kubernetes clusters in our infrastructure, including newly established ones, into a centralized Elasticsearch cluster. This allows us to track past internal events within the clusters (a sample exporter configuration is shown below).
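For the declarative flow, the sketch below shows what an ArgoCD Application wrapping a Helm chart of Cluster API templates could look like. The repository URL, paths, and chart layout are hypothetical illustrations, not our actual repository structure.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-cluster                  # one Application per workload cluster
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/cluster-api-charts.git   # hypothetical repo
    targetRevision: main
    path: charts/workload-cluster                                      # hypothetical chart path
    helm:
      valueFiles:
        - ../../clusters/demo-cluster/values.yaml   # per-cluster settings (node pools, versions, flavors)
  destination:
    server: https://kubernetes.default.svc          # the Cluster API management cluster
    namespace: demo-cluster
  syncPolicy:
    automated:
      prune: true      # removing a resource from Git removes it from the cluster
      selfHeal: true   # drift is reverted back to the Git state
    syncOptions:
      - CreateNamespace=true
```

With this pattern, a commit that changes the values file is all it takes to scale a node pool or roll out a new cluster.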
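The auto-upgrade behavior maps to Cluster API’s KubeadmControlPlane, which already implements the add-then-drain rollout described above. A minimal sketch with placeholder names and versions; bumping spec.version is what triggers a rolling replacement of control plane machines:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: demo-cluster-control-plane
spec:
  replicas: 3
  version: v1.28.5            # bumping this triggers a rolling upgrade
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1             # create one new machine before draining an old one
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: OpenStackMachineTemplate
      name: demo-cluster-control-plane
  kubeadmConfigSpec: {}       # kubeadm init/join settings omitted for brevity
```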
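Auto healing and auto scaling likewise build on standard Cluster API objects: a MachineHealthCheck that remediates unhealthy nodes, and min/max annotations on a MachineDeployment that the Cluster Autoscaler’s clusterapi provider reads. The sketch below uses placeholder names and values:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: demo-cluster-worker-mhc
spec:
  clusterName: demo-cluster
  maxUnhealthy: 40%            # stop remediating if too many nodes fail at once
  nodeStartupTimeout: 10m
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: demo-cluster-md-0
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: demo-cluster-md-0
  annotations:
    # Scaling bounds read by the Cluster Autoscaler's clusterapi provider.
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "20"
spec:
  clusterName: demo-cluster
  replicas: 3
  selector:
    matchLabels: {}            # filled in by the controller
  template:
    spec:
      clusterName: demo-cluster
      version: v1.28.5
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: demo-cluster-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: OpenStackMachineTemplate
        name: demo-cluster-md-0
```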
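Finally, for centralized event collection, a kubernetes-event-exporter configuration with an Elasticsearch receiver looks roughly like the following sketch; the Elasticsearch endpoint and index names are placeholders, not our production values:

```yaml
logLevel: error
logFormat: json
route:
  routes:
    - match:
        - receiver: elasticsearch        # route every event to the Elasticsearch receiver
receivers:
  - name: elasticsearch
    elasticsearch:
      hosts:
        - http://elasticsearch.logging.svc:9200   # placeholder endpoint
      index: kube-events
      indexFormat: "kube-events-{2006-01-02}"     # daily indices
```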

Challenges

Under normal circumstances, testing Cluster API in development environments -like the Docker provider- proceeds smoothly and without issues. However, when you try to integrate it with complex virtualization platforms through their providers, some challenges come along with it.

I should first mention that many of the challenges we experienced on the OpenStack side can be attributed to our use of OpenStack with BGP and our use of DVR Routers.

  • Octavia: CAPO creates an Octavia load balancer and uses a tenant network as the LB network. The virtual IP allocated to the LB was not advertised via BGP. After some source-code investigation, we saw that the port owner must be "compute:nova" and the port must be attached to an instance, but neither is possible in the Octavia case.
  • DVR Router: We noticed that if you add an allowed address pair to a Neutron port, the DVR router installs a permanent ARP entry for the IP configured in the allowed address pair, and that this entry always reflects the most recently updated allowed address pair.
    For example, if you add an allowed address pair with IP X.X.X.X to Neutron port 1, the DVR router will have a permanent ARP entry for IP X.X.X.X with the MAC address of Neutron port 1.
    Then, if you add the same IP X.X.X.X as an allowed address pair to Neutron port 2, the DVR router will now have a permanent ARP entry for IP X.X.X.X with the MAC address of Neutron port 2.
    In a way this makes sense, since you cannot have two ARP entries for the same IP address, but the problem is that the actual VIP could still be on Neutron port 1.
    We reported this issue to the Octavia and Neutron communities. To solve the problem, we developed software that detects access issues with any Octavia Load Balancer VIP on OpenStack and then executes a series of API requests to update the allowed address pair, thereby correcting the ARP entry.
  • BGP Peering: We noticed that when we tried to create a cluster with a large number of worker nodes, the provisioning phase of these instances caused BGP neighborships to drop between OpenStack and our other virtualization environments (NSX-T), leading to significant issues. Since OpenStack has the capacity to provision many instances at once, we investigated why this error occurred. We discovered that during the creation of new instances, the BGP update process entered a loop and was ignored by the OpenStack BGP router.
    When our network teams stopped rerouting from the NSX-T platform to OpenStack, the problem was resolved.

Result

In conclusion, the development and implementation of the Hepsiburada Kubernetes Engine represent a significant milestone in our journey to efficiently manage a large-scale Kubernetes environment. By overcoming various challenges and integrating innovative solutions, we have not only enhanced our ability to handle massive traffic loads but also improved the scalability, reliability, and performance of our infrastructure. This journey, filled with unique challenges and learning opportunities, demonstrates our commitment to leveraging cutting-edge technology to deliver the best possible service. The Hepsiburada Kubernetes Engine is not just a testament to our team’s technical expertise but also a beacon of our dedication to continuous improvement and excellence in the ever-evolving world of Kubernetes and cloud computing.
