Episode-II: The Grid

Authors: Rimma Iontel, Chief Architect; Fatih E. NAR, Chief Architect

Fatih Nar
Open 5G HyperCore
6 min read · Jul 26, 2021


Introduction

Kubernetes (aka k8s) is a proven base platform for hosting micro-services. It facilitates a cloud-native approach to application development, and, coupled with DevOps and GitOps tooling, it has become the de facto platform for containerized services across multiple industries.

However, k8s is not the end game; by itself it may not address all the needs of application development and of the post-deployment operational tasks that keep those applications running in a mature, reliable, and predictable way.

We have seen complementary solutions in the market, ones that enrich rather than replace, that fill the gaps or shore up the weaknesses of platforms where k8s is the underlying engine. They come in the form of k8s-native solution packages, known as k8s operators, available on the open source OperatorHub. Examples include GitOps and DevOps pipelines, service mesh, performance monitoring tools, and multi-cluster management.

An end-to-end technology stack is a good starting point, but the real goal is a deployment model that reaches consumers and supports backend systems wherever they are, delivering high-performance, cost-effective outcomes. That is how a technology stack becomes a successful business solution. When the service targets are located at the edge, edge computing becomes crucial for media and communication services and offerings.

In this article we dive into an analysis of k8s deployment models for edge applications. We address how to enable North-South (external consumer) and East-West (backend system) communication across different infrastructure types hosting the same application platform, preserving developer and operational consistency.

The Need & Possible Solutions

Being in close proximity to consumers has great benefits, including but not limited to low-latency response and data locality. However, there are also multiple challenges. One of the key challenges with the k8s deployment model is the placement of the k8s control plane, which manages the worker nodes that make up the resource pools consumed by applications and services. There are two main options for control plane placement:

(A) Deploy fully fledged cluster(s), complete with control plane (aka master) nodes and worker nodes, everywhere you need your applications to be accessible.

(B) Deploy only worker nodes at the edge and connect them to the central location hosting the control plane.

Figure-1. Option-(A) Full Cluster Model

Option-(A) can be simplified with innovative deployment models:

Compact High Availability (HA) cluster with a minimum of three nodes accommodating both control plane and worker node roles.

All-in-one, single-node standalone cluster.

The overhead of having a dedicated control plane exists in both compact deployment models.

Option-(B) eliminates the overhead of having a dedicated control plane at each location, but it may not be feasible if there is significant latency or insufficient bandwidth between the k8s control plane and the worker locations for cluster-internal services and operations.

Figure-2. Option-(B) Remote Worker Approach

If the network connectivity between the core cluster hosting the k8s control plane and the remote worker nodes meets performance requirements, for example if the latency stays well within the k8s node-status-update-frequency interval, remote worker nodes (RWNs) can be used to cost-optimize the distributed application platform solution. We refer to this approach as the “grid-platform” (aka grid8s), where the central site performs control and management tasks while remote sites deliver a platform with consumable resources.
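To make that concrete, the relevant timer lives in the kubelet configuration, with a counterpart grace period on the kube-controller-manager (node-monitor-grace-period, 40s by default upstream). Below is a minimal sketch showing the upstream defaults at the time of writing; these are illustrative values, not tuning advice:

```yaml
# KubeletConfiguration fragment; values shown are upstream defaults
# (a sketch for illustration, not a tuning recommendation).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
nodeStatusUpdateFrequency: 10s  # how often the kubelet posts node status to the control plane
nodeStatusReportFrequency: 5m   # full status report interval when nothing has changed
```

If round-trip latency or link outages between an RWN and the control plane regularly approach these windows, the node flips to NotReady and its workloads may be evicted.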

Grid8s Overview

While we make the application platform available wherever it is needed, we must secure the traffic between applications hosted on the k8s cluster and also ensure that break-out traffic to/from consumers is optimally placed for performance, low cost, and a secure communication path.
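On the in-cluster side, standard k8s NetworkPolicy objects are one way to constrain which workloads may talk to each other. A minimal sketch, assuming a hypothetical tenant-a namespace whose namespaces carry a tenant: tenant-a label:

```yaml
# Only pods from namespaces of the same (hypothetical) tenant may reach
# workloads in tenant-a; all other ingress is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-a-isolation
  namespace: tenant-a
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tenant: tenant-a   # hypothetical tenant label
```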

Figure-3. High-level view of remote worker nodes

Central k8s clusters are deployed in selected geo-locations where they can serve nearby end consumers. The remote workers expand the reach of the cluster to remote sites without affecting the integrity of the cluster control plane, maintaining its high availability and scalability.
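To keep scheduling explicit in such a topology, a common pattern is to taint the remote workers so that only workloads which opt in land at the edge. The label and taint keys below are our own hypothetical examples, shown as a manifest fragment for illustration (in practice they would be applied with kubectl or via kubelet registration options):

```yaml
# Remote worker node fragment: a topology label for placement plus a taint
# that keeps default workloads off the remote site (hypothetical keys).
apiVersion: v1
kind: Node
metadata:
  name: rwn-site-01
  labels:
    topology.kubernetes.io/region: site-01  # well-known topology label
    example.com/node-role: remote-worker    # hypothetical role label
spec:
  taints:
  - key: example.com/remote-worker
    effect: NoSchedule
```

A workload destined for the edge then carries a matching toleration and node selector.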

Figure-4. Solution Topology : Central Cluster Expanded with RWN

In the distributed deployment model we have to remember that remote workers need access to the relevant cluster-internal communications in order to be monitored and managed by the cluster control plane and to be available for workload placement via the cluster scheduler. Another key cluster facility the remote workers need to participate in is the cluster domain name service (cluster-dns), hosted on the control plane nodes; it also enables the service discovery feature of a service mesh across the whole cluster.
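In practice this means each remote kubelet must be pointed at the cluster DNS service, which stays reachable over the cluster network. A sketch, assuming the conventional CoreDNS/kube-dns service IP for a 10.96.0.0/12 service CIDR (adjust to your cluster):

```yaml
# KubeletConfiguration fragment for a remote worker; the DNS service IP
# is an assumption based on the common default service CIDR.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- 10.96.0.10                  # ClusterIP of the cluster DNS service
clusterDomain: cluster.local
```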

Networking Under The Hood

Networking is the key functionality in every distributed computing solution, and hence it is a critical part of k8s clusters. Central cluster nodes (master/worker) share similar primary networking configurations, including network interface definitions, network bridges, routes, DNS server settings, and so on, because they run on the same infrastructure. The remote workers, however, are expected to be deployed on different infrastructures and would therefore normally have site-specific networking configurations.
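One way to express such per-site configuration declaratively is the kubernetes-nmstate operator and its NodeNetworkConfigurationPolicy resource (the API version may vary by operator release). The sketch below targets only the remote workers via the hypothetical role label introduced earlier; the interface name and addressing are made up for illustration:

```yaml
# Hypothetical site-specific uplink configuration applied only to remote workers.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: rwn-site-uplink
spec:
  nodeSelector:
    example.com/node-role: remote-worker  # hypothetical label from earlier
  desiredState:
    interfaces:
    - name: eth1                          # made-up interface name
      type: ethernet
      state: up
      ipv4:
        enabled: true
        address:
        - ip: 192.168.10.5
          prefix-length: 24
```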

Figure-5. Network Configuration: Central Worker Node vs Remote Worker Node

The Open Virtual Network (OVN) fabric with IPsec has been gaining popularity within the k8s community as a cluster networking solution. It also offers IPsec egress that can be assigned to tenant namespaces on desired worker nodes via node labeling (i.e., traffic break-out on premises via a remote worker node).
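In OVN-Kubernetes terms, a node advertises that it may host egress IP addresses through a well-known label; a minimal fragment (the node name is hypothetical):

```yaml
# Mark a remote worker as eligible to host egress IPs (OVN-Kubernetes label).
apiVersion: v1
kind: Node
metadata:
  name: rwn-site-01
  labels:
    k8s.ovn.org/egress-assignable: ""
```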

The RWN approach should mainly be considered for long-living deployments, where a short-term loss of the control plane would not cause any critical service outages. The distance between remote workers and the control plane nodes still needs to be within a latency range where keepalive timers will not time out, so that RWNs are not marked unhealthy/unreachable by the control plane.

Figure-6. Provision RWN with IPSec Egress for Specific Tenant

OVN-IPsec cluster networking allows cluster traffic (N/S and E/W) to exit the cluster at the desired location via remote worker nodes performing the network break-out. This can be configured per tenant: a namespace label selector picks which tenant's traffic is affected, while a node label determines exactly which remote worker node the traffic exits through.
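With OVN-Kubernetes, this pairing is typically expressed through an EgressIP object: the namespaceSelector picks the tenant, and the address is hosted by one of the egress-assignable nodes labeled above. A sketch, with a hypothetical tenant label and address:

```yaml
# Traffic from namespaces matching the selector leaves the cluster via an
# egress-assignable node holding this IP (label and address are hypothetical).
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: tenant-a-egress
spec:
  egressIPs:
  - 192.168.10.100           # must be routable on the break-out site's network
  namespaceSelector:
    matchLabels:
      tenant: tenant-a       # hypothetical tenant label
```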

Figure-7. Workload Egress Traffic with IPSec Egress NodeIP

Allowing network break-out on remote worker nodes enables secure, low-latency access to consumers and backend systems.

Conclusion

Telecommunications and media solutions are widely distributed systems spanning multiple geo-locations, which allows them to reach a greater consumer base, be it human subscribers or machine-to-machine systems.

K8s, with its origins in the enterprise data center, was not intended for deployment across distributed locations, but that doesn't mean it can't grow and adjust. In this article we have reviewed possible solutions for expanding the scale of a k8s cluster while constraining the failure domain.

We have shown the details of distributed k8s cluster networking and discussed how it allows building k8s clusters that provide reliable, low-latency access across wide geographic areas, which can be of significant value for many modern services, including 5G.

In the upcoming episodes we will evaluate how the blueprint discussed above can be applied to an open source 5G core deployment. We will also inspect the key attributes of being truly cloud native versus faking the cloud promise.
