Episode XI: Latency-Driven Service Orchestration

Fatih Nar
Published in Open 5G HyperCore · Feb 2, 2023

Authors: Fatih E. Nar, The Architect at Red Hat; Mario Vazquez Cebrian, Principal Software Engineer at Red Hat; Alberto Morgante Medina, Platform Product Owner at Inditex; Fatih Baltaci, CTO at DDosify.

1.0 Introduction

In the previous episode, we examined the value and importance of latency for service performance and user experience, and its impacts on business and the economy. Our solution partners at DDosify have also published a great article on this matter: A New Era Driven by Low Latency.

Figure-1 Challenges, Approach & Benefits of Latency-Driven Service Orchestration

In this episode, we present the engineering work done through the Red Hat and DDosify partnership on latency-driven service/application orchestration: a well-crafted technical solution that addresses the challenges and harvests the benefits noted in Figure-1.

2.0 Background

With Kubernetes (k8s) established as the de facto application platform across industries, using k8s in a cloud-native way (i.e., reasonably sized, distributed, up-to-date, ephemeral clusters) requires overarching cluster management capabilities that can perform:

  • Life cycle management (LCM) of these k8s clusters on different infrastructure types, with end-to-end visibility and control.
  • Distribution/placement and management of workloads (wherever, whenever, and however needed), with lifecycle, security, and compliance aligned to the required legal/industry specifications, from a single pane of glass.

Figure-2 Hub vs Managed K8s Cluster Topology

With the open-cluster-management project, you can address the challenges outlined above through the following capabilities:

  • Work across a range of application environments, including multiple data centers, private clouds, and public clouds that run Kubernetes clusters.
  • Easily create k8s clusters and offer lifecycle management via a single pane.
  • Enforce policies on the managed clusters using k8s custom resource definitions (CRDs).
  • Deploy applications distributed/placed across a fleet of clusters and maintain their day-two operations.

Red Hat develops and maintains these capabilities with an open-source mindset and methodology, and also packages and hardens them for enterprise needs under the Red Hat Advanced Cluster Management (RH-ACM) product umbrella.
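
To get a concrete feel for that single pane, below is a minimal Go sketch that lists the clusters registered on a hub. It assumes the standard client-go dynamic client and the open-cluster-management ManagedCluster CRD; the kubeconfig path is a placeholder, and the CRD group/version should be verified against your RH-ACM release.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client config from the hub cluster's kubeconfig (placeholder path).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/hub-kubeconfig")
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// ManagedCluster is the cluster-scoped CRD the hub serves for each fleet member.
	gvr := schema.GroupVersionResource{
		Group:    "cluster.open-cluster-management.io",
		Version:  "v1",
		Resource: "managedclusters",
	}
	list, err := dyn.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, mc := range list.Items {
		// Labels are where geo-location tags (as used later in this post) would live.
		fmt.Println(mc.GetName(), mc.GetLabels())
	}
}
```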

3.0 Solution & Testing

In our solution blueprint, we have leveraged RH-ACM and basic k8s constructs (such as label selectors) to implement latency-driven workload life cycle management (scheduling, placement, continuous monitoring, and auto-migration) across multiple managed k8s clusters.
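
To illustrate the label-selector construct at play, here is a minimal, self-contained Go sketch of how a selector picks a cluster by its location label. The ddosify key and EU.ES.* values mirror the demo labels in Section 4.0; in the actual solution, this matching happens inside RH-ACM's placement machinery rather than in standalone code.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Location labels as they might appear on the managed clusters.
	clusters := map[string]labels.Set{
		"local-cluster": {"ddosify": "EU.ES.MA"},
		"sandbox01":     {"ddosify": "EU.ES.BCN"},
	}

	// A selector a placement object could carry once the lowest-latency
	// location is known.
	sel := labels.SelectorFromSet(labels.Set{"ddosify": "EU.ES.BCN"})

	for name, set := range clusters {
		if sel.Matches(set) {
			fmt.Println("workload placed on:", name)
		}
	}
}
```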

We have created a new latency application placement operator that works with RH-ACM: it integrates with the DDosify latency API, performs latency measurements seamlessly and automatically, and uses the collected latency data for application scheduling.
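
As a rough sketch of that integration, the snippet below creates a latency measurement job over HTTP. The endpoint path, auth header, and JSON fields are hypothetical placeholders standing in for the actual DDosify latency API contract.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// latencyJob describes what to measure: a target endpoint and the
// geo-locations (origins of interest) to measure it from.
type latencyJob struct {
	Target    string   `json:"target"`
	Locations []string `json:"locations"`
}

// createLatencyJob posts a measurement job to a (hypothetical) latency API.
func createLatencyJob(apiBase, token string, job latencyJob) error {
	body, err := json.Marshal(job)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost, apiBase+"/latency/jobs", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("latency API returned %s", resp.Status)
	}
	return nil
}

func main() {
	job := latencyJob{
		Target:    "https://app.example.com", // hypothetical workload endpoint
		Locations: []string{"EU.ES.MA", "EU.ES.BCN"},
	}
	if err := createLatencyJob("https://api.example-latency.io", "TOKEN", job); err != nil {
		fmt.Println("error:", err)
	}
}
```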

Figure-3 Latency Driven Workload LCM Solution Architecture

Breakdown of latency operator workflow (Figure-3):

(1) The latency operator is installed on the RH-ACM hub cluster and configured, via location labels, with the desired points of origin interest per workload. The latency operator then creates a latency measurement job on the DDosify cloud (via the DDosify Latency API).
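
For illustration, a hypothetical Go shape for the LatencyCheck resource driving this step might look like the following; the field names here are our assumptions, not the operator's published CRD schema.

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// LatencyCheck is a sketch of the custom resource the latency operator
// reconciles on the hub (fields are illustrative assumptions).
type LatencyCheck struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   LatencyCheckSpec   `json:"spec,omitempty"`
	Status LatencyCheckStatus `json:"status,omitempty"`
}

type LatencyCheckSpec struct {
	// Workload endpoint whose latency should be measured.
	Target string `json:"target"`
	// Origins of interest (e.g., EU.ES.MA) at the desired granularity.
	Locations []string `json:"locations"`
	// How often the measurements should be refreshed, in seconds.
	IntervalSeconds int `json:"intervalSeconds"`
}

type LatencyCheckStatus struct {
	// Location -> most recent measured latency in milliseconds.
	LatencyMs map[string]float64 `json:"latencyMs,omitempty"`
}
```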

(2) The DDosify cloud performs latency measurements at the designated geo-location granularity (Global, Continent, Country, State, City).

(3) Fresh latency data is pushed/pulled from the DDosify cloud to RH-ACM, and the Latency Operator and PlacementRule lifecycle management run as described in Figure-4.

Figure-4 LatencyCheck Creation & Control Logic Flow
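
As a rough illustration of the decision step in that flow, the sketch below picks the lowest-latency cluster from the freshest measurements. The numbers reuse the demo values from Section 4.0; the real operator runs this logic inside its reconcile loop.

```go
package main

import "fmt"

// lowestLatencyCluster maps each cluster's location label to its latest
// measurement and returns the cluster with the smallest latency.
func lowestLatencyCluster(latencyMs map[string]float64, clusterLocation map[string]string) string {
	best, bestMs := "", 0.0
	for cluster, loc := range clusterLocation {
		ms, ok := latencyMs[loc]
		if !ok {
			continue // no fresh measurement for this location
		}
		if best == "" || ms < bestMs {
			best, bestMs = cluster, ms
		}
	}
	return best
}

func main() {
	latency := map[string]float64{"EU.ES.MA": 43, "EU.ES.BCN": 41}
	clusters := map[string]string{"local-cluster": "EU.ES.MA", "sandbox01": "EU.ES.BCN"}
	fmt.Println(lowestLatencyCluster(latency, clusters)) // sandbox01

	latency["EU.ES.BCN"] = 45 // the demo's injected latency overhead
	fmt.Println(lowestLatencyCluster(latency, clusters)) // local-cluster
}
```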

(4) If the PlacementRule is updated and the new destination is another cluster, RH-ACM schedules the application onto the new low-latency cluster and removes the workload from the previously selected cluster.

Figure-5 PlacementRule Update Logic Flow
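
A minimal sketch of that update step: merge-patching the PlacementRule's clusterSelector so RH-ACM retargets the workload. It assumes the PlacementRule CRD under apps.open-cluster-management.io/v1; the namespace, resource name, and kubeconfig path are placeholders.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/hub-kubeconfig")
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{
		Group:    "apps.open-cluster-management.io",
		Version:  "v1",
		Resource: "placementrules",
	}

	// Point the selector at the newly chosen lowest-latency location;
	// RH-ACM then reschedules the subscribed application accordingly.
	patch := []byte(`{"spec":{"clusterSelector":{"matchLabels":{"ddosify":"EU.ES.MA"}}}}`)
	if _, err := dyn.Resource(gvr).Namespace("demo-app").Patch(
		context.TODO(), "demo-placement", types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("placement rule retargeted to EU.ES.MA")
}
```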

As presented above, our solution blueprint covers continuous, automatic application migration based on fresh latency metrics collected from the origin locations of interest, where the application/service users/consumers reside.

4.0 Testbed & MVP Demonstration

In our testbed, we have recorded a demo in which you will see the following:

  • We have a hub cluster (name: local-cluster) and a managed cluster (name: sandbox01), labeled (label: ddosify) with the geo-location values EU.ES.MA and EU.ES.BCN, respectively, in RH-ACM.
  • At the demo application’s initial orchestration time (first deployment), the cluster in EU.ES.BCN was observed with the lowest latency (41 ms) versus EU.ES.MA (43 ms); hence, the application was deployed on the cluster tagged with the EU.ES.BCN label -> sandbox01.
  • We then enforced an override on the latency measurements on the external latency API integration side (by adding fake latency overhead; the how-to gist is here): EU.ES.BCN (cluster: sandbox01) climbed to 45 ms while EU.ES.MA (cluster: local-cluster) stayed at 43 ms. Hence, local-cluster became the lowest-latency cluster to serve nearby consumers.
  • RH-ACM autonomously initiated an application migration from sandbox01 (EU.ES.BCN) to local-cluster (EU.ES.MA).

Demo:

5.0 Conclusion

Using an overseeing platform and service orchestrator (RH-ACM) with programmable capabilities (the k8s operator framework) allowed us to implement dynamic application management driven by continuously measured latency, a key factor in user experience.

Figure-6 Impact Summary
