Episode-III Meshville

Published in Open 5G HyperCore · Aug 20, 2021

Authors: Doug Smith (Principal SW Engineer), Fatih Nar (Chief Architect), Vikas Grover (Principal Solution Architect)

Introduction

Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. In control theory, the observability and controllability of linear systems are mathematical duals; the two properties are closely related. In Kubernetes (aka K8s), the basic facilities of the platform control system can help you determine certain states (readiness, healthiness, etc.) of your workloads and act on collected data to drive a workload toward its declared desired state; however, these facilities are limited and narrowly focused.

To achieve a higher level of deterministic and declarative state control and management, we need a higher volume, variety, and velocity of data representing the workload and platform across their lifecycle, feeding a better analytical engine that can estimate or derive different state characteristics of both. In K8s, workload and platform data can take the form of logs, metrics, traces, and so on. This data can then be correlated with external outputs, exploiting the duality above (i.e., how well the workloads are actually performing with respect to their expected, declared state).

To implement better observability and workload management for telecom using a service mesh, the coming sections delve into the finer details of Kubernetes networking, including pods, namespaces, processes, and the use of iptables, and explore what that means for telco workloads in terms of observability and management.

Background

A K8s Service is an abstraction that allows different applications to interact with each other without tight coupling. Application-A can talk to Application-B through Application-B's Service, where the Service represents the interaction boundary. In the usual K8s workload blueprint, multiple services talk to each other and together deliver the "complete solution" to the end consumer, which can be a person or another system (i.e., machine-to-machine communication). Determining how well this grid of services is performing, as well as managing it (for example, steering its traffic), is what the concept called a "Service Mesh" aims to deliver.
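As a minimal illustration (all names here are hypothetical, not from our lab), Application-B could expose itself through a Service like the one below; Application-A then only needs the stable DNS name app-b.demo.svc.cluster.local rather than any pod IPs:

```yaml
# Hypothetical Service fronting Application-B; consumers address the
# Service name while K8s load-balances across the matching pods.
apiVersion: v1
kind: Service
metadata:
  name: app-b
  namespace: demo
spec:
  selector:
    app: app-b          # selects pods labeled app=app-b
  ports:
    - name: http
      port: 8080        # port exposed by the Service
      targetPort: 8080  # port the Application-B container listens on
```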

Although there are various service mesh solutions available, some proprietary (i.e., vendor lock-in) and some (still or formerly) powered by open source communities, the design approach is pretty much the same across them: a broker/proxy (i.e., mesh enabler) component accompanies the workload so that all necessary data can be generated and collected in real time, independently and individually for each workload.

With the service mesh enabled, you have two containers in the same pod: your application container and the sidecar container. Pods in Kubernetes are built around a "sandbox", a container that holds the Linux namespaces and may contain multiple, logically grouped containers so that they can share infrastructure, for example networking.

Figure-1 Containers vs System Constructs

The containers inside a pod have separate mount (mnt) namespaces (each needs its own root filesystem), but they share the other pod-level namespaces. In particular, they share the same network namespace, so the pod presents the same IP address to all containers inside it (and, when process namespace sharing is enabled, they can even see each other's processes). On top of this, the Istio service mesh sets up iptables rules for dynamic traffic interception within that shared network namespace (details are shared in the sections below).
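A quick way to see the shared network namespace in action (a minimal sketch with placeholder images, not part of the original test setup): both containers below live in one pod, and the second reaches the first over localhost, exactly the way an Istio sidecar reaches the application container it fronts:

```yaml
# Two containers, one pod: they share the pod's network namespace,
# so "localhost" points at the same network stack for both.
apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo
spec:
  containers:
    - name: web
      image: nginx:1.21                 # listens on port 80 inside the pod
    - name: probe
      image: curlimages/curl:7.78.0
      command: ["sh", "-c", "sleep 5; curl -s http://localhost:80; sleep 3600"]
```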

Figure-2 Basic Block of Service Mesh

In the original K8s network fabric design, workloads are expected to interact with each other over the network fabric via a single network interface. However, the inherent complexity of telecom solution stacks defined by ETSI, 3GPP, etc. demands multiple interfaces into and out of each defined solution component (i.e., container/pod); hence Multus CNI was developed to plumb multiple interfaces into K8s pods. Notably, with Multus CNI and the NetworkAttachmentDefinition specification, the original network interface (typically eth0) remains in place as "the default network", which provides the pod-to-pod connectivity required by the Kubernetes specification itself.

The Container Network Interface (CNI) specification provides a pluggable application programming interface (API) to configure network interfaces in Linux containers. Multus CNI is such a plug-in, referred to as a meta plug-in: a CNI plug-in that can run other CNI plug-ins. It works like a wrapper that calls other CNI plug-ins to attach multiple network interfaces to pods in K8s. Multus CNI is the reference implementation for the Network Plumbing Working Group, which defines the specification for the NetworkAttachmentDefinition, a Kubernetes Custom Resource used to express the intent of which networks pods should be attached to.
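For context, a NetworkAttachmentDefinition is essentially a named wrapper around an ordinary CNI configuration. A hedged example (plug-in choice, interface name, and subnet are illustrative, not taken from our lab) using the macvlan plug-in could look like this:

```yaml
# Hypothetical NetworkAttachmentDefinition: Multus hands the embedded
# CNI config to the macvlan plug-in to create an additional pod interface.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-net1
  namespace: demo
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens3",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.10.0/24"
      }
    }
```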

Figure-3 Multus CNI

Networking Under The Hood

To leverage multiple networks with your containers (sitting in the same pod), you first need to plan, design, and then implement your networking fabric. You then refer to those network parameters inside the K8s network attachment definitions.

Figure-4 Multus Logical Implementation Topology. [Credits: Link]

As a tenant on a K8s cluster, you can create and own your network attachment definition(s) within your tenant namespace, or you can refer to a common network attachment definition defined under the "default" and/or "openshift-multus" namespace, which is owned by the cluster administrator.
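Attaching a pod to such a definition is done with a single annotation. A minimal, hypothetical sketch that references one definition in the pod's own namespace and one owned by the cluster administrator (using the namespace/name form) might look like this:

```yaml
# Hypothetical pod requesting two extra Multus-managed interfaces
# in addition to the default eth0 provided by the cluster network.
apiVersion: v1
kind: Pod
metadata:
  name: multi-net-app
  namespace: demo
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-net1, openshift-multus/shared-net
spec:
  containers:
    - name: app
      image: registry.example.com/telco-app:latest   # placeholder image
```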

Figure-5 Sample Multus Network Attachment Definition(s)

In our tests, some snapshots of which are shared below, we used network attachment definitions from the openshift-multus namespace.

Figure-6 Network Namespaces vs Interfaces vs Packet Routing.[Ref:Link]
Figure-7 Detailed Packet Logical Flow Diagram with Multus & Istio Sidecar. [Credits: Link]

Key Findings

What we have discovered so far, and what it translates to for the K8s tenant workload experience, is as follows:

  1. Any traffic on any interface within the pod's network namespace is subject to packet processing by the same iptables rules in that particular pod network namespace.
  2. Any incoming TCP traffic, from any interface, ends up going to the service mesh sidecar container, then comes back and proceeds to the application container; return traffic follows the same path in reverse (see the sketch after this list).
  3. Any incoming non-TCP traffic (e.g., SCTP), from any interface, goes directly to the application container.
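To make findings 2 and 3 concrete, here is a hedged pod sketch (image and port numbers are illustrative): Istio's init container programs iptables rules that match TCP only, so the TCP port below would be redirected through the sidecar, while the SCTP port would reach the application container directly:

```yaml
# Hypothetical telco-style pod: the TCP port is intercepted by the
# sidecar's iptables redirection, the SCTP port bypasses the sidecar.
apiVersion: v1
kind: Pod
metadata:
  name: amf-sim
  namespace: demo
  labels:
    app: amf-sim
spec:
  containers:
    - name: amf
      image: registry.example.com/amf-sim:latest   # placeholder image
      ports:
        - name: sbi
          containerPort: 8080    # TCP -> redirected to the istio-proxy sidecar
          protocol: TCP
        - name: ngap
          containerPort: 38412   # SCTP -> delivered straight to this container
          protocol: SCTP
```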

Closure

Pros

  • With service mesh and Multus combined, the service mesh sidecar container intercepts ALL inbound and outbound TCP traffic from/to ALL pod interfaces, providing observability (i.e., insights) and management abilities (e.g., steering) for workload traffic.
  • The main application container does not need to do anything extra to get the benefits of the service mesh, even with multiple interfaces delivered by Multus CNI.

Cons

  • The service mesh sidecar container receives all TCP traffic even though it can only make use of certain application protocols for observability and management; e.g., Diameter over TCP will cross through the sidecar container without any benefit, as Diameter is not supported by Istio.
  • Sidecar container injection adds latency to the round-trip time (RTT) of a packet. This might be insignificant for telco control plane traffic; however, user plane traffic may notice such latency overhead as degraded user experience, especially in real-time communication services.

Thoughts/Prayers

  • [New Development Idea] The pod network namespace iptables rules installed by the service mesh could be enriched to exclude certain network interfaces based on source/destination address filtering; this might help Multus CNI offer isolated user plane traffic via its CNI plug-ins (see the annotation sketch after this list).
Figure-8 Possible Approach for Multus CNI Plugin Driven Add-On Interface Traffic Isolation
  • [Configuration Management] Pods that carry telco user plane traffic can disable the service mesh altogether, i.e., by setting sidecar.istio.io/inject: "false" as a pod template annotation in the deployment. We know this is kind of a "Nuke 'em All" approach, but hey, it is a solution :-).
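Both ideas map onto existing Istio pod annotations today (exact annotation support varies by Istio version, so treat this as a hedged sketch rather than a verified recipe): selected traffic can be carved out of the sidecar's iptables redirection per port or IP range, or injection can be switched off entirely:

```yaml
# Hypothetical pod template metadata illustrating the two options above.
apiVersion: v1
kind: Pod
metadata:
  name: upf-sim
  namespace: demo
  annotations:
    # Option 1: keep the sidecar but exclude selected traffic from redirection
    traffic.sidecar.istio.io/excludeInboundPorts: "3868"                 # e.g. Diameter/TCP, which the sidecar cannot interpret anyway
    traffic.sidecar.istio.io/excludeOutboundIPRanges: "192.168.10.0/24"  # e.g. a Multus add-on network
    # Option 2 ("Nuke 'em All"): skip sidecar injection for this pod entirely
    # sidecar.istio.io/inject: "false"
spec:
  containers:
    - name: upf
      image: registry.example.com/upf-sim:latest   # placeholder image
```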
