East-West (service-to-service) Communication — What is Service Mesh? And why do we need it over Kubernetes?

As per the CNCF survey published in May 2022, service meshes are on the rise, but greater understanding is required. This write-up attempts to clarify how service meshes support interservice communication and why they are so desired.

Abhinav Kapoor
CodeX
5 min read · Jul 31, 2022


A service mesh (Linkerd) dashboard showing service calls

In my previous article “East-West (service-to-service) Communication in Kubernetes — How do services communicate within a cluster?” I wrote about Kubernetes' native support for service-to-service communication using service discovery, durable abstraction of pods & basic load balancing. While it may serve the purpose for some simpler workloads, it may not be a desirable solution for many others where security, performance & availability are crucial. Therefore, many organisations either build their own platforms or go for an off-the-shelf product.

As cloud computing leader Kelsey Hightower puts it — “Kubernetes is a platform for building platforms. It’s a better place to start; not the endgame.”

So let’s see how a service mesh makes it the endgame for service-to-service communication.

What is Service Mesh?

As the name suggests, it’s a network/mesh for the services. It makes the network smarter by taking away the connectivity concerns from services. Thus the services can focus on their main business without bothering about concerns such as traffic management, security in transit, resilience and observability. Consider it a programmable network.

In terms of layers, it can be thought of as sitting between the network layer and the application programming layer.

Programmable network — layered placement of service mesh

How does a Service Mesh work?

A service mesh works over distributed proxies (as opposed to an edge proxy or a central proxy), where a proxy runs alongside each instance of an application/service (or microservice). The purpose of this proxy is to intercept all traffic entering and leaving the application/service. If all communication between the services takes place via such proxies, then all of it is observable and actionable.

Service Mesh Proxies intercept traffic to & from services to make it secure, observable & actionable.

In a containerised world (such as Kubernetes), each instance of a service runs inside its own container, and the container itself runs inside a Pod. The service mesh injects its proxy container (called a sidecar proxy) alongside the service container in the same Pod. This process is called meshing a Pod.
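As a hedged sketch of what meshing a Pod can look like in practice with Linkerd (the deployment name and image below are placeholders), a single annotation on the Pod template asks the mesh’s admission webhook to inject the sidecar proxy automatically:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders                         # hypothetical service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled     # ask Linkerd to inject its sidecar proxy
      labels:
        app: orders
    spec:
      containers:
      - name: orders
        image: example/orders:1.0      # placeholder image
```

The same annotation can also be placed on a namespace to mesh every Pod created in it, which is how whole workloads are typically onboarded.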

A meshed Pod — Service Mesh sidecar proxy acting as an ambassador intercepting incoming and outgoing traffic.

The sidecar proxy is co-located, co-managed and has the same lifetime as the service itself. Yet it gives rise to an elegant “out-of-process” architecture: because the proxy runs outside the service’s process, it is transparent to the service. This also enables the proxy to be implemented in a completely different technology from the service (for example, Envoy, the proxy used by AWS App Mesh and Istio, is implemented in C++, while Linkerd’s micro-proxy is implemented in Rust).

If you are interested in “In-process” vs “Out-of-process” design, I addressed it in an earlier article here.

The distributed proxies make up the data plane of a service mesh, and governance is provided by the control plane.

Why do we need it?

A service mesh decouples traffic management from Kubernetes: its proxies manage inter-service traffic, security and observability, providing an abstraction closer to the service/application layer.
  1. Security — In certain business domains (such as banking) it’s a security obligation that both the source and the destination service can validate each other’s identity (even when they are in the same cluster) while keeping the communication encrypted. A service mesh addresses such requirements out of the box with mutual TLS and service-to-service access-control policies.
  2. Performance — More network hops don’t have to mean more latency. Meshed Pods at the source and destination introduce two additional hops into Pod-to-Pod communication. But contrary to the expectation of slowing things down, experiments show that a service mesh with an intelligent, latency-aware load-balancing algorithm can actually be faster than Kubernetes’ simple round-robin load balancing. Here is one such experiment: https://linkerd.io/2017/02/01/making-things-faster-by-adding-more-steps/.
  3. Audits and Monitoring — As all traffic flows through the sidecar proxies, all of it is observable. A service mesh can deliver these metrics to dashboards to show real-time and historical behaviour.
  4. Fine-grained traffic management — In an agile microservice world with frequent deployments and canary releases, a service mesh enables precise, gradual traffic shifting from an old version of a service to a new one.
  5. Multi-cluster & hybrid-cluster — With a service mesh, we still need to stitch the networks together; but once the network layer is connected, the service mesh abstracts the clusters away from the services (a topic I plan to address in future write-ups).
  6. Resilient communication — A service mesh comes with retry policies which can overcome transient faults and keep distributed services working.
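Points 4 and 6 can be made concrete with a short, hedged sketch. Assuming an Istio mesh and a hypothetical `orders` service with `v1`/`v2` subsets (which would be defined in a companion DestinationRule), a VirtualService can shift a small share of traffic to a canary and retry transient failures:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders
spec:
  hosts:
  - orders                  # hypothetical in-cluster service name
  http:
  - retries:
      attempts: 3           # resilience: retry transient failures
      perTryTimeout: 2s
    route:
    - destination:
        host: orders
        subset: v1          # current version keeps 90% of traffic
      weight: 90
    - destination:
        host: orders
        subset: v2          # canary version receives 10%
      weight: 10
```

Raising the canary’s weight step by step (10 → 50 → 100) is how gradual traffic shifting is typically rolled out.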

Considerations

More memory & CPU utilisation — Each sidecar proxy needs resources. While the footprint of one sidecar may be small, due consideration is needed when thousands of proxies are expected to run. This is also an apt point at which to weigh one service mesh product against another, considering the needs, future growth and non-functional requirements. Also, certain events may trigger temporary spikes in the resource utilisation of each sidecar proxy.

The Envoy proxy uses 0.35 vCPU and 40 MB memory per 1000 requests per second going through the proxy. The Envoy proxy adds 2.65 ms to the 90th percentile latency. https://istio.io/latest/docs/ops/deployment/performance-and-scalability/
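To keep that overhead bounded, the sidecar’s resources can be capped per Pod. As an illustrative sketch with Istio (the values are assumptions, roughly in line with the figures quoted above), annotations on the Pod template override the proxy’s default resource settings:

```yaml
# Fragment of a Deployment's Pod template (illustrative values)
template:
  metadata:
    annotations:
      sidecar.istio.io/proxyCPU: "100m"          # CPU request for the sidecar
      sidecar.istio.io/proxyMemory: "128Mi"      # memory request
      sidecar.istio.io/proxyCPULimit: "500m"     # cap temporary spikes
      sidecar.istio.io/proxyMemoryLimit: "256Mi"
```

Setting limits as well as requests helps contain the temporary spikes mentioned above at the cost of possible throttling under load.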

The other considerations for choosing one service mesh product over another can be the need for spanning multiple clusters, hybrid environments, and managed service offerings.

Summary

Kubernetes provisions resources and manages the life cycle of services (or microservices), whereas a service mesh governs the communication between services. Service mesh is not a Kubernetes-specific technology.

It’s worth mentioning that, as per the responses to some of the questions at KubeCon 2020, Kubernetes is likely to get richer L7 features, somewhat overlapping the feature set of a service mesh. Therefore it’s good to keep an eye on the evolution of Kubernetes. But existing implementations such as Istio or Linkerd will remain valid even then.

The CNCF Survey link is here.
