Redefining service mesh with Cilium
You might’ve heard about Cilium in the CNI world before. You’ll also find it sitting among other CNIs in the CNCF landscape.
But there’s more to it now. It has evolved into a service mesh, just a slightly different one from those you’re used to seeing. Before we start, if you are totally new to service meshes, I’d recommend reading my earlier post, “Untangling the service mesh”, to understand what a basic service mesh is.
So what’s different this time?
We are used to service meshes that run a sidecar proxy container next to the application inside each pod and proxy all incoming and outgoing traffic. The sidecar handles TLS, service discovery, retries, load balancing, and so on. All in all, it makes our lives a lot easier, since we don’t need to manage all this logic inside our application code. And because sidecars are containerized, you can write your application in any language, independent of the language the sidecar is written in.
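To make the sidecar’s job concrete, here is a minimal Python sketch of two of those responsibilities, round-robin load balancing and retries, arranged so the application only ever calls `forward()`. `SidecarProxy` and the `send` callback are hypothetical names for illustration; real sidecars such as Envoy do far more (timeouts, backoff, health checking, TLS).

```python
import itertools
from typing import Callable, Optional, Sequence


class SidecarProxy:
    """Hypothetical, simplified model of logic a sidecar takes off the app's hands."""

    def __init__(self, endpoints: Sequence[str], max_attempts: int = 3) -> None:
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._ring = itertools.cycle(endpoints)  # round-robin over backends
        self.max_attempts = max_attempts

    def forward(self, request: str, send: Callable[[str, str], str]) -> str:
        """Send the request to the next backend, retrying on connection errors."""
        last_error: Optional[Exception] = None
        for _ in range(self.max_attempts):
            endpoint = next(self._ring)
            try:
                return send(endpoint, request)
            except ConnectionError as err:
                last_error = err  # move on to the next backend
        raise RuntimeError(f"all {self.max_attempts} attempts failed") from last_error


if __name__ == "__main__":
    def send(endpoint: str, request: str) -> str:
        # Simulated network call: the first backend is down.
        if endpoint == "10.0.0.1:8080":
            raise ConnectionError("connection refused")
        return f"{endpoint} handled {request}"

    proxy = SidecarProxy(["10.0.0.1:8080", "10.0.0.2:8080"])
    print(proxy.forward("GET /", send))  # 10.0.0.2:8080 handled GET /
```

The application code never sees the failed first attempt; that is exactly the kind of logic the mesh moves out of your codebase.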
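So, if we look at how packets travel here, each request leaves the application container, passes through the kernel’s network stack to the sidecar proxy in the same pod, and then traverses the stack again before it reaches the network interface.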
Here, we decoupled this code from the application and put it in a sidecar. This decoupling changed our lives. So, can it be decoupled further? Can it be placed in the kernel?
You’d save resources, since you wouldn’t have to run it per pod but rather per node. You’d also get lower latency and less complexity, since traffic would only pass through the kernel and a single shared proxy on the node, with no per-pod proxy hop.
And if we don’t need Layer 7 information for some traffic, we can skip the proxy hop entirely and go straight from the kernel to the network interface.
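The difference between these paths can be summarized in a small model. This is a deliberately simplified Python sketch: the hop names, the `Cluster` type, and the idea of counting proxy instances are illustrative assumptions, not Cilium’s internals. It captures two points from the text: a per-node proxy scales with the number of nodes rather than pods, and traffic that needs no Layer 7 processing never visits a proxy at all.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    nodes: int
    pods: int


def proxies_needed(cluster: Cluster, model: str) -> int:
    """How many proxy instances each model runs (simplified)."""
    if model == "sidecar":
        return cluster.pods   # one proxy container per pod
    if model == "per-node":
        return cluster.nodes  # one shared proxy per node
    raise ValueError(f"unknown model: {model}")


def egress_path(model: str, needs_l7: bool) -> list[str]:
    """Hops an outgoing request takes before it leaves the node (simplified)."""
    if model == "sidecar":
        # Every request is redirected through the pod's sidecar, L7 or not.
        return ["app", "kernel", "sidecar proxy", "kernel", "network interface"]
    if model == "per-node":
        if needs_l7:
            return ["app", "kernel (eBPF)", "node proxy", "kernel (eBPF)", "network interface"]
        # No L7 context needed: eBPF forwards the packet straight out.
        return ["app", "kernel (eBPF)", "network interface"]
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    cluster = Cluster(nodes=3, pods=60)  # hypothetical cluster size
    print(proxies_needed(cluster, "sidecar"), proxies_needed(cluster, "per-node"))  # 60 3
    print(" -> ".join(egress_path("per-node", needs_l7=False)))
```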
By removing the complexity of a standard sidecar-based service mesh, you can see significant latency and throughput improvements, and published benchmarks back this up.
That sounds amazing. Why didn’t we do this all along?
Well, we actually came close with kube-proxy. It programs iptables rules in the Linux kernel and, while not exactly a service mesh, behaved somewhat like one by load-balancing traffic to Kubernetes Services with traditional packet-level networking.
What we lacked until now was Layer 7 context, since kube-proxy operates exclusively at the network packet level (Layer 3/4). It couldn’t satisfy modern application needs like application-layer traffic management, tracing, and authentication.
Cilium brought the solution to this through eBPF.
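To see what “packet level” means in practice, here is a Python sketch of how kube-proxy’s iptables mode spreads connections across a Service’s endpoints. It generates a chain of rules using the `statistic` match in random mode: the first of n rules matches with probability 1/n, the next with 1/(n-1), and so on, and the last rule always matches, which works out to an even 1/n share per endpoint. The Service IP, endpoints, and function names below are hypothetical. Note that the decision only ever looks at the destination IP and port; there is no HTTP method, path, or header to inspect.

```python
from __future__ import annotations

import random
from fractions import Fraction

# Hypothetical Service VIP -> endpoints table.
SERVICES: dict[tuple[str, int], list[str]] = {
    ("10.96.0.10", 80): ["10.244.1.5:8080", "10.244.2.7:8080", "10.244.3.9:8080"],
}


def rule_probabilities(n: int) -> list[Fraction]:
    """Match probability of each rule in the chain: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(n: int) -> list[Fraction]:
    """Overall probability that each endpoint is chosen when rules are tried in order."""
    shares, remaining = [], Fraction(1)
    for p in rule_probabilities(n):
        shares.append(remaining * p)  # reached this rule and it matched
        remaining *= 1 - p            # reached this rule and it did not match
    return shares


def dnat(dst_ip: str, dst_port: int, rng: random.Random) -> str:
    """Pick a backend using only L3/L4 data: destination IP and port."""
    endpoints = SERVICES[(dst_ip, dst_port)]
    for endpoint, p in zip(endpoints, rule_probabilities(len(endpoints))):
        if rng.random() < p:
            return endpoint
    return endpoints[-1]  # not reached in practice: the last probability is 1


if __name__ == "__main__":
    print(effective_shares(3))  # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
    print(dnat("10.96.0.10", 80, random.Random(42)))
```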
What’s this eBPF?
Historically, the operating system has always been an ideal place to implement observability, security, and networking functionality due to the kernel’s privileged ability to oversee and control the entire system.
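eBPF can safely and efficiently run sandboxed programs in a privileged context such as the operating system kernel. It extends the capabilities of the kernel without requiring changes to kernel source code or loading kernel modules. Every program is checked by an in-kernel verifier before it runs, which is what makes this safe, and because programs don’t depend on custom kernel builds, the approach is portable too.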
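eBPF’s safety comes from the in-kernel verifier, which checks every program before the kernel agrees to load it. The Python sketch below is a toy illustration of that idea, not the real verifier: `Insn`, `toy_verify`, and the four-opcode instruction set are made up. It enforces two rules in the spirit of the classic verifier: jumps must land inside the program, and jumps may only go forward (older kernels rejected all back-edges; bounded loops have been allowed since Linux 5.3). Together with requiring a final `exit`, that guarantees the toy program terminates. The real verifier also tracks register types, memory bounds, and much more.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    op: str          # toy opcodes: "mov", "add", "jmp", "exit"
    offset: int = 0  # for "jmp": target = index of next instruction + offset


def toy_verify(program: list[Insn]) -> None:
    """Raise ValueError unless the toy program is guaranteed to terminate."""
    if not program or program[-1].op != "exit":
        raise ValueError("control flow could fall off the end of the program")
    for pc, insn in enumerate(program):
        if insn.op != "jmp":
            continue
        target = pc + 1 + insn.offset
        if not 0 <= target < len(program):
            raise ValueError(f"insn {pc}: jump to {target} is out of range")
        if target <= pc:
            raise ValueError(f"insn {pc}: back-edge to {target} could loop forever")


if __name__ == "__main__":
    ok = [Insn("mov"), Insn("jmp", 1), Insn("add"), Insn("exit")]
    toy_verify(ok)  # accepted: the jump skips forward to "exit"
    try:
        toy_verify([Insn("mov"), Insn("jmp", -2), Insn("exit")])
    except ValueError as err:
        print("rejected:", err)  # rejected: insn 1: back-edge to 0 could loop forever
```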
Got it, so how does this setup help?
With this setup, you get benefits like:
- Robust and secure load balancing that leverages BGP, XDP, and eBPF
- Simple, high-performance cross-cluster connectivity
- Distributed, identity-aware observability through Cilium’s Hubble framework. It provides Prometheus-compatible metrics for L3/L4 and L7 network flow data and exposes context such as which application in a pod made a given connection
- Support for almost any network protocol
- Faster pod startup time, since no additional sidecar needs to run
- Transparent encryption that uses the efficient IPsec implementation built into the Linux kernel to automatically encrypt communication between all workloads within, or between, Kubernetes clusters
- Policies that go beyond basic Kubernetes Network Policy: DNS-aware policies (e.g., allow traffic to *.google.com) and application-aware policies (e.g., allow HTTP GET /foo)
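The DNS-aware and application-aware policies above are easier to picture with a small evaluator. This Python sketch is a simplified, hypothetical model, not Cilium’s policy engine or its policy format: `fqdn_matches`, `HttpRule`, and `http_allowed` are made-up names. In it, `*` matches one run of letters, digits, or hyphens and never crosses a dot, and an HTTP rule’s path is treated as a regular expression that must match the whole request path. Cilium documents its own exact matching semantics in its `toFQDNs` and L7 policy reference, which may differ in the details.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def fqdn_matches(pattern: str, name: str) -> bool:
    """Simplified DNS wildcard match: '*' covers letters, digits, hyphens, never '.'."""
    regex = "".join("[a-z0-9-]+" if ch == "*" else re.escape(ch) for ch in pattern.lower())
    return re.fullmatch(regex, name.lower().rstrip(".")) is not None


@dataclass(frozen=True)
class HttpRule:
    method: str  # e.g. "GET"
    path: str    # regular expression matched against the full request path


def http_allowed(rules: list[HttpRule], method: str, path: str) -> bool:
    """Allow the request only if some rule matches both its method and path."""
    return any(r.method == method and re.fullmatch(r.path, path) is not None for r in rules)


if __name__ == "__main__":
    print(fqdn_matches("*.google.com", "www.google.com"))  # True
    print(fqdn_matches("*.google.com", "google.com"))      # False
    rules = [HttpRule("GET", "/foo")]
    print(http_allowed(rules, "GET", "/foo"))   # True
    print(http_allowed(rules, "POST", "/foo"))  # False
```

A plain Kubernetes NetworkPolicy can only express the equivalent of “allow this pod to reach that IP and port”; both checks here need information (a DNS name, an HTTP method and path) that only exists above Layer 4.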
Cilium is a new way of looking at service mesh, and it comes with plenty of added benefits. Admittedly, it is not as mature as established players like Istio, but given its roadmap, I believe a new wave is coming.