Kubernetes Network Security: Exploring Cilium and Istio Implementations
Within Kubernetes, common networking infrastructure and security features (such as encryption, mutual authentication, traffic control and visibility) are not implemented natively. Instead, these are delegated to the Container Network Interface (CNI) and service mesh. In this article we’ll explore the architectures and implementations of both Cilium and Istio.
(This article was originally inspired by Christian Posta’s KubeCon talk: Comparing Sidecar-Less Service Mesh from Cilium and Istio — definitely worth a watch).
CNI. Whilst Kubernetes defines networking requirements, the implementations of these are left up to the Container Network Interface (CNI) plugin — an exchangeable component chosen based on cluster requirements. Different CNIs provide varying support for security-related features such as network policies, encryption and network monitoring.
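As a concrete illustration, the plugin a cluster is running can usually be inspected on a node, since the kubelet reads its CNI configuration from a well-known directory (the exact filename below is an assumption; it varies by plugin and install):

```shell
# On a cluster node: list the CNI configuration directory.
# The kubelet invokes whichever plugin chain is defined here.
ls /etc/cni/net.d/

# Inspect the plugin chain (filename is illustrative; Cilium,
# for example, typically installs a file like 05-cilium.conflist)
cat /etc/cni/net.d/05-cilium.conflist
```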
Service Mesh. Built on top of the CNI, a service mesh implements a network infrastructure with its own control and data planes, through which traffic is centrally managed. Implementations vary, but DaemonSets and/or sidecar containers are commonly used to proxy and route traffic for each node and pod.
A service mesh aims to abstract common networking functionality away from application workloads, providing features (by default) such as:
- Mutual authentication (verifying the identity of networking endpoints)
- Encryption (for node-to-node communication)
- Request traffic control (for load-balancing, canary rollouts, locality-aware routing, header manipulation, failovers, rate-limiting, etc.)
- Authorization (verifying traffic at layers 3, 4 and 7)
- Metrics (info on traffic at layers 3, 4 and 7 such as request codes, latency, dropped packets, etc.)
- Visibility (logging intra-cluster traffic and external traffic, flow logs)
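As a sketch of what the authorization feature looks like in practice, here is an L7 policy using Istio’s AuthorizationPolicy API as one example (the workload labels, namespace, service account and paths are all illustrative):

```shell
# Allow only GET requests to /status/* on the "httpbin" workload,
# and only from the "sleep" service account (all names illustrative)
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-status-reads
  namespace: default
spec:
  selector:
    matchLabels:
      app: httpbin
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/sleep"]
    to:
    - operation:
        methods: ["GET"]
        paths: ["/status/*"]
EOF
```

Any request not matching a rule (a POST, or traffic from another identity) is denied once an ALLOW policy selects the workload.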
Relationship. Whilst a service mesh requires a CNI, the exact relationship between the two is blurry. Implementing service mesh functionality can require intruding on the responsibilities of the primary CNI, for example to route traffic within the host network namespace. Without special consideration this can lead to conflicts: a network policy meant to be enforced by the low-level CNI may break after the high-level service mesh has applied its processing.
Similar to how some clusters (particularly in managed environments such as EKS) have specific compatibility requirements between primary and chained CNIs, introducing a service mesh into your cluster also brings additional compatibility considerations.
Istio
Istio (Sidecar). One of the most common service meshes is Istio in its sidecar implementation. Here, sidecar containers are automatically injected (through an admission controller) into all workloads within the mesh. Each sidecar is responsible for proxying (TCP) traffic for its pod’s network namespace. The proxies themselves, known as the data plane, are based on Envoy — a high-performance “event and communication bus” found in many open-source projects.
# Label namespace for Istio injection
$ kubectl label namespace in-mesh istio-injection=enabled
# Run test pod
$ kubectl -n in-mesh run --image curlimages/curl \
test-pod --command -- /bin/sleep infinity
# See the Istio initContainer pods
$ kubectl -n in-mesh describe pod test-pod
[...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
[...]
Normal Created 11s kubelet Created container istio-init
Normal Started 11s kubelet Started container istio-init
[...]
Normal Created 9s kubelet Created container istio-proxy
Normal Started 8s kubelet Started container istio-proxy
# Two containers: curl and the Istio sidecar
$ kubectl -n in-mesh get pod test-pod
NAME READY STATUS RESTARTS AGE
test-pod 2/2 Running 0 2m

Advantages of Sidecars. Sidecar implementations of networking infrastructure enable a number of natural advantages: (1) because proxies are directly tied to application workload lifecycles, the infrastructure scales naturally with the applications; (2) proxies are single-tenant, which simplifies configuration and the storage of key material; (3) upgrades can be rolled out across sidecars serially, minimising downtime.
Disadvantages of Sidecars. On the other hand: (1) resource consumption can be higher than necessary due to the duplication of infrastructure across pods; (2) applications have to be restarted in tandem with proxy upgrades; (3) the co-location of sidecars and workloads means a compromise of one directly compromises the other.
Istio (Ambient). To address these drawbacks, Istio recently introduced Ambient mode as an alternative to per-pod sidecar proxies. In this mode, L4 and L7 processing are separated into two components: a per-node L4 proxy (known as the ztunnel) implements core functionality (such as authentication, mTLS and L4 authorization), whereas L7 features are offloaded to dedicated Envoy-based workloads (known as waypoint proxies).
$ kubectl get pods,daemonset -n istio-system
NAME READY STATUS RESTARTS AGE
pod/istio-cni-node-btbjf 1/1 Running 0 2m18s
pod/istiod-55b74b77bd-xggqf 1/1 Running 0 2m27s
pod/ztunnel-5m27h 1/1 Running 0 2m10s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/istio-cni-node 1 1 1 1 1 kubernetes.io/os=linux 2m18s
daemonset.apps/ztunnel 1 1 1 1 1 kubernetes.io/os=linux 2m10s

Disadvantages of Per-Node. Traditional concerns with multi-tenant per-node components (such as the ztunnel) include: (1) difficulty in determining the resources to allocate to the per-node proxy, due to the unpredictability of Kubernetes scheduling; and (2) having a centralised and sensitive component storing high-privilege credentials.
Istio addresses these issues by highlighting that, compared to sidecars or per-node L7 proxies, waypoint proxies allow computationally expensive L7 processing to be invoked only when necessary (enabled on workloads through the istio.io/use-waypoint label) and to be scaled independently of both applications and L4 features.
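Assuming a recent istioctl with ambient support, deploying a waypoint and opting a namespace’s workloads into it might look like the following sketch (the namespace and waypoint name are illustrative; `waypoint` is the default name istioctl uses):

```shell
# Deploy a waypoint proxy into the namespace
# (istioctl names it "waypoint" by default)
istioctl waypoint apply -n default

# Opt the namespace's workloads into L7 processing via that waypoint;
# without this label, traffic only receives the ztunnel's L4 features
kubectl label namespace default istio.io/use-waypoint=waypoint
```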
Additionally, the ambient mesh security deep-dive points out that separating L7 functionality into waypoint proxies minimises the attack surface of the shared ztunnel, reducing its vulnerability risk below that of Istio sidecars (which process both L4 and L7).
istio-cni. Redirecting traffic between namespaces is the responsibility of the CNI. However, the ambient mesh requires routing traffic from pod network namespaces to the namespace of the shared ztunnel. The istio-cni component is responsible for this processing without conflicting with the underlying primary CNI.
A dedicated chained CNI — extending the functionality of the existing CNI — alerts the per-node istio-cni agent on the creation of new pods. The agent reaches into the workload’s network namespace and sets up iptables rules to route traffic to the proxy. It also alerts the per-node ztunnel of the new workload and its network namespace; the ztunnel then creates sockets within the workload namespace, to which the iptables rules route traffic.
Once set up, traffic within the mesh flows as in the following diagram. This implementation means network traffic is enriched with service mesh features, such as encryption, as soon as it leaves the pod network namespace — something not necessarily true of all per-node mesh implementations. (Note that HBONE is Istio’s custom HTTP-based tunneling protocol used within the mesh.)
eBPF. Traditionally, CNI and service mesh components relied on iptables and kernel modules to route and manage traffic. eBPF is a more modern alternative — improving performance and reducing maintenance overhead. It is an event-driven framework that allows sandboxed programs to be run within the kernel in response to pre-defined (or custom) hooks, such as system calls and network events.
Whilst eBPF cannot be used to implement all service mesh features, such as L7 processing which requires complex logic, Istio supports eBPF for traffic redirection to the ztunnel — improving throughput when compared to iptables.
mTLS. Istio implements mTLS directly between proxies (or the ztunnel in an ambient mesh), using keys and certificates generated and rotated by the Istio agent (hosted within the Envoy container) and Istiod control plane. Alternatively, Istio also supports integrating SPIRE as a CA — enabling more granular identities, additionally based on properties of a workload’s underlying node.
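For example, mTLS can be made mandatory mesh-wide with a PeerAuthentication policy in the root namespace (istio-system in a default install):

```shell
# Require mTLS for all workloads in the mesh; plaintext
# connections from outside the mesh are rejected
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
EOF
```

Scoping the policy to an application namespace instead (or adding a workload selector) restricts enforcement to a subset of the mesh.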
Observability. Metrics are exported by the control plane, the ztunnel and the Envoy proxies (both sidecars and waypoints). Compared to Cilium, Istio’s network logging is limited: traffic information is exposed only through Envoy access logs, which provide information on TCP traffic.
Cilium
Originally a CNI that offered full compatibility with Istio, Cilium has more recently begun implementing service mesh features within its own components — improving performance and reducing complexity. Similar to Istio, Cilium relies on a combination of eBPF and Envoy, depending on whether L4 or L7 processing is required.
Cilium-Agent. The cilium-agent is the core control plane component, located on each node, responsible for watching the cluster state through the Kube API and injecting eBPF code locally. The CNI plugin interacts with the agent on pod events. For L7 processing, Envoy is launched as a separate process within the cilium-agent, or optionally as a separate cilium-envoy pod.
Identities. Cilium’s controls are decoupled from traditional network identifiers, such as IP addresses, to avoid having to update networking rules on every node whenever pods are started or stopped. Instead, identities are derived from workload labels. In this implementation, a new pod is only delayed until its identity is determined, as opposed to updating the networking rules on all nodes.
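These label-derived identities are visible as first-class resources (the resource names below assume a standard Cilium install with CRDs enabled):

```shell
# List cluster-wide identities and the label sets they
# were derived from
kubectl get ciliumidentities

# Each endpoint (pod) records the security identity
# assigned to it
kubectl get ciliumendpoints -A
```

Pods sharing the same labels share an identity, which is why policy updates don’t need to track individual pod IPs.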
mTLS. Traditionally, mTLS could be implemented at two levels: the proxy sidecar / identity level or the network / per-node level. Each granularity has its advantages and disadvantages: for example, implementing at the network level exposes a larger blast radius if keys are compromised, whereas implementing at the proxy level typically only supports TCP.
Inspired by Google’s Application Layer Transport Security, Cilium implements a combination of the two: authentication is performed at the finer (per-service / identity) granularity, whereas encryption is provided by the network layer — with WireGuard or IPsec.
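A sketch of how this surfaces in policy, using Cilium’s mutual authentication mode (the workload labels here are illustrative, and the feature assumes mutual auth is enabled in the Cilium config):

```shell
# Require mutual authentication for traffic from "client"
# endpoints to "server" endpoints (labels illustrative)
kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: server-mutual-auth
spec:
  endpointSelector:
    matchLabels:
      app: server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: client
    authentication:
      mode: "required"
EOF
```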
Authentication. Cilium implements an eBPF auth cache describing previously authenticated flows. When a workload sends a packet that has not yet been authenticated, the kernel notifies the cilium-agent and drops the packet. The cilium-agent is then responsible for performing the mTLS handshake out-of-band in user space, retrieving the required key material through a pluggable auth interface — commonly SPIFFE. Once complete, the cache is updated and packets are forwarded.
In contrast to Istio, performing the mTLS handshake out-of-band allows services to avoid incurring the handshake cost on each connection.
Hubble. Hubble is Cilium’s component responsible for aggregating visibility and metrics on traffic collected through eBPF. The server is embedded within the cilium-agent and exposes gRPC APIs for interaction through the hubble CLI. The standalone hubble-relay component aggregates the results of all servers, exposing its own API with cluster-wide visibility. (To monitor Cilium’s own infrastructure, its agents, operator and Envoy proxies additionally expose metrics.)
$ hubble observe --pod deathstar --protocol http
May 4 13:23:40.501: default/tiefighter:42690 -> default/deathstar-c74d84667-cx5kp:80 http-request FORWARDED (HTTP/1.1 POST http://deathstar.default.svc.cluster.local/v1/request-landing)
May 4 13:23:40.502: default/tiefighter:42690 <- default/deathstar-c74d84667-cx5kp:80 http-response FORWARDED (HTTP/1.1 200 0ms (POST http://deathstar.default.svc.cluster.local/v1/request-landing))
May 4 13:23:43.791: default/tiefighter:42742 -> default/deathstar-c74d84667-cx5kp:80 http-request DROPPED (HTTP/1.1 PUT http://deathstar.default.svc.cluster.local/v1/exhaust-port)

The per-node Hubble servers expose two services: the observer and the peer. The principal observer exposes endpoints to collect logs and metrics, whereas the peer keeps track of other Hubble instances (for use by the relay). Events are collected using eBPF and pushed to a user-space ring buffer, from which the observer reads when its API is queried. Both are exposed as Kubernetes services for access by the relay.
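To query the relay’s cluster-wide view rather than a single node’s server, the hubble CLI can be pointed at the relay service (the service name, namespace and ports below assume a default Helm install in kube-system):

```shell
# Forward the hubble-relay service to the local machine
kubectl -n kube-system port-forward svc/hubble-relay 4245:80 &

# Point the CLI at the relay for flows aggregated across all nodes
hubble observe --server localhost:4245
```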
