An illustrated guide to Kubernetes Networking [Part 1]

Everything I learned about the Kubernetes Networking

Amanpreet Singh
ITNEXT

--

You’ve been running a bunch of services on a Kubernetes cluster and reaping the benefits. Or at least, you’re planning to. Even though there are a bunch of tools available to setup and manage a cluster, you’ve still wondered how it all works under the hood. And where do you look if it breaks? I know I did.

Sure Kubernetes is simple enough to start using it. But let’s face it — it’s a complex beast under the hood. There are a lot of moving parts, and knowing how they all fit in and work together is a must, if you want to be ready for failures. One of the most complex, and probably the most critical parts is the Networking.

So I set out to understand exactly how the Networking in Kubernetes works. I read the docs, watched some talks, even browsed the codebase. And here is what I found out.

Kubernetes Networking Model

At it’s core, Kubernetes Networking has one important fundamental design philosophy:

Every Pod has a unique IP.

This Pod IP is shared by all the containers in this Pod, and it’s routable from all the other Pods. Ever notice some “pause” containers running on your Kubernetes nodes? They are called “sandbox containers”, whose only job is to reserve and hold a network namespace (netns) which is shared by all the containers in a pod. This way, a pod IP doesn’t change even if a container dies and a new one in created in it’s place. A huge benefit of this IP-per-pod model is there are no IP or port collisions with the underlying host. And we don’t have to worry about what port the applications use.

With this in place, the only requirement Kubernetes has is that these Pod IPs are routable/accessible from all the other pods, regardless of what node they’re on.

Intra-node communication

The first step is to make sure pods on the same node are able to talk to each other. The idea is then extended to communication across nodes, to the internet and so on.

Kubernetes Node
(root network namespace)

On every Kubernetes node, which is a linux machine in this case, there’s a root network namespace (root as in base, not the superuser) — root netns.

The main network interface eth0is in this root netns.

Kubernetes Node
(pod network namespace)

Similarly, each pod has its own netns, with a virtual ethernet pair connecting it to the root netns. This is basically a pipe-pair with one end in root netns, and other in the pod netns.

We name the pod-end eth0, so the pod doesn’t know about the underlying host and thinks that it has its own root network setup. The other end is named something like vethxxx.
You may list all these interfaces on your node using ifconfig or ip a commands.

Kubernetes Node
(linux network bridge)

This is done for all the pods on the node. For these pods to talk to each other, a linux ethernet bridge cbr0 is used. Docker uses a similar bridge named docker0.

You may list the bridges using brctl show command.

Kubernetes Node
(same node pod-to-pod communication)

Assume a packet is going from pod1 to pod2.
1. It leaves pod1’s netns at eth0 and enters the root netns at vethxxx.
2. It’s passed on to cbr0, which discovers the destination using an ARP request, saying “who has this IP?”
3. vethyyy says it has that IP, so the bridge knows where to forward the packet.
4. The packet reaches vethyyy, crosses the pipe-pair and reaches pod2’s netns.

This is how containers on a node talk to each other. Obviously there are other ways, but this is probably the easiest, and what docker uses as well.

Inter-node communication

As I mentioned earlier, pods need to be reachable across nodes as well. Kubernetes doesn’t care how it’s done. We can use L2 (ARP across nodes), L3 (IP routing across nodes — like the cloud provider route tables), overlay networks, or even carrier pigeons. It doesn’t matter as long as the traffic can reach the desired pod on another node. Every node is assigned a unique CIDR block (a range of IP addresses) for pod IPs, so each pod has a unique IP that doesn’t conflict with pods on another node.

In most of the cases, especially in cloud environments, the cloud provider route tables make sure the packets reach the correct destination. The same thing could be accomplished by setting up correct routes on every node. There are a bunch of other network plugins that do their own thing.

Here we have two nodes, similar to what we saw earlier. Each node has various network namespaces, network interfaces and a bridge.

Kubernetes Nodes with route table
(cross node pod-to-pod communication)

Assume a packet is going from pod1 to pod4 (on a different node).

  1. It leaves pod1’s netns at eth0 and enters the root netns at vethxxx.
  2. It’s passed on to cbr0, which makes the ARP request to find the destination.
  3. It comes out of cbr0 to the main network interface eth0 since nobody on this node has the IP address for pod4.
  4. It leaves the machine node1 onto the wire with src=pod1 and dst=pod4.
  5. The route table has routes setup for each of the node CIDR blocks, and it routes the packet to the node whose CIDR block contains the pod4 IP.
  6. So the packet arrives at node2 at the main network interface eth0.
    Now even though pod4 isn’t the IP of eth0, the packet is still forwarded to cbr0 since the nodes are configured with IP forwarding enabled.
    The node’s routing table is looked up for any routes matching the pod4 IP. It finds cbr0 as the destination for this node’s CIDR block.
    You may list the node route table using route -n command, which will show a route for cbr0 like this:

7. The bridge takes the packet, makes an ARP request and finds out that the IP belongs to vethyyy.

8. The packet crosses the pipe-pair and reaches pod4 🏠

This is the foundation of Kubernetes Networking. So next time it breaks for you, be sure to check those bridges and route tables 😉

That’s all for now. In the next parts, we will see how the overlay networks work [Part 2], what networking changes happen as pods come and go, and how the outbound and inbound traffic flows [Part 3].

I’m still new to networking concepts in general, so I would appreciate feedback, especially if something is unclear or erroneous 🙂

What kind of networking or other issues have you encountered with Kubernetes?

Please leave a note here or hit me up on Twitter.

-

--

--