Understanding Kubernetes Networking — Part 2

Sumeet Kumar · Published in Microsoft Azure · Oct 20, 2020

This is the second post in the ongoing series on understanding Kubernetes Networking.

In this post, we will discuss POD-to-POD communication.

If you missed Part 1 on container-to-container communication, you can check it here.

2. POD to POD communication

  • There are two ways in which a POD can communicate with another POD:

- Intra-Node communication of PODs.

- Inter-Node communication of PODs.

  • As mentioned in the previous post, namespaces are used to isolate resources for processes. Here, we will concentrate on the network namespace.
  • The VM’s primary NIC [eth0] is in the Root Network Namespace, i.e. the network namespace of the VM/Node itself.
  • “ctr*” denotes the containers within the PODs.
  • POD 1 and POD 2 are in their own respective network namespaces.
  • PODs have their own network stack and their own “eth0” interface.
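If you want to see this separation on a real node, you can list the network namespaces and look at the root namespace’s interfaces. A minimal sketch, assuming root access on a Linux node (names and IDs will differ in your cluster):

    # List all network namespaces on the node along with the processes that own them.
    sudo lsns --type net

    # Show the node's primary NIC (eth0) in the root network namespace.
    ip addr show eth0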

a.) Intra-Node communication of PODs:

  • Every POD in Kubernetes has its own IP address.
  • Since every POD is in its own “ns”, with an “eth0” of its own, we need a way to communicate between the POD network namespace and the Root network namespace.
  • Enter: veth [Virtual Ethernet devices]. A veth pair consists of two virtual interfaces that can span multiple network namespaces.
  • One end of the pair is placed in the Root Network Namespace (“root ns”) and the other end in the POD’s network namespace (“POD ns”).
  • To send data from POD 1 to POD 2 via the root ns, we also need a way to bridge the veth pairs.
  • Enter: the Linux Ethernet Bridge.
  • Generally, you will see it as “cbr0”. It is an L2 network device used to unite two or more network segments into a single network.
  • It maintains a “Forwarding Table” and uses the ARP protocol to discover the MAC address associated with a given IP.
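To make the veth + bridge wiring concrete, here is a purely illustrative sketch that recreates it by hand with iproute2. The namespace names (pod1, pod2), interface names, and the 10.244.1.0/24 subnet are made-up placeholders; in a real cluster the kubenet/bridge CNI plugin performs these steps for you:

    # Create two "POD" network namespaces.
    sudo ip netns add pod1
    sudo ip netns add pod2

    # One veth pair per POD: the "veth*" end stays in the root ns,
    # the other end is moved into the POD ns and renamed eth0.
    sudo ip link add veth0 type veth peer name ctr1
    sudo ip link set ctr1 netns pod1
    sudo ip -n pod1 link set ctr1 name eth0

    sudo ip link add veth1 type veth peer name ctr2
    sudo ip link set ctr2 netns pod2
    sudo ip -n pod2 link set ctr2 name eth0

    # Create the Linux bridge and attach the root-ns ends of both pairs.
    sudo ip link add cbr0 type bridge
    sudo ip link set veth0 master cbr0
    sudo ip link set veth1 master cbr0
    sudo ip link set cbr0 up
    sudo ip link set veth0 up
    sudo ip link set veth1 up

    # Give each "POD" an IP on the same subnet and bring its eth0 up.
    sudo ip -n pod1 addr add 10.244.1.2/24 dev eth0
    sudo ip -n pod1 link set eth0 up
    sudo ip -n pod2 addr add 10.244.1.3/24 dev eth0
    sudo ip -n pod2 link set eth0 up

    # pod1 can now reach pod2 through the bridge.
    sudo ip netns exec pod1 ping -c 1 10.244.1.3

The final ping follows exactly the intra-node POD-to-POD path described next.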

Now, if a packet needs to go from POD 1 to POD 2:

  • POD 1 sends a packet to its Ethernet device “eth0”.
  • POD 1’s “eth0” is connected via a virtual Ethernet device to the root namespace, “veth0”.
  • The bridge “cbr0” is configured with “veth0” attached as one network segment and “veth1” as another.
  • Once the packet is on the bridge, the bridge resolves (via ARP) the correct network segment and sends the packet on to “veth1”.
  • When the packet reaches “veth1”, it is forwarded directly into POD 2’s namespace, arriving at the “eth0” device of POD 2’s namespace.
  • If you log in to an Azure Kubernetes Service node, you can see the bridge network.
  • Note: “docker0” is the bridge used by Docker itself.
  • You can list all the bridge networks on your node via the command: brctl show
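A few commands you could run on the node to inspect this wiring. This is a sketch assuming the kubenet bridge name “cbr0”; other CNI plugins may use a different name (for example, Flannel typically creates “cni0”):

    # List all bridges and the interfaces attached to them (brctl is part of bridge-utils).
    brctl show

    # Equivalent view using iproute2, which is usually preinstalled.
    ip link show type bridge
    bridge link show

    # Show which veth interfaces are attached to the cbr0 bridge.
    ip link show master cbr0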

What is CNI?

  • Short for Container Network Interface.
  • It is a specification; all of the networking implementation is done by plugins.
  • It was developed to provide a simple contract between the container runtime and the networking implementation for containers.
(Image courtesy: https://thenewstack.io/container-networking-landscape-cni-coreos-cnm-docker/)
  • Kubernetes supports two kinds of network plugins:
  1. CNI plugins: adhere to the CNI specification and are designed for interoperability.
  2. Kubenet plugin: implements a basic “cbr0” using the bridge and host-local CNI plugins.
  • CNI comes with some default plugins like bridge, macvlan, etc.
  • The CNI plugin is selected by passing kubelet the --network-plugin=cni command-line option. Kubelet reads a file from --cni-conf-dir (by default /etc/cni/net.d) and uses the CNI configuration from that file to set up each POD’s network. The CNI configuration file must match the CNI specification, and any required CNI plugins referenced by the configuration must be present in --cni-bin-dir (by default /opt/cni/bin). A sample configuration file is sketched after the examples below.
  • There are multiple vendors that provide different capabilities for configuring the network via CNI.
  • Examples:

- “Flannel” — used for creating an overlay network.

- “Calico” — uses the BGP protocol for routing.
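To make the kubelet flags above concrete, here is a hedged sketch of what a CNI configuration file for the built-in bridge and host-local plugins could look like, placed in the directory kubelet reads from (e.g. /etc/cni/net.d/10-demo-bridge.conf). The file name, network name, bridge name, and subnet are illustrative; on a managed AKS cluster this configuration is maintained for you, so treat it as a format example rather than something to apply:

    {
        "cniVersion": "0.3.1",
        "name": "demo-net",
        "type": "bridge",
        "bridge": "cbr0",
        "isGateway": true,
        "ipMasq": false,
        "ipam": {
            "type": "host-local",
            "subnet": "10.244.1.0/24",
            "routes": [ { "dst": "0.0.0.0/0" } ]
        }
    }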

More Information:
- https://github.com/containernetworking/cni/blob/master/SPEC.md#cni-plugin
- https://www.youtube.com/watch?v=l2BS_kuQxBA

b.) Inter-Node communication of PODs:

In my current configuration, the Nodes are deployed in the subnet 10.240.0.0/16. The POD IPs used later in this walkthrough are 10.244.1.6 (POD 1, on Node 1) and 10.244.2.6 (POD 4, on Node 2).

  • Here, the packet traverses from POD 1 (on Node 1) as the source to POD 4 (on Node 2) as the destination.
  • The packet leaves POD 1’s network ns at eth0 and enters the root network ns at vethxxx.
  • The packet arrives at cbr0, which makes an ARP request to find the destination.
  • ARP will fail at the bridge because no device attached to the bridge has the correct MAC address for the packet. On failure, the bridge sends the packet out the default route — the root namespace’s eth0 device.
  • The packet leaves Node 1 onto the wire with src = POD 1’s IP and dst = POD 4’s IP.
  • The Route Table has routes set up for each Node’s POD CIDR block, and it routes the packet to the Node whose CIDR block contains POD 4’s IP. This Route Table is applied on the subnet where your Kubernetes Nodes are deployed.
  • Now, the packet arrives at Node 2 at the network interface eth0.
  • Even though POD 4’s IP is not the IP of eth0, the packet is still forwarded to cbr0, since the Nodes are configured with IP forwarding enabled.
  • The Node’s routing table is looked up for any route matching POD 4’s IP. It finds cbr0 as the interface for this Node’s POD CIDR block.
  • You can view this routing table on the node with route -n (or ip route show), as sketched below.
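A quick way to check these two pieces on a node (a sketch; the POD CIDR mentioned in the comment is illustrative, and the actual output depends on your cluster):

    # Confirm that IP forwarding is enabled on the node (should print "net.ipv4.ip_forward = 1").
    sysctl net.ipv4.ip_forward

    # Inspect the node's routing table. With kubenet you should find an entry that
    # sends this node's own POD CIDR (for example 10.244.2.0/24) to the cbr0 device.
    route -n
    ip route show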

Assume that the source is 10.244.1.6 (POD 1) and the destination is 10.244.2.6 (POD 4).

Using the UDR (User Defined Route) table, the packet from 10.244.1.6 is correctly routed to the right Node, with the destination still 10.244.2.6.
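For illustration, here is roughly what such a UDR entry could look like when inspected or created with the Azure CLI. The resource group, route table name, POD CIDR, and next-hop address below are made-up placeholders; on AKS with kubenet these routes are created and maintained by the platform, so you would normally only list them:

    # List the routes in the route table attached to the node subnet.
    az network route-table route list \
        --resource-group MC_myResourceGroup_myAKSCluster_eastus \
        --route-table-name aks-agentpool-routetable \
        --output table

    # Shape of one such route: "send traffic for Node 2's POD CIDR to Node 2's IP".
    az network route-table route create \
        --resource-group MC_myResourceGroup_myAKSCluster_eastus \
        --route-table-name aks-agentpool-routetable \
        --name node2-pod-cidr \
        --address-prefix 10.244.2.0/24 \
        --next-hop-type VirtualAppliance \
        --next-hop-ip-address 10.240.0.5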

Once the packet is inside the other Node, a route lookup sends it to “cbr0”.

  • The bridge takes the packet, makes an ARP request, and finds out that the IP belongs to vethyyy.
  • The packet crosses the veth-pair and reaches POD 4.

This brings us to the end of this post.

  • If we observe carefully, the packet has not been NATed in any form in either scenario (Intra-Node or Inter-Node communication). Hence, the Kubernetes networking fundamental that PODs reach each other directly on their own IPs is kept intact.
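If you want to verify this yourself, one simple check (a sketch; the bridge name, POD name, and IPs are illustrative, and it assumes ping is available in the container image) is to capture traffic on the destination Node’s bridge while pinging from a POD on another Node:

    # On Node 2, watch ICMP traffic crossing the bridge.
    sudo tcpdump -ni cbr0 icmp

    # From POD 1 on Node 1 (via kubectl exec), ping POD 4's IP.
    kubectl exec -it pod1 -- ping -c 3 10.244.2.6

    # The capture on Node 2 should show src 10.244.1.6 (POD 1's own IP),
    # confirming that the packet was not source-NATed along the way.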

In the next post, we will discuss POD-to-Service communication.
