Today we discuss networking in the container world, primarily in the context of Kubernetes (K8s). We do not cover policies and isolation, only how L2 and L3 play a role in packet flows.
Flannel is an overlay network mechanism, whereas Calico is basically a pure L3 play.
Flannel works by using a vxlan device in conjunction with a software switch such as a Linux bridge or OVS.
When container A tries to reach container B on a different host, the traffic is pushed to the bridge on host A via the veth pair. The bridge then uses ARP to try to find the MAC address of container B. Since container B is not on this host, the bridge forwards the traffic at L2 to the vxlan device (a software TAP device), which lets the flannel daemon capture the packets and wrap them into an L3 packet for transport over the physical network using UDP. A VXLAN tag is also added to the packets to isolate them between tenants.
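To make the wrapping step a bit more concrete, here is a minimal Python sketch of the 8-byte VXLAN header as laid out in RFC 7348. It is not flannel's code. The function names are made up for illustration, and the real encapsulation also adds outer UDP, IP and Ethernet headers, which are left out here.

```python
import struct
from typing import Tuple

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag in RFC 7348: the VNI field is valid


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend a VXLAN header to an inner Ethernet frame (hypothetical helper)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(payload: bytes) -> Tuple[int, bytes]:
    """Split a VXLAN payload back into (VNI, inner frame)."""
    word1, word2 = struct.unpack("!II", payload[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return word2 >> 8, payload[8:]
```

The result of `vxlan_encapsulate` is what would travel as the UDP payload between hosts. The VNI is the tag that keeps one tenant's frames apart from another's.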
Flannel shown diagrammatically
In the case of Calico, the approach is a little different. Calico works at Layer 3 and depends on Linux routing to move packets.
Calico injects routing rules inside the container that point to a gateway at the IP 169.254.1.1:
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
What this means is that any traffic from the container first tries to go to the default gateway IP. Since the default gateway IP is reachable on eth0, an ARP request is sent out of eth0 to determine the MAC address for the gateway IP.
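The lookup described above can be sketched in a few lines of Python. This is only a model of longest-prefix matching over the two routes shown, not how the kernel implements it. The destination 10.244.2.5 used in the usage note is a made-up container IP.

```python
import ipaddress

# The two container routes, modelled as (prefix, next hop, device).
# A next hop of None means the prefix is directly reachable on the link.
CONTAINER_ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "169.254.1.1", "eth0"),
    (ipaddress.ip_network("169.254.1.1/32"), None, "eth0"),
]


def lookup(dst, routes):
    """Return the most specific route that covers dst."""
    addr = ipaddress.ip_address(dst)
    matches = [route for route in routes if addr in route[0]]
    if not matches:
        raise LookupError("no route to " + dst)
    return max(matches, key=lambda route: route[0].prefixlen)


def arp_target(dst, routes):
    """Return (IP the sender must resolve via ARP, outgoing device)."""
    _prefix, next_hop, dev = lookup(dst, routes)
    # With a gateway we ARP for the gateway, otherwise for the destination itself.
    return (next_hop or dst, dev)
```

Calling `arp_target("10.244.2.5", CONTAINER_ROUTES)` gives `("169.254.1.1", "eth0")`, so whatever the destination is, the container ends up asking who has 169.254.1.1.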
The trick here is proxy ARP, configured on the host side of the veth pair.
The proxy answers the ARP request for 169.254.1.1 with its own MAC address.
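Here is a rough Python sketch of that exchange at the packet level, using the standard 28-byte ARP body for Ethernet/IPv4. The MAC and IP values in the usage note are placeholders, and the helper names are invented. On a real host the kernel does this when proxy ARP is enabled on the interface (the `proxy_arp` sysctl), and it applies more checks than this toy version does.

```python
import socket
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"  # htype, ptype, hlen, plen, oper, sha, spa, tha, tpa
ARP_REQUEST, ARP_REPLY = 1, 2


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(sender_mac, sender_ip, target_ip):
    """ARP request: who has target_ip? Tell sender_ip."""
    return struct.pack(
        ARP_FORMAT, 1, 0x0800, 6, 4, ARP_REQUEST,
        mac_to_bytes(sender_mac), socket.inet_aton(sender_ip),
        b"\x00" * 6, socket.inet_aton(target_ip),
    )


def proxy_arp_reply(request, proxy_mac):
    """Answer any ARP request by claiming the target IP is at proxy_mac."""
    _, _, _, _, oper, sha, spa, _, tpa = struct.unpack(ARP_FORMAT, request)
    if oper != ARP_REQUEST:
        raise ValueError("not an ARP request")
    # Sender fields of the reply: the proxy's MAC paired with the requested IP.
    # Target fields of the reply: the original requester.
    return struct.pack(
        ARP_FORMAT, 1, 0x0800, 6, 4, ARP_REPLY,
        mac_to_bytes(proxy_mac), tpa, sha, spa,
    )
```

For example, a request built with `build_arp_request("02:42:0a:f4:01:05", "10.244.1.5", "169.254.1.1")` and passed to `proxy_arp_reply(..., "ee:ee:ee:ee:ee:ee")` yields a reply that tells the container 169.254.1.1 lives at the host-side veth MAC, even though no interface actually owns that IP.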
After this resolution, packets are sent to the veth device with the container's IP as source and the target container's IP as destination. From here on, the host's L3 routing takes over, since the host knows how to route to the destination container IP.
Routes are synchronized among the hosts via BGP. A BGP client (BIRD) runs on each host and makes sure each host has up-to-date routes.
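A toy Python model of this route exchange might look as follows. It is a deliberate simplification and not how BIRD or Calico work internally: each host is assumed to own a single container prefix, and "advertising" simply writes the route into every peer's table. The `Host` class, host IPs and prefixes are all invented for illustration.

```python
import ipaddress


class Host:
    """Hypothetical host with one container prefix and a routing table."""

    def __init__(self, name, ip, pod_cidr):
        self.name = name
        self.ip = ip
        self.pod_cidr = ipaddress.ip_network(pod_cidr)
        # prefix -> next hop; "local" marks containers on this host
        self.routes = {self.pod_cidr: "local"}

    def advertise_to(self, peers):
        """Stand-in for a BGP update: peers learn our prefix via our IP."""
        for peer in peers:
            if peer is not self:
                peer.routes[self.pod_cidr] = self.ip

    def next_hop(self, dst):
        """Longest-prefix match over the routes this host knows."""
        addr = ipaddress.ip_address(dst)
        matches = [prefix for prefix in self.routes if addr in prefix]
        if not matches:
            raise LookupError("no route to " + dst)
        best = max(matches, key=lambda prefix: prefix.prefixlen)
        return self.routes[best]


host_a = Host("host-a", "192.168.0.10", "10.244.1.0/24")
host_b = Host("host-b", "192.168.0.11", "10.244.2.0/24")
hosts = [host_a, host_b]
for host in hosts:
    host.advertise_to(hosts)
```

After the exchange, `host_a.next_hop("10.244.2.5")` returns host B's address, while `host_b.next_hop("10.244.2.5")` returns `"local"`. The packet is routed hop by hop with its original source IP intact.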
So in the Calico solution we got rid of software bridges and also preserved the source IP.
Diagrammatically the flow is shown below
The overlay complexity is also out of the picture: it is a pure L3 solution based on the same principles as the internet itself. Since we rely on routing rather than on L2 broadcast domains, the need for VLANs is eliminated. Instead, for tenant-specific network flows Calico resorts to an iptables-based mechanism.
So how does bridge-based communication compare with pure L3 communication? With a bridge, the bridge device's IP acts as the gateway for the containers, so any traffic leaving the broadcast domain has the bridge device as its next hop. From there, either the host's Linux kernel applies L3 routing (with rules configured to forward packets to the VM on which the destination container resides), or the packets are forwarded to a tap device so that they can be tunneled via GRE/VXLAN.
The Calico approach, in contrast, relies on proxy ARP to hand the packet to the host-side end of the veth pair, and then applies routing to take the traffic out. If we analyse this carefully, the bridge is technically replaced by proxy ARP, and route synchronization happens over BGP.
For more information on Calico you can take a look at https://www.projectcalico.org/
In essence, packets from VMs or containers can use one of the following mechanisms to communicate with containers/VMs on other hosts:
- Use an overlay like GRE/VXLAN
- Use NAT to send packets to the remote host
- Use a Calico-like mechanism with pure L3 routing, without any NAT or bridges. This preserves the source IP, so ingress security policies can be applied properly based on source IPs
Disclaimer : The views expressed above are personal and not of the company I work for.