Kubernetes Networking on AWS, Part III

Vilmos Nebehaj
Published in Elotl blog

Jan 4, 2020

We have looked at a few possible options and common setups for implementing Kubernetes pod networking on AWS:

Part I: https://medium.com/elotl-blog/kubernetes-networking-on-aws-part-i-99012e938a40

Part II: https://medium.com/elotl-blog/kubernetes-networking-on-aws-part-ii-47906de2921d

Now let’s look at how pod networking works when Milpa is the Kubernetes container runtime in a cluster.

Milpa as a Kubernetes Container Runtime

Milpa plugs into Kubernetes using the Container Runtime Interface (CRI), via a shim service called Kiyot.

One caveat is that certain workloads are not supported via Milpa (e.g. pods belonging to DaemonSets, or stateful pods using Persistent Volumes). These workloads run via a regular container runtime such as Docker, containerd or CRI-O. Workloads suitable for running via Milpa are annotated by an admission webhook, and a CRI proxy routes pod creation requests from the kubelet to either Kiyot+Milpa or to the regular runtime, based on the presence of this annotation.

Architecture of Milpa as a Kubernetes container runtime.
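
For instance, to check which runtime a given pod was routed to, one can inspect its annotations. The annotation key itself is set by the admission webhook and depends on the installation, so the command below only shows how to list whatever annotations are present:

$ kubectl get pod <pod-name> -ojsonpath='{.metadata.annotations}'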

This is a simple design for enabling the elastic scaling of Kubernetes workloads on public clouds, but it also brings up a few questions regarding networking:

  • How does pod networking work for Milpa pods?
  • How do pods running via a regular container runtime and pods created via Milpa communicate?
  • What about advanced networking concepts, such as Kubernetes service proxying, Network Policies, HostPorts, NodePorts, hostNetwork mode and so on?

Pod Networking with Milpa

Pod-to-pod networking in Kubernetes relies on a few assumptions:

  • Every pod has an IP address that is unique in the cluster.
  • Pods can communicate directly with each other and with nodes in the cluster, without the source or destination IP address being changed (i.e. without NAT).
  • The IP address a pod sees as its own (i.e. the address it gets when querying the operating system) is the same address other pods see its packets coming from.

This leaves lots of room for the various pod networking implementations. As long as the above three requirements are met, pod networking is expected to work.
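
A quick way to sanity-check these assumptions in a running cluster is to exec into one pod and reach another pod directly on its IP address (this assumes the pod image ships a ping binary and that ICMP is not blocked by network policies or security groups):

$ kubectl exec <pod-a> -- ping -c 1 <pod-b-ip>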

Milpa schedules Kubernetes pods to run on right-sized cloud instances. Each instance runs exactly one pod; once the pod is terminated, the cloud instance is removed as well. Such an instance is called a cell in Milpa terminology.

Each cell gets two IP addresses: one for management communication (between Milpa and a small agent running on the instance), and one for the pod. Unless the pod has hostNetwork enabled, a new network namespace is created for the pod with the second IP address. Both IP addresses come from the VPC address space; fortunately, even the tiniest instance types are allowed to allocate at least two IP addresses. Even though there is always exactly one pod running on an instance, this design ensures that the pod can't interfere with Milpa management communications.
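
The per-instance-type IP address limits can be checked with the AWS CLI; for example, for t3.nano (the instance type used in the example below), the following query returns the maximum number of network interfaces and the number of IPv4 addresses allowed per interface:

$ aws ec2 describe-instance-types --instance-types t3.nano \
    --query 'InstanceTypes[].NetworkInfo.[MaximumNetworkInterfaces,Ipv4AddressesPerInterface]'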

For example, I created a cluster with one node and one master, in a VPC with the IPv4 address space of 10.0.0.0/16:

$ kubectl get nodes
NAME                          STATUS   ROLES    AGE   VERSION
ip-10-0-22-242.ec2.internal   Ready    <none>   37m   v1.17.0
ip-10-0-30-187.ec2.internal   Ready    master   38m   v1.17.0

Starting a pod:

$ kubectl create deployment busybox --image=busybox
deployment.apps/busybox created

Milpa will create a cell to run the pod:

$ kubectl get cells
NAME                                   POD NAME                   POD NAMESPACE   NODE                          LAUNCH TYPE   INSTANCE TYPE   INSTANCE ID           IP
81c95bba-4607-4590-9ae6-533d7e78e1f6   busybox-7d657df9dc-6kkgr   default         ip-10-0-22-242.ec2.internal   On-Demand     t3.nano         i-09a672e3c0cbc5b36   10.0.29.174

Here, 10.0.29.174 is the IP address of the cell. The pod running on the cell has a different IP address:

$ kubectl get pod busybox-7d657df9dc-6kkgr -ojsonpath='{.status.podIP}'
10.0.29.15
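
Both of these addresses belong to the cell's network interface in EC2. Using the instance ID from the cell listing above, this can be double-checked with the AWS CLI; the output should contain both 10.0.29.174 and 10.0.29.15:

$ aws ec2 describe-instances --instance-ids i-09a672e3c0cbc5b36 \
    --query 'Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses[].PrivateIpAddress'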

Interoperability between Milpa Pods and Regular Pods

As discussed above, Milpa pods get their IP addresses from the VPC address space.

As for the regular container runtime (running DaemonSets, stateful pods, etc.), the only requirements for interoperability with Milpa pods are:

  • The network plugin used by the container runtime is able to use the PodCIDR allocated to the node by the controller-manager (most CNI plugins can do this).
  • The cloud controller is configured to create cloud routes with the PodCIDR allocated to nodes. This way packets from cloud instances (running Milpa pods) will be routed by the VPC network fabric to the right Kubernetes node.

Pod networking between Milpa pods and regular pods.

When these two requirements are met, the VPC takes care of routing the PodCIDRs allocated to Kubernetes nodes to the right node. As long as the regular container runtime runs a network policy agent, network policies are enforced both for Milpa pods and regular pods.
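
To satisfy these two requirements on the control plane side, the relevant kube-controller-manager flags look roughly like the following sketch (the cluster CIDR is only an example value, and all other flags are omitted):

$ kube-controller-manager \
    --cloud-provider=aws \
    --allocate-node-cidrs=true \
    --configure-cloud-routes=true \
    --cluster-cidr=172.20.0.0/16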

Since NodePorts are managed by the service proxy running on Kubernetes nodes, they also work seamlessly. The backend pod IP for Milpa pods will be a VPC IP address. Iptables rules for HostPort mappings are created and maintained by Kiyot.
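
As an illustration only (the exact chains and rules Kiyot maintains are an implementation detail), a HostPort mapping boils down to a DNAT rule of roughly this shape, forwarding traffic arriving on the host port to the pod's VPC address:

$ iptables -t nat -A PREROUTING -p tcp --dport 8080 -j DNAT --to-destination 10.0.29.15:80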

Services and Network Policies

The cloud instances hosting pods created via Milpa also run a combined service proxy and network policy agent, which is a kube-router process. Routing is disabled in kube-router, since the cloud routes take care of routing to the CIDRs used for regular pods on the nodes; only service proxying and firewalling are enabled.

This way, pods can reach Kubernetes services via IPVS or iptables NAT rules created by kube-router, and network policies are enforced via iptables filtering rules, also maintained by kube-router. Essentially, this is the same setup as on regular Kubernetes nodes. Kube-router has a very low memory and CPU footprint, so even small cloud instances can dedicate almost all of their resources to the application(s) running in the pod.
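
Kube-router exposes these features as flags; a minimal sketch of running it in this mode (cluster connection and other flags omitted):

$ kube-router --run-router=false --run-service-proxy=true --run-firewall=true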

Conclusion

This is the third and final part in our series of posts on Kubernetes pod networking on AWS, explaining how our Kubernetes container runtime, Milpa, integrates into a cluster and how network interoperability is ensured.

Please also check out the first two posts if you haven't done so already:

Part I: https://medium.com/elotl-blog/kubernetes-networking-on-aws-part-i-99012e938a40

Part II: https://medium.com/elotl-blog/kubernetes-networking-on-aws-part-ii-47906de2921d
