Are Kubernetes CNI solutions ready for >10 GBit/s?

Simon Volpert
Published in omi-uulm · Jun 2, 2021

Since CoreOS first proposed a draft of the Container Network Interface (CNI) [1], the number of CNI-enabled container runtimes and CNI implementations has grown significantly. Basic and sophisticated network functionality is a core demand and an integral part of the Kubernetes ecosystem. Features provided by CNI range from pod-to-pod connectivity across hosts and port forwarding to complex network policies. Together, these features complete the ecosystem as it exists today.

Because we rely on these features, the following questions regarding CNI arise:

  • Are CNI implementations able to saturate 50 and 100 Gigabit network interface cards?
  • Which performance implications arise for different implementation approaches?
  • How can we decide on a provider based on feature and performance requirements?

On our Kubernetes adoption path, we initially settled on k3s [2] and its integrated CNI provider Flannel [3]. Our main usage scenarios center around bare-metal clusters with 50 or 100 Gigabit network cards. In these scenarios, the underlying network performance starts to become an issue.

This article summarizes our early investigations into the throughput and latency capabilities of our infrastructure. The cluster used for this purpose is set up as follows:

  • 2 Servers: NEC Express5800/E120f-M with Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 256GB DDR4 RAM
  • 2 Network interface cards: Mellanox MCX515A (50Gbit/s)
  • Operating System: Fedora CoreOS 33.20210104.3.0 with kernel 5.9.16
  • Switch: Mellanox/Nvidia MLX SN2100

Methodology

Each performance test is run for 10 minutes using the network performance testing tool netperf [4]. Among other things, netperf allows the measurement of bandwidth and latency.
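As a minimal sketch of what such a measurement looks like outside of our pipeline, the snippet below drives a 10-minute bulk-throughput test and a request/response test (as a latency proxy) against a peer. The peer address is a hypothetical placeholder and assumes a netserver instance is already listening there.

```python
# Minimal sketch of a single measurement run, assuming netperf is installed
# locally and a netserver instance is reachable at PEER (hypothetical address).
import subprocess

PEER = "10.42.1.23"   # hypothetical pod/host IP running netserver
DURATION = 600        # 10 minutes per test, as described above

def run_netperf(test_type: str) -> str:
    """Run one netperf test of the given type and return its raw output."""
    result = subprocess.run(
        ["netperf", "-H", PEER, "-l", str(DURATION), "-t", test_type],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

print(run_netperf("TCP_STREAM"))  # bulk throughput, reported in 10^6 bits/s
print(run_netperf("TCP_RR"))      # request/response rate as a latency proxy
```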

Each measurement process, from spawning and setting up containers through collecting and processing metrics to visualization, is specified in an Argo Workflow [5]. This allows each process to be easily repeated and reproduced.

Results

To put a measurement between two pods on two hosts into perspective, we first perform some baseline tests. The first is a cross-host pod-to-pod test without network virtualization using a single stream. Since a single stream could not utilize the full bandwidth capacity, we parallelized this test to three simultaneous streams. With a baseline performance determined and verified, we then measure pod-to-pod performance with network virtualization provided by Flannel. Flannel can be configured to use different approaches for network virtualization; here, the default configuration (VXLAN) is applied. Figure 1 illustrates this setup.

Figure 1: Color-coded scenarios

In this figure, the scenarios are color-coded as follows: blue is single-stream cross-host pod-to-pod communication, whereas the orange scenario differs only in its use of three parallel streams. Finally, the red scenario enables Flannel network virtualization using a single stream.
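As a rough illustration of how the three-stream (orange) scenario can be driven, the sketch below starts three TCP_STREAM tests concurrently and sums the throughput reported by each. It assumes netperf's default human-readable output, where the last field of the final line is the throughput in 10^6 bits/s; the peer address is again a hypothetical placeholder.

```python
# Illustrative sketch only: drive three concurrent TCP_STREAM tests and sum
# their reported throughput. Not part of the Argo-driven measurement pipeline.
import subprocess
from concurrent.futures import ThreadPoolExecutor

PEER = "10.42.1.23"   # hypothetical netserver address
DURATION = 600        # seconds per run
STREAMS = 3

def stream_throughput(_: int) -> float:
    out = subprocess.run(
        ["netperf", "-H", PEER, "-l", str(DURATION), "-t", "TCP_STREAM"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Last field of the last non-empty line is the throughput in 10^6 bits/s.
    return float(out.strip().splitlines()[-1].split()[-1])

with ThreadPoolExecutor(max_workers=STREAMS) as pool:
    total = sum(pool.map(stream_throughput, range(STREAMS)))

print(f"aggregate throughput: {total / 1000:.1f} Gbit/s")
```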

Preliminary results are shown in Figure 2:

Figure 2: Preliminary results of throughput, latency, and CPU utilization measurements

These results yield multiple insights: (i) one TCP connection cannot saturate a 50 Gbit/s link with this specific CPU, whereas a second run with (ii) three parallel connections is able to do so. While Flannel's single-stream performance (iii) is easily sufficient for 1 Gbit/s networks, we were not able to achieve more than ~13 Gbit/s in this scenario. As we can see, latency, and especially its standard deviation, is significantly higher than in the scenarios without any network virtualization. Also, the time this process can spend in user or system CPU time is much lower, which directly influences the amount of softirq processing and leads to an overall lower transfer rate.
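To make the softirq observation concrete, the following minimal sketch (again illustrative, not part of our pipeline) samples the per-CPU NET_RX counters from /proc/softirqs and reports their rate over an interval while a test is running.

```python
# Illustrative helper: sample the NET_RX softirq counters from /proc/softirqs
# and print the rate over a short interval. Linux-specific, read-only.
import time

def net_rx_total() -> int:
    """Sum the NET_RX softirq counts across all CPUs."""
    with open("/proc/softirqs") as f:
        for line in f:
            if line.lstrip().startswith("NET_RX:"):
                return sum(int(v) for v in line.split()[1:])
    raise RuntimeError("NET_RX row not found")

INTERVAL = 10  # seconds
before = net_rx_total()
time.sleep(INTERVAL)
after = net_rx_total()
print(f"NET_RX softirqs/s: {(after - before) / INTERVAL:.0f}")
```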

Conclusion

The hardware/software combination we are currently running is capable of handling single connections at 10 Gbit/s, but quickly reaches its limits beyond that. However, we cannot yet answer the introductory question of whether CNI is ready for >10 Gbit/s, so further investigation is necessary.

Outlook

Since this measurement process is only a very preliminary one, many further measurements and performance tunings remain to be done. Regarding higher performance, eBPF-based [6] solutions seem very promising, since they are able to avoid parts of the classic netfilter stack.

These measurements opened up many further questions regarding bottlenecks, strategies, and possible performance tunings. As a consequence, we are currently pursuing further research to eventually answer these questions. The resulting insights can help in deciding which CNI solution fits specific requirements best.

References

[1] GitHub. 2021. containernetworking/cni. [online] Available at: <https://github.com/containernetworking/cni> [Accessed 21 May 2021].

[2] K3s.io. 2021. K3s: Lightweight Kubernetes. [online] Available at: <https://k3s.io/> [Accessed 21 May 2021].

[3] GitHub. 2021. flannel-io/flannel. [online] Available at: <https://github.com/flannel-io/flannel> [Accessed 21 May 2021].

[4] Hewlettpackard.github.io. 2021. The Netperf Homepage. [online] Available at: <https://hewlettpackard.github.io/netperf/> [Accessed 21 May 2021].

[5] Argoproj.github.io. 2021. Workflows & Pipelines | Argo. [online] Available at: <https://argoproj.github.io/projects/argo/> [Accessed 21 May 2021].

[6] Ebpf.io. 2021. eBPF — Introduction, Tutorials & Community Resources. [online] Available at: <https://ebpf.io/> [Accessed 31 May 2021].
