Exposing TFTP Server as Kubernetes Service — Part 7

Darpan Malhotra
7 min read · May 26, 2022


As all TFTP clients are external to the Kubernetes cluster, we used a NodePort service to expose the TFTP server pod. The other way to expose the TFTP server to external clients is a service of type=LoadBalancer. Typically, the cloud provider provisions a load-balancer and assigns it an IP address for clients to connect to. But we are running this Kubernetes cluster on-prem, and there is no cloud provider here! So, even if we create a LoadBalancer service, its external IP will remain in the pending state. How do we implement a LoadBalancer service in this cluster? Enter MetalLB.

MetalLB provides a load-balancer implementation for on-prem Kubernetes clusters. In this article, we will expose the TFTP server pod as a LoadBalancer service. TFTP traffic is UDP-based, so MetalLB is a great fit, as it provides a network load-balancer implementation. This article is not a MetalLB tutorial, so those details are excluded. Please review the MetalLB documentation for the concepts.

MetalLB has certain requirements. To fulfil them, we need to address the following two items:
1. Address Allocation:
When running a Kubernetes cluster in the cloud, it is the cloud provider who creates a load-balancer and assigns an IP address to it.
But when MetalLB implements a LoadBalancer service, it needs to be configured with a pool of IP addresses from which it can allocate one to every LoadBalancer service. We will give MetalLB a small pool of addresses: 10.10.100.166-10.10.100.170.

2. Operating mode:
We will use the L2 operating mode, which uses ARP (IPv4) and NDP (IPv6) to announce the IP address of the load-balancer service to devices outside the Kubernetes cluster. In this mode, one node of the cluster acts as the leader node for the IP address of a LoadBalancer service.

Let us prepare to install MetalLB in our cluster. v0.12.1 is the latest version at the time of writing this article. We will download the manifests:

# curl -OL https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/namespace.yaml
# curl -OL https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/metallb.yaml

These manifests include the namespace (metallb-system), service accounts, roles, role bindings, a deployment (controller) and a daemonset (speaker). But the configuration is missing: MetalLB reads its configuration from a ConfigMap object. Let us create a manifest (config.yaml) for the MetalLB configuration:
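
Below is a minimal sketch of what config.yaml could look like for our setup, using the ConfigMap format that MetalLB v0.12.x accepts; the pool name (default) and the layer2 protocol match what we will see later in the speaker logs:

# config.yaml (a sketch for our address pool)
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.10.100.166-10.10.100.170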

We will deploy MetalLB to our cluster.

# kubectl apply -f namespace.yaml -f config.yaml -f metallb.yaml

Verify that controller and speaker pods are running.
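
The output would look roughly like this (some columns trimmed; pod-name suffixes and placement are illustrative, except speaker-xmxgq, which we will refer to later):

# kubectl get pods -n metallb-system -o wide
NAME                  READY   STATUS    RESTARTS   AGE   NODE
controller-<suffix>   1/1     Running   0          60s   learn-k8s-2
speaker-<suffix>      1/1     Running   0          60s   learn-k8s-2
speaker-xmxgq         1/1     Running   0          60s   learn-k8s-3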

Let us delete the existing NodePort service.

# kubectl delete svc tftp-server 
service "tftp-server" deleted

Then, we will modify the service manifest and set type: LoadBalancer.
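
For reference, here is a sketch of dpe-app-service.yaml after the change; the selector label is an assumption carried over from the earlier parts of this series, while the port (69/UDP) is the TFTP port we have used throughout:

# dpe-app-service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: tftp-server
spec:
  type: LoadBalancer      # changed from NodePort
  selector:
    app: tftp-server      # assumed pod label
  ports:
  - protocol: UDP
    port: 69
    targetPort: 69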

Finally, create the service.

# kubectl apply -f dpe-app-service.yaml 
service/tftp-server created
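
Checking the service now shows something like the following (the cluster IP and the auto-allocated node port are placeholders):

# kubectl get svc tftp-server
NAME          TYPE           CLUSTER-IP   EXTERNAL-IP     PORT(S)        AGE
tftp-server   LoadBalancer   10.x.x.x     10.10.100.166   69:3xxxx/UDP   10s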

We can see that the external IP is populated for the service object, and it comes from the pool of addresses allocated to MetalLB. Simply wonderful!

Out of curiosity, I wanted to know which node is the leader for this service. So, I inspected the logs of both speaker pods and found the following message on speaker-xmxgq:

{"caller":"level.go:63","event":"serviceAnnounced","ips":["10.10.100.166"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"layer2","service":"default/tftp-server","ts":"2022-05-24T11:29:45.726576566Z"}

The pod speaker-xmxgq is running on learn-k8s-3, which means it is the leader node. So, this node will respond to ARP requests for 10.10.100.166. Interestingly, the same can be found by analyzing the events related to the service.
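
A sketch of what those events look like (ages trimmed; the exact wording may vary slightly across MetalLB versions):

# kubectl describe svc tftp-server
...
Events:
  Type    Reason        From                Message
  ----    ------        ----                -------
  Normal  IPAllocated   metallb-controller  Assigned IP "10.10.100.166"
  Normal  nodeAssigned  metallb-speaker     announcing from node "learn-k8s-3"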

It is clear that speaker-xmxgq, running on node learn-k8s-3, is announcing the load-balancer service IP 10.10.100.166.

It is time to test the load-balancer service by connecting to 10.10.100.166 from the external client (10.10.100.197).

# tftp 10.10.100.166
tftp> get dummy.txt
getting from 10.10.100.166:dummy.txt to dummy.txt [netascii]
Received 2016 bytes in 0.1 seconds [136153 bit/s]

That works!! Exposing the TFTP server pod as a service of type=LoadBalancer also works. It looks like magic, but there is no magic in software engineering. So I did not stop here and started exploring how this actually works.

On the external client (10.10.100.197), I checked the ARP entries. Traditionally, we use the arp command for this purpose:
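
The entry would look something like this (the client-side interface name is an assumption):

# arp -n 10.10.100.166
Address          HWtype  HWaddress           Flags Mask   Iface
10.10.100.166    ether   00:0c:29:6d:5d:f5   C            <iface>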

The modern way is to use the ip neigh command:
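
Again, roughly (same assumption about the client-side interface name):

# ip neigh show 10.10.100.166
10.10.100.166 dev <iface> lladdr 00:0c:29:6d:5d:f5 REACHABLE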

So, the machine with MAC address 00:0c:29:6d:5d:f5 is responding to ARP requests for IP address 10.10.100.166. As speaker-xmxgq is the leader for this IP address, we expect the node learn-k8s-3 to have an interface with this MAC address. In Section 4 of Part 1 of this series, we listed the interfaces on learn-k8s-3, and indeed, ens160 has the MAC address 00:0c:29:6d:5d:f5.
That shows how traffic from the external client reaches the leader node for this LoadBalancer service. But what happens after the traffic reaches the leader node? The pod is running on learn-k8s-2, so how does the traffic reach the target pod?
Well, once the traffic has reached the leader node, kube-proxy takes over and sends it to the pod backing the service. In our cluster, kube-proxy is running in iptables mode. So, let us analyze the journey of an incoming packet on node learn-k8s-3 as it runs through the various iptables rules.

A. Every incoming packet goes through the PREROUTING chain. Kubernetes makes use of the PREROUTING chain in the nat table to implement its services.
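
A sketch of the relevant rule on learn-k8s-3 (Calico's own chains and the packet counters are omitted; the comment is the one kube-proxy installs):

# iptables -t nat -L PREROUTING -n
Chain PREROUTING (policy ACCEPT)
target         prot opt source      destination
...
KUBE-SERVICES  all  --  0.0.0.0/0   0.0.0.0/0    /* kubernetes service portals */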

Every incoming packet will match the rule that jumps to the KUBE-SERVICES chain.

B. The KUBE-SERVICES chain is the top-level collection of all Kubernetes services.
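
Filtering the chain for our service, the rules would look roughly like this (the cluster IP is a placeholder):

# iptables -t nat -L KUBE-SERVICES -n | grep tftp-server
KUBE-SVC-HJOS6SHZL66STTLG  udp  --  0.0.0.0/0   10.96.x.x       /* default/tftp-server cluster IP */ udp dpt:69
KUBE-EXT-HJOS6SHZL66STTLG  udp  --  0.0.0.0/0   10.10.100.166   /* default/tftp-server loadbalancer IP */ udp dpt:69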

The client has generated a UDP packet destined to 10.10.100.166:69. The second rule matches, and hence the packet jumps to the KUBE-EXT-HJOS6SHZL66STTLG chain. By the way, the comment on that rule is self-explanatory. After jumping to the KUBE-EXT-HJOS6SHZL66STTLG chain, the packet-processing rules are the same as those seen with the NodePort service (Part 2 of this series). But we will go through them once again.

C. In the KUBE-EXT-HJOS6SHZL66STTLG chain, there are two rules, and the packet matches both of them.
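
A sketch of that chain (the comment text may differ slightly across kube-proxy versions):

# iptables -t nat -L KUBE-EXT-HJOS6SHZL66STTLG -n
Chain KUBE-EXT-HJOS6SHZL66STTLG (1 references)
target                     prot opt source      destination
KUBE-MARK-MASQ             all  --  0.0.0.0/0   0.0.0.0/0   /* masquerade traffic for default/tftp-server external destinations */
KUBE-SVC-HJOS6SHZL66STTLG  all  --  0.0.0.0/0   0.0.0.0/0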

As per the first rule, the packet is MARKed.

Then, the packet jumps to the KUBE-SVC-HJOS6SHZL66STTLG chain.

D. Every KUBE-SVC-* chain has a collection of the relevant Service Endpoint chains (KUBE-SEP-*). This is where the actual load-balancing across KUBE-SEP-* chains happens.
As we have only one pod (i.e. only one Service Endpoint), the KUBE-SVC-HJOS6SHZL66STTLG chain has only one KUBE-SEP-* entry.
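
Roughly, the chain looks like this (the cluster IP is again a placeholder; the comment text may vary by kube-proxy version):

# iptables -t nat -L KUBE-SVC-HJOS6SHZL66STTLG -n
Chain KUBE-SVC-HJOS6SHZL66STTLG (2 references)
target                     prot opt source            destination
KUBE-MARK-MASQ             udp  --  !192.168.0.0/16   10.96.x.x   /* default/tftp-server cluster IP */ udp dpt:69
KUBE-SEP-XMDC2IVYMAURXD3K  all  --  0.0.0.0/0         0.0.0.0/0   /* default/tftp-server -> 192.168.29.67:69 */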

As the source IP of the packet from the client does not belong to 192.168.0.0/16, it matches the first rule, and it also matches the second rule, i.e. the jump to KUBE-SEP-XMDC2IVYMAURXD3K.

E. Each KUBE-SEP-* chain represents an actual Service Endpoint (kubectl get ep). This is where DNAT happens.
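
A sketch of the endpoint chain, where the destination is rewritten to the pod IP and target port:

# iptables -t nat -L KUBE-SEP-XMDC2IVYMAURXD3K -n
Chain KUBE-SEP-XMDC2IVYMAURXD3K (1 references)
target          prot opt source          destination
KUBE-MARK-MASQ  all  --  192.168.29.67   0.0.0.0/0   /* default/tftp-server */
DNAT            udp  --  0.0.0.0/0       0.0.0.0/0   /* default/tftp-server */ udp to:192.168.29.67:69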

While discussing exposing the TFTP server pod as a NodePort service in the previous articles, we realized the dependency of kube-proxy on NAT. And to NAT TFTP traffic, the conntrack and NAT helper modules are needed.
MetalLB brings the traffic directed to the LoadBalancer service IP address to a leader node in the Kubernetes cluster, and thereafter kube-proxy takes over. kube-proxy adds iptables rules to DNAT the traffic for LoadBalancerIP:Port (10.10.100.166:69) to PodIP:TargetPort (192.168.29.67:69). So, even for the LoadBalancer service to work, there is a dependency on NAT, and hence the conntrack and NAT helper modules must again be needed. Let us confirm the same. To do so, we will first remove the helper modules from all the nodes of the cluster.

# modprobe -v -r nf_nat_tftp
rmmod nf_nat_tftp
rmmod nf_conntrack_tftp

And then, test a file transfer from external client.

# tftp 10.10.100.166
tftp> get dummy.txt
Transfer timed out.

As expected, the transfer was unsuccessful after removing the TFTP helper modules.

Add them back and test again.

# modprobe -v nf_conntrack_tftp
insmod /lib/modules/3.10.0-1062.1.2.el7.x86_64/kernel/net/netfilter/nf_conntrack_tftp.ko.xz
# modprobe -v nf_nat_tftp
insmod /lib/modules/3.10.0-1062.1.2.el7.x86_64/kernel/net/netfilter/nf_nat_tftp.ko.xz
# tftp 10.10.100.166
tftp> get dummy.txt
getting from 10.10.100.166:dummy.txt to dummy.txt [netascii]
Received 2016 bytes in 0.1 seconds [142601 bit/s]

TFTP file read operation from LoadBalancer service IP is successful again.

That concludes our discussion of exposing the TFTP server pod as a Kubernetes service when the cluster runs kube-proxy in iptables mode with Calico as the CNI.
