Exposing TFTP Server as Kubernetes Service — Part 2

Darpan Malhotra
9 min read · May 26, 2022


In Part 1 of this series, we deployed a TFTP server pod, which we had to access over the network using its IP address. However, pods are ephemeral: a pod's IP address will almost certainly change when it is recreated by the Deployment controller. So, Kubernetes recommends using a ClusterIP service to expose the replicas of a Deployment. The ClusterIP (and its DNS name) stays fixed even as the backend pods come and go. That way, the network service offered by an application running in a container can always be reached at a fixed IP.
Another benefit of using a ClusterIP service is that traffic gets load-balanced across multiple replicas of the backend pods.

But how does Kubernetes implement a ClusterIP service? The answer is: with the power of kube-proxy.

kube-proxy runs as a DaemonSet on the Kubernetes cluster, i.e. one pod on every node. To know the mode in which kube-proxy is operating, let us check its configuration:

# kubectl describe cm -n kube-system kube-proxy | grep mode
mode: ""

The value is blank. The Kubernetes documentation says:

On the Linux platform, if the proxy mode is blank, the best-available proxy is used (currently iptables, but this may change in the future).

So, kube-proxy is running in iptables mode. We can also verify this by querying the HTTP endpoint of kube-proxy. Run the following command on any node:

# curl http://localhost:10249/proxyMode
iptables

In this article, we will expose the TFTP server pod as a ClusterIP service and see how traffic from the client pod reaches the server pod. Let us start by creating the service.

Apply this manifest.
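The service simply puts a UDP port 69 front end on the TFTP server pod. A minimal sketch of such a manifest (the selector label app: tftp-server is an assumption; it must match the labels given to the server pod in Part 1):

# cat tftp-server-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: tftp-server
spec:
  type: ClusterIP
  selector:
    app: tftp-server      # assumed label; must match the TFTP server pod
  ports:
  - name: tftp
    protocol: UDP
    port: 69
    targetPort: 69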

# kubectl apply -f tftp-server-service.yaml 
service/tftp-server created

This creates a ClusterIP Service and a corresponding Endpoints object.
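We can confirm both objects and the assigned ClusterIP (the AGE values below are illustrative):

# kubectl get svc,ep tftp-server
NAME                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/tftp-server   ClusterIP   10.96.62.142   <none>        69/UDP    15s

NAME                    ENDPOINTS          AGE
endpoints/tftp-server   192.168.29.67:69   15s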

The idea is that the client pod running on node learn-k8s-3 should connect to the ClusterIP (10.96.62.142), and the packets need to reach the TFTP server pod running on node learn-k8s-2. But how is this achieved in practice? The answer: by the magic of:

  • kube-proxy (iptables)
  • Calico CNI (route tables)

Let us understand the role each of them plays in client-to-server communication (i.e. pod-to-service connectivity).

  1. kube-proxy (iptables)

[ Note: As a prerequisite, this section expects the reader to be familiar with iptables. ]

We begin by exploring the iptables rules on node learn-k8s-3. Kubernetes makes use of the filter and nat tables. Let us quickly go through the journey of the client pod's outgoing packet as it passes through iptables.

A. As the client pod generates an outgoing packet, let us explore the OUTPUT chain of the nat table.
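The relevant rule looks roughly like this (other rules in the chain, e.g. Calico's own NAT chains, are omitted):

# iptables -t nat -S OUTPUT
-P OUTPUT ACCEPT
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES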

Every outgoing packet matches the KUBE-SERVICES target.

B. KUBE-SERVICES is the top-level collection of rules for all Kubernetes services (all ClusterIP services, plus one chain for NodePort services).
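For our service, the entry in KUBE-SERVICES looks roughly like this (assuming the service lives in the default namespace; the exact comment text varies with the kube-proxy version):

# iptables -t nat -S KUBE-SERVICES | grep tftp-server
-A KUBE-SERVICES -d 10.96.62.142/32 -p udp -m comment --comment "default/tftp-server cluster IP" -m udp --dport 69 -j KUBE-SVC-HJOS6SHZL66STTLG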

The client has generated a UDP packet destined to 10.96.62.142:69. There is a matching rule for this packet, so it jumps to the KUBE-SVC-HJOS6SHZL66STTLG chain.

C. Every KUBE-SVC-* chain has a collection of relevant service endpoints (KUBE-SEP-*). This is where the actual load-balancing across KUBE-SEP-* chains happens.
As we have only one pod (i.e. only one service endpoint), the KUBE-SVC-HJOS6SHZL66STTLG chain has only one KUBE-SEP-* entry.
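The chain looks roughly like this (illustrative listing; the first rule marks traffic coming from outside the pod CIDR for masquerading):

# iptables -t nat -S KUBE-SVC-HJOS6SHZL66STTLG
-N KUBE-SVC-HJOS6SHZL66STTLG
-A KUBE-SVC-HJOS6SHZL66STTLG ! -s 192.168.0.0/16 -d 10.96.62.142/32 -p udp -m comment --comment "default/tftp-server cluster IP" -m udp --dport 69 -j KUBE-MARK-MASQ
-A KUBE-SVC-HJOS6SHZL66STTLG -m comment --comment "default/tftp-server" -j KUBE-SEP-XMDC2IVYMAURXD3K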

As the client pod's source IP belongs to 192.168.0.0/16 (the pod CIDR), the packet skips the first (KUBE-MARK-MASQ) rule and matches the second rule, i.e. the jump to KUBE-SEP-XMDC2IVYMAURXD3K.

D. Each KUBE-SEP-* chain represents an actual service endpoint ( # kubectl get ep ). This is where DNAT happens.
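The endpoint chain looks roughly like this (illustrative listing):

# iptables -t nat -S KUBE-SEP-XMDC2IVYMAURXD3K
-N KUBE-SEP-XMDC2IVYMAURXD3K
-A KUBE-SEP-XMDC2IVYMAURXD3K -s 192.168.29.67/32 -m comment --comment "default/tftp-server" -j KUBE-MARK-MASQ
-A KUBE-SEP-XMDC2IVYMAURXD3K -p udp -m comment --comment "default/tftp-server" -m udp -j DNAT --to-destination 192.168.29.67:69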

Here, the outgoing packet is DNATed to 192.168.29.67:69.

Overall, kube-proxy adds rules to iptables to DNAT the traffic for ClusterIP:Port (10.96.62.142:69) to PodIP:TargetPort (192.168.29.67:69).

2. Calico CNI (route tables)

In the previous section, we learnt that the packet gets DNATed from the ClusterIP to the actual pod IP. Now, it is the job of the route table on the node to route the packet to the target pod (192.168.29.67), which is running on learn-k8s-2 (10.10.100.208).

First, let us explore the route table on node learn-k8s-3 (the node running the client pod).
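The relevant entry looks like this (illustrative excerpt of route -n; other entries omitted):

# route -n
Kernel IP routing table
Destination     Gateway         Genmask           Flags Metric Ref    Use Iface
192.168.29.64   10.10.100.208   255.255.255.192   UG    0      0        0 tunl0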

Note the relevant entry. It means that if the destination is 192.168.29.67 (which matches the route with Destination 192.168.29.64 and Mask 255.255.255.192, i.e. 192.168.29.64/26), the gateway (i.e. the next hop) is 10.10.100.208 and the tunl0 interface is used. Here, 10.10.100.208 is the IP address of node learn-k8s-2, where the server pod is running.

Next, let us look at the route table on node learn-k8s-2 (the node running the server pod).
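Again, an illustrative excerpt with only the relevant entry:

# route -n
Kernel IP routing table
Destination     Gateway         Genmask           Flags Metric Ref    Use Iface
192.168.29.67   0.0.0.0         255.255.255.255   UH    0      0        0 calic0bf1043683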

Note the relevant entry: if the destination is 192.168.29.67, traffic goes directly out of the calic0bf1043683 interface.
But how does the routing table on each node get updated? It is Calico's node agent (a component called Felix) that programs these routes. We discussed in Section 2 of Part 1 that calico-node is deployed as a DaemonSet object in the Kubernetes cluster.

Overall, when the packet from the client pod exits node learn-k8s-3, it needs to be forwarded to 10.10.100.208 (where the TFTP server pod with IP address 192.168.29.67 is running). This pod-to-pod communication across nodes happens via IP-in-IP encapsulation, implemented by the tunl0 interface. The client's outgoing packet will undergo IP-in-IP encapsulation with a source IP of 10.10.100.209 and a destination IP of 10.10.100.208.
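Whether Calico uses IP-in-IP for cross-node traffic is governed by the IP pool configuration. A quick check could look like this (the pool name and column layout vary with the Calico version; the values shown are what we would expect on this cluster):

# calicoctl get ippool -o wide
NAME                  CIDR             NAT    IPIPMODE   VXLANMODE   DISABLED   SELECTOR
default-ipv4-ippool   192.168.0.0/16   true   Always     Never       false      all()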

This completes the discussion on how client pod communicates with ClusterIP service whose backend is TFTP server pod.

We are discussing Kubernetes networking, and being network engineers, we ought to capture the packets flowing between the client and server pods. The question is: where should we capture packets? In Part 1, we saw all the interfaces on the nodes. Based on that, a few possibilities are:
A. Interface of client and server pods (eth0)
B. Other side of veth pair (cali*)
C. Tunnel interface of nodes (tunl0)
D. Physical ethernet interface of nodes (ens160)

Let us start with option A. We had installed the tftp client in the client container in Part 1. Now, we will exec into the client container and install tshark, a network protocol analyzer.

Duh!! Capturing packets is not permitted. Why? Well, the container is not privileged enough. The solution is to add the NET_ADMIN capability to the container. The modified manifest for the client deployment looks like this:
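Only the container's securityContext needs to change; the relevant fragment is sketched below (the container name is illustrative, everything else stays as in the Part 1 manifest):

    spec:
      containers:
      - name: tftp-client                # illustrative name; as defined in Part 1
        ...
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]           # allows tshark to capture packets inside the container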

The same modification has to be applied to the server's manifest, and both the client and server pods need to be redeployed with the updated manifests. This is too much work. So, let us evaluate option B. There are two interesting things about the veth pair:

  • The other end of the veth pair (the cali* interface) exists in the root namespace, where packet capturing is easy.
  • By definition of a veth pair, packets seen by the container's eth0 interface are immediately seen by the cali* interface.

So, let us drop option A and go for option B, i.e. capture packets at the cali* interface. We will exec into the client container and initiate a TFTP read request to the ClusterIP of the TFTP service, while simultaneously capturing packets as per options B, C and D (sample capture commands below).
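The exact capture commands can vary; one way to do it with tshark on the nodes is shown below (the interface names are the ones seen in this article; the capture filters and the pcap file names for the veth and tunl0 captures are illustrative):

On learn-k8s-3 (client node):
# tshark -i califa5ffa0a589 -f "udp"
# tshark -i tunl0 -f "udp"
# tshark -i ens160 -f "ip proto 4" -w clientNode.pcap    # keep only IP-in-IP traffic

On learn-k8s-2 (server node):
# tshark -i calic0bf1043683 -f "udp"
# tshark -i tunl0 -f "udp"
# tshark -i ens160 -f "ip proto 4" -w serverNode.pcap

From inside the client container (the file name is a placeholder for the 2016-byte file used in Part 1):
# tftp 10.96.62.142 -c get <file>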

That's sweet! The file transfer completed successfully (2016 bytes, as seen in Section 3 of Part 1). Exposing the TFTP server as a ClusterIP service works!

Let us first analyze packets on the client side:

A. Packets on client node (learn-k8s-3) at veth interface (califa5ffa0a589)

Observations:

  • The client (192.168.154.195) is connecting to the ClusterIP of the service (10.96.62.142).
  • The response is received from the TFTP server pod (192.168.29.67).

This means the outgoing traffic was DNATed.

B. Packets on client node (learn-k8s-3) at tunnel interface (tunl0)

Observations:

  • The traffic is seen by the tunnel interface.

This means IP-in-IP encapsulation/decapsulation is happening here.

C. Packets on client node (learn-k8s-3) at ethernet interface of node (ens160)

Due to the verbosity, only 2 frames are shown here. Later, we will look at a summary of the packets in the clientNode.pcap file.

Observations:

  • The request packet from the client pod gets IP-in-IP encapsulated.
  • The encapsulated packet has the destination IP of the server node (10.10.100.208).

So, these packet captures reveal the journey of the outgoing packet from the client pod (192.168.154.195):

  1. The client pod generates the packet (on the eth0 interface available to the pod).
  2. This packet is immediately seen at the califa5ffa0a589 interface.
  3. Then, the packet moves to the tunl0 interface, which performs IP-in-IP encapsulation.
  4. Finally, the encapsulated packet leaves the ens160 interface of the node with the destination IP of the server node (10.10.100.208).

Next, let us analyze packets on the server side.

A. Packets on server node (learn-k8s-2) at ethernet interface of node (ens160)

Due to the verbosity, only 2 frames are shown here. Later, we will look at a summary of the packets in the serverNode.pcap file.

Observations:

  • The ethernet interface (ens160) on the server node has received a packet from the client node (10.10.100.209).
  • The packet has IP-in-IP encapsulation.

B. Packets on server node (learn-k8s-2) at tunnel interface (tunl0)

Observations:

  • The traffic is seen by the tunnel interface.

This means IP-in-IP encapsulation/decapsulation is happening here.

C. Packets on server node (learn-k8s-2) at veth interface (calic0bf1043683)

Observations:

  • The decapsulated packet (i.e. the original packet generated by the client) is received by the TFTP server pod.
  • The server pod generates a response, which takes the reverse path and goes through similar transformations.

So, these packet captures reveal the journey of the incoming packet to the server pod (192.168.29.67):

  1. The encapsulated packet arrives at the ens160 interface of the node.
  2. As it is an IP-in-IP packet, it moves to the tunl0 interface, which performs IP-in-IP decapsulation.
  3. The decapsulated packet is then delivered to the calic0bf1043683 interface.
  4. Finally, the packet is available at the eth0 interface of the pod.

One important thing to note is that the server (192.168.29.67) sees the real IP address of the client (192.168.154.195).

Please pay attention to the packets captured at the ens160 interface of the client and server nodes, which reveal how IP-in-IP encapsulation works in Calico.
You can see that the "Protocols in frame" field says eth:ip:ip:udp:tftp, i.e. the payload of the outer IP packet is another IP packet. There are 2 layers of Internet Protocol Version 4 in each packet, where the outer IP packet (src: 10.10.100.209, dst: 10.10.100.208) encapsulates another IP packet (src: 192.168.154.195, dst: 192.168.29.67). That is how the TFTP client and server pods communicate while running on separate nodes, when the server pod is the backend of a ClusterIP service. The following diagram shows the traffic path:

Pod-to-Pod communication across nodes

From these packet captures, we also learn a few things about the TFTP protocol:

  1. The client uses a random source port (41946) to connect to the server on the standard TFTP port (69).
  2. The server does not respond from the same port (69).
  3. Instead, the server chooses a random port (38623) to send data packets.
  4. The client sends an ACK for every data packet received (as TFTP is UDP-based, acknowledgement is implemented at the application layer).
  5. As per the calculations done in Section 3 of Part 1, there are 4 data packets (2016 bytes = 3 full 512-byte blocks + a final 480-byte block).
  6. Overall, 9 packets are exchanged: 1 RRQ + 4 data packets + 4 ACK packets.

At this point, we have exposed the TFTP server pod as a ClusterIP service. But in our on-prem infrastructure, the TFTP clients exist outside the Kubernetes cluster. So, we need to expose the TFTP pod via a NodePort and/or LoadBalancer service. In the next article of this series, we will attempt to expose the TFTP server pod as a NodePort service.
