Fix a random network Connection Reset issue in Docker/Kubernetes

Liejun Tao
Mar 5, 2020 · 12 min read

This article describes my recent experience fixing a random network “Connection Reset” issue in CI/CD pipelines running in Docker/Kubernetes, hit when downloading binaries from an external server.

I’d like to share my experience because I eventually realized this is a very common scenario: when a container/pod running in Docker/Kubernetes retrieves data from external services, this random connection reset problem can happen.

I have Jenkins pipelines running in Kubernetes clusters. Each build task runs in a pod acting as a Jenkins worker. During a build, binaries are downloaded from a JFrog Artifactory server, and the build outputs are uploaded back to Artifactory for further use. All servers run in the cloud with private IP addresses, as in the diagram below. I described my Jenkins CI/CD pipeline architecture in this article.

Pipeline diagram

The pipelines had been running fine for a few months, with very rare build failures due to “Connection Reset” when retrieving binaries from Artifactory. Recently, however, the failures increased to a level where I had to put effort into resolving them, and it took some non-trivial effort.

The Error

Without much thought, I wrongly assumed this was an Artifactory server performance problem, because the failures happened randomly and we do have many concurrent builds. I spent many weeks tuning the server performance, described in this article (to be written).

After rounds of performance tuning I learned this was the wrong direction: even though I believe Artifactory’s performance really did improve, the random failures continued to happen.

Reproduce the problem

I went back to check which pipeline changes were causing the failure rate to rise. It turned out to be the pipeline’s download step.

This step uses the Jenkins Artifactory plugin to download binaries listed in a .json file spec. The .json file had a content change that increased the total number of files to download from 10 to 30.
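Such a spec is an Artifactory download File Spec. A minimal sketch of one, written as a shell heredoc (the repository path and target here are made-up placeholders):

cat > download-list.json <<'EOF'
{
  "files": [
    {
      "pattern": "libs-release-local/myapp/*.tar.gz",
      "target": "downloads/"
    }
  ]
}
EOF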

Looking further into how the plugin downloads files, it turns out the plugin has an optimization: a file larger than 5M is downloaded with 3 concurrent threads, each getting 1/3 of the file. The source code is here.

To reproduce the random failure, I wrote a script to simulate the downloading behavior and ran it in Jenkins, shared here. The script has 2 levels of loops: OUTER_LOOP simulates build tasks, and INNER_LOOP simulates downloading multiple files.
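Its skeleton is roughly this (a sketch; download_one_file stands in for the per-file logic shown next):

#!/bin/bash
# OUTER_LOOP simulates build tasks; INNER_LOOP simulates the files downloaded per build
OUTER_LOOP=400
INNER_LOOP=30
for i in $(seq 1 "$OUTER_LOOP"); do
  for j in $(seq 1 "$INNER_LOOP"); do
    download_one_file   # 3-thread ranged download, sketched below
  done
done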

When downloading a single file, I simulate the plugin’s 3 concurrent download threads.
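A minimal way to mimic that with curl looks like this (a sketch; the URL is a placeholder, and the real script may differ in details):

# download_one_file: fetch the file size, then pull three byte ranges in parallel
URL="https://artifactory.example.com/artifactory/repo/bigfile.bin"
SIZE=$(curl -sI "$URL" | awk 'tolower($1) == "content-length:" {print $2}' | tr -d '\r')
PART=$((SIZE / 3))
curl -s -r "0-$((PART - 1))"         -o part1 "$URL" &
curl -s -r "$PART-$((2 * PART - 1))" -o part2 "$URL" &
curl -s -r "$((2 * PART))-"          -o part3 "$URL" &
wait
cat part1 part2 part3 > bigfile.bin   # reassemble the three ranges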

The script reproduces the problem effectively: I can hit the random error consistently with OUTER_LOOP=400 and INNER_LOOP=30 for a 200M file. This gave me a way to debug and fix further in the production environment.

What is “Connection Reset” in Java

The exception stack points into java.net.SocketInputStream.read; a representative trace (application frames will vary):
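java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:210)
	... (HTTP client and application frames)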

The best matching code I found is from JetBrains/jdk8u_jdk:
jdk8u_jdk/src/share/classes/java/net/SocketInputStream.java

int read(byte b[], int off, int length, int timeout) throws IOException {
    ...
    if (impl.isConnectionResetPending()) {
        impl.setConnectionReset();
    }
    if (impl.isConnectionReset()) {
        throw new SocketException("Connection reset");
    }
    ...
}

So there are 2 states, “CONNECTION_RESET” and “CONNECTION_RESET_PENDING”, and either one triggers the “SocketException”. The states are set in
jdk8u_jdk/src/share/classes/java/net/AbstractPlainSocketImpl.java

The states are set in response to a “ConnectionResetException” thrown from native code, whose implementation is in
jdk8u_jdk/src/solaris/native/java/net/SocketInputStream.c

How should we understand the underlying native error code? From the glibc reference:

int ECONNRESET
“Connection reset by peer.” A network connection was closed for reasons outside the control of the local host, such as by the remote machine rebooting or an unrecoverable protocol violation.

At this point I still couldn’t tell what the problem was, but this and this Stack Overflow answers gave me a clue: an unexpected TCP RST packet was causing the problem.

There are two other “connection reset by peer” exceptions which look similar and can be confusing. One of them comes mostly from the “sun.nio” classes.

And the other one comes from the JDK on Windows only:
jdk8u_jdk/src/windows/transport/socket/socket_md.c

{ WSAECONNRESET,            "Connection reset by peer" },

Searching around

As my Jenkins runs in a Kubernetes environment, I searched for “connection reset in Kubernetes” and found these 2 related articles:

A reason for unexplained connection timeouts on Kubernetes/Docker
kube-proxy Subtleties: Debugging an Intermittent Connection Reset

The first article explains a case in a Docker/Kubernetes environment: a race condition in DNAT/SNAT when multiple containers try to establish new connections to the same external address concurrently. The case can be detected by checking the connection tracking system: if the problem is present, the “insert_failed” counter is non-zero.

The counters can be read with the conntrack tool (sample output; the exact fields vary by version):

$ conntrack -S
cpu=0 found=0 invalid=0 ignore=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
... other cpus

Checking my system, I didn’t see any non-zero “insert_failed”, so this was not my root cause. Later I found that the mitigation mentioned in the article had already been applied to the Kubernetes cluster.

The second article explains a case inside a Kubernetes cluster: accessing a service via its ClusterIP gets a random “connection reset”. The root cause is that kube-proxy forwards INVALID packets from the service to the client, and the client sends back a RST packet because it cannot recognize them. The solution is to drop the INVALID packets explicitly, discussed here. The fix went into Kubernetes fairly recently, in April 2019.

I initially thought this was the same as my problem, as my cluster runs version v1.13.12, where the problem is not yet fixed.

Luckily, I had another cluster running v1.15.5 to compare against, which does have the fix.

So I ran the Jenkins test on both clusters and, you know what, I could reproduce my problem on both clusters!

Getting to the solution

The second article is what actually helped me find my problem: a different scenario, but a similar root cause.

I started to look into the connection tracking system using the conntrack tool. I didn’t use tcpdump/wireshark for packet capture, as that fills disk space quickly in my test…

While running the download test script, I captured the connection state-change events related to the Artifactory server.
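The capture command was along these lines (a sketch; substitute the real server address):

conntrack -E -p tcp --dst 192.168.artifactory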

A successful download looks like this

[UPDATE] tcp      6 60 SYN_RECV src=10.0.5.97 dst=192.168.artifactory sport=44434 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=1663
[UPDATE] tcp      6 86400 ESTABLISHED src=10.0.5.97 dst=192.168.artifactory sport=44434 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=1663 [ASSURED]
[UPDATE] tcp      6 120 FIN_WAIT src=10.0.5.97 dst=192.168.artifactory sport=44434 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=1663 [ASSURED]
[UPDATE] tcp      6 300 CLOSE_WAIT src=10.0.5.97 dst=192.168.artifactory sport=44434 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=1663 [ASSURED]
[UPDATE] tcp      6 30 LAST_ACK src=10.0.5.97 dst=192.168.artifactory sport=44434 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=1663 [ASSURED]
[UPDATE] tcp      6 10 CLOSE src=10.0.5.97 dst=192.168.artifactory sport=44434 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=1663 [ASSURED]
[DESTROY] tcp      6 src=10.0.5.97 dst=192.168.artifactory sport=44434 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=1663 [ASSURED]

From this log we can see that the solution mentioned in the first article, “NF_NAT_RANGE_PROTO_RANDOM_FULLY”, is in effect: within a single connection, the pod-side source port (44434) differs from the SNAT’d host-side port (1663).

The failed download looks like this

[UPDATE] tcp      6 60 SYN_RECV src=10.0.5.97 dst=192.168.artifactory sport=44436 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=53650
[UPDATE] tcp      6 86400 ESTABLISHED src=10.0.5.97 dst=192.168.artifactory sport=44436 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=53650 [ASSURED]
[UPDATE] tcp      6 10 CLOSE src=10.0.5.97 dst=192.168.artifactory sport=44436 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=53650 [ASSURED]
[DESTROY] tcp      6 src=10.0.5.97 dst=192.168.artifactory sport=44436 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=53650 [ASSURED]

Compared with the successful case, the connection gets a sudden CLOSE, so I guessed a RST packet arrived in the middle.

Inspired by the second article, I wanted to check whether such INVALID packets were present. I use an iptables firewall on all my Docker/Kubernetes hosts, introduced in this article: “Manage iptables firewall for Docker/Kubernetes”.

So I updated the firewall to add a log rule for INVALID packets.
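It was something like this (chain placement depends on your rule layout; the prefix matches the log lines below):

-A INPUT -m conntrack --ctstate INVALID -m limit --limit 5/min -j LOG --log-prefix "invalid: " --log-level 7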

And while running the test script, I got log lines like this when the failures happened:

kernel: [1676292.509871] invalid: IN=eth0 OUT= MAC=xx:xx SRC=192.168.artifactory DST=192.168.local LEN=23648 TOS=0x00 PREC=0x00 TTL=64 ID=30914 DF PROTO=TCP SPT=443 DPT=42235 WINDOW=505 RES=0x00 ACK URGP=0

Looking at my iptables firewall rules, they are like this:

### Allow my own hosts
# Allow bastion host
-A FILTERS -s 192.168.xx.xx/32 -j ACCEPT
# Allow artifactory
-A FILTERS -s 192.168.artifactory/32 -j ACCEPT
# Allow all hosts in the cluster
-A FILTERS -s 192.168.xx.xx/32 -j ACCEPT
### End my own hosts
# Reject everything else
-A FILTERS -m limit --limit 5/min -j LOG --log-prefix "iptables_INPUT_denied: " --log-level 7
-A FILTERS -j REJECT

So when a packet from the Artifactory server is detected as INVALID by conntrack, it is still ACCEPTed by the firewall.

Looking at my use case, the client (curl) initiates a network connection from inside the pod to an external server (Artifactory), so the traffic is of the “pod-to-external” kind. The second article explains how this works:

Pod-to-external
For the traffic that goes from pod to external addresses, Kubernetes simply uses SNAT. What it does is replace the pod’s internal source IP:port with the host’s IP:port. When the return packet comes back to the host, it rewrites the pod’s IP:port as the destination and sends it back to the original pod. The whole process is transparent to the original pod, who doesn’t know the address translation at all.

I can draw diagrams similar to the ones in the second article. Below are the conntrack tuples for the normal case and the sudden-close case of pod-to-external traffic.

# Normal case
src=10.0.5.97 dst=192.168.artifactory sport=44434 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=1663

# Sudden close case
src=10.0.5.97 dst=192.168.artifactory sport=44436 dport=443 src=192.168.artifactory dst=192.168.local sport=443 dport=53650

So when the INVALID packet passes through the firewall, conntrack does not rewrite its destination back to the pod’s IP address/port. Instead, the destination remains the host’s IP address, on a port nobody is listening on. The host sends back a RST packet, and the connection gets closed.

With the root cause clear, the solution is simple for me: drop the INVALID packets.
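In my FILTERS chain, the fix is a single added rule like this:

-A FILTERS -m conntrack --ctstate INVALID -j DROP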

The sequence is important: the DROP rule must be placed before the “Allow artifactory” rule.

Looking at my iptables rules again, an INVALID packet from anywhere, even one not from my Artifactory server, will hit the final REJECT rule and abort the connection. So this problem is not limited to downloads from my own Artifactory server; it can happen with any external service, whenever there are INVALID packets.

What if there aren’t any complex rules in iptables? An INVALID packet will then likely travel through the INPUT chain and hit the default ACCEPT policy, and a RST packet is sent back because nothing is listening on the un-rewritten port.

So it seems that no matter which firewall solution is used to protect a Docker/Kubernetes host, one rule equivalent to the one below is required.
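-A INPUT -m conntrack --ctstate INVALID -j DROP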

I mentioned Docker

I could reproduce the problem easily by starting a Docker container and then running the test script.

This is because Docker container-to-external traffic works the same way as Kubernetes pod-to-external traffic.

If the Docker host machine’s default iptables rules don’t drop INVALID packets, you have a good chance of reproducing the issue.

As far as I tried, the issue is reproducible in the environments below, with each release’s default iptables rules.

Docker-ce: 19.03.7
# Tested containers
docker run -it --name u1404 ubuntu:14.04 bash
docker run -it --name u1804 ubuntu:18.04 bash

Fedora/CentOS has the INVALID drop line in its default iptables rules, so I was not able to reproduce the problem there at first. I’m not familiar enough with Fedora firewall practice, but I imagine that if you have a group of your own servers, you may set up mutual full-trust rules among them, like “-s my_server -j ACCEPT”. I used a command from here to add such a trust rule, and after that the problem was reproducible.

Docker-ce: 19.03.7
# Tested containers
docker run -it --name u1804 ubuntu:18.04 bash
# Issue not reproducible with the default iptables rules, which include:
-A INPUT -m conntrack --ctstate INVALID -j DROP
# After adding a "trust rule", the problem is reproducible
firewall-cmd --zone=FedoraServer --add-rich-rule='rule family="ipv4" source address="192.168.artifactory" accept'

Summary

In this article I described my experience resolving a random network Connection Reset issue that happens when a container/pod in Docker/Kubernetes retrieves data from an external service. Pod-to-external traffic goes through Linux’s netfilter framework, where the iptables rules are hooked in. When a packet of such traffic is detected as INVALID (for whatever reason), it is delivered without the reverse NAT rewrite and causes the connection to be reset. The solution is to add one iptables rule to drop such INVALID packets.

Iptables rules can be very different on different systems. To check whether you have this problem, take my test script and modify the file/size to be downloaded. If the problem is there, adapt the solution to your system.

Note: one thing I didn’t figure out is why INVALID packets appear in my cloud environment in the first place, so maybe the problem does not affect every network.

Note: I found this Stack Overflow answer saying the iptables rule “-m state --state RELATED,ESTABLISHED -j ACCEPT” should always be followed by the rule “-m state --state INVALID -j DROP”. If I had known this earlier, I might never have hit this problem.

Note: I did try downloading a test file from the public network, e.g. the Azure speed test site, and could not reproduce the problem. My best guess is that the cloud provider’s firewall filters out INVALID packets from the public Internet for me, while traffic inside the cloud provider’s private network does not go through such a firewall.

Thanks for reading.

References:

DependenciesDownloaderHelper.java

JDK8 Source code

A reason for unexplained connection timeouts on Kubernetes/Docker

kube-proxy Subtleties: Debugging an Intermittent Connection Reset

The conntrack-tools user manual

Netfilter Kernel (Packet) Traversal

My test script
