Fix a random network Connection Reset issue in Docker/Kubernetes
This article describes my recent experience fixing a random network “Connection Reset” issue in CI/CD pipelines running in Docker/Kubernetes when downloading binaries from an external server.
I’d like to share it because I eventually realized this is a very common use case: whenever a Container/Pod running in Docker/Kubernetes retrieves data from external services, the random connection reset problem can happen.
My Jenkins pipelines run in Kubernetes clusters. Each build task runs in a Pod acting as a Jenkins worker. During the build, binaries are downloaded from a JFrog Artifactory server, and the build outputs are uploaded back to Artifactory for later use. All servers run in the cloud with private IP addresses, as shown in the diagram below. I described my Jenkins CI/CD pipeline architecture in this article.
The pipelines had been running fine for a few months, with only very rare build failures caused by “Connection Reset” when retrieving binaries from Artifactory. Recently, however, the failures increased to the point where I had to put effort into resolving them, and it took a non-trivial effort. A typical failure in the build log looked like this:
[Thread downloader_1] downloading /tmp/1581932211240-0/file1 as part of file https://artifact/artifactory/file threw an exception: java.io.IOException: Could not create or write to file: /tmp/1581932211240-0/file1
Caused by: java.net.SocketException: Connection reset
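Because the failure is random, it helps to measure how often it occurs before and after any change. The following is a minimal Python sketch of one way to do that from inside a Pod, and it is not what the pipeline itself runs. The helper names `fetch_once` and `count_resets` are hypothetical, and the URL in the usage comment is only the placeholder that appears in the log above.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_once(url: str, timeout: float = 30.0) -> int:
    """Download the whole response body and return the number of bytes read."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def count_resets(fetch: Callable[[], object], attempts: int) -> int:
    """Call fetch() `attempts` times and count how many calls hit a connection reset."""
    resets = 0
    for _ in range(attempts):
        try:
            fetch()
        except ConnectionResetError:
            # Reset while reading the response body.
            resets += 1
        except urllib.error.URLError as exc:
            # urllib wraps errors raised while connecting or sending the request.
            if isinstance(exc.reason, ConnectionResetError):
                resets += 1
            else:
                raise
    return resets


# Hypothetical usage (placeholder URL taken from the log above):
# resets = count_resets(lambda: fetch_once("https://artifact/artifactory/file"), 100)
# print(f"{resets} of 100 downloads were reset")
```

Passing the download as a callable keeps the counting logic independent of the HTTP client, so the same loop can wrap whatever tool the build actually uses.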