Experiences with IP Address Shortage on EKS Clusters
In a fast-growing company like Compass, things can become challenging for Cloud infrastructure teams. As we serve more and more customers, many backend services have scaled up in our Kubernetes clusters, and a variety of new backend services have come online to satisfy new requirements.
Recently, a big challenge for our Cloud Engineering team at Compass has been a shortage of IP addresses in some of our Kubernetes clusters, which are managed by AWS EKS. I would like to share our experience of troubleshooting the issue, investigating its causes, exploring solutions, and mitigating it.
Problem Found
We first noticed the problem when some teams reported transient failures during deployments in the staging environment. The logs revealed why the deployments were failing:
Warning FailedCreatePodSandBox 17m kubelet, ip-10-0-0-100.ec2.internal Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "a7c7ce835d262d7a3fd4ab94c66376e0266c03ba2fc39365cb108282f440b01a" network for pod "internal-backendserver-deployment-74f49769c5-nkdpn": networkPlugin cni failed to set up pod "internal-backendserver-deployment-74f49769c5-nkdpn_default" network: add cmd: failed to assign an IP address to container
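Failures like this can be surfaced across a whole namespace with `kubectl get events`, or programmatically. Below is a minimal sketch using the official kubernetes Python client; it assumes a working kubeconfig, and the `default` namespace matches the pod in the log above.

```python
# Surface CNI sandbox failures in a namespace, roughly equivalent to
# filtering the output of `kubectl get events`.
from kubernetes import client, config

config.load_kube_config()  # assumes credentials in ~/.kube/config
v1 = client.CoreV1Api()

# List events in the affected namespace and keep only warnings about
# failed pod sandbox creation, which is where IP exhaustion shows up.
for event in v1.list_namespaced_event("default").items:
    if event.type == "Warning" and event.reason == "FailedCreatePodSandBox":
        print(event.involved_object.name, event.message)
```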
Therefore, I checked the VPC of the Kubernetes cluster in the AWS console and found that the number of available IP addresses in some private subnets had dropped to 0.
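The same check can be scripted instead of clicking through the console. Here is a rough boto3 sketch that lists each subnet in a VPC with its remaining free IPs; the VPC ID is a placeholder, and the region and credentials are assumed to come from the environment.

```python
# List each subnet in the cluster's VPC, sorted by how many free IP
# addresses remain, so exhausted subnets show up first.
import boto3

ec2 = boto3.client("ec2")
VPC_ID = "vpc-0123456789abcdef0"  # placeholder: replace with the cluster's VPC ID

subnets = ec2.describe_subnets(
    Filters=[{"Name": "vpc-id", "Values": [VPC_ID]}]
)["Subnets"]

for s in sorted(subnets, key=lambda s: s["AvailableIpAddressCount"]):
    print(s["SubnetId"], s["CidrBlock"], s["AvailableIpAddressCount"])
```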