Debugging a Strange Kubernetes & Firebase Connection Reset Issue
A deep dive into Kubernetes, Firebase, and Cloud NAT networking
Background — ECONNRESET
At Leverege, we provide an IoT platform for building and managing large-scale IoT applications. From asset tracking to remote monitoring, millions of sensors communicate with our platform every day. The Leverege IoT platform is made up of microservices running on Google Kubernetes Engine (GKE) and uses a combination of Firebase, TimescaleDB, and BigQuery for data processing. In particular, we use Firebase to store real-time and state information about our devices.
Our networking problem began with occasional connection resets and timeouts on our API server as our automotive use case began to scale significantly. At first, the connection issues only manifested during times of high traffic, so we suspected a performance issue on our servers. But even after we allowed the servers to scale to more replicas and ran more child processes to handle requests, the ECONNRESET errors continued.
When we ran a load test at five times the production traffic, the resets occurred more frequently, and at a…