Challenges of running gRPC services in production (part 2)
Almost a year ago, I wrote about the challenges of running gRPC services in production and how we handled them at Incognia (if you haven’t read it, I suggest you do, as this post builds on it). Since then, the gRPC ecosystem has evolved quite a bit, from new service discovery protocols to first-class support in proxies.
All this new tech allowed us to simplify the way we deploy gRPC services and how they communicate with each other while keeping everything reliable and highly performant (must-have qualities for Incognia, given that we handle requests from more than 100 million devices). The main topic of this post is the evolution of our gRPC architecture over the last year.
A major issue when running gRPC in production is load balancing, due to the persistent nature of its connections (gRPC uses HTTP/2). In the previous post, we showed how we used Envoy together with external-dns (with AWS Route 53) to balance load evenly across multiple server instances (in our case, Kubernetes pods).
While this approach worked nicely, it required some configuration in the server’s deployment to handle DNS cache TTL correctly and some additional security group configuration for communication between multiple Kubernetes clusters.
After some research, we found Contour, a Kubernetes ingress controller that uses Envoy under the hood. Since we were already using Envoy directly and had no issues with it, we decided to give Contour a try.
Basically, Contour watches for changes to Kubernetes Ingress resources and updates the underlying Envoy configuration accordingly, as the following diagram shows.
While deploying Contour is quite straightforward, the recommended setup runs Contour as a Deployment and Envoy as a DaemonSet (one Envoy instance per Kubernetes node). This proved inefficient for our workload: we need to scale Envoy up and down as load changes throughout the day, independently of the number of Kubernetes nodes.
For example, in a Kubernetes cluster with 5 nodes, a DaemonSet forces us to run 5 Envoy instances. Running Envoy as a Deployment instead lets us scale it with an HPA (Horizontal Pod Autoscaler), so we run fewer Envoy pods when the request count is low and more when it is high.
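As a rough illustration of that setup, an HPA targeting the Envoy Deployment could look like the sketch below (names, namespace, replica bounds, and the CPU target are all hypothetical; the actual values depend on the cluster and traffic profile):

```yaml
# Illustrative HPA for an Envoy Deployment (all names/values are assumptions).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: envoy
  namespace: projectcontour
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: envoy          # Envoy running as a Deployment, not a DaemonSet
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
```

With this in place, the number of Envoy pods tracks request load rather than node count.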
gRPC Connection Configuration
To understand how we should configure gRPC connections on Envoy, let’s take a look at the whole flow, including DNS details.
We have two DNS entries: one for the gRPC server (my-api.mydomain.com) and one for Contour (contour.mydomain.com). contour.mydomain.com points to the Envoy pods through a headless service, while my-api.mydomain.com is simply a CNAME to contour.mydomain.com. The flow is detailed in the following diagram.
In the first step of the diagram, a gRPC client calls a gRPC method using my-api.mydomain.com as the target. The name resolves to one of the Envoy pods, which then handles every call made through that connection. Because the connection is persistent, a single Envoy pod would keep serving this client indefinitely, which is not ideal.
To avoid this, we set a maximum connection duration between the client and Envoy (currently 5 minutes, though the right value depends heavily on the use case). This forces the client to periodically open a new connection, likely to a different Envoy pod, allowing for better load balancing between the client and Envoy. Below are the configs used in contour.yaml.
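A minimal sketch of that setting, assuming Contour's configuration file format with its timeouts section (the 5m value mirrors the duration mentioned above; treat the exact key as an assumption to verify against the Contour version in use):

```yaml
# contour.yaml (Contour configuration file) — illustrative sketch.
timeouts:
  # Close client connections after 5 minutes so clients reconnect,
  # likely landing on a different Envoy pod.
  max-connection-duration: 5m
```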
In addition, for cases in which the number of client pods is low, we configure client-side load balancing between the client pod and Envoy to avoid idle Envoy pods. Below is an example of how we create grpc-java’s ManagedChannel (the builder chain shown here is an illustrative reconstruction; the key pieces are the DNS target, which resolves to all Envoy pod IPs, and the round_robin policy that spreads calls across them):

```java
ManagedChannel channel = ManagedChannelBuilder
        .forTarget("dns:///my-api.mydomain.com:443") // resolve all Envoy pod IPs
        .defaultLoadBalancingPolicy("round_robin")   // spread calls across them
        .useTransportSecurity()
        .build();
channel.getState(true); // create connection on initialization
MyApiBlockingStub stub = MyApiGrpc.newBlockingStub(channel);
```
In the second step of the diagram, the Envoy pod chooses a server pod that can handle the request and forwards the gRPC call to it. Envoy knows which pods can handle the request from the DNS name provided by the client (my-api.mydomain.com), which is the same host defined in the server’s Ingress.
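To make that mapping concrete, the server’s Ingress could look roughly like this (a hypothetical sketch; the service name, port, and API version are assumptions, and the essential part is that the host matches the DNS name clients use):

```yaml
# Hypothetical Ingress for the gRPC server; Envoy routes on the host header.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-api
spec:
  rules:
    - host: my-api.mydomain.com   # same name the gRPC client dials
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-api      # Service fronting the gRPC server pods
                port:
                  number: 50051
```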
With this architecture, we simplified our previous approach by centralizing configuration (such as gRPC connection timeouts) in Contour/Envoy. Clients become simpler while calls stay highly performant: in our tests, Envoy added about 0.3 ms of overhead, which is negligible for most scenarios.