How three lines of configuration solved our gRPC scaling issues in Kubernetes

Jiri Luska · Published in Jamf Engineering · 5 min read · Mar 23, 2023

It all started with a question I asked our senior software engineer:
“Forget the speed of communication. Is it really better for you to develop communication in gRPC instead of REST?”
The answer I didn’t want to get came immediately: “Absolutely yes.”

Before I asked this question, I had been observing strange behavior of our service during rolling updates and, mostly, when scaling pods up. Most of our microservices have historically communicated via REST calls without any issues. We migrated some of these integrations to gRPC, mostly to get rid of the overhead of REST. Lately, we observed several issues that all pointed in the same direction: our gRPC communication. Of course, we followed the suggested practices for running gRPC without a service mesh in Kubernetes, like those described in this blog post: a headless Service object on the server side, client-side "round-robin" load balancing with DNS discovery in gRPC, and so on.
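For illustration, a minimal sketch of that client-side setup in gRPC-Go could look like the snippet below. The Service name, namespace and port are placeholders, not our actual configuration, and it assumes a plaintext connection:

  package main

  import (
      "log"

      "google.golang.org/grpc"
      "google.golang.org/grpc/credentials/insecure"
  )

  func main() {
      // The dns:/// scheme resolves every A record of the headless Service,
      // so each server pod becomes its own sub-connection.
      conn, err := grpc.Dial(
          "dns:///grpc-server.demo.svc.cluster.local:50051",
          grpc.WithTransportCredentials(insecure.NewCredentials()),
          // Spread RPCs across all resolved addresses instead of pinning to one.
          grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
      )
      if err != nil {
          log.Fatalf("dial failed: %v", err)
      }
      defer conn.Close()
      _ = conn // pass conn to the generated client stubs
  }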

Scaling pod count

Kubernetes internal load balancers do not balance RPCs but TCP connections. You can find more information on how Kubernetes balances TCP connections in my other blog post.
Layer 4 load balancers are common because they are simple and protocol agnostic. However, gRPC breaks the connection-level load balancing provided by Kubernetes. This is because gRPC is built on HTTP/2, and HTTP/2 is designed to maintain a long-lived TCP connection on which all requests can be active at the same time. This reduces the overhead of connection management. However, in this case connection-level balancing isn't very useful, because once the connection is established there is no more balancing to be done. All requests get pinned to the original destination pods, as shown below, until a new DNS discovery happens (with the headless service). And that won't happen until at least one of the existing connections breaks.

Example of the problem:

  1. 2 clients (A) call 2 servers (B).
  2. Autoscaler steps in and scales up clients.
  3. Server pods are overloaded, so the autoscaler steps in and scales up the server pod count, but no load balancing happens. We can even see no incoming traffic on the new pods.
  4. Clients are scaled down.
  5. Clients are scaled up again, but the load is still not balanced evenly.
  6. One server pod crashes due to overload, and rediscovery happens.
  7. Not shown in the picture, but when the pod comes back, it looks similar to picture 3, i.e. the new pod doesn’t receive traffic.
Example of gRPC balancing

2 lines of configuration solve this. Ehm, technically one line

As I mentioned before, we use client-side load balancing with DNS discovery and a headless Service object. Other options are proxy load balancing or implementing another discovery method that asks the Kubernetes API instead of DNS.

Apart from that, the gRPC documentation provides a Server-side Connection Management proposal, and we gave it a try.

Here are my suggestions for setting the following server parameters, with a Go code snippet for gRPC server initialization:

  • MAX_CONNECTION_AGE to 30 seconds. This time period is long enough for low-latency communication without an expensive and frequent connection establishment process. It also allows services to react relatively quickly to the existence of new pods, so the traffic distribution stays balanced.
  • MAX_CONNECTION_AGE_GRACE to 10 seconds. Defines the maximum time the connection is kept alive for outstanding RPCs to complete.
  grpc.KeepaliveParams(keepalive.ServerParameters{
      MaxConnectionAge:      time.Second * 30, // THIS one does the trick
      MaxConnectionAgeGrace: time.Second * 10,
  })
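For context, a self-contained sketch of where this option plugs into server initialization in gRPC-Go might look as follows. The port and the commented-out service registration are placeholders:

  package main

  import (
      "log"
      "net"
      "time"

      "google.golang.org/grpc"
      "google.golang.org/grpc/keepalive"
  )

  func main() {
      lis, err := net.Listen("tcp", ":50051")
      if err != nil {
          log.Fatalf("listen failed: %v", err)
      }

      srv := grpc.NewServer(
          grpc.KeepaliveParams(keepalive.ServerParameters{
              // Close connections periodically so clients rediscover new pods.
              MaxConnectionAge: time.Second * 30,
              // Give in-flight RPCs up to 10 seconds to finish before the forced close.
              MaxConnectionAgeGrace: time.Second * 10,
          }),
      )

      // pb.RegisterYourServiceServer(srv, &server{}) // register generated services here

      if err := srv.Serve(lis); err != nil {
          log.Fatalf("serve failed: %v", err)
      }
  }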

How it behaves in the real world:

Number of pods before and after application of gRPC configuration change
Network I/O activity observed in the new pod after gRPC configuration change

Here comes the third line

Scaling had been resolved, but another issue became more visible: gRPC code=UNAVAILABLE errors on the client side during rolling updates. Curiously, this was observed only during rolling updates, never during a single pod scaling event.

Number of gRPC errors during a rolling update

The rolling update procedure is simple: a new ReplicaSet is created, it creates a new pod, and when that pod is ready an old pod from the old ReplicaSet gets terminated, and so on. The time between the starts of consecutive pods was 15 seconds. What we know about gRPC DNS rediscovery is that it starts only if the old connection breaks or ends with a GOAWAY signal. So the clients started a new rediscovery every 15 seconds but got obsolete DNS records. Then they repeated the rediscovery until it succeeded.

It’s always DNS . . . except when it’s not

DNS TTL caching is almost everywhere. Infrastructure DNS has its own cache. Java clients suffered from their default 30-second TTL cache more than Go clients, which usually have no DNS cache implemented. Go clients reported just a small number of occurrences of this issue, whereas Java clients reported hundreds or thousands. Of course, we could shorten the TTL cache, but why do that when it affects only gRPC during rolling updates?

Luckily, there is an easy-to-implement workaround, or rather a solution: set up a 30-second delay when a new pod starts:

.spec.minReadySeconds = 30

The Kubernetes Deployment specification allows us to set a minimum amount of time a new pod must be in the ready state before the rollout terminates the old pod. After this time, the old connection is terminated, gRPC clients receive the GOAWAY signal and start rediscovery. By then the TTL has already expired, so the clients get new, up-to-date records.

Conclusion

gRPC is like a Swiss Army knife in terms of configuration and may not fit your infrastructure or application by default. Go through the documentation, tune it, experiment, and get the most from what you already have. I believe reliable and resilient communication should be your end goal.

I also suggest looking at:

  • Keepalives. They don't make much sense for short-lived internal cluster connections, but they could be handy in some other cases.
  • Retries. Sometimes it is worth retrying first with some backoff instead of overloading the infrastructure by attempting to create new connections; see the sketch after this list.
  • Code mapping. Map your gRPC response codes to well-known HTTP codes to better understand what is going on.
  • Load balancing. Balance is the key. Don't forget to set up backoff and do thorough testing.
  • Server access logs (gRPC code=OK) may be too verbose because they are set to info level by default. Consider lowering them to debug level and filtering them out.
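As a rough illustration of the retry point, a retry policy can be declared through the client's default service config in gRPC-Go. This is only a sketch: the service name, attempt count and backoff values are placeholder assumptions, and it assumes a gRPC-Go version with retry support enabled. It builds on the dial example shown earlier and reuses its imports:

  // Retry UNAVAILABLE responses with exponential backoff instead of giving up immediately.
  conn, err := grpc.Dial(
      "dns:///grpc-server.demo.svc.cluster.local:50051",
      grpc.WithTransportCredentials(insecure.NewCredentials()),
      grpc.WithDefaultServiceConfig(`{
          "methodConfig": [{
              "name": [{"service": "demo.OrderService"}],
              "retryPolicy": {
                  "maxAttempts": 3,
                  "initialBackoff": "0.1s",
                  "maxBackoff": "1s",
                  "backoffMultiplier": 2.0,
                  "retryableStatusCodes": ["UNAVAILABLE"]
              }
          }]
      }`),
  )
  if err != nil {
      log.Fatalf("dial failed: %v", err)
  }
  defer conn.Close()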
