Don’t Load Balance GRPC or HTTP2 Using Kubernetes Service

Junrui Chen
6 min read · Mar 3, 2024


Load balancing for GRPC and HTTP2 does not work out of the box on L4 proxies. But there are multiple ways to improve it.

Introduction

Both GRPC and HTTP2 are great protocols. With persistent TCP connections and multiplexing, HTTP2 significantly improves client-side concurrency and performance. GRPC runs on top of HTTP2 with a clean service definition and great tooling to generate code.

But when dealing with service-to-service connections for GRPC and HTTP2, load balancing becomes trickier. I started writing this post because I found that load balancing for GRPC and HTTP2 is usually not what you expect, and sometimes does not work out of the box.

The problem with the Kubernetes Service

A common scenario in Kubernetes: we have a GRPC client service that needs to call another GRPC server service in the same namespace. We will naturally use the Kubernetes Service, like grpc-server.namespace.svc.cluster.local, to address the server, right?
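For reference, this is just an ordinary ClusterIP Service. Here is a minimal sketch; the name, selector, and port are assumptions for illustration:

apiVersion: v1
kind: Service
metadata:
  name: grpc-server
  namespace: namespace
spec:
  selector:
    app: grpc-server      # assumed label on the server pods
  ports:
    - name: grpc
      port: 9090
      targetPort: 9090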

But this will not load balance the requests. Look at the diagram below.

(diagram: GRPC with Kubernetes Service)

Now, the Kubernetes Service resolves to a single cluster IP, 10.0.0.20, which acts as the proxy for the upstream server pods. The client can only see 10.0.0.20 as the upstream.

For GRPC and HTTP2, the client will only establish a single TCP connection to 10.0.0.20. The Kubernetes Service, being a TCP/UDP passthrough proxy, will forward that connection to a random server pod.

But no matter how many server pods we have, one client pod will only create one TCP connection to one server pod.

This is no load balancing at all! If you have more server pods than client pods, some server pods will always sit idle!

The root cause

This is simply because the Kubernetes Service is an L4 (TCP/UDP) proxy. All it sees is a single TCP connection; it has no idea how many GRPC/HTTP2 requests are multiplexed on top of it.

This actually applies to all other L4 proxies. If you use an L4 proxy at any hop in your GRPC/HTTP2 request path, you can expect poor load balancing behavior, especially when the number of clients is small.

OK, now let's look at some better solutions for this issue. The official gRPC blog already gives a comparison between the different approaches.

Here we will talk about implementations more specific to Kubernetes.

Solution 1: headless service with client-side load balancing

We can use a headless Kubernetes Service, for which Kubernetes no longer allocates a cluster IP. Instead, the DNS name simply resolves to all the server pod IP addresses.
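A minimal sketch of such a headless Service follows; the only change from a normal Service is clusterIP: None (the name, selector, and port are again assumptions):

apiVersion: v1
kind: Service
metadata:
  name: grpc-server
  namespace: namespace
spec:
  clusterIP: None         # headless: DNS returns the ready pod IPs directly
  selector:
    app: grpc-server
  ports:
    - name: grpc
      port: 9090
      targetPort: 9090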

This way, the client service knows about all the servers and can implement a load balancing strategy in code (like round robin). For example:

package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	conn, err := grpc.Dial(
		"dns:///grpc-server.namespace.svc.cluster.local:9090",
		grpc.WithTransportCredentials(insecure.NewCredentials()), // plaintext, replaces the deprecated grpc.WithInsecure()
		// Round robin across all resolved pod IPs; grpc.WithBalancerName
		// was removed from grpc-go, the service config is the current way.
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("failed to dial: %v", err)
	}
	defer conn.Close()
}

In the connection string we need to use the dns scheme, because the default passthrough scheme hands the address straight to the dialer: the name is resolved only once when the connection is established and is never re-resolved, so client connections are never updated.

This works well because Kubernetes has a great service discovery mechanism built in. For example, if one of the server pods is unhealthy and failing its readiness probe, its IP will be removed from the DNS. But keep in mind, this mechanism is not available out of the box outside Kubernetes.
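As a sketch of that wiring, recent Kubernetes versions can even probe GRPC natively. The image name and port here are assumptions, and the server must implement the standard gRPC health checking protocol:

# fragment of the server Deployment's pod spec
containers:
  - name: grpc-server
    image: example/grpc-server:latest   # hypothetical image
    ports:
      - containerPort: 9090
    readinessProbe:
      grpc:
        port: 9090                      # native GRPC probe (GA since Kubernetes 1.27)
      periodSeconds: 5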

But it also relies heavily on the client implementation, and that can get tricky. For example, the DNS records must not be cached anywhere, or we will not get the latest server IP updates.

Solution 2: use ingress nginx

Another way is to simply use an ingress to address the GRPC server.

The good thing is that by default ingress nginx doesn't use the Kubernetes Service as the upstream. Instead, the controller subscribes to the ready endpoints of the Kubernetes Service and updates the nginx upstream configuration when they change. Since nginx is an L7 proxy and has supported GRPC/HTTP2 since 2018, it load balances GRPC/HTTP2 requests properly.

You just need to add an annotation to your ingress to tell ingress nginx to use grpc_pass instead of proxy_pass. For example:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"

There are several caveats, though.

First of all, ingress nginx doesn't support GRPC over plaintext, because of the limitation that nginx cannot run h2c and HTTP/1 on the same port. It seems this was resolved in nginx 1.25, and ingress-nginx just released a version based on nginx 1.25, so this limitation might already be gone.

Secondly, by default, ingress nginx enables the reuse-port nginx option. This option was introduced by nginx to improve performance, but it makes requests less than perfectly balanced.

(diagrams: reuse-port disabled vs. reuse-port enabled)

When reuse-port is enabled, each nginx worker process establishes its own TCP connections to the upstreams. From each worker's own perspective nginx still uses round robin, so each worker is balanced in isolation. But looking at the whole picture, it gets a bit worse. For example, if requests are spread perfectly across workers (worker 1 takes request 1, worker 2 takes request 2, and so on), then requests 1 and 2 both go to server 1, requests 3 and 4 to server 2, and so on. If there are a lot of nginx workers, all of their requests will go to server 1 before any reach server 2.

While most of the time reuse-port is fine and should stay enabled for performance, it is something to be aware of. If the skew matters, the option can be turned off via the controller's ConfigMap, as sketched below.

reference: https://github.com/kubernetes/ingress-nginx/issues/4054
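A minimal sketch of disabling it; the ConfigMap name and namespace are assumptions that depend on how the controller was installed:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumed; match your installation
  namespace: ingress-nginx
data:
  reuse-port: "false"              # ingress-nginx defaults this to "true"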

Solution 3: use service mesh

Another way is to use a service mesh like istio. A service mesh works like nginx in that it is also an L7 proxy, but instead of a centralized entrypoint, it deploys the proxy as a sidecar container alongside every application container.
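With istio, for example, the sidecar is typically injected per namespace via a label. A minimal sketch, assuming istio's standard injection mechanism:

apiVersion: v1
kind: Namespace
metadata:
  name: namespace
  labels:
    istio-injection: enabled   # istio's mutating webhook injects the proxy into new pods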

Service mesh is a pretty good idea. In the VM era especially, tools like Consul helped a lot with automated service discovery. But in Kubernetes, service discovery is already built in, so a service mesh somewhat overlaps with the built-in functionality.

While a service mesh provides a lot of other great functionality, like tracing and circuit breaking, there are several caveats.

  • A service mesh is not a trivial thing to manage, and it is not as stable as Kubernetes itself. For example, the sidecar is injected by a mutating webhook; if the API server cannot call the webhook (e.g. due to some network failure) or the webhook fails, the pod cannot be created.
  • A service mesh still doesn't handle pod termination properly (e.g. the sidecar needs to be terminated after the application container terminates), and thus it doesn't work properly with Kubernetes Jobs. This will potentially get better as Kubernetes officially introduces sidecar support.
  • A service mesh has more overhead and is more costly. Because we need to inject a sidecar into every pod, and request some CPU/memory for each sidecar, it adds up to much more resources than a single ingress nginx deployment. This can potentially be mitigated with eBPF and sidecarless meshes.

Summary

Load balancing GRPC/HTTP2 is quite tricky, but there are several ways to do it.

  • Still use an L4 proxy: yes, you can still use an L4 proxy as long as there are many more clients than servers, e.g. when a public TCP proxy directly faces users.
  • Use ingress nginx: an additional hop and still not perfectly balanced, but nginx is solid, and you get request logs for free.
  • Use a Kubernetes headless service: this requires changes to the client, and the DNS handling can be tricky.
  • Use a service mesh: a major overhead for the whole cluster, and it currently has certain limitations.

Ironically, while we discussed several solutions, in our case we still prefer the plain Kubernetes Service or ingress nginx, depending on the scenario.

