Load Balancing gRPC for Kubernetes Cluster Services

Sample application demonstrating round-robin gRPC load balancing on Kubernetes for internal cluster services (e.g., type: ClusterIP and headless services with clusterIP: None)

gRPC can stream N messages over one connection. When k8s services are involved, a single connection to the destination terminates at one pod. If the client sends N messages, all N are handled by that pod, resulting in imbalanced load.

One way to address this issue is to instead define the remote service as Headless and then use gRPC's client-side load balancing constructs.

In this mode, a lookup request for the k8s Service does not return a single destination address but instead the addresses of the individual pods behind the Service.

Given the set of IP addresses, the gRPC client will _itself_ send each RPC to a different pod and distribute the load evenly. To emphasize: gRPC clients have a built-in capability to do this LB. You just have to specify or override the scheme!
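As a rough illustration of "specifying the scheme", the sketch below builds the two strings a Go client would typically hand to grpc-go: a dns:/// target and a service config that selects the round_robin policy. The host name and port are assumptions, and the helper names dnsTarget and roundRobinServiceConfig are made up for this example. The sketch deliberately sticks to the standard library, so the actual dial call appears only in a comment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dnsTarget builds a gRPC target URI that selects the DNS name resolver,
// so the client sees every address behind the name, not just the first.
func dnsTarget(host string, port int) string {
	return fmt.Sprintf("dns:///%s:%d", host, port)
}

// roundRobinServiceConfig returns a JSON service config that asks the
// client to use the round_robin load balancing policy.
func roundRobinServiceConfig() (string, error) {
	cfg := map[string]any{
		"loadBalancingConfig": []map[string]any{
			{"round_robin": map[string]any{}},
		},
	}
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Assumed headless Service name and port (hypothetical values).
	target := dnsTarget("be-srv-lb.default.svc.cluster.local", 50051)
	cfg, err := roundRobinServiceConfig()
	if err != nil {
		panic(err)
	}
	fmt.Println("target:        ", target)
	fmt.Println("service config:", cfg)

	// With grpc-go these would be passed along the lines of:
	//   grpc.Dial(target, grpc.WithDefaultServiceConfig(cfg), ...)
	// plus whatever transport credentials the deployment uses.
}
```

Without the round_robin policy, grpc-go falls back to pick_first, which connects to a single address and so would not spread the RPCs even against a headless Service.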

You can find the source here:

Setup Application

In the repo above, either build the image yourself (cd ~app/http_frontend) or use the one I've set up: docker.io/salrashid123/http_frontend

The image provides two endpoints: one using a "normal" ClusterIP Service and one using a "Headless" Service.

  • /backend: makes 10 gRPC requests over one connection via a 'ClusterIP' k8s Service. The expected responses all come from one pod
  • /backendlb: makes 10 gRPC requests via a k8s Headless Service. The expected responses come from different pods

Create a cluster

gcloud container clusters create cluster-grpc \
--zone us-central1-a \
--num-nodes 4


kubectl apply -f be-deployment.yaml  \
-f be-srv-lb.yaml \
-f be-srv.yaml \
-f fe-deployment.yaml \
-f fe-srv.yaml

Wait ~5 mins until the Network Load Balancer IP is assigned

Connect via k8s Service

Normally, you connect to a k8s service using DNS SRV or directly via the provided serviceName. In the example below, it's be-srv.default.svc.cluster.local

type: ClusterIP

k8s will return a single IP representing the backend. A gRPC client that connects to that IP will terminate the connection at ONE pod. So if the client sends 10 requests over that connection, they will all terminate at the same pod.

The following shows one connection, with all the responses coming back from one host.

Note: all the responses are from one pod

Connect via k8s Headless Service

Now connect and test with a "Headless" Service, which returns a set of pod IPs to connect to directly.

Given the set, the gRPC client will automatically keep track of the destination IPs and periodically refresh the list (to stay aware of pod health).
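The idea of rotating over a list that can be refreshed can be sketched in a few lines of Go. This is a toy model for illustration only, not grpc-go's actual balancer; the roundRobin type and the pod addresses are made up.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin hands out addresses in rotation and lets the address list be
// replaced when a fresh DNS answer arrives.
type roundRobin struct {
	mu    sync.Mutex
	addrs []string
	next  int
}

// Update swaps in a new address list, e.g. after a pod is added or removed.
func (r *roundRobin) Update(addrs []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.addrs = append([]string(nil), addrs...) // copy so callers cannot mutate it
	r.next = 0
}

// Pick returns the next address in rotation, or false if none are known.
func (r *roundRobin) Pick() (string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.addrs) == 0 {
		return "", false
	}
	a := r.addrs[r.next%len(r.addrs)]
	r.next++
	return a, true
}

func main() {
	rr := &roundRobin{}

	// Hypothetical pod endpoints returned by the headless Service.
	rr.Update([]string{"10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"})
	counts := map[string]int{}
	for i := 0; i < 9; i++ {
		if a, ok := rr.Pick(); ok {
			counts[a]++
		}
	}
	fmt.Println("before refresh:", counts)

	// A later refresh shows one pod gone; new picks only use the survivors.
	rr.Update([]string{"10.0.0.1:50051", "10.0.0.3:50051"})
	counts = map[string]int{}
	for i := 0; i < 4; i++ {
		if a, ok := rr.Pick(); ok {
			counts[a]++
		}
	}
	fmt.Println("after refresh: ", counts)
}
```

A real client also tracks connection state per address, so it can skip a pod whose connection has failed rather than wait for the next refresh.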

Now if you invoke the /backendlb endpoint, you’ll see the responses from different pods!! :)

Note: responses are distributed evenly.


Some other references you may be interested in:

There are several other modes for gRPC on GKE and GCP. Below are some additional blog posts in this series