Load Balancing gRPC for Kubernetes Cluster Services
Sample application demonstrating round-robin gRPC load balancing on Kubernetes for internal cluster services (e.g., type: ClusterIP services and headless services, which set clusterIP: None).
gRPC can send N messages over one long-lived HTTP/2 connection. A regular k8s Service balances at the connection level, so a single connection to the Service terminates at one pod. If the client sends N messages over that connection, that one pod handles all N of them, which leaves the load imbalanced.
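To make the connection-level problem concrete, here is a simplified Python model (not kube-proxy's actual code). It assumes a backend is picked at random once per connection, roughly what kube-proxy does, and that every request multiplexed on that connection reuses it. The pod names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical pod names standing in for the pods behind a ClusterIP Service.
PODS = ["pod-a", "pod-b", "pod-c"]


def route(num_connections, requests_per_connection, pods, rng):
    """Simplified model of connection-level balancing.

    A backend pod is chosen at random once per connection, and every
    request sent on that connection is handled by the same pod.
    """
    counts = Counter()
    for _ in range(num_connections):
        backend = rng.choice(pods)  # chosen once, when the connection opens
        counts[backend] += requests_per_connection  # all requests reuse it
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    # One gRPC connection carrying 10 requests: a single pod receives all 10.
    print(route(1, 10, PODS, rng))
    # Many connections with one request each: the load spreads across pods.
    print(route(30, 1, PODS, rng))
```

The gRPC case is the first call: one connection, many requests, one busy pod.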
One way to address this issue is to instead define the remote service as headless and then use gRPC's client-side load-balancing constructs.
In this mode, a DNS lookup of the k8s Service does not return a single destination for the Service. Instead, it returns multiple destination addresses, one for each ready pod backing the Service.
Given the set of IP addresses, the gRPC client will _itself_ send each RPC to a different pod and distribute the load evenly. To emphasize: gRPC clients have a built-in capability to do this load balancing. You just have to specify or override the resolver scheme (e.g., dns:///) and select the round_robin policy instead of the default pick_first.
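A minimal sketch of that client configuration in Python follows. It assumes grpcio as the client library. The "dns:///" target scheme and the "grpc.service_config" channel option with a loadBalancingConfig entry are standard gRPC features. The headless Service name be-srv-lb, the namespace, and port 50051 are placeholders, not values confirmed by the sample repo. The helper only builds the arguments, so it runs without a cluster.

```python
import json

# Hypothetical port; use whatever port the backend pods actually listen on.
DEFAULT_PORT = 50051


def headless_channel_args(service, namespace="default", port=DEFAULT_PORT):
    """Return (target, options) for a round-robin channel to a headless Service.

    - The "dns:///" scheme asks gRPC's DNS resolver to look the name up and
      hand every returned address to the load-balancing policy.
    - The service config selects round_robin; with the default pick_first
      policy the channel would still send everything to a single pod.
    """
    target = f"dns:///{service}.{namespace}.svc.cluster.local:{port}"
    service_config = {"loadBalancingConfig": [{"round_robin": {}}]}
    options = [("grpc.service_config", json.dumps(service_config))]
    return target, options


if __name__ == "__main__":
    target, options = headless_channel_args("be-srv-lb")  # name is an assumption
    print(target)
    print(options)
    # With grpcio installed, the channel would be created with:
    #   channel = grpc.insecure_channel(target, options=options)
```

Putting the policy in the service config keeps the choice on the client. The headless Service only has to publish one DNS record per pod.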
You can find the source here:
salrashid123/gcegrpc (github.com): gRPC client/server running load-balanced/failover on Google Compute Engine and Google App Engine
In the repo above, either create the image in
or use the one I've set up.
The image provides two endpoints: one that uses a ‘normal’ ClusterIP Service and one that uses a “Headless” Service.
/backend: makes 10 gRPC requests over one connection via the ClusterIP k8s Service. The expected responses all come from one pod.
/backendlb: makes 10 gRPC requests over one gRPC channel via the k8s Headless Service. The expected responses come from different pods.
Create a cluster
gcloud container clusters create cluster-grpc \
--zone us-central1-a
kubectl apply -f be-deployment.yaml \
-f be-srv-lb.yaml \
-f be-srv.yaml \
-f fe-deployment.yaml
Wait about 5 minutes until the Network Load Balancer IP is assigned.
Connect via k8s Service
Normally, you connect to a k8s Service using a DNS SRV lookup or directly via the provided service name. In the example below, it's be-srv.default.svc.cluster.local.
k8s returns a single IP (the Service's cluster IP) representing the backend. A gRPC client that connects to that IP has its connection terminated at ONE pod. So if the client sends 10 requests over that connection, the same pod handles all of them.
The following shows one connection, with all of the responses coming back from one host.
Note: all the responses are from one pod.
Connect via k8s Headless Service
Now connect and test with a “Headless” Service, which returns a set of pod IPs that the client connects to directly.
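You can see the difference between the two Services at the DNS level. The sketch below uses Python's standard socket.getaddrinfo to list the distinct IPs a name resolves to. Run from a pod in the cluster, the ClusterIP name should give one virtual IP, and the headless name should give one address per ready pod. The headless Service name be-srv-lb and port 50051 are assumptions.

```python
import socket


def resolve_ips(host, port):
    """Return the distinct IP addresses that name resolution gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # These names only resolve from inside the cluster; "be-srv-lb" is assumed.
    for name in ("be-srv.default.svc.cluster.local",
                 "be-srv-lb.default.svc.cluster.local"):
        try:
            print(name, resolve_ips(name, 50051))
        except socket.gaierror as err:
            print(name, "could not be resolved here:", err)
```

gRPC's DNS resolver performs the same kind of lookup. With round_robin, it opens a connection to each returned address.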
Given the set, the gRPC client automatically keeps track of the destination IPs and refreshes the list by re-resolving DNS (for example, when a connection to a pod is lost), so that it stays aware of which pods are available.
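The following Python toy model is not gRPC's implementation. It illustrates the behavior described above: cycle through the current address list one RPC at a time, and accept a new list when the resolver refreshes it. The pod IPs in the usage example are made up.

```python
import itertools


class RoundRobinPicker:
    """Toy model of a client-side round_robin policy.

    Each call to pick() returns the next address in turn; update() swaps in
    a freshly resolved address list (e.g. after a pod is added or removed).
    """

    def __init__(self, addresses):
        self.update(addresses)

    def update(self, addresses):
        if not addresses:
            raise ValueError("need at least one address")
        # Restart the rotation over the new set of addresses.
        self._cycle = itertools.cycle(list(addresses))

    def pick(self):
        return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical pod IPs standing in for the headless Service's A records.
    picker = RoundRobinPicker(["10.8.0.5", "10.8.1.7", "10.8.2.9"])
    print([picker.pick() for _ in range(6)])
    picker.update(["10.8.0.5", "10.8.2.9"])  # one pod went away
    print([picker.pick() for _ in range(4)])
```

Across 10 requests and 3 pods, this rotation gives a 4/3/3 split, which is the even distribution the /backendlb endpoint is meant to show.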
Now if you invoke the /backendlb endpoint, you'll see responses from different pods! :)
Note: responses are distributed evenly.
Some other references you may be interested in:
There are several other modes for gRPC on GKE and GCP. Below are some additional blog posts in this series.