Balancing gRPC Traffic in K8S Without a Service Mesh

Andrés Pérez
Published in The Startup · Sep 21, 2020 · 6 min read

One of the challenges some users (like me) face when implementing gRPC services in a Kubernetes cluster is achieving proper load balancing. Before diving into how to balance gRPC traffic, we first need to answer a question: why do I need to balance the traffic if Kubernetes already does that job?

This article focuses on Kubernetes and Golang.

Why is gRPC traffic not properly balanced in Kubernetes?

The main reason it is difficult to balance gRPC traffic is that people treat gRPC as if it were HTTP, and that is where the problem begins. By design they are different: while HTTP typically creates and closes connections per request, gRPC operates over the HTTP/2 protocol, which works over a long-lived TCP connection. This makes balancing harder, since multiple requests go through the same connection thanks to multiplexing. However, this is not the only reason balancing issues happen when configuring gRPC services in Kubernetes; these are some of the common mistakes:

  • Wrong gRPC client configuration
  • Wrong Kubernetes service configuration

Wrong gRPC client configuration

The common case when setting up a gRPC client is to use the default configuration, which works perfectly for a 1-1 connection type; however, in a production environment it does not behave the way we would like. The reason is that the default gRPC client lets you connect with a simple IP/DNS record, which creates just one connection to the target service.

That’s why a different setup is needed to connect to multiple servers, so we move the connection type from 1-1 to 1-N.

Default setup

package main

import (
    "context"
    "log"

    "google.golang.org/grpc"
)

func main() {
    // Dial the target directly: this resolves to a single address and
    // keeps one long-lived connection to it.
    conn, err := grpc.Dial("my-domain:50051", grpc.WithInsecure())
    if err != nil {
        log.Fatalf("error connecting with gRPC server: %v", err)
    }
    defer conn.Close()

    // test is the generated proto client package for the example service.
    cli := test.NewTestServiceClient(conn)
    rs, err := cli.DoSomething(context.Background(), ...)
    // ...
}

New setup

package main

import (
    "fmt"
    "log"

    "google.golang.org/grpc"
    "google.golang.org/grpc/balancer/roundrobin"
)

func main() {
    // Target of the form scheme:///endpoint; the dns scheme makes the client
    // resolve the domain and keep watching it for changes.
    address := fmt.Sprintf("%s:///%s", "dns", "my-domain:50051")
    conn, err := grpc.Dial(address,
        grpc.WithInsecure(),
        grpc.WithBalancerName(roundrobin.Name))
    if err != nil {
        log.Fatalf("did not connect: %v", err)
    }
    defer conn.Close()
    // ...
}

There are two major changes to look at here:

  • the address: the final parsed address will look like dns:///my-domain:50051. This format is used because the Dial function accepts a target of the form Scheme://Authority/Endpoint; in our case I am skipping the authority. I added dns as the scheme because I want to resolve a domain and keep watching for changes to it. The resolver options are passthrough (the default), dns, and manual; more details here.
  • the balancer option: in case our client gets connected to multiple servers, our gRPC client is now able to balance the requests according to the chosen balancing algorithm.

Summing up, our gRPC client is now able to create different connections if and only if the domain resolves to multiple A or AAAA records, and not just that: it is now able to balance the requests evenly across the different servers.
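
As a side note, newer grpc-go versions have deprecated grpc.WithBalancerName, so if you are on a recent release you can request the same round-robin behavior through the default service config. This is a minimal sketch of that variant, assuming a grpc-go version that supports grpc.WithDefaultServiceConfig:

package main

import (
    "log"

    "google.golang.org/grpc"
)

func main() {
    // Same dns scheme as before; round_robin is requested via the service
    // config instead of the deprecated WithBalancerName option.
    conn, err := grpc.Dial("dns:///my-domain:50051",
        grpc.WithInsecure(),
        grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`))
    if err != nil {
        log.Fatalf("did not connect: %v", err)
    }
    defer conn.Close()
    // ...
}

The JSON string is the standard gRPC service config; everything else stays the same as in the new setup above.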

Now let’s see the missing piece in the puzzle to make it work with Kubernetes.

Wrong Kubernetes service configuration

Creating a service in Kubernetes is pretty straightforward: we just need to define the service name, the ports, and the selector so the service can group the pods dynamically and automatically balance the requests, like so:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
    - name: grpc
      protocol: TCP
      port: 50051
      targetPort: 50051

So what is the problem with the previous setup? It is simply that the default Kubernetes service creates a DNS record linked to one single IP. Thus, when you run something like nslookup my-service.{namespace}.svc.cluster.local, what is returned is a single IP, which makes the connection graph of a typical gRPC implementation look like this:

e.g. connection graph with a default Kubernetes service

The green line represents the active connection with the client; the yellow ones are the pods that are not active. The client creates a persistent connection with the Kubernetes service, which in turn creates the connection with one of the pods, but this does not mean the service is not connected with the rest of the pods.

Let’s solve it using a headless service:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  clusterIP: None # this is the key
  selector:
    app: my-app
  ports:
    - name: grpc
      protocol: TCP
      port: 50051
      targetPort: 50051

After creating the headless service, the nslookup output looks a bit different: now it returns the records associated with it (the IPs of the pods grouped into the service), giving the gRPC client better visibility of the number of servers that need to be reached.
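
If you want to double-check what the client will see, you can also resolve the service name directly from Go. This is a minimal sketch, assuming a hypothetical my-namespace namespace and that the program runs inside the cluster:

package main

import (
    "fmt"
    "log"
    "net"
)

func main() {
    // Same lookup the dns resolver performs under the hood.
    ips, err := net.LookupHost("my-service.my-namespace.svc.cluster.local")
    if err != nil {
        log.Fatalf("lookup failed: %v", err)
    }
    for _, ip := range ips {
        fmt.Println(ip)
    }
}

With the default ClusterIP service this prints a single virtual IP; with the headless service it prints one IP per ready pod.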

Now that you have seen the gRPC client configuration, you can see why it is important that the Kubernetes service returns the IPs of the whole set of pods: it gives the client visibility of all the servers it needs to establish connections with. There is one caveat you have probably already noticed at this point: the balancing responsibility now lives on the client side, not on the Kubernetes side, and the main task we need from Kubernetes is to keep the list of pods associated with the service up to date.

e.g. connection graph with a headless Kubernetes service

As you can see in the picture, the connection flow changes a little bit: now we do not go through the Kubernetes service to reach a pod; instead we use the Kubernetes service to retrieve the list of pods linked to the domain, and then we connect directly to the pods. But do not be scared about connecting directly to the pods: since we set the dns resolver type in the client, it will keep watching for changes against the headless service and will keep the connections to the available pods up to date.

Why not use a service mesh then?

If you can, please do it. With a service mesh all of this setup is transparent, and the best part is that it is language agnostic. The key difference is that a service mesh leverages the sidecar pattern and a control plane to orchestrate the inbound and outbound traffic; it also has visibility of the whole network and of the traffic type (HTTP, TCP, etc.), so it is able to balance the requests properly. In a nutshell, if you are not using a service mesh, you need either to connect to multiple servers directly from each client or to connect to an L7 proxy that helps to balance the requests.

Bonus

Although the previous setup works, I had a problem re-balancing the connections when pod rotation or scale-up happened in the cluster (using Alpine Linux images). After some research I realized I was not the only one with this kind of problem; check here and here for some related GitHub issues. That’s why I decided to create my own resolver, which you can take a look at here. The custom resolver I created is very basic but functional: the gRPC clients were able to listen for domain changes again, since the library adds a configurable listener that performs a lookup against the domain every X period of time and updates the set of IPs available to the gRPC connection manager. You are more than welcome to contribute.
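
To make the idea concrete, here is a minimal sketch of such a resolver (not my library, just the shape of the idea): it registers a custom scheme with grpc-go, re-resolves the headless service on a fixed interval, and pushes the fresh address list to the connection manager. It assumes a 2020-era grpc-go where resolver.Target exposes an Endpoint field (newer releases use an Endpoint() method), and the k8s scheme name is made up for the example:

package k8sresolver

import (
    "context"
    "net"
    "time"

    "google.golang.org/grpc/resolver"
)

// periodicBuilder builds resolvers that re-resolve the target on a fixed interval.
type periodicBuilder struct {
    interval time.Duration
}

// Scheme returns the (made-up) URI scheme clients use to pick this resolver,
// e.g. grpc.Dial("k8s:///my-service.my-namespace.svc.cluster.local:50051", ...).
func (b *periodicBuilder) Scheme() string { return "k8s" }

func (b *periodicBuilder) Build(target resolver.Target, cc resolver.ClientConn, _ resolver.BuildOptions) (resolver.Resolver, error) {
    ctx, cancel := context.WithCancel(context.Background())
    r := &periodicResolver{hostPort: target.Endpoint, cc: cc, cancel: cancel}
    go r.watch(ctx, b.interval)
    return r, nil
}

type periodicResolver struct {
    hostPort string
    cc       resolver.ClientConn
    cancel   context.CancelFunc
}

// watch resolves immediately and then on every tick until the resolver is closed.
func (r *periodicResolver) watch(ctx context.Context, every time.Duration) {
    ticker := time.NewTicker(every)
    defer ticker.Stop()
    for {
        r.resolve()
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
        }
    }
}

// resolve looks up the headless service and pushes one address per pod
// to the gRPC connection manager.
func (r *periodicResolver) resolve() {
    host, port, err := net.SplitHostPort(r.hostPort)
    if err != nil {
        return
    }
    ips, err := net.LookupHost(host)
    if err != nil {
        return
    }
    addrs := make([]resolver.Address, 0, len(ips))
    for _, ip := range ips {
        addrs = append(addrs, resolver.Address{Addr: net.JoinHostPort(ip, port)})
    }
    r.cc.UpdateState(resolver.State{Addresses: addrs})
}

func (r *periodicResolver) ResolveNow(resolver.ResolveNowOptions) { r.resolve() }
func (r *periodicResolver) Close()                                { r.cancel() }

func init() {
    resolver.Register(&periodicBuilder{interval: 30 * time.Second})
}

Registering the builder in init means any client in the same binary can opt in simply by dialing a k8s:/// target, while the round-robin balancer (or the service config shown earlier) takes care of spreading requests over the refreshed addresses.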

On the other hand, since I wanted to go deeper, I decided to create my own gRPC proxy (and I learned a lot). Leveraging the HTTP/2 foundation that gRPC has, I was able to create a proxy without changing the proto payload message or even knowing the proto file definition (also using the custom resolver mentioned above).

As a final comment, I would like to say that if your gRPC client needs to connect to many servers, I highly recommend using a proxy as the balancing mechanism, since keeping this logic in the main application increases complexity and resource consumption by having to maintain many open connections and re-balance them. Picture this: doing the final balancing in the app, you would have 1 instance connected to N servers (1-N); with a proxy, you would have 1 instance connected to M proxies connected to N servers (1-M-N), where for sure M<N, since each proxy instance can handle a lot of connections to the different servers.
