Load Balancing for gRPC
This post focuses on how to achieve load balancing for gRPC.
gRPC is a technology for implementing RPC APIs that uses HTTP/2 as its underlying protocol.
Why do we need special load balancing for gRPC?
Compared to JSON-over-HTTP, gRPC breaks standard connection-based load balancing because it is built on HTTP/2. HTTP/2 connections are persistent and long-lived, and all requests are multiplexed across a single TCP connection (multiple requests can be active on the same connection at any point in time). This significantly reduces the overhead of connection management. However, it also means that connection-level load balancing stops working here: once the connection is established, every request goes to the same single destination.
Why does connection-based load balancing still work for HTTP/1.1?
HTTP/1.1 also has long-lived connections, but several features of the protocol still cause TCP connections to be cycled, which is why connection-level load balancing works fine with HTTP/1.1. For example, when the client issues a request such as GET /response, it must wait until the server responds. No other request can be issued on that connection while that request-response cycle is in progress.
In practice, we usually want to run requests concurrently. To issue concurrent HTTP/1.1 requests, we therefore have to open multiple HTTP/1.1 connections and spread our requests across them. Also, long-lived HTTP/1.1 connections typically expire after some time and are torn down by the client (or server). These two factors combined mean that HTTP/1.1 requests naturally cycle across multiple TCP connections, so connection-level balancing works fine here.
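The one-request-at-a-time behavior is easy to see with Python's standard library: `http.client` refuses to send a second request on a keep-alive connection before the first response has been read, so a concurrent request forces a second TCP connection. A small self-contained sketch (the local test server exists only to illustrate this):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enable keep-alive connections

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# Local throwaway server, just so the example is runnable.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/response")
try:
    # A second request on the same connection before reading the first
    # response is rejected: HTTP/1.1 is one request-response at a time.
    conn.request("GET", "/response")
except http.client.CannotSendRequest:
    print("connection busy: read the response first")

resp = conn.getresponse()
print(resp.status, resp.read())

# A concurrent request therefore needs its own TCP connection, which is
# why connection-level balancing still spreads HTTP/1.1 traffic around.
conn2 = http.client.HTTPConnection("127.0.0.1", port)
conn2.request("GET", "/response")
resp2 = conn2.getresponse()
resp2.read()

server.shutdown()
```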
How to Load Balance gRPC?
Back to gRPC. Since we can't balance at the connection level, to load balance gRPC we need to balance among individual requests, i.e., request balancing. To achieve this, we open an HTTP/2 connection to each destination and balance requests across these connections.
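The balancing logic itself can be sketched in a few lines. Here, `connect` and the connection's `call` method are hypothetical stand-ins for a real HTTP/2/gRPC channel; the point is only that the pool holds one long-lived connection per backend and rotates per request, not per connection:

```python
import itertools

class RequestBalancer:
    """Round-robin requests across one persistent connection per backend."""

    def __init__(self, backends, connect):
        # Open one long-lived connection to each destination up front.
        self.conns = [connect(b) for b in backends]
        self._rr = itertools.cycle(self.conns)

    def call(self, method, payload):
        # Balance per request: each call may land on a different backend
        # even though every connection stays open the whole time.
        return next(self._rr).call(method, payload)

# Demo with fake connections standing in for HTTP/2 channels.
class FakeConn:
    def __init__(self, name):
        self.name = name

    def call(self, method, payload):
        return self.name  # report which backend served the request

lb = RequestBalancer(["pod-a", "pod-b", "pod-c"], FakeConn)
hits = [lb.call("SayHello", b"") for _ in range(6)]
print(hits)
```

Real load balancers replace the round-robin rotation with smarter policies (e.g., latency-aware picking, as Linkerd does), but the request-level structure is the same.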
gRPC Load Balancing on Kubernetes:
To implement load balancing for gRPC we can use Linkerd, a CNCF-hosted service mesh for Kubernetes. Linkerd can even be applied to a single service, without cluster-wide permissions. When we add Linkerd to a service, it adds an ultra-fast proxy to each pod; these proxies watch the Kubernetes API and perform gRPC load balancing automatically.
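As a sketch of what enabling this looks like (the deployment name, labels, image, and port below are hypothetical), Linkerd injects its proxy into workloads that carry the `linkerd.io/inject: enabled` annotation, which is typically added with the `linkerd inject` CLI command:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-grpc-service            # hypothetical service name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-grpc-service
  template:
    metadata:
      labels:
        app: my-grpc-service
      annotations:
        linkerd.io/inject: enabled # Linkerd adds its proxy to each pod
    spec:
      containers:
        - name: server
          image: registry.example.com/my-grpc-service:latest  # hypothetical image
          ports:
            - containerPort: 50051 # commonly used gRPC port
```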
Using Linkerd has several benefits.
First, it works with services written in any language, with any gRPC client, and with any deployment model (headless or not). Because Linkerd's proxies are completely transparent, they auto-detect HTTP/2 and HTTP/1.x and perform L7 load balancing, while all other traffic is passed through as plain TCP. Everything just works.
Secondly, Linkerd watches the Kubernetes API and automatically updates the load-balancing pool as pods are rescheduled. It uses an exponentially weighted moving average (EWMA) of response latencies to automatically send requests to the fastest pods, which can be effective in reducing tail latencies.
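To make the EWMA idea concrete, here is a simplified sketch: each endpoint keeps a running average that weights recent latency samples more heavily, and the balancer prefers the endpoint with the lowest average. (Linkerd's actual algorithm, peak EWMA, additionally accounts for in-flight load; the `alpha` value and endpoint names here are illustrative.)

```python
class EwmaEndpoint:
    """Tracks an exponentially weighted moving average of response latency."""

    def __init__(self, name, alpha=0.3):
        self.name = name
        self.alpha = alpha   # weight given to the newest latency sample
        self.ewma = 0.0
        self.samples = 0

    def observe(self, latency_ms):
        if self.samples == 0:
            self.ewma = latency_ms  # seed with the first observation
        else:
            # New samples pull the average toward recent behavior.
            self.ewma = self.alpha * latency_ms + (1 - self.alpha) * self.ewma
        self.samples += 1

def pick_fastest(endpoints):
    # Send the next request to the endpoint with the lowest EWMA latency.
    return min(endpoints, key=lambda e: e.ewma)

# Demo: pod-a answers quickly, pod-b is slow and gets avoided.
a = EwmaEndpoint("pod-a")
b = EwmaEndpoint("pod-b")
for latency in (10, 12, 11):
    a.observe(latency)
for latency in (50, 60, 55):
    b.observe(latency)
chosen = pick_fastest([a, b])
print(chosen.name)
```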
Lastly, Linkerd's proxies are written in Rust and are extremely fast and small. They introduce less than 1 ms of p99 latency and require less than 10 MB of RSS per pod, so the impact on system performance is negligible.
gRPC on AWS ALB
AWS launched this support in October 2020; we can now use end-to-end HTTP/2, and therefore gRPC, with an Application Load Balancer (ALB).
To use HTTP/2 on your ALB, choose HTTPS as your listener protocol, gRPC as the protocol version for your target group, and register instances or IP addresses as targets for the configured target group. For that target group, the ALB will use gRPC-specific health checks to determine the availability of targets and provide gRPC-specific access logs to monitor the traffic.
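As an illustration, creating such a target group with the AWS CLI might look like the following (the target group name, port, and VPC ID are placeholders):

```shell
aws elbv2 create-target-group \
  --name grpc-targets \
  --protocol HTTP \
  --protocol-version GRPC \
  --port 50051 \
  --vpc-id vpc-0123456789abcdef0 \
  --matcher GrpcCode=0   # a target is healthy when it returns gRPC status 0 (OK)
```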
Hope you find this helpful!
See you in the next Chapter!