Scaling Microservices with gRPC and Envoy Proxy — Part II — with Envoy Proxy

Kamalashree Nagaraj
Published in The Startup
8 min read · May 3, 2020


In Part 1 of this series, I covered the microservices architecture style, HTTP/1.1 vs HTTP/2, Protocol Buffers and gRPC. In this post, I cover load balancing and how Envoy Proxy helps with it.

Load Balancing

Load balancing¹ refers to efficiently distributing incoming network traffic across a group of backend servers, also known as a server farm or server pool. A load balancer performs the following functions:

  1. Distributes client requests or network load efficiently across multiple servers
  2. Ensures high availability and reliability by sending requests only to servers that are online
  3. Provides the flexibility to add or subtract servers as demand dictates
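To make the three functions above concrete, here is a minimal Python sketch of a round-robin balancer. The class name `RoundRobinBalancer` and its methods are hypothetical and written only for illustration; they do not correspond to any real load balancer's API.

```python
class RoundRobinBalancer:
    """Toy load balancer: rotates over a server pool and skips servers that are down."""

    def __init__(self, servers):
        self.servers = list(servers)      # the server pool
        self.healthy = set(self.servers)  # servers currently considered online
        self._next = 0                    # rotation cursor

    def add(self, server):
        # Function 3: grow the pool as demand dictates
        self.servers.append(server)
        self.healthy.add(server)

    def remove(self, server):
        # Function 3: shrink the pool
        self.servers.remove(server)
        self.healthy.discard(server)

    def mark_down(self, server):
        # Function 2: stop sending requests to an offline server
        self.healthy.discard(server)

    def pick(self):
        # Function 1: spread requests across the pool in rotation,
        # trying each server at most once per call
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

A real load balancer would learn about server health from health checks rather than from an explicit `mark_down` call.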

Load balancing can be done either by the client (client-side load balancing), for example with Netflix Ribbon, or by a dedicated server that load-balances the requests (server/proxy load balancing), for example with Nginx, HAProxy, Consul, DNS or Envoy Proxy. Each of these methods has its own pros and cons, as shown in the table below.

Comparison of Proxy and Client-side Load Balancing


Because of the downsides of client-side load balancing, most systems built using a microservices architecture use proxy load balancing.
At one of the clients I was working for, HAProxy was used extensively as a load balancer, and at that time (around 2018) Envoy Proxy was still under assessment. Hence, we proceeded with HAProxy as the load balancer for a system we were building.

Why Envoy Proxy?

But as our system started scaling, i.e., as we started having more customers and in turn higher throughput, we started observing performance degradation on a few servers. These servers were receiving more traffic than the others and thus had higher response times. Upon further investigation, we identified that the problem lay in the HAProxy load balancer, with the root cause being HTTP/2 multiplexing. HAProxy was built for HTTP/1.1-based load balancing and did not support HTTP/2-based load balancing at the time.

Consider a single connection from a client to the HAProxy server, as shown in the figure below, with 5 requests multiplexed on the same connection. HAProxy, built for HTTP/1.1-based load balancing, assumes that a client creates a new HTTP connection for every request. For each such incoming connection, HAProxy creates a corresponding outgoing HTTP connection to a backend server. So in this case, even though there are 5 requests, HAProxy treats them as a single request because they are all multiplexed on a single connection, and forwards them all to the same backend server. That backend server now has to serve 5 requests instead of 1.

Load Balancing using HAProxy Server

Envoy solves this problem with its support for HTTP/2-based load balancing.
Consider a similar example, where you have a single connection from a client to the Envoy Proxy server, as shown in the figure below, with 5 requests multiplexed on the same connection. Unlike HAProxy, Envoy recognizes the 5 multiplexed requests and load-balances each request individually, forwarding them over HTTP/2 connections to as many as 5 different backend servers.

Load Balancing using Envoy Proxy
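The difference between the two behaviours can be reproduced with a small simulation. This is a sketch under simplifying assumptions: both balancers use plain round-robin, the backend names are made up, and a connection is modelled only as a count of multiplexed requests.

```python
from collections import Counter
from itertools import cycle

BACKENDS = [f"backend-{i}" for i in range(1, 6)]  # hypothetical backend names

def balance_per_connection(connections, backends):
    """Connection-level balancing: all requests on a connection go to one backend."""
    counts = Counter()
    rotation = cycle(backends)
    for requests_on_connection in connections:
        backend = next(rotation)              # one choice per connection
        counts[backend] += requests_on_connection
    return counts

def balance_per_request(connections, backends):
    """Request-level balancing: each multiplexed request is balanced on its own."""
    counts = Counter()
    rotation = cycle(backends)
    for requests_on_connection in connections:
        for _ in range(requests_on_connection):
            counts[next(rotation)] += 1       # one choice per request
    return counts

connections = [5]  # a single client connection carrying 5 multiplexed requests
per_connection = balance_per_connection(connections, BACKENDS)
per_request = balance_per_request(connections, BACKENDS)
```

With the single connection above, `per_connection` places all 5 requests on one backend, while `per_request` gives each of the 5 backends one request.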

What is the Envoy Proxy?

Envoy Proxy is an L7 proxy and communication bus designed for large modern service-oriented architectures.

Layer 7 load balancers operate at the highest level in the OSI model, the Application layer (on the Internet, HTTP is the dominant protocol at this layer). Layer 7 load balancers base their routing decisions on various characteristics of the HTTP header and on the actual contents of the message, such as the URL, the type of data (text, video, graphics), or information in a cookie.

The Envoy Proxy configuration primarily consists of listeners, filters and clusters.


Envoy listener configuration to listen on 7777 from localhost

A listener is a named network location (e.g., a port or a Unix domain socket) on which Envoy listens and to which downstream clients can connect. Envoy exposes one or more listeners that downstream hosts connect to, as shown in the figure below.


Filters tell Envoy how it should process the messages it receives. Envoy supports listener filters, network (L3/L4) filters and HTTP filters.

The flow of traffic through Envoy proxy with data flow enhancement using filters

Listener filters are processed before the network-level filters and have the opportunity to manipulate the connection metadata, usually to influence how the connection is processed by later filters or clusters.

Network (L3/L4) filters form the core of Envoy connection handling. There are three types of network filters:

  1. Read filters — Invoked when Envoy receives data from a downstream connection.
  2. Write filters — Invoked when Envoy is about to send data to a downstream connection.
  3. Read/ Write filters — Invoked both when Envoy receives data from a downstream connection and when it is about to send data to a downstream connection.
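The three filter types can be pictured with a toy filter chain. The Python classes below are purely illustrative (Envoy filters are not written this way), and the method names `on_data` and `on_write` are assumptions of this sketch. It also assumes that read filters run in configuration order and write filters in reverse order.

```python
class ReadFilter:
    def on_data(self, data: bytes) -> bytes:
        """Called when data arrives from the downstream connection."""
        return data

class WriteFilter:
    def on_write(self, data: bytes) -> bytes:
        """Called when data is about to be sent to the downstream connection."""
        return data

class Uppercase(ReadFilter):
    """Read filter: transforms incoming data only."""
    def on_data(self, data):
        return data.upper()

class ByteCounter(ReadFilter, WriteFilter):
    """Read/write filter: observes traffic in both directions."""
    def __init__(self):
        self.read = 0
        self.written = 0

    def on_data(self, data):
        self.read += len(data)
        return data

    def on_write(self, data):
        self.written += len(data)
        return data

class FilterChain:
    def __init__(self, filters):
        self.filters = filters

    def receive_from_downstream(self, data):
        # Read path: only filters that implement the read interface take part
        for f in self.filters:
            if isinstance(f, ReadFilter):
                data = f.on_data(data)
        return data

    def send_to_downstream(self, data):
        # Write path: only filters that implement the write interface take part
        for f in reversed(self.filters):
            if isinstance(f, WriteFilter):
                data = f.on_write(data)
        return data
```

For example, a chain built from `Uppercase()` and a `ByteCounter()` upper-cases incoming bytes and counts bytes in both directions, while outgoing bytes pass through unchanged.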

HTTP filters can be written to operate on HTTP level messages without knowledge of the underlying physical protocol (HTTP/1.1, HTTP/2, etc.) or multiplexing capabilities.

An example of Envoy filters


A cluster tells Envoy about one or more logically similar upstream hosts, to which Envoy can proxy incoming requests. Envoy discovers the members of a cluster via service discovery. It optionally determines the health of cluster members via active health checking. The cluster member that Envoy routes a request to is determined by the load balancing policy.

Envoy cluster configuration
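As a mental model (not Envoy's actual configuration format), a cluster can be thought of as a list of hosts plus a health check and a load balancing policy. In this sketch the `Cluster` class, the `probe` callable and the `policy` callable are all hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Cluster:
    name: str
    hosts: List[str]                                     # members, e.g. from service discovery
    probe: Callable[[str], bool] = lambda host: True     # stand-in for active health checking
    policy: Callable[[List[str]], str] = random.choice   # stand-in for the load balancing policy

    def healthy_hosts(self) -> List[str]:
        # Keep only the members that pass the health check
        return [host for host in self.hosts if self.probe(host)]

    def route(self) -> str:
        # The policy decides which healthy member receives the request
        healthy = self.healthy_hosts()
        if not healthy:
            raise RuntimeError(f"cluster {self.name!r} has no healthy hosts")
        return self.policy(healthy)
```

Swapping the `policy` callable is the moral equivalent of changing the load balancing policy on a cluster.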

Envoy supports the following load balancing policies²:

Weighted Round-Robin
Each available upstream host is selected in round-robin order. If weights are assigned to endpoints in a locality, then a weighted round-robin schedule is used, where higher weighted endpoints will appear more often in the rotation to achieve the effective weighting.
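One simple way to build such a rotation is the "smooth" weighted round-robin scheme sketched below. This is a stand-in chosen for brevity, not Envoy's own scheduler, and the host names and weights are made up.

```python
from collections import Counter

def weighted_round_robin(weights, n):
    """Return n picks where each host appears in proportion to its weight."""
    current = {host: 0 for host in weights}   # running score per host
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every host earns its weight each round...
        for host, weight in weights.items():
            current[host] += weight
        # ...the host with the highest score is chosen and pays back the total
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks

picks = weighted_round_robin({"a": 3, "b": 1}, 8)
share = Counter(picks)  # "a" is picked three times as often as "b"
```

With equal weights the same function degenerates to a plain round-robin rotation.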

Weighted Least Request
The least request load balancer uses different algorithms depending on whether hosts have the same or different weights.
All weights equal
An O(1) algorithm which selects N random available hosts as specified in the configuration (2 by default) and picks the host which has the fewest active requests.
All weights not equal
If two or more hosts in the cluster have different load balancing weights, the load balancer shifts into a mode where it uses a weighted round-robin schedule in which weights are dynamically adjusted based on the host’s request load at the time of selection (weight is divided by the currently active request count).
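Both modes can be sketched in a few lines of Python. The function names are hypothetical. Two simplifications are assumptions of this sketch: candidates are sampled without replacement, and the divisor is the active request count plus one so that an idle host does not cause a division by zero.

```python
import random

def pick_least_request(active, choices=2, rng=random):
    """Equal weights: sample `choices` random hosts, keep the least busy one.

    `active` maps each host to its number of in-flight requests.
    """
    hosts = list(active)
    candidates = rng.sample(hosts, min(choices, len(hosts)))
    return min(candidates, key=active.get)

def effective_weight(weight, active_requests):
    """Unequal weights: a host's weight shrinks as its in-flight requests grow."""
    return weight / (active_requests + 1)
```

Because only a fixed number of hosts is inspected per pick, the cost of `pick_least_request` does not grow with the size of the cluster, which is what makes the equal-weight case O(1).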

Ring Hash
The ring/modulo hash load balancer implements consistent hashing to upstream hosts. Each host is mapped onto a circle (the “ring”) by hashing its address; each request is then routed to a host by hashing some property of the request and finding the nearest corresponding host clockwise around the ring. This technique is also commonly known as “Ketama” hashing, and like all hash-based load balancers, it is only effective when protocol routing is used that specifies a value to hash on.
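A minimal consistent-hash ring looks like the sketch below. The MD5-based hash, the `replicas` parameter (several points per host for a more even spread) and the class name are choices made for this illustration, not Envoy's implementation.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Any stable hash works here; MD5 is used only because it is in the standard library
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, hosts, replicas=100):
        # Place every host at `replicas` points on the ring
        self._ring = sorted(
            (_hash(f"{host}#{i}"), host) for host in hosts for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, request_key: str) -> str:
        # Find the first point at or after the request's hash, wrapping around at the end
        idx = bisect.bisect_left(self._points, _hash(request_key)) % len(self._ring)
        return self._ring[idx][1]
```

The useful property is that removing a host only remaps the keys that were on that host; keys that mapped to the remaining hosts stay where they were.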

Maglev
The Maglev load balancer implements consistent hashing to upstream hosts. The idea is to generate a lookup table of fixed size (65537), with each backend taking some entries in the table. This provides two desirable properties for resilient backend selection:
- Load balancing: each backend will receive an almost equal number of connections.
- Minimal disruption: when the set of backends changes, a connection will likely be sent to the same backend as it was before.
Maglev can be used as a drop-in replacement for the ring hash load balancer in any place where consistent hashing is desired. Like the ring hash load balancer, the Maglev load balancer is only effective when protocol routing is used that specifies a value to hash on.
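The lookup table can be populated as in the following sketch, which follows the table-filling procedure from the published Maglev design. The SHA-256-based hashes and the function names are assumptions made for illustration. The table size must be a prime number so that every backend's slot sequence visits the whole table.

```python
import hashlib

def _h(key: str, salt: str) -> int:
    # Salted hash so that offset, skip and request hashes are independent
    return int.from_bytes(hashlib.sha256((salt + key).encode()).digest()[:8], "big")

def build_maglev_table(backends, table_size=65537):
    """Fill a fixed-size lookup table; backends take turns claiming free slots."""
    if not backends:
        raise ValueError("at least one backend is required")
    offsets = [_h(b, "offset") % table_size for b in backends]
    skips = [_h(b, "skip") % (table_size - 1) + 1 for b in backends]
    next_index = [0] * len(backends)   # position in each backend's preference list
    table = [None] * table_size
    filled = 0
    while True:
        for i, backend in enumerate(backends):
            # Walk backend i's preference list until a free slot is found
            while True:
                slot = (offsets[i] + next_index[i] * skips[i]) % table_size
                next_index[i] += 1
                if table[slot] is None:
                    break
            table[slot] = backend
            filled += 1
            if filled == table_size:
                return table

def maglev_lookup(table, request_key: str) -> str:
    # Routing is a single hash and an array lookup
    return table[_h(request_key, "request") % len(table)]
```

Because the backends claim slots in turns, their shares of the table differ by at most one entry, which gives the near-equal load described above.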

Random
The random load balancer selects a random available host. The random load balancer generally performs better than round-robin if no health checking policy is configured. Random selection avoids bias towards the host in the set that comes after a failed host.
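The bias can be illustrated with a toy simulation. It models one possible way the bias arises, which is an assumption of this sketch rather than a description of Envoy's internals: a request aimed at the failed host falls through to the next host in the rotation. Host names are made up.

```python
import random
from collections import Counter

HOSTS = ["a", "b", "c", "d"]
FAILED = "b"  # still in rotation because nothing health-checks it

def round_robin_with_fallthrough(n):
    """Request i targets HOSTS[i % len(HOSTS)]; a failed target falls through to the next host."""
    counts = Counter()
    for i in range(n):
        idx = i % len(HOSTS)
        if HOSTS[idx] == FAILED:
            idx = (idx + 1) % len(HOSTS)   # the neighbour absorbs the failed host's share
        counts[HOSTS[idx]] += 1
    return counts

def random_with_retry(n, seed=0):
    """A failed target is replaced by another random pick."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        host = rng.choice(HOSTS)
        while host == FAILED:
            host = rng.choice(HOSTS)       # retry lands on any host with equal chance
        counts[host] += 1
    return counts

rr_counts = round_robin_with_fallthrough(4000)
random_counts = random_with_retry(6000)
```

In the round-robin run, host `c` (the one after the failed host) serves twice as many requests as `a` or `d`, while the random run spreads the failed host's share roughly evenly.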

How we leveraged Envoy to solve yet another scalability challenge

After we solved the load balancing problem with Envoy, we were confronted with another business problem, the solution to which meant more read throughput (~30k) to the system we had built previously. The system was to receive a request on every App launch, i.e., from the Home Page.

To avoid facing similar problems with response times as before, we decided to deploy a dedicated read-only cluster for the additional read throughput this system was to receive. We introduced a request header to differentiate the requests made from the Home Page. Routing these requests to the read-only cluster needed only two changes on Envoy:

  1. Cluster configuration for the read-only cluster
  2. Filter configuration to route the requests from the Home Page to read-only cluster

But we didn’t want to route all the traffic from the Home Page to the read-only cluster at once. Instead, we wanted an incremental deployment. Envoy Proxy helped us solve this with ease through its cluster weights, similar to the weights mentioned above in the load balancing policies.

As shown in the configuration below, we started off with routing 5% of the traffic from the Home page to the read-only cluster, gradually increasing it until we routed all 100% of the traffic to the read-only cluster.

Envoy incremental deploys
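The routing behaviour itself, independent of Envoy's configuration syntax, can be sketched in Python as follows. The header name `x-request-source`, its value `home-page` and the cluster names are hypothetical; the article does not state the real ones.

```python
import random

def choose_cluster(headers, read_only_percent, rng=random):
    """Send a configurable share of Home Page requests to the read-only cluster."""
    # Requests without the Home Page header always use the primary cluster
    if headers.get("x-request-source") != "home-page":
        return "primary"
    # Weighted split: `read_only_percent` out of every 100 Home Page requests
    if rng.random() * 100 < read_only_percent:
        return "read-only"
    return "primary"
```

Rolling out incrementally then amounts to raising `read_only_percent` from 5 towards 100 while watching response times on both clusters.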


gRPC, built on HTTP/2, provides us with a high-speed communication protocol that can take advantage of bi-directional streaming, multiplexing and more. Protocol Buffers has client/server library implementations in many languages. It uses a binary format and hence has a much smaller footprint than JSON/XML payloads. When you are targeting the lowest response times, opt for gRPC as your communication protocol.

Envoy Proxy is an L7 proxy and communication bus designed for large modern service-oriented architectures. It supports load balancing both HTTP and gRPC requests. Envoy provides a rich set of features via the built-in filters which one can quickly leverage via Listener configuration. The filter chain paradigm is a powerful mechanism, and Envoy lets users implement their own filters by extending its API.



Kamalashree Nagaraj
Developer at ThoughtWorks. Interested in full-stack development and data science.