SRE: Resiliency: Bolt on Rate Limiting using Envoy

Rate limiting is a simple and effective way to mitigate cascading failures and shared resource exhaustion. Envoy is a feature-rich proxy that makes it easy to add rate limiting to any service. This post walks through configuring Envoy to enforce rate limiting without any application-level changes.

Problem

Have you ever had a resource (API, database, etc.) overwhelmed or saturated with requests? In microservice architectures, resources without constraints on their usage can easily be overwhelmed by the number of clients making requests. There may be any number of clients, each implementing its own retry/backoff or rate-limiting policy. Greedy clients can easily starve other clients of a resource by saturating a service. Worst of all, a greedy client can keep making requests until it brings the service down completely.

A common way to enforce usage constraints is rate limiting. Envoy allows for quick, performant and reliable global rate limiting at the HTTP layer. This is in contrast to IP-based rate limiting or the application-level rate limiting that many web frameworks provide.

This tutorial uses the architecture illustrated in the image above. The Service Client on the left represents a client with particularly high usage. While it runs, it can completely saturate all the service instances and cause requests from other, higher-priority clients to be dropped.

Envoy can rate limit any service at the network level without any modifications to the application. Additionally, since Envoy operates at layer 7 (the application layer), it can inspect HTTP header information and rate limit based on it.

For this tutorial, the vegeta load testing tool is used to simulate the batch job in the example above. Below is the steady state of ~500 requests/second:

$ make load-test
echo "GET http://localhost:8080/slow" | vegeta attack -rate=500 -duration=0 | tee results.bin | vegeta report
~500 requests/second

During simulated background jobs, the API's /slow resource spikes to 3500 requests/second, impacting other endpoints and clients:

The solution configured below uses Envoy to enforce a 500 requests / second limit. But first…

What is Envoy?

Envoy is a lightweight proxy server that natively handles TCP, HTTP/1.1, HTTP/2 and gRPC connections. It is highly configurable and supports many different plugins. It also makes observability a first-class citizen.

Before Envoy, retries, latency injection, rate limiting and circuit breaking were expensive application-level initiatives. Envoy moves these concerns out of the application, empowering operations-minded people to configure and enable them without any application-level changes. Envoy is completely upending service resiliency and microservice operations.

The Envoy Documentation and Matt Klein’s articles provide a much better introduction to Envoy than I could provide:

Originally built at Lyft, Envoy is a high performance C++ distributed proxy designed for single services and applications, as well as a communication bus and “universal data plane” designed for large microservice “service mesh” architectures. Built on the learnings of solutions such as NGINX, HAProxy, hardware load balancers, and cloud load balancers, Envoy runs alongside every application and abstracts the network by providing common features in a platform-agnostic manner. When all service traffic in an infrastructure flows via an Envoy mesh, it becomes easy to visualize problem areas via consistent observability, tune overall performance, and add substrate features in a single place.

(The resources that I read in order to write this are linked at the end of this article.)

Solution

All code and examples are available on GitHub.

The solution outlined below will:

  • Configure Envoy as a service front to the API load balancer, still passing through all traffic
  • Configure and run the Global Rate Limit Service
  • Configure Envoy to use the Global Rate Limit Service

We need a way to limit the number of requests being made in order to insulate the API from spikes and ensure that other clients can make progress while these batch jobs (simulated by vegeta) are running. To do this, we'll run an Envoy proxy for the fake API locally, alongside the batch/vegeta client.

Running Envoy as a sidecar to the batch job client allows requests to be rate limited before they even hit the load balancer! Envoy is a good candidate for this because it's highly configurable, performant and can handle HTTP load balancing. Additionally, it is a cloud-native project and is highly observable.

Configure Envoy as a service front to the API Load balancer

The first step is to configure Envoy to sit between the batch job and the API, so that every request the batch job makes to the API goes through Envoy. First we'll configure Envoy to know about the API. Then we'll update the batch job to send requests to Envoy instead of directly to the API. The final state of this step will look like the following:

This step only routes API traffic through Envoy; it doesn't perform any rate limiting yet. To configure Envoy we need a couple of things:

cluster

A cluster represents the upstream resource that Envoy will connect to (in this case, the API load balancer). The configuration for this is pretty simple:

clusters:
- name: api
  connect_timeout: 1s
  type: strict_dns
  lb_policy: round_robin
  hosts:
  - socket_address:
      address: localhost
      port_value: 8080

In our example we're running a fake API on localhost:8080 to simulate the load balancer in the chart above. The configuration above says that any requests through Envoy to the API should be sent to localhost:8080.

virtual_host

This section establishes that all requests are routed to the api cluster defined above.

- name: api
  domains:
  - "*"
  routes:
  - match:
      prefix: "/"
    route:
      cluster: api
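To make the matching rules concrete, here is a minimal Python sketch of how a request could be mapped to a cluster by the virtual_host above. It is a simplification, not Envoy's implementation: the class and function names are hypothetical, and only exact domains and the "*" wildcard are modeled (Envoy also supports prefix and suffix wildcards and prefers the most specific domain). Routes are checked in order and the first matching prefix wins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Route:
    prefix: str   # mirrors match.prefix
    cluster: str  # mirrors route.cluster


@dataclass
class VirtualHost:
    name: str
    domains: list[str]
    routes: list[Route]


def select_virtual_host(vhosts: list[VirtualHost], host: str) -> VirtualHost | None:
    # Simplified specificity rule: an exact domain beats the "*" catch-all.
    for vhost in vhosts:
        if host in vhost.domains:
            return vhost
    for vhost in vhosts:
        if "*" in vhost.domains:
            return vhost
    return None


def select_cluster(vhosts: list[VirtualHost], host: str, path: str) -> str | None:
    """Return the upstream cluster for a request, or None if no route matches."""
    vhost = select_virtual_host(vhosts, host)
    if vhost is None:
        return None
    # Routes are evaluated in order; the first matching prefix wins.
    for route in vhost.routes:
        if path.startswith(route.prefix):
            return route.cluster
    return None


# The virtual_host from the config above: every domain, every path -> cluster "api".
api_vhost = VirtualHost(name="api", domains=["*"], routes=[Route(prefix="/", cluster="api")])
print(select_cluster([api_vhost], "localhost:10000", "/slow"))  # api
```

Because the only domain is "*" and the only prefix is "/", every request, whatever its Host header or path, ends up on the api cluster.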

The rest of the configuration file establishes which address Envoy itself will listen on and the rules for connections.

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: remote_api
            virtual_hosts:
            - name: api
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: api
          http_filters:
          - name: envoy.router

  clusters:
  - name: api
    connect_timeout: 1s
    type: strict_dns
    lb_policy: round_robin
    hosts:
    - socket_address:
        address: localhost
        port_value: 8080

admin:
  access_log_path: "/dev/null"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

Updating the load test to target the local Envoy shows that it is accepting traffic. The Envoy graphs below come from a public Grafana dashboard (Lyft also makes its Envoy dashboards available).

$ make load-test LOAD_TEST_TARGET=http://localhost:10000 LOAD_TEST_RATE=500
echo "GET http://localhost:10000/slow" | vegeta attack -rate=500 -duration=0 | tee results.bin | vegeta report

The graphs above show that Envoy is now receiving ALL requests to the API and sending them upstream to the load balancer!

Configure and Run the Global Ratelimit Service

This step sets up Lyft's global rate limit service. Running it was as simple as cloning their repo, modifying the config file and starting up their docker-compose stack.

After cloning Lyft's ratelimit repository, the config file is modified: the domain has been changed and the descriptor key and value have been updated:

⟫ cat examples/ratelimit/config/config.yaml
---
domain: apis
descriptors:
  - key: generic_key
    value: default
    rate_limit:
      unit: second
      requests_per_unit: 500
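Conceptually, the service uses this file as a lookup table from a domain plus a descriptor entry (key, value) to a limit. The sketch below is a hypothetical, in-memory Python model of that lookup, assuming exact key/value matching only; Lyft's service supports richer matching (such as nested descriptors and entries without a value) that is omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimit:
    unit: str
    requests_per_unit: int


# Hypothetical in-memory form of examples/ratelimit/config/config.yaml:
# domain -> {(descriptor key, descriptor value): limit}
CONFIG: dict[str, dict[tuple[str, str], RateLimit]] = {
    "apis": {
        ("generic_key", "default"): RateLimit(unit="second", requests_per_unit=500),
    },
}


def find_limit(domain: str, key: str, value: str) -> RateLimit | None:
    """Return the limit for one descriptor entry, or None if it is not configured."""
    return CONFIG.get(domain, {}).get((key, value))


print(find_limit("apis", "generic_key", "default"))  # RateLimit(unit='second', requests_per_unit=500)
print(find_limit("apis", "generic_key", "batch"))    # None
```

Descriptors with no matching entry return None here, mirroring the idea that requests the config does not mention are not limited.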

Next, start it using Lyft's built-in docker-compose configuration (steps are listed in their README):

/vagrant_data/go/src/github.com/lyft/ratelimit⟫ docker-compose down && docker-compose up

Configure Envoy to use the Global Rate limit Service

The final step is to configure Envoy to use the rate limit service to enforce limits and slow the rate of requests to the API. With this in place, Envoy checks the rate limit for each incoming request and rejects requests that exceed the configuration above (max 500 requests/second):

The Envoy configuration file to use the rate limiter looks like:

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          use_remote_address: true
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: remote_api
            virtual_hosts:
            - name: api
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: api
                  rate_limits:
                  - stage: 0
                    actions:
                    - {generic_key: {descriptor_value: "default"}}
          http_filters:
          - name: envoy.rate_limit
            config:
              domain: apis
              stage: 0
          - name: envoy.router

  clusters:
  - name: api
    connect_timeout: 1s
    type: strict_dns
    lb_policy: round_robin
    hosts:
    - socket_address:
        address: localhost
        port_value: 8080

  - name: rate_limit_cluster
    type: strict_dns
    connect_timeout: 0.25s
    lb_policy: round_robin
    http2_protocol_options: {}
    hosts:
    - socket_address:
        address: localhost
        port_value: 8081

rate_limit_service:
  grpc_service:
    envoy_grpc:
      cluster_name: rate_limit_cluster
    timeout: 0.25s

admin:
  access_log_path: "/dev/null"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
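The route's rate_limits block and the envoy.rate_limit filter work together: for each request, the filter takes the rate_limits entries whose stage matches its own, turns each action into a descriptor entry (the generic_key action yields ("generic_key", "default")), and asks the service behind rate_limit_service whether the apis domain is over its limit. The Python below is a hypothetical model of that decision flow, not Envoy code; should_rate_limit stands in for the gRPC call to the service, and only the generic_key action is modeled.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, List, Tuple

# One descriptor is an ordered list of (key, value) entries.
Descriptor = List[Tuple[str, str]]


class Code(Enum):
    OK = "OK"
    OVER_LIMIT = "OVER_LIMIT"


def build_descriptors(route_rate_limits: list[dict], filter_stage: int) -> list[Descriptor]:
    """Turn the route's rate_limits entries into descriptors (generic_key action only)."""
    descriptors: list[Descriptor] = []
    for rate_limit in route_rate_limits:
        # Only entries whose stage matches the filter's stage are applied.
        if rate_limit.get("stage", 0) != filter_stage:
            continue
        entries: Descriptor = []
        for action in rate_limit["actions"]:
            if "generic_key" in action:
                entries.append(("generic_key", action["generic_key"]["descriptor_value"]))
        if entries:
            descriptors.append(entries)
    return descriptors


def handle_request(
    route_rate_limits: list[dict],
    filter_stage: int,
    domain: str,
    should_rate_limit: Callable[[str, list[Descriptor]], Code],
) -> int:
    """Return 429 if the service answers OVER_LIMIT, else 200 (stand-in for proxying upstream)."""
    descriptors = build_descriptors(route_rate_limits, filter_stage)
    if descriptors and should_rate_limit(domain, descriptors) is Code.OVER_LIMIT:
        return 429
    return 200


# The route's rate_limits block from the config above, as Python data.
ROUTE_RATE_LIMITS = [{"stage": 0, "actions": [{"generic_key": {"descriptor_value": "default"}}]}]
print(build_descriptors(ROUTE_RATE_LIMITS, filter_stage=0))  # [[('generic_key', 'default')]]
print(handle_request(ROUTE_RATE_LIMITS, 0, "apis", lambda domain, ds: Code.OVER_LIMIT))  # 429
```

Every request produces the same descriptor here, so all of the batch job's traffic shares a single 500 requests/second budget.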

We then run the load test at 1000 requests / second (2x the allowed limit):

$ make load-test LOAD_TEST_TARGET=http://localhost:10000 LOAD_TEST_RATE=1000
echo "GET http://localhost:10000/slow" | vegeta attack -rate=1000 -duration=0 | tee results.bin | vegeta report

Looking at Lyft’s ratelimiter logs shows it accepting requests and doing rate limit lookups:

msg="cache key: apis_generic_key_default_1540829538 current: 35"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="cache key: apis_generic_key_default_1540829538 current: 34"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="cache key: apis_generic_key_default_1540829538 current: 33"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="cache key: apis_generic_key_default_1540829538 current: 31"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="cache key: apis_generic_key_default_1540829538 current: 32"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="cache key: apis_generic_key_default_1540829538 current: 42"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="starting get limit lookup"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="cache key: apis_generic_key_default_1540829538 current: 46"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="looking up key: generic_key_default"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="looking up key: generic_key_default"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="looking up key: generic_key_default"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="looking up key: generic_key_default"
ratelimit_1 | time="2018-10-29T16:12:18Z" level=debug msg="looking up key: generic_key_default"
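The cache keys in these logs (apis_generic_key_default_1540829538) point to a fixed-window counter: the domain, descriptor key and value are joined with the start of the current one-second window (1540829538 is 2018-10-29T16:12:18Z, matching the log timestamp), and current is that window's running count. Below is a minimal single-process Python sketch of the technique; the class name is hypothetical, and Lyft's service keeps its counters in Redis rather than in memory.

```python
from __future__ import annotations

import time

# Window length in seconds for the units used in this sketch.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class FixedWindowLimiter:
    """Single-process fixed-window counter (hypothetical; the real service uses Redis)."""

    def __init__(self, requests_per_unit: int, unit: str = "second") -> None:
        self.limit = requests_per_unit
        self.window = UNIT_SECONDS[unit]
        # cache key -> requests seen in that window; stale windows are never evicted
        # here, whereas Redis keys can simply be given an expiry.
        self.counters: dict[str, int] = {}

    def cache_key(self, domain: str, key: str, value: str, now: float) -> str:
        # Truncate the timestamp to the start of its window, e.g. 1540829538.7 -> 1540829538.
        window_start = int(now) // self.window * self.window
        return f"{domain}_{key}_{value}_{window_start}"

    def is_over_limit(self, domain: str, key: str, value: str, now: float | None = None) -> bool:
        """Count this request and report whether its window has exceeded the limit."""
        now = time.time() if now is None else now
        cache_key = self.cache_key(domain, key, value, now)
        current = self.counters.get(cache_key, 0) + 1
        self.counters[cache_key] = current
        return current > self.limit


limiter = FixedWindowLimiter(requests_per_unit=500, unit="second")
print(limiter.cache_key("apis", "generic_key", "default", 1540829538.7))  # apis_generic_key_default_1540829538
# 1000 requests inside one window: the first 500 pass, the other 500 are over the limit.
decisions = [limiter.is_over_limit("apis", "generic_key", "default", now=1540829538.1) for _ in range(1000)]
print(decisions.count(False), decisions.count(True))  # 500 500
```

Fixed windows are cheap (one counter per key per window), but a client can squeeze up to twice the limit into one second by bursting at the end of one window and the start of the next.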

(https://github.com/envoyproxy/envoy/issues/3388 was extremely helpful in getting rate limit wired up!)

Stopping the load test prints the vegeta report, which shows that roughly half of the requests are rate limited by Envoy and receive a 429 status code!

/vagrant_data/go/src/github.com/dm03514/grokking-go/bolt-on-out-of-process-rate-limits⟫ make load-test LOAD_TEST_TARGET=http://localhost:10000 LOAD_TEST_RATE=1000
echo "GET http://localhost:10000/slow" | vegeta attack -rate=1000 -duration=0 | tee results.bin | vegeta report
Requests [total, rate] 128093, 1000.02
Duration [total, attack, wait] 2m8.102168403s, 2m8.090470728s, 11.697675ms
Latencies [mean, 50, 95, 99, max] 10.294365ms, 11.553135ms, 33.428287ms, 52.678127ms, 177.709494ms
Bytes In [total, mean] 1207354, 9.43
Bytes Out [total, mean] 0, 0.00
Success [ratio] 52.69%
Status Codes [code:count] 200:67494 429:60599
Error Set:
429 Too Many Requests
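The report's ratios follow directly from the status code counts. A small Python helper (hypothetical, not part of vegeta) that recomputes them from the numbers above:

```python
from __future__ import annotations


def summarize(status_counts: dict[int, int], attack_seconds: float) -> dict[str, float]:
    """Recompute vegeta-style ratios from a {status code: count} mapping."""
    total = sum(status_counts.values())
    ok = status_counts.get(200, 0)
    limited = status_counts.get(429, 0)
    return {
        "total": total,
        "success_ratio": ok / total,
        "limited_ratio": limited / total,
        # Successful requests per second over the attack phase.
        "success_rps": ok / attack_seconds,
    }


# Numbers copied from the report above: 200:67494 429:60599, attack duration 2m8.090470728s.
report = summarize({200: 67494, 429: 60599}, attack_seconds=128.090470728)
print(f"{report['success_ratio']:.2%} ok, {report['success_rps']:.0f} ok/s")  # 52.69% ok, 527 ok/s
```

For this run that works out to about 527 successful requests/second, close to the configured 500/second limit.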

The corresponding visualization can be produced by graphing the rate of Envoy's ratelimit metric (envoy_cluster_ratelimit_over_limit) or of 4XX responses:

Visualizing the number of requests that our API service actually sees shows that it hovers around 500 requests/second, exactly what we were hoping for!

Once again, looking at Envoy's outgoing API requests shows that they hover around 500 requests/second as well!
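The same counters can be read straight from Envoy's admin interface, which the configs above expose on port 9901. Envoy's rate limit filter records its results as cluster.<route target cluster>.ratelimit.* counters (such as ok, over_limit and error), which correspond to the envoy_cluster_ratelimit_over_limit metric graphed earlier. The Python sketch below assumes the plain-text output of the admin /stats endpoint (one "name: value" per line); the function names are hypothetical.

```python
from __future__ import annotations

import urllib.request


def parse_stats(text: str) -> dict[str, int]:
    """Parse Envoy's plain-text /stats output into {stat name: integer value}."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        # Counters and gauges are plain integers; histogram summaries are skipped.
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def ratelimit_stats(stats: dict[str, int], cluster: str = "api") -> dict[str, int]:
    """Pick out cluster.<cluster>.ratelimit.* counters, keyed by their short name."""
    prefix = f"cluster.{cluster}.ratelimit."
    return {name[len(prefix):]: value for name, value in stats.items() if name.startswith(prefix)}


def fetch_stats(admin_url: str = "http://localhost:9901") -> dict[str, int]:
    """Read /stats from the admin interface configured above (port 9901)."""
    with urllib.request.urlopen(f"{admin_url}/stats", timeout=5) as response:
        return parse_stats(response.read().decode("utf-8"))
```

Calling ratelimit_stats(fetch_stats()) while the load test runs should show over_limit growing alongside vegeta's 429 count.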

Success!

Conclusion

I hope the above illustrates how simple it is to configure Envoy to mitigate API resource exhaustion caused by a greedy client. I have found this pattern extremely useful because resiliency work is often put on the back burner in favor of feature development. Because Envoy handles retries, latency injection, rate limiting and circuit breaking outside the application, operations-minded people can enable these capabilities without any application-level changes. Envoy is upending how we approach service resiliency, and I hope you had as much fun reading this article as I had writing it!

Resources