Mastering Istio Rate Limiting: Essential Techniques and Insights

Isar Nasimov · Published in saas-infra · Jun 17, 2024

We recently began implementing the Istio service mesh across our EKS clusters to enhance our observability into the application’s networking layer within each namespace. Alongside these observability benefits, we were keen to leverage Istio for circuit breaking. Initially, we delved into the Istio documentation, specifically focusing on the destination rule and its trafficPolicy.connectionPool. However, we found the documentation to be somewhat unclear, prompting us to embark on a detailed exploration to thoroughly understand the nuances of Istio's circuit breaking mechanisms.

This article draws inspiration from a piece on the same topic by OLX Engineering. I encourage you to explore their article as well. My goal was to deepen my understanding of the subject and expand on areas not covered by their discussion. You can read the OLX article here.

Current Architecture Overview

In our setup, I’m using a Kind cluster equipped with Istio, which hosts two services: appA and appB. appA is configured to send X requests (specified by the requestCount parameter) and introduces a delay of Y between each request. This delay, defined by the delay parameter, operates as a variable compliant with Go's time.Duration, allowing for flexible time intervals in units ranging from milliseconds to hours.
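In the experiments below, requestCount is 20 and the delay is small enough that all 20 requests reach appB as a near-simultaneous burst; you can see this in the logs, where every request starts within the same second or two.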

(Figure: UML diagram of the current architecture)

You might have noticed that the requests directed at appB target the /scenarioA endpoint, suggesting the presence of additional scenarios. Indeed, there are more scenarios within our setup, each tailored for different testing or operational purposes. However, I'll delve into those in a future article, where we can explore each scenario in detail.

(Figure: Kiali and the apps map)

Using Kiali, we’ve observed that the workload of appB, represented visually as a square box, typically responds to its service within an average time of about 5 seconds. Additionally, the service handles approximately 0.67 requests per second (rps).

To further enhance our understanding, it’s crucial to examine the logs from appA. Analyzing these logs, in conjunction with the data from Kiali, will provide us with a wealth of information to explore — setting the stage for deeper insights in our next article.

- - - Start - - - -
2024/06/06 16:00:05 ID: 16, STATUS: 200 OK, START: 16:00:00, END: 16:00:05, TIME: 5.064351516s
...
- - - Summary - - - -
Requests that returned 200: 20

These logs give us information about the requests from app-a to app-b: each request's status, when it started, when it ended, and the total time it took (end minus start), followed by a quick summary of how many requests returned each status code.

Circuit breaking

As mentioned earlier, we implement circuit breaking using Istio’s destination rule alongside the connection pool settings.

Important note: the Istio proxy allows some flexibility around the configured number of connections, so these limits should not be treated as exact, strictly enforced thresholds.

One critical observation is that all circuit breaking mechanisms rely on the tcp.maxConnections setting to manage the load effectively. To demonstrate this, let's configure tcp.maxConnections to 10. This setting ensures that no more than 10 connections can be active at any given time, helping prevent overloading and ensuring stable network behavior under high traffic conditions.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: circuit-breaker
spec:
  host: app-b
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 10

Apply the rule and observe app-a's logs.

------Start-------
2024/06/16 12:28:12 ID: 0, STATUS: 200 OK, START: 12:28:07, END: 12:28:12, TIME: 5.014503078s
...
2024/06/16 12:28:17 ID: 10, STATUS: 200 OK, START: 12:28:08, END: 12:28:17, TIME: 9.915286761s
...
------Summary-------
Requests that returned 200: 20

We noticed that 10 requests were processed in 5 seconds, while another 10 took 10 seconds. To better understand these differences, examining appB’s logs will be crucial, as they can reveal details about response times and potential bottlenecks.

2024/06/16 12:28:07 POST /scenarioA id=0
...
2024/06/16 12:28:12 POST /scenarioA id=10
...

We can see that app-b started to handle 10 requests at 12:28:07 and another 10 at 12:28:12.

Istio limits the proxy to 10 concurrent connections to appB, and since each HTTP/1.1 connection carries one request at a time, at most 10 requests are in flight; the rest wait for a free connection, so tcp.maxConnections effectively acts as a queueing mechanism in front of the service. This setup ensures controlled processing and helps manage traffic flow efficiently.
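Put differently, the excess requests are not rejected; they simply wait for a free connection, which is why the second batch's total time is roughly twice app-b's ~5-second handling time.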

Advancing with http.http1MaxPendingRequests = 4

...
connectionPool:
  tcp:
    maxConnections: 10
  http:
    http1MaxPendingRequests: 4

Let's again look at app-a's logs first.

2024/06/16 12:30:52 ID: 14, STATUS: 503 Service Unavailable, START: 12:30:52, END: 12:30:52, TIME: 7.36836ms
...
2024/06/16 12:30:57 ID: 0, STATUS: 200 OK, START: 12:30:52, END: 12:30:57, TIME: 5.008148736s
...
2024/06/16 12:31:02 ID: 12, STATUS: 200 OK, START: 12:30:52, END: 12:31:02, TIME: 9.769365077s
...
------Summary-------
Requests that returned 503: 6
Requests that returned 200: 14

Upon setting http.http1MaxPendingRequests to 4, we observed that 6 requests were immediately terminated with a 503 status code. The remaining 14 requests received a 200 status code. Within this successful group, approximately 10 responses were delivered within 5 seconds, while the remaining 4 took about 10 seconds. This pattern suggests that the http.http1MaxPendingRequests setting limits the number of pending requests.

The app-b pod received only 14 requests; the others were dropped by Istio.
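In other words, with maxConnections = 10 and http1MaxPendingRequests = 4, the burst of 20 breaks down as 10 + 4 + 6: ten requests get a connection immediately (the ~5-second responses), four wait in the pending queue (the ~10-second responses), and the remaining six are rejected with 503.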

Another case:

With that logic, let's set http.http1MaxPendingRequests = 12; we can expect that all requests will reach app-b without being rate limited.
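Assuming we keep tcp.maxConnections at 10 from the earlier rule, the connection pool for this test would look roughly like this:

...
connectionPool:
  tcp:
    maxConnections: 10
  http:
    http1MaxPendingRequests: 12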

------Start-------
2024/05/28 15:14:05 STATUS: 200 OK, START: 15:14:00, END: 15:14:05, TIME: 5.008211852s
...
2024/05/28 15:14:10 STATUS: 200 OK, START: 15:14:00, END: 15:14:10, TIME: 10.010497053s
...
------Summary-------
Requests that returned 200: 20

And indeed, with http.http1MaxPendingRequests set to 12, all 20 requests directed at appB were processed without being rate limited.

Last case:

connectionPool:
  tcp:
    maxConnections: 6
  http:
    http1MaxPendingRequests: 8

With tcp.maxConnections set to 6 and http.http1MaxPendingRequests set to 8, our 20 requests are effectively organized into 4 "queues": 6, 6, 6, and 2. The first 6 requests are processed immediately, leaving 14 in the queue. However, only 8 of these will be accepted as pending, given the configured limit.

This leads to the following equation for understanding which requests are likely to be rejected with a 503 status:

requestsThatWillReturn503 = burstOfRequests − (maxConnections + http1MaxPendingRequests)

Using this model, we can better predict and manage the flow of requests to ensure optimal performance and minimal service disruption.
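For example, in the maxConnections: 10, http1MaxPendingRequests: 4 case above, 20 − (10 + 4) = 6 requests returned 503, which matches the summary. The last case predicts the same result: 20 − (6 + 8) = 6.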

Exploring the Role of Timeouts in Request Management

With tcp.maxConnections, we observed an increase in response times across our requests. We hoped that introducing http.idleTimeout would add a layer of control by timing out any request that exceeds a specified duration, mitigating those delays by terminating slow connections, although we'll see shortly that this assumption doesn't hold.

connectionPool:
  tcp:
    maxConnections: 10
  http:
    http1MaxPendingRequests: 4
    idleTimeout: 4s

Let's look at app-a logs:

------Start-------
2024/06/16 15:04:28 ID: 14, STATUS: 503 Service Unavailable, START: 15:04:28, END: 15:04:28, TIME: 4.837148ms
...
2024/06/16 15:04:33 ID: 0, STATUS: 200 OK, START: 15:04:28, END: 15:04:33, TIME: 5.008434011s
...
2024/06/16 15:04:38 ID: 13, STATUS: 200 OK, START: 15:04:28, END: 15:04:38, TIME: 9.748320124s
...
------Summary-------
Requests that returned 503: 6
Requests that returned 200: 14

Upon closer examination, it becomes evident that http.idleTimeout does not actually limit the duration of our requests. Instead, this setting primarily manages the duration that a connection can remain idle before being closed, which is a subtle but important distinction.
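This matches Istio's definition of idleTimeout: it is the idle timeout for upstream connection pool connections, where idle means no active requests. A connection that is busy waiting on a slow response is never idle, so it is not closed by this setting.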

Implementing Request Timeouts: Effective Strategies

We will add another resource, a VirtualService.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: app-b-virtual-service
spec:
  hosts:
  - app-b
  http:
  - timeout: 10s
    route:
    - destination:
        host: app-b

while keeping the DestinationRule's trafficPolicy from the last case:

trafficPolicy:
  connectionPool:
    tcp:
      maxConnections: 6
    http:
      http1MaxPendingRequests: 8
Based on the configuration, here’s how the request responses are distributed:

  • Requests 0–5 are processed and return after about 5 seconds.
  • The next batch, requests 6–11, returns after about 10 seconds.
  • Requests 12–13 would need about 15 seconds (but see the note after this list).
  • Finally, requests 14–19 are rejected immediately with a 503 status code due to the overflow.
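In practice, the 10-second timeout from the VirtualService kicks in first for requests 12–13, so they return 504 instead of completing after 15 seconds, as the logs confirm.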

The log shows:

------Start-------
2024/06/16 15:14:23 ID: 14, STATUS: 503 Service Unavailable, START: 15:14:23, END: 15:14:23, TIME: 6.926893ms
..
2024/06/16 15:14:28 ID: 0, STATUS: 200 OK, START: 15:14:23, END: 15:14:28, TIME: 5.017792169s
...
2024/06/16 15:14:33 ID: 6, STATUS: 200 OK, START: 15:14:23, END: 15:14:33, TIME: 9.899476726s
...
2024/06/16 15:14:33 ID: 11, STATUS: 504 Gateway Timeout, START: 15:14:23, END: 15:14:33, TIME: 10.010734435s
...
------Summary-------
Requests that returned 503: 6
Requests that returned 200: 12
Requests that returned 504: 2

So, any requests that take more than 10 seconds will return 504 (gateway timeout).
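That 504 is produced by the Envoy sidecar when the 10s route timeout from the VirtualService expires before app-b responds.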

An easier alternative?

Can we use http.http2MaxRequests for rate limiting?

connectionPool:
  http:
    http2MaxRequests: 8

Well, yes we can. We can even see that the app is rate limited and accepts only 8 concurrent requests.

------Start-------
2024/06/16 15:34:08 ID: 8, STATUS: 503 Service Unavailable, START: 15:34:08, END: 15:34:08, TIME: 14.866511ms
...
2024/06/16 15:34:12 ID: 0, STATUS: 200 OK, START: 15:34:07, END: 15:34:12, TIME: 5.012923839s
...
------Summary-------
Requests that returned 503: 12
Requests that returned 200: 8
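Only the 8 requests admitted by http2MaxRequests reached app-b and returned 200; the remaining 20 − 8 = 12 were rejected immediately with 503.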

After applying the http.http2MaxRequests limit, we observed a change in the success rate in Kiali, indicating that the rate limiting could be impacting how requests are handled and their outcomes.

Using http.http2MaxRequests for rate limiting differs significantly from the earlier methods involving maxConnections and http1MaxPendingRequests. The former directly limits the number of concurrent requests (despite its name, it applies to HTTP/1.1 traffic as well as HTTP/2), while the latter combination manages overall connections and queued requests, potentially leading to different impacts on request throughput and success rates as monitored by tools like Kiali.

And it keeps the edge in Kiali green and healthy.

Summary

In this blog, we explored the implementation and nuances of rate limiting and circuit breaking in Istio within our EKS clusters. We began by setting up Istio and analyzing the basic architecture involving two services, appA and appB. By diving into the details, we examined how Istio’s destination rules and connection pools, particularly tcp.maxConnections, impacted our system's response times and throughput.

We also experimented with http.http1MaxPendingRequests and observed how setting it to different values influenced the handling of concurrent requests, notably affecting how many requests were immediately rejected with a 503 status code.

Further, we explored using http.http2MaxRequests as an alternative for rate limiting, discovering its direct effect on the application's ability to handle concurrent HTTP/2 requests. Comparing this with the previous method involving maxConnections and http1MaxPendingRequests, we noted the different impacts on service performance and success rates, as monitored in Kiali.

This comprehensive analysis provided a deeper understanding of Istio’s capabilities and the practical implications of various rate limiting strategies, guiding us toward more informed decisions for optimizing service performance and reliability.
