Rate Limiting on AWS API Gateway: Beyond Usage Plans

Yiquan Ong · Published in The Startup · Nov 13, 2020

Planning to deploy a high-volume API gateway in your AWS environment? Here are some important things about throttling and limits that you may have missed.

The Basics: API Keys and Usage Plans

Throttling is configured at the per-second level via usage plans and API keys: you set a requests-per-second (RPS) limit on an API key through a usage plan. Other platforms often throttle at the minute level instead, with a requests-per-minute (RPM) setting. This is well covered in the official documentation, so do give it a read if you are unfamiliar.
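As a concrete illustration, here is a minimal boto3 sketch that attaches an API key to a usage plan with a throttle. The API id, stage, and names are hypothetical placeholders; adjust the rate and burst values to your needs.

```python
import boto3

apigw = boto3.client("apigateway")

# Hypothetical REST API id and stage; substitute your own.
plan = apigw.create_usage_plan(
    name="standard-tier",
    throttle={"rateLimit": 100.0, "burstLimit": 200},  # steady RPS and burst ceiling
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "prod"}],
)

key = apigw.create_api_key(name="customer-a", enabled=True)

# Attach the key to the plan so its requests are throttled at 100 RPS.
apigw.create_usage_plan_key(
    usagePlanId=plan["id"], keyId=key["id"], keyType="API_KEY"
)
```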

Do not use API keys as the only means of authentication and authorization, especially if requests are made from a browser or a mobile app: API keys can be extracted from browsers in plain text, and from mobile apps when decompiled. Even for server-to-server API calls, it is always good to have an additional layer of security. Web tokens based on OAuth flows are the de facto standard. You can either implement them yourself or use a managed service like Amazon Cognito or Okta.
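If you roll your own, a token-based Lambda authorizer is a common pattern: it validates a bearer token and returns an IAM policy before the request ever reaches your integration. Below is a minimal sketch; `validate_token` is a hypothetical stub you would back with a real JWT library and your identity provider's keys.

```python
from typing import Optional

def validate_token(token: str) -> Optional[dict]:
    """Hypothetical stub: in practice, verify the JWT's signature, expiry,
    and audience against your IdP's public keys (e.g. with the PyJWT library)."""
    return {"sub": "user-123"} if token.startswith("Bearer ") else None

def handler(event, context):
    """TOKEN-type Lambda authorizer: allow or deny the incoming request."""
    claims = validate_token(event.get("authorizationToken", ""))
    return {
        "principalId": claims["sub"] if claims else "anonymous",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow" if claims else "Deny",
                "Resource": event["methodArn"],
            }],
        },
    }
```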

My default limit is 10,000 RPS. What does that mean?

Throttling based on requests per second (RPS) applies to new requests, not to in-flight (concurrent) requests. AWS API Gateway has no concurrency limit: there is no cap on the number of requests that are already open. This may surprise you if you are used to scaling web servers based on concurrent connections; on API Gateway, throttling counts only new requests.

Hence by default, API Gateway can have up to 10,000 (RPS limit) × 29 (integration timeout in seconds) = 290,000 open connections. The 10,000 RPS is a soft limit that can be raised if more capacity is required, while the 29-second timeout is a hard limit that cannot be increased. 29 seconds is a long time for a web service to respond! If you are hitting it, either something is wrong in the backend or you will have to re-design the request flow.

In addition, API Gateway allows bursts based on a token bucket algorithm when the RPS limit is hit within a second. Put simply, leftover RPS from previous seconds accumulates in a bucket as tokens that can be spent on bursts later. Once the bucket is empty, requests are throttled and "429 Too Many Requests" errors are returned. The bucket's maximum capacity (the burst limit) defaults to 5,000, typically half of the RPS limit.
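To make the accumulation concrete, here is a small Python sketch of a token bucket in the same spirit (not API Gateway's actual implementation): tokens refill at the steady RPS rate, unspent tokens pile up, and the balance is capped at the burst limit.

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second; the balance is capped at `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # steady-state RPS limit
        self.capacity = capacity  # burst limit (bucket size)
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Leftover allowance from previous seconds accumulates, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket empty: this is when you would see a 429

# Default numbers from this article: 10,000 RPS, 5,000 burst.
bucket = TokenBucket(rate=10_000, capacity=5_000)
```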

What if I need to limit the number of open connections to the backend? Maybe my backend is a legacy, non-auto-scaling application

Since there is no concurrency limit on AWS API Gateway, the easiest way to limit the number of open connections to the backend is via a Lambda proxy. This is a common setup, as Lambda lets you write custom business logic in popular languages such as Python, Java, and .NET, and is much more flexible than transforming requests with Apache VTL in API Gateway's mapping templates. The caveat of using a Lambda proxy is slightly lower performance, as Lambda boot and execution time add to the overall API response time.

Assuming each Lambda invocation opens only one connection to the backend, the number of open connections is essentially the Lambda function's concurrency, which can be configured and controlled.
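As an illustration, here is a minimal Lambda proxy handler under that assumption; the backend URL is a hypothetical placeholder. Each invocation opens at most one connection, so capping the function's concurrency caps the backend's open connections.

```python
import urllib.request

BACKEND_URL = "https://legacy-backend.internal/api"  # hypothetical backend endpoint

def handler(event, context):
    """Lambda proxy: one invocation opens (at most) one backend connection."""
    req = urllib.request.Request(
        BACKEND_URL,
        data=(event.get("body") or "").encode(),
        method=event.get("httpMethod", "POST"),
    )
    # Keep the backend timeout under API Gateway's 29-second hard limit.
    with urllib.request.urlopen(req, timeout=25) as resp:
        return {"statusCode": resp.status, "body": resp.read().decode()}
```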

Managing Lambda Concurrency

There are two concurrency settings on Lambda; reserved concurrency is the one to use for concurrency management.

  1. Provisioned Concurrency: Use this to pre-warm your Lambda functions for consistent function start times. Since Lambda functions otherwise scale out by an additional 500 instances each minute, provisioned concurrency lets you go from 0 to 1,000 in a second if you have a provisioned concurrency of 1,000. Do note that provisioned concurrency is not cheap.
  2. Reserved Concurrency: Use this to reserve concurrency for a specific function. Concurrency reserved for one Lambda function cannot be used by any other function. It also sets a maximum concurrency for the function, preventing it from scaling out of control and thereby capping the concurrent connections to the backend (a boto3 sketch of both settings follows this list).
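Here is a minimal boto3 sketch of both settings; the function name and alias are hypothetical. Note that reserving concurrency carves that slice out of the account pool, while provisioned concurrency must target a published version or alias.

```python
import boto3

lam = boto3.client("lambda")

# Reserved concurrency: cap the proxy at 100 concurrent executions,
# which also caps the backend at roughly 100 open connections.
lam.put_function_concurrency(
    FunctionName="backend-proxy",  # hypothetical function name
    ReservedConcurrentExecutions=100,
)

# Provisioned concurrency: pre-warm 50 execution environments on an
# alias or published version (it cannot target $LATEST).
lam.put_provisioned_concurrency_config(
    FunctionName="backend-proxy",
    Qualifier="live",  # hypothetical alias
    ProvisionedConcurrentExecutions=50,
)
```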

Default Concurrency Limit of 1000

Think of the region limit as a pie that you split across your Lambda functions. Unreserved concurrency is a common pool that all other functions draw from; functions with reserved concurrency cannot tap into the unreserved pool.

Ensure that you leave enough buffer concurrency in the unreserved pool for your other Lambda functions. The default limit of 1,000 is a soft limit and can be raised if you need more concurrency.
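To see how much of the pie is left, you can query the account-level settings; a quick boto3 sketch:

```python
import boto3

lam = boto3.client("lambda")
settings = lam.get_account_settings()

# Region-wide limit (1,000 by default) and what remains in the unreserved pool.
limit = settings["AccountLimit"]["ConcurrentExecutions"]
unreserved = settings["AccountLimit"]["UnreservedConcurrentExecutions"]
print(f"Region limit: {limit}, unreserved pool: {unreserved}")
```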

Feel free to leave a note or comment if anything is not clear.

Disclaimer: This blog's contents are solely my own opinions and not my employer's.
