All API providers, like Google Cloud Platform, have to protect and preserve the resources available to their users by setting quotas and rate limits on their APIs. This protects the services against misuse, avoids cascading failures, or simply mitigates attacks like DDoS.
However, implementing a rate limit feature in your own API is not easy:
- You have to code it, or use a framework that does it for you.
- You pay for the processing time of the rate limit check, and that impacts your bill. In case of an attack, your bill will explode!
- You have to track API usage in a distributed cache or an in-memory database like Redis to allow horizontal scaling.
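To give an idea of what "coding it yourself" means, here is a minimal sketch of a fixed-window rate limiter in Python. It is a hypothetical illustration, not taken from any framework: the class name, the 60-second window and the injectable clock are my assumptions. The counters live in the memory of a single process, which is exactly what breaks with horizontal scaling and pushes you toward a shared store like Redis.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` calls per consumer in each fixed time window.

    Counters are kept in this process's memory: with several instances,
    each one would count separately, hence the need for a shared store.
    """

    def __init__(self, limit: int, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable, so tests can control time
        # (consumer, window index) -> calls already accepted in that window
        self.counters: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, consumer: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        key = (consumer, window)
        if self.counters[key] >= self.limit:
            return False  # the API should answer HTTP 429
        self.counters[key] += 1
        return True


# Usage with a fake clock: 2 calls per minute for "project-a".
now = [0.0]
limiter = FixedWindowRateLimiter(limit=2, clock=lambda: now[0])
print([limiter.allow("project-a") for _ in range(3)])  # [True, True, False]
now[0] = 60.0  # next window
print(limiter.allow("project-a"))  # True
```

Old windows are never purged in this sketch; a real implementation would expire them, for example with key TTLs in Redis.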
A free API management solution
That’s why solutions like Apigee exist: a well-known product for enterprise-grade API management, but with an enterprise-grade cost too. For smaller projects, a free solution exists: Cloud Endpoints. It can protect APIs hosted on:
- Compute Engine
- Google Kubernetes Engine (GKE)
- Cloud Run
- App Engine
- Cloud Functions
In my previous article, I described how to secure serverless endpoints with API keys. It is the starting point for the implementation steps below.
API Keys for identifying the consumers
To use Cloud Endpoints quotas, you have to use API keys. Here, the key is used to identify the project that consumes your API. Yes, the project, not the key. If you create several keys in the same project, they all refer to that project and share the same quota.
How to limit the API to several customers without dependencies between them?
On this page, Google proposes a decision chart. If you have several customers that you want to distinguish, the solution is to create several projects, one per customer, each with an API key authorized to call your API.
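The per-project accounting can be illustrated with a small, hypothetical Python model: the key names, project names and the limit of 3 calls per minute are made up. The point is that the counter is indexed by the project resolved from the key, so two keys of the same project share one budget, while another customer's project keeps its own.

```python
from collections import Counter
from typing import Dict

# Hypothetical keys: two keys created in customer A's project, one in B's.
KEY_TO_PROJECT: Dict[str, str] = {
    "key-a1": "customer-a-project",
    "key-a2": "customer-a-project",
    "key-b1": "customer-b-project",
}
LIMIT_PER_MINUTE = 3  # hypothetical default quota

# Calls accepted in the current minute, counted per PROJECT, not per key.
calls_this_minute: Counter = Counter()


def accept_call(api_key: str) -> bool:
    """Resolve the key to its project and charge the call to that project."""
    project = KEY_TO_PROJECT[api_key]
    if calls_this_minute[project] >= LIMIT_PER_MINUTE:
        return False  # quota exceeded for the whole project
    calls_this_minute[project] += 1
    return True
```

With this model, calling accept_call with "key-a1", "key-a1", "key-a2", "key-a2" returns True, True, True, False, while "key-b1" is still accepted: only separate projects give separate budgets.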
Improve the API definition with quotas
To implement quotas on the API, I start from the existing definition described in my previous article and use the endpoint.yaml file as the base file.
There are 3 steps to define a quota:
- Give your quotas understandable names.
- Set the default rate (queries per minute and per project).
- Define the cost of each call on your paths.
The final file with all the descriptions/updates provided in this article is here.
1. Name your quotas
Naming your quotas lets you see them on the quotas page of Cloud Endpoints, and each project that uses this API can also see its quota on the IAM & Admin -> Quotas page.
So, let’s define the names of the quotas that you want to use, at the root level of the YAML file, under the x-google-management: extension:
x-google-management:
  metrics:
    - name: "my_first_quota"
      displayName: "My first quota"
      valueType: INT64
      metricKind: DELTA
displayName is the text returned in the error message when the quota is reached, and the one displayed in the GUI. Make it explicit!
2. Set the default value of a quota
Now, define the standard value for this quota. The metric: field takes the same value as the name: of the metric defined in the previous block, which links both.
This block comes at the same level as the metrics: definition (from the previous part, for quota naming):
  quota:
    limits:
      - name: "read-get"
        metric: "my_first_quota"
        unit: "1/min/{project}"
        values:
          STANDARD: 1
values: is the value of the limit. Here it's 1 request per minute, which makes testing easier. The unit (per minute and per project) can’t be changed in this Beta version.
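The link between a limit and its metric is only a string, so a typo breaks it. The following Python sketch mirrors the x-google-management section above as a dict and checks that every limit points to a declared metric and uses the only supported unit. This check is a hypothetical helper of mine, not something gcloud provides; in practice you could load the dict from endpoint.yaml with a YAML parser such as PyYAML's yaml.safe_load.

```python
from typing import Any, Dict, List

# Python mirror of the x-google-management section built in steps 1 and 2.
management: Dict[str, Any] = {
    "metrics": [
        {"name": "my_first_quota", "displayName": "My first quota",
         "valueType": "INT64", "metricKind": "DELTA"},
    ],
    "quota": {
        "limits": [
            {"name": "read-get", "metric": "my_first_quota",
             "unit": "1/min/{project}", "values": {"STANDARD": 1}},
        ],
    },
}


def check_limits(section: Dict[str, Any]) -> List[str]:
    """List limits that reference an undeclared metric or another unit."""
    declared = {metric["name"] for metric in section["metrics"]}
    problems: List[str] = []
    for limit in section["quota"]["limits"]:
        if limit["metric"] not in declared:
            problems.append(f"{limit['name']}: unknown metric {limit['metric']!r}")
        if limit["unit"] != "1/min/{project}":
            problems.append(f"{limit['name']}: unsupported unit {limit['unit']!r}")
    return problems


print(check_limits(management))  # []
```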
3. Set the cost of an API call
The Cloud Endpoints quotas feature allows you to create different quota metrics to differentiate, for example, business cases, features or query types (GET, POST, PUT, DELETE).
However, within the same quota definition, you can also weight the resources consumed by each path. For example, a multi-criteria search returning a result list is much more expensive than getting a single resource.
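So, on each path, you can set the cost of the call. You add this part inside the operation definition (for example under get:), next to its other fields:
paths:
  /hello:
    get:
      # other fields of the operation are unchanged
      x-google-quota:
        metricCosts:
          "my_first_quota": 1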
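The effect of metricCosts on a per-minute budget is simple arithmetic. This Python sketch is a simplified, hypothetical model (the budget of 10 and the operation names getBook and searchBooks are made up): each call consumes its cost from the remaining budget, and a call that doesn't fit is rejected.

```python
from typing import Dict, List

BUDGET_PER_MINUTE = 10  # hypothetical STANDARD value of the limit

# Hypothetical operations with their metricCosts for the same metric.
METRIC_COSTS: Dict[str, int] = {
    "getBook": 1,      # reading a single resource is cheap
    "searchBooks": 5,  # a multi-criteria search is expensive
}


def consume(calls: List[str]) -> List[bool]:
    """For each call in order, say whether it still fits in this minute's budget."""
    remaining = BUDGET_PER_MINUTE
    accepted = []
    for operation in calls:
        cost = METRIC_COSTS[operation]
        if cost <= remaining:
            remaining -= cost
            accepted.append(True)
        else:
            accepted.append(False)
    return accepted


print(consume(["searchBooks", "searchBooks", "getBook"]))  # [True, True, False]
```

With a cost of 5, only 2 searches fit in the minute, versus 10 single reads.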
Test the quotas
To test, start by deploying the new service definition:
gcloud endpoints services deploy endpoint.yaml
Then test it by repeating the following curl command several times. The -i flag displays the response headers, especially the response code.
curl -i https://endpoint-<hash>-uc.a.run.app/hello?key=<API KEY>
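To count how many calls go through before the first 429 without repeating curl by hand, here is a small Python sketch using only the standard library. The function names and the max_calls default are my own; the status fetcher is injectable so that the logic can be tested without calling the real endpoint.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str) -> int:
    """Return the HTTP status code of a GET request, error codes included."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError


def calls_before_429(url: str, max_calls: int = 20,
                     get_status: Callable[[str], int] = fetch_status) -> int:
    """Send up to max_calls requests; return how many were answered before
    the first 429, or -1 if no 429 was received."""
    for i in range(max_calls):
        if get_status(url) == 429:
            return i
    return -1
```

For example, call calls_before_429("https://endpoint-<hash>-uc.a.run.app/hello?key=<API KEY>") with your own host and key.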
After a few calls, you should get this error. You may notice that it takes 3, 4 or more tries before being blocked. See the paragraph below for more details.
date: Mon, 27 Jan 2020 13:25:01 GMT
server: Google Frontend
"message": "Quota exceeded for quota metric 'My first quota' and limit 'My first quota per minute' of service 'endpoint-<hash-uc.a.run.app' for consumer 'project_number:XXXXXXX'.",
The quota limit works. Note the 429 HTTP code, which stands for Too Many Requests. You can also see the quota displayName in the error description body, as well as the project number identified through the API key.
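If a client needs to react to this error, for example to log which project hit which limit, the message can be parsed. This Python sketch assumes the exact message format shown above, which is not a documented contract and may change, so treat the regular expression as fragile.

```python
import re
from typing import Dict, Optional

# Pattern built from the sample error message above (an assumption, not a spec).
QUOTA_ERROR = re.compile(
    r"Quota exceeded for quota metric '(?P<metric>[^']*)' "
    r"and limit '(?P<limit>[^']*)' of service '(?P<service>[^']*)' "
    r"for consumer '(?P<consumer>[^']*)'"
)


def parse_quota_error(message: str) -> Optional[Dict[str, str]]:
    """Extract metric, limit, service and consumer, or None if no match."""
    match = QUOTA_ERROR.search(message)
    return match.groupdict() if match else None
```

Applied to the message above, it returns the metric 'My first quota', the limit 'My first quota per minute', the service host and the consumer 'project_number:XXXXXXX'.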
Best-effort quota management
The Cloud Endpoints quotas feature is currently in Beta: some features are missing and the behavior is sometimes imperfect.
You can perform load tests with hey. If you set the request concurrency to 1 (-c parameter), the quota is roughly respected.
However, if you remove it (hey then uses its default concurrency of 50), requests keep being served until the Cloud Run endpoint service takes into account that the project number must be blocked. This can take a few milliseconds, enough for dozens of additional requests to be served even though the quota is exceeded.
And the more Cloud Run instances run in parallel and the higher your traffic, the more requests get through instead of being blocked.
Anyway, after this glitch, the rate limit is applied, the 429 HTTP code is returned as expected, and the API is protected against an excessive number of requests.
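The overshoot can be reasoned about with a toy model: if requests arrive at a rate r and the blocking decision takes a delay d to reach the proxy, roughly r × d requests are served after the limit is hit. The Python simulation below is only that model, with made-up numbers; it doesn't reproduce the internal behavior of the Cloud Endpoints proxy, and it treats several instances simply as a higher aggregated rate.

```python
def served_requests(limit: int, rate_per_second: float, delay_ms: float,
                    window_ms: float = 1000.0) -> int:
    """Toy model: count requests served during one window.

    Requests arrive evenly, one every 1000 / rate_per_second ms. When the
    limit is reached, the proxy only starts rejecting delay_ms later.
    """
    interval_ms = 1000.0 / rate_per_second
    served = 0
    blocked_from = None  # time at which the proxy starts answering 429
    t = 0.0
    while t < window_ms:
        if blocked_from is None or t < blocked_from:
            served += 1
            if blocked_from is None and served >= limit:
                blocked_from = t + delay_ms
        t += interval_ms
    return served


for rate in (10, 200, 1000):
    print(rate, served_requests(limit=1, rate_per_second=rate, delay_ms=50))
# 10 1
# 200 10
# 1000 50
```

With a limit of 1 and a 50 ms delay, 10 requests per second are served exactly once, while 1000 requests per second get 50 requests served in the window, which matches the observation that low concurrency roughly respects the quota.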
Improve the security and keep quotas
In my previous article, I explained that API keys weren’t the optimal solution for API user authentication, even if it was possible. However, API keys are required for quotas.
The ideal is to keep API keys for quotas and to use OAuth2 for authentication.
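Cloud Endpoints allows you to set up this configuration. In the security definition, add the authentication method that you want. In the following example, I chose Google ID token authentication and added it to the securityDefinitions part of the file, next to the existing API key definition:
securityDefinitions:
  google_id_token:
    authorizationUrl: ""
    flow: "implicit"
    type: "oauth2"
    x-google-issuer: "https://accounts.google.com"
    x-google-jwks_uri: "https://www.googleapis.com/oauth2/v3/certs"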
This configuration allows any Google account to call my API: it’s broadly open! If you want to use Cloud Identity Platform and manage your own subset of users, I recommend the Firebase authentication method instead.
Then, I updated the security requirement of the endpoint paths (or globally) with this value:
- google_id_token: []
Test Cloud Endpoints with OAuth2 security definition
Now deploy the new API definition:
gcloud endpoints services deploy endpoint-quotas.yaml
And test it with a bearer token:
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
And... it fails. Why? Look at the error message:
"message": "JWT validation failed: Audience not allowed",
You simply have to set the audience of your token. Easy: add the --audiences parameter to the gcloud auth print-identity-token command, with the Cloud Run endpoint host name as the audience value:
gcloud auth print-identity-token \
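Whatever the account used, it helps to look at the aud claim of the token you actually send when the proxy answers "Audience not allowed". The payload of a JWT is base64url-encoded JSON, so a few lines of standard-library Python can display it. This sketch deliberately skips signature verification: use it for debugging only, never to authorize a request.

```python
import base64
import json
from typing import Any, Dict


def jwt_claims(token: str) -> Dict[str, Any]:
    """Decode the payload (2nd part) of a JWT WITHOUT checking its signature.

    Debugging aid only: never use this to authorize a request.
    """
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # base64url padding is stripped in JWTs
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_ok(token: str, expected_audience: str) -> bool:
    """True if the token's aud claim (string or list) contains expected_audience."""
    aud = jwt_claims(token).get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return expected_audience in audiences
```

For example, pass the output of gcloud auth print-identity-token to jwt_claims and compare its aud with the Cloud Run endpoint host name.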
If you are authenticated with your user account, you get another error:
ERROR: (gcloud.auth.print-identity-token) Invalid account Type for `--audiences`. Requires valid service account.
OK: an identity token with a custom audience can only be generated for a service account. To do so:
- Create a service account
gcloud iam service-accounts create test-endpoint
- Generate and download the key file. Replace PROJECT_ID with your project ID:
gcloud iam service-accounts keys create key.json \
- Activate the service account in the gcloud CLI
gcloud auth activate-service-account --key-file=key.json
Now, all the pieces are in place for a working test:
curl -H "Authorization: Bearer $(gcloud auth print-identity-token \
That works! Try several calls: yes, the quotas are also active!
Note: only an active service account is required here, without any granted role, because only authentication is checked.
In this setup, there are 2 levels of authentication:
- The caller authenticates to Cloud Endpoints with a “no role” service account.
- The Cloud Endpoints proxy performs the call to the desired paths. Here, the Cloud Run endpoint service account is used to be authenticated and authorized (with the run.invoker role, for example) when performing the request to the path service.
These 2 authentication tokens are transmitted to the path service in request headers with these names:
- Caller token is in the
- Cloud Endpoints proxy is in the
You can use them to perform the check and/or the additional authorisation at business level in your
Note: token validity has already been checked, so you don’t have to do it again; simply use the tokens’ content.
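Here is a minimal Python sketch of such a business-level check in the backend. The header name is a parameter (a hypothetical choice): pass the name of the header that carries the caller token. The allow-list rule on the email claim is also a made-up example. As the note says, the signature is not verified again; this is only acceptable if the backend can’t be reached without going through the proxy, for example a Cloud Run service that requires run.invoker.

```python
import base64
import json
from typing import Any, Dict, Mapping, Optional, Set


def caller_claims(headers: Mapping[str, str], header_name: str) -> Optional[Dict[str, Any]]:
    """Return the claims of the bearer token found in header_name, or None.

    The signature is NOT checked: the Cloud Endpoints proxy already did it.
    Note: plain dicts are case-sensitive, unlike most web framework headers.
    """
    value = headers.get(header_name, "")
    if not value.startswith("Bearer "):
        return None
    payload = value[len("Bearer "):].split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def is_allowed(claims: Optional[Dict[str, Any]], allowed_emails: Set[str]) -> bool:
    """Hypothetical business rule: only listed callers may use this path."""
    return bool(claims) and claims.get("email") in allowed_emails
```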
API call without API key
Now, the API key is no longer used to secure the API, and you can call the API without an API key. Give it a try!
What is the impact on quotas if there is no API key?
As said before, the Cloud Endpoints quotas feature is in Beta. Today, when the API key is missing, the project number used for the quota limit is the project that hosts the Cloud Endpoints service.
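This fallback can be summarized as a small Python function. It’s a hypothetical model of the Beta behavior described above, with made-up project names, not an API of Cloud Endpoints. A consequence worth keeping in mind: all calls without a key share the quota of the hosting project, so a single keyless client can exhaust it for every other keyless caller.

```python
from typing import Dict, Optional

# Hypothetical identifier of the project hosting the Cloud Endpoints service.
HOSTING_PROJECT = "endpoint-hosting-project"


def quota_consumer(api_key: Optional[str], key_to_project: Dict[str, str]) -> str:
    """Project whose quota is charged for a call: the key's project if a key
    is sent, otherwise the project hosting the Cloud Endpoints service."""
    if api_key:
        return key_to_project[api_key]
    return HOSTING_PROJECT
```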
Protect your APIs
Your APIs are precious: they are the entry point to your services, your information system and your business value. You have to ensure:
- A high level of availability, for customer satisfaction
- Correct latency and fair usage across all API consumers
- A high level of security with OAuth2 instead of API keys
Cloud Endpoints lets you set up all these features with a free product that you host where you want. It’s not perfect yet, and the quotas feature is in Beta, but usage and future evolutions will consolidate these good foundations, and great things should come!