Rate limit your API usage with Cloud Endpoints quotas

Published in Google Cloud - Community, Feb 13, 2020

All API providers, like Google Cloud Platform, have to protect and preserve the resources available to their users by setting quotas and rate limits on their APIs. This protects the services against misuse, avoids cascading failures, or simply fends off attacks such as DDoS.

However, implementing a rate limit feature in your own API is not easy:

You have to code it, or use a framework that does it for you.
You have to pay for the processing time of the rate limit check, and that impacts your bill. In case of an attack, your bill will explode!
You have to track API usage in a distributed cache or an in-memory database like Redis to allow horizontal scaling.
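To give an idea of what coding it yourself involves, here is a minimal sketch of a fixed-window rate limiter that counts calls per consumer. The class, the 60-second window and the in-memory storage are assumptions for illustration, not how Cloud Endpoints is implemented; in a horizontally scaled service, the counters would have to live in a shared store such as Redis.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Hypothetical fixed-window limiter: at most `limit` units per `window` seconds per consumer.

    Counters live in this process's memory. With several instances behind a load
    balancer, each one would count separately, which is why a shared store such
    as Redis is needed to scale horizontally.
    """

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # consumer -> (index of the current window, units used in that window)
        self.usage: Dict[str, Tuple[int, int]] = {}

    def allow(self, consumer: str, cost: int = 1) -> bool:
        """Record the call and return True, or return False if it would exceed the limit."""
        window_index = int(self.clock() // self.window)
        current_window, used = self.usage.get(consumer, (window_index, 0))
        if current_window != window_index:
            used = 0  # a new window has started: reset the counter
        if used + cost > self.limit:
            self.usage[consumer] = (window_index, used)
            return False  # the API should answer HTTP 429 Too Many Requests
        self.usage[consumer] = (window_index, used + cost)
        return True


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
limiter = FixedWindowLimiter(limit=2, window=60, clock=lambda: fake_now[0])
print([limiter.allow("project-a") for _ in range(3)])  # [True, True, False]
fake_now[0] = 61.0  # next one-minute window
print(limiter.allow("project-a"))  # True
```

A fixed window is the simplest design; it allows short bursts around window boundaries, which sliding-window or token-bucket variants smooth out.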

A free API management solution

That’s why solutions exist, like Apigee, a well-known product for enterprise-grade API management that also comes with an enterprise-grade cost. For smaller projects, a free solution exists: Cloud Endpoints.

This product is open source and can be hosted on several services:

Compute Engine
Google Kubernetes Engine (GKE)
Cloud Run
App Engine
Cloud Functions

In my previous article, I described how to secure serverless endpoints with API keys. It is the starting point for the implementation steps below.

API Keys for identifying the consumers

To use Cloud Endpoints quotas, you have to use API keys. Here, the key is used to identify the project that consumes your API. Yes, the project, not the key. Indeed, if you create several keys in the same project, they all refer to the same project and share the same limit.
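A tiny model of this behavior, assuming the proxy resolves each API key to the number of the project that owns it before counting: the key names, project numbers and function below are hypothetical, Cloud Endpoints performs this resolution for you, and the one-minute window reset is left out for brevity.

```python
from collections import Counter
from typing import Dict

# Hypothetical mapping kept by the platform: API key -> owning project number.
KEY_TO_PROJECT: Dict[str, str] = {
    "key-alpha-1": "111111",  # two keys created in the same project...
    "key-alpha-2": "111111",
    "key-beta-1": "222222",   # ...and one key from another project (another customer)
}

LIMIT_PER_MINUTE = 1
usage: Counter = Counter()  # usage is counted per project, not per key


def check_quota(api_key: str) -> int:
    """Return the HTTP status a quota-enforcing proxy would answer (sketch)."""
    project = KEY_TO_PROJECT[api_key]
    if usage[project] >= LIMIT_PER_MINUTE:
        return 429
    usage[project] += 1
    return 200


print(check_quota("key-alpha-1"))  # 200
print(check_quota("key-alpha-2"))  # 429: same project, same budget
print(check_quota("key-beta-1"))   # 200: another project, its own budget
```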

How can you apply limits to several customers independently of each other?

On this page, Google proposes a decision chart. If you have several customers that you want to distinguish, the solution is to create several projects, one per customer, each with an API key authorized on your API.

Improve the API definition with quotas

To implement quotas on the API, I start from the existing definition described in my previous article and use the endpoint.yaml file as the base file.

There are 3 steps to define a quota:

Give your quotas understandable names.
Set the default rate (queries per minute, per project).
Define the cost of a call on each path.

The final file, with all the updates described in this article, is here.

1. Name your quotas

Naming your quotas lets you see them on the Cloud Endpoints quotas page; in addition, each project that uses this API can see its quota on the IAM & Admin -> Quotas page.

So, let’s define the names of the quotas you want to use, at the root level of the YAML file, for example just before the paths: definition.

x-google-management:
  metrics:
    - name: "my_first_quota"
      displayName: "My first quota"
      valueType: INT64
      metricKind: DELTA

The displayName is the value returned in the error message when the quota is reached, and the one displayed in the GUI. Make it explicit!

2. Set the default value of a quota

Now, you have to define the standard value for this quota. The metric: field takes the same value as the name: of the previous block, which links the two.

This block sits at the same level as the metrics: definition (from the previous step, for naming the quota):

  quota:
    limits:
      - name: "read-get"
        metric: "my_first_quota"
        unit: "1/min/{project}"
        values:
          STANDARD: 1

The values: field holds the value of the limit: here 1 request per minute, which makes testing easier. The unit can’t be changed in this Beta version.
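The two blocks are linked only by name, so a mismatch between metric: and name: is easy to introduce when you add more quotas. Here is a sketch of a pre-deployment check, assuming the x-google-management section has already been loaded into a Python dict (for example with a YAML parser, not shown); the function name is hypothetical and the dict mirrors the two snippets above.

```python
from typing import Any, Dict, List


def unknown_metric_refs(management: Dict[str, Any]) -> List[str]:
    """Return the names of quota limits whose metric: is not declared under metrics:."""
    declared = {metric["name"] for metric in management.get("metrics", [])}
    limits = management.get("quota", {}).get("limits", [])
    return [limit["name"] for limit in limits if limit["metric"] not in declared]


# Same content as the YAML snippets above, written as a Python dict.
x_google_management = {
    "metrics": [
        {"name": "my_first_quota", "displayName": "My first quota",
         "valueType": "INT64", "metricKind": "DELTA"},
    ],
    "quota": {
        "limits": [
            {"name": "read-get", "metric": "my_first_quota",
             "unit": "1/min/{project}", "values": {"STANDARD": 1}},
        ],
    },
}

print(unknown_metric_refs(x_google_management))  # []
```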

3. Set the cost of an API call

The Cloud Endpoints quotas feature lets you create different quota metrics to distinguish, for example, business cases, features or query types (GET, POST, PUT, DELETE).

You can also, within a single quota definition, weight the resources consumed by each path. For example, a multi-criteria search returning a result list is much more expensive than getting a single resource.

So, on each path, you can set the cost of the call. Add this part at the same level as the operationId:

      x-google-quota:
        metricCosts:
          "my_first_quota": 1
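To see how costs interact with the limit, here is a worked sketch: each call deducts its path’s cost from the same per-project budget for the minute. The limit of 10 units, the two paths and their costs (1 for a single get, 5 for a search) are hypothetical values, not those of the configuration above, and the sketch does not claim to reproduce exactly when the real service starts rejecting calls.

```python
from typing import Dict, List

# Hypothetical per-path costs for the same metric, as set with x-google-quota metricCosts.
METRIC_COSTS: Dict[str, int] = {"/items/{id}": 1, "/search": 5}
LIMIT_PER_MINUTE = 10  # hypothetical STANDARD value


def simulate_minute(calls: List[str]) -> List[int]:
    """Return the HTTP status of each call made by one project within a single minute."""
    used = 0
    statuses = []
    for path in calls:
        cost = METRIC_COSTS[path]
        if used + cost > LIMIT_PER_MINUTE:
            statuses.append(429)  # this call would exceed the budget
        else:
            used += cost
            statuses.append(200)
    return statuses


print(simulate_minute(["/search", "/search", "/items/{id}"]))  # [200, 200, 429]
print(simulate_minute(["/search"] + ["/items/{id}"] * 6))
# [200, 200, 200, 200, 200, 200, 429]
```

Two searches consume the whole budget, while the same budget allows ten single gets.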

Test the quotas

To test, start by deploying the new service definition:

gcloud endpoints deploy endpoint.yaml

Then test it by repeating the curl call several times. The -i option displays the response headers, especially the status code.

curl -i https://endpoint-<hash>-uc.a.run.app/hello?key=<API KEY>

After a few calls, you should get this error. You may notice that it takes 3, 4 or more tries before being blocked; see the next section for more details.

HTTP/2 429
content-type: application/json
x-cloud-trace-context: d08f223742db0784fbb661e2512b3381
date: Mon, 27 Jan 2020 13:25:01 GMT
server: Google Frontend
content-length: 354

{
 "code": 8,
 "message": "Quota exceeded for quota metric 'My first quota' and limit 'My first quota per minute' of service 'endpoint-<hash>-uc.a.run.app' for consumer 'project_number:XXXXXXX'.",
 "details": [
  {
   "@type": "type.googleapis.com/google.rpc.DebugInfo",
   "stackEntries": [],
   "detail": "internal"
  }
 ]
}

The quota limit works. Note the HTTP 429 code, Too Many Requests. You can also see the quota displayName in the error message body, along with the project number identified by the API key.
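If a client needs to react to this error, for example to log which quota and which consumer were hit, it can extract them from the JSON body. This sketch relies only on the message format shown above, which is not a documented contract and may change; the regular expression and the function name are assumptions.

```python
import json
import re
from typing import Dict, Optional

QUOTA_MESSAGE = re.compile(
    r"Quota exceeded for quota metric '(?P<metric>[^']+)'"
    r" and limit '(?P<limit>[^']+)'"
    r" of service '(?P<service>[^']+)'"
    r" for consumer '(?P<consumer>[^']+)'"
)


def parse_quota_error(body: str) -> Optional[Dict[str, str]]:
    """Return metric, limit, service and consumer from a 429 body, or None if it doesn't match."""
    message = json.loads(body).get("message", "")
    match = QUOTA_MESSAGE.search(message)
    return match.groupdict() if match else None


sample = json.dumps({
    "code": 8,
    "message": "Quota exceeded for quota metric 'My first quota' and limit "
               "'My first quota per minute' of service 'endpoint-<hash>-uc.a.run.app' "
               "for consumer 'project_number:XXXXXXX'.",
})
print(parse_quota_error(sample)["consumer"])  # project_number:XXXXXXX
```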

Best-effort quota management

The Cloud Endpoints quotas feature is currently in beta: some features are missing and the behavior isn’t always perfect.

You can perform tests with hey. If you set the request concurrency to 1 (-c parameter), the quota is roughly respected.

However, if you remove it, requests keep being served until the Cloud Run endpoint service takes into account that the project number must be blocked. This can take a few milliseconds, enough for dozens of additional requests to be served even though the quota is exceeded.

And the more Cloud Run instances run in parallel and the higher your traffic, the more requests get through instead of being blocked.

Anyway, after this glitch, the rate limit is applied, HTTP 429 is returned as expected, and the API is protected against an excessive number of requests.
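A rough way to reason about this overshoot: if the block only becomes effective after a propagation delay, every request arriving during that delay is still served. The sketch below is a simplified model, not a description of Cloud Endpoints internals; the function name, the request rates and the 20.5 ms delay are hypothetical.

```python
import math


def served_requests(limit: int, rate_per_s: float, delay_s: float, duration_s: float) -> int:
    """Count the requests served in one quota window when blocking lags by delay_s.

    Simplified model: requests arrive evenly at rate_per_s (all instances combined).
    When the limit is reached, the block only becomes effective delay_s seconds
    later, so requests arriving in between are still served.
    """
    interval = 1.0 / rate_per_s
    block_at = math.inf  # time at which the block becomes effective
    served = 0
    for i in range(int(duration_s * rate_per_s)):
        arrival = i * interval
        if arrival >= block_at:
            break  # blocked: this request and the following ones get HTTP 429
        served += 1
        if served == limit:
            block_at = arrival + delay_s
    return served


print(served_requests(limit=100, rate_per_s=1000, delay_s=0.0, duration_s=1))     # 100
print(served_requests(limit=100, rate_per_s=1000, delay_s=0.0205, duration_s=1))  # 120
print(served_requests(limit=100, rate_per_s=5000, delay_s=0.0205, duration_s=1))  # 202
```

In this model, the overshoot is roughly the request rate multiplied by the delay, which is why higher traffic lets more requests through before the 429 responses start.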

Improve the security and keep quotas

In my previous article, I explained that API keys weren’t the optimal solution for API user authentication, even though it was possible. However, API keys are required for quotas.

The ideal is to keep API keys for quotas and to use OAuth2 for authentication.

Cloud Endpoints allows this configuration. In the security definitions, add the authentication method you want. In the following example, I chose Google ID token authentication and added it to the security definitions section of the endpoint-quotas.yaml file.

google_id_token:
  authorizationUrl: ""
  flow: "implicit"
  type: "oauth2"
  x-google-issuer: "https://accounts.google.com"
  x-google-jwks_uri: "https://www.googleapis.com/oauth2/v3/certs"

This configuration allows any Google account to call my API: it’s broadly open! If you want to use Cloud Identity Platform and manage your own subset of users, I recommend the Firebase authentication method.

Then, I updated the security requirement of the endpoint paths (or set it globally) with this value:

security:
  - google_id_token: []

Test Cloud Endpoints with OAuth2 security definition

Now deploy the new API definition

gcloud endpoints deploy endpoint-quotas.yaml

And test it with a bearer token

curl -H "Authorization: Bearer $(gcloud auth print-identity-token)"\
    https://endpoint-<hash>-uc.a.run.app/hello?key=<API KEY>

And… it failed. Why? Look at the error message:

{
 "code": 7,
 "message": "JWT validation failed: Audience not allowed",
 "details": [
...
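The proxy rejects a token whose aud claim doesn’t match the audience it expects. To see what a token actually contains, this sketch decodes the JWT payload (the middle, base64url-encoded segment) and compares its aud and iss claims with expected values. It deliberately does not verify the signature, which the proxy does with the keys from x-google-jwks_uri, so it is for debugging only; the function names and the expected audience are assumptions.

```python
import base64
import json
from typing import Any, Dict, List


def jwt_claims(token: str) -> Dict[str, Any]:
    """Decode the payload of a JWT WITHOUT verifying its signature (debugging only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding stripped by JWT
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_problems(token: str, expected_aud: str,
                      expected_iss: str = "https://accounts.google.com") -> List[str]:
    """List why this token's aud/iss claims would not match the expected values (sketch)."""
    claims = jwt_claims(token)
    problems = []
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_aud not in audiences:
        problems.append(f"audience {aud!r} does not match {expected_aud!r}")
    if claims.get("iss") != expected_iss:
        problems.append(f"issuer {claims.get('iss')!r} does not match {expected_iss!r}")
    return problems
```

For example, pass it the output of gcloud auth print-identity-token and the endpoint URL you expect as audience; an empty list means both claims match.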

Right: you simply have to set the audience of your token. That’s easy: add the --audiences parameter to the gcloud auth print-identity-token command and set the Cloud Run endpoint URL as the audience value:

gcloud auth print-identity-token \
  --audiences https://endpoint-<hash>-uc.a.run.app

If you are authenticated with your user account, you get another error:

ERROR: (gcloud.auth.print-identity-token) Invalid account Type for `--audiences`. Requires valid service account.

OK: an identity token with an audience can only be generated for a service account. To get one:

Create a service account

gcloud iam service-accounts create test-endpoint

Generate and download the key file, replacing PROJECT_ID with your project ID:

gcloud iam service-accounts keys create key.json \
  --iam-account test-endpoint@PROJECT_ID.iam.gserviceaccount.com

Activate the service account in the gcloud CLI

gcloud auth activate-service-account --key-file=key.json

Now all the pieces are in place for a working test:

curl -H "Authorization: Bearer $(gcloud auth print-identity-token \
  --audiences=https://endpoint-<hash>-uc.a.run.app/)" \
  https://endpoint-<hash>-uc.a.run.app/hello?key=<API KEY>

That works! Try several calls and, yes, the quotas are also active!

Note: only an active service account is required here, without any granted role, because only authentication is checked.
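If you prefer to script this test, here is a sketch of the same call in Python using only the standard library. It shells out to the same gcloud command for the token (so the service account must be active, as above) and puts the API key in the query string; the function names are hypothetical and the placeholders must be replaced with your own values.

```python
import subprocess
import urllib.parse
import urllib.request


def print_identity_token(audience: str) -> str:
    """Run the gcloud command shown above and return the ID token it prints."""
    result = subprocess.run(
        ["gcloud", "auth", "print-identity-token", f"--audiences={audience}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_request(base_url: str, path: str, api_key: str, id_token: str) -> urllib.request.Request:
    """Build the call: API key in the query string (quotas), ID token as bearer (authentication)."""
    query = urllib.parse.urlencode({"key": api_key})
    url = f"{base_url.rstrip('/')}{path}?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {id_token}"})


# Usage (needs gcloud and network access; replace the placeholders):
# audience = "https://endpoint-<hash>-uc.a.run.app/"  # same value as in the curl command
# req = build_request(audience, "/hello", "<API KEY>", print_identity_token(audience))
# with urllib.request.urlopen(req) as response:
#     print(response.status, response.read().decode())
```

Note that urlopen raises urllib.error.HTTPError for non-2xx answers, so a quota rejection shows up as an HTTPError with code 429.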

Authentication forwarding

In this setup, there are 2 levels of authentication:

The caller authenticates to Cloud Endpoints with a “no role” service account.
The Cloud Endpoints proxy calls the requested paths. Here, the Cloud Run endpoint service account is used to be authenticated and authorized (with the run.invoker role, for example) when calling the path services.

These 2 authentication tokens are passed to the path service in these request headers:

The caller token is in the x-endpoint-api-userinfo header.
The Cloud Endpoints proxy token is in the authorization header.

You can use them in your path service to perform additional checks and/or business-level authorization.

Note: token validity has already been checked, so you don’t have to do it again; simply use the tokens’ content.
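In the path service, reading the caller identity then comes down to decoding the x-endpoint-api-userinfo header. A minimal sketch, assuming the header holds base64url-encoded JSON claims, possibly without padding (check the documentation of the proxy version you deploy); the claim names available (email, sub, …) depend on the token issuer, and both functions, including the email-domain rule, are hypothetical.

```python
import base64
import json
from typing import Any, Dict, Mapping


def caller_claims(headers: Mapping[str, str]) -> Dict[str, Any]:
    """Decode the caller's claims forwarded by the proxy in x-endpoint-api-userinfo.

    No signature check here: the proxy has already validated the token.
    """
    # Header names are case-insensitive; normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    encoded = normalized["x-endpoint-api-userinfo"]
    encoded += "=" * (-len(encoded) % 4)  # the value may be sent without padding
    return json.loads(base64.urlsafe_b64decode(encoded))


def is_allowed(headers: Mapping[str, str], allowed_domain: str) -> bool:
    """Hypothetical business-level rule: only accept callers from one email domain."""
    email = caller_claims(headers).get("email", "")
    return email.endswith("@" + allowed_domain)
```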

API call without API key

Now, the API key is no longer used to secure the API, and you can call the API without an API key. Give it a try!

What is the impact on quotas if there is no API key?

As said before, the Cloud Endpoints quotas feature is in Beta. Today, when the API key is missing, the project number used for the quota limit is that of the project hosting the Cloud Endpoints service.

Protect your APIs

Your APIs are precious: they are the entry point to your services, your information system and your business value. You have to ensure:

A high level of availability, for customer satisfaction
Correct latency and fair usage across all API consumers
A high level of security, with OAuth2 instead of API keys

Cloud Endpoints lets you set up all these features with a free product that you host wherever you want. It’s not yet perfect, and the quotas feature is in Beta, but usage and future evolutions will consolidate these good foundations, and great things should come!