Managing Google OAuth Tokens for your large scale Gmail app

Raghav C S
Published in Hiver Engineering
9 min read · May 16, 2022

At Hiver, we use the Gmail API quite extensively. In fact, our Shared Inbox product is built right on top of Gmail. Users are able to sync their emails across inboxes within a team for a single shared view of their Gmail workload.

With over 100,000 active users on Shared Inboxes, the ability to update their Gmail content through the Gmail API is a basic necessity for the Hiver App. It all starts with managing access to their Gmail: authenticating the user and maintaining that authentication for as long as the user is an active part of our processing.

Now, for a small standalone application serving a few hundred users, it seems like quite a simple problem, doesn’t it?
Obtain the users’ tokens, store them somewhere, and refresh them when needed. Errors and delays during the token lifecycle are not that costly, and some resources wasted on retries are tolerable.

But if you scale that up 1,000 times and add a bunch of distributed services making millions of Gmail API calls per second, it gets a little more complicated.

Here are a few things to keep in mind:

  • Fast reads on the user tokens. With several services making frequent Gmail API calls, fetching the user tokens instantaneously becomes a necessity.
  • The services need to be aware of the current authentication state of each user. That way they can make quick decisions about whether or not to fire API calls. For example, if we have temporarily lost access to a user’s Gmail, it would make sense to retry the API calls for a fixed period of time, in the hope that we regain auth for the user by then.
  • Over the course of a user’s lifecycle with our app, we may encounter various errors during the authentication process. Managing these effectively ensures that delays in processing are minimized and resources are utilized efficiently without waste.

At full production scale, any delay or waste caused by the above has the potential to throw off the flow of processing in a major way.

And now, here’s what we did at Hiver to tackle some of these problems :D

Some background info before jumping in:

Google OAuth

Google uses the OAuth 2.0 protocol for the authentication and authorization of its users. This allows access to various Google APIs, including the Gmail API, which we will be focusing on.

To summarize, Google requires your application to obtain an access token, which will be used to authorize access to all related Google APIs within the scope of said token. This access token is valid only for a certain period of time before it expires.

Normally your application redirects a browser to a Google URL — which takes care of user authentication, session selection, and user consent and finally results in an authorization code.

The application can then exchange the authorization code for an access token and a refresh token.
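The two steps above can be sketched as follows. This is a minimal illustration, assuming Google's documented web-server flow endpoints; the function names are hypothetical and the client ID, secret, and redirect URI are placeholders you would get from your own Google Cloud project. Nothing here sends a network request: it only builds the consent URL and the form fields for the code exchange.

```python
from urllib.parse import urlencode

# Google's documented OAuth 2.0 endpoints for the web-server flow.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def build_consent_url(client_id, redirect_uri, scopes, state):
    """Return the Google URL the browser is redirected to for sign-in and consent."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",    # ask for an authorization code
        "scope": " ".join(scopes),  # scopes are sent as one space-delimited string
        "access_type": "offline",   # required to receive a refresh token
        "state": state,             # opaque value echoed back, used to guard against CSRF
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"


def build_code_exchange_form(client_id, client_secret, redirect_uri, code):
    """Form fields to POST to TOKEN_ENDPOINT to swap the code for tokens."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    }
```

The form would be sent as an `application/x-www-form-urlencoded` POST body with whatever HTTP client the application already uses.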

Access Tokens

[Figure: Token Response]

Above is a JSON response of the access token fetch request, which we obtain by exchanging the authorization code.

The response contains an access token/refresh token pair, along with a scope field which lists the resources and operations that the access token can be used for. In the above example, the access token grants full access to the user’s Gmail and Google Calendar, plus access to the user’s basic Gmail settings, and therefore to the APIs governed by those scopes.

There is also an expires_in parameter, which gives the lifetime of the access_token in seconds.
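To make the fields concrete, here is a small sketch that parses a response of this shape. The sample body is illustrative only: the token values are placeholders, and the scope strings are Google's published scopes for full Gmail access, Calendar, and basic Gmail settings. `TokenSet` and `parse_token_response` are hypothetical names. The sketch converts the relative expires_in into an absolute expiry timestamp, which is easier to store and compare later.

```python
import json
import time
from dataclasses import dataclass
from typing import List, Optional

# Illustrative response body; token values are placeholders, not real credentials.
SAMPLE_RESPONSE = """
{
  "access_token": "ya29.placeholder-access-token",
  "expires_in": 3599,
  "refresh_token": "1//placeholder-refresh-token",
  "scope": "https://mail.google.com/ https://www.googleapis.com/auth/calendar https://www.googleapis.com/auth/gmail.settings.basic",
  "token_type": "Bearer"
}
"""


@dataclass
class TokenSet:
    access_token: str
    refresh_token: Optional[str]  # may be absent, e.g. on a plain refresh response
    scopes: List[str]
    expires_at: float             # absolute Unix time derived from expires_in


def parse_token_response(body: str, now: Optional[float] = None) -> TokenSet:
    """Parse a token endpoint response and compute when the access token expires."""
    data = json.loads(body)
    issued_at = time.time() if now is None else now
    return TokenSet(
        access_token=data["access_token"],
        refresh_token=data.get("refresh_token"),
        scopes=data["scope"].split(),  # scope is a space-delimited string
        expires_at=issued_at + data["expires_in"],
    )
```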

Once the access_token expires, you can exchange the refresh_token for a new access_token.

It is necessary to store these tokens and their expiry, especially the refresh token. The refresh tokens are used to obtain new access tokens, and managing them properly ensures your application continues to access the necessary Google APIs without any hiccups.
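A rough sketch of the refresh step, under a couple of assumptions: the form fields follow Google's documented refresh_token grant, while the function names and the five-minute safety margin (`skew_seconds`) are hypothetical choices for illustration, not values from Hiver's system.

```python
import time


def build_refresh_form(client_id, client_secret, refresh_token):
    """Form fields to POST to Google's token endpoint to get a new access token."""
    return {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }


def needs_refresh(expires_at, now=None, skew_seconds=300.0):
    """True once the access token is expired or within skew_seconds of expiring.

    Refreshing slightly early (an assumption, not a Google requirement) avoids
    handing out a token that dies in the middle of an API call.
    """
    current = time.time() if now is None else now
    return current >= expires_at - skew_seconds
```

Storing the absolute expiry alongside the tokens is what makes a check like `needs_refresh` possible without calling Google first.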

Token Management

Why do we need it?

Loss of connectivity with the Gmail APIs can happen for a multitude of reasons: expiration of the access token, the user temporarily revoking access (which invalidates the refresh token), temporary errors during token fetch or refresh, or downtime in one or more Hiver services, which may render the tokens unusable after a while.

The state of a user’s token has a direct relationship with the state of their Gmail auth with the Hiver app.

A user’s auth state at any point in time not only tells us whether we can access their Gmail, but also lets us make decisions: whether to retry a token refresh, trigger reauthentication, or flag a long-term auth problem.

Storage of and access to the tokens therefore become important when designing the system, both for persistence and for quick reads, since the application may need to make a huge volume of API calls within a minute for a single user, and delays here can throw off email sync times in a major way.

The Hows -

Hiver’s Token/Auth management system tackles both the problems of storage/access and maintenance as part of its token updater service.

Here’s an overview of how it works.

[Figure: Token Updater Architecture]

Storage/Access

Persistent: Refresh tokens are technically supposed to be permanent once the user is authenticated with the app, so storing them in persistent storage like RDS (MySQL) makes sense. For each user who is authenticated with Hiver, we store the refresh token together with its access token counterpart and that token’s time of expiration.

Cache: Access tokens are used frequently by multiple services for accessing the Gmail APIs of Hiver users. An in-memory Redis cache not only allows for fast reads, but its entries can also be set to expire when the tokens themselves expire.
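The storage split described above amounts to a read-through cache whose entries live only as long as the access token. Below is a minimal sketch of that idea. To stay runnable on its own it uses a tiny in-process `TTLCache` as a stand-in for Redis, and `load_from_db` is a hypothetical callable standing in for the MySQL lookup; neither reflects Hiver's actual code.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a Redis cache with per-key expiry."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # key -> (value, deadline)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if self._clock() >= deadline:
            del self._items[key]  # lazily evict expired entries
            return None
        return value


def get_access_token(user_id, cache, load_from_db, clock=time.time):
    """Read-through lookup: try the cache, fall back to persistent storage."""
    key = f"access_token:{user_id}"
    token = cache.get(key)
    if token is not None:
        return token
    row = load_from_db(user_id)  # hypothetical: (access_token, expires_at) or None
    if row is None:
        return None
    token, expires_at = row
    ttl = expires_at - clock()
    if ttl <= 0:
        return None  # stored token is already expired; it must be refreshed first
    cache.set(key, token, ttl)  # cache entry dies exactly when the token does
    return token
```

With a real Redis client, the `set` call would typically map to a write with an expiry, such as redis-py's `set(key, value, ex=seconds)`.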

Token Maintenance

The token updater worker is responsible for detecting users whose tokens have expired, refreshing those tokens, handling any errors (internal or external) that occur during the process, updating the tokens in storage, and finally determining and setting the auth-state of the users involved.

Parallelism: By fanning out the token updater worker to multiple independent processes, we ensure that many users can be handled at the same time, instead of queuing behind a single blocking process.

Polling: A token updater cron runs frequently to determine which users in the system need to have their tokens updated (based on their auth-state and expiration periods), then enqueues user-specific jobs for the token updater worker.
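The polling and fan-out steps can be sketched roughly like this. Several things are assumptions made for illustration: a thread pool stands in for the independent worker processes and the job queue, `refresh_one` is a hypothetical per-user job, the five-minute lead time is arbitrary, and the selection rule (skip users who have revoked access, pick up everyone else who is near expiry or in an invalid state) is one reading of "based on their auth-state and expiration periods".

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

SUCCESS = 0
ACCESS_REVOKED = -1


@dataclass
class UserAuth:
    user_id: str
    auth_state: int
    expires_at: float  # Unix time at which the access token expires


def select_due_users(users, now, lead_seconds=300.0):
    """Pick users whose token is close to expiry or whose state is still retryable."""
    due = []
    for user in users:
        if user.auth_state == ACCESS_REVOKED:
            continue  # the user must re-authenticate; refreshing cannot help
        if user.auth_state != SUCCESS or user.expires_at - now <= lead_seconds:
            due.append(user)
    return due


def run_poll_cycle(users, now, refresh_one, max_workers=8):
    """One cron tick: select due users and process them concurrently."""
    due = select_due_users(users, now)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(refresh_one, due))  # one independent job per user
    return {user.user_id: result for user, result in zip(due, results)}
```

Because each job touches a single user, a slow or failing refresh for one account does not hold up the others.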

User Authorization State Management

Auth-State

The auth-state of a user is simply the state of the user’s authorization with the Hiver App at any given point in time, which corresponds to Hiver’s ability to access the user’s Gmail API.

A valid auth-state indicates that currently, we can use the user’s access/refresh tokens to gain access to the APIs referenced in its scope.

Hiver maintains the following possible auth-states throughout a user’s auth cycle:

ACCESS_REVOKED_TEMP = -2
ACCESS_REVOKED = -1
SUCCESS = 0
UNKNOWN_ERROR = 1

The valid auth-state is represented by 0 (SUCCESS).
The invalid auth-states are represented by non-zero numbers indicating either known or unknown errors with varying degrees of permanence. A user with an invalid auth-state will need some action by the system or by the user themselves before the state can become valid again.
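One way to model these states in code is an integer-backed enum, so the numeric values can be stored as-is while services work with readable names. This is a sketch using the names and values from the list above; the `AuthState` class and its `is_valid` helper are hypothetical.

```python
from enum import IntEnum


class AuthState(IntEnum):
    ACCESS_REVOKED_TEMP = -2
    ACCESS_REVOKED = -1
    SUCCESS = 0
    UNKNOWN_ERROR = 1

    @property
    def is_valid(self) -> bool:
        """Only SUCCESS (0) means the stored tokens can be used right now."""
        return self is AuthState.SUCCESS


# An integer read back from storage maps straight onto a named state.
stored_value = -2
state = AuthState(stored_value)
```

A service can then gate its Gmail API calls on `state.is_valid` instead of comparing raw numbers.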

Why do we need it?

For Hiver services using the Gmail API, knowing a user’s auth-state at a given point in time can drive decision making. If the source user of a shared inbox has a non-zero auth-state, sync may be stopped for the whole shared inbox. Knowing the source user’s auth-state allows the web application to indicate, via certain UI elements, that the user may need to re-authenticate with the Hiver App. Other services may also decide not to make any Gmail API calls for a user with a non-zero auth-state, knowing that they do not have access to that user’s Gmail.

The token updater makes decisions about token refresh and retries based on the current auth-state of the user. For example, if a user’s auth-state is 1 (UNKNOWN_ERROR), it may indicate a temporary loss of auth due to an internal error or an error during authentication, meaning that the token updater ought to retry a certain number of times to regain a valid auth-state for the user.

Similarly, an auth-state of -2 (ACCESS_REVOKED_TEMP) indicates a temporary loss of auth due to either expiry of the refresh token itself or other known errors thrown by Gmail. Knowing this allows the token updater to retry a finite number of times, which eventually leads to either SUCCESS or ACCESS_REVOKED (a more permanent version of the same state).
An auth-state of -1 (ACCESS_REVOKED) indicates that the user has revoked access to their Gmail, and the token updater need not retry. Instead, web services that can redirect the user to re-authenticate with Hiver will do so.
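The decisions described above can be summarized as a small function from (auth-state, attempts so far) to the next action. This is a sketch only: the state names follow the auth-state list earlier in the post, while the action strings, the `next_action` name, and the retry limit of 5 are hypothetical.

```python
from enum import IntEnum


class AuthState(IntEnum):
    ACCESS_REVOKED_TEMP = -2
    ACCESS_REVOKED = -1
    SUCCESS = 0
    UNKNOWN_ERROR = 1


MAX_RETRIES = 5  # hypothetical limit, not a value from the post


def next_action(state: AuthState, attempts: int, max_retries: int = MAX_RETRIES) -> str:
    """Decide what the token updater should do for a user in the given state."""
    if state is AuthState.SUCCESS:
        return "refresh_when_due"  # nothing is wrong; refresh on the normal schedule
    if state is AuthState.ACCESS_REVOKED:
        return "prompt_reauth"     # retrying is pointless; the user must re-authenticate
    # ACCESS_REVOKED_TEMP and UNKNOWN_ERROR are both retried a bounded number of times.
    if attempts < max_retries:
        return "retry_refresh"
    if state is AuthState.ACCESS_REVOKED_TEMP:
        return "mark_access_revoked"  # give up and move to the permanent state
    return "raise_alert"              # unknown errors are escalated after retries run out
```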

So for a distributed system like the Hiver App, it becomes necessary to maintain these states in a way that allows for the other services to do their processing efficiently with minimal decision making regarding user auth.

[Figure: A Hiver user’s auth journey]

Error management

As we have previously mentioned, the user auth-states heavily influence decision making. A big factor in determining the invalid auth-states is the set of internal and external errors that can occur during authorization. Here are a few of them, and how they affect the auth-state of the user.

Invalid Grant

In Google’s OAuth 2.0 documentation, the “invalid_grant” error is sort of a catch-all for all errors related to invalid, expired, or revoked tokens (access or refresh). It is usually the most common error thrown by the auth APIs.

This error has a myriad of causes, chiefly the user revoking their token or the refresh token being invalidated. A pretty detailed explanation of this error and its various causes can be found in this Medium post.

As for how invalid_grant affects the auth-state, it needs to be said that this error is sometimes temporary. For example, it may occur due to time-bound events such as “too many access tokens requested in a short time”, or throttling due to Google usage limits.
In these cases, it makes sense to retry a certain number of times to see if the error resolves itself. At Hiver, we initially mark the user’s auth-state as ACCESS_REVOKED_TEMP and retry.
If we still experience the same error after a certain number of retries, we move the user to the more permanent invalid auth-state of ACCESS_REVOKED.
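A minimal sketch of that escalation, assuming a per-user failure counter: the state values match the auth-state list earlier in the post, but `AuthRecord`, `record_refresh_outcome`, and the limit of three retries are hypothetical.

```python
from dataclasses import dataclass

ACCESS_REVOKED_TEMP = -2
ACCESS_REVOKED = -1
SUCCESS = 0


@dataclass
class AuthRecord:
    auth_state: int = SUCCESS
    invalid_grant_count: int = 0  # consecutive invalid_grant failures


def record_refresh_outcome(record, error_code, max_retries=3):
    """Update a user's auth-state after one refresh attempt.

    error_code is None on success, or the OAuth error string on failure.
    Errors other than invalid_grant are left to other handlers in this sketch.
    """
    if error_code is None:
        record.auth_state = SUCCESS
        record.invalid_grant_count = 0
    elif error_code == "invalid_grant":
        record.invalid_grant_count += 1
        if record.invalid_grant_count > max_retries:
            record.auth_state = ACCESS_REVOKED       # retries exhausted
        else:
            record.auth_state = ACCESS_REVOKED_TEMP  # may still resolve itself
    return record
```

With `max_retries=3`, the first failure and the three retries that follow keep the user in the temporary state as long as any of them might still succeed; the fourth consecutive failure makes the state permanent.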

Other Errors

Apart from the known “invalid_grant” errors, we may experience other errors with a variety of other causes.

Some of these may be temporary and others more permanent:

  • Invalid Response 403 (Forbidden): Usually a temporary error that may be thrown by the Google server.
  • Internal Failure 500 (Internal Server Error): Usually a temporary error that may be thrown by the Google server.
  • Invalid Client/Invalid Request: The client credentials or the request params are invalid, incorrect, or not in the right format. May need to be raised and corrected by the developer.
  • Invalid Account: The user account being accessed may be invalid or may no longer exist.
  • Unauthorized Client: The client credentials, such as the ID and secret, may be incorrect and may not match the ones linked to the client app. May need to be raised and corrected by the developer.

Some of these errors can be caught and handled individually by the application; most others default to an UNKNOWN_ERROR auth-state, which is retried for a finite amount of time before being raised as an exception.
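As a rough illustration of this kind of error handling, the sketch below maps an OAuth error string to an auth-state. Which errors get special treatment is an assumption here: invalid_grant goes to the temporary revoked state, errors that only a developer can fix are raised immediately, and everything else (including 403 and 500 responses with no recognized error code) defaults to UNKNOWN_ERROR. The names `classify_refresh_failure` and `ConfigurationError` are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class AuthState(IntEnum):
    ACCESS_REVOKED_TEMP = -2
    ACCESS_REVOKED = -1
    SUCCESS = 0
    UNKNOWN_ERROR = 1


class ConfigurationError(Exception):
    """An error a developer has to fix; retrying the request cannot help."""


# OAuth error codes that point at bad client credentials or a malformed request.
DEVELOPER_ERRORS = {"invalid_client", "invalid_request", "unauthorized_client"}


def classify_refresh_failure(oauth_error: Optional[str]) -> AuthState:
    """Map a failed refresh to the auth-state the user should be put in.

    oauth_error is the "error" field of the response body, or None when the
    response carried no recognizable error code (for example a bare 500).
    """
    if oauth_error == "invalid_grant":
        return AuthState.ACCESS_REVOKED_TEMP
    if oauth_error in DEVELOPER_ERRORS:
        raise ConfigurationError(oauth_error)
    return AuthState.UNKNOWN_ERROR  # default bucket, retried for a limited time
```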

Final Thoughts

Although there is a lot of documentation and many questions about Google auth tokens available on Google/StackOverflow, few go into detail about how to manage them at scale. Hopefully, this gives you some insight for your next big Gmail project :D

Also, we’re hiring at Hiver, and if you want to help solve interesting problems like these, feel free to hit us up!
