PKCE at Apartment List

Laura Poss
Published in Apartment List
Feb 4, 2021

As our user base at Apartment List has continued to grow, the need for a fast, scalable, and secure identity platform has become abundantly clear. We recently set out to revamp this platform around the three A’s: authentication, authorization, and accountability.

In the spirit of starting small, the first piece we decided to tackle was adding fine-grained access control to one of our client services. This required creating a new service to manage authentication and authorization, as well as extending the client service to perform user session management and embed JWTs in the header of outgoing requests.

To facilitate authentication, we decided to use PKCE and will detail our implementation of the flow below.

PKCE

PKCE stands for Proof Key for Code Exchange. Defined in RFC 7636, it is an extension to the OAuth 2.0 authorization code flow, used when you cannot store or pass a client secret securely, such as when you are writing a native app or a single page app (SPA). A native app’s binary can be decompiled, and a SPA’s source code is one “view-source” click away.

This diagram represents the full flow, from a user visiting the client service in their browser all the way to being successfully logged in. Note that Google’s SSO serves as our identity provider, so users authenticate on a page provided by Google. This authentication flow starts without a form: if a user is logged out, we want to redirect them to login automatically, with no additional steps on their part. Redirects are always GET requests unless they are triggered by a form submission, in which case they can be POSTs, so we could not send our redirects as POST requests. As a direct consequence, additional data must be passed as query parameters, which are exposed in places like browser history and server logs. That exposure is the reason PKCE is necessary.

The diagram starts with a user navigating to the client service in their browser. A piece of middleware in the client service runs to validate an existing session, if present. Assuming there is no valid session, the client service begins the PKCE flow by generating a code verifier and a random state. These values are then persisted to a Redis cache using the state as the key and the code verifier as the value. The client service then hashes the code verifier with SHA-256 and encodes the hash using the URL-safe base64 encoding scheme. This encoded hash is known as the code challenge. Finally, the client service performs a browser redirect to the auth service, including the state, client ID, code challenge, and code challenge method as query parameters.
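The client-side steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual client service code: the auth service URL, the client ID, and the function names are hypothetical, a plain dictionary stands in for the Redis cache, and stripping the base64 padding follows RFC 7636 rather than anything stated in this post.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

# Hypothetical values: the real auth service URL and client ID are not given.
AUTH_SERVICE_URL = "https://auth.example.com/authorize"
CLIENT_ID = "example-client"

# Stand-in for the Redis cache: state -> code verifier.
verifier_cache: dict[str, str] = {}


def make_code_challenge(code_verifier: str) -> str:
    """SHA-256 the verifier, then URL-safe base64 encode it without padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def begin_login() -> str:
    """Start a PKCE flow and return the URL to redirect the browser to."""
    state = secrets.token_urlsafe(32)
    # 64 random bytes encode to 86 characters, inside RFC 7636's 43-128 range.
    code_verifier = secrets.token_urlsafe(64)
    verifier_cache[state] = code_verifier
    query = urlencode({
        "state": state,
        "client_id": CLIENT_ID,
        "code_challenge": make_code_challenge(code_verifier),
        "code_challenge_method": "S256",
    })
    return f"{AUTH_SERVICE_URL}?{query}"
```

Only the code challenge travels through the browser. The code verifier stays in the client service's cache until the final token request.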

When the auth service receives an authentication request, it persists the client ID and the code challenge to its own cache, then redirects the browser to Google’s SSO. The redirect request includes the state generated by the client service. The purpose of the state is to tie together a login flow across all of the various players involved. Without the state, later on in the process, the client service would not know how to choose the correct code verifier to send to the auth service for validation.
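A rough Python sketch of this step in the auth service might look like the following. The Google client ID, the redirect URI, the scope, and the choice to key the cache by state are all assumptions made for illustration. The endpoint and parameter names are those of Google's OAuth 2.0 authorization endpoint.

```python
from urllib.parse import urlencode

GOOGLE_AUTH_URL = "https://accounts.google.com/o/oauth2/v2/auth"
# Hypothetical values for illustration only.
GOOGLE_CLIENT_ID = "example-google-client-id"
GOOGLE_REDIRECT_URI = "https://auth.example.com/google/callback"

# Stand-in for the auth service cache. Keying it by state is an assumption.
pending_logins: dict[str, dict[str, str]] = {}


def handle_authorize(params: dict[str, str]) -> str:
    """Cache the client's PKCE data and return the Google redirect URL."""
    if params.get("code_challenge_method") != "S256":
        raise ValueError("unsupported code_challenge_method")
    pending_logins[params["state"]] = {
        "client_id": params["client_id"],
        "code_challenge": params["code_challenge"],
    }
    query = urlencode({
        "client_id": GOOGLE_CLIENT_ID,
        "redirect_uri": GOOGLE_REDIRECT_URI,
        "response_type": "code",
        "scope": "openid email",
        # The client service's state is forwarded so it survives the round trip.
        "state": params["state"],
    })
    return f"{GOOGLE_AUTH_URL}?{query}"
```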

Google presents its SSO page, where the user must enter the correct credentials. Once the user has authenticated, Google redirects to a pre-configured endpoint in the auth service. Google provides its own code as a query parameter, which is used to fetch details about the user who was just authenticated, like the user’s email address. Google also supports a pass-through parameter that it returns to us unchanged after a user has authenticated, which we use to carry our state.

In the auth service, we exchange the Google-provided code for a token that enables us to get the email address of the user. The auth service then generates and caches a final code and redirects to the client service, including this new code and the state as query parameters.
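Issuing the final code could be sketched like this in Python, leaving out the token exchange with Google. The function name, the 60 second lifetime, and the decision to store the code challenge alongside the code are assumptions. Keying the entry by client ID and code matches how the code is looked up later in the flow.

```python
import secrets
import time
from typing import Optional
from urllib.parse import urlencode

CODE_TTL_SECONDS = 60  # assumed lifetime; the post does not state one

# Stand-in for the auth service cache: (client_id, code) -> login details.
issued_codes: dict[tuple[str, str], dict] = {}


def issue_code(client_id: str, code_challenge: str, email: str, state: str,
               client_callback_url: str, now: Optional[float] = None) -> str:
    """Cache a fresh code and return the redirect URL for the client service."""
    issued_at = time.time() if now is None else now
    code = secrets.token_urlsafe(32)
    issued_codes[(client_id, code)] = {
        "code_challenge": code_challenge,
        "email": email,
        "expires_at": issued_at + CODE_TTL_SECONDS,
    }
    # The state goes back with the code so the client can find its verifier.
    return f"{client_callback_url}?{urlencode({'code': code, 'state': state})}"
```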

Using the state, the client service retrieves the original code verifier from its cache. With the code verifier, its client ID, and the code just generated by the auth service, the client service makes a request to the auth service with the goal of retrieving a JWT. Now it is time for the auth service to perform validation of the code and code verifier. The code is easily verified by using it in combination with the client ID to see if a cached entry exists. Verification of the code verifier is slightly more involved since the auth service only has the code challenge cached. As a result, the auth service must replicate the steps originally taken by the client service of hashing and URL-safe base64 encoding the code verifier. Once the auth service has generated its own code challenge, it verifies that code challenge against what it has cached. If both of these validation steps pass, the auth service will generate a JWT for the user and send the JWT to the client service.
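The two validation steps can be sketched as below. This is a hedged Python illustration rather than the auth service's real code: the cache shape and function names are hypothetical, JWT generation is left out (the function returns the email the JWT would be issued for), and making the code single-use and comparing in constant time are assumptions about good practice, not details from this post.

```python
import base64
import hashlib
import hmac
from typing import Optional


def make_code_challenge(code_verifier: str) -> str:
    """Repeat the client's steps: SHA-256, then unpadded URL-safe base64."""
    digest = hashlib.sha256(code_verifier.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def redeem_code(cache: dict, client_id: str, code: str,
                code_verifier: str) -> Optional[str]:
    """Return the user's email if the code and verifier check out, else None."""
    # Step 1: the code is valid only if an entry exists for (client_id, code).
    # pop() removes the entry so the code cannot be replayed (an assumption).
    entry = cache.pop((client_id, code), None)
    if entry is None:
        return None
    # Step 2: recompute the challenge from the verifier and compare it with
    # the cached challenge, using a constant-time comparison.
    recomputed = make_code_challenge(code_verifier).encode("utf-8")
    expected = entry["code_challenge"].encode("utf-8")
    if not hmac.compare_digest(recomputed, expected):
        return None
    return entry["email"]
```

An attacker who intercepts the code from the query string still lacks the code verifier, so the second step fails and no JWT is issued.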

The final steps in the client service are to persist the JWT in its session store and to redirect the browser to the homepage. Note that the same middleware that ran in the beginning will run again upon this last redirect, but there is now a valid session, so the user is granted access to the site.

Gotchas

Implementation of this authentication flow was not without its challenges. The first issue that we ran into was not using URL-safe base64 encoding when sending the code challenge from the client service to the auth service. Use of the URL-safe base64 encoding scheme is necessary because we are passing the code challenge as a query parameter. If you do not use the URL-safe scheme, then the + characters, which are valid standard base64 characters, will be decoded as spaces when the receiving service parses the query string, so the code challenge that arrives no longer matches the one that was sent. The PKCE RFC makes explicit mention of using URL-safe base64 encoding, but it was something we had originally overlooked.
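A small Python demonstration of the problem, using three bytes chosen so that the standard encoding contains both + and /. The parameter name and the use of Python's query parser are only for illustration, though form-style query parsers generally treat + the same way.

```python
import base64
from urllib.parse import parse_qs

# Bytes picked so that standard base64 yields the characters + and /.
raw = bytes([0xFB, 0xFF, 0xBE])

standard = base64.b64encode(raw).decode("ascii")          # "+/++"
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # "-_--"

# Simulate the receiving service parsing the unescaped query string.
received_standard = parse_qs(f"code_challenge={standard}")["code_challenge"][0]
received_url_safe = parse_qs(f"code_challenge={url_safe}")["code_challenge"][0]

print(repr(received_standard))  # ' /  ' (every + became a space)
print(repr(received_url_safe))  # '-_--' (unchanged)
```

Percent-encoding the standard value before placing it in the URL would also avoid the corruption, but the URL-safe alphabet is what RFC 7636 specifies.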

The second challenge we ran into was rendering a login error page within the client service. Our client service consists of a React frontend and an Express backend, with a piece of middleware that runs on all requests to check the validity of a session. During build time, the frontend static assets get fingerprinted: their filenames change each time the service is built. We had originally added a login error page to the frontend to be displayed whenever an authentication attempt was unsuccessful. However, this page was bundled together with other pages into an asset whose filename we could not predict, so we could not add a rule to the authentication middleware to skip session checks on just the assets related to the login error page. We tried skipping the authentication middleware for requests to any asset, but that was far too liberal a rule and rendered authentication useless. We ended up serving the error page from the backend at a constant route, which let us disable the authentication middleware for that route alone.
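The final rule amounts to an exact-match exemption for one constant path. A framework-agnostic Python sketch of that idea follows; the real middleware lives in the Express backend and is not shown here, and the route names and return values are hypothetical.

```python
# Hypothetical route names; the post does not give the real ones.
LOGIN_ERROR_PATH = "/login-error"
LOGIN_PATH = "/login"

# Only exact, constant paths are exempt. Fingerprinted asset names change on
# every build, so they cannot be listed here.
SESSION_CHECK_EXEMPT = frozenset({LOGIN_ERROR_PATH})


def handle_request(path: str, has_valid_session: bool) -> str:
    """Decide what the session middleware does with a request."""
    if path in SESSION_CHECK_EXEMPT:
        return "serve"  # the error page must load without a session
    if not has_valid_session:
        return f"redirect:{LOGIN_PATH}"  # start the login flow
    return "serve"
```

Keeping the exemption to an exact match avoids the overly broad rule described above, where every asset request bypassed the session check.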

There is still much to be done to complete our identity platform. We need to support new user types and transfer existing user types to the new system. We need to integrate all of our existing services with this new platform. We need to build and integrate a service to support auditing actions. An exciting journey awaits!
