Deep Dive: How We Built Deploy Previews at Opendoor

Josiah Grace · Published in Open House · Nov 9, 2021 · 12 min read

Goal

This is a deep-dive into Opendoor’s frontend application serving architecture with a focus on our support for “deploy previews” (per-commit preview releases available on pull requests, or “PRs”). By the end, you’ll have a good idea of exactly how it works, including what Kubernetes (k8s) configuration we use, how we serve our static assets, and how we support deploy previews.

We’ll cover:

  1. The purpose and benefit of deploy previews
  2. Our legacy frontend application serving architecture
  3. Our current frontend application serving architecture
  4. Our deploy preview implementation

Some of the technologies covered in this post are Kubernetes (k8s) ingresses, Cloudflare Workers, JSON Web Tokens (JWTs), and cookie-based authentication.

Ideally, by the end of this post, you’d be able to implement your own deploy previews (or at least have an idea of what a solution for this problem could look like).

If you have any questions, suggestions, or comments, feel free to reach out to @defjosiah on Twitter or leave a comment. If you’re interested in this type of work, we’re hiring!

What we did

At Opendoor, we have a lot of frontend applications. This includes everything from our internal microservice registry and our internal operations tooling to our consumer-facing products. Many of these are business-critical and complicated (in both product and technical dimensions). We realized there was a large gap in our development process for these applications — we didn’t have a clean solution for visually or interactively testing out an application in a production-accurate environment prior to deploying it.

For frontend applications, production-accurate test environments are extremely important. This is for a few reasons:

  1. Local development environments are significantly different from production environments. This includes things like minification, hot-reloading, and development builds of libraries (e.g. React development vs. production).
  2. Product and design requirements are as important as the actual code, and a lot of this can’t be automatically tested (user experience, animations/interactions) — and requires collaboration from PMs and product designers.
  3. Integration tests (frequently browser automation) struggle with the local development environment and work best when run against “production” builds.

In order to address these issues, we designed and rolled out “deploy previews” for our frontend developers. These are production/staging-accurate deployments of frontend applications at per-commit/pull-request granularity — similar to Netlify’s Deploy Previews and Vercel’s preview URLs. Frontend developers at Opendoor push up a new PR, and we automatically build and deploy that application into our application serving infrastructure. We then provide a link to “activate” that preview. Automated tools can also “activate” deploy previews to run their tests against them.

Every day, we connect engineers and non-engineers with these deploy previews. They give us more confidence in our deployments and are a great tool for developing high quality frontend applications at Opendoor.

A bot GitHub pull-request comment with links to activate a deploy preview
What our developers see when they push up a PR with a deploy preview

In order to implement this feature, we re-architected how we serve our frontend applications, which resulted in:

  1. Removal of ~50 Kubernetes deployments (and around 150 pods)
  2. Fixes for downtime and missing assets during new deployments
  3. Fully cached (CDN) static assets and partial caching for release-specific files

In the next section, we’ll take you from our legacy frontend serving architecture to our current one, and then show how we implemented deploy previews using that new architecture.

How we did it

Legacy frontend application architecture

Early on, we settled on a simple architecture for deploying and serving these applications, treating each application as a separate service/deployment with an ingress (in k8s terminology). After a PR was created and passed tests, our deploy pipeline triggered, which would:

  1. Build the application (usually using our customized “create-react-app” configuration, or a similar step)
  2. Load these assets and the entrypoint into a docker image with nginx installed
  3. Create a Kubernetes deployment and service for this application (using the docker image created in the previous step)
  4. Create an ingress for the application. The fallback route handles serving assets from our nginx service/deployment, and different paths are mounted on the ingress (e.g. /api routes to the backend service gateway-api)
  5. Multiple API routes could be mounted, pointing to different backend services
  6. Our nginx configuration handled the “single-page application” logic, where request fallbacks serve index.html instead of a 404.

Note: developers don’t write this configuration manually; we provide a simplified configuration. They specify a service name and the host/path their app should be available at, and the rest happens automatically.

When everything is finished, the k8s ingress looks something like this:

https://gist.github.com/defjosiah/17e4ce0f24566d605e64019f41234c95

With this ingress, requests work as follows:

http GET myapp.opendoor.com/
-> ingress (matches asset path)
-> $servicename-http-assets:80/ (served by nginx docker deployment)
http GET myapp.opendoor.com/api
-> ingress (matches api path)
-> gateway-api:3000/api (served by gateway-api service/deployment)

This system works, but has a few drawbacks:

  • We have a deployment for every application to serve static assets, which is both over-complicated and underutilized (our traffic requirements don’t even approach what nginx can handle on a single instance).
  • We don’t have caching of static assets by default, and we’re hitting nginx for each one (we can tune this with nginx, but it complicates the set-up significantly).
  • After new releases, the previous assets are no longer available, and you can get into a state where the browser requests an asset that is not in the new deployment (and it will 404).
  • We build/push a lot of docker images to make this whole pipeline work, which is slow and wasteful.

We knew that we wanted to implement deploy previews, but our existing architecture would make this challenging — deploy previews with this architecture would require a lot of k8s/docker orchestration. We took this opportunity to re-think our architecture and build it to support deploy previews as well.

Updated frontend application architecture

After analyzing our legacy application architecture, we came up with a few goals:

  1. Remove our “nginx per application” set-up and move to a shared asset serving structure
  2. Preserve history across deployments (requesting an “old” version of a static asset should not 404)
  3. Support single-page (asset 404s return /index.html) and standard (404s return /404.html) applications
  4. Allow for asset caching through a CDN

And we arrived at the following solution:

  1. Single s3 bucket for static assets (fronted by Cloudflare)
  2. The s3 path structure segments different applications and deployments
  3. New tooling for uploading a new deployment to the s3 bucket (instead of creating docker images)
  4. The active deployment is defined by the Kubernetes ingress and routes to the proper s3 bucket path
  5. A small handler in front of the s3 bucket implements the serving behavior for each application (we used Cloudflare Workers for this)
Architecture diagram comparing legacy architecture vs. new architecture. Visual representation of what is described in the architecture sections.
Architecture diagram legacy vs. updated

There is a single s3 bucket that holds all of the static assets and entry points for all of our frontend applications. It has the following structure ({} denotes parametrized fields):

Three main bucket paths:

  1. /{app}/static — permanent app-specific fingerprinted assets
  2. /{app}/{env}/{version} — release specific assets (usually /index.html for SPAs, which then has links to permanent fingerprinted assets)
  3. /static — common assets shared across applications (e.g. opendoor-font.ttf)
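
As a rough illustration of this layout, here is a minimal sketch (the function and example values are hypothetical, not our actual tooling) of how a built file maps to a bucket path:

// Sketch: map a built file to its bucket path, following the layout above.
function bucketPath(app: string, env: string, version: string, filePath: string): string {
  // Fingerprinted assets are permanent and shared across releases of an app.
  if (filePath.startsWith("static/")) {
    return `/${app}/${filePath}`; // e.g. /myapp/static/js/fingerprinted.abc123.js
  }
  // Everything else (index.html and friends) is release-specific.
  return `/${app}/${env}/${version}/${filePath}`; // e.g. /myapp/staging/git-sha-1/index.html
}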

A standard create-react-app build outputs the following files:

build/
  index.html
  ...other files...
  static/
    js/
      fingerprinted.{hash}.js
      fingerprinted2.{hash}.js

The index.html refers to assets in the static/js folder. An example index.html corresponding to the output above is:

<html>
  <head>
    <script src="https://cdn-url.com/{app}/static/js/fingerprinted.{hash}.js"></script>
    <script src="https://cdn-url.com/{app}/static/js/fingerprinted2.{hash}.js"></script>
  </head>
  <body>
    <div id="root"></div>
  </body>
</html>

Returning to the deploy pipeline example above, the new system works as follows:

  1. Same as before, build the application (usually using our customized “create-react-app” configuration, or a similar step)
  2. Upload the built assets to the s3 bucket, using “app”, “environment”, and “version” parameters (see the sketch after this list)
  3. The release-specific assets (e.g. index.html) are uploaded to /{app}/{environment}/{version}/{filepath} (e.g. /myapp/staging/git-sha-1/index.html)
  4. The fingerprinted static assets are uploaded to /{app}/static/{filepath} (e.g. /myapp/static/js/fingerprinted.xyz123.js)
  5. Create an ingress for the application. The fallback route’s upstream is the CDN fronting the s3 bucket, and different paths are mounted on the ingress (e.g. /api routes to the backend service gateway-api)
  6. The ingress uses the “rewrite-target” annotation to map to the correct bucket URL for the current release. Every time a new release is created, we update this annotation with the release-specific path (e.g. /myapp/staging/git-sha-1/$1), which points the ingress at the currently released version.
  7. We had to split the previous single ingress into multiple ingresses with the same host, because the “rewrite-target” annotation applies to the whole ingress.
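
The upload step (items 2 through 4, the sketch referenced above) needs little more than an s3 client. A minimal sketch, assuming the AWS SDK v3 and a hypothetical bucket name:

import { readFileSync, readdirSync, statSync } from "fs";
import { join, relative } from "path";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" }); // region is illustrative

// Recursively list every file in the build directory.
function listFiles(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const full = join(dir, name);
    return statSync(full).isDirectory() ? listFiles(full) : [full];
  });
}

// Upload a build, splitting fingerprinted assets from release-specific files.
// ContentType and cache headers are omitted for brevity.
async function uploadRelease(buildDir: string, app: string, env: string, version: string) {
  for (const file of listFiles(buildDir)) {
    const filePath = relative(buildDir, file); // e.g. "static/js/fingerprinted.abc123.js"
    const key = filePath.startsWith("static/")
      ? `${app}/${filePath}` // permanent fingerprinted assets
      : `${app}/${env}/${version}/${filePath}`; // release-specific assets (index.html, ...)
    await s3.send(
      new PutObjectCommand({
        Bucket: "frontend-assets", // hypothetical bucket name
        Key: key,
        Body: readFileSync(file),
      })
    );
  }
}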

When everything is finished, the k8s ingresses look something like this:

https://gist.github.com/defjosiah/59d67c2b17d7e0c58b1db4096640c832

With these ingresses, requests work as follows:

http GET myapp.opendoor.com/
-> ingress (matches asset path) and applies the rewrite-target annotation
-> cdn-proxy:80/myapp/staging/git-sha-1/
http GET myapp.opendoor.com/api
-> ingress (matches api path)
-> gateway-api:3000/api (served by gateway-api service/deployment)

With this new set-up, we don’t require any deployments or separate nginx servers per application; the ingress itself handles selecting the bucket path for that release of the application.

However, we do need one additional step on each request. When an application makes a request to myapp.opendoor.com/ and that is rewritten to cdn-proxy:80/myapp/staging/git-sha-1/, the rewritten path does not map to a specific file in our bucket (the bucket contains /myapp/staging/git-sha-1/index.html). We handle this logic with Cloudflare Workers.

We have a simple Cloudflare Worker running, which transforms the requested path /myapp/staging/git-sha-1 into the intended /myapp/staging/git-sha-1/index.html path by checking whether the path already ends in a known mime-type extension.

The code is similar to this:

https://gist.github.com/defjosiah/25eafc24f1847b60d06ddc0587e6fff9

This code runs in front of our s3 bucket on every request. It deconstructs the incoming request path /{app}/{env}/{version}/{assetPath} using the “path-to-regexp” JavaScript library. It then checks whether the assetPath ends in a known “mime-type” extension (like .html or .js). If it does, it passes the request to the s3 bucket as-is. If it does not, it appends index.html to the path before passing it to the s3 bucket.
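
As a rough sketch of the same idea (simplified: a plain file-extension check instead of a real mime-type lookup, a split instead of path-to-regexp, and a hypothetical asset origin), the Worker looks something like this:

// Sketch of the path-handling Worker: if the requested path does not already name a
// file, serve the release's index.html instead.
const ASSET_ORIGIN = "https://frontend-assets.example.com"; // s3/CDN origin, hypothetical

addEventListener("fetch", (event) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  const url = new URL(request.url);
  // Expecting /{app}/{env}/{version}/{assetPath}
  const [, app, env, version, ...rest] = url.pathname.split("/");
  const assetPath = rest.join("/");

  // Paths ending in a known extension (.html, .js, .css, ...) pass through unchanged;
  // anything else falls back to the release's index.html.
  const looksLikeFile = /\.[a-z0-9]+$/i.test(assetPath);
  const finalPath = looksLikeFile
    ? `/${app}/${env}/${version}/${assetPath}`
    : `/${app}/${env}/${version}/index.html`;

  return fetch(new Request(`${ASSET_ORIGIN}${finalPath}`, request));
}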

Note: we’re already thinking about how we might replace s3 with Cloudflare’s R2 Storage.

Examples:

  1. An incoming request for /my-app/staging/git-sha-1/ is sent to the s3 bucket as /my-app/staging/git-sha-1/index.html
  2. An incoming request for /my-app/staging/git-sha-1/release-specific.js is sent to the s3 bucket as-is.

Reader questions: how would you implement behavior where /about routes to /about/index.html? How would you support a fallback 500 page if anything goes wrong in the worker?

This architecture/implementation lets us cache release-specific files, gives us a single place to host the static files referenced from the *.html entry points, and lets us skip building docker images and creating separate k8s deployments for our applications. Frontend applications at Opendoor now only need a k8s ingress configuration and simple tooling to upload assets after they are built.

Deploy previews

Starting from our updated frontend application architecture, implementing deploy previews is (almost) straightforward. We have a few requirements; deploy previews should:

  1. Be created for every commit on a PR
  2. Require the user to be an Opendoor employee (note: this is a risk-reduction requirement; imagine a super-secret redesign that is accessible before launch)
  3. Not require any database/storage lookups on the asset request path
  4. Exist on the same host + paths as the deployed application

The last requirement exists because we have a lot of cookies/experiments and ingress routing that would be hard to replicate if we put deploy previews on separate URLs (like preview-pr-12345-{version}.opendoor.com), as other implementations do.

With our updated architecture, where the current release is encoded in a specific bucket path /{app}/{env}/{version} mounted on the ingress, and given the requirements above, we can implement deploy previews by:

  1. Uploading a release from a PR to a new environment/version bucket
  2. Adding context (i.e. a cookie) to a request so that we use the “preview” release instead of the release encoded in our deployed ingress
  3. Authenticating this context with an authentication service that can validate whether or not someone is an Opendoor employee

For deploy previews, we start by uploading a new version to our s3 bucket paths during the PR. For example, if you open a PR with frontend changes, we build the application and upload it to the s3 bucket path /{app}/pr-12345/git-sha-2 (and static assets at /{app}/static). We’re essentially creating a new release for every combination of PR number and commit SHA.

From here, deploy previews can be implemented by providing a way to transform the path that ends up at our s3 bucket from the current release (encoded in the deployed ingress), /myapp/staging/git-sha-1, to our new release, /myapp/pr-12345/git-sha-2. If we do this, the developer sees the changes in their PR instead of the currently active release.

In order to do this, we need to attach context to the request, which in the HTTP world means a cookie. Our deploy preview implementation is as simple as providing environment and version overrides on our asset HTTP requests. We do this with a JWT (JSON Web Token) that is sent with asset requests. Our Cloudflare Worker recognizes that there is a cookie with environment and version overrides, and it substitutes the original environment and version with the replacements.

This works as follows:

http GET myapp.opendoor.com/
Cookie:deploy-preview-myapp=jwt({"env": "pr-12345", "version": "git-sha-2"})
-> ingress (matches asset path) and applies the rewrite-target annotation
-> cdn-proxy:80/myapp/staging/git-sha-1/
-> Cloudflare Worker
1. Called with /myapp/staging/git-sha-1/
2. Recognizes that there is a cookie with environment and version overrides
3. Transforms the incoming call to /myapp/pr-12345/git-sha-2/
4. Performs mime-type logic and sends to s3 bucket
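
A sketch of that override step in Worker terms follows; the verifyJwt stub below only decodes the token, whereas the real implementation must verify the signature before trusting the claims:

// Sketch: apply deploy preview overrides from the request cookie before building the
// bucket path. The cookie name follows the deploy-preview-{app} pattern shown above.
async function applyPreviewOverride(request, app, env, version) {
  const cookies = request.headers.get("Cookie") || "";
  const match = cookies.match(new RegExp(`deploy-preview-${app}=([^;]+)`));
  if (!match) {
    return { env, version }; // no preview cookie: use the deployed release
  }
  const claims = await verifyJwt(match[1]);
  if (!claims || !claims.env || !claims.version) {
    return { env, version }; // invalid or tampered token: ignore it
  }
  return { env: claims.env, version: claims.version }; // preview release wins
}

// Illustration-only stub: decodes the payload WITHOUT checking the signature.
async function verifyJwt(token) {
  try {
    const payload = token.split(".")[1];
    return JSON.parse(atob(payload.replace(/-/g, "+").replace(/_/g, "/")));
  } catch {
    return null;
  }
}

The path-handling sketch from earlier would call this before assembling its final bucket path.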

The last piece is actually setting the cookie that activates a deploy preview. There are a few constraints here:

  1. The cookie should only be set for authenticated Opendoor employees
  2. The cookie should be tamper-proof (i.e. if someone manually edits the cookie version or environment, it shouldn’t activate the deploy preview)
  3. Multiple deploy previews can be active at the same time

We solved these constraints with signed JWTs for the cookie context and an auth/redirect handshake to set the cookie onto the proper domain. This is all kicked off by a URL that is added as a comment on the PR once the deploy preview is uploaded as a release.

Developers will see something like:

https://deploy-preview.opendoor.com/activate?env=pr-12345&version=pr-git-sha-1&rd=https://example-app.opendoor.com/

When a developer clicks this link, it takes them through an SSO (single sign-on) authorization flow before their request reaches the deploy-preview.opendoor.com/… URL above. From here, the service generates a signed JWT (with the environment and version encoded) and builds a 302 redirect URL to a path that is mounted on the application’s ingress:

302 to https://example-app.opendoor.com/_deploy_preview/start?token=jwt({env: pr-12345, version: pr-git-sha-1})

This route, _deploy_preview/start, is mounted on the application’s ingress. A route that is mounted on the domain can set cookies for that domain. We can then use this route (which routes requests to the deploy-preview service above) to:

  1. Verify the token that was generated and signed in the previous redirect
  2. Respond with the final deploy preview cookie

302 to https://example-app.opendoor.com/ Set-Cookie: deploy-preview-example-app=jwt({env: pr-12345, version: pr-git-sha-1})
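
Put together, the two redirect hops might look roughly like the following Express-style sketch (the jsonwebtoken usage, secret name, and cookie options are illustrative, not our actual service):

import express from "express";
import jwt from "jsonwebtoken";

const app = express();
// Shared secret between this service and the Cloudflare Worker (name is hypothetical).
const SECRET = process.env.DEPLOY_PREVIEW_SECRET ?? "dev-only-secret";

// Step 1: reached after SSO. Sign the override claims and bounce to the app's own domain.
app.get("/activate", (req, res) => {
  const { env, version, rd } = req.query as Record<string, string>;
  const token = jwt.sign({ env, version }, SECRET, { expiresIn: "7d" });
  // Derive the app's origin from the redirect target (an assumption for this sketch).
  res.redirect(302, `${new URL(rd).origin}/_deploy_preview/start?token=${token}`);
});

// Step 2: mounted on the application's ingress, so the cookie lands on that exact domain.
app.get("/_deploy_preview/start", (req, res) => {
  const token = req.query.token as string;
  jwt.verify(token, SECRET); // throws if the token was tampered with
  const appName = req.hostname.split(".")[0]; // e.g. "example-app"
  res.cookie(`deploy-preview-${appName}`, token, { httpOnly: true, secure: true });
  res.redirect(302, "/"); // back to the application root
});

app.listen(3000);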

Now, the deploy preview is activated! Requests carry the proper cookie context and return the deploy preview release (instead of the original release). The final request looks like:

http GET https://example-app.opendoor.com/ Cookie: deploy-preview-example-app=jwt({env: pr-12345, version: pr-git-sha-1})

This final request has the context required to serve the previously uploaded deploy preview.

Note: this handshake is required so that we can set our cookies on the most specific domain (i.e. example-app.opendoor.com instead of something like .opendoor.com), which is necessary because we have some staging domains that are not on opendoor.com.

This required adding one more path to our ingress, so our final ingress looks like this:

https://gist.github.com/defjosiah/56a3ac1a53e3d47e694612851324c7b5

The final step is the Cloudflare Worker code needed to actually parse this deploy preview cookie and use the alternate s3 bucket release path. That code is similar to this:

https://gist.github.com/defjosiah/e1e7f2eb79a07ee5581fb07872973c10

With this, we have deploy previews at Opendoor! After developers activate the link, they’re taken to their app, where we inject a warning so they know they’re in a deploy preview. The warning is injected using the Cloudflare Workers HTMLRewriter, which adds a script tag to the *.html pages to surface the warning. The deploy preview stays active until the developer clears the cookie. The final view for the developer is this:

A view of an application after the deploy preview is active. It is a red outlined modal with warnings.
Deploy Preview welcome screen/warning
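
The warning injection mentioned above takes only a few lines of HTMLRewriter; a minimal sketch, with a hypothetical banner script URL:

// Sketch: inject a deploy preview warning script into HTML responses only.
function injectPreviewWarning(response) {
  const contentType = response.headers.get("Content-Type") || "";
  if (!contentType.includes("text/html")) {
    return response; // leave non-HTML assets untouched
  }
  return new HTMLRewriter()
    .on("head", {
      element(head) {
        head.append(
          '<script src="/static/deploy-preview-warning.js"></script>', // hypothetical URL
          { html: true }
        );
      },
    })
    .transform(response);
}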

Wrap-up

Our static asset serving architecture supports all of our frontend applications at Opendoor, and deploy previews have been a major benefit for developers. This architecture works well for our use cases and has been in production since mid-2020. There are some things not covered here, like how we rolled this out, errors/analytics for the previews, and why we didn’t just use Vercel or Netlify. Feel free to leave a comment or reach out if you’d like to know more.

If this type of project is interesting to you, we’re hiring!
