Migrating Single-Page Applications from Heroku to AWS

Justin Menestrina
Extra Credit - A Tech Blog by Guild
7 min read · Oct 1, 2020

In the second quarter of 2020, Guild Education began transitioning our Single-Page Application (SPA) hosting stack from Heroku over to AWS. For us, SPAs were only the first major step in migrating all of our deployment infrastructure to the new platform. We chose AWS, despite several robust and fully-featured alternatives like Heroku and Netlify, for the following reasons:

  • Cost. When Guild started in 2016 with only a handful of applications, cost was not a major factor. But we now have 14+ deployed SPAs, with more coming online each year, so per-application hosting costs add up.
  • Flexibility and capacity to scale. As we scale our teams and capabilities, custom tooling built to align with our needs and use cases enables our business and engineering teams to move faster.
  • Access control, auditability, and compliance (e.g. SOC). Serving a broad spectrum of employers, university partners, and students means more responsibility to keep our data safe and secure.

AWS Migration Goals

After settling on AWS as our new platform of choice, we set several main goals for our re-architected SPA deployment:

  1. Decouple application hosting from application routing. Today, Guild relies on Cloudflare to handle our routing (e.g. ensuring that when a user hits a URL, they land where we want them to). But we saw a future where we could replace Cloudflare with AWS technologies like Amazon Route 53 and Lambda@Edge. To allow for this evolvability, we wanted our routing implementation to be replaceable without affecting our hosting infrastructure. To accomplish this, we created an API interface between the routing and hosting layers.
  2. Establish a standard pattern for how SPAs are deployed and configured. With our previous Heroku solution, our SPAs had many divergent deployment and configuration patterns. These differences made it harder for engineers to work across teams because of the overhead of learning new patterns. Examples of the differences include: a) different tool chains for CI/CD (GitHub Actions vs. CircleCI); b) some applications used Cloudflare DNS for routing while others used Cloudflare workers; c) teams had their own URL patterns and webpack configurations for handling test/pull-request applications. We deemed the migration an excellent opportunity to rethink our frontend deployment patterns and standards and improve the developer experience.
  3. Continue to support Guild’s multiple URL patterns. Guild serves web applications via two main URL patterns, subdomain-based URLs and path-based URLs, and our new infrastructure had to support both in order to be successful. Subdomain-based URLs take the form https://subdomain.guildeducation.com, where the subdomain is the unique identifier that determines which application is served. Path-based URLs take the form https://subdomain.guildeducation.com/path, where the path is the unique identifier. This means different subdomains can serve the same application if the path is the same; for example, we host our programs catalog at https://walmart.guildeducation.com/catalog and https://disney.guildeducation.com/catalog, but both URLs serve the same application code. (The lookup logic is sketched below.)
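To make the two patterns concrete, here is a minimal TypeScript sketch of that lookup logic. The routing tables (and the walmart-portal app name) are invented for illustration; they are not our production mapping.

```typescript
type AppId = string;

// Hypothetical routing tables. Path-based entries win over
// subdomain-based ones, so /catalog resolves to the same app
// no matter which subdomain it is requested from.
const pathApps: Record<string, AppId> = {
  "/catalog": "programs-catalog",
};

const subdomainApps: Record<string, AppId> = {
  walmart: "walmart-portal", // invented example
};

function resolveApp(publicUrl: string): AppId | undefined {
  const { hostname, pathname } = new URL(publicUrl);
  const subdomain = hostname.split(".")[0];
  const firstSegment = "/" + (pathname.split("/")[1] ?? "");
  return pathApps[firstSegment] ?? subdomainApps[subdomain];
}

// resolveApp("https://walmart.guildeducation.com/catalog") -> "programs-catalog"
// resolveApp("https://disney.guildeducation.com/catalog")  -> "programs-catalog"
```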

New Architectural Overview

With our platform and technologies chosen, requirements set, and our goals defined, we set out to design a solution that would meet these goals.

Overview of our SPA deployment architecture.

Our deployment architecture can be divided into two main categories:

  1. SPA hosting
  2. URL Routing

As mentioned earlier, this split comes from the desire to separate our hosting and routing layers. By splitting them up, we are better able to design for future changes.

SPA Hosting

“Hosting” simply refers to where the SPA files (HTML, CSS, JavaScript, etc.) physically live, plus a URL that can be hit to access them.

Our new design uses Amazon Simple Storage Service (S3) for hosting. Each SPA is given its own S3 bucket. Some of the key advantages of S3 are its high availability (99.9%) and robust backups. For any public-facing website, where being available at all times is critical, these features provide major advantages. S3’s biggest drawback is that it can only host static artifacts. Because our SPAs are client-side rendered, this limitation is acceptable.

For URL access, SSL, and caching, Guild uses Amazon CloudFront (not to be confused with Cloudflare, mentioned earlier, which handles our DNS and routing). CloudFront provides us with a unique URL that can be used to access the application hosted in a bucket.
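As a rough sketch, this per-SPA hosting stack can be expressed with the AWS CDK in TypeScript. The construct IDs and bucket name below are illustrative, not our actual infrastructure code:

```typescript
import { Stack, StackProps, RemovalPolicy } from "aws-cdk-lib";
import { Bucket } from "aws-cdk-lib/aws-s3";
import { Distribution } from "aws-cdk-lib/aws-cloudfront";
import { S3Origin } from "aws-cdk-lib/aws-cloudfront-origins";
import { Construct } from "constructs";

// One stack per SPA: an S3 bucket for the static build artifacts, plus
// a CloudFront distribution in front of it for HTTPS, caching, and a
// unique URL.
export class SpaHostingStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const bucket = new Bucket(this, "SpaBucket", {
      // Named {github-repo-name}.{domain-name}.com for discoverability.
      bucketName: "programs-catalog.guildeducation.com",
      removalPolicy: RemovalPolicy.RETAIN, // don't delete hosted assets by accident
    });

    new Distribution(this, "SpaDistribution", {
      defaultBehavior: { origin: new S3Origin(bucket) },
      defaultRootObject: "index.html",
    });
  }
}
```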

Tips for S3 hosting

  • Keep separate buckets for each SPA. Doing so keeps each SPA isolated and allows for independent deployment lifecycles.
  • Enforce naming standards to enable discoverability. You want it to be fairly obvious what a bucket does just by looking at the name. We name our SPA buckets after the GitHub repo they serve and the domain name they are hosted on (e.g. {github-repo-name}.{domain-name}.com). This lets us easily identify which application a bucket hosts.
  • Remember to make use of object metadata. We use metadata to set HTTP headers like MIME types and cache-control headers (see the sketch below).
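For example, here is how those headers can be set at upload time with the AWS SDK for JavaScript (v3); the bucket and file names are illustrative:

```typescript
import { readFile } from "node:fs/promises";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// ContentType and CacheControl are stored as object metadata, and S3
// (with CloudFront in front of it) serves them back as HTTP headers on
// every request for the object.
async function uploadAsset(): Promise<void> {
  await s3.send(
    new PutObjectCommand({
      Bucket: "programs-catalog.guildeducation.com", // illustrative name
      Key: "static/js/main.abc123.js",
      Body: await readFile("build/static/js/main.abc123.js"),
      ContentType: "application/javascript",
      // Content-hashed bundles are safe to cache aggressively.
      CacheControl: "public, max-age=31536000, immutable",
    })
  );
}
```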

URL Routing

URL routing emerged as the most challenging aspect of our deployment architecture. It also proved to be the most vital in accomplishing the decoupling we were going for.

Previously, each application employed its own unique way of routing its public-facing URL (i.e. the route a user visited to see a page) to the URL that Heroku provided for an application deployment.

To address these challenges, we broke up URL routing into two parts:

  1. Translating a public request
  2. Translating an asset request

Translating a public request

Diagram: a user-friendly URL is translated into a more complex CloudFront URL that points to an S3 bucket.

Translating a public request refers to the process of converting a public URL into the CloudFront URL that the requested application is served from.

To better understand how this works, let’s walk through the following scenario:

I am a user who visits https://walmart.guildeducation.com/catalog. The code that this URL serves lives in a GitHub repository called programs-catalog. How does that source code make it to my browser when I visit https://walmart.guildeducation.com/catalog?

  1. A Cloudflare worker translates the public URL https://walmart.guildeducation.com/catalog into a standardized internal URL: https://programs-catalog.guildeducation.com
  2. Next, a Cloudflare DNS entry translates the standardized internal URL https://programs-catalog.guildeducation.com to the correct CloudFront URL in AWS

The core idea is that we created a standardized internal URL that uniquely identifies a GitHub repository. We settled on the format https://{github-repo}.guildeducation.com. For this example, the standardized internal URL is https://programs-catalog.guildeducation.com

We chose this pattern because GitHub repos are unique within an organization, making it an easy way to identify which repository a URL refers to.
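To make the translation step concrete, here is a minimal sketch in the shape of a Cloudflare Worker (module syntax). The path-to-repo table is invented for illustration; our real mapping lives in the routing layer.

```typescript
// Hypothetical mapping from a path prefix to the GitHub repo it serves.
const pathToRepo: Record<string, string> = {
  "/catalog": "programs-catalog",
};

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const firstSegment = "/" + (url.pathname.split("/")[1] ?? "");
    const repo = pathToRepo[firstSegment];
    if (!repo) return fetch(request); // not a path-based app; pass through

    // Rewrite the host to the standardized internal URL and strip the
    // app's path prefix. A Cloudflare DNS entry then resolves this
    // hostname to the application's CloudFront distribution.
    url.hostname = `${repo}.guildeducation.com`;
    url.pathname = url.pathname.slice(firstSegment.length) || "/";
    return fetch(new Request(url.toString(), request));
  },
};
```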

Translating an asset request

Continuing with our example from above, once https://walmart.guildeducation.com/catalog has been translated and its index.html page has been returned to the user, the user’s browser now needs to get the static assets that are required to render the page.

A static asset is a file that is loaded via a link or script from an HTML document. Examples of assets include javascript files, css files, and images.

Because we need to support path-based applications like the one in this example, we need to use absolute links for our static assets. A relative link such as /main.js would resolve against the public domain (e.g. https://walmart.guildeducation.com/main.js), which is not where the asset actually lives.

The key insight is that we can reuse our standardized internal URL to create an absolute link for all of our assets. As demonstrated above, the Cloudflare DNS record correctly translates that internal URL to the CloudFront URL. By utilizing this strategy, we are able to standardize the way all of our SPAs create links to their assets.
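In a webpack build (which our SPAs already use), this amounts to pointing output.publicPath at the standardized internal URL so that every emitted asset link is absolute. A minimal sketch, with the URL hard-coded for illustration:

```typescript
// webpack.config.ts (sketch): emit absolute asset URLs rooted at the
// standardized internal URL instead of relative ones, so assets resolve
// correctly from any public URL, including path-based ones.
import type { Configuration } from "webpack";

const config: Configuration = {
  output: {
    // e.g. <script src="https://programs-catalog.guildeducation.com/main.js">
    publicPath: "https://programs-catalog.guildeducation.com/",
  },
};

export default config;
```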

Putting it all together

The solution outlined above accomplishes several of the architectural goals we set out to achieve.

Decouples routing, application builds, and hosting infrastructure

We created two primary interfaces that serve to decouple the hosting and the routing layers:

Standardized Internal URL

This serves as the way the routing layer maps assets and public URLs back to the correct DNS record for an application. Applications need to know only about this URL. Because it’s simply a naming standard, it can be reused even if the routing layer is replaced.

Cloudflare DNS record

The hosting infrastructure automatically creates a DNS record in Cloudflare when it is provisioned. The consequence of this interface is that if the routing layer gets replaced later, we only need to change where the hosting infrastructure creates this DNS record.
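As an illustration of this interface, registering the standardized internal hostname through Cloudflare's DNS records API could look like the sketch below; the zone ID and API token are placeholders, and the function name is invented for this example:

```typescript
// Sketch: after the hosting stack creates a CloudFront distribution, it
// registers the standardized internal hostname as a CNAME in Cloudflare.
async function createInternalDnsRecord(
  repo: string,            // e.g. "programs-catalog"
  cloudfrontDomain: string // e.g. "d1234abcd.cloudfront.net"
): Promise<void> {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${process.env.CF_ZONE_ID}/dns_records`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        type: "CNAME",
        name: `${repo}.guildeducation.com`,
        content: cloudfrontDomain,
        proxied: true,
      }),
    }
  );
  if (!res.ok) throw new Error(`DNS record creation failed: ${res.status}`);
}
```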

Establishes a standard pattern for how all SPAs are deployed and configured

All applications, regardless of their public URL requirements, are configured in the same standardized way: through a standardized internal URL. This creates better uniformity across teams, which helps increase velocity as we scale.

Conclusion

Our migration from Heroku to AWS enabled us to redefine best practices and standards in how we deploy SPAs. Our use of AWS and Cloudflare together fits our needs well, but isn’t necessarily right for everyone. Most existing patterns for hosting SPAs in AWS recommend using all AWS services. If you’re able to start greenfield, I’d recommend using AWS-native technologies throughout: S3 for hosting, and Route 53 and Lambda@Edge for DNS and routing.

However, hopefully this pattern serves as a guide for other teams or companies that are looking for migration strategies to AWS and ways to create a more decoupled SPA deployment architecture.
