How We Got Our Content Publish Time Down To 1 Minute

Max Novak
Published in Policygenius
6 min read · Dec 20, 2019

Content is king. Here at Policygenius, we focus on giving people the tools they need to make good decisions about their financial protection, and a huge part of that is educating users through content.

We do that through our team of writers, who are focused on understanding the insurance industry and breaking down complex topics into layman’s terms. This core idea of education is why we have taken a proactive approach to deploying our content rapidly, and making the process as painless as possible for our writers.

To do this, we use GatsbyJS to statically build our site and Contentful as a CMS (Content Management System) for our writers. GatsbyJS is an open source, React-based static site framework that lets us quickly create new templates and components for our writers, so they can focus on writing amazing new content. Together, these two tools allow us to build a static site that can quickly serve rich content to our users and teach them about insurance.

While we could serve our content directly from our CMS, we chose to statically build our site for speed. A static site only needs to be served and rendered, instead of also making a trip over the wire to fetch data from the CMS. However, after launching our initial static site with Gatsby, our build times started to grow. After publish time surpassed 20 minutes, we realized this would be a major blocker for our content team and focused on optimizing these publication times.

After investigating our publication process and researching the build time increase, we discovered that the root cause was adding new pages to our site. As we increased our content output, our build time would grow linearly, further delaying our content from going live. We even encountered an issue where we started to hit the Node.js memory cap during our build because we had too much content!

How much our content has grown over time

Getting the most out of Gatsby

Originally, creating our site using GatsbyJS and Contentful was a one-off project. However, the importance of content led us to create a whole new team to support our writers. In January 2019, a new Content Engineering team was created and our first task was to decrease the build time.

We had a couple of different approaches for tackling build times. The first was to upgrade Gatsby to the newest version, which had been released the previous September. That release boasted a 75% reduction in build time and a 31% smaller JS client runtime. With this upgrade we were able to cut our build time by 35%, to just a bit over 13 minutes! While not as big a drop as we were expecting, this was still an amazing first step.

Quick shout-out to the Gatsby team: they have been doing amazing work, and this past summer they released another optimization for large sites around how they organize their data files. Check it out here.

The next thing we tried was cutting any cruft out of our build process. We looked at our build pipeline and found that we were moving a lot of files around unnecessarily. We would build our site with Gatsby, push it up to a Google Cloud Bucket, download it, download some other legacy static assets, and then push all of that up into a different Google Cloud Bucket to statically serve our site. We were able to roll all of that up into a single step that downloads the legacy assets, builds Gatsby, and pushes the result up to the bucket.
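As a rough sketch of what a single-step pipeline like this could look like, here is a small Node script. The bucket names, the `static/legacy` directory, and the use of `gsutil rsync` are assumptions made for illustration, not the actual Policygenius setup. It relies on Gatsby copying anything under `static/` into the `public/` build output.

```javascript
const { execFileSync } = require("child_process");

// Hypothetical bucket names; substitute your own.
const LEGACY_BUCKET = "gs://example-legacy-assets";
const SITE_BUCKET = "gs://example-static-site";

// Returns the ordered commands for a single-step deploy.
function buildDeployPlan(legacyBucket, siteBucket) {
  return [
    // 1. Make sure the local target directory exists (POSIX mkdir).
    ["mkdir", ["-p", "static/legacy"]],
    // 2. Pull legacy assets into static/, which Gatsby copies into public/.
    ["gsutil", ["-m", "rsync", "-r", legacyBucket, "static/legacy"]],
    // 3. Build the site; the output lands in public/.
    ["npx", ["gatsby", "build"]],
    // 4. Push the finished build straight to the serving bucket.
    ["gsutil", ["-m", "rsync", "-r", "public", siteBucket]],
  ];
}

// Runs each command in order; execFileSync throws if a step fails,
// so a broken build never reaches the upload step.
function runDeploy(plan, run = execFileSync) {
  for (const [cmd, args] of plan) {
    run(cmd, args, { stdio: "inherit" });
  }
}

// To run the deploy for real:
// runDeploy(buildDeployPlan(LEGACY_BUCKET, SITE_BUCKET));
```

Keeping the plan as plain data makes it easy to see that nothing is uploaded, downloaded, and re-uploaded along the way.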

Illustration of our old deploy pipeline

Cleaning up Contentful

We also discovered that we were requesting way more data than we needed from Contentful! Contentful organizes its data into “content models”, which you can think of as database tables.

When Contentful creates links between content models, it actually includes the entire data object. In most cases that is good and we want that data, but sometimes we just want a link to another page’s URL. Essentially, whenever one of our pages linked to another page, we were grabbing the linked page’s entire data and its assets as well, which is a lot of data (remember that JavaScript memory cap problem?).
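To make the difference concrete, here is a minimal sketch that trims fully resolved linked entries down to just what a link needs. It assumes entries shaped like `{ sys: { id }, fields: { ... } }` and a hypothetical `slug` field holding the page URL; it illustrates the idea rather than the exact change made to the Contentful requests.

```javascript
// True when a value looks like a fully resolved linked entry.
function isResolvedEntry(value) {
  return Boolean(
    value && typeof value === "object" && value.sys && value.fields
  );
}

// Keep only what is needed to render a link ("slug" is a hypothetical field).
function toLinkRef(entry) {
  return { id: entry.sys.id, slug: entry.fields.slug };
}

// Replace every linked entry on a page with a slim reference,
// leaving all other fields untouched.
function slimLinkedEntries(entry) {
  const fields = {};
  for (const [name, value] of Object.entries(entry.fields)) {
    if (isResolvedEntry(value)) {
      fields[name] = toLinkRef(value);
    } else if (Array.isArray(value)) {
      fields[name] = value.map((item) =>
        isResolvedEntry(item) ? toLinkRef(item) : item
      );
    } else {
      fields[name] = value;
    }
  }
  return { sys: entry.sys, fields };
}
```

In a Gatsby page query, the equivalent move is usually to select only the linked page's URL field instead of its whole body and assets.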

After cleaning up our requests to Contentful and the pipeline, our build time dropped to 4 minutes! While it might seem like we had accomplished our goal, we knew that this cleanup still wasn’t 100% sustainable. Our content would continue to grow as our writers continued to produce it, and someday we would have super slow builds again.

Illustration of our new deploy pipeline

So, we decided to shrink our build down to the smallest piece possible, a single page, and deploy that on its own. The initial changes for this were fairly simple: we used a Contentful UI Extension to create a button that lets writers kick off a build and passes along the page ID, so that only the page we want gets built. Gatsby would then build a small package with uniquely hashed data, CSS, and JS files that would deploy to our Google Cloud Bucket.
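A minimal sketch of the "build only this page" idea follows. The `{ id, slug }` page shape and the helper names are hypothetical; only the `createPage({ path, component, context })` call mirrors Gatsby's real action.

```javascript
// pages: [{ id, slug }] as fetched from the CMS.
// pageId: the id passed along by the publish button, or undefined for a full build.
function selectPagesToBuild(pages, pageId) {
  if (!pageId) return pages; // no id given: build everything
  const match = pages.filter((page) => page.id === pageId);
  if (match.length === 0) {
    throw new Error(`No page found with id ${pageId}`);
  }
  return match;
}

// Intended to be called from createPages in gatsby-node.js,
// passing in Gatsby's createPage action and a template path.
function createSelectedPages(pages, pageId, createPage, component) {
  for (const page of selectPagesToBuild(pages, pageId)) {
    createPage({
      path: `/${page.slug}/`,
      component,
      context: { id: page.id },
    });
  }
}
```

The page id could reach the build through an environment variable set by whatever the button triggers (for example a hypothetical `BUILD_PAGE_ID`); leaving it unset falls back to a full site build.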

This hashing makes the packaged files for our full site and our single page unique. That meant we could push the files up to the bucket without conflicting with the existing build. We could then push up the index.html that pointed to these new packaged files and the page would go live with no downtime. This also meant that our writers could edit pages and deploy them without risking the page becoming a broken link for even a second.
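The two ideas in this step, content-hashed file names and uploading the HTML last, can be sketched as follows. Gatsby produces its hashed file names itself during the build, so `hashedName` here is only an illustration of the principle, and both helper names are hypothetical.

```javascript
const crypto = require("crypto");
const path = require("path");

// Name a file after a hash of its contents: app.js -> app-<8 hex chars>.js
// Different contents always get a different name, so a new build
// never overwrites files that the live site is still using.
function hashedName(filename, contents) {
  const hash = crypto
    .createHash("md5")
    .update(contents)
    .digest("hex")
    .slice(0, 8);
  const { name, ext } = path.parse(filename);
  return `${name}-${hash}${ext}`;
}

// Upload order: hashed assets first, HTML last. The page only switches
// to the new build once every file it points to is already in the bucket.
function orderUploads(files) {
  const isHtml = (file) => file.endsWith(".html");
  return [...files.filter((f) => !isHtml(f)), ...files.filter(isHtml)];
}
```

Because the old hashed files stay in place until the new `index.html` lands, there is no window in which the page references files that do not exist.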

Using an internal reference in Contentful to populate a link on one of our pages

Handling hard-coded pages

However, when we looked at these builds we discovered we were also deploying a few hard-coded pages that exist inside Gatsby, rather than in Contentful. These pages were intentionally hard coded to allow us to quickly iterate on their format without having to worry about modeling our data and putting it into Contentful.

These hard-coded pages were also built whenever we built new pages, and we deployed them as well. So, as a way to eke out a little more speed for our single-page deploys, we started to investigate how we could remove these pages from our Gatsby build. We quickly found that building them every time is actually a feature of Gatsby, so we would need a small hack to build only the single page.

For background, before building your site Gatsby compiles all of your data and organizes it into a GraphQL data layer so that your pages can query it and populate themselves. As pages are created, Gatsby calls a hook named `onCreatePage()`, which you can implement in `gatsby-node.js`. This hook allows you to go in and manipulate your pages a little before everything gets built out. So, we used it to remove our hard-coded pages and leave just the new page. This means that our final package for a specific page will be as small as possible, with just the logic required for that page to deploy.
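A hedged sketch of that hack: Gatsby passes `onCreatePage()` the `page` being created along with `actions`, which includes `deletePage`. The `SINGLE_PAGE_PATH` environment variable and the helper names are assumptions for illustration.

```javascript
// Decide whether a page should be dropped from a single-page build.
// With no target path set (a full build), nothing is dropped.
function shouldDeletePage(page, targetPath) {
  return Boolean(targetPath) && page.path !== targetPath;
}

// Builds a handler with the shape Gatsby expects for onCreatePage.
function makeOnCreatePage(targetPath) {
  return ({ page, actions }) => {
    if (shouldDeletePage(page, targetPath)) {
      // Remove the page so it is not part of the build output.
      actions.deletePage(page);
    }
  };
}

// In gatsby-node.js (SINGLE_PAGE_PATH is a hypothetical variable):
// exports.onCreatePage = makeOnCreatePage(process.env.SINGLE_PAGE_PATH);
```

When the variable is unset, the handler does nothing, so the same `gatsby-node.js` can serve both the daily full build and single-page deploys.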

The end result

With this final change, along with the other tweaks we made along the way, we had finally gotten our deployment down to an average of 1 minute and 10 seconds!

We still do a full site deploy once a day in order to keep everything in parity, but our writers are now able to push a button and see their page live without having to pass the time by grabbing a cup of coffee while their page builds.

We’re growing!

We’ve gathered a pretty great bunch of product, design and engineering folks to tackle some pretty tough problems in the insurance industry. Fun fact: we still need a lot more. If you’re interested in what we have going on here and want to help us on our mission to make insurance suck less, mosey on over to our careers page to see if something fits.

Max Novak
Policygenius

Software Engineer that likes to futz about with games