How we shaved 1.7 seconds off casper.com by self-hosting Optimizely

Kyle Rush
The Casper Tech Blog: Z++
7 min read · Aug 28, 2018

We recently deployed a change to casper.com that loads a piece of 3rd party JavaScript from our own server instead of the vendor’s server. This change shaved 1.7 seconds off the start render time:

Measurement taken on Desktop Chrome w/3G network connection

The 3rd party JavaScript in question is from a company called Optimizely. We use their client-side JavaScript to conduct A/B tests on casper.com. Once the JavaScript file is downloaded and executed, it changes the document for 50% of our website visitors so we can measure how they react to the change. To ensure the smallest possible flash of unstyled content (FOUC), we follow the experimentation best practice of loading Optimizely before anything else, in a blocking manner.

As would be expected, loading the JavaScript snippet in this way has a negative impact on the web performance of our website. It’s a trade-off we struggled with for a long time. Should we follow web performance best practices and load the Optimizely JavaScript asynchronously, or follow experimentation best practices and load it blocking as the first asset? Both of these approaches have their pros and cons.

In one of our web performance audit conversations, we decided to experiment with techniques to address the performance problem. Our first idea was to take Optimizely out of the <head> and/or put the async attribute on the <script> tag so that it loads asynchronously and no longer blocks rendering. Our Product Management team pointed out that the resulting FOUC would be a bad customer experience, and our Data and Analytics team pointed out that we would likely suffer a drop in data integrity if the JavaScript were loaded later in the page load.

The next best thing we could think of was self-hosting the Optimizely snippet. Optimizely actually has a knowledge base article that encourages this. Typically, vendors like Optimizely give you the URL of a JavaScript file that they host. The problem is that loading it this way requires a new DNS lookup, a new HTTP connection, and an SSL handshake with the vendor’s server. Another cost is that you miss the opportunity to serve the asset with HTTP/2 multiplexing, a much more efficient way for a browser and server to communicate. As you can see in the screenshot below, from one of our performance tests, this was costing 39ms of latency for the DNS lookup, 54ms to establish the server connection, and 135ms for the SSL handshake. Additionally, there’s 175ms of latency waiting for the first byte, which would be eliminated if we could utilize HTTP/2 multiplexing.

One last benefit of self-hosting the file is that we would have more control over the edge (CDN) and client (browser) cache. Optimizely doesn’t give you control of their edge cache, but they do give you control of the client cache: there is a setting that lets you configure the Cache-Control value, which for us was set to 2 minutes. This is an ideal setting for us when the file is hosted by Optimizely, since a short TTL is the only way to make sure visitors pick up snippet changes reasonably quickly.

To test our theory that self-hosting was better, we manually copied the contents of the Optimizely JavaScript file, saved a version on our server, and pointed the reference in staging at our self-hosted version of the file. The results were not spectacular. They were so underwhelming that one of our data analysts said it wasn’t worth the effort to shave 200ms off the start render time. And we agreed with that!

We kept pushing, though, because we believed our staging environment wasn’t a good place to test this kind of performance change: it is missing a lot of the 3rd party JavaScript that only runs in production. So we devised a production test in which the data analysts wouldn’t make any changes to Optimizely for 3 days while we deployed the static, self-hosted version of the snippet.

The drop marks the period when the self-hosted version of Optimizely was in production (measured on desktop Chrome over a cable connection; at the time we weren’t measuring performance at 3G network speeds, which is why the graph at the beginning of the article shows a bigger effect, but 3G is now our standard network speed for measurements)

In the chart above, you can see a drop in start render time during the period when our self-hosted, static version of the Optimizely snippet was live in production. Start render dropped substantially because we eliminated the DNS lookup, the connection to the Optimizely server, the SSL handshake, and the separate wait for the first byte, and we enabled HTTP/2 multiplexing for the file.

We weren’t quite ready to make this change permanent, though. The way Optimizely works is that any change to an experiment (starting or pausing it, editing it, and so on) generates a new version of the JavaScript snippet on the Optimizely server. Since we were loading a static copy of the file that we had copied over by hand, we couldn’t keep it in production forever, because we’d never be able to start or pause experiments, and it would be too much of a lift for our software engineers to manually copy over the new file every time it changed. So, having seen the benefits of this approach, we had to figure out how to dynamically serve the newest version of the Optimizely snippet from our own servers.

To do this, we created an AWS Lambda that runs every 60 seconds. When it runs, it sends a request to optimizely.com for the JavaScript file. It creates a hash of the file and checks S3 to see if the hash changed (we store the hash from the last execution in a file on S3). If the hash changed, then it saves the new JavaScript file to S3 with part of the hash in the filename (example: snippet-c36d504bc3c26479f1181e6119617a64.js). Next, the Lambda sends the hash to a dictionary on our Fastly edge server. This is where the magic comes in. We configured our edge servers with a combination of an edge side include (ESI) and edge dictionary to dynamically insert the latest Optimizely JavaScript file name into the HTML of every page served out of the edge servers. This allows us to update the reference to the Optimizely file at the edge instead of having to redeploy the website every time the file changes.
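To make the moving pieces concrete, here is a minimal sketch of what such a Lambda could look like. This is not our production code: the bucket name, environment variables, dictionary key, and the use of the AWS SDK v3 on a Node 18+ runtime (for the global fetch) are illustrative assumptions; only the overall flow (fetch, hash, compare, upload to S3, update the Fastly edge dictionary) mirrors what’s described above.

```typescript
// Illustrative sketch only: not Casper's production Lambda. All names
// (bucket, env vars, dictionary key) are placeholders.
import { createHash } from "crypto";
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});
const SNIPPET_URL = process.env.OPTIMIZELY_SNIPPET_URL!; // the vendor-hosted snippet URL
const BUCKET = process.env.SNIPPET_BUCKET!;
const LAST_HASH_KEY = "optimizely/last-hash.txt";

export const handler = async (): Promise<void> => {
  // 1. Download the latest snippet from Optimizely (Node 18+ global fetch).
  const body = await (await fetch(SNIPPET_URL)).text();

  // 2. Hash the contents and compare to the hash stored on the previous run.
  const hash = createHash("md5").update(body).digest("hex");
  if (hash === (await getLastHash())) return; // nothing changed, stop here

  // 3. Save the new version to S3 under a content-addressed filename,
  //    e.g. snippet-c36d504bc3c26479f1181e6119617a64.js, and record the hash.
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: `snippet-${hash}.js`,
    Body: body,
    ContentType: "application/javascript",
  }));
  await s3.send(new PutObjectCommand({ Bucket: BUCKET, Key: LAST_HASH_KEY, Body: hash }));

  // 4. Upsert the new hash into a Fastly edge dictionary; the ESI in the edge
  //    config uses it to render the current snippet reference in the HTML.
  await fetch(
    `https://api.fastly.com/service/${process.env.FASTLY_SERVICE_ID}` +
      `/dictionary/${process.env.FASTLY_DICTIONARY_ID}/item/optimizely_snippet_hash`,
    {
      method: "PUT",
      headers: {
        "Fastly-Key": process.env.FASTLY_API_TOKEN!,
        "Content-Type": "application/x-www-form-urlencoded",
      },
      body: `item_value=${encodeURIComponent(hash)}`,
    },
  );
};

// Returns the hash recorded by the previous execution, if any.
async function getLastHash(): Promise<string | undefined> {
  try {
    const res = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: LAST_HASH_KEY }));
    return res.Body?.transformToString();
  } catch {
    return undefined; // first run: nothing stored yet
  }
}
```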

Here is a screenshot from WebPageTest measuring the performance of the new Optimizely file hosted by Casper:

And here is a side-by-side comparison of data collected prior to self-hosting and after, via WebPageTest:

Ideally we’d be presenting 95th-percentile real user monitoring (RUM) data for these values, but we haven’t fully implemented RUM for casper.com. Because these are synthetic tests, there is some presumed volatility in the Optimizely-hosted times (for better or worse; we aren’t certain) and in the Casper-hosted content download time.

Here’s a waterfall that shows HTTP/2 multiplexing at work on casper.com with the Optimizely file. Notice how the content download for the top 5 assets starts at nearly the same time.

And lastly, as mentioned earlier, self-hosting gives us more control over caching. We configured our edge servers to keep the file in the edge and browser cache for a full year. We can do this because the filename is unique to its contents (we put part of the file’s hash in the filename) and we update the reference whenever the file changes. This way, if we don’t make any changes to the Optimizely snippet, a repeat visitor’s browser won’t even make a request to casper.com for the file; it will pull it straight from the local browser cache. Super fast!
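For illustration, here is how the upload step from the Lambda sketch above could set that year-long lifetime. The exact header values and the assumption that the cache lifetime is set on the S3 object (rather than purely in the edge configuration) are ours, not a statement of our literal setup.

```typescript
// Variation of the upload step from the Lambda sketch above, with explicit
// cache headers. The values shown are illustrative, not the exact production config.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

async function uploadSnippet(bucket: string, hash: string, body: string): Promise<void> {
  await s3.send(new PutObjectCommand({
    Bucket: bucket,
    Key: `snippet-${hash}.js`, // a changed snippet always gets a brand new filename
    Body: body,
    ContentType: "application/javascript",
    // Safe to cache for a full year in the browser (and at the edge, if the edge
    // honors Cache-Control): stale contents can never be served under this
    // filename, because the reference in the HTML changes instead.
    CacheControl: "public, max-age=31536000, immutable",
  }));
}
```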

Here you can see the benefits of the file being served from the browser cache:

The downside to this approach is that website visitors won’t experience optimal caching if we modify the Optimizely snippet frequently. As our business grows, it is possible that our data analysts will run more A/B tests, requiring more frequent changes to the file. This could result in visitors needing to download multiple versions of the file during their visit to casper.com. We track each time the JavaScript file changes on a custom DataDog dashboard:

In this chart we can see a 3-hour period on Thursday the 23rd during which the snippet changed about 25 times. Even at that change frequency, it’s unlikely that a large number of visitors would download multiple versions of the snippet, because our average visit duration is not very long. Overall, we think the benefits of self-hosting outweigh the drawbacks.
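The mechanics of how those changes reach DataDog aren’t spelled out above; one plausible approach, sketched here, is for the same Lambda to post an event to the Datadog Events API whenever the hash changes. The endpoint and API-key header are standard Datadog; the event title, text, and tags are made up for illustration.

```typescript
// Hedged sketch: record a snippet change as a Datadog event so changes can be
// graphed on a dashboard. Assumes a DD_API_KEY env var and Node 18+ global fetch.
async function recordSnippetChange(hash: string): Promise<void> {
  await fetch("https://api.datadoghq.com/api/v1/events", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "DD-API-KEY": process.env.DD_API_KEY!,
    },
    body: JSON.stringify({
      title: "Optimizely snippet changed",
      text: `New self-hosted snippet deployed: snippet-${hash}.js`,
      tags: ["service:optimizely-snippet"],
    }),
  });
}
```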

This project was about a month’s worth of on-and-off work from our software engineers, product managers, site reliability engineers, and data analysts. It was a great example of some performance-minded people on the Casper Tech team identifying an issue, finding an elegant solution, shipping it to production, and making a huge impact for our customers.

By the way, we’re looking for a performance engineer to join our NYC Tech team! If you’re interested, please reach out to me on Twitter @kylerush.
