Lambda@Edge and S3 Landing Page Caching for Performance and Scale
By: Regis Wilson
At TrueCar, we spend a lot of time making sure our pages look attractive and display information cleanly and clearly. Most of our engineering time is spent making sure our web pages have the most useful content available, look great and are intuitive to use. We have a solid development platform and cloud infrastructure that delivers pages quickly and effectively. However, there are some glaring spots in our site where pages render slowly or perform less than optimally on mobile connections.
We are working with a new set of initiatives toward creating useful and beautiful landing pages, with great consumer information for browsing and comparing cars and trucks. These pages will help consumers find more information about the cars and trucks they are interested in. Some of these pages have required gathering larger amounts of data than we had previously worked with in building our site. Some of these new pages, those that incorporate data in the form of reviews and long lists of options, are accessed relatively infrequently, while still containing important information for users. All of this meant that our page load times, in some cases, were longer than a few seconds.
Given our goals of attracting more user traffic and serving the pages quickly (although perhaps infrequently), we needed a way to improve landing page times in order to get them to display as quickly as possible.
If you would like to skip the discussion of this process and simply jump to the technical and results sections, move forward two sections.
Describing the Problem and Forming the Goal
We started off measuring some of our more important pages, those to which we wanted to attract organic traffic. Organic traffic in this case means traffic coming to us from search engines or pages visited during browsing and researching our site. The first page we launched was the Model Overview, which showcases models of cars, including useful information consumers might want to use in order to find out if a certain model of vehicle is right for them. The second page we launched was the Model Comparison. These pages allow users to compare two models side-by-side, with every option laid out next to each other for comparison.
We discovered that our pages varied wildly in terms of performance. Additionally, our previous targets for page performance were inadequate for slower and mobile connections. After dissecting our page load times, we found a few opportunities to target the initial load times for pages, allowing us to get a head start on performance optimizations for our pages and hopefully see some noticeable improvement in our consumer metrics.
During our migration to the cloud over the last two years, we set a target of building and designing pages with a Speed Index of 2000. According to WebPagetest, “The Speed Index is the average time at which visible parts of the page are displayed. It is expressed in milliseconds and dependent on size [sic] of the view port. The Speed Index metric was added to WebPagetest in April, 2012 and measures how quickly the page contents are visually populated (where lower numbers are better).”
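To make the quoted definition concrete, here is a minimal sketch of how a Speed Index could be computed from visual-completeness samples. The sample values are made up for illustration, and holding completeness constant between samples is an assumption of this sketch; WebPagetest derives the real figure from video frames of the page load.

```python
def speed_index(samples):
    """Approximate Speed Index (ms) from (time_ms, visual_completeness) samples.

    visual_completeness is a fraction from 0.0 to 1.0. Between two samples the
    page is assumed to stay at the earlier sample's completeness.
    """
    ordered = sorted(samples)
    total = 0.0
    for (t0, done0), (t1, _) in zip(ordered, ordered[1:]):
        # Area above the completeness curve: time spent looking at unfinished content.
        total += (1.0 - done0) * (t1 - t0)
    return total


# Hypothetical page: blank until 500 ms, half painted until 1000 ms, then complete.
frames = [(0, 0.0), (500, 0.5), (1000, 1.0)]
print(speed_index(frames))  # 750.0
```

By the same arithmetic, a page that stays blank and then paints everything at 2000 ms scores 2000, which is why lower numbers are better.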
Where did our target number of 2000 come from? Easy: our legacy datacenter site regularly measured pages with a Speed Index of 4000, so we simply halved that figure to come up with our goal! (We have since discovered that highly optimized sites regularly target Speed Index scores of less than 1000, so in hindsight, perhaps 2000 was not such a lofty goal after all.)
We were either very lucky or very talented, or both, because we found that most of our newly built site pages already clocked in with a Speed Index score between 1000 and 2000, exactly where we thought we should be. We believe this was due to the anchoring effect, where initial estimates or “anchors” can result in decisions that are close to the anchor.
While our initial goal may not have been highly ambitious, it was important for us not to create an unachievable goal that might cause us to expend vast amounts of energy and money chasing a fruitless and impossible dream. In the end, we reassessed our goal and decided that we would aim for achieving a world-class and best-of-breed Speed Index of less than 1000.
It was also crucial to measure any business impact we expected or could foresee with risks or rewards for having a low Speed Index. Anecdotally, decreases in page load times have been correlated with better business outcomes. At TrueCar, we believe that business outcomes will be measured over time, so we elected to leave that off the table for now. If we do note any measurements for improvements, we’ll try to share as much as we can.
The Lowest Fruit Is the Best Fruit
When looking at options for improvement, it is common to ask the question, “What is the low-hanging fruit?” The assumption is that the lowest and (hopefully) easiest to pick fruit will be sweet and delicious. However, that’s not necessarily true. Fruit may fall off the tree and lie at the lowest point possible — directly on the ground. There, it can come into contact with insects, animals, gastropods, bacteria and dirt. So the fruit that is on the ground certainly may not be better and may in fact be significantly worse than the fruit that is just a few meters up the tree. Excluding the fallen or rotten fruit, then, is it necessarily true that the lower a fruit is, the better it is to eat? Are strawberries always better than apples or cherries? [Personally, I think mangoes are too far down that list and apples are too high, but that’s not important].
Suppose that all the fruit within easy reach has been picked in some previous harvest. Now, the fruit that is higher up might be more ripe and more plentiful, and it may benefit from exposure to more sunlight. It may be more tasty and rewarding to grab the higher fruit, as elephants have learned to do. While the coconut is a dry drupe, we can call it a fruit for the purposes of our discussion. But the coconut tree has no low-hanging fruit unless you wait for it to fall, and an experienced climber can easily accomplish the task of dropping several water and soft meat-filled “nuts”. If coconuts are fruit (which they loosely are), then we can never get any low hanging coconuts unless one falls down. And of course, if we restrict ourselves to only the lowest-hanging fruit, we will never be able to compare it with the upper fruit. How will we know what the high-hanging fruit is even like if we never try it? It could be the best fruit of all time, but we’d never find out. Clearly, we need a better formulation for this tired saying.
Using this new insight, we actively sought out the best-tasting, most juicy fruit in our landing pages, regardless of how high or low each was. That is, we allowed each fruit to exist as an individual, expressing the value of its character rather than being based on the height at which it is most easily harvested.
We have implemented the Sitespeed.io page testing tool to measure our landing pages and help us get an accurate view of how the page performs at various points of the browser loading cycle. We also get detailed reports, including a page HAR waterfall, for the test runs, which we can analyze to look for improvement spots.
Looking at a breakdown of the page timings from one landing page, we identified a fair amount of room for improvement in the Server Response Time metric (which is in milliseconds, so 1000 ms equals 1 second). This metric closely resembles the Speed Index, as discussed above. Taking more than a full second to render a page for the end user is simply too long, and we knew we could do better. There was a big gap between Page Download Time (or more familiarly, Time to First Byte) and Server Response Time (including Backend Time), so that’s where we began harvesting first.
Measure Twice, Cut Once
The solution we proposed was one of the winning ideas from the TrueCar 2017 hackathon. In the hackathon project, we demonstrated that a web crawler could call the frontend web server, grab the HTML documents and write them to an Amazon Simple Storage Service (S3) bucket. We could then use Lambda at the Edge (Lambda@Edge) running in CloudFront to intercept requests and route them to a cached version, if one existed.
Going into a bit more detail, the phases are as follows:
- The first step is to generate and upload a list of landing page permutations we want to cache during the build process. When the next version of the software is deployed, we will write this list of URLs to a cache bucket.
- When the URL list file(s) are written to S3, they will generate an event trigger for Lambda, which starts the next phase.
- During the batch phase, the Lambda code takes the input file from S3 and splits the file into individual crawler messages. This step takes what could be a list of thousands of URLs and writes them individually in batches to a Kinesis stream.
- Kinesis can “fan out” the requests by starting up to one Lambda for each Kinesis shard. Each Kinesis batch from a shard can contain from 1 to 100 URLs to process.
- We can manipulate the batch size, shard count, threading in the code and Lambda concurrency limits to expand or restrict concurrency for the next phase. We do not want too many concurrent sessions to choke our production or QA servers.
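The batch phase above can be sketched roughly as follows. This is an illustration under assumptions, not our production code: the function names, the JSON message shape and the use of a SHA-256 digest as the partition key are all hypothetical. The only fixed number is the Kinesis PutRecords limit of 500 records per call.

```python
import hashlib
import json

PUT_RECORDS_LIMIT = 500  # Kinesis PutRecords accepts at most 500 records per call


def parse_url_list(text):
    """Return the non-empty, stripped lines of a URL list file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def to_record(url):
    """Build one crawler message shaped like a PutRecords entry (hypothetical payload)."""
    return {
        "Data": json.dumps({"url": url}).encode("utf-8"),
        # Hashing the URL keeps the key short and spreads messages across shards.
        "PartitionKey": hashlib.sha256(url.encode("utf-8")).hexdigest(),
    }


def batch_records(urls, limit=PUT_RECORDS_LIMIT):
    """Group one record per URL into lists small enough for a single PutRecords call."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    records = [to_record(url) for url in urls]
    return [records[i:i + limit] for i in range(0, len(records), limit)]


listing = "https://example.com/a\n\nhttps://example.com/b\n"
batches = batch_records(parse_url_list(listing), limit=1)
print(len(batches))  # 2
```

In a real Lambda, each batch would be handed to the Kinesis client, for example with boto3's `put_records(StreamName=..., Records=batch)`. The number of records delivered to each crawler invocation is set separately, on the Lambda event source mapping.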
Crawl and Cache
- During the crawl and cache phase, a Lambda function is triggered when batches from the previous phase arrive at the head of the Kinesis stream. The Lambda function then takes that batch of URLs and downloads each one.
- After the contents of the URL(s) are downloaded, the Lambda code compresses and writes a copy to S3, making sure to update the metadata to specify cache maximum age, content encoding and so on.
- The last step of this phase is to update a corresponding rule in a DynamoDB table that controls the routing for CloudFront. This database entry will control whether a visitor sees the cached version from S3 or the dynamic version from our origin servers.
- During the serve phase, a request from a browser is received by an edge node in CloudFront.
- The request becomes an event that triggers Lambda@Edge, which does a lookup in the DynamoDB table for a matching rule.
- If a rule entry in the DynamoDB table matches, Lambda@Edge then rewrites the request so that CloudFront routes the request to a cached copy in S3.
- If a rule entry does not match, the request is passed along unchanged to the origin server for a dynamic response.
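The cache write and the routing decision described above might look something like the sketch below. Everything named here is an assumption for illustration: the bucket domain, the key layout, the one-hour maximum age, and the in-memory `RULES` dictionary, which stands in for the DynamoDB lookup. The request and origin fields follow the documented Lambda@Edge origin-request event structure.

```python
import gzip

CACHE_BUCKET_DOMAIN = "example-cache-bucket.s3.amazonaws.com"  # hypothetical bucket
RULES = {}  # stand-in for the DynamoDB rules table: request URI -> cached S3 key


def build_cache_object(bucket, key, html, max_age=3600):
    """Return keyword arguments suitable for an S3 put_object call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": gzip.compress(html.encode("utf-8")),
        "ContentType": "text/html; charset=utf-8",
        "ContentEncoding": "gzip",  # tells browsers the body is compressed
        "CacheControl": f"max-age={max_age}",
    }


def route_request(request, rules):
    """Point the request at the S3 copy when a rule matches; otherwise leave it alone."""
    cached_key = rules.get(request["uri"])
    if cached_key is None:
        return request  # no rule: the origin servers render the page dynamically
    request["uri"] = cached_key
    request["origin"] = {
        "s3": {
            "domainName": CACHE_BUCKET_DOMAIN,
            "region": "",
            "authMethod": "none",
            "path": "",
            "customHeaders": {},
        }
    }
    # The Host header has to match the new origin's domain name.
    request["headers"]["host"] = [{"key": "host", "value": CACHE_BUCKET_DOMAIN}]
    return request


def handler(event, context):
    """Lambda@Edge origin-request entry point."""
    return route_request(event["Records"][0]["cf"]["request"], RULES)
```

Keeping the rule lookup behind a plain mapping makes the routing logic easy to test without AWS; a deployed function would query the rules table instead of reading `RULES`.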
But Does It Work?
The next step was to check whether our work paid off. Could we show that the lush fruit that we picked was tasty, delicious and nutritious? We carefully examined response time metrics and graphs after rolling out this process in our QA environment. We found some very promising results. See if you can spot the change in load times plotted below. Each colored line traces one phase of the page load, and the stacked total shows the completed page load time. Pink shaded vertical markers show when a new deploy occurs.
We noticed that page load times for Time to First Byte-like metrics significantly improved, from about 2 seconds to less than 1 second, as is dramatically shown below. The light blue cone facing right is a forecasted region where the algorithm predicts what load times are likely to be in the future.
We noticed that the Speed Index for some pages dropped from our “acceptable” levels of 1000 (or about 1 second) by nearly two-thirds to well below 500 (or less than half a second). These kinds of results show that there was a lot of room for improvement in our initial download and display phases. The gray charted area below is the prediction algorithm trying to figure out where the expected range of values will be.
We also noticed the same results for First Paint as for our Speed Index results (which makes sense since they are similar metrics). They say that a picture is worth a thousand words, but sometimes numbers are worth a few words as well. Here you can see dramatic improvements at each stage of the page loading process.
Caching the web pages was a fun and entertaining exercise. It was also fruitful in that it helped us improve some key metrics for page load times. We were able to improve Speed Index metrics by approximately 50%, a decent improvement that should result in some demonstrable benefits. However, further analysis and hard work are required to improve higher-order metrics and make the site more interactive earlier in the user’s experience. Page load time is the first (important) step on the way to making our pages intensely fast and interactive.