Caching the long tail on AWS with Amazon S3 and CloudFront Origin Failover

Niels Laukens
VRT Digital Products
7 min read · Sep 9, 2019

Uploading images to the VRT websites is a frequent and necessary job for all the editors providing content for them, and serving those images in a variety of formats matters to our end users. There is a system in place to do all this, but it could use a renovation. So that’s what we set out to do: rebuild the part of the platform where images are dynamically resized, using a Cloud-Native “Serverless” approach.

We decided to run a PoC to see whether this technology could deliver on three big unknowns that we defined going into the experiment. Would we be able to deliver images larger than the maximum allowed response size of the Serverless platform? How could we ensure that images are only resized once, and reused for all following requests for that specific size? And finally, would resizing even be possible using our current software in a Serverless environment? We could not tackle this last question within the scope of our PoC, but we did manage the first two. Check out how we went about this cachy challenge.

A bucket (a regular one, not an S3 one) with a (regular) cloud front in the background (Photo by Gregory Culmer on Unsplash)

A cache is a component that stores data in order to serve future requests for that data more efficiently. This usually means “faster”, but could also mean “cheaper”. Caches come in a whole range of scales, usually with a different size/speed tradeoff. There is the super fast but small L1-cache, which typically delivers data within the CPU in under 1 nanosecond (a billionth of a second) but only stores a few tens of kilobytes. But you could also say that the Docker Registry is a cache: it stores data (Docker images) in order to avoid having to rebuild them from their Dockerfiles. The official Docker Hub stores many millions of images, totalling many terabytes of data, but isn’t particularly fast, at least not compared to the L1-cache example.

Most caches are limited in size and have a mechanism to optimise which items they store. They can evict the items that were least recently used (LRU) or least frequently used (LFU), among other strategies. Sometimes you can influence the cache’s behaviour, e.g. by specifying the maximum age of a cached item, or by explicitly asking the cache to flush a particular item or set of items.
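As a toy illustration of LRU eviction (unrelated to any AWS service; the class and variable names are made up), a small cache fits in a few lines of TypeScript. JavaScript’s Map remembers insertion order, so re-inserting a key on every access keeps the least recently used entry at the front, ready to be evicted:

```typescript
// Toy LRU cache: Map preserves insertion order, so the first key is always
// the least recently used one, provided we re-insert keys on every access.
class LruCache<K, V> {
  private entries = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      this.entries.delete(key);     // move the key to the "most recent" end
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry: the first key in the Map.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}
```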

When applied to web technologies in an Amazon Web Services context, this could translate to CloudFront — AWS’s CDN — as a caching layer. HTTP standardised a way to control cache behaviour by specifying Cache-Control headers. These mark a particular object as cacheable or not, and specify how long the item may be cached. Note that a cache is allowed, not required, to cache the object. This is particularly true with CloudFront: you can’t rely on your content staying cached, and less popular content in particular may be evicted early.
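For example, a Cache-Control header can be attached to an object when it is uploaded to S3, and CloudFront will honour it on the way out. A minimal sketch using the AWS SDK for JavaScript v3 (the bucket, key and file names are made up):

```typescript
import { readFile } from 'node:fs/promises';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({});

// The Cache-Control header is stored with the object, returned by S3 on
// every GET, and respected by CloudFront and browsers. Caches *may* keep
// the object for up to a day, but are never obliged to.
async function uploadCacheable(): Promise<void> {
  const body = await readFile('hero-1280x720.jpg');
  await s3.send(new PutObjectCommand({
    Bucket: 'example-image-bucket',          // made-up bucket name
    Key: 'articles/hero-1280x720.jpg',
    Body: body,
    ContentType: 'image/jpeg',
    CacheControl: 'public, max-age=86400',   // cacheable for 24 hours
  }));
}
```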

In most web-contexts, you will notice that some objects are requested very frequently, while other objects are less “popular”. When pouring the data into a graph, it usually looks similar to this:

Graph of request count vs ranked object, showing a typical power-law probability distribution

Every bar represents a particular object. The height of the bar indicates how many times this object was requested. The objects are sorted from most to least popular. As the size of your cache grows, you can fit in more and more objects. A good cache will learn which objects are popular and thus most useful to store: it will cache the left part of the graph. But you will always cut off the “tail” of the curve. And while these objects are less popular, there is still a lot of volume, and thus opportunity, in this long tail. The graph above only shows the 100 most popular objects (0.03% of all objects), yet they account for 15% of the total number of requests.
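To make that concrete: given per-object request counts from an access log, the share of total traffic covered by the N most popular objects can be computed with a hypothetical helper like this (not part of our actual pipeline):

```typescript
// Returns the fraction of all requests served by the topN most popular objects.
function headShare(requestCounts: number[], topN: number): number {
  const sorted = [...requestCounts].sort((a, b) => b - a);   // most popular first
  const total = sorted.reduce((sum, count) => sum + count, 0);
  const head = sorted.slice(0, topN).reduce((sum, count) => sum + count, 0);
  return head / total;   // e.g. ~0.15 for the top 100 objects in our data
}
```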

Image resizing

When serving images to a wide range of platforms (a responsive website, native apps on tablet and smartphone), it is usually desirable to serve an appropriately sized image: while you want to show the 1280×720 pixel version in an article on the desktop website, you only want to send a 64×64 pixel thumbnail for the overview screen of the smartphone app.

While image resizing isn’t particularly computationally intensive, you don’t want to resize the image for every single request. Amazon has an AWS Solution to cover exactly this need: it provides dynamic resizing via Sharp.js, with CloudFront as cache.
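The resizing step itself boils down to a few lines of sharp; the snippet below is a simplified sketch of that step (the real AWS Solution does considerably more, such as format conversion and smart cropping):

```typescript
import sharp from 'sharp';

// Simplified resize step: take the original image bytes and produce a JPEG
// scaled (and cropped) to the requested dimensions.
async function resizeImage(original: Buffer, width: number, height: number): Promise<Buffer> {
  return sharp(original)
    .resize(width, height, { fit: 'cover' })   // crop to fill the exact size
    .jpeg({ quality: 80 })
    .toBuffer();
}
```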

The AWS Solution works great, but it has several limitations:

  • AWS Lambda has a 6MB limit for its return data. Since the body itself is base64-encoded in the return data, this limits the actual image to slightly less than 4.5MB. The AWS Solution code currently has no handling for the case where the image is larger than 4.5MB, and simply times out.
  • Caching is done by CloudFront, meaning less popular content will not be served from cache. On our production system, we attain a very respectable cache hit ratio of around 97%, but that would still leave tens of resizes per second to be rendered dynamically.

Long tail caching

In order to cache more content, we added an additional layer of cache to the design: in case CloudFront does not have the requested image in its cache, a second cache is consulted. Only if this second, larger cache does not have the requested image is it resized on demand. This second, larger cache is (at the moment) never flushed, so it keeps all resized images at hand. Additional logic could be added to manage this cache according to the business needs.

To implement this additional caching layer, we used a relatively new CloudFront feature called Origin Failover. It is intended for failover scenarios, but it also fits our caching needs surprisingly well:

We configured an S3 bucket as the primary origin. Either S3 will return the object, or it will respond with a 403/404 status code (S3 returns 403 rather than 404 for a missing object when the caller is not allowed to list the bucket). CloudFront is configured to consider this Not Found condition a failure and to fall back to the backup origin, which is the resizing Lambda (behind API Gateway, as usual).
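Expressed with today’s CDK constructs, the origin group looks roughly like this (a sketch of the idea, not our exact stack; the bucket and the API Gateway domain are placeholders):

```typescript
import { Stack } from 'aws-cdk-lib';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';
import * as s3 from 'aws-cdk-lib/aws-s3';

declare const stack: Stack;               // an existing stack
declare const resizedBucket: s3.Bucket;   // the S3 "cache" bucket

// Origin group: try the S3 cache bucket first; on a 403/404 miss,
// CloudFront retries the request against the resizing API (API Gateway + Lambda).
const origin = new origins.OriginGroup({
  primaryOrigin: new origins.S3Origin(resizedBucket),
  fallbackOrigin: new origins.HttpOrigin('abc123.execute-api.eu-west-1.amazonaws.com'), // placeholder
  fallbackStatusCodes: [403, 404],
});

new cloudfront.Distribution(stack, 'ImageCdn', {
  defaultBehavior: { origin },
});
```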

This solves most of the described problem: requests are primarily served from CloudFront’s own cache. If CloudFront doesn’t have the object in its cache, it requests the object from S3. If the S3 “cache” misses as well, the image is dynamically resized. The resized image is both stored in the S3 bucket for future requests and returned to CloudFront as the response to the current request.
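A stripped-down sketch of that backup origin is shown below. The bucket names and the URL scheme are made up, resizeImage() is the sharp helper sketched earlier, and error handling is omitted:

```typescript
import { S3Client, GetObjectCommand, PutObjectCommand } from '@aws-sdk/client-s3';
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';

// The sharp-based helper sketched earlier.
declare function resizeImage(original: Buffer, width: number, height: number): Promise<Buffer>;

const s3 = new S3Client({});
const ORIGINALS_BUCKET = 'example-originals';   // source images (placeholder)
const RESIZED_BUCKET = 'example-resized';       // the S3 "cache" bucket (placeholder)

export async function handler(event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> {
  // Hypothetical URL scheme: "/images/1280x720/article-42.jpg"
  const [, , size, key] = event.path.split('/');
  const [width, height] = size.split('x').map(Number);

  // 1. Fetch the original image.
  const original = await s3.send(new GetObjectCommand({ Bucket: ORIGINALS_BUCKET, Key: key }));
  const resized = await resizeImage(
    Buffer.from(await original.Body!.transformToByteArray()), width, height);

  // 2. Store the result in the S3 cache bucket, so the next CloudFront miss
  //    is answered by the primary origin instead of this Lambda.
  await s3.send(new PutObjectCommand({
    Bucket: RESIZED_BUCKET,
    Key: `${size}/${key}`,
    Body: resized,
    ContentType: 'image/jpeg',
  }));

  // 3. Return the image for the current request. The body is base64-encoded,
  //    so this only works for images well below the ~6MB response limit.
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'image/jpeg' },
    body: resized.toString('base64'),
    isBase64Encoded: true,
  };
}
```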

Caveats

It wouldn’t be much fun if it were that easy, would it? Of course there are caveats!

There is still the 6MB limit imposed on our Lambda response. One way to work around that is to store the output image on S3 and issue a redirect to this new object. This can be done for all responses, or only for the responses that would overflow the 6MB limit. Although unconventional, you could even issue a redirect to the same URL: this causes the client to effectively restart its request, and CloudFront will retry fetching the object from S3. But…

One of the things that will bite is the S3 consistency model. In the above scenario, the request to S3 will probably not return the freshly stored object! The gotcha is documented:

Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.

Since CloudFront did an initial GET-request for the object (which, at the time, did not exist), subsequent GET-requests are not guaranteed to return the freshly PUT object.

To overcome this issue, we store (large) resized images in two buckets: the final location (with eventual consistency), and under a deterministic but unpredictable name in a temporary bucket. The issued redirect points to this temporary bucket. Since the name is unpredictable, it will not have been accessed yet, and a subsequent GET is strongly consistent and will return the new object. This temporary bucket has a lifecycle rule to delete objects after 1 day.
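In code, the large-image path could look roughly like the sketch below. The bucket names, the salt and the size threshold are placeholders; the hashing scheme is just one way to obtain a name that is deterministic yet unpredictable:

```typescript
import { createHash } from 'node:crypto';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import type { APIGatewayProxyResult } from 'aws-lambda';

const s3 = new S3Client({});
const RESIZED_BUCKET = 'example-resized';       // final location (placeholder)
const TEMP_BUCKET = 'example-resized-tmp';      // has a 1-day lifecycle rule (placeholder)
const SALT = process.env.TEMP_KEY_SALT ?? 'example-secret';  // keeps temp names unguessable

// ~6MB of base64 corresponds to roughly 4.5MB of raw image bytes.
const MAX_INLINE_BYTES = 4.5 * 1024 * 1024;

async function respondWithImage(key: string, image: Buffer): Promise<APIGatewayProxyResult> {
  // Always store the image at its final location for future requests.
  await s3.send(new PutObjectCommand({
    Bucket: RESIZED_BUCKET, Key: key, Body: image, ContentType: 'image/jpeg',
  }));

  if (image.length <= MAX_INLINE_BYTES) {
    // Small enough: return it inline, base64-encoded.
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'image/jpeg' },
      body: image.toString('base64'),
      isBase64Encoded: true,
    };
  }

  // Too large to return inline: also store it under a deterministic but
  // unpredictable name that has never been GET'd (so the read-after-write
  // is strongly consistent) and redirect the client there.
  const tempKey = `${createHash('sha256').update(SALT).update(key).digest('hex')}.jpg`;
  await s3.send(new PutObjectCommand({
    Bucket: TEMP_BUCKET, Key: tempKey, Body: image, ContentType: 'image/jpeg',
  }));

  return {
    statusCode: 307,
    headers: { Location: `https://${TEMP_BUCKET}.s3.amazonaws.com/${tempKey}` }, // placeholder URL style
    body: '',
  };
}
```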

Note that CloudFront is a distributed cache, so it is likely for multiple CloudFront edge locations to (almost) simultaneously discover that the object is not (yet) in their local cache, and (almost) simultaneously find out that it isn’t available on S3 either, and request a dynamic resize. This could lead to an initial burst of identical resize requests. The consistency-issue of S3 worsens this situation. This is currently accepted as a known issue.

Conclusion

Diagram showing the described architecture

S3 can be (ab)used as an additional cache layer augmenting CloudFront’s own cache. This can be useful to reduce the cost, in time, money or both, of the dynamic Lambda calls. There are some caveats regarding S3’s consistency model to look out for, and additional logic may be required to evict objects from this cache layer. The whole “try the cache first and fall back to dynamic” logic is integrated into CloudFront’s Origin Failover feature.
