Amazon CloudFront — why more isn’t always better!

Our app, Fashionfreax, serves millions of unique outfit images. The content is highly personalized and viewing behaviour varies strongly per file: some images are viewed by a large percentage of our users, others only by a handful of people per day. Intuitively we chose Amazon CloudFront so our users could access these images at the best possible speed. Every couple of months Amazon trumps its competitors with more edge locations around the world, promising state-of-the-art latency and performance. Perfect world? Far from it!


Two problems

There are two problems with CloudFront that make it far less useful for large content distributions with a “long tail”.

  1. Number of edge locations
  2. Unpredictable cache eviction

Why is a high number of edge locations a problem?

The huge number of edge locations looks good on paper. It actually is great in many common situations where you host a relatively small number of pages, scripts, images and videos that rarely change and are used frequently by most of your userbase. In our case, a huge share of the files is only requested a couple of times per day, or even less often. The more edge locations you have, the thinner the requests are spread across them, and the lower the chance of a cache hit becomes.

All CloudFront edge locations. See http://aws.amazon.com/cloudfront/details/

Cache Eviction

On Amazon CloudFront cache evictions happen in a black box. Depending on the available space at an edge location, CloudFront frees up space by evicting some of the content. Most likely it removes the least used content first, but there might be different strategies at play that take into account more than just the time of last access. Based on the CloudFront documentation and support communication, it is a good rule of thumb to expect the chance of eviction to be higher for files that are requested very infrequently.

Let’s say you have a file that is requested about 54 times per day across 54 edge locations around the world (the number CloudFront currently has, see here). Some edge locations will then receive only one request, or none at all, on a given day, which increases the chance that the image in question gets evicted there.
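To make this effect tangible, here is a minimal simulation in Python. It assumes, for simplicity, that each request lands on a uniformly random edge location (real per-edge traffic shares are uneven) and reuses the 54-requests-per-day figure from the example above:

```python
import random

REQUESTS_PER_DAY = 54  # global requests for one long-tail file (figure from the text)

def share_of_cold_edges(num_edges, days=1000):
    """Fraction of edge locations that see at most one request for this file
    on a given day, assuming each request lands on a uniformly random edge
    (a simplification; real traffic shares per edge are uneven)."""
    cold = 0.0
    for _ in range(days):
        hits = [0] * num_edges
        for _ in range(REQUESTS_PER_DAY):
            hits[random.randrange(num_edges)] += 1
        cold += sum(1 for h in hits if h <= 1) / num_edges
    return cold / days

for n in (54, 20, 10):
    print(f"{n:>2} edges: ~{share_of_cold_edges(n):.0%} of edges see <= 1 request/day")
```

The fewer edge locations you enable, the smaller the share of “cold” edges that barely see the file and are likely to evict it.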

Example of evictions in a highly stressed edge location (Miami). As you can see, depending on the time of day, the file gets evicted from the cache within a couple of hours.

There is no direct way to prevent evictions like this from happening. Cache headers on the files have no impact, and there is no option to pay more for a bigger slice of the CloudFront pie.

Solutions

Tradeoff: optimizing average latency versus hit rate. Increasing the number of edge locations usually lowers the latency to the nearest edge, but decreases the hit rate.

There are multiple solutions to this, most of which require switching to other CDN services.

Reducing the number of edge locations

This is not really possible with CloudFront. There you can only select price classes, e.g. US and Europe but not Asia, which does not help in this case. To be able to trade off edge locations (low latency vs. hit rate), you’ll want to select the maximum number of edge locations that still satisfies your target hit rate (e.g. 90%) and is spread out optimally across your userbase’s territories. Funnelling requests to fewer edge locations then decreases the chance of eviction for each file.

There are not many CDNs that actually support this. One of the few is CDN77, which gives you fine-grained control: you can enable and disable each of their currently 32 edge locations around the world. We determined that our sweet spot for long-tail content is at about 10 edge locations. Compared with the previous 54 edge locations, each edge now sees roughly five times as many requests for sporadically requested files, which reduces the chance of eviction significantly.
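The sweet spot can be approximated with a back-of-the-envelope calculation. The keep-warm threshold below (about 5 requests per day per edge before a file reliably stays cached) is an assumption for illustration, not a documented CloudFront or CDN77 figure:

```python
def max_edge_locations(requests_per_day: float, keep_warm_per_edge: float) -> int:
    """Largest edge count at which every enabled edge still sees at least
    `keep_warm_per_edge` requests/day for a typical long-tail file, assuming
    requests spread roughly evenly across the enabled edges."""
    return max(1, int(requests_per_day // keep_warm_per_edge))

# ~54 requests/day for a typical long-tail file (figure from above), and an
# assumed ~5 requests/day needed per edge to keep the file cached there:
print(max_edge_locations(54, 5))  # -> 10
```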

We achieved a hit rate increase from 50% to 87% for our long-tail content. This means that most of these images display in under 100ms, and only 13% need to be fetched from the origin server, which usually takes around 300ms in total, depending on where on the globe the request comes from.
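In terms of average load time, a rough expected-value calculation with the approximate figures above looks like this:

```python
def expected_latency_ms(hit_rate: float, hit_ms: float = 100, miss_ms: float = 300) -> float:
    """Average image load time given a cache hit rate and approximate hit/miss latencies."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms

print(expected_latency_ms(0.50))  # ~200 ms average with the old ~50% hit rate
print(expected_latency_ms(0.87))  # ~126 ms average with the new 87% hit rate
```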

But should I really ignore most of the edge locations out there?

Not really. You should be aware of your data access patterns. It is possible to spread different files across different CDN distributions tailored to different access patterns.

Create one CDN distribution with the maximum number of edge locations for your most used files, and a second, less spread-out distribution for your long-tail files.

This saves cost and optimizes performance for your users. A potential setup would be to serve the static files most users need (e.g. website scripts, UI images) and the most-used user-generated content from a Tier 1 CDN distribution with the maximum number of edge locations, and to serve long-tail, less frequently used content from a Tier 2 CDN distribution with a reduced number of edge locations.
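A minimal sketch of such a routing rule in application code; the hostnames, the 7-day request counter and the popularity threshold are made-up placeholders, and the CDNs themselves are only addressed through the URLs you hand out:

```python
# Hypothetical CDN hostnames for the two distributions described above.
TIER1_HOST = "https://popular-cdn.example.com"   # maximum number of edge locations
TIER2_HOST = "https://longtail-cdn.example.com"  # reduced number of edge locations

def image_url(path: str, requests_last_7_days: int, popular_threshold: int = 500) -> str:
    """Serve frequently requested images from the Tier 1 distribution and
    long-tail images from the Tier 2 distribution (threshold is an assumption)."""
    host = TIER1_HOST if requests_last_7_days >= popular_threshold else TIER2_HOST
    return f"{host}/{path.lstrip('/')}"

print(image_url("outfits/12345.jpg", requests_last_7_days=42))        # -> Tier 2 URL
print(image_url("outfits/trending.jpg", requests_last_7_days=9000))   # -> Tier 1 URL
```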

This might seem like a lot of work, but content delivery is often more complex than one thinks, and if your user experience is highly affected by content loading performance you might want to look into this.


On a sidenote: CloudFront, quo vadis?

While Amazon introduces new services to AWS seemingly every week, they seem to forget about moving their existing services forward. More granular options to select edge locations and (off-topic, but still important) support for things like HTTP/2 would be great (and in my opinion overdue) additions to the service.

Update: Amazon has finally acknowledged that it is working on HTTP/2 support for CloudFront and plans to release it in 2016. See the AWS dev forum post.


Check out Fashionfreax, a thriving platform for fashion, community and shopping. We are hiring! http://www.fashionfreax.net/careers