How Amazon CloudFront makes managing high-load websites easier

At Elements we always try to get the best possible performance. Marc, backend developer at the Elements Barcelona office, recently investigated different website caching options and wrote this article about Amazon CloudFront.

A classic web setup handles HTTP requests with a web server (NGINX in this example) that forwards them to a backend system, which does the necessary processing and queries the database to build a response.

This architecture quickly becomes inefficient once our website receives a few thousand visits per minute. Since we are talking about a website built with a CMS, all the HTML is generated from information stored in a database, and the database becomes the bottleneck. Very often, a standard relational database can’t handle this number of transactions.

A sensible first step is to add a cache system like Memcached or Redis that caches the database responses, or even the whole backend response. However, the request still needs to pass through a load balancer (if several machines handle the backend of the website), a web server and finally the backend before it reaches the caching system.

The website’s performance would improve substantially with this change, but we can do better. Modern web servers like NGINX can store responses directly in the file system, or even query a Memcached server to retrieve cached items. This opens up two valid options to reduce our response times even further.

Option 1: Store responses during a defined time period

We can make the NGINX server store the response directly in a public cache for one hour. This limits the requests that reach the database to one per hour for each cached page; all other requests are served directly from the cache.

With this option we get a fast, efficient and reliable cache system that can handle a lot of requests. The web server absorbs the load, and the database is only hit once per hour for each page.
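To make the "once per hour" behaviour concrete, here is a minimal Python sketch of a time-based cache. It is not how NGINX implements its cache; the names TTLCache, FakeClock and render_page are hypothetical and only stand in for the web server cache and the CMS backend.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keeps a rendered page for a fixed time, in the spirit of Option 1."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # url -> (expires_at, body)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: the backend is not touched
        body = render(url)           # missing or expired: one trip to the backend
        self._entries[url] = (now + self.ttl_seconds, body)
        return body


class FakeClock:
    """Manually advanced clock so the example does not have to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


backend_hits = 0


def render_page(url: str) -> str:
    """Hypothetical stand-in for the CMS backend plus its database queries."""
    global backend_hits
    backend_hits += 1
    return f"<html>content of {url}</html>"


clock = FakeClock()
cache = TTLCache(ttl_seconds=3600, clock=clock)

for _ in range(1000):                # 1000 visits within the same hour
    cache.get_or_render("/home", render_page)
hits_first_hour = backend_hits       # the backend rendered the page once

clock.now += 3601                    # a bit more than an hour later
cache.get_or_render("/home", render_page)
hits_after_expiry = backend_hits     # the expired entry caused a second render
```

The trade-off is visible in the sketch: a change published in the CMS can stay invisible for up to an hour, until the cached entry expires.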

Option 2: Permanent cache, set and invalidated by the backend and read by the HTTP server

The NGINX web server can load cached responses from a Memcached system, and our website’s backend can put them there. In the same way, the backend can invalidate the cached content every time a change or a new page is published from the CMS.

This option is the best in terms of performance, but it has an important drawback: implementation cost. It also requires the backend, NGINX and the cache system to stay perfectly in sync.
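The division of work in Option 2 can be sketched in a few lines of Python. This is only an illustration under assumptions: PageCache is an in-memory stand-in for Memcached, and Backend and handle_request are hypothetical names for the CMS backend and the web server lookup.

```python
from typing import Dict, Optional


class PageCache:
    """In-memory stand-in for Memcached: maps a URL path to rendered HTML."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def set(self, key: str, value: str) -> None:
        self._store[key] = value

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


class Backend:
    """Hypothetical CMS backend that owns the content and keeps the cache in sync."""

    def __init__(self, cache: PageCache) -> None:
        self.cache = cache
        self.pages: Dict[str, str] = {}
        self.render_count = 0

    def render(self, path: str) -> str:
        self.render_count += 1
        html = f"<html>{self.pages[path]}</html>"
        self.cache.set(path, html)   # the backend fills the cache, with no expiry
        return html

    def publish(self, path: str, content: str) -> None:
        self.pages[path] = content
        self.cache.delete(path)      # invalidate so the old HTML is never served


def handle_request(path: str, cache: PageCache, backend: Backend) -> str:
    """What the web server does: try the cache first, fall back to the backend."""
    cached = cache.get(path)
    if cached is not None:
        return cached
    return backend.render(path)


cache = PageCache()
backend = Backend(cache)

backend.publish("/news", "first version")
first = handle_request("/news", cache, backend)    # rendered by the backend
second = handle_request("/news", cache, backend)   # served from the cache
renders_before_update = backend.render_count

backend.publish("/news", "second version")         # an editor updates the page
third = handle_request("/news", cache, backend)    # fresh content right away
```

The sync requirement shows up in publish: if any code path changes content without deleting the cached entry, visitors keep seeing the old page indefinitely.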

Again we have an excellent solution that performs very well and lets us handle a significant number of HTTP requests with a simple architecture. But sometimes this is not enough, so let’s think about a bigger scenario. What happens if our CMS is deployed on several machines behind a load balancer to spread the load? It’s pretty efficient, but every HTTP request has to go through the load balancer before reaching NGINX on one of our machines.

This is where Amazon Web Services offers a tool for developers who are obsessed with website loading optimization, like ourselves! The tool is called CloudFront. The most common use of this service is as a CDN (Content Delivery Network) that serves static files (images, CSS, JavaScript files, etc.) from different locations and keeps them in cache, so loading these files is much more efficient. But the service can also host web distributions: we point our DNS at the web distribution, which in turn sits in front of the load balancer that distributes the HTTP requests between our machines.

That is, the first request passes through to the load balancer, the servers generate the page from the database information, and while it is returned to the user it is also cached in CloudFront. The next time a user requests the same page, CloudFront delivers it directly. Additionally, CloudFront delivers it from different points around the world, providing low latency wherever the user is.
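How long CloudFront keeps a response depends on the distribution's cache settings and on the caching headers the origin sends, such as Cache-Control. As a rough sketch only, this is how an origin written as a plain Python WSGI application could mark its HTML as cacheable. The one-hour value mirrors the example in Option 1 and is an assumption, as are the names app and CACHE_SECONDS; the real value should follow how often your content changes.

```python
from typing import Callable, Dict, Iterable

CACHE_SECONDS = 3600  # assumed value, matching the one-hour example of Option 1


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Hypothetical origin: returns a CMS page and marks it as cacheable."""
    body = b"<html><body>Rendered by the CMS</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # "public" allows shared caches (such as a CDN) to store the response
        ("Cache-Control", f"public, max-age={CACHE_SECONDS}"),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly, without a server, to inspect what it would send.
captured: Dict[str, object] = {}


def fake_start_response(status: str, headers: list) -> None:
    captured["status"] = status
    captured["headers"] = dict(headers)


response_body = b"".join(
    app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, fake_start_response)
)
```

Pages that must never be shared between users, such as anything personalised, would need a different header. That is worth deciding before putting a whole site behind a cache.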

With CloudFront, your distribution is served from many locations around the world. So if, for instance, your machines are located in Europe, once an HTML response is stored in a CloudFront location in Japan, all other visits to your website from the surrounding region will have a much shorter load time. The same holds for any other region, so load times improve all over the world.
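The per-region behaviour can be simulated with a small Python model. It is a simplification and not CloudFront's real implementation: Origin and EdgeLocation are hypothetical classes, and each edge location is reduced to a dictionary in front of a shared origin.

```python
from typing import Dict, Tuple


class Origin:
    """Hypothetical origin in Europe: load balancer, servers and database."""

    def __init__(self) -> None:
        self.requests = 0

    def fetch(self, path: str) -> str:
        self.requests += 1
        return f"<html>page {path}</html>"


class EdgeLocation:
    """Simplified edge cache: one per region, all sharing the same origin."""

    def __init__(self, name: str, origin: Origin) -> None:
        self.name = name
        self.origin = origin
        self._cache: Dict[str, str] = {}

    def get(self, path: str) -> Tuple[str, str]:
        if path in self._cache:
            return "hit", self._cache[path]     # served close to the visitor
        body = self.origin.fetch(path)          # long trip back to the origin
        self._cache[path] = body
        return "miss", body


origin = Origin()
tokyo = EdgeLocation("tokyo", origin)
frankfurt = EdgeLocation("frankfurt", origin)

first_tokyo, _ = tokyo.get("/")          # first visitor near Japan
second_tokyo, _ = tokyo.get("/")         # every later visitor in that region
first_frankfurt, _ = frankfurt.get("/")  # other regions warm up separately
```

In this model each region pays for one slow request, after which the origin is no longer involved for that page in that region.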

Now let’s talk numbers. I wouldn’t be writing this article if we hadn’t measured the time! Check the examples of a cache hit and a cache miss:

By applying CloudFront’s web distribution system we managed to reduce the load time of a website’s HTML from 1.06 seconds to 0.136 seconds, an improvement of 87%!
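The 87% figure follows directly from the two measurements. A quick check in Python, where the variable names are just for illustration:

```python
before_seconds = 1.06    # HTML load time without CloudFront's cache
after_seconds = 0.136    # HTML load time when CloudFront serves the page

# Share of the original load time that was saved
improvement = (before_seconds - after_seconds) / before_seconds
# How many times faster the cached response is
speedup = before_seconds / after_seconds

print(f"{improvement:.0%} less load time, {speedup:.1f}x faster")
# prints: 87% less load time, 7.8x faster
```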

I don’t want to rewrite the entire CloudFront documentation on how to implement this solution, so instead you can find the direct link to it right here. If you have any questions, don’t hesitate to add a comment to this article and I will gladly try to answer.


Originally published at www.elements.nl on August 1, 2017.