How we upgraded the engine behind RacingNews365.

Jordi Kroon
May 16

racingnews365.nl

Formula 1 is big in the Netherlands. With Max Verstappen as its rising star, attention for the sport has grown tremendously, and those numbers show up in overall TV statistics and on F1-related websites. RacingNews365 is one of the websites that rode the Max Verstappen wave. With 2.7 million visitors in 2016 (when Verstappen was driving for Toro Rosso), the website had a successful start. When Verstappen made his transfer to the Red Bull Racing team, the numbers exploded from 7.6 million visitors in 2017 to almost 17 million in 2018. Numbers like these, with peak traffic during races, require a good setup and a stable foundation.

In 2018 we managed to get regular traffic under control, but we still had a couple of bad moments during and immediately after a race peak. We were able to fix most issues, but some lay deep in the code and in the CMS behind it, and they became a real problem as visitor numbers kept climbing. In the meantime, Craft CMS was being updated to Craft 3, which is built on the Yii2 framework. These updates promised better performance and fixes for some of the issues we were facing. That made us decide to start over from scratch.

Our experience with the previous website taught us plenty about what we should and should not do. Loading a heavy PHP framework on every request with this number of pageviews will make the server load explode within seconds. That is why we needed caching, and caching of the cache: the caching server should do the work, not the server running the application. We were also very aware that caching adds complexity and odd side effects. When a page is flushed from the cache, it enters a grace state for a short period of time, and during that period all users hit the application server. We once spent hours debugging our code because we noticed 30 seconds of downtime every 5 hours. It happened because all of our pages were cached for exactly 5 hours. The solution was simple though: randomize the cache lifetimes. With these notes in mind we decided that:

  • Users should never directly query a database
  • XHR calls should not load the full framework
  • All pages should be cached (except XHR calls)
  • Cache should be really easy to manage and extend
  • Full cache flush should never happen
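The randomized cache lifetime fix mentioned above can be sketched like this; `randomTtl` is a hypothetical helper for illustration, not our actual code:

```php
<?php
// Spread cache expirations so that pages cached at the same moment do
// not all expire at the same moment (our "30 seconds of downtime every
// 5 hours" problem).
function randomTtl(int $base, int $spread): int
{
    return $base + random_int(0, $spread);
}

// Base lifetime of 5 hours, plus up to 10 minutes of random slack.
header('Cache-Control: public, max-age=' . randomTtl(5 * 3600, 600));
```

With this in place, a batch of pages cached together drifts apart over time instead of expiring as one block.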

Server & code optimization

A good foundation is only possible when the developer and the server engineer work together; it opens up possibilities that would otherwise stay out of reach. A developer might say the server should be scaled up, while the server engineer might say the code should be made faster. In fact, they should think about ways to optimize both:

  • Enable OPCache
  • Use PHP-FPM
  • Use Apache MPM event
  • Varnish full page caching on a separate server
  • Configure idle workers
  • Cache with Redis
  • Use the correct cache headers
  • Prefer XHR over regular POST calls
  • Avoid the use of (heavy) database queries
  • Use Edge Side Includes

If you need to do heavy tasks, consider moving them to a cronjob. MySQL has a limited number of threads; if they are all occupied by locked queries, the website has to wait for a thread to free up. This causes all kinds of issues, and there is no fun in digging into why and when it happens.

Caching

We decided to change our caching structure. In the previous setup, when an article went live we would flush the article and trace back the URLs to the news overview, the homepage and every other page where the article would be shown. While that seemed reasonable at the time, the constantly growing number of related pages and categories made it messy; it simply did not feel right and clean anymore. Therefore we introduced surrogate keys as a replacement. We now label all our website elements with tags such as “homepage”, “content-overview”, “article”, “entry-id-[%id%]” and “race-calendar”. Instead of flushing a single article and tracing back all the other pages, we only have to flush a specific set of tags.
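In Varnish, this tag-based flushing is typically done with the xkey vmod: the backend labels each response with an `xkey` header, and a purge request removes every cached object carrying a given key. A minimal sketch, assuming vmod-xkey is installed (the `xkey-purge` request header name here is a convention of this example, not a fixed standard):

```vcl
vcl 4.0;
import xkey;

sub vcl_recv {
    # Flush by surrogate key, e.g.:
    #   curl -X PURGE -H "xkey-purge: article entry-id-123" http://cache/
    if (req.method == "PURGE") {
        if (req.http.xkey-purge) {
            set req.http.n-gone = xkey.purge(req.http.xkey-purge);
            return (synth(200, "Purged " + req.http.n-gone + " objects"));
        }
        return (synth(400, "Missing xkey-purge header"));
    }
}
```

One purge request can now invalidate an article everywhere it appears, no matter how many overview pages reference it.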

Caching full pages was not enough for us. Our goal was to never do a full cache flush at all. Never. When all pages are fully cached and you edit a menu item, you would normally need to flush all pages. The solution to this problem is ESI (Edge Side Includes), an XML-based markup syntax that looks a bit like HTML.

ESI is designed to work perfectly together with Varnish. Varnish reads your HTML output and, whenever it detects an ESI tag, replaces the tag with the output of its “src” attribute. Now when you update the menu, you no longer have to clear the full page; you simply flush the source the ESI tag points to.
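For example, a page can pull its menu in through an ESI tag (the `/esi/menu` URL here is illustrative):

```html
<!-- The page body stays cached for a long time; the fragment below is
     fetched separately by Varnish and can be flushed on its own. -->
<nav>
  <esi:include src="/esi/menu" />
</nav>
```

Varnish only parses these tags when ESI processing is enabled for the response, for example with `set beresp.do_esi = true;` in `vcl_backend_response`.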

Previously, all of our assets were served by our cache server with a lifetime of 2 months. At first sight this sounds like a good alternative to the cost of a CDN (Content Delivery Network). However, the cache server keeps these assets in memory for 2 months, which caused odd effects like seemingly random cache flushes of regular pages: the cache server was evicting old cache entries to make room for newer ones. Since we already use Amazon Web Services, S3 with CloudFront as our CDN was an easy pick. After moving the assets to S3, we noticed that the CDN costs were almost fully offset by the savings on EC2, as most of our bandwidth is caused by users downloading assets.

Dynamic content does not change as often as you might think. In our case, it only changes when content is added in the CMS. Why render a menu on every request when resources are limited? Rendering the menu to a static HTML file whenever it changes is a simple yet effective improvement. Our ESI tag then points to this static HTML file instead of a location parsed by Craft, skipping Twig and all the other overhead.
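A minimal sketch of this "render once, serve statically" idea; in our real setup the HTML comes from a Twig template, but here a plain renderer stands in for it, and all names and paths are illustrative:

```php
<?php
// Turn a list of menu items into static HTML.
function renderMenu(array $items): string
{
    $links = array_map(
        fn(array $i) => sprintf(
            '<li><a href="%s">%s</a></li>',
            htmlspecialchars($i['url']),
            htmlspecialchars($i['label'])
        ),
        $items
    );
    return "<ul>\n" . implode("\n", $links) . "\n</ul>";
}

// Write the rendered menu where the ESI tag points. Writing to a temp
// file and renaming gives an atomic swap: readers never see a partial file.
function writeMenuFile(string $dir, array $items): string
{
    $target = $dir . '/menu.html';
    $tmp = $target . '.tmp';
    file_put_contents($tmp, renderMenu($items));
    rename($tmp, $target);
    return $target;
}
```

This runs only when an editor changes the menu; every page view after that is a plain static-file read.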

The cache server only caches pages that contain the HTTP Cache-Control header. With ESI content you would normally set these headers for every file, but that clutters the code and is not an ideal solution. Craft uses the Twig template engine, and within Twig we can write functions with custom logic.

This function converts the template path into a URL. Note the numbersWithRange call: it randomizes cache_seconds to ensure that the cache entry does not expire at exactly the same time as other objects.
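A hedged reconstruction of what such a helper could look like; numbersWithRange and the cache_control/cache_seconds/cache_tag parameters come from the text above, while the function body and the 90% lower bound are illustrative guesses, not the original code:

```php
<?php
// Random value within a range, so cache lifetimes are staggered.
function numbersWithRange(int $min, int $max): int
{
    return random_int($min, $max);
}

// Build a URL whose query string carries the cache metadata that the
// .htaccess / static-file-loader layer later turns into HTTP headers.
function cacheControlUrl(string $path, int $seconds, array $tags): string
{
    $query = http_build_query([
        'cache_control' => 1,
        'cache_seconds' => numbersWithRange((int) ($seconds * 0.9), $seconds),
        'cache_tag'     => $tags,
    ]);
    return '/' . ltrim($path, '/') . '?' . $query;
}
```

In a Twig extension this would be registered as a custom function so templates can emit the ESI src directly.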

In our .htaccess file, we match all URLs that contain cache_control and convert the cache_seconds value into an HTTP header. We tried doing the same with cache_tag, but .htaccess does not allow much manipulation of URL values, such as URL decoding and concatenation. So we re-route these URLs to a static-file-loader.php file instead. This file is lightweight and only converts cache_tag[]=tag1&cache_tag[]=tag2 into the HTTP header xkey: tag1 tag2. Varnish reads these headers to determine how the ESI content should be cached and when a new version should be fetched from the application server.
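The header-building part of such a loader could be sketched like this; the parameter names follow the article, but the function, defaults and structure are illustrative (a real loader would also validate and serve the requested file):

```php
<?php
// Convert cache_tag[]=tag1&cache_tag[]=tag2 plus cache_seconds into
// the headers Varnish acts on: xkey for surrogate keys, Cache-Control
// for the lifetime.
function buildCacheHeaders(array $query): array
{
    $tags    = $query['cache_tag'] ?? [];
    $seconds = (int) ($query['cache_seconds'] ?? 300);

    return [
        'xkey'          => implode(' ', $tags),            // e.g. "tag1 tag2"
        'Cache-Control' => 'public, max-age=' . $seconds,
    ];
}

foreach (buildCacheHeaders($_GET) as $name => $value) {
    header($name . ': ' . $value);
}
```

Because this script does nothing but translate query parameters into headers, it stays fast even though every ESI fragment passes through it on a cache miss.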

With all these layers of cache, we wanted a way to manage caching in one central place. That is where our simple “cache-config” file comes in. For context: Craft has “globals” and “sections”, both sharing an abstraction layer called “Elements”. Globals are data objects you can add anywhere on the website, like the main navigation, footer items or a simple text block. Sections are split into “Singles” (homepage, overview pages) and “Channels” (detail pages like news items). Each type now has its own key in our cache-config.

Craft dispatches events at given moments. We are mostly interested in the event that is triggered right after an element is added, updated or removed. At that moment we run our so-called flushBehaviour from top to bottom: each callable in the array is executed with a set of parameters.

When a news item is updated, we first rewrite a bunch of Twig templates. Then we flush the content overview and special pages from the cache using surrogate keys, but only when certain fields have changed. Finally we flush the current entry, the dossier and the error pages.

When a new tag is added for a new or existing Craft global or section, we only have to add it to this config file. Developers have one central file to look at to understand how the cache behaves and what happens when an element in Craft is added, updated or removed.

Preview: cache-config.php
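A hedged reconstruction of the shape such a file could have; the class and method names are illustrative placeholders, and only the structure (one key per element type, a flushBehaviour executed top to bottom) follows the description above:

```php
<?php
// Illustrative cache-config sketch: each Craft element type declares
// its surrogate keys and the flushBehaviour callables to run after it
// is added, updated or removed.
return [
    'globals' => [
        'mainNavigation' => [
            'tags' => ['menu'],
            'flushBehaviour' => [
                [RenderStaticFiles::class, 'renderMenu'],     // rewrite static HTML
                [FlushTags::class, 'flush'],                  // purge xkey "menu"
            ],
        ],
    ],
    'sections' => [
        'news' => [
            'tags' => ['content-overview', 'entry-id-[%id%]'],
            'flushBehaviour' => [
                [RenderStaticFiles::class, 'renderRelated'],   // rewrite Twig output
                [FlushTags::class, 'flushWhenFieldsChanged'],  // overview, special pages
                [FlushEntry::class, 'flush'],                  // the entry, dossier, errors
            ],
        ],
    ],
];
```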

Conclusion

The announcement of the new major CMS version made us rethink the entire structure of the RacingNews365 website. Of course, we all hope to see many more impressive performances from Max Verstappen this season and expect the number of pageviews to grow even further. By rebuilding the caching structure from scratch and implementing some core changes, we have tackled major bottlenecks that would otherwise have caused us headaches in the future. The average response time dropped from 2.5 seconds to as little as 0.3 seconds, and the system load has not shown spikes during race peaks. Our focus shifted from monitoring server metrics to watching Max Verstappen pull off overtakes in F1 races. These low numbers also open up new possibilities for innovations and features to keep RacingNews365 the leading Formula 1 website in the Netherlands.

TDE ❤️ Formula 1

Closer to your favorite sport, club or legend. That is how TDE is helping brands, organizations and clubs. Always utilizing the latest technology. For us, digital sports marketing is a craft and we are the craftsmen.

With unconditional love for sports our creatives, developers, online marketers and social media managers constantly improve to achieve the ultimate result. That’s how we will get you closer than ever.