Need for Speed: Analysing global CDN performance

Property Finder Group has a diverse user base spread across seven countries in the Middle East and North Africa (MENA) region. As a lead software architect, I am constantly looking for new ways to improve customer satisfaction and optimize the user experience, very often with regard to performance. Personally, I believe in the Kaizen approach: the idea of making continuous improvements through small positive changes that trigger a snowball effect over time. As part of applying this process at Property Finder, we want to continuously optimize our performance and decrease the end-user latency of the assets on our website and inside our app.

As many other companies do, we utilise a content delivery network (CDN) for serving our static assets. Using a CDN allows us to offload the delivery of static assets from our origin servers (in our case, AWS S3) and serve those assets as close to the end users as possible. With many points of presence across the globe, our CDN delivers content to our end users from a server in their city or country, instead of making them download it from the original location, which could be much further away.

We’ve been a long-time customer of Akamai, one of the oldest CDNs on the market, with great coverage and availability in a large number of countries worldwide. However, a few months ago we had a discussion with our CTO about measuring our CDN performance and evaluating whether we should change our CDN strategy, use multiple CDNs if it makes sense, or completely change our provider.

When approached with such a broad topic, there are different ways of evaluating and measuring your choices. Ideally, all of them should be backed by some analysis of the current state, of the competitors in the different regions where we operate, and of performance from the perspective of different ISPs. Differences between browsers and operating systems are negligible; we rule them out as statistically insignificant and focus solely on network performance, with end-user latency as the single metric for a CDN.

“You can’t improve what you can’t measure”.

After evaluating a couple of different options, I realized that there’s a tool that perfectly fits our needs for retrieving scientific results and real-world measurements from many points across the globe, and that can clearly show us which CDN performs best per country or region.

Our quest for the fastest CDN outgrew the original idea and became a large piece of research. The final result was a lot of wildly coloured maps, including an interactive one that you can use to see the best performing CDN in your country.

RIPE Atlas

Enter RIPE Atlas: a global, open and distributed network of probes which measure Internet connectivity and reachability. The probes themselves are distributed by RIPE Network Coordination Centre, one of the Regional Internet Registries. It is the largest Internet measurement network ever made!

To add a bit of historical reference, one must not forget that people have used the electric telegraph since 1835 to broadcast the weather forecast. The first weather station started collecting weather data even before that, in 1781. In a connected and digital-first 21st-century world, we still make use of hundreds of thousands of weather stations worldwide to quickly see the current weather conditions on our smartphones and decide on our clothing for the night out.

Geographical spread of RIPE Atlas probes.

Such a connected world needs an Internet equivalent of weather stations that constantly monitors the Internet itself. That’s where the RIPE Atlas project fits perfectly: it gives everyone the ability to measure the connectivity of any device connected to the Internet (the only requirement is a publicly routable IP address) from many different probes.

In a nutshell, RIPE Atlas gives everyone the ability to use more than 10,000 probes worldwide, currently distributed in almost all of the world’s countries, thanks to hundreds of volunteers who host them.

Physically, RIPE Atlas probes are small single-board devices, similar to a Raspberry Pi, with an Ethernet port, a micro USB power connector and special software inside. The latest model runs on a NanoPi NEO Plus2 with 512MB RAM.

RIPE Atlas reached 10k connected probes in 2017.

It’s important to note that RIPE Atlas is a credit-based system: you can get uptime credits for having the probes online, and you can also get them whenever your probe delivers results for someone else’s measurement.
To get started, you can get free credits at RIPE events; you can also send and receive credits from other RIPE Atlas members.

Because the network is volunteer-based, more than 50% of probes are abandoned over time.

To learn more about RIPE Atlas itself, I suggest viewing an introductory video and browsing the official website.

Content Delivery Networks

A high-level overview of how the geographical spread of a CDN’s points of presence (PoPs) helps get content from origin servers closer to the end user. (image courtesy of Cloudflare)

I will not get into the topic of implementing a CDN or how CDNs work under the hood. Instead, I will point out the important benefits that they provide:

  • Faster loading time of assets due to decreased latency. Assets (images, JavaScript, CSS files and so on) are fetched once from the origin server, and from that point on they are served from the server closest to the user. This optimizes the end-user experience, makes our customers happy, and boosts SEO rankings because of faster page load times.
  • Cutting traffic costs. Typically, when you serve your static content to users from popular cloud storage services (e.g. AWS S3 or Google Cloud Storage), you pay for the Internet traffic generated by each download/hit. A CDN acts as an intermediary: it fetches the requested content only once from the origin server, stores it, and then serves it from cache. This is a lot cheaper for you, as your origin will have less outgoing traffic.
  • Caching. Using a CDN allows you to specify different dynamic caching policies and increase the cache hit rate. If content is served from the CDN’s server cache, the CDN does not have to fetch it from your origin server. Cache hit rates for static content requests can often reach 90% and more, which essentially means cutting 90% of traffic costs from your data center.
  • Readiness for sudden traffic spikes. CDNs have invested a lot of time and knowledge in developing large infrastructures that scale well, whether you are featured on Reddit and Hacker News or streaming a live UEFA Champions League final.

In the case of Property Finder and similar assets-heavy websites, the main objective is to serve all the terabytes of assets we have to our customers in the fastest way possible.
To ensure all of this, as well as the best and most efficient caching policy, we use a reliable and well-spread CDN and do our best to set correct Cache-Control headers with appropriate expirations for different content types, ignore query strings in cache keys so that they don’t bust the cache, use the immutable flag, and so on.
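To make the idea of per-content-type Cache-Control headers concrete, here is a minimal sketch in Go. The function name, the file extensions and the max-age values are illustrative assumptions, not the policy Property Finder actually uses.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cacheControlFor returns a Cache-Control header value for an asset path.
// The extension list and the durations are illustrative assumptions.
func cacheControlFor(assetPath string) string {
	switch strings.ToLower(path.Ext(assetPath)) {
	case ".js", ".css", ".woff2":
		// Fingerprinted build artifacts never change under the same name,
		// so they can be cached for a year and marked immutable.
		return "public, max-age=31536000, immutable"
	case ".jpg", ".jpeg", ".png", ".webp":
		// Images: cache for 30 days.
		return "public, max-age=2592000"
	case ".html":
		// HTML: always revalidate with the origin before reuse.
		return "no-cache"
	default:
		return "public, max-age=3600"
	}
}

func main() {
	for _, p := range []string{"app.3f9a1c.js", "listing/123.jpg", "index.html"} {
		fmt.Printf("%s -> %s\n", p, cacheControlFor(p))
	}
}
```

Long lifetimes combined with immutable only make sense when file names change with their content; otherwise a shorter max-age is the safer choice.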

Who is using CDNs?

Everyone!
Nowadays, the majority of Internet traffic passes through Internet Exchange Points (IXPs), where traffic is exchanged for free or for a very low fee. There, big content providers (think Netflix, Facebook, Google/YouTube) and ISPs connect and exchange traffic with the lowest latency and the highest throughput possible.

By reading this blog on Medium.com you’ve unknowingly accessed Cloudflare, one of the most popular CDNs. Also, your device has probably established a connection to Fastly’s servers when you played your favorite song on Spotify.
Your favorite blogs and news portals are served using AWS CloudFront, Google Cloud or other CDNs. You’ve also, like the majority of the Internet population, generated some traffic to private CDNs by accessing Facebook, Instagram, YouTube, Netflix and so on.

You might think that only big companies use CDNs, but you’d be mistaken. Nowadays, it’s almost unimaginable to start a website, an app or an online service without thinking about the best way to serve your traffic. Upfront planning to use a CDN makes a lot of sense.

Imagine a common startupreneur scenario: you have an idea for a startup that you think will grow a lot in terms of users, scale very fast and have a lot of traffic (and sudden spikes).
Should you think about a CDN from day one? Absolutely!

You want to optimize your costs upfront and achieve the best performance at the same time. To achieve this, you can rely on your gut feeling, a friend’s recommendation, a Google search…or, you can utilise scientific, statistical data with real numbers and not just guess.

If you’re still interested, keep on reading. Here’s where the research starts!

Creating measurements

To create your very first measurement using RIPE Atlas, you can use either the web UI or a nice JSON API.

Using the RIPE Atlas Web Wizard is really simple; in just a few clicks you can create a measurement with the summary of all the associated costs.

RIPE Atlas Web UI.

The probes available for use are hosted in different places: on a local router at home in residential areas, in racks in workplaces and offices, and inside data centers. They can also be connected through a mobile 4G connection or a satellite uplink in a very remote location. As long as there’s an Ethernet connection, the source of the connectivity doesn’t really matter!

The coverage of IPv4 and IPv6 networks is roughly similar: below 10% of worldwide autonomous system numbers (ASNs) in both cases.
IPv4 ASNs covered: 3602 (5.627%)
IPv6 ASNs covered: 1446 (8.617%)

However, in the grand scheme of things, all of the major worldwide consumer ISPs and hosting companies have a sufficient number of probes hosted with them, and almost all of the world’s countries are covered: 182 (92.857%).

Research methodology

A comprehensive RIPE Atlas REST API.

Using the web UI has its drawbacks in some scenarios.

In my case in particular, I wanted to analyze all the countries in the world one by one, selecting probes and filtering them by different tags. Repeating this process manually for 182 countries would be very cumbersome.

Luckily, all of this can be done through a very simple REST API.

First of all, we need a key ingredient to conduct this research: plenty of RIPE Atlas credits. Luckily, I’ve had a probe connected for more than five years, during which it collected almost 60 million credits, more than enough to conduct this research more than a dozen times. I occasionally use these credits for analyses for private use and also for investigations like this.

Secondly, I defined a list of CDN providers by analysing the current CDN market and favoring companies with a global presence over those with only regional availability.

Here’s a breakdown of the CDNs chosen for this research (a total of 7):
1) Akamai: one of the oldest players in the market.
2) AWS CloudFront: a global player with almost 200 PoPs across 30 countries. It has regional edge PoPs to which all the other PoPs connect, a pre-optimization step that concentrates hits on a regional edge PoP.
3) Microsoft Azure: more than 130 PoPs and a very large network with different tiers.
4) Cloudflare: the most popular choice for small to medium websites, with a very generous, almost limitless free tier.
5) Google Cloud CDN: uses Google’s global network in conjunction with Cloud Storage or Compute Engine instances.
6) Fastly: a popular CDN for large-scale projects (GitHub, Spotify, etc.). Available in more than 30 PoPs, with plans to expand to even more.
7) CacheFly: used to be a US-centric CDN, but recently grew into a global player.

Once the list of CDN providers was defined, I decided to write a simple script in the Go programming language, chosen for its simple concurrency primitives. This small script (less than 200 lines) goes through all the two-letter ISO 3166 country codes, pairs each of them with every CDN defined above, and sends a three-packet ping measurement request to the RIPE Atlas API.
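The pairing and fan-out described above can be sketched roughly as follows. This is not the original script: the job type, buildJobs, runAll and the short country and CDN lists are hypothetical names and placeholder data chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// job is one country/CDN pair to measure (hypothetical type).
type job struct {
	Country string // two-letter ISO 3166 code, e.g. "AE"
	CDN     string
}

// buildJobs pairs every country with every CDN.
func buildJobs(countries, cdns []string) []job {
	jobs := make([]job, 0, len(countries)*len(cdns))
	for _, c := range countries {
		for _, n := range cdns {
			jobs = append(jobs, job{Country: c, CDN: n})
		}
	}
	return jobs
}

// runAll processes jobs with at most `workers` goroutines at a time.
func runAll(jobs []job, workers int, handle func(job)) {
	ch := make(chan job)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range ch {
				handle(j)
			}
		}()
	}
	for _, j := range jobs {
		ch <- j
	}
	close(ch)
	wg.Wait()
}

func main() {
	jobs := buildJobs([]string{"AE", "EG", "SA"}, []string{"akamai", "cloudflare"})

	var mu sync.Mutex
	done := 0
	runAll(jobs, 4, func(j job) {
		// A real handler would send the measurement request for j here.
		mu.Lock()
		done++
		mu.Unlock()
	})
	fmt.Printf("processed %d of %d jobs\n", done, len(jobs))
}
```

A fixed pool of workers reading from one channel keeps the number of in-flight requests bounded, which matters when the remote API limits concurrency.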

Cloudflare has its own very popular DNS resolver at 1.1.1.1. For CloudFront and Google Cloud I had to create my own distributions, but all the others were very easy to test using well-known hostnames of companies that publicly use them (FIFA, etc.).

Selected target hostnames.

Using the request options, we’d select up to 50 probes and use a couple of tags to filter out any unavailable or unstable probes that would negatively influence our result set. The probe selection tags I used were system-ipv4-capable, system-ipv4-works, and system-resolves-a-correctly, the last one ensuring that DNS resolution works correctly.
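A request body with these options might be built as in the sketch below. The JSON field names reflect my understanding of the RIPE Atlas v2 measurement API and should be checked against the official documentation; the struct names, the is_oneoff choice, the description text and the example.com target are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// definition describes what to measure. Field names follow the RIPE Atlas
// v2 measurement API as I understand it; verify them against the docs.
type definition struct {
	Target      string `json:"target"`
	Description string `json:"description"`
	Type        string `json:"type"`
	AF          int    `json:"af"`      // address family: 4 for IPv4
	Packets     int    `json:"packets"` // pings sent by each probe
}

type probeTags struct {
	Include []string `json:"include"`
}

// probeSelection describes which probes should run the measurement.
type probeSelection struct {
	Requested int       `json:"requested"`
	Type      string    `json:"type"`
	Value     string    `json:"value"`
	Tags      probeTags `json:"tags"`
}

type measurementRequest struct {
	Definitions []definition     `json:"definitions"`
	Probes      []probeSelection `json:"probes"`
	IsOneoff    bool             `json:"is_oneoff"` // assumed: run once, not periodically
}

// newPingRequest builds a three-packet ping from up to 50 probes in a country.
func newPingRequest(country, cdn, host string) measurementRequest {
	return measurementRequest{
		Definitions: []definition{{
			Target:      host,
			Description: fmt.Sprintf("CDN latency %s/%s", country, cdn),
			Type:        "ping",
			AF:          4,
			Packets:     3,
		}},
		Probes: []probeSelection{{
			Requested: 50,
			Type:      "country",
			Value:     country,
			Tags: probeTags{Include: []string{
				"system-ipv4-capable",
				"system-ipv4-works",
				"system-resolves-a-correctly",
			}},
		}},
		IsOneoff: true,
	}
}

func main() {
	// example.com is a placeholder target, not one of the hostnames used.
	body, err := json.MarshalIndent(newPingRequest("AE", "cloudflare", "example.com"), "", "  ")
	if err != nil {
		panic(err)
	}
	// The body would be POSTed to the measurements endpoint with an API key.
	fmt.Println(string(body))
}
```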

Parsing the measurements

Once we received an API response after creating a measurement, we saved the measurement ID to a results database in the form of a CSV file. This database stored all the measurement IDs together with their country/CDN pairs. We’d wait for some time before fetching the results of a measurement, as they can sometimes take up to 15 minutes. The API calls also had to be paused periodically because of rate limits on the RIPE Atlas API side: up to 100 concurrent measurements and a daily expenditure of up to 1 million credits are allowed.
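As a rough illustration of such a CSV-backed results database, the sketch below serialises country/CDN/measurement-ID records with Go's encoding/csv package. The column order, the type and function names, and the measurement IDs are made up for the example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// measurementRecord links a created measurement to its country/CDN pair.
// The column layout is an assumption, not the original file format.
type measurementRecord struct {
	Country       string
	CDN           string
	MeasurementID int
}

// recordsToCSV renders the records as CSV text, one measurement per line.
func recordsToCSV(records []measurementRecord) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	for _, r := range records {
		row := []string{r.Country, r.CDN, strconv.Itoa(r.MeasurementID)}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// The IDs below are placeholders, not real RIPE Atlas measurements.
	out, err := recordsToCSV([]measurementRecord{
		{Country: "AE", CDN: "akamai", MeasurementID: 1001},
		{Country: "AE", CDN: "cloudflare", MeasurementID: 1002},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Keeping the IDs on disk means the slow part, waiting for results, can be retried later without creating (and paying for) the measurements again.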

Some requests failed, as RIPE Atlas is still not distributed in all of the world’s countries; this was expected and such responses were discarded, hence some of the gray areas on the results map.

Here’s a screenshot of a single measurement result from the perspective of a web UI:

Result of a single measurement on Web UI.

We can see all the probes involved, their related ASNs, packet loss as a percentage, and the round-trip time from a probe to the target host (our metric of interest). Of course, consuming these results through an API made more sense, and that’s what we’ll focus on.

In addition to the avg field, the response contains the RTTs of the 3 individual pings.
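Reading those fields in Go could look like the sketch below. The JSON keys (prb_id, avg, result, rtt) follow the ping result format as I understand it and should be verified against the RIPE Atlas documentation; the sample payload is made-up data, not a real measurement.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pingReply is one entry of the per-packet result list.
type pingReply struct {
	RTT *float64 `json:"rtt"` // nil when the packet got no reply
}

// pingResult holds the fields of interest from one probe's result.
type pingResult struct {
	ProbeID int         `json:"prb_id"`
	Avg     float64     `json:"avg"`
	Replies []pingReply `json:"result"`
}

// rtts returns the round-trip times of the packets that were answered.
func (r pingResult) rtts() []float64 {
	var out []float64
	for _, reply := range r.Replies {
		if reply.RTT != nil {
			out = append(out, *reply.RTT)
		}
	}
	return out
}

// parseResults decodes a JSON array of per-probe ping results.
func parseResults(data []byte) ([]pingResult, error) {
	var results []pingResult
	if err := json.Unmarshal(data, &results); err != nil {
		return nil, err
	}
	return results, nil
}

// sample is made-up data shaped like a ping result, not a real measurement.
const sample = `[
  {"prb_id": 1, "avg": 11.5, "result": [{"rtt": 11.0}, {"rtt": 12.0}, {"x": "*"}]},
  {"prb_id": 2, "avg": -1, "result": [{"x": "*"}, {"x": "*"}, {"x": "*"}]}
]`

func main() {
	results, err := parseResults([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Printf("probe %d: avg=%.1f ms, answered=%d\n", r.ProbeID, r.Avg, len(r.rtts()))
	}
}
```

Using a pointer for the RTT distinguishes a packet that timed out from one that came back in zero milliseconds.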

The results were split into one directory per CDN, and within each directory, one file per country was created.

The result set is available in a GitHub repository: https://github.com/emirb/ripe-atlas-cdn-analysis

After collecting all the measurements from the RIPE Atlas API and storing them, I ended up with a dataset of rows in the following format:

iso2_code,cdn_name,rtt_ms
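From rows in that format, the fastest CDN per country can be derived with a single pass, as in the sketch below. The function and type names are hypothetical, the input is assumed to have no header row, and the sample rows are invented values, not results from this research.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// best is the lowest-latency CDN seen so far for a country.
type best struct {
	CDN string
	RTT float64 // milliseconds
}

// fastestPerCountry reads iso2_code,cdn_name,rtt_ms rows (no header row)
// and keeps the CDN with the lowest RTT for each country.
func fastestPerCountry(csvData string) (map[string]best, error) {
	rows, err := csv.NewReader(strings.NewReader(csvData)).ReadAll()
	if err != nil {
		return nil, err
	}
	fastest := make(map[string]best)
	for _, row := range rows {
		if len(row) != 3 {
			return nil, fmt.Errorf("expected 3 fields, got %d", len(row))
		}
		rtt, err := strconv.ParseFloat(row[2], 64)
		if err != nil {
			return nil, err
		}
		if cur, seen := fastest[row[0]]; !seen || rtt < cur.RTT {
			fastest[row[0]] = best{CDN: row[1], RTT: rtt}
		}
	}
	return fastest, nil
}

func main() {
	// Invented sample rows for illustration only.
	data := "AE,akamai,12.4\nAE,cloudflare,9.8\nEG,akamai,31.0\nEG,cloudflare,44.2\n"
	fastest, err := fastestPerCountry(data)
	if err != nil {
		panic(err)
	}
	for _, country := range []string{"AE", "EG"} {
		b := fastest[country]
		fmt.Printf("%s: %s (%.1f ms)\n", country, b.CDN, b.RTT)
	}
}
```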

Overall, the whole research consumed around 50,000 credits.

Results

Fastest in most countries: Cloudflare.

The best performer around the globe was Cloudflare, followed by Google Cloud, Akamai and Azure.

Distribution of fastest CDNs on a world map.

On a world map, the situation is very colorful and the winners are dispersed all over the place.

Average CDN latency.

The average latency of a ping round-trip is mostly under 50ms per country. In Europe, this is usually around 10ms.

Best performing CDNs in Europe.
Average CDN latency, worldwide.

Remarks: the 36 milliseconds shown should not be read as the definitive average. If the whole research were run a few more times, it would yield different results every time, because the 50 probes included in each measurement were assigned randomly. This randomness can bias results in countries that have a large number of probes (500 or more).

Also, keep in mind that not every ASN in every country has a RIPE Atlas probe installed. Results can therefore be artificially boosted when all the probes in a country belong to the same ASN and that ASN has a good, low-latency connection to the target host. On the other hand, if a country has only 2 probes and both of them perform badly towards any hostname (with an initial ping of 100+ ms to anywhere), the results are worsened. Again, the solution is to diversify the probes in each country and cover as many ASNs as possible with at least one probe.

An interactive map is available here.

Conclusion

Cloudflare has the best geographical spread, and it’s clear that they’re constantly adding new PoPs; they have 180+ PoPs already.

For a very long time, Akamai was the best but also the most expensive CDN, used almost exclusively by very big companies. They have different types of agreements with ISPs through private peering, as well as connections at a lot of IXPs. In the MENA region, they’re doing a really good job and so far, as I mentioned in the introduction, the performance is satisfactory.

When taking into consideration the sheer size of Google’s network, keep in mind that with Google Cloud you can opt for different network tiers. Choosing the Premium over the Standard network tier costs more, but can give you better performance and reliability because the traffic is routed differently.

When using Google Cloud’s Premium network tier, the traffic should flow through Google’s internal, higher quality network.

Azure is also a little bit unique and can yield different results depending on the network choice. When creating a CDN distribution on Azure, you can choose between Verizon, Akamai and Microsoft CDN, which are running on three different networks. If you want to use Akamai, using it through Azure might be the easiest way to do so; otherwise, you’d have to reach out to Akamai sales and have a rather high volume of traffic.

After wrapping up this research, I looked for active and maintained tools that do a near-real-time analysis of CDNs.
Some of them proved to be really nice and very useful!

CloudHarmony, for example, utilises RIPE Atlas probes as well and offers a nice web UI with filters and graphs.

On the other hand, CDNPerf utilises proprietary data to do RUM analysis. I always prefer open-source and public data if possible.

The whole research wouldn’t have been possible without the RIPE Atlas project. If you’d like to participate, you can apply for hosting a physical probe here.

In any case, it’s clear that picking the right CDN has never been easier, and that the choice has never been backed by more data.

Note: this blog post is a transcription of a talk given at the RIPE SEE 8 conference held in April 2019. The video is available at https://www.youtube.com/watch?v=zDm8uv8kER8

If you liked this article and want to be part of our brilliant team of engineers that produced it, why not have a look at our latest vacancies here.


Emir Beganović
Property Finder Engineering and Tech Blog

Architecting service-oriented server software and distributed cloud-native systems at scale.