Scaling Challenge Leaderboards for Millions of Athletes
Strava challenges offer a fun way for athletes to compete against themselves and others! Back in 2020, our legacy challenge leaderboard system was running into bottlenecks and scalability problems on a regular basis, and we often found ourselves putting out fires to keep the system stable. In late 2020 and early 2021, I worked on a project to replace the old leaderboard system with a new one that could handle a much larger number of athletes competing in challenges. This blog post is about that project. I drafted most of this post when the project wrapped up in 2021, but didn’t get it published before I went on paternity leave — and then I forgot about it. I think the project was interesting and worth sharing, so I’m glad I finally remembered my draft (three years later ) and found some time to put in the finishing touches and get it published! Enjoy!
In January 2020, Strava’s monthly 5K running challenge attracted more than half a million participants for the first time. This was a big milestone, doubling the participation from just a year earlier when we had only a little more than 250,000 participants in the January 2019 5K challenge. But we weren’t done growing! Just 5 months later, our challenge participants doubled again and we broke 1 million participants in a monthly challenge for the first time with the May 2020 5K, where 1.2 million Strava athletes participated! In less than a year and a half, the number of athletes participating in Strava’s monthly challenges had quadrupled.
As you can imagine, the increase in challenge participation came with a significant increase in load on our systems. In particular, we were beginning to approach the limits of what our existing challenge leaderboard system could handle. If the number of challenge athletes participating in our monthly challenges were to double again, our challenge leaderboard system wouldn’t handle it well. We needed to do something to improve the way our leaderboard system scaled with more challenge participants.
The Old Leaderboard System
Strava’s existing challenge leaderboard system wasn’t necessarily designed poorly, but it also wasn’t designed for the scale we’re currently operating at. The original challenge leaderboard implementation was designed about 8 years earlier, in 2013. It met Strava’s needs when it was designed, and operated pretty well for more than 8 years.
The original challenge leaderboard system was implemented primarily using Redis. Specifically, the leaderboard made use of Redis’s sorted sets. This is a pretty well-known use case, and there’s plenty of documentation available online about how you might implement a leaderboard using a Redis sorted set. There’s a good chance that some of the other online leaderboards you might be familiar with are using a Redis sorted set in the background. But in addition to using Redis sorted sets (which can scale well accessed in an efficient manner), the leaderboard system also made use of less efficient storage and data access patterns in Redis, and these less efficient access patterns ultimately led to the inability of the system to handle more athletes.
As the old leaderboard system scaled up to more than a million athletes, limits to the scalability of the system became clear. Due to some inefficient data access patterns in our code (that worked fine at a smaller scale), we began running into scalability problems somewhat regularly. One of the inefficient data access patterns we discovered in our old leaderboards code involved the use of a redis hash to store data related to an athlete’s leaderboard entry. We discovered old code that would call HGETALL on a hash that contained all leaderboard members for a given challenge. HGETALL is an O(N) operation, so the performance got significantly worse as more athletes joined our challenges.
One of the most memorable incidents was a problem related to the inefficient HGETALL operation that happened near the beginning of the month (either on the first of the month or on the first weekend of the month). With about a million athletes signed up to compete in our monthly 5k challenge, tens or even hundreds of thousands of athletes would complete the challenge on the same day. And when an athlete completed the challenge, we would send them an email with some of their challenge stats. This email, triggered when any athlete completed a challenge, was one of the code paths to the inefficient HGETALL operation. So if too many athletes completed the challenge at approximately the same time, it could overwhelm the CPU on our Redis instance and cause Redis queries to fail. This was bad because it not only caused the query for the challenge completion email to fail (which was processed in the background and could be retried), but also caused other queries to fail. For example, queries made by athletes trying to view the leaderboard on strava.com or in our app might also fail. Thus, if too many athletes completed a challenge at roughly the same time (like when hundreds of thousands of athletes go run a 5K on a beautiful Saturday morning), it could cause parts of strava.com to become temporarily unavailable. Of course, as software engineers, it’s our responsibility to improve the system so this can’t happen.
Besides our immediate performance and reliability concerns, we also had a small wishlist of other improvements we could make to our challenge leaderboard system. The original leaderboard system was designed to be ephemeral. That is, all of the data stored in the leaderboard was also stored in more durable databases within our system. If Redis crashed and we lost our leaderboard data, we’d be able to re-compute the entire leaderboard in a relatively short amount of time. Well, that was true when we had tens of thousands of athletes on our challenge leaderboards. It’s no longer true with more than a million athletes on a single leaderboard. While a leaderboard with under a hundred-thousand athletes could be regenerated in about an hour, a leaderboard with a million or more athletes might take more than a day to rebuild. To mitigate the risk of losing our leaderboard data, we wanted to store leaderboards in something more durable than Redis. (It’s also possible to make Redis more fault-tolerant with replication and backups, but that too requires engineering effort and we weighed this against other solutions.)
In addition to our concerns about our ability to recover leaderboard data after data loss, we were also thinking about the long-term storage of leaderboard data. Strava challenge leaderboards are different from many other online leaderboards in that once the challenge is over, the leaderboard is in a more-or-less permanent state. Old leaderboard data won’t be accessed nearly as frequently as current leaderboard data, but we still want to be able to efficiently query old leaderboards so they’re fast when an athlete views them. Redis is an in-memory data store, and while its sorted sets make a lot of sense for a leaderboard that receives many updates, they make less sense for leaderboards that are infrequently modified and infrequently accessed. In particular, there’s no reason a leaderboard from 2014 needs to be in memory instead of on a disk, and using a traditional disk-based database would be preferable for many reasons including storage cost and ease of backup.
So, our old challenge leaderboard system had scalability problems that we knew we needed to address. We could do several in-place fixes or upgrades to make the old system work much better, and we considered doing so. But we were also looking ahead to the future. We were well into the process of migrating our old monolithic Rails application to a service oriented architecture, and we already had a service for challenges. Ideally, our new leaderboard solution would not only address the scalability issues in our old system, but would also be a step in the direction of a more service-oriented approach.
Gauntlet is Strava’s new challenge back-end service (named after throwing down the gauntlet). Actually, it isn’t terribly new anymore — it’s been operating in production since September 2019 when it powered The Escape Plan (Strava’s first “streak” challenge). And we’ve written about components related to Gauntlet on this blog before. Mindy and Zack have written about a service to store rich activity data (Part 1, Part 2), and Gauntlet is a client of that service. And our intern Clara (now a full-time employee!) wrote about working on Gauntlet last summer. Gauntlet has already provided big improvements to Strava’s challenge ecosystem. Aside from introducing streak challenges, Gauntlet allows us to run challenges for any type of activity (old challenges only worked for runs and rides). Gauntlet also allows us to run challenges where activities that aren’t publicly visible can count toward the challenge badge. And of course, gauntlet scales much better than our old challenge implementation did, so challenges with a million or more athletes aren’t a problem!
But Gauntlet had always been limited because it didn’t support leaderboards. When we started building Gauntlet in 2019, we wanted first and foremost to experiment with new types of challenges. We experimented with non-traditional formats like streak challenges and challenges that allow a wider variety of activity types, making it easier for more athletes on Strava to participate. Our experiments with Gauntlet were successful, but it was time to double down on our investment and make Gauntlet better so it could serve as a foundational service for all challenges on Strava. Leaderboards are the heart and soul of challenges on Strava, and they create the competition that motivates many of us to get out and move. Adding leaderboard functionality to Gauntlet would not only make Gauntlet-powered challenges better, it would also let us use Gauntlet to power challenges that were powered by our legacy challenge service specifically because they need leaderboards. By designing a new, efficient leaderboard system for Gauntlet, we could stop using our old, inefficient leaderboards and improve the reliability of our system.
Designing a Leaderboard Solution
We considered a very wide variety of possible approaches to implement leaderboards on Gauntlet. We spent several months (not full-time) brainstorming different ideas, researching them, and prototyping some things. We considered solutions that used Redis sorted sets, similar to our previous approach. We considered solutions that use a relatively new statistical data structure called a t-digest. We briefly considered other non-traditional data stores that might be well suited for this purpose. And we considered solutions that use a traditional MySQL data store, like we’re familiar with from so many other parts of our application. When comparing different approaches, scalability was one of our main concerns. In particular, we wanted to ensure that our system would scale sub-linearly (less than O(N)) with the number of athletes on a leaderboard so that our new system wouldn’t encounter performance problems as our challenge participation continued to grow.
When designing a system like this, it’s important to consider how the system will be used. We understood our use cases pretty well. Our leaderboard system needed to:
- Add an athlete to the leaderboard when they join a challenge.
- Remove the athlete from the leaderboard when they leave a challenge.
- Update an athlete’s “score” when they upload new activities.
- Display the first leaderboard page quickly.
- Efficiently page through the leaderboard.
- Efficiently determine an athlete’s rank, given their score.
Redis seemed to fit the bill pretty well, but we weren’t confident about our ability to handle data recovery with Redis and we also didn’t want to use Redis for long-term storage of challenges that had ended — which still need accurate leaderboards but receive much less traffic than active challenges. If we wanted to use Redis, we’d need to design some way to export data to another data store either continuously or after the challenge has ended. So we didn’t rule out Redis as an option, but we weighed these factors in our decision.
MySQL was also an appealing option. Many of our engineers are very familiar with it, and our foundation (ops) team is also familiar with it. It’s a good solution for long-term storage at the scale we need, and handles almost all of our use cases very well. Traditional SQL databases like MySQL are good at performing CRUD operations while keeping a data set ordered (e.g. by using an index). The one problematic use case for MySQL is efficiently determining an athlete’s rank. (More on that below.) Ultimately, we determined that MySQL fit our needs best and was also a system we were very comfortable with. (We know the system well, we have engineers who are experts with it, and we understand its failure modes and limitations and how to work around them.)
Querying Rank from MySQL Efficiently
When we need to determine an athlete’s rank, we already know the athlete’s score (their distance in a distance-based challenge or speed in a speed-based challenge). So we need to query our database to get the rank of that particular score. Querying for the “rank” of a “score” is really just counting the number of entries (rows) ahead of that score on the leaderboard. The MySQL query might look something like this:
SELECT count(*) FROM leaderboards WHERE challenge_id = ? AND score > ?
We can make this query more efficient by using an index on (challenge_id, score). When that index exists, MySQL will use it to avoid the need to sort the data each time it performs the query. Great! But there’s still a problem that can make this query pretty inefficient. Even when using the index, MySQL will perform this query by counting scores that match the where clause. This will be very fast for the athletes near the top of the leaderboard, where it will only need to count a few rows. But for athletes near the bottom of the leaderboard, it will need to count most of the rows in the leaderboard. In other words, this query scales as O(N) with the number of athletes on the leaderboard, and we’re looking for better performance than that! We need to improve the performance of this query if we’re going to use this method at scale.
But there are some tricks we can use to make the performance better! One simple technique that could work well is to “cache” the score at specific ranks. For example, every 1,000 ranks, we could store what the score is for that rank. So we know the score for rank 1,000, rank 2,000, and so on. When we need to look up the rank for any score, we first look up the rank for the worst cached score that’s better than ours. That looks like this:
SELECT score, rank FROM leaderboard_cache WHERE challenge_id = ? AND score > ?
ORDER BY score ASC LIMIT 1
After finding that rank, we can query the full leaderboards table for the rank of the exact score we care about, but we can avoid counting records up to our cached score — we already know the rank of that score. That query looks something like this:
SELECT count(*) FROM leaderboards
WHERE challenge_id = ? AND score < ? AND score > ?
In other words, we count the (small number of) records between the cached score and the exact score we want to look up, and add that count to the cached rank to get the exact rank. It’s an O(log N) operation to find which rows to count (using the index), and an O(N) operation to actually count them. But we know we won’t count more than 1,000 rows, so the O(N) part is actually bounded by O(1,000), so we can treat it as constant time! The cache lookup is also an O(log N) operation, so our complete rank lookup can be performed in O(log N) time for a leaderboard with N athletes. Using this approach lets us use the MySQL option with sub-linear performance for all of our data access methods!
Of course, there’s some additional complexity when we update information in the leaderboards table. We not only need to update the leaderboards table itself, but we also need to update the leaderboard_cache table so it remains accurate. Without going into too much detail, we do this process in batches for efficiency. When we get a batch of leaderboard updates, we calculate the changes to our cached ranks for the batch. For example, if we’ve cached rank 1,000 with a score of 8,000m (distance), and 3 athletes in the batch have done better than that score, we know that the cached rank for the score of 8,000m should now be 1,003. Thus, when we process a batch of updates, we update each of our cache rows at most once. When necessary, we perform an additional rebalancing step to keep our caches spaced roughly how we want them, but we allow plenty of variance here so this doesn’t happen too often. And we can tune the system by adjusting our cache spacing, batch size, and rebalancing operation.
When we release big changes (or even small changes) at Strava, we like to do so in a way that’s as safe and controlled as possible. This leaderboard upgrade was no exception, and we used a dark launch to roll this out. Strava has a mature “feature switches” platform that helps us control access to different code paths and perform dark launches. In this case, we enabled leaderboard writes to our new leaderboards backend on a per-challenge basis, and enabled database writes on several production challenges before the leaderboard was visible anywhere. This allowed us to make sure our backend could take a full volume of writes for many concurrently running challenges before we ever rendered any leaderboard data to the front-end. When we were ready, we used our feature switches to enable the leaderboard front-end for select internal users on our team while we put the finishing touches on things to wrap up development. This allowed us to get early feedback from our product owners and internal users on a live version of the product before athletes could see it.
Our internal rollout was a success, and we began enabling our new Gauntlet-powered leaderboards on some athlete-visible challenges in February, 2021. You might not have even noticed the rollout — this was primarily a backend change, so the new leaderboards look very similar to the old ones on the web and identical to the old ones in our mobile apps. They have essentially the same functionality as our old leaderboards, but the new ones are much faster, and work on a wider variety of challenge configurations!
It’s hard to overstate the performance impact of the new leaderboard system. In one test, the old leaderboard system took more than 40 seconds to load the United States country leaderboard for a monthly running challenge. By contrast, a roughly-equivalent monthly running challenge on the new leaderboard system loaded the equivalent United States country leaderboard in under 250ms. On average, the response time of the new system (in blue on the graph below) is more than ten times faster than the response time of the old system (in green). (Note the log scale on the graph.)
When a website or mobile app page takes more than ten seconds to load, it feels broken, so the performance improvements we’ve made to our leaderboards make our website and app better in a meaningful way, even though we don’t necessarily expect a lot of people to notice when something simply works faster. In addition to the performance gains, our new leaderboards are an investment in our Gauntlet backend service as a platform to power challenges at Strava, and should help us continue to build new and exciting challenge features well into the future!