Scaling our Services for NFL Season

Neil Ferguson
Published in FanDuel Life
Oct 19, 2022

If you happened to ask me, “What is it like during the start of the NFL season at FanDuel?” I would probably say, “Nothing much really happens, it’s pretty quiet.” What I really mean is that nothing much happens technically. Not now, at least. If you had asked the same question 2 years ago, the answer would have been very different.

What is NFL?

The NFL season (American football) runs from mid-September through to February of the following year, culminating in its final event, the Super Bowl. At both the start and the end of this period, vast surges in traffic hit our servers. If the servers, the services, and the databases can’t cope with these spikes in volume, then bad things start to happen. At peak traffic, our core services can see around 140,000 logged-in users. When we talk about our core services, we mean our Account and Wallet services: where users log in, where we store wagers won and lost, and where money is deposited and withdrawn. So it’s important we make sure they don’t break, otherwise our customers cannot log in and use our product.

The start of the 2020 NFL season was very much an eye-opening experience and one that started a fascinating and very successful technical journey for FanDuel’s core engineering teams. Let’s just say that back then, there were not many “quiet” moments during our war room Zooms.

When did our problems start?

Let’s go into more detail. During one of the busiest games, as we approached peak traffic, we started to see warnings and errors in our monitoring systems, Datadog and AWS. In a FanDuel war room, we have many people online, and everyone constantly monitors everything in the hope that nothing happens. But then we started seeing issues with users trying to log in, and we saw alerts on replication lag between servers. Our customers were being impacted and some of our systems were grinding to a halt.

One of our third-party systems was severely struggling with the increase in logins but had no way to rate limit the traffic. This failure started impacting our core system, and our user experience began degrading rapidly. We had no choice but to switch some of our users to maintenance mode, at least until the game ended and we could figure out what was going wrong.

Our engineers began investigating the problem and agreed on a mitigating “fix” until we could get through the Super Bowl. This fix was to rate limit groups of users (based on specific criteria) hitting our third-party systems. If we protected them, we protected FanDuel. It worked and got us through the Super Bowl. However, we had to fix this and many other issues before we reached NFL Season 2021. The impact on our customers was big, but our competition also had problems, and their customers tried to use our product instead, resulting in a further increase in traffic.
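To make the idea of rate limiting groups of users a little more concrete, here is a minimal sketch of one common approach, a token bucket per user group. This is an illustration only and not FanDuel's actual implementation: the class names, the `allow_login` method, the group names, and the rates are all hypothetical, and the choice to let unconfigured groups through is an assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TokenBucket:
    """Permits roughly `rate` requests per second, with bursts up to `capacity`."""
    rate: float       # tokens added per second
    capacity: float   # maximum tokens the bucket can hold
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        # Start full so an idle group can burst up to `capacity`.
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class GroupRateLimiter:
    """Hypothetical limiter: one bucket per user group in front of a vendor call."""

    def __init__(self, limits: Dict[str, Tuple[float, float]]) -> None:
        # limits maps group name -> (rate per second, burst capacity)
        self._buckets = {
            group: TokenBucket(rate=rate, capacity=capacity)
            for group, (rate, capacity) in limits.items()
        }

    def allow_login(self, group: str, now: Optional[float] = None) -> bool:
        bucket = self._buckets.get(group)
        if bucket is None:
            # Assumption: groups without a configured limit are not throttled.
            return True
        return bucket.allow(now)


if __name__ == "__main__":
    # Illustrative numbers only.
    limiter = GroupRateLimiter({"group_a": (5.0, 10.0)})
    if limiter.allow_login("group_a"):
        print("forward login to the third-party system")
    else:
        print("ask the user to retry shortly")
```

A rejected login would typically be answered with a "try again shortly" response rather than being forwarded, which is what keeps the downstream system within the load it can handle.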

How did we fix these problems?

The following month in March 2020, we spun up a Scaling team of 8 dedicated Engineers who started to analyse the vast amount of data gathered during this season. We posed the problem of “go fix NFL 2021, make it better”. They had only 6 months, so we prioritised this work and supported the team 100%, and they delivered a huge programme of work in time for NFL 2021. There is way too much to write about in this post, but it was a huge success.

Here is a snippet of some of the problems they solved:

  1. Rate limiting users logging in to protect our third-party systems
  2. Reducing replication lag and latency between our databases
  3. Component load testing and system performance testing
  4. Migrating our systems from VMWare to AWS Outposts
  5. Scaling our API services
  6. Game Day testing, did our fixes work?
  7. Scaling our 3rd Party Vendors
  8. What was the impact of the problems we encountered?

Do you want to hear more of the tech detail?

We would love to share these with you in much more technical detail, so please let us know in the comments which topic you would like to know more about, and we will write a detailed technical article on how we did this. Thanks for reading.

Written by Neil Ferguson, Software Engineering Senior Director (Wallet)


Technology and people fascinate me. I work with many incredibly talented people who constantly achieve great outcomes. I would love to share this with others,