Part II — Behind the Scenes: The 2023 FOX Sports Super Bowl LVII Streaming Application Technical Architecture

Brian Gilbreath
FOX TECH
Oct 30, 2023 · 11 min read

When streaming the Super Bowl, the most watched event in the US each year, you have one shot to hit the target. The system either stays up without issues, or it goes down. If it goes down, well, you can imagine what the fallout from that would be…

As one of the technical program leads on this year’s Super Bowl, I had the job of architecting and coordinating the overall engineering enhancements to our everyday streaming system. The goal was to transform our existing architecture into a new parallel architecture that could handle the exponential increase in initial and sustained requests. This effort spanned multiple systems, teams, countries, and time zones. And it all had to be orchestrated over Zoom meetings with decentralized teams around the globe.

Requirements

We established a few baseline requirements in order to deem the architecture successful:

  • The system needed to withstand the impact of millions of people hitting it at the same time, and run with that sustained traffic for the entirety of the game coverage.
  • We needed to be able to turn it on and off within a few seconds.
  • The app required a new UI to showcase the branded experience and live stream on the home screen of each FOX Sports app.

The team knew we needed enhancements from the get-go. First, we had to estimate the increase in streaming demand since the last Super Bowl. Super Bowl streaming has grown notably year-over-year over the last decade.

Super Bowl streaming by the numbers (Source: thestreamable.com)

Second, the team knew the upper limit of what our day-to-day streaming setup could handle. While we had a good idea of the system’s max concurrent numbers from traffic sustained during the FIFA World Cup Qatar 2022™️, we knew without a doubt that the system’s capacity was nowhere near the expected Super Bowl numbers!

Conclusion: our infrastructure needed to handle A LOT more traffic! Translation: a LOT more work needed to be done!

So how did we build out and prepare for the Super Bowl on the tech side? By going back to the basics: simplicity, stability and scalability.

Simplicity: The Architecture

For streaming the Super Bowl you need to do two things perfectly:

  • Get the apps to load
  • Get the stream to play

Everything after that is icing on the cake. Reducing system I/O, HTTP requests, and computing is paramount. Creating redundancy and backups across CDNs and regions is a necessity. Failure, though, is not an option. Thus, we created a few different modes, nicknamed “Super Bowl modes”, e.g. “SB5” and “SB0”.

So what makes SB5 so much simpler, more stable, and more scalable?

Quite plainly: the architecture. SB5 would be a stripped-down version of the day-to-day system, with most of the dynamic capabilities removed or pre-rendered. We wanted little to no databases, no large queries, and no heavy network traffic in the critical path. While we had estimated the expected traffic, this is the Super Bowl and we wanted to be confidently prepared for next-level scale! Therefore, every service and API request needed to be accounted for and statically generated.

Getting the apps to load

The strategy of pre-generating these service responses would account for 80% of the app’s API requests. There are dozens of endpoints required to load the app, and each would need to be indexed, cataloged, and pre-rendered, including: configs, content queries, screens, collections, layouts, authentication, authorization, favorites, and many more. These would be uploaded and served from static files. Pre-generated files with the responses would eliminate the need for internal requests, computing, and database lookups.
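To make that concrete, here’s a minimal sketch (in Go, not the actual FOX tooling) of what snapshotting API responses into static files could look like; the endpoint list, origin URL, and output layout are all hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

// Hypothetical endpoint list; the real app has dozens of these
// (configs, screens, layouts, collections, auth, favorites, ...).
var endpoints = []string{
	"/v1/config",
	"/v1/screens/home",
	"/v1/layouts/superbowl",
}

func main() {
	origin := "https://api.example.com" // placeholder for the day-to-day API origin
	outDir := "snapshot"                // local folder later uploaded to static hosting

	for _, ep := range endpoints {
		resp, err := http.Get(origin + ep)
		if err != nil {
			fmt.Fprintf(os.Stderr, "fetch %s: %v\n", ep, err)
			continue
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Fprintf(os.Stderr, "read %s: %v\n", ep, err)
			continue
		}

		// Mirror the URL path on disk so the static origin can serve the
		// response at the exact path the clients already request.
		path := filepath.Join(outDir, strings.TrimPrefix(ep, "/")+".json")
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			fmt.Fprintf(os.Stderr, "mkdir: %v\n", err)
			continue
		}
		if err := os.WriteFile(path, body, 0o644); err != nil {
			fmt.Fprintf(os.Stderr, "write %s: %v\n", path, err)
			continue
		}
		fmt.Println("snapshotted", ep, "->", path)
	}
}
```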

The team was able to do this well before the game because we knew what the state of the app would be the day of the Super Bowl… down to the hour, minute and second. An additional benefit of simplicity was that FOX was offering the stream unauthenticated, so authentication APIs could be pre-rendered as well. Critical to the SB5 setup was having the API responses from these files match all normal output of the everyday system. Due to the timelines and testing, the FOX Tech front-end engineers would only have a small window of time to deploy the apps and make sure all users had the new versions before the game. So matching the responses was key to reducing the modifications that would need to be made to the FOX Sports streaming clients.

For backups to the file hosting, the S3 files were also duplicated to multiple AWS regions (active-passive), so if us-east-1 went down, we had a backup region we could swing to.
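As a rough illustration of duplicating those snapshot objects into a second region, here’s a hedged sketch using the AWS SDK for Go v2; the bucket names, region, and keys are placeholders, and in practice this could just as easily be handled by S3 replication rules.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// Client in the passive region (placeholder region and bucket names).
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-west-2"))
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	srcBucket := "sb5-static-us-east-1" // hypothetical primary bucket
	dstBucket := "sb5-static-us-west-2" // hypothetical backup bucket
	keys := []string{"v1/config.json", "v1/screens/home.json"}

	for _, key := range keys {
		// Server-side copy from the primary bucket into the backup bucket.
		_, err := client.CopyObject(ctx, &s3.CopyObjectInput{
			Bucket:     aws.String(dstBucket),
			CopySource: aws.String(srcBucket + "/" + key),
			Key:        aws.String(key),
		})
		if err != nil {
			log.Fatalf("copy %s: %v", key, err)
		}
		log.Println("replicated", key)
	}
}
```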

In front of those S3 responses, we also had multiple CDNs fronting the output of the files with reverse proxy caching enabled. We utilized a 50/50 split across two major CDN providers. Again, in case one CDN went down, we had the capability to swing all traffic to the other. To balance traffic routing and decisioning between the CDNs, we used DNS-based resolver services.

The FOX Tech team was pretty confident that with this setup, 80% of the app’s APIs could take on a tidal wave of traffic and load the app.

Getting the stream to play

This left the system’s dynamic services to deal with, which centered around playback and accounted for the remaining 20% of app requests.

The team elected to keep these APIs dynamic for SB5 in order to collect valuable streaming data and enable more advanced playback capabilities. These capabilities included being able to point traffic at our different video systems and balance traffic across different CDN providers.

Similar to the static systems, these dynamic systems had a lot of extra functionality removed to reduce I/O and computing and increase local caching, with availability bolstered by multi-region support for the EC2 instances running them. The playback services were coded in Go, which is highly performant.
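For illustration only, and not the actual FOX playback code, a stripped-down Go playback endpoint with a small in-process cache might look something like this; the path, payload shape, and CDN URL are invented.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"sync"
	"time"
)

// PlaybackResponse is a hypothetical payload shape: the client just needs
// a stream URL and which CDN it was assigned to.
type PlaybackResponse struct {
	StreamURL string `json:"streamUrl"`
	CDN       string `json:"cdn"`
}

// A tiny in-process cache so repeated lookups never leave the instance.
type cacheEntry struct {
	value     PlaybackResponse
	expiresAt time.Time
}

var (
	mu    sync.RWMutex
	cache = map[string]cacheEntry{}
)

func resolveStream(channel string) PlaybackResponse {
	mu.RLock()
	if e, ok := cache[channel]; ok && time.Now().Before(e.expiresAt) {
		mu.RUnlock()
		return e.value
	}
	mu.RUnlock()

	// In the real system this is where CDN selection and stream metadata
	// lookups would happen; here it is a static placeholder.
	resp := PlaybackResponse{
		StreamURL: "https://cdn-a.example.com/live/" + channel + "/master.m3u8",
		CDN:       "cdn-a",
	}

	mu.Lock()
	cache[channel] = cacheEntry{value: resp, expiresAt: time.Now().Add(5 * time.Second)}
	mu.Unlock()
	return resp
}

func watchHandler(w http.ResponseWriter, r *http.Request) {
	channel := r.URL.Query().Get("channel")
	if channel == "" {
		channel = "superbowl"
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(resolveStream(channel))
}

func main() {
	http.HandleFunc("/watch", watchHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```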

Despite the simplifications on the playback services, the FOX Tech operations team took no chances. The AWS infrastructure used m5d.4xlarge EC2 instances scaled up to 2,000 instances per region, for a total of 4,000 instances. The setup left us with plenty of room to scale vertically and horizontally, if needed. Each instance included 64 GB of memory and could handle up to 10 Gbps of network bandwidth. Even at that mid-tier instance level, that’s a lot of horsepower for what is essentially a few Go endpoints!

Again, like everything for streaming the game, these playback systems also had lots of redundancy built in. The video services had two complete end-to-end streaming providers sourcing the live streams: the FOX in-house OVP called Media Cloud Live, and another trusted third-party provider. Within the video systems, FOX utilized a total of five CDN providers to host the HLS streams, so there was a LOT of distribution across the United States.

During the game, depending on each video CDN’s performance, FOX Tech’s playback services could direct a certain percentage of traffic to each CDN to maintain the best performance for the end user. This was also a critical backup in case of CDN failure.
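Here’s a hedged sketch of that weighted-steering idea: pick a video CDN per playback request according to operator-set weights that can be shifted toward the healthier CDNs mid-game. The CDN names and weights are made up.

```go
package main

import (
	"fmt"
	"math/rand"
)

// cdnWeights would be updated by operators (or automation) during the game
// based on each CDN's performance; the entries here are invented.
var cdnWeights = map[string]int{
	"cdn-a": 40,
	"cdn-b": 30,
	"cdn-c": 20,
	"cdn-d": 10,
}

// pickCDN returns a CDN name with probability proportional to its weight.
func pickCDN() string {
	total := 0
	for _, w := range cdnWeights {
		total += w
	}
	n := rand.Intn(total)
	for name, w := range cdnWeights {
		if n < w {
			return name
		}
		n -= w
	}
	return "cdn-a" // unreachable fallback
}

func main() {
	counts := map[string]int{}
	for i := 0; i < 100000; i++ {
		counts[pickCDN()]++
	}
	fmt.Println(counts) // roughly matches the configured 40/30/20/10 split
}
```

A weighted pick like this makes shifting traffic (or draining a failing CDN entirely) a configuration change rather than a code change.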

So what about the systems we didn’t control? In case of third-party system downtime, almost every partner FOX Tech relied on for services, like DNS, CDN, hosting, and analytics, had representatives in the command room with our engineers to debug problems if they came up, as well as a backup solution in case of failure. It was a true team effort both internally and externally.

Super Bowl “SB5 mode” system architecture (Source: FOX Tech)

Testing and Enabling

We employed a variety of load testing techniques and systems to replicate “day of” game traffic. The FOX Tech SRE teams used K6 to replicate traffic up to 15k RPS (requests per second), which was only a small fraction of the game-day capacity we needed to account for. But even this smaller load helped reveal bugs and errors in our setup. The team would then fix the errors and try again. We ran these load tests a total of 21 times to iron out every bug we could find. With this much traffic, even a single bug or missing key:value field in a response object would be catastrophic!
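The real tests used K6 (whose scripts are written in JavaScript); purely to illustrate the idea, here’s a small concurrent load-generation loop sketched in Go, with a placeholder target URL, worker count, and duration.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	target := "https://api.example.com/v1/config" // placeholder endpoint
	workers := 200
	duration := 30 * time.Second

	var ok, failed int64
	deadline := time.Now().Add(duration)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			client := &http.Client{Timeout: 5 * time.Second}
			for time.Now().Before(deadline) {
				resp, err := client.Get(target)
				if err != nil || resp.StatusCode != http.StatusOK {
					atomic.AddInt64(&failed, 1)
				} else {
					atomic.AddInt64(&ok, 1)
				}
				if resp != nil {
					resp.Body.Close()
				}
			}
		}()
	}
	wg.Wait()

	fmt.Printf("ok=%d failed=%d (~%.0f RPS)\n", ok, failed,
		float64(ok+failed)/duration.Seconds())
}
```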

For the larger tests to replicate millions of users, we utilized multi-node load testing services, which used resources from 20 data centers around the US to slam our system with real traffic emulating the requests from the FOX Sports apps. In the end, the large-scale tests were able to replicate 100m RPS from users on our system. Yes, you read that right: 100 million requests per second. Slight overkill in retrospect, but well worth the exercise. After these tests, the team was pretty confident in the resiliency of the SB5 setup.

To turn the SB5 system on and redirect all traffic to it, a simple CDN proxy would be our friend.

Example of Super Bowl Mode (SB5) being enabled during large load test — origin traffic dropping to near 0, and performance of SB5 API services skyrocketing with sub millisecond response times.
(Source: FOX Tech)

Our two major CDN providers would be independently configured to point at the new SB5 origins, which would be deployed to run in parallel to the everyday system.

To enable the front-end experience, the apps would use a feature flag. This would essentially turn the FOX Sports app into a dedicated Super Bowl page that replaced the homepage. No need to navigate to a different page on our apps or websites on game day; we know what you’re there for!
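Conceptually, the client-side gate is just a flag check against the (pre-rendered) config. Here’s a toy version sketched in Go for consistency with the other examples; the flag name is invented, and the real clients are native apps.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Toy illustration of the client-side gate; "superBowlMode" is an invented
// flag name.
type AppConfig struct {
	FeatureFlags map[string]bool `json:"featureFlags"`
}

func homeScreen(cfg AppConfig) string {
	if cfg.FeatureFlags["superBowlMode"] {
		return "super_bowl_screen" // dedicated Super Bowl page replaces the homepage
	}
	return "default_home_screen"
}

func main() {
	// This JSON stands in for the pre-rendered config the apps already fetch.
	raw := []byte(`{"featureFlags": {"superBowlMode": true}}`)

	var cfg AppConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	fmt.Println(homeScreen(cfg)) // super_bowl_screen
}
```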

Toggling the SB5 system on and off would essentially be a single click of a button: a proxy config update on the back end and a feature flag on the front end. On Super Bowl Sunday, the proxy would be enabled a few hours before the pre-game coverage started, and the feature flag in the apps would be turned on once we were confident the SB5 APIs were live.

Keeping these switches simple was important for clarity (“are we live yet?”) and for seamless backwards compatibility to maintain the user experience before and after the game. Within a few seconds, we could tell whether our SB5 services were serving the traffic. The teams were able to confirm SB5 was up, running, and performing as expected using a variety of tools, including CDN logs, infrastructure dashboards, data analytics dashboards, and external monitors pinged from HTTP request tools.
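As an example of that last category, here’s a minimal sketch of an external “is SB5 serving?” monitor; the URLs and the response header it inspects are placeholders, not FOX’s actual monitoring setup.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 3 * time.Second}
	endpoints := []string{
		"https://api.example.com/v1/config",        // placeholder static endpoint
		"https://api.example.com/watch?channel=sb", // placeholder playback endpoint
	}

	for {
		for _, url := range endpoints {
			start := time.Now()
			resp, err := client.Get(url)
			if err != nil {
				fmt.Printf("%s DOWN: %v\n", url, err)
				continue
			}
			// A response header set only by the SB5 origin/proxy would confirm
			// which system actually served the request (header name is invented).
			served := resp.Header.Get("X-Served-By")
			resp.Body.Close()
			fmt.Printf("%s %d via %q in %v\n", url, resp.StatusCode, served, time.Since(start))
		}
		time.Sleep(10 * time.Second)
	}
}
```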

Additionally, something not yet mentioned about the main architecture is that we put in place a backup to the backup. This was an even simpler Super Bowl system we could fall back to in case our dynamic video services went down. We called this SB0 mode. This mode was put in place as a “worst case scenario”.

So what would an SB0 event look like? It could be something like our EC2 instances getting overwhelmed, our memcache running out of memory, or our API gateway going down.

SB0 could be triggered in two different ways: automatically or manually. The SB5 video system had failsafe switches built in to automatically fall back to this mode in the case of specific response errors or timeouts. On the other hand, if a non-predicted failure was detected by engineers, the operations team could also manually trigger a CDN proxy to enable this mode, but only as a last resort.

Enabling SB0 mode would turn the video system into its most rudimentary form: a single file response with a single HLS stream source, making us blind to all traffic, stats, and control, but maintaining the video feed for the end user.
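To illustrate the automatic fallback, here’s a hedged sketch of a playback handler that serves a single static HLS source whenever the dynamic path errors or times out; the URLs and time budget are invented.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"time"
)

// sb0Response is the "most rudimentary form": one hard-coded HLS source,
// no stats, no steering. The URL is a placeholder.
const sb0Response = `{"streamUrl":"https://cdn.example.com/live/superbowl/master.m3u8"}`

// resolveDynamic stands in for the normal SB5 playback path (CDN steering,
// stream metadata lookups, etc.). Here it just waits a bit and succeeds.
func resolveDynamic(ctx context.Context) (string, error) {
	select {
	case <-time.After(50 * time.Millisecond):
		return `{"streamUrl":"https://cdn-a.example.com/live/superbowl/master.m3u8","cdn":"cdn-a"}`, nil
	case <-ctx.Done():
		return "", ctx.Err() // an error or timeout triggers the SB0 fallback
	}
}

func watchHandler(w http.ResponseWriter, r *http.Request) {
	// Give the dynamic path a short time budget; on any error, fall back to SB0.
	ctx, cancel := context.WithTimeout(r.Context(), 250*time.Millisecond)
	defer cancel()

	w.Header().Set("Content-Type", "application/json")
	body, err := resolveDynamic(ctx)
	if err != nil {
		fmt.Fprint(w, sb0Response) // SB0: keep the video playing, accept losing stats and control
		return
	}
	fmt.Fprint(w, body)
}

func main() {
	http.HandleFunc("/watch", watchHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```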

Test verifying SB5 mode had been enabled for /watch playback service. One of thousands of tests performed as a monitor during the game. (Source: FOX Tech)

Aftermath

The SB5 system was bulletproof and never skipped a beat. During the game, 100 or so engineers huddled together in a few rooms in our Tempe, AZ Broadcast Center, the Mothership for all things Super Bowl. In fact, Super Bowl streaming operations were pretty much uneventful. That is an amazing thing to say about such a stressful event, which could have turned chaotic in a matter of seconds. We did our job and prepared well. And our engineers actually got to enjoy watching most of the game!

Overall, the system as a whole sustained well over 7 million concurrent users playing back the streams without any issue, making it, at the time of this writing, the most streamed Super Bowl ever.

Data dashboard showing caching efficiency on CDN between static and dynamic services. (Source: FOX Tech)

During peak traffic, the system hit a max of 10m+ RPS across the two CDNs, and the apps loaded in roughly 500ms. The CDNs had 99% cache efficiency on the static endpoints and 100% uptime. The dynamic watch endpoints scaled across the multiple regions, and the EC2 instances were never stressed. Within all this, I never mentioned that the HLS stream was in 4K! FOX is the only broadcaster to date to stream the Big Game in 4K, and this is the second time we’ve done it, the first being the 2020 Super Bowl. Additionally, a cool data point we learned after the game: FOX’s video feeds had only a one-second delay from the game action and actually streamed faster than the cable broadcast feed!

So did anything go wrong? Well, sure. We had a few small issues to deal with during the game, like one of our video CDN providers going down. Due to the built-in redundancy, the team was able to redirect the failing CDN’s requests to the other video CDN providers. Thankfully, we never had to fall back to SB0.

And that’s it!

A final note from me to some technical readers: yes, this architecture might seem like overkill, or even simplistic at the engineering level, to pre-render API responses. But when you have ONE CHANCE to be successful, better safe than sorry. Treat the effort like a multi-billion-dollar NASA rocket launch with priceless cargo. Because for all intents and purposes, it is. We wanted the game to be the headline story the next day, not us.

ABOUT FOX TECH

Make Your Mark Here.
At FOX, we pride ourselves in shaking things up and making things happen. We’re a community of builders, operators and innovators and each and every day we experiment, collaborate, and co-create to develop the next world of news, sports & entertainment streaming technology.

While being one of the most well-known brands in the world, we provide our employees with the culture of a start-up — fast paced, non-hierarchical, full of smart ideas & innovation and most importantly, the knowledge that each member of the team is making a difference in defining what’s next for FOX Tech. Simply put, we love to do great work, with great people.

Learn more and join our team: https://www.foxcareers.com/
