How JioCinema live streams IPL to 20 million concurrent devices

Published in Debugging Diaries

2 min read · May 23, 2024

While scrolling through YouTube, I came across a recommended video that piqued my interest. Fascinated by large-scale systems, I decided to listen to the conversation between Arpit Bhayani and Prachi Sharma. Here are my key takeaways:

Handling System Failures and Setting Priorities

Things will inevitably go wrong: no matter how much effort you put into making your system robust, failures are bound to happen. What matters is how you handle them and whether your priorities are set correctly. Prachi shared valuable insights on how they classify features into P0, P1, and P2 categories. With features prioritized this way, the team can decide which issues to address first and keep the system running smoothly.
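To make the idea concrete for myself, I wrote a minimal sketch of priority tiers. The talk only mentions the P0/P1/P2 labels, so the feature names, their tier assignments, and the `triage` and `can_disable` helpers are my own assumptions, not JioCinema's actual setup.

```python
from enum import IntEnum
from typing import Dict, List


class Priority(IntEnum):
    P0 = 0  # the product is broken without it
    P1 = 1  # important, but the product still works without it
    P2 = 2  # nice to have


# Hypothetical mapping for illustration only; not taken from the talk.
FEATURES: Dict[str, Priority] = {
    "video_playback": Priority.P0,
    "login": Priority.P0,
    "scorecard": Priority.P1,
    "chat": Priority.P2,
}


def triage(failing: List[str]) -> List[str]:
    """Order failing features so the most critical tier is handled first."""
    return sorted(failing, key=lambda name: (FEATURES[name], name))


def can_disable(name: str) -> bool:
    """Only features below P0 are candidates for being switched off under stress."""
    return FEATURES[name] > Priority.P0
```

With this mapping, `triage(["chat", "video_playback", "scorecard"])` puts `video_playback` first and `chat` last, which is the ordering an on-call engineer would want during an incident.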

Ensuring Seamless User Experience Despite Errors

Prachi also emphasized that customers should not always be presented with an error screen when something goes wrong. For example, if the chat feature fails during a live match, it shouldn’t interrupt the entire stream with an error message. Instead, the main content should continue running, so users keep a seamless experience of the primary features even when secondary features fail. This approach helps maintain user satisfaction and minimizes disruptions.
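Here is a small sketch of how the chat example could look in code. It is my own illustration of graceful degradation, assuming a page assembled from a primary loader and an optional one; `load_optional`, `build_match_page`, and the loader callables are hypothetical names, not anything described in the talk.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("match_page")


def load_optional(name: str, loader: Callable[[], dict]) -> Optional[dict]:
    """Run a secondary feature's loader; on failure, log it and return None."""
    try:
        return loader()
    except Exception:
        logger.warning("secondary feature %r failed; hiding it", name)
        return None


def build_match_page(
    load_stream: Callable[[], dict],
    load_chat: Callable[[], dict],
) -> dict:
    """Assemble the page. The stream is primary, chat is secondary."""
    # A stream failure is allowed to propagate: without it there is no page.
    page = {"stream": load_stream()}

    # A chat failure is swallowed: the page is simply rendered without chat.
    chat = load_optional("chat", load_chat)
    if chat is not None:
        page["chat"] = chat
    return page
```

If `load_chat` raises, the caller still gets a page containing the stream, just without a `chat` key, so the UI can hide the chat panel instead of showing an error screen.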

The Importance of Efficiently Scaling Down

While many focus on scaling up when discussing large-scale systems, Prachi highlighted the importance of efficiently scaling down. After a game, it’s not feasible to simply shut down all the extra systems, as a significant number of users remain on the platform exploring other content. To address this, they use a laddering approach to scale down gradually, adjusting the system based on the traffic on the platform. This method ensures resources are used efficiently while still accommodating the ongoing user activity.
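The talk does not spell out how the ladder works, so the following is only my guess at the shape of the idea: scale up immediately when traffic demands it, but scale down by at most one rung per evaluation. The `next_capacity` function and the 25% step size are assumptions I picked for illustration.

```python
import math


def next_capacity(
    current: int,
    needed: int,
    max_step_fraction: float = 0.25,
    floor: int = 1,
) -> int:
    """Return the capacity to run at after this evaluation.

    Scale-up is immediate. Scale-down removes at most `max_step_fraction`
    of the current capacity per call, and never goes below what current
    traffic needs or below `floor`.
    """
    needed = max(needed, floor)
    if needed >= current:
        return needed
    # Drop at least one unit so small fleets can still shrink.
    max_drop = max(1, math.floor(current * max_step_fraction))
    return max(needed, current - max_drop)
```

Starting from 100 units with traffic that only needs 10, repeated calls step down gradually (100, then 75, then 57, and so on) rather than dropping straight to 10, which leaves headroom for users who stay on the platform after the match.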

Implementing Panic Mode for Critical Failures

One particularly intriguing aspect of the conversation was the discussion about “panic mode.” In the event of a critical failure, such as API calls failing so that user requests cannot be served, they have designed a backend feature to handle this scenario. When activated, this feature returns static responses through the CDN wherever possible. The approach reflects their belief in mitigating issues as far as they can before customers ever see an error.
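As a rough sketch of the concept, here is how a panic switch might route requests. Everything in it is an assumption on my part: the `PanicSwitch` class, the `STATIC_RESPONSES` table (standing in for snapshots a CDN would serve), and the `handle` function are hypothetical and not how JioCinema implements it.

```python
from typing import Callable, Dict

# Hypothetical pre-generated snapshots; in a real setup these would be
# static files served from the CDN rather than an in-memory dict.
STATIC_RESPONSES: Dict[str, dict] = {
    "/home": {"rails": ["Live now", "Highlights"]},
}


class PanicSwitch:
    """A flag that operators flip during a critical failure."""

    def __init__(self) -> None:
        self.enabled = False


def handle(
    path: str,
    live_handler: Callable[[str], dict],
    switch: PanicSwitch,
) -> dict:
    """Serve a static snapshot in panic mode when one exists, else go live."""
    if switch.enabled and path in STATIC_RESPONSES:
        return {"source": "static", "body": STATIC_RESPONSES[path]}
    # No snapshot for this path (or panic mode is off): use the real backend.
    return {"source": "live", "body": live_handler(path)}
```

The "wherever possible" part shows up in the second branch: paths without a static snapshot still reach the live backend even while panic mode is on.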

Conclusion

Alright, folks, that’s a wrap for this blog! We’ve picked up some cool strategies for handling big system challenges. Remember, when things go wrong, it’s all about keeping the user experience seamless, whether it’s through smart scaling or innovative panic modes. So, next time you’re browsing your favorite platform, know that there’s a team working hard to keep things running smoothly behind the scenes. Thanks for joining in, and here’s to glitch-free browsing ahead!