Behind the Scenes: How JioCinema Seamlessly Streams IPL Matches to 20 Million Fans

Abhinav C V
9 min read · May 25, 2024


It’s IPL season, and it feels like everyone in India is glued to their screens watching the IPL. No, seriously, I mean every SINGLE person is watching it. Your friends, family, neighbours, strangers on the road, everyone! Well, except for a few like me who get a bit impatient waiting for the fielder to pick up the ball, throw it back, and then wait for the next delivery. It just takes too long. Sorry, not sorry. But that’s not what I want to talk about here.

While watching an IPL match recently, my eyes shifted to the top right corner of the screen, which displayed the number of people watching the live stream. A whopping 17M/1.7 crore people were watching it! Naturally, I wondered how JioCinema handles such traffic with such ease. After scouring the internet, I found out it isn’t easy at all; it only looks easy to us on the client side because of the engineering behind it.

So let’s dive into some of the strategies JioCinema employs to stream IPL matches to over 20M/2 crore fans around the globe.

Comprehensive Metrics and Audits

To ensure everything operates smoothly, JioCinema conducts regular audits and reviews of key system components:

Audit Review: Regular checks of CPU, memory, CDN, network, and databases to identify and address potential breaking points. Partners also participate in these audits to ensure comprehensive coverage.

Performance Audits: Audits span from frontend elements to backend systems, including CDNs, load balancers, services, and databases.

Frontend Optimization

Ensuring a seamless user experience starts with a stable and efficient frontend:

Feature Flags

All changes pushed to production are introduced behind feature flags so their impact can be monitored. If a feature causes an increase in crash rates, it is promptly switched off. This level of control allows JioCinema to prioritize its services efficiently.
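To make the idea concrete, here is a minimal feature-flag gate. This is purely illustrative; JioCinema’s actual flag system and flag names are not public, so `FLAGS` and `personalized_home` are made-up examples.

```python
# Minimal feature-flag gate (illustrative; flag names are invented).
# Flags are looked up at runtime, so a risky feature can be switched
# off remotely without shipping a new build.
FLAGS = {"personalized_home": True, "new_player_ui": False}

def is_enabled(flag: str) -> bool:
    # Unknown flags default to off: a safe failure mode.
    return FLAGS.get(flag, False)

def render_home() -> str:
    if is_enabled("personalized_home"):
        return "personalized home screen"
    return "default home screen"

print(render_home())  # -> personalized home screen
```

In a real system the flag values would come from a remote config service, so flipping a flag takes effect without redeploying the app.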

Scenario Simulations

JioCinema uses a tool called Charles, which simulates various bandwidths and latencies on your internet connection. It lets you mimic slow, modem-like conditions even on a high-speed connection. Bandwidth can be throttled to an arbitrary number of bytes per second, so any connection speed can be simulated.

Using this, various failure scenarios (failed network requests and API calls, DNS failures, and backend errors) are simulated to ensure the application is ready.

Graceful Degradation

Failures are managed so that they are invisible to users. This is something I found genuinely interesting. Normally, when I think about error handling, my mind goes to:

  1. Obviously, handling the error
  2. Displaying the error

But I just realized that displaying errors is not always necessary. With some clever strategies, you can hide the error so users never notice something isn’t working. Suppose the personalized home screen fails for a moment; there’s no need to worry! You can simply show the default home screen instead of an error. User satisfaction is the most important factor when developing a consumer product; everything else comes next.

They also implement a technique called exponential backoff. Suppose an API call fails on the first attempt but succeeds after a few retries. Retrying immediately and repeatedly is inefficient and can overload the server. Exponential backoff, a popular technique for handling failed HTTP requests, increases the waiting time between retries exponentially: the first retry waits 1 second, the second waits 2 seconds, the third waits 4 seconds, and so on. This prevents a flood of retries from overwhelming the server.
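The retry pattern described above can be sketched in a few lines. This is a generic implementation of exponential backoff, not JioCinema’s actual client code; the jitter term is a common addition that keeps many clients from retrying in lockstep.

```python
import random
import time

def fetch_with_backoff(call, max_retries=4, base_delay=1.0):
    """Retry `call` with exponentially growing waits (1s, 2s, 4s, ...).

    A small random jitter is added so that thousands of clients that
    failed at the same moment don't all retry at the same moment too.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

A call that fails twice and then succeeds would wait roughly 1s, then 2s, before returning normally on the third attempt.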

Backend Infrastructure

The backend infrastructure is designed to handle high loads and ensure reliability:

Multi CDN Strategy

JioCinema uses multiple CDNs to avoid over-reliance on a single provider. They have an in-house service called the multi-CDN optimizer, which monitors the load on each CDN and selects the least busy one for the client. Essentially, when a client sends a request to the backend, the backend leverages this multi-CDN optimizer to choose the optimal CDN. The selected CDN then handles the request and returns the response.
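The selection logic can be pictured as a least-loaded choice among providers. The CDN names and the single load metric below are made up for illustration; the real multi-CDN optimizer would use live telemetry across many dimensions (and likely per region).

```python
# Toy version of a "multi-CDN optimizer": pick the least-loaded CDN.
# Names and load values are illustrative, not real providers/metrics.
def pick_cdn(cdn_loads: dict) -> str:
    """Return the CDN with the lowest current load (0.0 to 1.0)."""
    return min(cdn_loads, key=cdn_loads.get)

loads = {"cdn-a": 0.82, "cdn-b": 0.41, "cdn-c": 0.67}
print(pick_cdn(loads))  # -> cdn-b
```

The client then streams from the chosen CDN, so no single provider becomes a bottleneck or a single point of failure.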

Database Scaling

Scaling databases to handle massive concurrent traffic is one of the most challenging aspects of streaming IPL matches. You might think that simply switching on the “auto-scale” feature of your cloud provider will do all the work for you, but that is not the case for a huge live-streaming platform like JioCinema. Auto-scaling typically takes around 30–60 minutes to kick in, and by then the live stream would already be degraded. So JioCinema employs a different method.

JioCinema prepares for database scaling well in advance because they can’t request X nodes from a cloud provider out of the blue; they need to plan and communicate with the providers months before the live stream. JioCinema performs a back-of-the-envelope calculation by having each backend system owner start with a single Kubernetes pod. They set specific resource thresholds, such as keeping CPU utilization at 60% and memory within a safe range, to ensure that APIs respond within a few seconds.

To prepare for up to 20 million concurrent users, they simulate the expected load, which involves calculating the number of API calls and the database queries those calls generate. These simulations estimate the request volume and resulting database load needed to stay within the target CPU utilization and latency. Based on the results, they determine the necessary number of nodes and prescale the database infrastructure before the game begins, ensuring the system can handle the high traffic seamlessly.
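A back-of-the-envelope estimate of this kind might look like the following. Every number here (calls per user, queries per call, per-node throughput) is an invented assumption for illustration, not JioCinema’s real figures; only the 60% CPU target comes from the description above.

```python
import math

# Back-of-the-envelope prescaling estimate. All inputs below are
# illustrative assumptions, not JioCinema's actual numbers.
def nodes_needed(concurrent_users, api_calls_per_user_per_min,
                 db_queries_per_api_call, queries_per_node_per_sec,
                 target_cpu_utilization=0.60):
    api_calls_per_sec = concurrent_users * api_calls_per_user_per_min / 60
    db_queries_per_sec = api_calls_per_sec * db_queries_per_api_call
    # Keep each node at ~60% CPU, so only count 60% of raw capacity.
    effective_capacity = queries_per_node_per_sec * target_cpu_utilization
    return math.ceil(db_queries_per_sec / effective_capacity)

# 20M users, 2 API calls/user/min, 3 DB queries per call,
# 50k queries/sec raw capacity per node:
print(nodes_needed(20_000_000, 2, 3, 50_000))  # -> 67
```

The point of the exercise is the method, not the numbers: measure one pod, multiply out to the expected load, then prescale to that node count before the match.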

Panic Mode

Suppose the app experiences sudden unforeseen spikes, such as when Dhoni comes to bat, causing a surge on the playback page. When Dhoni gets out, many fans switch to the home page, resulting in spikes in other APIs. Yeah, typical CSK fans. In such scenarios, JioCinema activates Panic Mode.

Automated snapshots of expected API responses are taken before matches and stored in static storage. This allows the system to handle high-load scenarios without real-time processing. During Panic Mode, the CDN is configured to retrieve data from the static storage servers instead of the origin servers.
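The fallback path can be sketched as a simple lookup before the origin call. The `snapshots` dictionary stands in for the static storage of pre-captured API responses; the path and payload are invented for illustration.

```python
# Sketch of the panic-mode fallback: serve a pre-captured snapshot
# instead of doing real-time work at the origin. The path and JSON
# payload below are illustrative.
snapshots = {"/api/home": '{"rails": ["trending", "live"]}'}
PANIC_MODE = True

def handle(path: str, origin_call) -> str:
    if PANIC_MODE and path in snapshots:
        return snapshots[path]   # static response, no origin load
    return origin_call(path)     # normal path: go to the origin
```

In the real setup this switch happens at the CDN layer, which is repointed from the origin servers to static storage, so origin capacity is untouched during the spike.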

Cache Management and Offloading

Increasing the cache TTL on the client side helps reduce server load. Cache offloading is also a major focus. Normally, a backend engineer wants all requests to reach their origin server so they can boast about how many transactions per second it handles. But the real engineering is in efficient cache offloading, ideally above 90%. The way you design the caching policy at the CDN is the most important factor.

If you don’t know what cache offloading is, it’s the process of serving user requests directly from the cache (the CDN’s local storage) rather than fetching from the origin server. A cache offload above 90% means at least 90% of requests are served by the cache. A caching policy covers how you cache data, what kind of data you cache, when to evict, which entries to evict, when to replace them with new content, and how long to keep them. Designing a good policy is crucial for achieving high cache-offloading efficiency.
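Two pieces of such a policy, TTL and the offload ratio, can be shown with a toy cache. A real CDN policy is far richer (size limits, content types, per-region rules); this sketch only illustrates the mechanics.

```python
import time

# Toy TTL cache illustrating two pieces of a caching policy:
# how long entries live (TTL) and how offload is measured.
class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}                 # key -> (value, expiry time)
        self.hits = self.misses = 0

    def get(self, key, fetch_from_origin):
        entry = self.store.get(key)
        now = time.monotonic()
        if entry and entry[1] > now:
            self.hits += 1
            return entry[0]             # served from cache: offloaded
        self.misses += 1
        value = fetch_from_origin(key)  # cache miss: hit the origin
        self.store[key] = (value, now + self.ttl)
        return value

    def offload_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With a 60-second TTL, two back-to-back requests for the same key produce one miss and one hit, i.e. a 50% offload; at scale, long-lived popular objects are what push that ratio above 90%.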

Ladder Scale Down

We’ve talked a lot about scaling up in the sections above, but what about scaling down? Does JioCinema know about a concept called “downscaling”? Thankfully, they do. JioCinema follows a ladder-based approach to scaling down, meaning they do it gradually. Instead of dropping from 30 million to 2 million users abruptly, they scale down smoothly from 30 million to 25 million, then to 20 million, and so on. This gradual downscaling happens after the match because most users don’t exit the app immediately; they might stay to watch another show, series, or match. The process is paired with liveness checks and readiness probes to ensure stability as traffic decreases.
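The ladder itself is simple to express. The step size and floor below are illustrative; in production each step would also wait for readiness and liveness probes to pass before taking the next one.

```python
# Ladder-style scale-down: step capacity down in stages rather than
# all at once. Step size and floor are illustrative.
def scale_down_ladder(current, floor, step):
    """Yield successive capacity targets from `current` down to `floor`."""
    while current > floor:
        current = max(current - step, floor)
        # Real systems would pause here until health probes pass.
        yield current

print(list(scale_down_ladder(30_000_000, 2_000_000, 5_000_000)))
# -> [25000000, 20000000, 15000000, 10000000, 5000000, 2000000]
```

Each intermediate target gives lingering viewers (the ones who stay for another show) enough capacity while infrastructure is released in safe increments.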

Asynchronous Processing

To manage tasks that don’t need real-time processing, JioCinema relies heavily on Kafka:

What is Kafka?

It’s a distributed event streaming platform used for building real-time data pipelines and streaming applications. It was originally developed by LinkedIn and is now an open-source project under the Apache Software Foundation. Kafka is designed for high-throughput, low-latency data transmission.

Now that we know what it is, let’s see how JioCinema uses Kafka.

Kafka Utilization

Kafka handles asynchronous processes with considerations for throughput, producer/consumer rates, and partitions. Data is processed later from local storage if Kafka is down, ensuring smooth operation during live matches.

One such example is the live viewer counter at the top right of the screen, which uses Kafka. My assumption of how this works: the consumer doesn’t need the exact number of viewers at the exact moment, so it isn’t strictly real-time. The producer can therefore send data to Kafka without waiting for the consumer to process it, and consumers read and process the data independently. The client can even store the data locally and process it later when ready.
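The decoupling can be illustrated with an in-memory queue standing in for a Kafka topic. This is only a toy model of the pattern: a real deployment would use a Kafka client library against a broker, and the viewer counts below are invented.

```python
import queue
import threading

# In-memory stand-in for a Kafka topic: the producer pushes events
# without waiting, and a consumer processes them independently.
# Viewer counts are illustrative.
events = queue.Queue()

def produce_viewer_counts():
    for count in [17_000_000, 17_100_000, 17_250_000]:
        events.put(count)   # fire-and-forget: producer never blocks
    events.put(None)        # sentinel: no more events

def consume_latest():
    latest = None
    while True:
        count = events.get()
        if count is None:
            break
        latest = count      # the UI only needs a recent value,
                            # not every event in real time
    return latest

t = threading.Thread(target=produce_viewer_counts)
t.start()
print(consume_latest())  # -> 17250000
t.join()
```

The key property is that the producer’s throughput is independent of the consumer’s pace; if the consumer stalls, events simply accumulate until it catches up.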

Ad Insertion Mechanism

This is a very interesting topic in its own right and deserves a separate article. But let’s look at how JioCinema does it without going too deep. Delivering ads without disrupting the viewing experience is crucial from both a business and a user standpoint.

Live Stream Ad Insertion

You might be wondering: who decides when to play an ad? Is it all automatic, or does someone actually direct it? Well, you probably guessed wrong; there is actually a “director” for each stream who listens to the commentary, watches the match, and decides when to trigger the ad. I say each stream because the live stream is produced in different languages, and each language stream has its own director.

So there are different types of ads; let’s see which ones JioCinema utilizes.

Static Ads

Here, a selected ad is displayed across the network to every single user. Now imagine all 20M users getting the same ad. This can lead to over-delivery, which is when an ad platform delivers an ad to more people than was originally contracted or budgeted for. JioCinema doesn’t want this.

Client-side Ads

Client-side ad insertion is an interesting topic. It’s often triggered by SCTE-35 markers: standardized signals embedded within the video stream that indicate the precise moments for ad placement. SCTE is short for the Society of Cable and Telecommunications Engineers, and SCTE-35 markers are used in most OTT workflows. They can be used in both HLS and MPEG-DASH streams. Here’s an example of how an SCTE-35 marker can be placed in an MPEG-DASH manifest.

<MPD>
  <Period start="PT0S" id="1">
    <!-- Content Period -->
  </Period>

  <Period start="PT32S" id="2">
    <!-- Ad Break Period -->
    <EventStream timescale="90000"
                 schemeIdUri="urn:scte:scte35:2014:xml+bin">
      <Event duration="2520000" id="1">
        <Signal xmlns="urn:scte:scte35:2013:xml">
          <Binary>/DAlAAAAAAAAAP/wFAUAAAAEf+/+kybGyP4BSvaQAAEBAQAArky/3g==</Binary>
        </Signal>
      </Event>
    </EventStream>
  </Period>

  <Period start="PT60S" id="3">
    <!-- Content Period -->
  </Period>
</MPD>
The above XML is part of an MPEG-DASH MPD (Media Presentation Description), which describes the structure of the media stream. It includes three Periods: a content Period, an ad-break Period, and another content Period. Notice that the ad-break Period has the attribute start=”PT32S”, which, as you can guess, means the ad break starts 32 seconds into the presentation.
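One more detail worth unpacking: the Event’s duration attribute is not in seconds but in ticks of the EventStream’s timescale, so the length of the ad break takes a small calculation.

```python
# Converting the SCTE-35 Event duration from the MPD example:
# duration is expressed in ticks of the EventStream's timescale.
timescale = 90_000          # ticks per second (timescale attribute)
duration_ticks = 2_520_000  # Event duration attribute
print(duration_ticks / timescale)  # -> 28.0 (a 28-second ad break)
```

So the marker above cues a 28-second ad break starting at the 32-second mark.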

So, when the client detects the marker, it cues the ad. However, this method can cause shifts in the average viewing rate (AVR), as the ad content and its playback may differ from user to user, leading to variations in the overall viewing experience.

Server-side Ads

Users are categorized into cohorts based on geography, interests, and viewing habits (entertainment viewers, sports viewers, and so on). Ads are then targeted per cohort, and the server pushes the ad to the client immediately.

Conclusion…

What we’ve looked at is a high-level overview. Before I dive into writing in-depth blogs about system design concepts, I believe I need to study them more thoroughly myself. Hopefully, I’ll be able to publish detailed system design-related blogs soon. In the meantime, I recommend watching this video by Arpit Bhayani to gain a better understanding of what we’ve just discussed. Thanks for reading!
