Optimizing Content Quality Control at Netflix with Predictive Modeling
Over 69 million Netflix members stream billions of hours of movies and shows every month in North and South America, parts of Europe and Asia, Australia and New Zealand. Soon, Netflix will be available in every corner of the world with an even more global member base.
As we expand globally, our goal is to ensure that every member has a high-quality experience every time they stream content on Netflix. This challenging problem is impacted by factors that include quality of the member’s Internet connection, device characteristics, content delivery network, algorithms on the device, and quality of content.
We previously looked at opportunities to improve the Netflix streaming experience using data science. In this post, we’ll focus on predictive modeling to optimize the quality control (QC) process for content at Netflix.
An important aspect of the streaming experience is the quality of the video, audio, and text (subtitle, closed captions) assets that are used.
Imagine sitting down to watch the first episode of a new season of your favorite show, only to find that the video and audio are off by 20 seconds. You decide to watch it anyway and turn on subtitles to follow along. What if the subtitles are poorly positioned and run off the screen?
Depending on the severity of the issue, you may stop watching, or continue because you’re already invested in the content. Either way, it leaves a bad impression and can negatively impact member satisfaction and retention. Netflix sets a high bar on content quality and has a QC process in place to ensure this bar is met. Let’s take a quick look at how the Netflix digital supply chain works and the role of the QC process.
We receive assets either from the content owners (e.g. studios, documentary filmmakers) or from a fulfillment house that obtains content from the owners and packages the assets for delivery to Netflix. Our QC process consists of automated and manual inspections to identify and replace assets that do not meet our specified quality standards.
Automated inspections are performed before and after the encoding process that compresses the larger “source” files into a set of smaller encoded distribution files (at different bitrates, for different devices, etc.). Manual QC is then done to check for issues easily detected with the human eye: depending on the content, a QCer either spot checks selected points of the movie or show, or watches the entire duration of the content. Examples of issues caught during the QC process include video interlacing artifacts, audio-video sync issues, and text issues such as missing or poorly placed subtitles.
It is worth noting the fraction of assets that fail quality checks is small. However, to optimize the streaming experience, we’re focused on detecting and replacing those sub-par assets. This is even more important as Netflix expands globally and more members consume content in a variety of new languages (both dubbed audio and subtitles). Also, we may receive content from new partners who have not delivered to us before and are not familiar with our quality standards.
Predictive Quality Control
As the Netflix catalog, member base, and global reach grow, it is important to scale the manual QC process by identifying defective assets accurately and efficiently.
Looking at the data
Data and data science play a key role in how Netflix operates, so the natural question to ask was:
Can we use data science to help identify defective assets?
We looked at the data on manual QC failures and observed that certain factors affected the likelihood of an asset failing QC. For example, some combinations of content and fulfillment partners had a higher rate of defects for certain types of assets. Metadata related to the content also showed patterns of failure. For example, older content (by release year) had a higher defect rate, likely due to the use of older formats for the creation and storage of assets. The genre of the content also exhibited certain patterns of failure.
These types of factors were used to build a machine learning model that predicts the probability that a delivered asset would not meet the Netflix quality standards.
A predictive model to identify defective assets helps in two significant ways:
- Scale the content QC process by reducing QC effort on assets that are not defective.
- Improve member experience by re-allocating resources to the discovery of hard-to-find quality issues that may otherwise be missed due to spot checks.
Using results from past manual QC checks, a supervised machine learning (ML) approach was used to train a predictive quality control model that predicts a “fail” (likely has content quality issue) or “pass.” If an asset is predicted to fail QC, it is sent to manual QC. The modified supply chain workflow with the predictive QC model is shown below.
A key goal of the model is to identify all defective assets even if this results in extra manual checks. Hence, we tuned the model for low false-negative rate (i.e. fewer uncaught defects) at the cost of increased false-positive rate.
Given that only a small fraction of the delivered assets are defective, one of the main challenges is class imbalance in the training data, i.e. we have a lot more data on “pass” assets than “fail” assets. We tackled this by using cost-sensitive training that heavily penalizes misclassification of the minority class (i.e. defective assets).
As with most model-building exercises, domain knowledge played an important role in this project. An observation that led to improved model performance was that defective assets are typically delivered in batches. For example, video assets from episodes within the same season of a show are mostly defective or mostly non-defective. It’s likely that assets in a batch were created or packaged around the same time and/or with the same equipment, and hence with similar defects.
We performed offline validation of the model by passively making predictions on incoming assets and comparing with actual results from manual QC. This allowed us to fine tune the model parameters and validate the model before deploying into production. Offline validation also confirmed the scaling and quality improvement benefits outlined earlier.
Predictive QC is a significant step forward in ensuring that members have an amazing viewing experience every time they watch a movie or show on Netflix. As the slate of Netflix Originals grows and more aspects of content creation — for example, localization, including subtitling and dubbing — are owned by Netflix, there is opportunity to further use data to improve content quality and the member experience.
We’re continuously innovating with data to build creative models and algorithms that improve the streaming experience for Netflix members. The scale of problems we encounter — Netflix accounts for 37.1% of North American downstream traffic at peak — provides for a set of unique modeling challenges. Also, we partner closely with the engineering teams to design and build production systems that embed such machine learning models. If you’re interested in working in this exciting space, please check out the Streaming Science & Algorithms and Content Platform Engineering positions on the Netflix jobs site.
Optimizing the Netflix Streaming Experience with Data Science
deep analysis and predictive algorithms
Originally published at techblog.netflix.com on December 10, 2015.