Unveiling the Age-Old Question: How User Rating Scores Correlate with Audience Content Preferences

Nov 22, 2019

Most of us consume content on various platforms and in different ways. Some prefer old-fashioned shows, while others seek out new or trending shows online. Some choose to watch a film or binge-watch their favorite series. This brings us to one of the most popular platforms: Netflix. This platform has transformed the status quo and changed the way we view content, whether on the go or in the comfort of our living rooms.

As a former filmmaker and a future data scientist, my curiosity lies in understanding the variety of content available for individuals to consume and the ideology behind selecting and suggesting content for viewers.

To build this dataset, its creator relied on Netflix’s suggestion engine, because collecting 1,000 shows one by one would have taken a vast amount of time. The suggestion engine recommends shows similar to those already selected. “As part of this data set, I took 4 videos from 4 ratings (totaling 16 unique shows), then pulled 53 suggested shows per video. The ratings include G, PG, TV-14, TV-MA. I chose not to pull from every rating (e.g. TV-G, TV-Y, etc.).” - Chase Willden

Source: data.world Netflix Suggestion Engine

Upon delving into the data, we discovered that the dataset contained approximately a thousand shows. However, after analyzing it, we found more than half of the titles to be duplicates, leaving us with 495 unique titles to examine.
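The de-duplication step can be sketched in plain Python. This is a minimal illustration, not the code used for the analysis: the rows and the field names (`title`, `rating`, `user_rating_score`) are hypothetical stand-ins for the dataset's columns, and matching titles case-insensitively is an assumption.

```python
# Hypothetical rows standing in for the dataset; field names are assumptions.
rows = [
    {"title": "Show A", "rating": "TV-14", "user_rating_score": 82},
    {"title": "Show B", "rating": "PG", "user_rating_score": 91},
    {"title": "Show A", "rating": "TV-14", "user_rating_score": 82},
    {"title": "show b ", "rating": "PG", "user_rating_score": 91},
]


def drop_duplicate_titles(rows):
    """Keep the first row seen for each title, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        # Normalize so that stray whitespace or casing does not hide a duplicate.
        key = row["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


unique_rows = drop_duplicate_titles(rows)
print(len(rows), "rows ->", len(unique_rows), "unique titles")
```

With a pandas DataFrame, the equivalent step would typically be `DataFrame.drop_duplicates` on the title column.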

So let’s begin…

First, I sought to determine whether Netflix had a diverse library of content across a range of ratings. Within the dataset, I discovered that they have twelve categorical ratings, ranging from G to TV-MA.

The rating system from Netflix breaks down the audience into the following categories based on rating:

Little Kids: G, TV-Y, TV-G
Older Kids: PG, TV-Y7, TV-Y7-FV, TV-PG
Teens: PG-13, TV-14
Mature: R, NC-17, TV-MA

Source: Film Ratings

Second, I aimed to explore the correlation between Audience Age and individual user rating scores and how this relationship might affect the library as a whole.

I employed feature engineering by adding an audience column to our dataset, allowing me to link show ratings with specific age groups: Little Kids, Older Kids, Teens, and Mature.
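A minimal sketch of this feature-engineering step, assuming each show is a dictionary with a `rating` field (a hypothetical name). The lookup table mirrors the four audience groups listed earlier; labeling unrecognized ratings as "Unknown" is an assumption, not part of the original analysis.

```python
# Rating-to-audience lookup, taken from the four audience groups listed in this post.
AUDIENCE_BY_RATING = {
    "G": "Little Kids", "TV-Y": "Little Kids", "TV-G": "Little Kids",
    "PG": "Older Kids", "TV-Y7": "Older Kids",
    "TV-Y7-FV": "Older Kids", "TV-PG": "Older Kids",
    "PG-13": "Teens", "TV-14": "Teens",
    "R": "Mature", "NC-17": "Mature", "TV-MA": "Mature",
}


def add_audience(rows):
    """Return copies of the rows with an added 'audience' field."""
    return [
        # Ratings missing from the lookup are labeled "Unknown" (an assumption).
        {**row, "audience": AUDIENCE_BY_RATING.get(row["rating"], "Unknown")}
        for row in rows
    ]


# Hypothetical sample rows for illustration only.
sample = [
    {"title": "Show A", "rating": "TV-14"},
    {"title": "Show B", "rating": "G"},
    {"title": "Show C", "rating": "UR"},
]
for row in add_audience(sample):
    print(row["title"], "->", row["audience"])
```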

The box plot compares user rating scores across the audience groups, so we can check how the scores vary by audience age.
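The numbers behind a box plot can be computed directly. The sketch below is an illustration under assumptions: the field names `audience` and `user_rating_score` and the sample scores are hypothetical, and the whiskers are taken as the plain minimum and maximum, whereas many plotting tools cap them at 1.5 times the interquartile range.

```python
from collections import defaultdict
from statistics import median, quantiles


def box_plot_stats(rows):
    """Group scores by audience and return a five-number summary for each group."""
    scores_by_audience = defaultdict(list)
    for row in rows:
        scores_by_audience[row["audience"]].append(row["user_rating_score"])

    summary = {}
    for audience, scores in scores_by_audience.items():
        if len(scores) < 2:
            continue  # skip groups too small to compute quartiles
        # "inclusive" treats the scores as the full population of interest.
        q1, _, q3 = quantiles(scores, n=4, method="inclusive")
        summary[audience] = {
            "min": min(scores),
            "q1": q1,
            "median": median(scores),
            "q3": q3,
            "max": max(scores),
        }
    return summary


# Hypothetical scores for illustration only.
scored_rows = [
    {"audience": "Teens", "user_rating_score": 60},
    {"audience": "Teens", "user_rating_score": 70},
    {"audience": "Teens", "user_rating_score": 80},
    {"audience": "Teens", "user_rating_score": 90},
    {"audience": "Mature", "user_rating_score": 50},
]
stats = box_plot_stats(scored_rows)
for audience, values in stats.items():
    print(audience, values)
```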

One might question how a minor user’s score can be validated. Unfortunately, the dataset does not contain enough information to support any hypothesis. However, since this dataset was created, Netflix has implemented a system that allows users to rate content with a “thumbs up” or “thumbs down.” This approach could enable minors to rate content more easily. As a parent, my theory is that under the previous method, scores for children’s content were likely submitted by an adult.

Source: Netflix on new Netflix Ratings & Recommendations

Lastly, I sought to identify Netflix’s popular shows by title to determine their relevance to the selection process concerning user rating scores.

I employed a word cloud to visualize the words that appear most often in the titles within the dataset. This analysis suggests that Netflix meets consumer demand by keeping the appropriate titles in its library.

The most frequent title words include Little, Life, Movie, Friends, American, and Marvel.
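A word cloud sizes each word by how often it appears, so the underlying step is a word count over the titles. Here is a minimal sketch using only the standard library; the sample titles and the small stop-word list are made up for illustration and do not come from the dataset.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; word cloud tools usually ship a longer one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "my"}


def title_word_counts(titles):
    """Count how often each non-stop-word appears across all titles."""
    counts = Counter()
    for title in titles:
        # Lowercase and keep alphabetic runs, so "Life:" and "life" match.
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts


# Made-up titles for illustration only.
titles = ["Little Example", "A Little Life", "Life: The Movie"]
counts = title_word_counts(titles)
for word, count in counts.most_common(3):
    print(word, count)
```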

Working with this dataset, particularly given my past career in motion pictures, was fascinating. In just a few weeks, I acquired the skills to dissect, articulate, extract, and feature engineer this dataset. Working on this project gave me a clearer picture of how studios select which projects to move forward with and why they adhere to these models to facilitate prediction and streamline the selection process. For a platform like Netflix, the vast array of content in its library allows audiences to enjoy content à la carte, keeping subscribers engaged and satisfied.

You can also read my blog post on ThisIsJorgeLima.com
Here is a link to my Code

Written by Jorge A. Lima