Netflix’s UI can be quite opaque. Because the results are personalised, it’s hard to tell what’s worth watching, and there’s plenty of high-quality, award-winning content that Netflix’s algorithms might have deemed unworthy of your attention.
Putting my newfound data-wrangling skills to use, I took this Kaggle dataset of Netflix’s catalogue and mashed it up with a dataset of top TV shows and movies from RatinGraph (which, from the looks of it, derives its data from IMDb). Google Sheets’ FILTER function should have sufficed, but it proved temperamental and wouldn’t work on the TV shows dataset. So I fired up OpenRefine, which let me merge the two datasets with ease.
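For anyone who would rather script the merge than click through OpenRefine, here is a minimal sketch of the same idea: an inner join on title. It is not what I actually ran. It assumes both exports are CSVs with a `title` column (a hypothetical column name), and the sample rows are placeholders, not real data.

```python
import csv
import io


def normalise(title: str) -> str:
    # Case-fold and collapse runs of whitespace so "Show  A" matches "show a".
    return " ".join(title.casefold().split())


def merge_on_title(catalogue_rows, ratings_rows):
    # Index the catalogue by normalised title for constant-time lookups.
    # If two catalogue rows share a title, the later one wins.
    by_title = {normalise(r["title"]): r for r in catalogue_rows}
    merged = []
    for rated in ratings_rows:
        match = by_title.get(normalise(rated["title"]))
        if match is not None:
            # Inner join: keep only titles present in both datasets.
            # Fields from the ratings row win on key collisions.
            merged.append({**match, **rated})
    return merged


# Placeholder CSV contents standing in for the two exports (hypothetical, not real data).
catalogue_csv = "title,type\nShow A,TV Show\nMovie B,Movie\n"
ratings_csv = "title,rating,votes\nshow a,8.1,25000\nMovie C,8.4,90000\n"

catalogue = list(csv.DictReader(io.StringIO(catalogue_csv)))
ratings = list(csv.DictReader(io.StringIO(ratings_csv)))
merged = merge_on_title(catalogue, ratings)
print(merged)
```

Joining on title alone is the simplest option but can pair up remakes that share a name; matching on title plus release year would be stricter, if both datasets carry a year.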
To prioritise signal over noise, I added two filters: rating and number of votes. Movies were filtered at a rating of 8.0+, and TV shows at 7.5+. The TV shows list is English-only and limited to the Top 250, which the merge whittled down to 82. In other words, almost one in three of the Top 250 shows is on Netflix!
Netflix’s catalogue had only 85 of the 685 movies with over 10,000 votes and an 8.0+ rating. Unlike the TV shows, this chart also includes non-English movies.
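The two filters above boil down to a threshold check on each row. The sketch below assumes hypothetical `rating` and `votes` columns and placeholder rows; the thresholds (8.0+ with over 10,000 votes for movies) and the 82/250 and 85/685 counts come from the post itself.

```python
def top_rated(rows, min_rating, votes_over):
    # Keep rows rated at or above min_rating with strictly more than votes_over votes.
    return [
        r for r in rows
        if float(r["rating"]) >= min_rating and int(r["votes"]) > votes_over
    ]


# Placeholder rows (hypothetical, not real data).
movies = [
    {"title": "Movie A", "rating": "8.3", "votes": "120000"},
    {"title": "Movie B", "rating": "8.0", "votes": "9000"},   # too few votes
    {"title": "Movie C", "rating": "7.9", "votes": "500000"},  # rating too low
]
movie_picks = top_rated(movies, min_rating=8.0, votes_over=10_000)

# Shares quoted in the post: TV shows on Netflix out of the Top 250,
# and qualifying movies on Netflix out of 685.
tv_share = 82 / 250
movie_share = 85 / 685
print([m["title"] for m in movie_picks])
print(f"TV: {tv_share:.1%}, movies: {movie_share:.1%}")
```

The TV shows would go through the same function with `min_rating=7.5`; the post doesn’t state their vote threshold or how the English-only restriction was applied, so those are left out here.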
Some caveats: Christopher Nolan’s The Dark Knight is on Netflix but wasn’t in the Kaggle dataset. There may be other obvious misses, but the dataviz should still be helpful.
Filtering for what’s new narrows the movie recommendations down to a handful.
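The post doesn’t define “new”, so this sketch assumes it means released in or after a cutoff year, read from a hypothetical `release_year` column. The rows and the cutoff are placeholders.

```python
def recent(rows, since_year):
    # Assumption: "new" means released in or after since_year.
    return [r for r in rows if int(r["release_year"]) >= since_year]


# Placeholder rows (hypothetical, not real data).
picks = [
    {"title": "Movie A", "release_year": "2019"},
    {"title": "Movie D", "release_year": "2008"},
]
print([r["title"] for r in recent(picks, since_year=2018)])
```

If “new” should instead mean recently added to Netflix, the same filter would apply to a date-added field, parsed into a date first.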
H/T to Shivam Bansal for creating this dataset. It would be great to see how many movies and TV shows are on Disney+, Amazon Prime, HBO, etc., and do a grand analysis of who owns the streaming rights to so much of popular culture. Any ideas on where I can find or build datasets of their catalogues?