Similarity: Genres, Ratings & Titles

Published in INST414: Data Science Techniques

5 min read · Apr 14, 2024

INTRODUCTION

The question I aim to answer in this module is “How can I find movies or TV shows based on a specific genre, user rating, and a specific title?” The stakeholders in this scenario are content curators, marketing analysts, data scientists and analysts, and business executives at streaming companies such as Netflix, Hulu, and Disney Plus. Answering this question through similarity measurement can inform personalized content recommendations based on user preferences. Delivering relevant content tailored to individual tastes supports user engagement, retention, and satisfaction, which in turn can improve platform performance and subscription rates.

The data needed to answer this question would ideally include fields such as title, genre, user rating, and relevance, along with other attributes like release year, director, cast, duration, and keywords or tags. With a dataset this comprehensive, the stakeholders named above can run similarity analyses and use the results to guide personalized recommendations.

METHODS

Data Collection

To collect the data for my analysis, I used the Watchmode API and wrote JavaScript to interact with it. First, I made HTTP requests to specific API endpoints using the fetch function. For instance, I fetched drama titles by their genre ID to get data for the similarity analysis. Similarly, I retrieved details and sources for individual titles to evaluate their similarity to a target title such as “Game of Thrones.” I processed the API responses as JSON, which let me extract key information like title IDs, genres, and user ratings. To present the data, I created an HTML page and used tables to display what the JavaScript fetched. This approach streamlined the analysis and made the results easier to visualize and interpret.
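The request step described above might look roughly like the following. This is a minimal sketch, not the code from the repository: the base URL, the endpoint paths, and the `apiKey`, `genres`, and `limit` query parameters are assumptions about Watchmode's v1 API and should be checked against its documentation. The helper names and the genre ID passed in are hypothetical.

```javascript
// Base URL and endpoint paths are ASSUMPTIONS about Watchmode's v1 API;
// verify them against the official documentation before use.
const BASE_URL = "https://api.watchmode.com/v1";

// Build the URL for listing titles in one genre (genreId is a placeholder value).
function listTitlesUrl(apiKey, genreId, limit = 10) {
  const params = new URLSearchParams({
    apiKey,
    genres: String(genreId),
    limit: String(limit),
  });
  return `${BASE_URL}/list-titles/?${params}`;
}

// Build the URL for one title's details (genres, user rating, and so on).
function titleDetailsUrl(apiKey, titleId) {
  const params = new URLSearchParams({ apiKey });
  return `${BASE_URL}/title/${encodeURIComponent(titleId)}/details/?${params}`;
}

// Fetch a URL and parse the body as JSON, failing loudly on HTTP errors.
// fetchImpl is injectable so the function can be exercised without a network.
async function fetchJson(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

A call such as `fetchJson(listTitlesUrl(myKey, someGenreId))` resolves to the parsed JSON body, which can then be passed to whatever code builds the HTML tables.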

Data Analysis

For the first query, I used the Jaccard similarity metric to evaluate genre similarity, with Drama as the chosen genre. The Jaccard score is the size of the intersection of two sets divided by the size of their union, so it ranges from 0 to 1. Comparing the single-genre set {Drama} against a title’s genre list, a title like “X” with the genres Drama, Adventure, and Fantasy receives a score of 1/3 ≈ 0.33. A title with only Drama and Adventure scores 1/2 = 0.50. A title that belongs solely to the Drama genre achieves a perfect score of 1. This approach identifies titles with comparable genre compositions, which aids content recommendation and analysis. Below is a screenshot of the results:

First Query: Ten Titles Similar in Genre (i.e., Drama)

The analysis of these ten drama titles reveals varying degrees of genre similarity based on the Jaccard similarity metric. Titles like “Parasyte: The Grey,” “Crooks,” and “The Brothers Sun” stand out with higher scores, indicating stronger genre overlap, while others show lower levels of similarity.
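The Jaccard scoring used in the first query can be sketched as follows. This is an illustrative implementation of the standard definition, |A ∩ B| / |A ∪ B|, and not necessarily the code behind the screenshot; the function name `jaccard` and the example genre lists are hypothetical. Under this definition, a two-genre title that includes Drama scores 1/2 against the single-genre target.

```javascript
// Jaccard similarity: |A ∩ B| / |A ∪ B| for two lists of genre names.
function jaccard(genresA, genresB) {
  const a = new Set(genresA);
  const b = new Set(genresB);
  if (a.size === 0 && b.size === 0) return 0; // convention chosen here for two empty sets
  let shared = 0;
  for (const g of a) {
    if (b.has(g)) shared += 1;
  }
  const unionSize = a.size + b.size - shared;
  return shared / unionSize;
}

const target = ["Drama"];
console.log(jaccard(target, ["Drama", "Adventure", "Fantasy"]).toFixed(2)); // 0.33
console.log(jaccard(target, ["Drama", "Adventure"]).toFixed(2)); // 0.50
console.log(jaccard(target, ["Drama"]).toFixed(2)); // 1.00
```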

In the second query, I took a more direct approach: I fetched and displayed the user rating for each title without applying a similarity metric. The aim was to give stakeholders the raw rating data as an unaltered view of user satisfaction. Looking at these ten titles and their ratings, stakeholders can spot patterns such as titles with high ratings, which suggest a strong fan base or well-received content, while variation in ratings across titles may reflect diverse audience tastes. These observations can support decisions about content recommendations, marketing strategies, and platform optimization. Below is a screenshot of the results:

This user rating analysis displays a range of ratings for the ten titles, indicating varying levels of audience satisfaction. “RIPLEY” and “The Brothers Sun” stand out with high ratings, suggesting a strong positive reception. Meanwhile, “The Tearsmith” has a lower rating, possibly indicating less favorable feedback from viewers.
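The fetch-and-display step of the second query could be sketched like this. It is a minimal example under stated assumptions: the record fields `title` and `user_rating` are assumed names for the API response, the helper names are hypothetical, and the sample records are placeholders rather than real ratings.

```javascript
// Escape text before placing it inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build an HTML table from title records.
// The field names `title` and `user_rating` are ASSUMPTIONS about the response shape.
function ratingsTableHtml(records) {
  const rows = records.map((r) => {
    // Show "N/A" when the rating is missing instead of printing null.
    const rating = typeof r.user_rating === "number" ? r.user_rating.toFixed(1) : "N/A";
    return `<tr><td>${escapeHtml(r.title)}</td><td>${rating}</td></tr>`;
  });
  return `<table><tr><th>Title</th><th>User rating</th></tr>${rows.join("")}</table>`;
}

// Placeholder records, not real ratings.
const sample = [
  { title: "Title A", user_rating: 8.4 },
  { title: "Title B", user_rating: null },
];
console.log(ratingsTableHtml(sample));
```

In a browser, the returned string could be assigned to an element's `innerHTML` to render the table on the page.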

For the third query, focused on similarity to “Game of Thrones,” I used the Jaccard similarity metric once more. “Game of Thrones” spans three genres (Action, Drama, and Fantasy), so I assigned scores based on how many of them a title shared. A title sharing one of the three genres received a score of 0.33, a title sharing two received 0.66, and a title matching all three received a perfect score of 1. Strictly speaking, these values equal the Jaccard index only when the title has no genres outside those three, since any extra genre enlarges the union and lowers the score. This scoring let me pinpoint the titles most similar to “Game of Thrones” by genre composition: the lower the score, the weaker the similarity.

Third Query: Ten Similar Titles to Game of Thrones

The analysis shows titles with varying levels of similarity to “Game of Thrones” based on genre composition. “American Gods” has the highest similarity score of 1.00, indicating it shares all three genres with “Game of Thrones.” Other titles like “Banshee” and “Narcos” have moderate similarity, while the rest show lower similarity scores.
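Ranking titles against a target such as “Game of Thrones” might be sketched as below. This uses the standard Jaccard definition and is not necessarily how the scores in the screenshot were computed; the function names are hypothetical and the candidate titles are placeholders, not real catalog data.

```javascript
// Jaccard similarity between two genre lists.
function jaccardScore(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  const shared = [...setA].filter((g) => setB.has(g)).length;
  const unionSize = setA.size + setB.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// Score every candidate against the target's genres and sort from most to least similar.
function rankBySimilarity(targetGenres, candidates, topN = 10) {
  return candidates
    .map((c) => ({ title: c.title, score: jaccardScore(targetGenres, c.genres) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topN);
}

const gotGenres = ["Action", "Drama", "Fantasy"]; // genres stated in the article
// Placeholder candidates, not real catalog data.
const candidates = [
  { title: "Title A", genres: ["Drama"] },
  { title: "Title B", genres: ["Action", "Drama", "Fantasy"] },
  { title: "Title C", genres: ["Action", "Drama"] },
  { title: "Title D", genres: ["Drama", "Comedy"] },
];
for (const { title, score } of rankBySimilarity(gotGenres, candidates)) {
  console.log(`${title}: ${score.toFixed(2)}`);
}
```

In this sketch, Title D shares one genre with the target but scores 0.25 rather than 0.33, because Comedy adds a fourth genre to the union.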

Analyzing genre similarity, user ratings, and similarity to “Game of Thrones” provided useful inputs for content recommendations. Measuring genre overlap identified titles with similar genre compositions. Retrieving user ratings directly highlighted well-rated content. Applying the Jaccard method to “Game of Thrones” narrowed recommendations for fans of the series. Together, these results address the initial question by showing how genre, rating, and title data can guide targeted content recommendations, which in turn supports user engagement and platform performance.

Limitations & Bugs

I primarily focused on data validation and error handling to ensure data quality, addressing issues like inconsistent formats, missing values, and API connectivity problems. Implementing try-catch blocks, parsing data consistently, and checking for null values were crucial steps. Monitoring API usage and employing pagination for large datasets maintained data integrity. However, the analysis faced limitations due to the API’s request restrictions, leading to simplified queries and potential biases towards commonly accessed genres and titles. The segmented API structure also posed challenges, potentially resulting in missing or incomplete data and affecting the overall representativeness and generalizability of the analysis.
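The validation steps mentioned above (try-catch blocks, consistent parsing, and null checks) could look something like this. It is a sketch under assumptions, not the repository's code: the field names `title`, `genre_names`, and `user_rating` are assumed names for the API response, and all helper names are hypothetical.

```javascript
// Parse JSON text, returning null instead of throwing on malformed input.
function safeParseJson(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    console.warn(`Could not parse response: ${err.message}`);
    return null;
  }
}

// Convert a rating to a number; missing, blank, or non-numeric values become null.
function parseRating(value) {
  if (value === null || value === undefined || String(value).trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

// Keep only records with a usable title and genre list, and normalize each one.
// Field names (`title`, `genre_names`, `user_rating`) are ASSUMPTIONS about the response shape.
function cleanRecords(records) {
  if (!Array.isArray(records)) return [];
  return records
    .filter((r) => r != null && typeof r.title === "string" && Array.isArray(r.genre_names))
    .map((r) => ({
      title: r.title.trim(),
      genres: r.genre_names,
      rating: parseRating(r.user_rating),
    }));
}
```

Returning `null` for an unusable rating, rather than 0, keeps missing data from being mistaken for a genuinely low score.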

GitHub repository link: https://github.com/elmantador45/Module-Three.git

Written by Emmanuel Akpalu