Predicting the Popularity of a TV Show Based on Neighboring TV Shows
TV producers get a sense of whether a show brings in high revenue and ratings on a satellite network or streaming service by watching a live taping or a recorded episode or season. This raises a question producers often ask: how do TV shows such as The Walking Dead become popular so quickly?
TV producers want to know how to predict the popularity of a TV show based on other shows airing around the same year. A positive prediction could help producers prepare a promotional campaign for an up-and-coming show that they expect to turn a profit. For example, the new six-episode series Knuckles, produced by Paramount Pictures and based on the Sonic the Hedgehog movies, is predicted to become popular because of the anticipation for the upcoming Sonic the Hedgehog 3 movie, which releases in December 2024. Paramount Pictures can pay personnel to create Knuckles merchandise, especially to appeal to a young audience, and run commercials to attract a bigger audience.
The dataset was collected from Kaggle and contains a sample of movies and TV shows with labels such as watch time (in minutes), genres, and ratings. These labels are important because they determine the rankings displayed in the dataset, and they make it easier to predict a popular TV show.
This code uses a regression model because the purpose of this post is to predict a numeric popularity value for a TV show. Using features such as genre, rating, and release year, the supervised model should be able to predict a show's popularity regardless of its rank. The following two charts display popular TV shows, measured by the number of viewers and by how long viewers watch a show (in minutes).
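To make the neighbors idea concrete, here is a minimal sketch of k-nearest-neighbors regression in plain Python. It is not the code used for this post: the function name `knn_predict`, the toy feature encoding (a drama flag, a rating, and a release year rescaled to 0-1), and the popularity scores are all made up for illustration. The prediction is simply the average popularity of the k most similar shows.

```python
import math

def knn_predict(train_rows, train_targets, query, k=3):
    """Predict a numeric score as the mean target of the k nearest rows."""
    # Euclidean distance between the query and every training row
    distances = [
        (math.dist(row, query), target)
        for row, target in zip(train_rows, train_targets)
    ]
    # Keep the k closest shows and average their popularity
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical toy data: [is_drama, rating, release_year], all on a 0-1 scale
shows = [
    [1, 0.9, 0.8],
    [1, 0.8, 0.9],
    [0, 0.4, 0.2],
    [0, 0.5, 0.1],
]
popularity = [90.0, 80.0, 30.0, 40.0]  # made-up popularity scores

# A new drama that resembles the first two shows
prediction = knn_predict(shows, popularity, [1, 0.85, 0.85], k=2)
print(prediction)
```

With k=2, the two nearest neighbors are the two dramas, so the predicted popularity is the average of their scores. In practice a library implementation such as scikit-learn's `KNeighborsRegressor` would likely replace this hand-written function.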
The graph by Statista shows their prediction of whether users will watch TV shows on cable networks or streaming services in 2025. This ties in with the popularity of TV shows because some shows, such as Squid Game and the remastered Dora the Explorer, are exclusive to streaming services.
There were problems with the values in the watch time and watch time (in millions) columns: the model could not calculate similarity with such large integer values. This was fixed by importing a scaler and rescaling those columns.
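The post does not say which scaler was imported, so the following is only a sketch of one common option, min-max scaling, written in plain Python. The function name `min_max_scale` and the watch-time values are hypothetical. The point is that a column with values in the millions would otherwise dominate any distance calculation, while a column rescaled to 0-1 contributes on the same footing as the other features.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no distance information
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Hypothetical watch-time values in minutes (made up for illustration)
watch_time = [1_200_000, 45_000_000, 300_000_000]
scaled = min_max_scale(watch_time)
print(scaled)
```

The smallest value maps to 0.0 and the largest to 1.0, with everything else in between. scikit-learn's `MinMaxScaler` and `StandardScaler` are library versions of this kind of rescaling.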
The main features missing from the collected data are the actors and a movie/TV show ID that would distinguish each show or movie in the dataset for the model. Also, the rankings already provided in the dataset introduce bias, so one could argue there is no reason to predict the popularity of a TV show using the k-nearest neighbors (KNN) method.