Examining 2023 Box Office Trends Based On Movie Genres

Elliott Bauer
INST414: Data Science Techniques
Apr 23, 2024

For this Module 4 assignment, I analyzed box office statistics for 2023 movie releases. The main question I think my data can answer is: “What types of movies are typically the most successful in theaters?” The answer would be valuable to several groups. People within the film industry could use it to decide which types of movie roles to pursue, and an average moviegoer could use it to decide what to see, since some genres have had much more success than others. Pretty much anyone involved in the film industry to some extent could benefit from this research question.

To answer it, I obtained a data set from Kaggle containing the top 200 movies of 2023 in terms of box office success. I focused on only the top 100 so that I was looking solely at the most successful movies. The columns in the data set are as follows:

  • ‘Rank’: The film’s ranking within the top 100
  • ‘Name’: The name of the film
  • ‘Theaters’: The number of theaters that the film was distributed in
  • ‘Total Gross’: The revenue that the film accrued in the box office
  • ‘Release Date’: The theatrical release date of the film, in datetime format
  • ‘Distributor’: The company that distributed the film
  • ‘Genres’: A list of genres for the film
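As a rough illustration of the setup described above, here is a minimal sketch of loading a table with these columns and keeping only the top-ranked films. The sample rows, the `load_top_n` helper, and the comma-separated format of the ‘Genres’ column are all assumptions made for the example; the actual CSV may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV. The rows and values are made up
# purely so the example runs; they are not real box office figures.
SAMPLE_CSV = """Rank,Name,Theaters,Total Gross,Release Date,Distributor,Genres
1,Film A,4000,500,2023-07-21,Studio X,"Comedy, Adventure"
2,Film B,3500,300,2023-04-05,Studio Y,"Animation, Family"
3,Film C,1200,50,2023-03-10,Studio Z,"Horror, Thriller"
"""


def load_top_n(csv_source, n=100):
    """Read the box office table and keep the n best-ranked films."""
    df = pd.read_csv(csv_source, parse_dates=["Release Date"])
    # Assumption: genres are stored as one comma-separated string per film.
    df["Genres"] = df["Genres"].str.split(",").apply(
        lambda gs: [g.strip() for g in gs]
    )
    return df.sort_values("Rank").head(n).reset_index(drop=True)


# n=2 here only because the sample has three rows; the post uses the top 100.
top = load_top_n(io.StringIO(SAMPLE_CSV), n=2)
print(top[["Rank", "Name", "Genres"]])
```

With the real file, the `io.StringIO(...)` argument would be replaced by the path to the CSV and `n` left at 100.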

To analyze this data further, I needed to add a ‘Genres’ column to the CSV, as genre was crucial to drawing any conclusions. My original plan was to find another data set with more general information on 2023 releases (genre, description, run time, etc.), but I struggled to find anything that aligned with my box office data set. Because of this, I ended up manually editing the CSV file with the genres listed on the popular movie logging site Letterboxd. I measured similarity by seeing which genres were paired with one another most frequently. Below is an image of a dendrogram I created that displays genres and their similarity to one another. In other words, it shows which genres are most consistently paired with one another among the top 100 films at the 2023 box office.
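To make the idea of "genres that are paired together" concrete, here is one possible sketch of turning genre pairings into a hierarchy. It is not necessarily what the notebook does: the film names and genre lists are made up, and the choice of Jaccard distance with average linkage is an assumption for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

# Made-up films and genre tags, used only to demonstrate the technique.
films = {
    "Film A": ["Action", "Adventure"],
    "Film B": ["Action", "Thriller"],
    "Film C": ["Comedy", "Romance"],
    "Film D": ["Comedy", "Romance", "Drama"],
    "Film E": ["Action", "Adventure", "Thriller"],
}

genres = sorted({g for tags in films.values() for g in tags})

# Rows are genres, columns are films; True where the film carries the genre.
membership = np.array([[g in tags for tags in films.values()] for g in genres])

# Co-occurrence counts: entry (i, j) is the number of films tagged with both
# genre i and genre j (the diagonal is the number of films per genre).
as_int = membership.astype(int)
cooccurrence = as_int @ as_int.T

# Assumption: Jaccard distance between genres. Genres that share most of
# their films end up close together; genres that never co-occur are far apart.
distances = pdist(membership, metric="jaccard")
tree = linkage(distances, method="average")

# no_plot=True returns the leaf ordering without drawing a figure.
layout = dendrogram(tree, labels=genres, no_plot=True)
print(layout["ivl"])
```

Dropping `no_plot=True` (with matplotlib installed) draws the dendrogram itself, with genres as the leaves.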

As you can see, the above dendrogram shows genre combinations in a hierarchical structure. The orange and green branches are genres that are typically grouped with one another. Action, comedy, thriller, adventure, and drama appear to be paired together the most, based on what I saw in my Pandas tables. I used thresholding to set my ‘k’ value, the number of clusters, to 17. The elements of my clusters represent each unique genre in my data set of films. Below, I have attached an image of a bar chart showing the count of each genre within the top 100.
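For readers unfamiliar with thresholding a dendrogram, the sketch below shows how a cut-off turns a hierarchy into flat clusters. How the threshold was actually chosen is not stated here, so the distance matrix, the cut-off of 0.5, and the use of SciPy's `fcluster` are all assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

genres = ["Action", "Adventure", "Comedy", "Romance"]

# Made-up pairwise distances: small means the genres are often paired.
square = np.array(
    [
        [0.0, 0.2, 0.9, 0.9],
        [0.2, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.3],
        [0.9, 0.9, 0.3, 0.0],
    ]
)

# linkage expects the condensed (upper-triangle) form of the matrix.
tree = linkage(squareform(square), method="average")

# Cut the tree at a distance threshold: merges above 0.5 are undone.
by_distance = fcluster(tree, t=0.5, criterion="distance")

# Alternatively, ask for at most k flat clusters and let SciPy pick the cut.
by_count = fcluster(tree, t=2, criterion="maxclust")

for genre, label in zip(genres, by_distance):
    print(f"{genre}: cluster {label}")
```

A lower threshold yields more, smaller clusters, and a higher one yields fewer, larger clusters.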

Based on this, it is fair to assume that comedy, action, drama, adventure, and thriller films are the most successful at the box office, at least in terms of how often they appear in the top 100. This helps answer the question above, as it could help a moviegoer decide which movie to go see. With that said, it is important to keep in mind that far more films are produced in these genres. The number of war films made in a year is vastly different from the number of comedies. I would encourage people to branch out of their comfort zone and see movies that may not be their typical go-tos.
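The volume caveat can be illustrated by putting a per-genre count next to a per-film average. The sketch below uses only the standard library; the film names, gross figures, and genre tags are invented for the example and are not taken from the data set.

```python
from collections import Counter, defaultdict

# Made-up (name, total gross, genres) records, used only for illustration.
films = [
    ("Film A", 500, ["Comedy", "Adventure"]),
    ("Film B", 300, ["Comedy"]),
    ("Film C", 900, ["War"]),
]

# How often each genre appears: the quantity a genre bar chart would show.
genre_counts = Counter(g for _, _, tags in films for g in tags)

# Total gross per genre, so that it can be averaged per film.
gross_by_genre = defaultdict(int)
for _, gross, tags in films:
    for g in tags:
        gross_by_genre[g] += gross

average_gross = {g: gross_by_genre[g] / genre_counts[g] for g in genre_counts}

for genre, count in genre_counts.most_common():
    print(f"{genre}: {count} film(s), average gross {average_gross[genre]:.0f}")
```

In this toy data, comedy appears most often while the single war film has the highest average, which is the kind of gap a count-only chart cannot show.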

For some frame of reference, here are a few example films from each of my clusters:

  • ‘History’: Oppenheimer, Big George Foreman
  • ‘Adventure’: Barbie, The Little Mermaid
  • ‘Action’: John Wick: Chapter 4; Indiana Jones and the Dial of Destiny
  • ‘Animation’: The Super Mario Bros. Movie, Spider-Man: Across the Spider-Verse
  • ‘Mystery’: Insidious: The Red Door, Knock at the Cabin
  • ‘Drama’: Asteroid City, Air
  • ‘Documentary’: Titanic: 25 Year Anniversary, BTS: Yet to Come in Cinemas
  • ‘Horror’: Scream VI, Evil Dead Rise
  • ‘Sci-Fi’: Guardians of the Galaxy Vol. 3, Ant-Man and the Wasp: Quantumania
  • ‘Music’: Chevalier, Whitney Houston: I Wanna Dance with Somebody
  • ‘Comedy’: Dungeons & Dragons: Honor Among Thieves, Cocaine Bear
  • ‘Fantasy’: Haunted Mansion, Shazam! Fury of the Gods
  • ‘Crime’: Fast X, A Thousand and One
  • ‘War’: The Covenant, Sisu
  • ‘Romance’: No Hard Feelings, Past Lives
  • ‘Family’: Elemental, The Amazing Maurice
  • ‘Thriller’: Mission: Impossible - Dead Reckoning Part One, Talk to Me

It is important to consider some of the limitations of this analysis. For one, I only included the genres listed on Letterboxd, which might not categorize films the same way as other movie platforms. For example, a few sports movies were popular in 2023, like Ben Affleck’s Air, Taika Waititi’s Next Goal Wins, and Sean Durkin’s The Iron Claw. None of these was labeled as a “Sport” film, as the platform does not have this filter on its site. Other popular genres that it does not include are Coming of Age, Biopic, and Satire. This lack of information is a limitation, as the analysis cannot convey how successful or prominent these types of films are at the box office.

Another limitation is that a few of the rows in this data set are re-releases. For example, 2022’s Everything Everywhere All at Once and 1997’s Titanic were included. Both did well in theaters, as they were critically acclaimed films that many people loved and wanted to see on the big screen. However, they are not 2023 releases, so including them slightly affects the accuracy of the data.

Below, I have attached a link to my GitHub repository:

https://github.com/elliottbauer99/INST414/blob/main/Module%204%20Assignment.ipynb
