Sample EDA Movie Project

Prompt: FakeCompany sees all the big companies creating original video content, and they want to get in on the fun. They have decided to create a new movie studio, but the problem is they don’t know anything about creating movies. They also want to capture a niche and create only sci-fi movies. Your team is charged with doing data analysis and creating a presentation that explores what type of sci-fi films are currently doing the best at the box office. You must then translate those findings into actionable insights that the CEO can use when deciding what type of films they should be creating.

Our Approach: We first looked for data on sci-fi movies that included indicators of success, namely revenue, popularity score, and average voter score. FakeCompany wants to create a successful movie, so we needed a way to quantify which movies were successful. We also sought data on certain characteristics of each film, namely sub-genre and rating. With those characteristics, we could break down exactly what kind of sci-fi movie is most likely to succeed. All of the data we pulled covers movies released from 2000 onward, since the movie FakeCompany is looking to make should align with the trends of ‘current’ movies.

After compiling a data set of 288 sci-fi films with these indicators and characteristics, we performed our analysis. We posed questions and tested hypotheses to identify which past movies were successful and to look for trends in their characteristics.

Data Sources: Box Office Mojo (US & Canada only), The Movie DB, OMDB


Among all sci-fi sub-genres, supernatural performed best in terms of popularity (how many times a movie was reviewed).

The median revenue for all films, highlighted in orange, is $90M, and the mean is $138M. The gap between the two arises because the distribution is skewed to the right by the outlier films that made $600M or more. These are the big blockbusters like Avatar and Avengers: Endgame.
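To illustrate why a handful of blockbusters pulls the mean above the median, here is a minimal sketch. The revenue figures in the list are made up for illustration and are not taken from our data set; only the pattern (mean well above median) mirrors what we observed.

```python
from statistics import mean, median

# Hypothetical revenues in millions of USD (NOT the project's data).
# Most films cluster at the low end; two blockbusters sit far to the right.
revenues_musd = [20, 45, 60, 80, 90, 110, 150, 700, 850]

avg = mean(revenues_musd)
mid = median(revenues_musd)

print(f"mean:   {avg:.1f}M")
print(f"median: {mid:.1f}M")
# A mean well above the median is a typical sign of a right-skewed distribution.
print("mean exceeds median:", avg > mid)
```

Because the median only depends on the middle of the sorted list, it barely moves when the largest values grow, which is why we lean on it when comparing sub-genres.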

But you can see from the green line that the median revenue for supernatural films is $329M. That sub-genre’s median alone is more than three and a half times the $90M median revenue of all the sub-genres combined.
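As a quick check on the comparison, the ratios can be computed directly from the three figures quoted in the text. The variable names are ours, chosen for illustration.

```python
# Figures quoted in the text, in millions of USD.
supernatural_median = 329
all_films_median = 90
all_films_mean = 138

# How many times larger the supernatural median is than each overall figure.
ratio_to_median = supernatural_median / all_films_median
ratio_to_mean = supernatural_median / all_films_mean

print(f"vs. overall median: {ratio_to_median:.2f}x")
print(f"vs. overall mean:   {ratio_to_mean:.2f}x")
```

The supernatural median is about 3.7 times the overall median and about 2.4 times the overall mean, so the size of the gap depends on which baseline is used.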

This graph summarizes the trends we found throughout our data analysis. It compares each sub-genre’s average revenue, with supernatural ahead of the other sub-genres. We also broke down each sub-genre by movie rating against revenue, and PG-13 (purple) shows a higher count than the other ratings.
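The kind of breakdown behind this graph could be sketched as follows. This is not our actual analysis code: the records, the sub-genre names other than supernatural, and the revenue values are all hypothetical placeholders.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (sub_genre, rating, revenue in millions of USD).
films = [
    ("supernatural", "PG-13", 400),
    ("supernatural", "PG-13", 300),
    ("supernatural", "R", 120),
    ("dystopian", "PG-13", 150),
    ("dystopian", "R", 60),
    ("space", "PG", 90),
]

revenue_by_subgenre = defaultdict(list)
rating_counts = Counter()
for sub_genre, rating, revenue in films:
    revenue_by_subgenre[sub_genre].append(revenue)
    rating_counts[rating] += 1

# Average revenue per sub-genre, then the sub-genre with the highest average.
avg_by_subgenre = {g: mean(v) for g, v in revenue_by_subgenre.items()}
top_subgenre = max(avg_by_subgenre, key=avg_by_subgenre.get)

# The rating that appears most often across all films.
top_rating, top_rating_count = rating_counts.most_common(1)[0]

print("highest average revenue:", top_subgenre)
print("most common rating:", top_rating, f"({top_rating_count} films)")
```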

Further Analysis:

Marvel films accounted for about 60% of the average total revenue for the supernatural sub-genre.
Knowing this, we would extend our research by excluding all films associated with Marvel and checking whether our results change dramatically.
We would also like to run a regression analysis comparing a population that watches a Marvel film solely because it’s Marvel with a population that watches a Marvel film because it belongs to the supernatural sub-genre.
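The Marvel-exclusion check proposed above could look something like the sketch below. We have not run this step; the film titles, the `is_marvel` flag, and the revenue values are hypothetical and only show the shape of the comparison.

```python
from statistics import median

# Hypothetical supernatural films: (title, is_marvel, revenue in millions of USD).
supernatural_films = [
    ("Film A", True, 850),
    ("Film B", True, 620),
    ("Film C", False, 210),
    ("Film D", False, 140),
    ("Film E", False, 95),
]

all_rev = [rev for _, _, rev in supernatural_films]
non_marvel_rev = [rev for _, is_marvel, rev in supernatural_films if not is_marvel]

# Share of total supernatural revenue that comes from Marvel films.
marvel_share = 1 - sum(non_marvel_rev) / sum(all_rev)

print(f"Marvel share of revenue: {marvel_share:.0%}")
print("median with Marvel:   ", median(all_rev))
print("median without Marvel:", median(non_marvel_rev))
```

If the median without Marvel films stays well above the overall sci-fi median, the supernatural finding holds up; if it falls close to it, the result is mostly a Marvel effect.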