Predicting Movie Ratings — Based on Death

Anika Nacey
Aug 30, 2019


So, you’re an aspiring movie-maker. There are a lot of things to consider: what genre of movie are you interested in making? How long does it need to be to tell its story? And, of course, how many extras can you violently kill off, or, even better, how many tear-jerking main-character kills can you get away with, before it starts to hurt your ratings?

Well, I’m here to answer that all-important question.

PC: Felix Mooneeram

First, a few notes: this is based on a dataset that includes data from 545 movies made between 1949 and 2013. All of these movies had at least one on-screen death, and the “rating” referenced throughout is their average IMDb rating. (The MPAA rating is also included in this dataset, but it isn’t as exciting or, surprisingly, as predictable.)

The features I was working with included the following: the body count for the movie, the year it was released, the title of the film, its genre, its length in minutes, and, as mentioned, both the MPAA rating and the IMDb rating.

My goal was to predict the IMDb rating, so the first thing I did was run the data through a random forest regression pipeline to determine which of these features would be most important in predicting that rating. The movie’s length was the most important factor, but coming in second (by a margin of only about 10%) was the body count. (MPAA rating was the least significant to the model; I guess audiences don’t really differentiate between R and PG-13.) With all features combined, I was able to get an XGBoost validation accuracy of .161. Not too bad, if I do say so myself!
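The article doesn’t include its code, so the following is only a sketch of what a feature-importance step like this might look like, assuming pandas and scikit-learn and hypothetical column names (`body_count`, `year`, `length_minutes`, `genre`, `mpaa_rating`, `imdb_rating`). The title is left out because it is unique to each film and can’t generalize. Impurity-based `feature_importances_` is just one way to rank features, and the XGBoost step isn’t shown.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical column names; the real dataset's headers may differ.
NUMERIC = ["body_count", "year", "length_minutes"]
CATEGORICAL = ["genre", "mpaa_rating"]


def rank_features(df: pd.DataFrame, target: str = "imdb_rating"):
    """Fit a random-forest pipeline and return (feature, importance) pairs, highest first."""
    preprocess = ColumnTransformer(
        [
            # Integer codes (rather than one-hot columns) keep one importance
            # value per original feature; trees don't need one-hot encoding.
            ("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), CATEGORICAL),
            ("num", "passthrough", NUMERIC),
        ]
    )
    model = Pipeline(
        [
            ("prep", preprocess),
            ("forest", RandomForestRegressor(n_estimators=200, random_state=0)),
        ]
    )
    model.fit(df[CATEGORICAL + NUMERIC], df[target])
    # ColumnTransformer emits the categorical block first, then the numeric one.
    names = CATEGORICAL + NUMERIC
    importances = model.named_steps["forest"].feature_importances_
    return sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
```

Called on a DataFrame with those columns, `rank_features` returns all five features ordered from most to least important; the importances sum to 1.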

From there, I wanted to look more closely at the relationship between body count and IMDb rating. I made a scatter plot of the data, which resulted in the following:

IMDb rating on the y axis, deaths on the x axis

There does seem to be a general upward trend in rating as the body count increases (which, I’ll admit, surprised me). There were only 8 movies in the data (about 1%) that killed more than 400 people, but all 8 of them scored higher than 7 on IMDb, so maybe they knew what they were doing.
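A check like “how many films passed 400 deaths, and what was the worst rating among them?” is a simple filter. Here is a minimal sketch, assuming each movie is stored as a hypothetical `Movie(title, body_count, imdb_rating)` record; it is not the article’s own code.

```python
from typing import NamedTuple


class Movie(NamedTuple):
    # Hypothetical record layout for one film.
    title: str
    body_count: int
    imdb_rating: float


def high_body_count_summary(movies: list[Movie], threshold: int = 400):
    """Return the number of films with more than `threshold` deaths,
    their share of the dataset, and the lowest IMDb rating among them."""
    heavy = [m for m in movies if m.body_count > threshold]
    share = len(heavy) / len(movies)
    lowest = min((m.imdb_rating for m in heavy), default=None)
    return len(heavy), share, lowest
```

With the article’s 545 films and 8 of them above the threshold, the share would be 8 / 545 ≈ 0.015, and “all 8 scored higher than 7” corresponds to `lowest > 7`.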

Still, I was curious about the massive blob of data containing all the movies with between 1 and 200 deaths: would it show the same generally upward trend? The answer is “kind of.” See the zoomed-in version of the plot below:

So, yes, there is a generally upward trend, but a couple of odd movies out didn’t benefit as much as the rest from killing off their characters, including the lowest-rated movie in the whole dataset. If we disregard the outliers, the ratings hover around the average of 6.8 and drift closer to 7 or 8 as more characters die.
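One way to put a number on “kind of” is to average the ratings within body-count bins inside the zoomed-in window; bin means that rise from left to right support an upward trend. This is a sketch under assumptions: the input is a list of `(deaths, rating)` pairs, and the bin width of 50 is an arbitrary choice, not something the article specifies.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_bin(movies, low=1, high=200, bin_width=50):
    """Average IMDb rating per body-count bin, keeping only films with
    low <= deaths <= high (the zoomed-in window)."""
    bins = defaultdict(list)
    for deaths, rating in movies:
        if low <= deaths <= high:
            # Bin start: 0, 50, 100, 150 for the default width.
            start = (deaths // bin_width) * bin_width
            bins[start].append(rating)
    # Sort by bin start so the trend reads left to right.
    return {start: round(mean(ratings), 2) for start, ratings in sorted(bins.items())}
```

Dropping a couple of outliers before binning would mimic the “if we disregard the outliers” reading above.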

In short, if you’re a movie-maker, do what’s best for the story you’re telling. But if you start to kill characters off? Go big or go home.

A few fun facts:

The first movie featured in this dataset is a film noir called “The Third Man,” released in 1949, in which 4 people die.

The longest movie in this dataset is “Lawrence of Arabia,” which is a whopping 3 hours and 36 minutes long. Over that incredible runtime, only 216 people die.

The lowest-rated movie in this dataset is the 2003 action movie “House of the Dead,” with a 2 out of 10, and the highest-rated is “The Shawshank Redemption,” at 9.3 out of 10.

The average body count is 72.1, and the median is 44.
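The gap between the mean (72.1) and the median (44) says the body counts are right-skewed: a handful of very violent films pull the average up, while the typical movie sits much lower. The toy list below is hypothetical, not taken from the dataset; it just shows how a single extreme value moves the mean but not the median.

```python
from statistics import mean, median

# Hypothetical body counts: one extreme film drags the mean upward.
body_counts = [4, 12, 30, 44, 58, 90, 836]

print(f"mean = {mean(body_counts):.1f}")  # mean = 153.4
print(f"median = {median(body_counts)}")  # median = 44
```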

Finally, the stat you’ve all been waiting for: the movie with the highest body count (remember, the data stops at 2013) is “The Lord of the Rings: The Return of the King,” with a grand total of 836 kills. That’s an average of about 4 kills per minute.
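The per-minute figures are plain ratios of body count to runtime. Lawrence of Arabia’s 216 deaths over 3 hours and 36 minutes (216 minutes) work out to exactly one death per minute. The article doesn’t state Return of the King’s runtime, so the sketch only works backwards to the runtime that “about 4 per minute” implies. The helper names are illustrative.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an hours-and-minutes runtime to total minutes."""
    return hours * 60 + minutes


def deaths_per_minute(body_count: int, runtime_minutes: int) -> float:
    """Average on-screen deaths per minute of runtime."""
    return body_count / runtime_minutes


lawrence = to_minutes(3, 36)               # 216 minutes
print(deaths_per_minute(216, lawrence))    # 1.0

# 836 deaths at roughly 4 per minute implies a runtime of about 836 / 4 minutes.
print(836 / 4)                             # 209.0
```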
