Movie Night Coming Up? An Analysis of What Goes Into Bad IMDb Ratings

Tristan G. Paul
6 min read · Feb 7, 2019

Personally, I like to throw MST nights with the most laughably bad movies I can find. The only problem is there’s a small canon of so-bad-it’s-good films, and everyone’s already seen ‘The Room’, ‘The Wicker Man’ and ‘The Island of Dr. Moreau’ plenty of times. So to find the theoretically perfect MST material I dug deep into the databases of IMDb.

IMDb isn’t the perfect dataset. It has a known demographic bias that overrates aggression and grit. In cleaning the humongous set with over 50 thousand feature films, I had to drop entire directors I love because they didn’t have more than 5 movies with more than 1,000 votes. I didn’t confirm it, but I suspect most foreign movies were dropped from the final set. IMDb makes pages for all title variations of a movie, and to preserve my sanity I could only keep the original titles (and the original title isn’t necessarily the most famous title). But the dataset’s massive scope and user-submitted nature are still valuable for getting in the head of the general moviegoer.
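The cleaning rule for directors can be sketched in plain Python. This is a minimal illustration, not the code used for the analysis: the rows and field names are made up, and because the threshold is described both as "more than 5" and as "five or more" films, the sketch assumes five or more.

```python
from collections import Counter

MIN_VOTES = 1000  # keep films with more than 1,000 votes
MIN_FILMS = 5     # assumption: a director needs five or more such films

# Hypothetical rows, not real IMDb data.
films = (
    [{"title": f"A{i}", "director": "Director A", "votes": 2000} for i in range(5)]
    + [{"title": f"B{i}", "director": "Director B", "votes": v}
       for i, v in enumerate([5000, 3000, 1500, 1200, 800])]
    + [{"title": f"C{i}", "director": "Director C", "votes": 9000} for i in range(2)]
)

def clean(films, min_votes=MIN_VOTES, min_films=MIN_FILMS):
    """Drop low-vote films, then drop directors with too few remaining films."""
    popular = [f for f in films if f["votes"] > min_votes]
    per_director = Counter(f["director"] for f in popular)
    return [f for f in popular if per_director[f["director"]] >= min_films]

cleaned = clean(films)
kept_directors = sorted({f["director"] for f in cleaned})
print(kept_directors)
```

Director B loses a film to the vote filter and falls to four, so the whole filmography is dropped. The order of the two filters matters for exactly this reason.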

I started this journey by asking what the worst genre, by average rating, could be. Then I sought the worst director by average rating of their filmography (tragically I couldn’t access the data to get this knowledge for actors as well), the worst runtimes, and the worst release decade. My work is available on GitHub. I present to you the formula for the perfect bad movie…

The Highest And Lowest Rated Genres

This is no surprise to the average moviegoer. Both genres have issues with worldbuilding and effective special effects. Horror, in particular, loves its unnecessary sequels and monster crossover events.

This isn’t the complete picture yet. I’ve graphed each genre’s rating across every combination it appears in (so ‘Comedy, Romance, Sci-Fi’ and ‘Action, Horror, Sci-Fi’ are both rated under the Sci-Fi umbrella). To get the very worst genre the groups need to be broken apart again:
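The difference between the two groupings can be sketched as follows. This is an illustration under assumptions: the rows, ratings, and function names are hypothetical, and genres are assumed to arrive as one comma-separated string per film.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, not real IMDb data.
films = [
    {"title": "Film 1", "genres": "Comedy,Romance,Sci-Fi", "rating": 6.0},
    {"title": "Film 2", "genres": "Action,Horror,Sci-Fi", "rating": 4.0},
    {"title": "Film 3", "genres": "Sci-Fi", "rating": 3.0},
    {"title": "Film 4", "genres": "Horror", "rating": 5.0},
]

def split_genres(label):
    """Turn 'Action, Horror' into ['Action', 'Horror']."""
    return [g.strip() for g in label.split(",")]

def umbrella_means(films):
    """Average rating per genre, counting a film under every genre it lists."""
    by_genre = defaultdict(list)
    for film in films:
        for genre in split_genres(film["genres"]):
            by_genre[genre].append(film["rating"])
    return {genre: mean(ratings) for genre, ratings in by_genre.items()}

def exact_means(films):
    """Average rating per exact genre combination ('Sci-Fi' alone is its own group)."""
    by_combo = defaultdict(list)
    for film in films:
        combo = ",".join(sorted(split_genres(film["genres"])))
        by_combo[combo].append(film["rating"])
    return {combo: mean(ratings) for combo, ratings in by_combo.items()}

umbrella = umbrella_means(films)
exact = exact_means(films)
print(round(umbrella["Sci-Fi"], 2), exact["Sci-Fi"])
```

In this toy data the Sci-Fi umbrella is pulled up by the mixed-genre films, while the exact 'Sci-Fi' group contains only the pure entry.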

The lowest rated genre on IMDb is pure sci-fi, just below a mediocre 4.0, the likes of which include ‘Robot Holocaust’ and ‘Terror from the Year 5000’. We can imagine early sci-fi was both cheesy and less likely to dip its toes into other genres. We can confirm this with a single line of code:
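One way such a check could look, as a sketch rather than the line used for this analysis: for each decade, take every film tagged Sci-Fi and compute the share that lists no other genre. The rows and the helper name `pure_share_by_decade` are hypothetical.

```python
from collections import Counter

# Hypothetical rows, not real IMDb data.
films = [
    {"year": 1955, "genres": "Sci-Fi"},
    {"year": 1957, "genres": "Sci-Fi"},
    {"year": 1958, "genres": "Horror,Sci-Fi"},
    {"year": 1992, "genres": "Action,Sci-Fi"},
    {"year": 1994, "genres": "Drama"},
    {"year": 1995, "genres": "Sci-Fi"},
    {"year": 1997, "genres": "Comedy,Sci-Fi"},
    {"year": 1999, "genres": "Action,Sci-Fi,Thriller"},
]

def pure_share_by_decade(films, genre="Sci-Fi"):
    """Share of films tagged `genre` that list no other genre, per decade."""
    tagged = Counter()
    pure = Counter()
    for film in films:
        labels = [g.strip() for g in film["genres"].split(",")]
        if genre not in labels:
            continue
        decade = film["year"] // 10 * 10
        tagged[decade] += 1
        if labels == [genre]:
            pure[decade] += 1
    return {decade: pure[decade] / tagged[decade] for decade in sorted(tagged)}

share = pure_share_by_decade(films)
print(share)
```

If the hunch holds on the real data, the share would be higher in the early decades than in the later ones.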

The Highest and Lowest Rated Directors

As I alluded to, I only included directors with five or more popular films so I wouldn’t get averages with overly small sample sizes. We’d get tons of one-hit wonders otherwise.
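A minimal sketch of the ranking step, assuming the rows have already been filtered to qualifying directors. The names and ratings are invented, and keeping the film count next to each average is my own addition so that small samples stay visible.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (director, rating) rows, not real IMDb data.
ratings = [
    ("Director A", 3.0), ("Director A", 4.0),
    ("Director B", 8.0), ("Director B", 8.5),
    ("Director C", 6.0),
]

def rank_directors(rows):
    """Return (director, average rating, film count) sorted from worst to best."""
    by_director = defaultdict(list)
    for director, rating in rows:
        by_director[director].append(rating)
    table = [(d, mean(r), len(r)) for d, r in by_director.items()]
    return sorted(table, key=lambda row: row[1])

ranking = rank_directors(ratings)
lowest, highest = ranking[0], ranking[-1]
print(lowest, highest)
```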

The lowest rated director on IMDb, Bert I. Gordon, is exactly the name we should hope to see billed on our sci-fi flick. Gordon is an infamously prolific sci-fi B-movie director whose 60-year filmography includes such delightfully named titles as ‘Attack of the Puppet People’ and ‘The Amazing Colossal Man’. Albert Pyun is also big in sci-fi but is known for grittier Eighties and Nineties titles. Bob Clark appears to be an alias used on movies that producers thought would bomb, because IMDb has no single page for him and the search results range from 1928 to 2017. Sam Firstenberg is the genius behind ‘Breakin’ 2: Electric Boogaloo’ and makes movies with posters like these:

Let’s say Mark Lester is too good for us and move on.

For the curious, the directors with the average highest rating are who you’d expect: Kubrick and the Japanese masters.

How Does Runtime Affect Rating?

A much harder question to answer than I thought it would be. Just the cutoff for my dataframe was difficult to decide; I went with a maximum of 250 minutes, just over 4 hours, but I feel like it has sampling problems at the long end. I tried 200 minutes for a time but thought it was wrong to publish an analysis of long movies that excludes the Lord of the Rings movies (and this version had small samples regardless). There are no right answers in data cleaning.

Regardless of whether the cutoff is 200 or 250, we see the same trend: at shorter runtimes the average rating habitually drops below 6, and at longer runtimes it stops dropping. After 200 minutes it never falls as low as 7.

The most interesting find is that poor ratings (0 to 3) sharply stop after 140 minutes and average ratings (4 to 5) stop shortly after 160. Everything after is GOOD. I suspect there is a sampling bias where the only movies longer than 3 hours that 1000 or more IMDb voters sit through are also the ones that are word-of-mouth legendary. It’s nonetheless impressive there were enough of them to show a trend.
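The runtime comparison can be sketched by bucketing films into fixed-width runtime bins and summarizing each bin. Everything here is illustrative: the (runtime, rating) pairs are invented, and the 20-minute bin width is an assumption; only the 250-minute cutoff comes from the text.

```python
from collections import defaultdict
from statistics import mean

MAX_RUNTIME = 250  # cutoff in minutes
BIN_WIDTH = 20     # assumption: 20-minute bins

# Hypothetical (runtime in minutes, rating) pairs, not real IMDb data.
films = [
    (85, 2.5), (95, 6.0), (130, 3.0), (135, 7.0), (150, 4.5), (155, 7.5),
    (170, 7.2), (175, 8.0), (210, 8.1), (240, 8.4), (300, 9.0),
]

def runtime_summary(films, max_runtime=MAX_RUNTIME, width=BIN_WIDTH):
    """Per runtime bin: film count, average rating, and lowest rating."""
    bins = defaultdict(list)
    for runtime, rating in films:
        if runtime <= max_runtime:  # films past the cutoff are excluded
            bins[runtime // width * width].append(rating)
    return {
        start: {"n": len(r), "mean": mean(r), "min": min(r)}
        for start, r in sorted(bins.items())
    }

summary = runtime_summary(films)
for start, stats in summary.items():
    print(f"{start}-{start + BIN_WIDTH - 1} min: {stats}")
```

Reporting `n` alongside each average makes the thin samples at the long end easy to spot, and the per-bin minimum is what shows where poor ratings stop appearing.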

Personally, I think our bad movie should be TV-movie length (TV movies aren’t featured here because IMDb treats them as their own medium). Given only 90 minutes, its worldbuilding and character arcs would have to be quite jerky.

Finally, what were the worst decades for film?

I don’t even try to answer the best. The crosstabulation frequencies were very strange; some decades had next to no releases (my cutoff was 100 or more), others had mother lodes. I ultimately cut out all films before 1950 even though I calculated the Twenties as having the highest average rating. Film preservation was not the same then as it is now, so we can assume only the decade’s best got saved. It’s at least clear what decade had the highest quantity of dookie:

Our time. If our perfect bad movie doesn’t exist we can make it. We’re in the only decade where 0 to 1 ratings exceed the frequency of any other, and it’s also the decade with the most in total.
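A decade-by-rating crosstab of this kind can be sketched in a few lines. The sample rows are invented, whole-number rating bands are an assumption, and the demo uses a tiny release threshold where the real cutoff was 100 or more.

```python
from collections import Counter, defaultdict

MIN_RELEASES = 100  # cutoff from the text; the demo below passes a smaller value

# Hypothetical (release year, rating) pairs, not real IMDb data.
films = [
    (1953, 6.1), (1957, 7.0),
    (1981, 4.2), (1984, 5.5), (1988, 5.9),
    (2011, 1.4), (2013, 1.8), (2015, 5.2), (2018, 6.7),
]

def decade_crosstab(films, min_releases=MIN_RELEASES):
    """Count films per (decade, rating band), dropping sparse decades."""
    table = defaultdict(Counter)
    for year, rating in films:
        decade = year // 10 * 10
        band = int(rating)  # assumption: bands are whole numbers, so 1.8 -> band 1
        table[decade][band] += 1
    return {
        decade: dict(bands)
        for decade, bands in sorted(table.items())
        if sum(bands.values()) >= min_releases
    }

crosstab = decade_crosstab(films, min_releases=3)
print(crosstab)
```

With a threshold of 3, the two-film 1950s row disappears from the toy table, which mirrors how sparse decades were cut from the real one.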

So the perfect bad movie is…

  • pure science fiction
  • directed by Bert I. Gordon or Sam Firstenberg
  • under 2 hours and 20 minutes
  • released between 2010 and now
  • probably involves Nic Cage

There you have it! Happy trails on your Netflix journey and data science journey (so if you have an itching question, you can answer it too with some Python and a spreadsheet).

Tristan G. Paul

I’m a student data analyst located in the northern Bay Area. All of my articles are also posted to my portfolio site at tristan-paul.github.io.