

Predicting the “Best Animated Film” nominees for 2017

For animated movies, the announcement of the Oscar nominations is far more exciting than the announcement of the winners. That’s because animation occupies an odd space in Hollywood. Animated films are massive money earners, but they still aren’t taken seriously as excellent films (just look at how few animated films make critics’ “best of the year” lists each year). Additionally, the nominees for Best Animated Feature have traditionally been pretty different from the films that actually win. Generally, lots of lesser-known independent films are nominated, and then a big-budget, brightly colored CGI film ends up winning. My guess is that most Academy voters ignore the nominating vote and only cast a ballot once the nominees are announced. Many voters haven’t even watched all of the nominees, or are spectacularly ignorant and dismissive of animated movies.

The Red Turtle is the most critically acclaimed movie submitted this year

Because the winner of Best Animated Feature is often whichever nominee did best at the box office, regardless of quality, actually winning the award doesn’t mean as much as one might think. Getting nominated, however, is a big deal, especially for small or independent animators. Last year’s nomination of “Boy and the World” brought international attention to Alê Abreu and likely contributed to its development into a TV series, though since the film only came out last year, it’s hard to assess how much the nomination mattered. For older movies from small studios, a surprise Oscar nomination has proved invaluable: when “The Secret of Kells” was nominated in 2009, it catapulted the fledgling studio Cartoon Saloon to international recognition and significant growth.

I compiled a dataset of all of the movies that have been submitted for Best Animated Feature since the category was created in 2001. For each movie it contains: critics’ rating (from Metacritic), audience rating (from IMDB), US box office gross, US distributor, animation studio, country of origin, animation style (for example, stop-motion), whether it was nominated for an Annie Award, and whether it won one. Using this dataset I designed some models to predict which films will be nominated this year. But first, I explored the data to see if there were any other notable patterns.

Are the Oscars biased towards Disney/Pixar?

Disney/Pixar have won 10 out of the 15 years the category has existed, including 8 of the previous 9 years (and the one time they didn’t win was because their submitted films were Cars 2 and Winnie the Pooh). This has prompted allegations that the category is biased towards Disney.

To get a feel for the dataset I ran an “extra-trees” classifier on it. What is an extra-trees classifier? First, let’s look at decision tree classifiers. A decision tree classifier looks at your dataset and learns a flowchart of yes/no questions that best sorts the data into classes. Here’s the decision tree trained on every feature except the Annie nomination/win, used to predict whether a film will be nominated:

Output of DecisionTreeClassifier on the dataset

This might look like gibberish because I have feature numbers instead of names in this diagram, but it’s basically saying that the first split is between films with a Metacritic score above 71.5 and the films below. For the films above 71.5, the decision tree next checks if they made over 254,324.50 at the domestic box office. If a movie got below 71.5 on Metacritic, the tree checks if IMDB audiences rated the film above a 6.55. So loosely, we can say that for critically successful films, they’re in good shape for a nomination if they did well at the box office too. For films critics didn’t like, box office success matters less — their best chance is if audiences really liked them.
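To make the flowchart idea concrete, here’s a minimal sketch of fitting and printing such a tree with scikit-learn. The handful of films and all of their numbers are invented for illustration; the real dataset has many more rows and features:

```python
# Toy example: fit a decision tree on a few invented films and print the
# learned flowchart with real feature names instead of feature numbers.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

films = pd.DataFrame({
    "metacritic": [95, 78, 70, 65, 60, 55, 88, 40],           # critics' rating
    "imdb":       [8.1, 7.4, 6.9, 6.6, 7.0, 5.9, 7.8, 6.0],   # audience rating
    "box_office": [300e6, 150e6, 0.2e6, 90e6, 40e6, 200e6, 1e6, 10e6],
    "nominated":  [1, 1, 1, 0, 1, 0, 1, 0],                   # made-up labels
})

X = films[["metacritic", "imdb", "box_office"]]
y = films["nominated"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```

An unpruned tree like this memorizes its training data perfectly, which is exactly the overfitting problem discussed below.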

This is a really loose analysis, though. You might notice that the tree is ridiculously overcomplicated: it fits every single film, and a lot of those films might just be outliers. It’s also pretty hard to tell how important Disney/Pixar is here. We know it’s probably less important than critics’ rating, audience rating, or box office, but we could have guessed that already, and it’s hard to tell how much less important it is.

This is where an extra-trees classifier comes in. It runs numerous decision tree classifiers on samples of the dataset, then aggregates them to assess which features were most important most often, assigning each feature a Gini importance score. Here are the top 10 most predictive features for our dataset:

1. Metacritic (0.197)
2. IMDB Audience Score (0.175)
3. Box Office (0.131)
4. style_Stop-Motion (0.043)
5. distributor_Disney (0.037)
6. studio_Pixar (0.029)
7. distributor_Dreamworks (0.024)
8. studio_Laika (0.019)
9. studio_Ghibli (0.017)
10. studio_Disney (0.017)
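A ranking like this comes from the `feature_importances_` attribute of the fitted ensemble. Here’s a hedged sketch on synthetic data (the features and the rule generating the labels are invented, so the exact scores won’t match the real ones):

```python
# Toy example: rank features by Gini importance with an extra-trees classifier.
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n = 200
films = pd.DataFrame({
    "metacritic": rng.uniform(30, 100, n),
    "imdb": rng.uniform(4.0, 9.0, n),
    "box_office": rng.uniform(0, 400e6, n),
    "distributor_Disney": rng.integers(0, 2, n),  # one-hot style dummy column
})
# Invented label: nominations mostly track the critic score, with a small
# boost for Disney-distributed films.
films["nominated"] = ((films["metacritic"] + 5 * films["distributor_Disney"]) > 75).astype(int)

X = films.drop(columns="nominated")
model = ExtraTreesClassifier(n_estimators=250, random_state=0)
model.fit(X, films["nominated"])

# Sort features by importance, biggest first
ranked = sorted(zip(X.columns, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```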

From this, we can tell that the Disney/Pixar effect appears to be real, but pretty small. It’s interesting that stop-motion films seem to do better at getting Oscar nominations than one would expect, all other factors being equal. I’d hypothesize that’s because stop-motion films have historically done worse at the box office than CGI films, but it’s hard to say. Regardless, any Disney/Pixar effect is minor when it comes to getting nominations.

Let’s look at a different question then — what about for actually winning? Once the nominees are announced, does being a Disney/Pixar film increase the odds of winning, outside of other factors? I took a subset of the data (only looking at nominees) to determine which factors were most predictive of which nominees won.

1. Metacritic (0.257)
2. Box Office (0.210)
3. IMDB Audience Score (0.172)
4. studio_Pixar (0.104)
5. distributor_Disney (0.064)
6. studio_Animal Logic (0.022)
7. distributor_Dreamworks (0.019)
8. style_CGI (0.018)
9. studio_Disney (0.012)
10. style_Traditional (0.011)
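The nominee-only run is the same idea on a filtered frame. A minimal sketch, assuming the dataset has `nominated` and `won` indicator columns (my guess at the layout; the numbers are invented):

```python
# Toy example: restrict to nominees, then ask which features predict a win.
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

films = pd.DataFrame({
    "metacritic": [95, 88, 70, 60, 78, 55],
    "box_office": [300e6, 1e6, 40e6, 10e6, 150e6, 5e6],
    "nominated":  [1, 1, 1, 0, 1, 0],
    "won":        [1, 0, 0, 0, 0, 0],
})

nominees = films[films["nominated"] == 1]  # keep only the nominated films
X = nominees[["metacritic", "box_office"]]
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, nominees["won"])
print(dict(zip(X.columns, model.feature_importances_.round(3))))
```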

Pixar films and films distributed by Disney do unusually well. That’s a little misleading, since Pixar films are distributed by Disney, so there’s definitely overlap between those features. But still, these films seem to do surprisingly well, all else being equal. That said, the fact that Disney/Pixar have won two-thirds of the time while other factors remain more predictive shows that their success is not unwarranted: their films win because they are critically acclaimed, loved by audiences, and box office smashes. There might be other things going on too, but the sample is tiny. Only 59 films have ever been nominated, and I expect that’s too few to draw any strong conclusions from.

Finding Dory is the most commercially successful movie submitted this year

Who will be nominated this year?

Ok, ok, this is the question you actually care about. It’s a bit tricky because there’s less data on this year’s movies. For example, consider the film “Your Name.” It had a limited release in the Los Angeles area so it would qualify for the Oscars prior to its main release in 2017, which means there isn’t really any box office data. Additionally, its audience rating is incredibly high, the highest of any movie in the entire dataset. Maybe the movie really is that incredible. More likely, though, because the movie has been so hard to see, the only people who have seen it so far are those who rabidly sought it out, and they unsurprisingly loved it. So both the box office data and the audience review data are pretty wacky for any movie that had a limited release prior to a wider theatrical release in 2017. I’m planning on posting all of my code on GitHub once it looks less ugly, if anyone’s interested in looking through the feature engineering.

I tested a few models and a random forest classifier seemed to work well. I’m also using it because it’s similar to the decision tree and extra-trees classifiers: a random forest takes a bunch of subsets of the data, builds decision trees on them, then aggregates all of the mini decision trees. This prevents the over-specific monstrosity I showed earlier. The top 10 most likely nominees are:

1. Kubo and the Two Strings
2. Zootopia
3. Finding Dory
4. My Life as a Zucchini
5. Moana
6. The Red Turtle
7. Your Name
8. Miss Hokusai
9. Kung Fu Panda 3
10. April and the Extraordinary World
Your Name is the most highly rated movie by audiences submitted this year
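A ranking like this can come from the forest’s `predict_proba` scores for this year’s films, sorted most likely first. Here’s a sketch with invented placeholder numbers (the real feature values for these films differ):

```python
# Toy example: train on past submissions, then rank this year's films by
# their predicted probability of being nominated.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

past = pd.DataFrame({
    "metacritic": [95, 88, 70, 60, 78, 55, 92, 45],
    "imdb":       [8.1, 7.8, 6.9, 6.0, 7.4, 5.9, 8.0, 5.5],
    "box_office": [300e6, 1e6, 40e6, 10e6, 150e6, 5e6, 250e6, 2e6],
    "nominated":  [1, 1, 1, 0, 1, 0, 1, 0],
})
this_year = pd.DataFrame(
    {"metacritic": [92, 84, 77],
     "imdb":       [7.9, 8.0, 7.3],
     "box_office": [48e6, 340e6, 0.1e6]},
    index=["Kubo and the Two Strings", "Zootopia", "The Red Turtle"],
)

features = ["metacritic", "imdb", "box_office"]
forest = RandomForestClassifier(n_estimators=300, random_state=0)
forest.fit(past[features], past["nominated"])

# Column 1 of predict_proba is the probability of the "nominated" class
probs = pd.Series(forest.predict_proba(this_year[features])[:, 1],
                  index=this_year.index).sort_values(ascending=False)
print(probs)
```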

Note that these are just the films most likely to be nominated; predicting the winner would require a different model. My guess is that the composition of the nominees could dramatically affect each film’s odds of winning in ways that are hard for the model to predict. For example, if the nominees were:

Finding Dory
Kubo and the Two Strings
Kung Fu Panda 3
Moana
Zootopia

I think Kubo would have a solid chance of winning. It would be the only non-CGI film, the most critically acclaimed, and (in my opinion) the most daring. The other four films might attract similar voters, thus splitting their potential votes four ways. On the other hand, if the nominees were:

Kubo and the Two Strings
My Life as a Zucchini
The Red Turtle
Your Name
Zootopia

Then I would expect Zootopia to win easily. Zootopia is the only CGI film in the group and by far the most commercially successful. Lots of voters would never have even heard of many of the films on the list and would vote for Zootopia by default (I’m not criticizing the film or saying a win would be undeserved by the way — Zootopia is excellent). Outside of Kubo and Zootopia, it’s difficult to see a path for other films to win, unless one of those two films gets snubbed. Another possibility would be if there are only two big-budget CGI films nominated and Zootopia lost out. My hunch is that this would be most plausible if Moana or Finding Dory beat out Zootopia and no other CGI films get nominated (which the data shows isn’t that implausible).

This might all sound like idle speculation with no data to back it up, and it partly is. But this sort of analysis is useful for forming hypotheses that you can then verify with data. I’m interested in testing whether nominee composition has any effect on the eventual winner, although I suspect there just isn’t enough data to make strong claims.

Still, my fantasy scenario is that someone starts a write-in campaign for Isao Takahata’s 1991 masterpiece “Only Yesterday,” which was finally released in the US this year(!), and Takahata finally gets the Oscar he deserves. Even if that doesn’t happen, there were plenty of excellent animated movies released this year, so I’m fairly confident the nominees and winners will deserve their recognition.

Kubo and the Two Strings is the film I predict is most likely to be nominated (although probably not to win)


