Bimodal Distribution and Bhayankara Fans
It was a fine Friday evening. I was looking for some movie options to watch that night, and I started with a random search on IMDB. I am the kind of person who, given 3 hours of spare time to watch a movie, will spend 2 hrs 47 mins choosing which one to watch, in spite of already having a watchlist sorted by interest. So the search started with some Quentin Tarantino, went on to some Alfred Hitchcock movies, and finally I somehow ended up on a film directed by A. R. Murugadoss. (I shouldn’t have clicked on this movie.)
The movie was Sarkar and it had an IMDB rating of 7.2. On a normal day, I wouldn’t bother about how they come up with that number. But that particular day, as I had some 1 hr 13 mins left of that 2 hrs 47 mins, I was curious to learn how IMDB calculates the rating for a movie. Compared to other review or rating systems, IMDB has always been my go-to rating, and I should say that the rating has been fair most of the time, if not always. This is what I found on their site at first glance about how they calculate movie ratings.
“IMDb registered users can cast a vote (from 1 to 10) on every released title in the database. Individual votes are then aggregated and summarized as a single IMDb rating, visible on the title’s main page.” Other than the single visible rating, you can also see the distribution of votes received by each title. Here is the distribution of votes received for the Sarkar movie.
Ah! The famous bimodal distribution. A distribution with two distinct peaks is known as a bimodal distribution (with more than two, it is multimodal). Here there are two peaks visible, one at 10 and one at 1. By the belief of Almighty and being a believer in the quote “Truth always Triumphs”, we all know that Sarkar is a movie that deserves neither a 10 rating nor a 1 rating, unless you belong to the close circles of Sundar Pichai. Then why does this happen? It happens mainly because of the bhayankara fans of the hero, who want to pull the rating of the movie up to a higher number. The peak at 1 is because of the fans of the hero from the opposite clan, who want to pull the rating down. As you can see, the peak at 10 here is much higher than the one at 1. But there are a lot of cases where the two are very close to each other. This sort of behaviour is even more clearly visible in movies that did not do well, as both clans have a level playing field.
Yes, those movies were below average if not average. But pulling the rating down with so many votes with 1 rating is mean.
This is not the case only for a bunch of movies, but almost all the movies of this hero and hero from the opposite clan. This is a scenario that is prevalent across industries. Wherever you see two huge clans of fan following for two different actors you can see this type of distribution. Here is the cumulative vote share distribution of movies of four top heroes across two industries.
The bimodal distribution is not clearly visible because this data is an aggregated distribution across all years since the start of their careers. The fans of these actors will go to the extent of giving a rating of 10 to all the movies their hero has featured in, but the fans from the opposite clan are not motivated enough to delve into the history and give each one a rating of 1. Hence the twin peaks do not show up clearly when you consider the whole history.
These internet fan wars, the launch of Jio, internet data becoming cheaper than some of the cheapest people I have known: everything happened in the last five years. Considering these are big names in the industry and they rarely do more than one movie per year, let’s redo the same distribution for the last 5 movies of each of these actors.
Here the twin peaks are clearly visible. It is surprising to know that more than 50 per cent of the people who have cast a vote for the last 5 Vijay movies have given a 10 rating! Yes, Mersal definitely deserves a 10 for its original story and screenplay.
The extent to which Bhayankara fans go amuses me. Here is the vote distribution of the movie called Vetri, which is listed under Vijay’s filmography on IMDB as his first.
Of course, Vijay was part of this movie, but as a child artist. You might argue that it is not Vijay’s fans but the fans of Vijayakanth (the lead actor of the movie) who are doing this. But sadly, Vijayakanth’s first movie listed under his filmography has only 1 vote, which is a 10.
This sort of distribution affects the mean badly. The mean will be pulled towards the higher end, so the movie will always show a good rating. This is called vote stuffing, and IMDB handles it smartly. The number you see against each title isn’t the plain mean but a weighted average of the ratings. Here is a snip from IMDb's FAQ page.
Good. It is because of the weighted average computation that Sarkar got a rating of 7.2; going by the mean, it would have got 7.9. IMDB’s computation does a fair job of minimizing vote stuffing, but that makes it difficult to compare movies. For example, let’s take the top 5 works of Vijay to date according to IMDB’s rating and total votes.
Mersal is Vijay’s best movie to date? IMDB, are you serious? Why, because it had scenes from Aboorva Sagotharargal and Ramanaa, which both have an IMDB rating of 8.0 or more? While some of you might agree with this order, I don’t. I believe Vijay has better movies than Mersal under his belt. While IMDB does a good job of weeding out vote stuffing, in some cases it fails because it miscategorises some user ratings as genuine and gives them more weight. One hypothesis I could think of is that, as Mersal had some scenes and dialogues opposing the government and demonetization, some users who are classified by IMDB as genuine would have given it a 10 to push the rating of the movie up. But let’s not get into proving and disproving hypotheses, at least for now.
So, if a user who is not a Vijay fan and has seen none of his movies wants to see one, logs into IMDB, picks the one at the top and watches it, he might well decide not to watch the ones below it and will miss the better ones, Thuppaki and Kathi. Let’s think of a way to give him a suggestion list that is in no way biased by these Bhayankara fans.
Let’s resort to the power of the middle 80, as that is where most of the unbiased opinion lies. Even some schools publish middle 80 averages to showcase how their students really perform. The middle 80 are the ones who decide who wins the election, and the middle 80 are the ones who have the power to overthrow governments. Let’s use the power of the middle 80 here to help us curate a list of top Vijay movies which is not biased by Bhayankara fans.
This is how we will do it. For every movie, we will remove the votes which have a rating of 1 or 10. We will then compute the mean score on the remaining votes, which cover ratings 2 to 9, the middle 80% of the rating scale.
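The step above can be sketched in a few lines of Python. This is only a minimal sketch: the function names `plain_mean` and `middle80_mean` are my own, and the vote counts are made up for illustration, not taken from any real IMDB title.

```python
def plain_mean(counts):
    """Arithmetic mean of all votes; counts[i] is the number of votes for rating i + 1."""
    if len(counts) != 10:
        raise ValueError("expected one vote count per rating from 1 to 10")
    total_votes = sum(counts)
    return sum(rating * n for rating, n in enumerate(counts, start=1)) / total_votes


def middle80_mean(counts):
    """Mean of the votes after dropping every 1 and every 10 (ratings 2 to 9 only)."""
    if len(counts) != 10:
        raise ValueError("expected one vote count per rating from 1 to 10")
    middle = counts[1:9]  # vote counts for ratings 2, 3, ..., 9
    total_votes = sum(middle)
    if total_votes == 0:
        raise ValueError("no votes left after removing the 1s and 10s")
    return sum(rating * n for rating, n in enumerate(middle, start=2)) / total_votes


# Hypothetical vote counts for ratings 1..10 (invented, twin peaks at 1 and 10)
votes = [900, 300, 200, 150, 160, 220, 350, 500, 700, 2500]

print(round(plain_mean(votes), 2))     # mean including the 1s and 10s
print(round(middle80_mean(votes), 2))  # mean of ratings 2 to 9 only
```

Note that the second number lives on a 2 to 9 scale, so it cannot be read directly against a 1 to 10 rating.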
For example, let’s take the movie Thuppaki. This movie has a weighted average IMDB score of 7.9 at the time of writing this blog. Here is the distribution of votes.
The arithmetic mean of these votes gives a rating of 8.2. Now let’s compute the arithmetic mean of the middle 80% of the votes: it is 7.51. But here is the interesting part. This rating is not on the scale of 1 to 10, but on the scale of 2 to 9. How will you consume a rating that is 7.51 out of 9 on a scale that does not start from 1? This score is more difficult to consume than some of the worst movies on the list.
So let us standardize this. But how? You cannot simply multiply the score by 10 and divide by 9 or 8, because in moving from 2–9 to 1–10 the scale grows not in one direction but in both. Let’s get the help of percentiles. Let’s create a list of numbers starting from 2 and ending with 9, with 2 decimal places. We find the percentile at which the number 7.51 falls in that list, and then use that percentile to pick a number from a second list starting from 1 and ending with 10, again with two decimal places. This can be done even with numbers with one decimal place, but I wanted the extra precision so that the probability of more than one movie getting the same score is pretty low.
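Here is a rough Python sketch of that lookup, assuming the percentile is taken as the position of the score within an evenly spaced two-decimal list. The function names are hypothetical. Because both lists are evenly spaced, the lookup works out to a plain linear stretch of the 2 to 9 range onto 1 to 10, shown as `rescale_linear` for comparison.

```python
def rescale_via_lists(score):
    """Map a 2-9 score to 1-10 by matching positions in two-decimal lists."""
    old_steps = 700  # the list 2.00, 2.01, ..., 9.00 has 701 entries
    new_steps = 900  # the list 1.00, 1.01, ..., 10.00 has 901 entries
    old_index = round((score - 2.0) * 100)          # position of the score in the 2-9 list
    fraction = old_index / old_steps                # position as a fraction (the percentile / 100)
    new_index = round(fraction * new_steps)         # same relative position in the 1-10 list
    return round(1.0 + new_index / 100, 2)


def rescale_linear(score, old_min=2.0, old_max=9.0, new_min=1.0, new_max=10.0):
    """Closed-form version: stretch [old_min, old_max] linearly onto [new_min, new_max]."""
    fraction = (score - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


print(rescale_via_lists(7.51))         # 8.08
print(round(rescale_linear(7.51), 2))  # 8.08
```

Depending on how the percentile is defined and rounded, a value worked out by hand can differ from this sketch in the second decimal place.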
So the percentile value of 7.51 on the scale of 2–9 is 78.80.
The 78.80 percentile number in the list with values starting from 1 to 10 is 8.09. So the DC Score for the movie Thuppaki is 8.09. Let’s compute this score for all the movies in our list and pick the top 5. Let’s see how it fares with the original IMDB list.
This list is pretty different from the IMDB one, as it has very little contribution from Bhayankarans. So if that guy who wants to watch his first Vijay movie asks for a recommendation, it is safe to recommend Poove Unakkaga over Mersal.
But there isn’t much debate between the IMDB and Middle 80 scores when it comes to the bottom 5 movies, except for the fact that the IMDB score looks a little ruthless.
It would be cruel to have the blog titled after Bhayankara fans and have nothing for them. Here is the same snip from IMDB’s website again.
Even though they do not disclose the exact method used to weight user votes, we can take an educated guess. My guess is that it keeps track of how far a person’s vote falls from the current rating, how active he is on IMDB, how many titles he has voted on, the quality of his reviews and all that. So next time, if you ever want to bring your favorite hero’s movie to the top of the IMDB chart, become an active IMDB user, start picking random titles and give each a score close to its current rating, write reviews, and then give a 10 rating to your favorite hero’s movie. They will find it difficult to classify that as vote stuffing, and it will help your vote gain more weight. Oh, well, there is, of course, another way to fetch a good rating for a movie. But that would require help from your hero, which is to genuinely do good films.