IMDb Movies, Ratings, and Votes Analyzed.

Puja P. Pathak
Published in The Startup
6 min read · Feb 12, 2021

As per Wikipedia, IMDb (an acronym for Internet Movie Database) is an online database of information related to films, television programs, home videos, video games, and streaming content online — including cast, production crew and personal biographies, plot summaries, trivia, ratings, and fan and critical reviews.

We have the data for the 100 top-rated movies from the past decade along with various pieces of information about the movie, its actors, and the voters who have rated these movies online. We will try to find some interesting insights into these movies and their voters, using Python.

We load the dataset into a dataframe named movies. It has 100 rows (one per movie) and 62 columns (attributes) for our analysis.

Let’s talk Profit!

We have two columns: Gross, which indicates how much the movie made in dollars, and Budget, which indicates how much money in dollars was spent on making it. We get Profit by subtracting Budget from Gross.

Sorting the dataset by the Profit column in descending order and extracting the first 10 rows, we get the top 10 profit-making movies:

Top 10 Profit making movies.
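The profit step can be sketched in pandas as follows. This is a minimal illustration on a made-up four-row table, not the article's actual data; only the Gross and Budget column names come from the text, while the Title column and all figures are assumptions.

```python
import pandas as pd

# Toy stand-in for the `movies` dataframe; titles and figures are invented.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],       # hypothetical column name
    "Gross": [500.0, 120.0, 300.0, 80.0],
    "Budget": [200.0, 100.0, 50.0, 90.0],
})

# Profit is what the movie earned minus what it cost to make.
movies["Profit"] = movies["Gross"] - movies["Budget"]

# Sort in descending order so the most profitable movies come first,
# then keep (up to) the first 10 rows.
top_profit = movies.sort_values("Profit", ascending=False).head(10)
print(top_profit[["Title", "Profit"]])
```

Note that a plain ascending sort followed by `head(10)` would return the ten least profitable movies, so the `ascending=False` argument matters here. `movies.nlargest(10, "Profit")` is an equivalent shortcut.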

Some movies are popular worldwide. Let’s try to find them based on the IMDb rating and the Metacritic rating.

Universally popular movies.
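The article does not spell out the rule it used to decide which movies count as universally popular, so the sketch below shows one plausible criterion only. It assumes the Metacritic score is on a 0 to 100 scale, that a movie qualifies when the two scores agree closely and their average is high, and that the columns are called IMDb_rating and MetaCritic. All of these, including the thresholds, are assumptions.

```python
import pandas as pd

# Toy data; column names, scores and thresholds are assumptions.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "IMDb_rating": [8.4, 8.1, 7.9, 8.6],   # 0-10 scale
    "MetaCritic": [86, 60, 80, 88],         # assumed 0-100 scale
})

# Bring both scores onto the same 0-10 scale before comparing them.
movies["MetaCritic_10"] = movies["MetaCritic"] / 10
movies["Avg_rating"] = (movies["IMDb_rating"] + movies["MetaCritic_10"]) / 2
movies["Rating_gap"] = (movies["IMDb_rating"] - movies["MetaCritic_10"]).abs()

MAX_GAP = 0.5   # assumed: the two audiences must roughly agree
MIN_AVG = 8.0   # assumed: and both must rate the movie highly

universal = movies[
    (movies["Rating_gap"] < MAX_GAP) & (movies["Avg_rating"] >= MIN_AVG)
].sort_values("Avg_rating", ascending=False)
print(universal[["Title", "IMDb_rating", "MetaCritic_10", "Avg_rating"]])
```

Requiring a small gap between the two scores filters out movies that only one audience (general users or critics) rated highly.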

Most popular Trios

We have the Facebook likes of all the actors in the movies. Three actor names are listed for every movie. Based on their Facebook likes, we find the top 5 most popular trios.

Top 5 actor trios based on their FB likes.
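One simple way to rank trios is to add up the Facebook likes of the three actors in each movie. The sketch below assumes that is the ranking rule (the article does not say exactly how the likes were combined) and uses hypothetical column names such as actor_1_name and actor_1_facebook_likes.

```python
import pandas as pd

# Toy data; all names, like counts and column names are assumptions.
movies = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "actor_1_name": ["P", "S", "V"],
    "actor_2_name": ["Q", "T", "W"],
    "actor_3_name": ["R", "U", "X"],
    "actor_1_facebook_likes": [1000, 50, 700],
    "actor_2_facebook_likes": [500, 40, 700],
    "actor_3_facebook_likes": [100, 30, 700],
})

name_cols = [f"actor_{i}_name" for i in (1, 2, 3)]
like_cols = [f"actor_{i}_facebook_likes" for i in (1, 2, 3)]

# Combined likes of the three actors in each movie (assumed ranking rule).
movies["Trio_likes"] = movies[like_cols].sum(axis=1)
# Human-readable label for the trio, e.g. "P, Q, R".
movies["Trio"] = movies[name_cols].apply(", ".join, axis=1)

top_trios = movies.nlargest(5, "Trio_likes")[["Title", "Trio", "Trio_likes"]]
print(top_trios)
```

A plain sum lets one very popular actor carry the whole trio (movie A above), so a stricter rule could additionally require the three like counts to be of similar size.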

Runtime Analysis

We use a histogram to find the distribution of the “Runtime” attribute.

Runtime for movies.

We can see that the distribution peaks sharply around the 2-hour mark, so most of the movies appear to be about 2 hours long.
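The counts behind such a histogram can be reproduced without any plotting library by binning the runtimes. The values below are invented for illustration, and the 15-minute bin width is an arbitrary choice rather than the one used in the article.

```python
import pandas as pd

# Invented runtimes in minutes, standing in for the "Runtime" column.
runtimes = pd.Series([95, 110, 118, 120, 122, 125, 128, 140, 165], name="Runtime")

# 15-minute bins from 90 to 180 minutes; each bin includes its left edge only.
bins = list(range(90, 181, 15))
counts = pd.cut(runtimes, bins=bins, right=False).value_counts().sort_index()
print(counts)

# The tallest bar of the histogram corresponds to the fullest bin.
busiest_bin = counts.idxmax()
print("Most common runtime range:", busiest_bin)
```

If matplotlib is installed, `runtimes.plot.hist(bins=bins)` draws the same bins as a chart.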

Under 18 years audience watching R-rated movies?

Although R-rated movies are restricted for the under-18 age group, there are vote counts from that age group. Among all the R-rated movies that have been voted on by the under-18 age group, we find the top 10.

Top 10 R Rated movies watched by under 18 years audience.
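This selection is a filter followed by a ranking. The sketch below assumes the certificate is stored in a column called content_rating and the under-18 vote count in a column called CVotesU18; both names, and the data, are hypothetical.

```python
import pandas as pd

# Toy data; column names and values are assumptions.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "content_rating": ["R", "PG-13", "R", "R"],
    "CVotesU18": [1200, 5000, 300, 2500],   # number of votes cast by under-18 users
})

# Keep only R-rated movies, then rank them by under-18 vote count.
r_rated = movies[movies["content_rating"] == "R"]
top_r_u18 = r_rated.nlargest(10, "CVotesU18")[["Title", "CVotesU18"]]
print(top_r_u18)
```

Movie B has the most under-18 votes overall but is excluded because it is not R-rated.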

Demographic Analysis

Every movie has a main genre followed by two sub-genres. We group the dataset by genre and perform demographic analysis. After grouping the movies by genre, we extract the top 10 genres and create a new dataframe named genre_top10.

Top 10 genres.
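Because each movie carries up to three genres, it has to be counted once under each of them before grouping. One way to do that, sketched below, is to reshape the three genre columns into a single column with `melt` and then aggregate. The column names genre_1 to genre_3, CVotesMale, CVotesFemale and cnt are assumptions, as is the choice to rank genres by the number of movies.

```python
import pandas as pd

# Toy data; column names and values are assumptions.
movies = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "genre_1": ["Action", "Drama", "Action"],
    "genre_2": ["Sci-Fi", "Romance", "Adventure"],
    "genre_3": ["Adventure", None, "Sci-Fi"],   # a movie may lack a third genre
    "CVotesMale": [900, 200, 700],
    "CVotesFemale": [300, 250, 200],
})

# One row per (movie, genre) pair, dropping empty genre slots.
long_form = movies.melt(
    id_vars=["Title", "CVotesMale", "CVotesFemale"],
    value_vars=["genre_1", "genre_2", "genre_3"],
    value_name="genre",
).dropna(subset=["genre"])

# Per genre: number of movies and total votes by gender.
by_genre = long_form.groupby("genre").agg(
    cnt=("Title", "count"),
    CVotesMale=("CVotesMale", "sum"),
    CVotesFemale=("CVotesFemale", "sum"),
)

# Keep the 10 genres with the most movies (assumed ranking rule).
genre_top10 = by_genre.sort_values("cnt", ascending=False).head(10)
print(genre_top10)
```

Summing is appropriate for vote counts; rating columns would need an average per genre instead.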

Now let’s derive some insights from this dataframe.

Visualizations

1. Let’s see which genre is popular among the audience. For this we plot a countplot of various genres.

Count-plot showing various genres.

2. Gender vs Genre: Let’s plot the count of votes for various genres among males and females of all age groups and see if we can derive any insights.

Count of votes as per gender and age.

A few inferences can be drawn from the heatmap above: males have voted more than females, and Sci-Fi appears to be the most popular genre among the 18–29 age group irrespective of gender.

Sci-Fi, Adventure and Animation appear to be the top favorite genres among females, while Sci-Fi, Action and Thriller appear to be the favorite genres among males.

Irrespective of gender, the 18 to 29 age group has the largest number of people watching (and hence rating) movies among all 4 age groups. It appears that women love ‘Drama’ and ‘Romance’ movies more than men do.
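Readings like these can be checked numerically from the genre-level table rather than by eye. The sketch below uses an invented three-genre table with two age groups per gender; the column names (CVotesU18M, CVotes1829M and so on) and all counts are assumptions.

```python
import pandas as pd

# Invented genre-level vote counts; column names are assumptions.
genre_votes = pd.DataFrame(
    {
        "CVotesU18M": [50, 40, 5],
        "CVotes1829M": [900, 800, 100],
        "CVotesU18F": [20, 10, 8],
        "CVotes1829F": [300, 150, 120],
    },
    index=["Sci-Fi", "Action", "Romance"],
)

male_cols = ["CVotesU18M", "CVotes1829M"]
female_cols = ["CVotesU18F", "CVotes1829F"]

# Total votes cast by each gender across all genres and age groups.
male_total = genre_votes[male_cols].to_numpy().sum()
female_total = genre_votes[female_cols].to_numpy().sum()
print("Male votes:", male_total, "Female votes:", female_total)

# Genre with the most votes within each gender/age column.
top_genre_per_group = genre_votes.idxmax()
print(top_genre_per_group)

# Votes per age/gender column, to see which group votes the most.
votes_per_group = genre_votes.sum()
print(votes_per_group.sort_values(ascending=False))
```

To draw the heatmaps themselves, `genre_votes[male_cols]` and `genre_votes[female_cols]` can be passed to `seaborn.heatmap` with `annot=True`, assuming seaborn is installed.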

3. Now let’s see how ratings differ by gender and age group:

Ratings as per gender and age.

Sci-Fi appears to be the highest rated genre in the age group of U18 for both males and females. Also, females in this age group have rated it a bit higher than the males in the same age group.

The higher ratings given by the male and female audience under 18 years indicate that this group tends to rate movies of all genres higher than audiences of other age groups do.

‘Comedy’ movies appear to be liked by both men and women, almost equally.

Women have rated Action, Adventure and Animation movies higher than men have. This suggests that they enjoy movies of these genres more than men do.

4. US vs Non-US Cross Analysis:

We create a column IFUS in our original dataframe, movies, and map it to the value USA if the movie was made in the US (that is, a Hollywood movie) and non-USA if it was not.

Now let’s plot the number of votes for US and non-US movies by audiences in the USA and in other countries.
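A minimal sketch of this mapping, assuming the production country is stored in a column called Country (a hypothetical name; only IFUS and its two values come from the text):

```python
import pandas as pd

# Toy data; the Country column name and its values are assumptions.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Country": ["USA", "UK", "USA", "India"],
})

# Keep "USA" where the country is the US; label everything else "non-USA".
movies["IFUS"] = movies["Country"].where(movies["Country"] == "USA", "non-USA")
print(movies[["Title", "Country", "IFUS"]])
```

`Series.where` keeps the original value wherever the condition holds and substitutes the second argument elsewhere, which collapses every other country into a single non-USA group.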

Count of votes as per US/non-US movies by US/non-US audience

Hollywood movies receive more votes from non-US people than from US people. This may be because Hollywood movies are marketed well in other countries, so they are watched more by non-US people and become popular.

Outliers for Hollywood movies in both the plots above show that some Hollywood movies are very popular worldwide.

The medians for US and non-US movies in the first plot are comparable, which suggests that the typical US audience is about as likely to watch movies made in other countries as Hollywood movies.
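The medians that a box plot draws can also be computed directly with a group-by, which makes comparisons like the one above easier to verify. The sketch assumes vote-count columns named CVotesUS and CVotesnUS for US and non-US voters; those names and all numbers are invented.

```python
import pandas as pd

# Toy data; column names and vote counts are assumptions.
movies = pd.DataFrame({
    "IFUS": ["USA", "USA", "USA", "non-USA", "non-USA", "non-USA"],
    "CVotesUS": [100, 200, 900, 150, 180, 210],      # votes from US users
    "CVotesnUS": [300, 500, 2500, 350, 400, 450],    # votes from non-US users
})

# Median votes per movie origin, separately for US and non-US voters.
medians = movies.groupby("IFUS")[["CVotesUS", "CVotesnUS"]].median()
print(medians)

# The maximum shows how far the most popular movie sits from the typical one.
maxima = movies.groupby("IFUS")[["CVotesUS", "CVotesnUS"]].max()
print(maxima)
```

In this toy table the US-made group contains one movie with far more votes than the rest, which would show up as an outlier above the box while leaving the median almost unchanged.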

5. Ratings for US/non-US movies by US/non-US audience:

Ratings for US/non-US movies by US/non-US audience

On average, Hollywood movies are rated in the range 7.8 to 8.1 by US people, while non-Hollywood movies are rated in the range 7.8 to 8.

The average rating given by non-US people to Hollywood movies is roughly in the range 7.6 to 7.9, and for non-Hollywood movies it is approximately 7.6 to 8. This means that Hollywood movies are rated better in the US, but outside the US the average rating for non-Hollywood movies is better than that for Hollywood movies.

Some Hollywood movies are so popular among audiences worldwide that they have received a rating of more than 8.5.

6. Top 1000 voters and genres: Let’s see which genre is popular among the top 1000 voters of IMDb.

Genre for movies by top 1000 voters

Sci-Fi is the most popular genre while Romance is the least popular genre among these voters. Action, Thriller and Adventure are the next favourites. If we compare the results with those of the gender-wise heatmaps above, we can infer that a large percentage of the top 1000 voters on IMDb are males.

This completes our analysis.
