Marvel vs DC Data Analysis in Python
MCU vs DC. Which one is better? Which has more high-rated movies? Analysis of Marvel and DC movies based on gross value.
Marvel Cinematic Universe vs DC Universe: it’s a never-ending debate, right? Fans go crazy when you criticize either of these cinematic universes. But in this article, we are going to settle Marvel vs DC with some data. Data always tells the truth. So, let’s start this data war, with a cup of coffee.
The link to download the dataset is available at the end of this article.
You can write the Python code in Jupyter Notebook, Google Colab, or any other editor you prefer. I recommend Google Colab because it is the one I use most often.
Code & Analysis
- Import the libraries
# for mathematical computation
import numpy as np
import pandas as pd
import scipy.stats as stats

# for data visualization
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import plotly
import plotly.express as px
%matplotlib inline
- Let’s load the data and take a sneak peek at the data.
df = pd.read_csv("/content/mdc.csv", encoding='latin-1')
df.head()
We have names of movies, year of release, genre, IMDB rating, IMDB gross, entity, and so on.
- Gather some more information about the data.
df.describe()
df.info()
df.describe() summarizes the numeric columns, and df.info() lists the data type and non-null count of each column. If every non-null count equals the number of rows, the dataset has no null values. We got lucky here: there are no null values in our dataset.
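If you want an explicit null check instead of reading it off df.info(), something like the following should work. This is a minimal sketch on a small made-up frame (the name sample and its values are hypothetical, not the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset (values are made up).
sample = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "imdb_rating": [7.1, np.nan, 6.4],
    "entity": ["MARVEL", "DC", "DC"],
})

# Count missing values per column; all zeros would mean no nulls.
null_counts = sample.isnull().sum()
print(null_counts)

# One yes/no answer for the whole frame.
has_nulls = bool(sample.isnull().values.any())
print("Any nulls?", has_nulls)
```

On the real data you would call the same methods on df.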
- Heatmap of the data we have
f,ax = plt.subplots(figsize=(14,10))
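# numeric_only=True skips text columns such as title and genre (required on pandas 2.x)
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f", ax=ax)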
plt.show()
Run the cell and you will see output something like this on-screen.
If you want to look at Marvel-only movies or DC-only movies, you can do this using the entity of the dataset. Something like this:
df[df.entity == 'MARVEL'].tail(5)
df[df.entity == 'DC'].tail(5)
Run each line in its own Python cell, and you will get the desired output.
- Who makes more movies?
fig = plt.figure(figsize = (10,10))
ax = fig.subplots()
df.entity.value_counts().plot(ax=ax, kind='pie')
ax.set_ylabel("")
ax.set_title("MARVEL VS DC (No. of Movies)")
plt.show()
The above code will give us the following output:
The above pie chart clearly shows that Marvel has made more movies than DC. Here the MCU is winning by a large margin. Why DC?? Why??
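To put numbers on the pie chart, you can print the counts and the percentage share directly. A minimal sketch, assuming a tiny made-up frame (sample is hypothetical; on the real data you would use df):

```python
import pandas as pd

# Hypothetical rows, only to show the technique.
sample = pd.DataFrame({"entity": ["MARVEL", "MARVEL", "MARVEL", "DC"]})

# Absolute number of movies per entity.
counts = sample["entity"].value_counts()

# Same counts as percentages of the total.
shares = sample["entity"].value_counts(normalize=True) * 100

print(counts)
print(shares)
```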
- Movie Genre Focused More By MCU
fig = plt.figure(figsize = (10,10))
ax = fig.subplots()
df[df.entity == 'MARVEL'].genre.value_counts().plot(ax=ax, kind='pie')
ax.set_ylabel("")
ax.set_title("Marvel Movie Genre Type")
plt.show()
Run the cell and you will see output something like this on-screen.
As you can see, most Marvel Cinematic Universe movies fall under Action, Adventure, Sci-Fi, Comedy, and Fantasy. Now take a look at DC.
- Movie Genre Focused More By DC
fig = plt.figure(figsize = (10,10))
ax = fig.subplots()
df[df.entity == 'DC'].genre.value_counts().plot(ax=ax, kind='pie')
ax.set_ylabel("")
ax.set_title("DC Movie Genre Type")
plt.show()
The above code will give us the following output:
You can see that DC movies are more diverse than Marvel movies. DC tries to cover more genre types. I think this is the best part of the DC Universe.
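The "more diverse" claim can also be checked with a number instead of two pie charts. Here is a rough sketch that counts distinct genre labels per entity, using made-up rows (sample and its values are hypothetical):

```python
import pandas as pd

# Hypothetical rows, only to show the technique.
sample = pd.DataFrame({
    "entity": ["MARVEL", "MARVEL", "MARVEL", "DC", "DC", "DC"],
    "genre": ["Action", "Action", "Sci-Fi", "Action", "Drama", "Crime"],
})

# Number of distinct genre labels used by each entity.
genre_variety = sample.groupby("entity")["genre"].nunique()
print(genre_variety)
```

Keep in mind that Marvel has more movies in the dataset, so a raw count of distinct genres is only a rough measure of diversity.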
- Top-rated Marvel and DC movies, based on IMDB:
dc_movies = df[df.entity == 'DC']
marvel_movies = df[df.entity == 'MARVEL']

#Average and highest rated of dc movies
avrg_dc_imdb = dc_movies['imdb_rating'].mean()
highest_dc_imdb = dc_movies['imdb_rating'].max()
print("Average: ",avrg_dc_imdb, "\n Highest: ",highest_dc_imdb)

#Average and highest rated of marvel movies
avrg_marvel_imdb = marvel_movies['imdb_rating'].mean()
highest_marvel_imdb = marvel_movies['imdb_rating'].max()
print("Average: ",avrg_marvel_imdb, "\n Highest: ",highest_marvel_imdb)
The output of the above code is:
###DC###
Average: 6.133333333333335
Highest: 9.0
###MARVEL###
Average: 6.794736842105261
Highest: 8.4
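The average rating of DC movies is 6.13, and for Marvel movies it’s 6.79, so Marvel is ahead on average. But DC has the highest-rated single movie in the dataset (9.0 against Marvel’s 8.4), one of the highest-rated movies of all time.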
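The same averages and maximums can be computed for both entities in one call with groupby and agg. A minimal sketch on made-up ratings (sample is hypothetical, and its numbers are not the real results):

```python
import pandas as pd

# Hypothetical ratings, made up for illustration only.
sample = pd.DataFrame({
    "entity": ["MARVEL", "MARVEL", "DC", "DC", "DC"],
    "imdb_rating": [7.0, 8.0, 6.0, 9.0, 6.0],
})

# One row per entity, one column per statistic.
summary = sample.groupby("entity")["imdb_rating"].agg(["mean", "max"])
print(summary)
```

This avoids repeating the mean/max lines for every entity and keeps the two sides next to each other in one table.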
- IMDB rating vs IMDB gross (Marvel and DC)
plt.scatter(data = marvel_movies, x = 'imdb_rating', y = 'imdb_gross')
plt.scatter(data = dc_movies, x = 'imdb_rating', y = 'imdb_gross')
plt.title('Marvel vs. DC in imdb ratings and gross')
plt.xlabel('IMDb Ratings')
plt.ylabel('IMDb Gross')
plt.legend(['Marvel', 'DC'])
The above code will give the following output.
In terms of gross, a few Marvel movies are far ahead of DC movies.
Most Marvel movies have IMDB ratings between 6.7 and 8.2.
DC movie ratings are spread more evenly across the graph.
DC movies perform well on IMDB gross, but they fall short when compared with Marvel.
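Reading a range like 6.7 to 8.2 off a scatter plot is a bit of guesswork, so you may want to check it numerically. A small sketch, assuming made-up ratings (sample is hypothetical; the 6.7 and 8.2 bounds are the ones mentioned above):

```python
import pandas as pd

# Hypothetical ratings, made up for illustration only.
sample = pd.DataFrame({
    "entity": ["MARVEL", "MARVEL", "MARVEL", "MARVEL", "DC", "DC"],
    "imdb_rating": [6.8, 7.5, 8.0, 5.5, 9.0, 4.0],
})

marvel_ratings = sample.loc[sample["entity"] == "MARVEL", "imdb_rating"]

# between() is inclusive on both ends by default.
in_range = marvel_ratings.between(6.7, 8.2)

# The mean of a boolean Series is the fraction of True values.
share_in_range = in_range.mean()
print(f"{share_in_range:.0%} of Marvel rows fall between 6.7 and 8.2")
```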
- Tomato meter vs IMDB rating (Marvel vs DC)
imdb_vs_tm = sns.lmplot(data=df, x="imdb_rating", y="tomato_meter", hue="entity", height=7)
imdb_vs_tm.set_axis_labels("IMDb Ratings", "Tomato meter Score")
The output of the above code is:
- Top DC movies list based on IMDB rating
top_dc_movie = dc_movies.groupby('title').sum().sort_values('imdb_rating', ascending=False)
top_dc_movie = top_dc_movie.reset_index()
px.bar(x='title', y ="imdb_rating", data_frame=top_dc_movie)
The output of the above code is:
The Dark Knight is the top-rated DC movie. It has an IMDB rating of 9. If you haven’t watched it yet, do watch it. You will witness the legendary performance of Heath Ledger. This movie shows what the DC Universe is capable of.
- Top Marvel movies list based on IMDB rating
top_marvel_movie = marvel_movies.groupby('title').sum().sort_values('imdb_rating', ascending=False)
top_marvel_movie = top_marvel_movie.reset_index()
px.bar(x='title', y ="imdb_rating", data_frame=top_marvel_movie)
The output of the above code is:
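Avengers: Endgame is the top-rated Marvel movie, with an IMDB rating of 8.4. The chart, however, shows Fantastic Four with the highest bar, which points to a problem in the data. A likely cause is that groupby('title').sum() adds together the ratings of rows that share a title, so that bar is a sum rather than one movie’s rating.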
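One way to check whether the odd Fantastic Four bar comes from repeated titles is to look for duplicates and then rank the rows without summing. This is only a sketch on made-up rows (sample, "Movie X" and "Movie Y" are hypothetical), and I am assuming, not confirming, that the real dataset has a repeated title:

```python
import pandas as pd

# Hypothetical rows: "Movie X" appears twice (say, an original and a reboot).
sample = pd.DataFrame({
    "title": ["Movie X", "Movie X", "Movie Y"],
    "year": [2005, 2015, 2019],
    "imdb_rating": [5.0, 4.0, 8.0],
})

# groupby + sum adds the two "Movie X" ratings together (5.0 + 4.0).
summed = sample.groupby("title")["imdb_rating"].sum().sort_values(ascending=False)
print(summed)

# Rows whose title occurs more than once.
repeated = sample[sample.duplicated("title", keep=False)]
print(repeated)

# Ranking the original rows keeps each movie's own rating.
ranked = sample.sort_values("imdb_rating", ascending=False)
print(ranked[["title", "year", "imdb_rating"]])
```

If repeated comes back empty on the real data, the cause is something else and this explanation does not apply.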
- Marvel vs DC (Runtime)
###Marvel###
avrg_marvel_runtime = marvel_movies['runtime'].mean()
highest_marvel_runtime = marvel_movies['runtime'].max()

###DC###
avrg_dc_runtime = dc_movies['runtime'].mean()
highest_dc_runtime = dc_movies['runtime'].max()

print("Marvel\nAverage: ",avrg_marvel_runtime, "\n Highest: ",highest_marvel_runtime)
print("DC\nAverage: ",avrg_dc_runtime, "\n Highest: ",highest_dc_runtime)
The output of the above code is:
Marvel
Average: 124.54385964912281
Highest: 244
DC
Average: 123.45454545454545
Highest: 164
The average runtimes of Marvel and DC movies are almost equal (about 124.5 versus 123.5). But there is a huge difference between their longest movies: 244 for Marvel versus 164 for DC.
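max() only gives the number, not the movie behind it. To see which title has the longest runtime on each side, idxmax can help. A minimal sketch on made-up rows (sample and its titles are hypothetical):

```python
import pandas as pd

# Hypothetical rows, only to show the technique.
sample = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "entity": ["MARVEL", "MARVEL", "DC", "DC"],
    "runtime": [120, 181, 152, 164],
})

# Index label of the longest movie within each entity.
longest_idx = sample.groupby("entity")["runtime"].idxmax()

# Look those rows up to see the titles.
longest = sample.loc[longest_idx.tolist(), ["entity", "title", "runtime"]]
print(longest)
```

This is also a quick way to spot a suspicious outlier, since an unusually large maximum may be a data entry error.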
- Top Marvel movies based on IMDB gross
top_marvel_movie_gross = marvel_movies.groupby('title').sum().sort_values('imdb_gross', ascending=False)
top_marvel_movie_gross = top_marvel_movie_gross.reset_index()
px.bar(x='title', y ="imdb_gross", data_frame=top_marvel_movie_gross)
The above code will give the following output.
As we all know, Avengers: Endgame tops the chart, followed by Black Panther, Infinity War, and more. Endgame has an IMDB gross of more than $800 million.
- Top DC movies based on IMDB gross
top_dc_movie_gross = dc_movies.groupby('title').sum().sort_values('imdb_gross', ascending=False)
top_dc_movie_gross = top_dc_movie_gross.reset_index()
px.bar(x='title', y ="imdb_gross", data_frame=top_dc_movie_gross)
The above code will give the following output.
According to the above chart, The Dark Knight has the highest IMDB gross among DC movies. In terms of gross collection, DC is far behind Marvel. For a clearer picture, look at the following graph.
- Marvel and DC over the year based on gross earning
fig = px.line(df, x="year", y="imdb_gross", color='entity')
fig.show()
The output figure is:
Over the past few years, the MCU has collected far more IMDB gross than the DC Universe.
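If the line chart looks jagged, it may be because several movies share a year and the rows are not sorted. One option is to total the gross per year and entity first. A rough sketch on made-up numbers (sample and its values are hypothetical):

```python
import pandas as pd

# Hypothetical gross values, made up for illustration only.
sample = pd.DataFrame({
    "year": [2018, 2018, 2019, 2018, 2019],
    "entity": ["MARVEL", "MARVEL", "MARVEL", "DC", "DC"],
    "imdb_gross": [100.0, 200.0, 300.0, 50.0, 80.0],
})

# One row per (year, entity) with the summed gross, ordered by year.
yearly = (
    sample.groupby(["year", "entity"], as_index=False)["imdb_gross"]
    .sum()
    .sort_values("year")
)
print(yearly)
```

You could then pass yearly to px.line(yearly, x="year", y="imdb_gross", color="entity") in place of df.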
- How frequently Marvel or DC makes movies
fig2 = px.line(df, x='year', y='title', color='entity')
fig2.show()
From the above graph, we can see that after 2002, Marvel has released movies far more frequently than DC. Maybe this is the reason DC’s fan base is slightly smaller than Marvel’s.
We cannot declare a final winner between these two production houses, because who knows the future? DC may overshadow Marvel someday. But the best part is that both production houses are making good movies and have entertained audiences for the past few decades.
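Release frequency can also be counted directly, which is easier to read than a chart with titles on the y-axis. A minimal sketch, assuming made-up rows (sample and the column name movie_count are hypothetical):

```python
import pandas as pd

# Hypothetical release years, made up for illustration only.
sample = pd.DataFrame({
    "year": [2000, 2003, 2003, 2004, 2004, 2005],
    "entity": ["DC", "MARVEL", "MARVEL", "MARVEL", "DC", "MARVEL"],
})

# Number of movies released per year by each entity.
per_year = (
    sample.groupby(["year", "entity"])
    .size()
    .reset_index(name="movie_count")
)
print(per_year)
```

The per_year frame has one numeric column to plot, so it should work with a bar or line chart with color set to entity.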
Well, that’s it for this article.
If you found this article informative, make sure to follow and share it with your geek community.
The Google Colab link to the Code is here.
You can download the dataset from this link
Hello, my name is Rohit Kumar Thakur. I am open to freelancing. I build React Native projects and am currently working on Python Django. Feel free to contact me at freelance.rohit7@gmail.com.