Exploratory Data Analysis (EDA) on MyAnimeList data
An EDA to find out which features are associated with an anime scoring higher than others.
In recent years, the anime industry has grown significantly, both in Japan and around the world. As a result, it has produced many series, movies, OVAs and more, aimed at children, teens or adults.
Therefore, we would like to identify the key features of each anime and see how they relate to the score viewers give it compared with other anime.
Finally, we are going to give some recommendations on how an anime could have a better chance of being more highly rated.
Data is provided by Kaggle and can be found in this link: Anime Recommendation Database 2020 | Kaggle
You can visit my notebook in this link: Personal_Projects/MyAnimeList.ipynb at main · AlexRoman938/Personal_Projects (github.com)
Step 1: Let’s see the anime dataframe
We will use the anime.csv file and load it into a variable called df_anime.
Next, let’s see df_anime.
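Loading the file can be sketched as follows. This is a minimal example, not the notebook's exact code: load_anime is a hypothetical helper, and the tiny inline table (with assumed column names MAL_ID, Name, Score and Genres) only stands in for anime.csv so the snippet runs without the real file.

```python
import io

import pandas as pd


def load_anime(source):
    """Read the anime table from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Made-up stand-in for anime.csv (assumed column names, invented rows).
sample_csv = io.StringIO(
    "MAL_ID,Name,Score,Genres\n"
    '1,Anime A,8.5,"Action, Comedy"\n'
    "2,Anime B,Unknown,Drama\n"
)

df_anime = load_anime(sample_csv)  # with the real data: load_anime("anime.csv")
print(df_anime.head())
```

With the real dataset, passing the path "anime.csv" instead of the in-memory sample should be enough.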
We found some columns (e.g. Genres, Producers, Licensors, Studios) whose rows hold a list of values.
Also, many entries are ‘Unknown’, which stands for a null value. However, pandas normally represents null values as NaN.
So what do these entries mean?
To answer that question, let’s look at the information of df_anime.
Wow, we found that the columns holding lists and the columns holding ‘Unknown’ both have the object data type, so pandas treats them as plain text.
Hence, we need to deal with these data in the next step…
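One way to make the ‘Unknown’ entries visible as missing values is sketched below. It assumes the placeholder is exactly the string "Unknown"; the small table is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; in the post this would be the real df_anime.
df_anime = pd.DataFrame({
    "Score": ["8.5", "Unknown", "6.9"],
    "Genres": ["Action, Comedy", "Drama", "Unknown"],
})

# Count how often the placeholder appears in each column.
unknown_counts = (df_anime == "Unknown").sum()

# Swap the placeholder for a real missing value so isna() can detect it.
df_clean = df_anime.replace("Unknown", np.nan)
missing_counts = df_clean.isna().sum()

print(unknown_counts)
print(missing_counts)
```

After the replacement, the usual pandas tools for missing data (isna, dropna, fillna) work on these columns.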
Step 2: Data Cleaning and Data Transformation
In this section, before dealing with the data from the previous step, we are going to clean the columns ‘Aired’, ‘Premiered’ and ‘Duration’ to create new columns with more useful information for the analysis.
AIRED → START YEAR
PREMIERED → ANIME SEASON
DURATION → DURATION IN MINUTES
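The three conversions above could look roughly like this. It is a sketch under assumptions: the raw strings are assumed to look like "Apr 3, 1998 to Apr 24, 1999", "Spring 1998" and "24 min. per ep." or "1 hr. 55 min.", and the helper names start_year, season and duration_minutes are hypothetical. Durations given only in seconds are not handled here.

```python
import re

import pandas as pd

SEASONS = {"Winter", "Spring", "Summer", "Fall"}


def start_year(aired):
    """Return the first four-digit year found in an 'Aired' string, or None."""
    match = re.search(r"\b(\d{4})\b", str(aired))
    return int(match.group(1)) if match else None


def season(premiered):
    """Return the season word of a 'Premiered' string, or None if absent."""
    parts = str(premiered).split()
    return parts[0] if parts and parts[0] in SEASONS else None


def duration_minutes(duration):
    """Convert an hours/minutes text into total minutes, or None if unparseable."""
    text = str(duration)
    hours = re.search(r"(\d+)\s*hr", text)
    minutes = re.search(r"(\d+)\s*min", text)
    if not hours and not minutes:
        return None
    total = int(hours.group(1)) * 60 if hours else 0
    return total + (int(minutes.group(1)) if minutes else 0)


# Made-up rows in the assumed raw format.
df_anime = pd.DataFrame({
    "Aired": ["Apr 3, 1998 to Apr 24, 1999", "Unknown"],
    "Premiered": ["Spring 1998", "Unknown"],
    "Duration": ["24 min. per ep.", "1 hr. 55 min."],
})
df_anime["Start year"] = df_anime["Aired"].map(start_year)
df_anime["Anime Season"] = df_anime["Premiered"].map(season)
df_anime["Duration in minutes"] = df_anime["Duration"].map(duration_minutes)
print(df_anime[["Start year", "Anime Season", "Duration in minutes"]])
```

Values that cannot be parsed, such as ‘Unknown’, end up as missing values in the new columns.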
Next, it is time to deal with the data from the previous step. We will transform the columns ‘Start year’, ‘Duration in minutes’, ‘Anime Season’, ‘Type’, ‘Episodes’, ‘Source’ and ‘Rating’.
Finally, we are going to eliminate some unimportant columns to start the exploratory analysis.
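The post does not spell out the exact transformations, so the following is only an assumed illustration of what this step might involve: turning a text column into numbers, marking a column as categorical, and dropping a column. The column "Japanese name" is a hypothetical example of an unimportant column.

```python
import pandas as pd

# Made-up rows for illustration.
df_anime = pd.DataFrame({
    "Episodes": ["12", "Unknown", "24"],
    "Type": ["TV", "Movie", "TV"],
    "Japanese name": ["a", "b", "c"],  # hypothetical column we do not need
})

# Text to numbers; anything unparseable (e.g. 'Unknown') becomes NaN.
df_anime["Episodes"] = pd.to_numeric(df_anime["Episodes"], errors="coerce")

# A column with few repeated labels is a natural categorical.
df_anime["Type"] = df_anime["Type"].astype("category")

# Remove columns that will not be used in the analysis.
df_anime = df_anime.drop(columns=["Japanese name"])

df_anime.info()
```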
Step 3: Exploratory Data Analysis
First, we want to know how the score variable is distributed.
We can see the highest peaks are between 6 and 7. This means the scores are concentrated in this range.
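The same reading can be checked numerically by counting scores per one-point bin. This is a small sketch with made-up scores, not the real distribution.

```python
import pandas as pd

# Made-up scores for illustration only.
scores = pd.Series([5.8, 6.1, 6.4, 6.7, 6.9, 7.3, 8.2], name="Score")

# One-point-wide bins: (0, 1], (1, 2], ..., (9, 10].
edges = list(range(0, 11))
counts = pd.cut(scores, bins=edges).value_counts().sort_index()

# The interval with the most anime.
peak = counts.idxmax()
print(counts)
print("Most populated bin:", peak)
```

On the real data, the bin returned by idxmax is the place where the histogram peaks.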
Next, we want to know the 10 most frequent Genres, Producers, Licensors and Studios.
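Because these columns hold several values per row, each list has to be split before counting. A minimal sketch, assuming the values are comma-separated text; top_n is a hypothetical helper and the rows are made up.

```python
import pandas as pd


def top_n(df, column, n=10):
    """Count individual values in a comma-separated column and keep the n most frequent."""
    values = df[column].dropna().str.split(",").explode().str.strip()
    return values.value_counts().head(n)


# Made-up rows for illustration.
df_anime = pd.DataFrame({
    "Genres": ["Action, Comedy", "Comedy, Drama", "Comedy", None],
})

top_genres = top_n(df_anime, "Genres", n=2)
print(top_genres)
```

The same helper can be reused for Producers, Licensors and Studios by changing the column name.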
Top 10 of Genres
Amazing, we found really well-known genres that appear in most anime. Comedy in particular helps a lot to make an anime entertaining.
Top 10 of Producers
A producer is the company in charge of financing the production of an anime.
We found Aniplex in first place. It is a famous producer behind well-known anime such as Charlotte, Kill la Kill, Anohana, Angel Beats, etc. Anohana and Angel Beats make us cry :( .
Top 10 of Licensors
Licensors are authorized to distribute anime in other countries.
We found Funimation in first place. It is owned by Sony.
Top 10 of Studios
A studio is the company in charge of producing the animation.
We found Toei Animation in first place. This company has famous anime such as Dragon Ball, Sailor Moon, One Piece, etc. We recommend watching One Piece. Although it is long, it has a great story.
Now, we are going to see how score relates to the other variables such as Episodes, Duration in minutes, Genres, Type, Producers, Licensors, Studios, Source and Rating.
SCORE vs EPISODES
We can say that a higher number of episodes does not mean a higher score, so the number of episodes does not appear to influence the score. An anime with only 12 episodes can be very good, and having many episodes does not guarantee a better rating.
SCORE vs DURATION IN MINUTES
We can say that longer anime tend to have higher scores, because the two variables show a positive correlation. Keep in mind that a correlation does not prove that length causes the higher score, and that most of the data is concentrated between 20 and 40 minutes.
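Both relationships can also be summarized with a correlation coefficient. The sketch below uses Pearson correlation on made-up numbers, so the printed values say nothing about the real dataset.

```python
import pandas as pd

# Made-up values for illustration only.
df_anime = pd.DataFrame({
    "Score": [6.0, 6.5, 7.0, 7.5, 8.0],
    "Episodes": [12, 100, 1, 24, 50],
    "Duration in minutes": [5, 20, 24, 45, 110],
})

# Pearson correlation of every numeric column with Score.
corr_with_score = df_anime.corr(method="pearson")["Score"].drop("Score")
print(corr_with_score)
```

A value near 0 suggests no linear relationship, while a value closer to 1 suggests that the two variables rise together.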
SCORE vs GENRES
We decided to use the top 10 genres, since they cover the largest share of the data.
We can say the Shounen genre is the one viewers rate the highest. No wonder anime such as Dragon Ball, Naruto, One Piece and Kimetsu no Yaiba are very popular.
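A comparison like this one can be sketched by giving each genre its own row and then averaging the score per genre. The rows are made up, and using the mean (rather than, say, the median or a box plot) is an assumption of this example.

```python
import pandas as pd

# Made-up rows for illustration only.
df_anime = pd.DataFrame({
    "Score": [8.0, 7.0, 6.0, 9.0],
    "Genres": ["Shounen, Action", "Comedy", "Comedy, Action", "Shounen"],
})

# One row per (anime, genre) pair.
long = df_anime.assign(Genre=df_anime["Genres"].str.split(",")).explode(
    "Genre", ignore_index=True
)
long["Genre"] = long["Genre"].str.strip()

# Keep only the most frequent genres, then average the score per genre.
top_genres = long["Genre"].value_counts().head(10).index
mean_by_genre = (
    long[long["Genre"].isin(top_genres)]
    .groupby("Genre")["Score"]
    .mean()
    .sort_values(ascending=False)
)
print(mean_by_genre)
```

The same pattern applies to Producers, Licensors and Studios, which are also multi-valued columns.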
SCORE vs PRODUCERS
We decided to use the top 10 producers, since they cover the largest share of the data.
We can say that the best producers are Dentsu and Aniplex, as they have more highly rated anime than the other producers.
As a curiosity, Dentsu and Aniplex are the ones that financed anime such as Kill la Kill, Angel Beats, etc.
SCORE vs LICENSORS
We decided to use the top 10 licensors, since they cover the largest share of the data.
Wow, to our surprise, we found Aniplex of America in first place.
SCORE vs STUDIOS
We decided to use the top 10 studios, since they cover the largest share of the data.
We can say the A-1 Pictures studio is the one viewers rate the highest. This studio made popular anime such as Sword Art Online, Darling in the FranXX, Kaguya-sama: Love Is War, Fairy Tail, etc.
Production I.G comes very close, though. This studio also made popular anime such as Shingeki no Kyojin, Haikyuu!!, Kuroko no Basket, Kimi ni Todoke, etc.
SCORE vs TYPE
Viewers rate TV-type anime the highest. This may be because they are long series, with 10 or more episodes. They are also the most common type.
SCORE vs SOURCE
Anime from light novels are the most highly rated by viewers.
SCORE vs RATING
Rating is the age group the anime is aimed at.
Anime for viewers aged 17 and over are the most highly rated.
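For single-valued columns such as Type, Source and Rating, the comparison is a plain group-by. A minimal sketch with made-up rows; score_summary is a hypothetical helper, and summarizing with the median is an assumption of this example.

```python
import pandas as pd


def score_summary(df, column):
    """Count and median score per category, highest median first."""
    return (
        df.groupby(column)["Score"]
        .agg(["count", "median"])
        .sort_values("median", ascending=False)
    )


# Made-up rows for illustration only.
df_anime = pd.DataFrame({
    "Score": [7.5, 7.0, 6.0, 6.5, 5.5],
    "Type": ["TV", "TV", "Movie", "OVA", "OVA"],
})

summary = score_summary(df_anime, "Type")
print(summary)
```

Showing the count next to the median helps to spot categories that look strong only because they contain very few anime.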
CONCLUSIONS
During this exploratory analysis we sought to know the features that influence an anime to be more highly rated than others. What we found was:
- The number of episodes of an anime does not matter.
- The length of an anime is a factor that helps but is not necessarily fundamental.
- If the anime is shounen, TV type, rated for viewers aged 17 and over, or adapted from a light novel, it has a good chance of being highly rated. However, remember that there are anime with a score below 4.
- If somebody works with Dentsu or Aniplex as producers in the creation of an anime, it might have a better chance of being highly rated. Likewise with Aniplex of America as a licensor or A-1 Pictures as a studio.
Also, it is very important to perform data cleaning and transformation. Without these important steps, we could not perform our exploratory analysis.
RECOMMENDATIONS
We recommend analyzing more information on other variables that are not in this dataset. For example, expenses.
These data are from the MyAnimeList platform, so if you want a more representative sample, look at other sources as well.
This is my first personal project.
Finally, thank you for reading this post. If you would like to contact me, this is my LinkedIn: Alexander Daniel Roman Gabriel | LinkedIn