Data Analysis and Visualization on MyAnimeList Data

Project

Navi Ubhi
11 min readOct 22, 2023
Image retrieved from https://tinyurl.com/tm4pu5hu

Tools Used: SQL Server, Tableau, Excel Spreadsheet

INTRODUCTION

Anime has always held a special place in my heart. From the captivating stories and vibrant characters to the endless creativity of the anime industry, it’s a world I’ve cherished since I was young. This project is a celebration of that passion, as I dive into the data behind the anime universe to uncover trends, preferences, and stories that enrich our experience as anime enthusiasts.

OBJECTIVE

The objective of this project is to leverage SQL for a comprehensive exploration of the MyAnimeList dataset. Ultimately, I will be extracting meaningful insights and presenting them in an engaging and interactive dashboard on Tableau.

Scope of Analysis

  1. Genre Analysis: Investigating the most popular anime genres and their influence on user engagement and popularity.
  2. Time & Seasonal Trends: Identifying seasonal patterns in anime releases and user engagement to uncover trends in seasonal preferences.
  3. Studio Performance: Analyzing the production output of studios and assessing the correlation between quantity and quality.
  4. User Engagement: Examining user behavior, including watchlists, favorites, and their relationship with anime popularity.

Through this all-around analysis, I aim to provide valuable insights for anime enthusiasts and stakeholders in the anime industry, offering a comprehensive understanding of the anime landscape today.

PREPARE

For this project, I will be using the “MyAnimeList Dataset” found on Kaggle. MyAnimeList is a platform where anime enthusiasts from around the world gather to catalog, rate, and review anime series. This dataset is considered public and safe to use.

The dataset contains 13,380 rows and 38 columns. It includes columns containing information such as anime titles, types (e.g., TV series or movies), user scores and ratings, episode details, genre categorizations, studio production credits, popularity metrics, and textual synopses, providing a comprehensive view of anime series characteristics for analysis and visualization.

PROCESS

For this process, I will be using SQL Server to clean and transform the dataset to ensure data quality and consistency. This includes handling missing values, standardizing formats, and structuring the data for analysis. I will also use SQL queries tailored to specific questions to gain a deeper understanding. These queries will provide insights such as genre preferences, viewer engagement patterns, and the impact of seasonality on anime popularity.

Transitioning to Tableau, I’ll be using the cleaned dataset and SQL-generated insights to create a visually engaging dashboard that showcases a comprehensive view of the anime dataset. These visuals will make the data accessible and informative to a broader audience.

Cleaning & Transforming the Data

Step 1: Removing duplicates and unnecessary columns

--Looking for duplicates
SELECT title, count(*) as duplicate_count
FROM Anime..AnimeData
GROUP BY title
HAVING COUNT(*) > 1;

SELECT anime_id, count(*) as duplicate_count
FROM Anime..AnimeData
GROUP BY anime_id
HAVING COUNT(*) > 1;

--Removing unnecessary columns
ALTER TABLE Anime..AnimeData
DROP COLUMN main_pic
, pics
, clubs
, score_01_count
, score_02_count
, score_03_count
, score_04_count
, score_05_count
, score_06_count
, score_07_count
, score_08_count
, score_09_count
, score_10_count
, total_count
, start_date
, end_date
, members_count
, source_type
, synopsis

Step 2: Dealing with Formatting

--Formatting the Start Date
SELECT CAST(start_date as DATE) AS start_date_formatted
FROM Anime..AnimeData;

ALTER TABLE Anime..AnimeData
ADD start_date_formatted DATE;

UPDATE Anime..AnimeData
SET start_date_formatted = CAST(start_date as DATE);

--Formatting the End Date
SELECT CAST(end_date as DATE) AS end_date_formatted
FROM Anime..AnimeData;

ALTER TABLE Anime..AnimeData
ADD end_date_formatted DATE;

UPDATE Anime..AnimeData
SET end_date_formatted = CAST(end_date as DATE);

--Formatting the Season column to show ONLY the season of the year
UPDATE Anime..AnimeData
SET season = SUBSTRING(Season, 1, CHARINDEX(' ', Season) - 1)
WHERE CHARINDEX(' ', season) > 0;

Step 3: Dealing with NULL values

First, I remove the following rows:

  • Rows where Score = NULL and Status = Finish Airing
  • Rows where Score = NULL and Status = Currently Airing and favorites_count = 0
  • Rows where Score = NULL and Status = Not Yet Aired and favorites_count = 0

These rows give no meaningful information because the score, popularity, and favorites are extremely low or 0. By excluding these entries, I ensure that my analysis is focused on active and ongoing anime series, where user engagement and ratings are more indicative of current preferences and trends.

--Removing rows where Score = NULL and Status = Finish Airing
SELECT *
FROM Anime..AnimeData
WHERE score IS NULL AND status = 'Finished Airing'
ORDER BY favorites_count DESC;

DELETE FROM Anime..AnimeData
WHERE score IS NULL AND status = 'Finished Airing';

--Removing rows where Score = NULL and Status = Currently Airing and favorites_count = 0
SELECT *
FROM Anime..AnimeData
WHERE score IS NULL AND status = 'Currently Airing' AND favorites_count = 0;

DELETE FROM Anime..AnimeData
WHERE score IS NULL AND status = 'Currently Airing' AND favorites_count = 0;

--Removing rows where Score = NULL and Status = Not Yet Aired and favorites_count = 0
SELECT *
FROM Anime..AnimeData
WHERE score IS NULL AND status = 'Not yet Aired' AND favorites_count = 0;

DELETE FROM Anime..AnimeData
WHERE score IS NULL AND status = 'Not yet Aired' AND favorites_count = 0;

Then, I fill in the NULL values in the Season column based on the start date column.

UPDATE Anime..AnimeData
SET season = CASE
WHEN MONTH(start_date_formatted) IN (1, 2, 3) THEN 'Winter'
WHEN MONTH(start_date_formatted) IN (4, 5, 6) THEN 'Spring'
WHEN MONTH(start_date_formatted) IN (7, 8, 9) THEN 'Summer'
WHEN MONTH(start_date_formatted) IN (10, 11, 12) THEN 'Fall'
ELSE NULL
END
WHERE season IS NULL;

Step 4: Create a temp table for the Genres and Studios column

The initial dataset presented a challenge in the genres and studios column, where multiple genres and studios were concatenated within each row, separated by ‘|’ (pipe) characters. For example, genres appeared in the format “mystery|romance|action”.

To make the data more manageable to analyze and to enable detailed exploration of individual genres and studios, a critical step was taken in this phase. A temporary table was created to split the genres and studios into separate rows, transforming the dataset into one where each row represents a unique anime-genre and anime-studio combination.

--Creating a temp table so that each genre is considered separately for counting and analysis
WITH SplitGenres AS (
SELECT anime_id, TRIM(value) AS Genre
FROM Anime..AnimeData
CROSS APPLY STRING_SPLIT(Genres, '|')
WHERE Genres IS NOT NULL
)
SELECT * FROM SplitGenres

--Creating a temp table so that each studio is considered separately for counting and analysis
WITH SplitStudios AS (
SELECT anime_id, TRIM(value) AS Studio, score
FROM Anime..AnimeData
CROSS APPLY STRING_SPLIT(studios, '|')
WHERE studios IS NOT NULL
)
SELECT * FROM SplitStudios

This transformation yielded two new datasets that will serve as complementary data sources for the primary dataset. In Tableau, I will be establishing a relationship between the main dataset and these newly created datasets, enabling seamless integration and the ability to perform in-depth analyses related to genres and studios.

ANALYZE

Note: you can review all the SQL queries I performed, each tailored to address specific questions, on my GitHub repository.

Also, you can view my full interactive visual dashboard for this project here on my Tableau Public profile.

Genre Analysis

Figure 1

The chart above presents the top 10 genres with the highest average ratings. This chart serves as the starting point for examining the interplay between high ratings, genre popularity, and user engagement. In this analysis, I aim to understand the relationships and dynamics between these factors and how a genre’s high rating influences its popularity and user engagement.

Figure 2

Popularity vs. Average Rating

Looking at Figure 2, one key insight is that a higher average rating for a genre does not inherently translate into its popularity. In Figure 1, genres like “Award-Winning,” “Mystery,” and “Shounen” have impressive average ratings, yet they do not secure top positions among the most popular genres. Instead, it’s genres like “Comedy,” “Action,” and “Fantasy” that dominate the popularity charts, suggesting that users/viewers gravitate towards engaging and entertaining content rather than simply targeting highly rated genres.

Figure 3

User Engagement vs. Average Rating

With Figure 3, I can analyze the relationship between average ratings and user engagement, particularly plan-to-watch counts. The data reveals that genres with high average ratings may not necessarily yield high plan-to-watch counts. Once again, genres like “Comedy,” “Action,” and “Romance” stand out as being more popular in terms of user engagement. This observation reinforces the idea that, in the eyes of viewers, prioritizing engaging and entertaining content outweighs the significance of high genre ratings.

Figure 4

Anticipating Trends in Anime Genres

A pattern emerges when exploring the most sought-after anime genres among those that have yet to air (see Figure 4). Genres like Action, Comedy, Fantasy, and Adventure continue to capture viewers’ attention. Additionally, we witness the emergence of newer genres like Demons, while traditional genres such as Shounen and School make a reappearance. This dynamic landscape highlights and reinforces how the industry values exciting and enjoyable content, making user engagement and popularity key factors.

Time & Seasonal Trends Analysis

Figure 5

Seasonal Popularity: Figure 5 reveals that the Winter season consistently sees a higher volume of anime releases compared to other seasons, indicating a traditional peak in new content during this time. Notably, the fall season follows closely, hinting at strong competition for viewer attention during these two seasons.

Figure 6

It’s notable that even though Winter has the most anime releases, they place third in terms of viewership among the seasons (see Figure 6). The Fall season is where anime series consistently receive higher watching counts compared to other seasons. One possible explanation for the fall season’s dominance in watching count could be linked to the historical release patterns of popular anime series during this time. A significant portion of the top 100 most popular animes, based on factors like ratings and user engagement, were traditionally released during the fall season. Examples include Naruto, Death Note, and One Punch Man.

Figure 7

Genre Evolution: It is evident in Figure 7 that anime genres have experienced dynamic shifts in popularity over the last two decades. Comedy and Action have consistently topped the charts, while Fantasy’s rise in popularity since 2012 indicates evolving viewer tastes. The decline in school-themed anime and the increased interest in slice-of-life and adventure genres starting in 2017 reflect a changing landscape of viewer preferences.

Industry Growth: The substantial increase in user viewership over the years highlights the growing popularity of anime as a form of entertainment. Factors such as increased media exposure, online streaming platforms, and internet connectivity have contributed to this surge in viewership. This presents opportunities for the industry to further expand its reach and impact.

Studio Performance Analysis

Quality vs. Quantity

Figure 8

Studios with a lower quantity of productions, such as P.I.C.S., K-Factory, Studio Find, and Egg Firm, stand out with higher average user ratings (see Figure 8). Each of these studios has produced less than 5 animes. This observation implies that these studios place a strong emphasis on delivering quality in their projects, ultimately leading to greater viewer satisfaction.

Figure 9

On the flip side, studios that have a higher production volume, including A-1 Pictures, Studio Pierrot, Madhouse, and Toei Animation have attracted a substantial viewer base (see Figure 9). This can be seen in their higher popularity ranks and viewer engagement. This suggests that these studios manage to engage a broad audience. Their diverse range of anime series caters to various tastes, leading to a larger and more engaged viewership.

User Engagement Analysis

For this analysis, I find relationships between three pivotal metrics: “Score (rating)”, “Popularity Rank”, and “Plan to Watch Count”.

Note: The lower index of popularity means the better.

Score vs. Popularity Rank

Figure 10

It is evident in Figure 10 that anime series with higher “Score” (rating) values tend to secure better positions in the “Popularity Rank”. This observation highlights the inherent connection between viewer ratings and the perceived popularity of an anime series.

Score vs. Plan to Watch Count

Figure 11

As evident in Figure 11, as the rating of an anime series improves, so does the number of viewers expressing an intention to watch it in the future. This finding underscores the pivotal role of ratings in guiding viewer anticipation and future viewing plans.

Popularity Rank vs. Plan to Watch Count

Figure 12

Notably, as anime series ascend the ranks in popularity, they exhibit a heightened level of interest among potential viewers, as indicated by an increase in “Plan to Watch Count” (see Figure 12). This aligns with the understanding that higher-rated anime series have the potential to generate greater attention and enthusiasm among viewers.

The relationships between these metrics showcase a clear picture of what resonates with audiences, offering creators and producers a roadmap to craft successful content.

Final Insights and Takeaways

  1. Genre Preferences: While high average ratings are appreciated, the anime industry leans towards genres like Comedy, Action, and Fantasy that drive popularity and user engagement. To studios and creators, this suggests that creating engaging and entertaining narratives should be a top priority. However, these stakeholders should exercise caution when delving into these popular genres, as they are highly saturated. Exploring niche sub-genres or adding unique twists to well-worn genres could help content stand out.
  2. Seasonal Trends: The Winter and Fall seasons dominate anime releases, with Fall consistently generating higher plan-to-watch counts. For studios planning release schedules, consider leveraging the Fall season’s popularity to introduce captivating series.
  3. Studio Performance: The quality vs. quantity debate in studio performance suggests that studios with fewer productions prioritize quality, while high-production studios engage a broad audience, thanks to their diverse range of anime series. Balancing quality and quantity is key for studio success, as diverse content can engage a broader audience.
  4. User Engagement: It’s evident that while high average ratings are valued, they don’t necessarily guarantee an anime’s popularity or viewers’ intention to watch. What truly resonates with audiences are genres that prioritize engagement, entertainment, and compelling narratives. To thrive in this dynamic landscape, creators and studios must focus on crafting content that captivates and excites viewers. Additionally, timing plays a crucial role, with the Fall season emerging as a prime period for attracting audiences.

Data Limitations

  • The analysis is based on a specific dataset, and there may be variations and trends not represented here.
  • Demographic and geographic data of users are not included, limiting the ability to understand regional and demographic preferences.

Future Work

For future work, it’s worth exploring additional datasets to expand the scope of analysis. Incorporating user-specific data, social media sentiment analysis, and geographic data can provide a more comprehensive view of viewer preferences and industry trends. By analyzing the geographic dimension, we can gain insights into regional variations in genre preferences and viewing habits, guiding targeted marketing and content creation strategies. Moreover, analyzing user data, such as demographic information and viewing habits can shed light on the diverse and evolving anime fanbase.

Additionally, predictive modeling could be applied to forecast anime popularity and viewer engagement, aiding producers in decision-making.

These future avenues of research hold the potential to offer deeper insights and opportunities for the anime industry to thrive in a rapidly evolving landscape and provide valuable insights for creators and enthusiasts.

Navi Ubhi | LinkedIn | Github | Tableau

--

--

Navi Ubhi

Just a data enthusiast sharing exciting data projects and case studies I've worked on.