Exploring the Video Game Sales with Ratings Dataset: Unveiling Insights into Gaming Trends

Aman B
5 min read · May 15, 2023

Below is a glimpse of the basic exploratory data analysis (EDA) I did on this data to gather insights. Although the dataset is not especially large or diverse, it still yields plenty of insights. It is a great starting point for EDA, especially for beginners. I hope it helps you on your quest to become the best data geek!

Introduction: The world of video games has witnessed exponential growth in recent years, captivating millions of players around the globe. With an abundance of genres, platforms, and titles to choose from, understanding the dynamics of the gaming industry becomes crucial for developers, marketers, and enthusiasts alike. In this article, we delve into the “Video Game Sales with Ratings” dataset available on Kaggle, offering a comprehensive overview and shedding light on the valuable insights it holds.

Understanding the Dataset:

The “Video Game Sales with Ratings” dataset is a treasure trove of information, providing a comprehensive look at video games and their associated sales figures and ratings. It encompasses a wide range of attributes, including the game’s title, genre, platform, release year, publisher, global sales, and critic and user ratings. This dataset serves as a valuable resource for conducting in-depth analysis and exploring various facets of the gaming industry.

Dataset overview
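
An overview like this can be produced with a few pandas calls. The snippet below is only a minimal sketch: a handful of made-up rows stand in for the CSV downloaded from Kaggle so that it runs on its own. The column names follow the ones used in the plots in this article; Critic_Score is my assumption based on the dataset description, and the 'tbd' entry mimics a text placeholder that may appear in the real User_Score column.

```python
import io

import pandas as pd

# Toy stand-in for the Kaggle CSV: made-up rows, assumed column names.
sample_csv = io.StringIO(
    "Name,Platform,Year_of_Release,Genre,Publisher,Global_Sales,Critic_Score,User_Score\n"
    "Game A,Wii,2006,Sports,Pub X,10.5,76,8\n"
    "Game B,PS2,2004,Action,Pub Y,3.2,88,tbd\n"
    "Game C,DS,,Puzzle,Pub X,1.1,,7.5\n"
)

# With the real data, pass the path of the downloaded CSV instead of sample_csv.
df = pd.read_csv(sample_csv)

print(df.shape)         # (number of rows, number of columns)
print(df.dtypes)        # data type of each column
print(df.isna().sum())  # missing values per column
```

Checking the data types and missing values first is worthwhile, because a score column that is read as text cannot be sorted or averaged until it is converted to numbers.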

Exploring Gaming Trends:

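# Plotting imports used by the snippets in this article
# (df is the dataset loaded into a pandas DataFrame)
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Distribution of video game genres
plt.figure(figsize=(12, 6))
sns.countplot(x='Genre', data=df)
plt.title('Distribution of Video Game Genres')
plt.xlabel('Genre')
plt.ylabel('Count')
plt.xticks(rotation=90)
plt.show()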

Insight:

Most games produced fall in the Action, Sports, Shooter, Adventure and Racing genres.

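# Bar plot of total sales by genre
plt.figure(figsize=(12, 6))
# estimator=np.sum plots total sales; the default would plot the mean per game
sns.barplot(x='Genre', y='Global_Sales', data=df, estimator=np.sum)
plt.title('Global Sales by Genre')
plt.xlabel('Genre')
plt.ylabel('Global Sales (in millions)')
plt.xticks(rotation=90)
plt.show()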

Insight:

The sales figures complement the previous visualization: the genres with the most games produced are also the ones that sell well, which justifies their production volume.

# Global sales by platform
plt.figure(figsize=(12, 6))
sns.barplot(x='Platform', y='Global_Sales', data=df, estimator=np.sum)
plt.title('Global Sales by Platform')
plt.xlabel('Platform')
plt.ylabel('Global Sales (in millions)')
plt.xticks(rotation=90)
plt.show()

Evolution Over Time: With information on release years, we can track the evolution of gaming trends. By analyzing sales, ratings, and genre preferences over different time periods, we can observe how the gaming industry has changed and adapt our strategies accordingly.

# Group the data by year and genre and calculate the total sales
genre_popularity = df.groupby(['Year_of_Release', 'Genre'])['Global_Sales'].sum().reset_index()

# Pivot the data to have years as columns and genres as rows
genre_popularity_pivot = genre_popularity.pivot(index='Genre', columns='Year_of_Release', values='Global_Sales')

# Create the line plot
plt.figure(figsize=(12, 6))
sns.lineplot(data=genre_popularity_pivot.T)
plt.title('Genre Popularity Over the Years')
plt.xlabel('Year')
plt.ylabel('Global Sales')
plt.legend(genre_popularity_pivot.index, loc='upper left', bbox_to_anchor=(1, 1))
plt.show()


Genre Popularity: The dataset’s genre attribute allows us to examine the popularity of different game genres. By identifying the genres with the highest sales or most highly rated games, we gain insights into the gaming preferences of players and market trends.

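# Top genres: total global sales per genre, keeping the five largest
top_genres = df.groupby('Genre')['Global_Sales'].sum().nlargest(5)

plt.figure(figsize=(8, 6))
sns.barplot(x=top_genres.index, y=top_genres.values)
plt.title('Top 5 Genres by Global Sales')
plt.xlabel('Genre')
plt.ylabel('Global Sales (millions)')
plt.show()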

Top-Selling Games and Platforms: By analyzing the global sales figures, we can identify the best-selling games and platforms in the dataset. This information helps us understand the preferences of gamers and discern the most successful titles and platforms over the years.

# Top platforms by number of games in the dataset
top_platforms = df['Platform'].value_counts().nlargest(5)

plt.figure(figsize=(8, 6))
sns.barplot(x=top_platforms.index, y=top_platforms.values)
plt.title('Top 5 Platforms by Number of Games')
plt.xlabel('Platform')
plt.ylabel('Number of Games')
plt.show()

Insight:

By number of titles, PlayStation dominates the console market, followed by the Wii and Xbox 360.

#Top selling games
top_selling_games = df.sort_values(by='Global_Sales', ascending=False).head(10)

# Create the horizontal bar plot
plt.figure(figsize=(12, 6))
plt.barh(top_selling_games['Name'], top_selling_games['Global_Sales'])
plt.title('Top 10 Selling Games')
plt.xlabel('Global Sales')
plt.ylabel('Game')
plt.show()

Insight:

Surprisingly, the all-time top-selling games include older, arcade-style titles such as the Mario Kart series. Wii games that use motion controls are also very popular, as they feel realistic and are fun to play.

Publisher Analysis: The dataset provides details about the publishers behind each game. By studying the performance of different publishers in terms of sales and ratings, we can gain insights into their strategies and identify publishers that consistently release successful games.

# Calculate the total sales for each publisher
publisher_sales = df.groupby('Publisher')['Global_Sales'].sum().sort_values(ascending=False)

# Select the top N publishers
top_publishers = publisher_sales.head(10)

# Plot the bar graph
plt.figure(figsize=(12, 6))
top_publishers.plot(kind='bar')
plt.title('Top 10 Publishers by Sales')
plt.xlabel('Publisher')
plt.ylabel('Global Sales')
plt.xticks(rotation=45)
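plt.show()

# Top games according to users
import pandas as pd
# User_Score may be stored as text (for example 'tbd'), so convert it to numbers first
df['User_Score'] = pd.to_numeric(df['User_Score'], errors='coerce')
top_20_games_user_score = df.nlargest(20, 'User_Score')

top_20_games_user_score[['Name', 'Publisher', 'Genre', 'User_Score']]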

Insight:

This table lists the games rated highest by users, along with their publishers and genres. Knowing what gamers like is crucial, because they are the users of the product and their opinions and choices matter most. Knowing what customers like makes it a little easier to decide the direction of production.
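
To look at publishers in terms of ratings as well as sales, one option is to average the user scores per publisher. The snippet below is a rough sketch on made-up rows, not a result from the real dataset. It assumes the Publisher and User_Score columns used earlier, and that some scores may be text placeholders such as 'tbd'.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumed to match the dataset.
toy = pd.DataFrame({
    "Publisher": ["Pub X", "Pub X", "Pub Y", "Pub Y", "Pub Z"],
    "User_Score": ["8.0", "9.0", "tbd", "7.0", "6.5"],
})

# Convert scores to numbers; anything non-numeric (like 'tbd') becomes NaN.
toy["User_Score"] = pd.to_numeric(toy["User_Score"], errors="coerce")

# Mean score and number of rated games per publisher, best average first.
publisher_ratings = (
    toy.groupby("Publisher")["User_Score"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(publisher_ratings)
```

Keeping the count next to the mean helps when reading the result, since an average based on one or two rated games is much less reliable than one based on many.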

Benefits and Applications: The “Video Game Sales with Ratings” dataset offers numerous benefits and applications to various stakeholders within the gaming industry:

  1. Game Developers: By examining successful titles and genres, developers can identify market trends and make informed decisions during the game development process. Insights from user ratings can also help shape future updates and improvements.
  2. Marketers and Publishers: Understanding the impact of ratings and identifying successful publishers can guide marketing strategies and partnerships. This dataset provides a foundation for data-driven decision-making in terms of advertising, promotion, and distribution.
  3. Industry Analysts: Researchers and analysts can leverage this dataset to study the gaming industry’s growth, identify emerging trends, and uncover patterns that shape consumer preferences. These insights can inform business strategies, investment decisions, and market forecasting.
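
For the impact of ratings mentioned in the list above, a simple first check is the correlation between critic scores and sales. This is only a sketch on made-up rows: the Critic_Score column name is my assumption, and even on the real data a correlation would not prove that good reviews cause higher sales.

```python
import pandas as pd

# Made-up rows for illustration only; Critic_Score is an assumed column name.
scores = pd.DataFrame({
    "Critic_Score": [60, 70, 80, 90, None],
    "Global_Sales": [0.5, 1.0, 1.5, 4.0, 2.0],
})

# Keep only games that have both a critic score and a sales figure.
rated = scores.dropna(subset=["Critic_Score", "Global_Sales"])

# Pearson correlation: +1 is a perfect positive linear relation, 0 is none.
corr = rated["Critic_Score"].corr(rated["Global_Sales"])
print(f"Correlation between critic score and global sales: {corr:.2f}")
```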

Conclusion: The “Video Game Sales with Ratings” dataset on Kaggle offers a comprehensive and detailed overview of video game sales and ratings. Exploring this dataset enables us to uncover valuable insights about gaming trends, understand the impact of ratings on sales, and gain a deeper understanding of the gaming industry as a whole. With its wide range of applications, this dataset serves as a valuable resource for developers, marketers, researchers, and analysts, empowering them to make informed decisions and adapt to the ever-evolving world of video games.
