The Sales of Video Games Domestically and Globally

Michael Kelley
INST414: Data Science Techniques
May 12, 2022

Using an existing Kaggle dataset of video game statistics, I analyzed video game sales in North America and in the world at large. My goal was to gain some insight into how closely commercial success in North America tracks commercial success globally, and to look for any unexpected patterns in the sales of these games throughout the world.

One insight I hoped to extract from this data was whether a clear relationship exists between North American and global video game sales, and how strong it is. While this may seem like a trivial exercise with an obvious outcome, the strength of the correlation indicates how far companies can rely on North America, one of the biggest markets for video games, as a predictor of a product's commercial success worldwide. If the correlation turns out weaker than expected, video game companies may not want to put as much stock in North American results when making marketing decisions worldwide.
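As a rough illustration of what "strength of correlation" means here, the sketch below computes a Pearson correlation coefficient by hand. It is not the code used for this analysis: the function name `pearson_r` is hypothetical, and the sales figures are made-up values rather than rows from the Kaggle dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (the covariance numerator) and the two variance sums.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, NOT taken from the dataset.
na_sales = [0.5, 1.2, 3.0, 7.5, 11.0]
global_sales = [1.1, 2.3, 6.4, 15.0, 24.5]

r = pearson_r(na_sales, global_sales)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would support treating North American sales as a proxy for global sales; a noticeably lower value would argue for caution.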

I fixed the number of clusters in advance at k = 4. After observing the visible trends in the raw data, I determined that grouping it into four clusters was the most representative way of arranging this information.

Looking at the clusters produced by my code, the distinction between them is fairly simple: they represent different levels of commercial success for their respective groups of video games. This is an easy conclusion to draw, given the relatively linear trend in the data points. On an ordinal scale, the four clusters can be considered to represent low sales, moderately low sales, moderately high sales, and high sales.
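K-means assigns cluster IDs in no particular order, so reading the clusters on an ordinal scale means ranking them by their centroids. The sketch below shows one way to do that, assuming each centroid is a (North American sales, global sales) pair. The centroid values and the names `centers` and `id_to_name` are hypothetical, not results from this analysis.

```python
# Hypothetical centroids as (NA sales, global sales), listed in arbitrary cluster-ID order.
centers = [(6.0, 12.5), (0.4, 0.9), (14.0, 30.0), (2.5, 5.2)]
names = ["low", "moderately low", "moderately high", "high"]

# Cluster IDs sorted from the smallest to the largest global-sales centroid.
order = sorted(range(len(centers)), key=lambda i: centers[i][1])

# Map each cluster ID to its ordinal label.
id_to_name = {cluster_id: names[rank] for rank, cluster_id in enumerate(order)}
print(id_to_name)
```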

To carry out this analysis, I wrote Python code in a Jupyter Notebook to import the data, transform it, and plot it. I used the scikit-learn library for its ready-made k-means implementation.
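For readers who have not used scikit-learn, a minimal k-means run with four clusters might look like the sketch below. It is not the notebook code from this analysis: the data is randomly generated as a stand-in for the two sales columns, and the `n_init` and `random_state` settings are assumptions chosen only to make the example repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the two sales columns; NOT the Kaggle data.
rng = np.random.default_rng(0)
na_sales = rng.gamma(shape=1.0, scale=2.0, size=200)
global_sales = na_sales * rng.uniform(1.5, 2.5, size=200)

# One row per game, one column per feature.
X = np.column_stack([na_sales, global_sales])

# Fit k-means with the predefined k = 4 and get a cluster ID for every row.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Each row of cluster_centers_ is one centroid: (NA sales, global sales).
print(kmeans.cluster_centers_)
```

The `labels` array can then be passed as the color argument of a scatter plot to color code the clusters.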

The following plot was produced for the purpose of this analysis. Clusters are color coded for convenience.

K-means clusters

This was a challenging analysis to conduct. My first setback was that my code did not read the data properly. I eventually determined that string values in the dataset were causing errors when the code expected numbers. Since the numerical values were the focus of this particular analysis, I created a copy of the dataset filtered to include only the columns reporting the sales of each game. My initial plot was also far too small to convey the data properly, so I resized it to make all of the clusters much more visible to the viewer.
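The two fixes described above can be sketched as follows, assuming pandas and matplotlib. The column names (`NA_Sales`, `Global_Sales`), the tiny example frame, and the figure size are all assumptions for illustration; the original notebook may have used different names and dimensions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Tiny made-up frame standing in for the Kaggle CSV; column names are assumptions.
df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Genre": ["Sports", "Puzzle", "Racing"],
    "NA_Sales": [4.1, 0.3, 1.7],
    "Global_Sales": [8.9, 0.8, 3.5],
})

# Keep only the numeric sales columns so string columns never reach the clustering step.
sales = df[["NA_Sales", "Global_Sales"]].copy()

# A larger figure (width, height in inches) makes the clusters easier to tell apart.
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(sales["NA_Sales"], sales["Global_Sales"])
ax.set_xlabel("North American sales")
ax.set_ylabel("Global sales")
```

Working on a copy leaves the original frame intact, so the game names and other text columns remain available for later labeling.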

The main limitation of this analysis is how linear the observed metrics turned out to be. Since the trend between North American sales and global sales was rather linear, with few observable outliers, the clusters followed a similarly linear path on the plot. A more in-depth analysis would probably draw on more attributes of the games and more of the factors behind their commercial success. That could show which attributes, and which levels of success in particular regions, tend to correlate with global success, and how different types of video games fall into clusters with their own niches in worldwide sales.

Overall, I believe I gained a reasonable amount of insight into the relationship between the commercial success of video games in the North American market and in the global market. The relationship is highly linear, as expected, and there turned out to be significantly fewer outliers than I initially anticipated. Perhaps different regions would show more outliers or different trends between local success and global success of video games.
