What Makes A Game On Steam Popular?

Eric Ramon
Nov 22, 2019

In my last article, I looked at the effects of popularity on the total average playtime of a game on Steam (you can find it here).

Using different Steam data (thanks to Nik Davis), let’s look at which factors are the strongest predictors of the number of owners of a game on Steam.

The dataset was fairly clean (it had no missing values) but still required a little work to get it ready for my predictive models. I dropped several collinear features and engineered a feature, “age_in_days”, which measures the time from a game’s release_date to the date the data was scraped (I set this to June 1, 2019, as the data was collected at the end of May 2019).
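As a rough illustration of the age_in_days feature, here is a minimal sketch using only the standard library. It assumes release_date is stored as a YYYY-MM-DD string; the helper name and the sample dates are made up for the example and are not taken from the dataset.

```python
from datetime import date

# Reference date from the article: the scrape is treated as June 1, 2019.
SCRAPE_DATE = date(2019, 6, 1)

def age_in_days(release_date: str, scraped: date = SCRAPE_DATE) -> int:
    """Days between a YYYY-MM-DD release date (assumed format) and the scrape date."""
    return (scraped - date.fromisoformat(release_date)).days

# Hypothetical release dates, for illustration only.
for released in ["2019-05-01", "2018-06-01"]:
    print(released, age_in_days(released))
```

With a pandas DataFrame the same subtraction can be done on a whole column at once, but the idea is identical: one fixed reference date minus each release date.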

After choosing the features, it is helpful to see the distribution of the target (what I’m trying to predict). Let’s look at the distribution of total owners of a game (which the data reports as categories).

Distribution of Game Ownership on Steam

It looks like what we would expect: as the number of owners increases, there are fewer and fewer games, and the 100,000,000–200,000,000 owners category has only 1 game. The 0–2000 category makes up 68.6% of the target values, so this will be my baseline (the accuracy I hope my predictive models can outperform). In other words, a model that always guessed “0–2000” would already be right 68.6% of the time.
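The baseline above is just the share of the most common category. A small sketch of that calculation, using toy category codes rather than the real Steam data (the function name is my own):

```python
from collections import Counter

def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Toy category codes, not the real data: 7 of 10 games fall in category 0.
owners = [0] * 7 + [1] * 2 + [2]
label, accuracy = majority_baseline(owners)
print(f"Always predicting category {label} is right {accuracy:.1%} of the time")
```

Any model worth keeping has to beat this number, which is why 68.6% is the bar for the models that follow.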

In order to find the importance of the various features in the data (how much they affect a prediction), I first trained several predictive models. I split the data into separate sets, fit a model on one part, then predicted values for the held-out part and compared them to the real values to verify the model’s accuracy.

Using a Logistic Regression model, I was able to get an accuracy of about 73%.
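The split, fit, and score routine described above might look roughly like this with scikit-learn. This is a sketch under assumptions: the data is synthetic (generated with make_classification), and the split size and model settings are my own choices, so the printed accuracy says nothing about the Steam results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Steam features and owner categories.
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# Hold out a quarter of the rows; the model never sees them while fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Fit on the training part, then score predictions against the held-out labels.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Scoring on rows the model has not seen is what makes the accuracy figure comparable to the baseline.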

(Note: 0–10 are the categories in the distribution graph at the top, with ‘0–2000’ being zero, and so on.)

The image above plots the category my model predicted against the actual category, a useful way to visualize the strength of a model.
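A common way to tabulate predicted against actual categories is a confusion matrix. Assuming that is roughly what the image shows, here is a minimal standard-library sketch with toy labels (not the article's predictions):

```python
from collections import Counter

def confusion_matrix(actual, predicted, n_classes):
    """Rows are actual categories, columns are predicted categories."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in range(n_classes)] for a in range(n_classes)]

# Toy category codes, for illustration only.
actual = [0, 0, 0, 1, 1, 2]
predicted = [0, 0, 1, 1, 0, 2]

matrix = confusion_matrix(actual, predicted, n_classes=3)
for row in matrix:
    print(row)

# Correct predictions sit on the diagonal; accuracy is their share of the total.
accuracy = sum(matrix[i][i] for i in range(3)) / len(actual)
print(f"accuracy: {accuracy:.2f}")
```

The further the counts drift from the diagonal, the more often the model confuses one ownership category for another.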

The best overall results came from a Random Forest model, which gave me a prediction accuracy of about 81%.

Looking at the above image, you may have noticed that there are far fewer false predictions. So I decided to use this model to find the most important features for predicting the owner category.

So according to our predictive model, the most important features (factors) of a game’s popularity (by ownership) are ratings, age, and playtime.
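For readers who want to try this kind of ranking, scikit-learn's random forest exposes a feature_importances_ attribute after fitting. The sketch below uses synthetic data and placeholder feature names, so the ordering it prints is not the Steam result; the hyperparameters are my own assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with placeholder column names.
feature_names = [f"feature_{i}" for i in range(4)]
X, y = make_classification(n_samples=400, n_features=4, n_informative=3,
                           n_redundant=1, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances are non-negative and sum to 1; sort from most to least important.
ranked = sorted(zip(feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name:12s} {importance:.3f}")
```

These scores describe the model as a whole, averaged over every game it was trained on.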

Unfortunately, these importances can vary from game to game, so I decided to look at individual predictions and what drives them.

Shapley plot visualizing the feature effects for a game with ownership of 0–2000 (blue on the right is price (0.99) and age_in_days (1017))
A game with ownership of 200,000–500,000

You may have noticed that the features have shifted in importance between the two games, but their effects are in the same range of magnitude.
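To make the idea behind these plots concrete, here is a small sketch that computes exact Shapley values from their definition for a single prediction. It is not the plotting library's code: the scoring function, its weights, and the "average game" baseline are hypothetical, and only the price (0.99) and age_in_days (1017) values are borrowed from the first plot.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: each feature's weighted average marginal
    contribution when it is switched from its baseline value to the
    instance's value, over all subsets of the other features."""
    n = len(instance)

    def value(subset):
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                chosen = set(subset)
                total += weight * (value(chosen | {i}) - value(chosen))
        phi.append(total)
    return phi

# Hypothetical scoring function standing in for a trained model.
def toy_score(x):
    price, age_in_days = x
    return 2.0 * price + 0.001 * age_in_days

instance = [0.99, 1017]  # price and age_in_days from the first plot
baseline = [5.0, 500]    # hypothetical "average game"

phi = shapley_values(toy_score, instance, baseline)
for name, contribution in zip(["price", "age_in_days"], phi):
    print(f"{name:12s} {contribution:+.3f}")
```

The contributions always add up to the gap between this game's score and the baseline score, which is why the same feature can push one game's prediction up and another's down.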

What have I learned? The best predictors of a game’s popularity by ownership are ratings, age, and playtime, followed closely by price. What could this mean for game developers? Increase community interaction, support your game after release, and have a solid amount of gameplay. That seems logical to me; however, I find it interesting that many factors we would expect to be more important (price, genre, publisher) simply are not.

Great games inspire people to criticize them, play them now, and play them in the future.

Eric Ramon 2019 — nephylum.github.io
