Among Us Google Play Store Rating Exploratory Data Analysis using Python

Ricky Nauvaldy
Published in The Startup
4 min read · Nov 5, 2020

Hello everyone! Do you know the game Among Us? This phenomenal game is growing fast on both Google Play and the App Store. Some colleagues and I, all graduates of IYKRA Data MBA Batch IV, scraped, analyzed, and visualized the comment section of this game.

In this notebook, we do a simple Exploratory Data Analysis (EDA) of the Google Play Store user ratings and comments for Among Us. The scraping was done beforehand, covering reviews from September 22nd to October 20th, 2020, and the result was saved in Microsoft Excel format.

We begin by importing the required libraries and reading the Excel file.

# Import required libs
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly
import plotly.express as px
import plotly.graph_objects as go
# Load the data and show its first 5 rows
drive_path = '/path/to/file/'
df = pd.read_excel(drive_path+'AmongUsRating.xlsx',index_col='Unnamed: 0')
df.head()
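Before going further, a quick sanity check on the loaded data can be useful. The snippet below is a minimal sketch, not part of the original notebook: it uses a tiny stand-in data frame with made-up rows (only the column names match our data) and checks the shape, the missing values per column, and the first and last review dates.

```python
import pandas as pd

# Made-up stand-in rows for illustration only; the real data comes from
# AmongUsRating.xlsx. Only the column names follow the article's data frame.
df = pd.DataFrame({
    'Star': ['placeholder', 'placeholder', 'placeholder'],
    'User': ['user_a', 'user_b', 'user_c'],
    'Comment': ['Great game', None, 'Fun with friends'],
    'DateReview': pd.to_datetime(['2020-09-22', '2020-09-24', '2020-10-20']),
    'Rate': [5, 1, 4],
})

# Number of rows and columns
n_rows, n_cols = df.shape

# Missing values per column
missing = df.isna().sum()

# First and last review date, to confirm the scraped date range
date_min = df['DateReview'].min()
date_max = df['DateReview'].max()

print(n_rows, n_cols)
print(missing)
print(date_min.date(), date_max.date())
```

On the real data, the two dates should match the scraping range mentioned above.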

We then inspect the data using the dtypes attribute to see the type of each column.

# Get the type of each column
df.dtypes
Star                  object
User                  object
Comment               object
DateReview    datetime64[ns]
Rate                   int64
dtype: object
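In our case the date and rating columns already have the right types. If a file is loaded with these columns as plain text instead, they can be converted before the analysis. The following is a small sketch on made-up values, assuming unparseable entries should become missing values rather than raise an error.

```python
import pandas as pd

# Made-up raw values for illustration: everything arrives as text,
# and one date cannot be parsed
raw = pd.DataFrame({
    'DateReview': ['2020-09-22', '2020-10-20', 'not a date'],
    'Rate': ['5', '4', '1'],
})

# errors='coerce' turns unparseable entries into NaT / NaN instead of raising
raw['DateReview'] = pd.to_datetime(raw['DateReview'], errors='coerce')
raw['Rate'] = pd.to_numeric(raw['Rate'], errors='coerce')

# Count how many dates could not be parsed
n_bad_dates = raw['DateReview'].isna().sum()

print(raw.dtypes)
print(n_bad_dates)
```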

We then use describe to see the count, mean, min, max, and other summary statistics of each numerical column.

# Get the description of each numerical column
df.describe()
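For a 1 to 5 rating, the mean alone can hide the shape of the distribution, so it may help to look at the counts per rating next to the summary. Here is a minimal sketch on a made-up list of ratings (not our scraped data):

```python
import pandas as pd

# Made-up ratings for illustration only
rates = pd.Series([5, 5, 4, 1, 3], name='Rate')

# Summary statistics of a single numerical column
summary = rates.describe()

# How many times each rating occurs, ordered by rating
counts = rates.value_counts().sort_index()

print(summary)
print(counts)
```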

Graphical Representation

Based on the dataset, we want to see how many users gave each rating on each day, so that the counts can be compared over time. To do this, we use the groupby method on the rate and date columns. Notice that we assign the result to a new variable, which keeps the original data frame intact for the other analyses.

# Count the users of each day, grouped by rate and date
df_user_rating_day = df.groupby(['Rate','DateReview'])[['User']].count()
df_user_rating_day.reset_index(inplace=True)
df_user_rating_day.head()
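One thing to keep in mind with this kind of grouping: a rating that received no reviews on a given day simply has no row in the result. As a sketch of an alternative layout (not what the original notebook does), the counts can be pivoted into one column per rating with the gaps filled by 0, which also makes it easy to find the busiest day. The rows below are made up for illustration, and size counts rows, whereas count on the User column skips missing user names.

```python
import pandas as pd

# Made-up sample rows (not the scraped data), same column names as above
df = pd.DataFrame({
    'User': ['a', 'b', 'c', 'd'],
    'DateReview': pd.to_datetime(
        ['2020-09-22', '2020-09-22', '2020-09-22', '2020-09-23']),
    'Rate': [5, 5, 1, 5],
})

# One row per day, one column per rating; missing combinations become 0
daily_counts = df.groupby(['DateReview', 'Rate']).size().unstack(fill_value=0)

# Total reviews per day and the day with the most reviews
daily_total = daily_counts.sum(axis=1)
peak_day = daily_total.idxmax()

print(daily_counts)
print(peak_day.date())
```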

We plot the total users per rating per day as a line chart, a natural choice for time-series data.

# Plot the user_rating_per_day data using plotly
fig = px.line(df_user_rating_day, x='DateReview', y='User', color='Rate', title='Among Us Google Play Rating 22 September to 20 October 2020')
fig.show()

We also look at the average rating per week to see how users' ratings of the game change from week to week.

# Compute the mean rating of each week
# (select Rate explicitly so the text columns are not averaged)
df_user_week = df.groupby(pd.Grouper(key='DateReview', freq='1W'))[['Rate']].mean()
df_user_week.index = df_user_week.index.strftime('%Y-%m-%d')
df_user_week.reset_index(inplace=True)
df_user_week
fig = px.line(df_user_week, x='DateReview', y='Rate', title='Among Us Rating per Week')
fig.show()
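It helps to know how the weekly grouping forms its bins: with a weekly frequency, pandas labels each bin by the Sunday that ends the week. A weekly mean is also easier to read next to the number of reviews behind it. The sketch below, on made-up rows, computes both in one step; the agg call with two statistics is our own illustration, not part of the original notebook.

```python
import pandas as pd

# Made-up sample rows for illustration: two reviews in each of two weeks
df = pd.DataFrame({
    'DateReview': pd.to_datetime(
        ['2020-09-22', '2020-09-24', '2020-09-28', '2020-10-01']),
    'Rate': [5, 3, 4, 1],
})

# Weekly bins end on Sunday; compute the mean rating and the review count
weekly = (
    df.groupby(pd.Grouper(key='DateReview', freq='W'))['Rate']
      .agg(['mean', 'count'])
)

# Week labels as strings, as in the plot above
week_labels = weekly.index.strftime('%Y-%m-%d').tolist()

print(weekly)
print(week_labels)
```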

Last but not least, we look at the distribution of users per rating across the whole dataset, shown as a pie chart.

# Count the users for each rating
df_user_rate = df.groupby(['Rate'])[['User']].count()
df_user_rate.reset_index(inplace=True)
df_user_rate
# Plot the user_rating data using plotly
fig = px.pie(df_user_rate, values='User', names='Rate', title='User Rating Distribution')
fig.show()
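The pie chart shows the shares visually; the same numbers can also be computed directly. As a minimal sketch on made-up ratings (not our scraped data), value_counts with normalize=True gives the share of each rating, and a boolean mean gives the share of positive reviews, taking "positive" to mean 4 stars and above.

```python
import pandas as pd

# Made-up ratings for illustration only
rates = pd.Series([5, 5, 4, 1, 3], name='Rate')

# Share of each rating, ordered by rating (values sum to 1)
share = rates.value_counts(normalize=True).sort_index()

# Share of positive reviews, assuming positive means 4 stars and above
positive_share = (rates >= 4).mean()

print(share)
print(positive_share)
```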

Conclusion

We showed a basic EDA in Python of the Google Play Store user ratings and comments for Among Us, which we scraped for the period from September 22nd to October 20th. The results show that the number of reviews peaked on September 24th and that the reviews are mostly positive (4 stars and above). There is still more to analyze in this data, especially text analysis of the user comments, and we hope to do that soon.

Remarks

Please find the Jupyter Notebook used for this analysis here. We have also done the analysis using Tableau, which you can find in this post. Feel free to reach out to us to discuss anything :D
