Data Analysis of Google Play Store Apps (Part 1)

Mohammad Aakash
Oct 11, 2020


Data Preprocessing and Cleaning

This project is based on a dataset of Google Play Store apps. For each app, the dataset records its category, rating, number of reviews, size, number of installs, and whether it is free or paid. In this analysis I explore what is in the dataset using Python libraries (numpy, pandas, matplotlib, seaborn) and visualize the data with several kinds of graphs and plots. So, let's start working.

Downloading the Dataset

I found this dataset on Kaggle at https://www.kaggle.com/lava18/google-play-store-apps and it is an interesting dataset for exploring what types of applications are on the Google Play Store. We will explore the data in the ways described below.

Let’s begin by downloading the data, and listing the files within the dataset.

The dataset has been downloaded and extracted.

Data Preparation and Cleaning

In this step we first load the dataset with pandas, alongside NumPy, which is usually used for numerical operations. We then check how many rows and columns the dataset has and handle missing, incorrect, and invalid data. So, let's prepare and clean our data.
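Loading a CSV with pandas can be sketched as follows. This is a minimal example, not the original notebook code: the two sample rows are made up so the snippet runs on its own, and the file name `googleplaystore.csv` mentioned in the comment is an assumption about the Kaggle download.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Kaggle CSV so the example is self-contained.
# With the real download you would pass the file path instead,
# e.g. pd.read_csv("googleplaystore.csv") (file name assumed).
sample_csv = io.StringIO(
    "App,Category,Rating,Reviews,Size,Installs,Type,Genres,Last Updated\n"
    "Sketch App,ART_AND_DESIGN,4.1,159,19M,\"10,000+\",Free,Art & Design,\"January 7, 2018\"\n"
    "Coloring App,ART_AND_DESIGN,,967,14M,\"500,000+\",Free,Art & Design;Creativity,\"January 15, 2018\"\n"
)

# read_csv parses the text into a DataFrame; the empty Rating field becomes NaN.
df = pd.read_csv(sample_csv)
print(df.head())
```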

We can see which columns are in our dataset through the .columns attribute.

The .shape attribute also tells us how many rows and columns there are in the dataset.

The .info method gives us a summary of the dataset, including each column's datatype and its count of non-null values.
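The three inspection steps above can be sketched on a small made-up DataFrame (the rows here are hypothetical, not taken from the real dataset). Note that `.columns` and `.shape` are attributes, so they take no parentheses, while `.info()` is a method call.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset.
df = pd.DataFrame({
    "App": ["Sketch App", "Word Game", "Chat App"],
    "Rating": [4.1, None, 4.5],
    "Genres": ["Art & Design", "Word", "Communication"],
})

print(df.columns.tolist())  # column names as a plain list
print(df.shape)             # (number of rows, number of columns)
df.info()                   # prints dtypes and non-null counts per column
```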

Now we can start cleaning the data. First, let's check whether there are any null or missing values in the columns.

Now we know that there are missing or null values in the dataset, so we can drop the rows that contain them.

Let's confirm that dropping the rows worked by checking again for any remaining missing or null values in the dataset.

After dropping the rows with missing or null values, let's check how many rows and columns we have left in the dataset.
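The check, drop, and re-check sequence described above might look like this. It is a sketch on made-up rows, assuming we drop every row that has at least one missing value, which is what `dropna()` does by default.

```python
import pandas as pd

# Hypothetical sample rows; two of them contain a missing value.
df = pd.DataFrame({
    "App": ["Sketch App", "Word Game", "Chat App", "Photo App"],
    "Rating": [4.1, None, 4.5, 3.9],
    "Type": ["Free", "Free", None, "Paid"],
})

# Count missing values per column before cleaning.
missing_before = df.isnull().sum()
print(missing_before)

# Drop every row that has at least one missing value.
clean = df.dropna()

# Re-check: all counts should now be zero.
missing_after = clean.isnull().sum()
print(missing_after)

# See how many rows and columns remain.
print(clean.shape)
```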

There is a Genres column in our dataset. Let's check how many apps there are in each genre.

Using the .head method, we check the first 10 rows of the Genres column.

Using the .tail method, we check the last 10 rows of the Genres column.
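One way to count apps per genre and look at both ends of the result is `value_counts()`, which sorts the counts from largest to smallest. This is a sketch on made-up genre labels, and it assumes the counts (rather than the raw column) are what gets passed to `.head` and `.tail`.

```python
import pandas as pd

# Hypothetical genre labels standing in for the Genres column.
genres = pd.Series(
    ["Tools", "Tools", "Tools", "Education", "Education", "Word"],
    name="Genres",
)

# Count how many apps fall into each genre, largest count first.
counts = genres.value_counts()

print(counts.head(10))  # the most common genres (up to 10)
print(counts.tail(10))  # the least common genres (up to 10)
```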

Many genres contain only a few records, which may introduce bias. So I decided to group them into bigger genres by ignoring the sub-genre (the part after the ";" sign).
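Dropping the sub-genre can be sketched with pandas string methods: split each value on ";" and keep the first piece. The genre values below are made up for illustration.

```python
import pandas as pd

# Hypothetical genre values, some with a sub-genre after ";".
genres = pd.Series(["Art & Design;Creativity", "Education;Education", "Tools"])

# Split on ";" and keep only the first part (the main genre).
# Values without ";" are left unchanged.
main_genres = genres.str.split(";").str[0]
print(main_genres.tolist())
```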

Now let's check again how many types of genres there are in the dataset.

We can also merge genres that have only a small number of apps. There is a Music and Audio genre that contains only 1 app. Let's replace it with the other genre, Music.
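Merging one label into another can be sketched with `Series.replace`. The exact spelling of the rare label, written here as "Music & Audio", is an assumption, so check it against the actual values in your Genres column before running this on the real data.

```python
import pandas as pd

# Hypothetical genre values; "Music & Audio" (spelling assumed) appears once.
genres = pd.Series(["Music", "Music", "Music & Audio", "Tools"])

# Replace the rare label with the larger genre it belongs to.
merged = genres.replace("Music & Audio", "Music")
print(merged.value_counts())
```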

We have another column in our dataset, Last Updated. Its datatype is object, but we know that the data in this column is in date format. Let's change the column's datatype from object to datetime.
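Converting text dates to a datetime column can be sketched with `pd.to_datetime`. The sample values and the format string `"%B %d, %Y"` (full month name, day, four-digit year) are assumptions about how the dates are written, and the month names are assumed to be in English.

```python
import pandas as pd

# Hypothetical date strings standing in for the Last Updated column.
last_updated = pd.Series(["January 7, 2018", "June 20, 2018"], name="Last Updated")

# Parse the strings into datetime values using an explicit format.
parsed = pd.to_datetime(last_updated, format="%B %d, %Y")

print(parsed.dtype)
print(parsed.dt.year.tolist())  # the year of each date
```

Once the column is a datetime, parts such as year and month are available through the `.dt` accessor.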

Now the column has been changed to the datetime datatype. With that, our dataset is clean enough for analysis.

Thanks, everyone, for reading this story. Hopefully you learned something from it.
Okay, bye bye. See you all in the next part of Data Analysis of Google Play Store Apps, where we discuss exploratory analysis and visualization of the data.
