Data Analysis of Google Play Store Apps (Part 1)
Data Preprocessing and Cleaning
This project is based on a dataset of Google Play Store apps. For each app, it records details such as the category, rating, number of reviews, size, number of installs, and whether the app is free or paid. In this analysis I explore the dataset with Python libraries (numpy, pandas, matplotlib, seaborn) and visualize the data with several kinds of graphs and plots. So, let's start working.
Downloading the Dataset
I found this dataset on Kaggle at this link: https://www.kaggle.com/lava18/google-play-store-apps It is an interesting dataset for exploring what types of applications are on the Google Play Store. We will explore the data in the ways given below.
Let’s begin by downloading the data, and listing the files within the dataset.
The dataset has been downloaded and extracted.
Data Preparation and Cleaning
In this step, we first load our dataset with pandas. We also work with NumPy, which is usually used for numerical operations. Together they let us check how many rows and columns are in the dataset and handle missing, incorrect, and invalid data, among many other things. So, let's prepare and clean our data.
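As a minimal sketch of the loading step: the snippet below reads a tiny in-memory CSV so that it runs anywhere. The two rows and app names are invented, and the column names are an assumption about what the Kaggle file contains. With the real download you would pass the path of the extracted CSV to pd.read_csv instead.

```python
import io

import pandas as pd

# Invented two-row stand-in for the downloaded CSV (column names are assumed).
sample_csv = io.StringIO('''App,Category,Rating,Reviews,Size,Installs,Type,Genres,Last Updated
Sketch Pad,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,Art & Design,"January 7, 2018"
Tune Box,FAMILY,,12,3.2M,"1,000+",Paid,Music & Audio,"May 20, 2018"
''')

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(sample_csv)
print(df.head())
```

The empty Rating field in the second row is read as NaN, which is the kind of missing value the cleaning steps deal with.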
We can see which columns are in our dataset through the .columns attribute.
The .shape attribute also tells us how many rows and columns there are in the dataset.
The .info method gives us some information about the dataset, such as each column's data type and its number of non-null values.
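Here is a small sketch of these three inspection tools on a toy DataFrame. The rows are invented and the column names are only assumed to resemble the real dataset.

```python
import pandas as pd

# Toy frame with invented rows; not the real Play Store data.
df = pd.DataFrame({
    "App": ["Sketch Pad", "Tune Box", "Run Tracker"],
    "Rating": [4.1, None, 4.5],
    "Genres": ["Art & Design", "Music", "Health & Fitness"],
})

print(df.columns)  # column labels (an attribute, so no parentheses)
print(df.shape)    # tuple of (number of rows, number of columns)
df.info()          # prints dtypes and non-null counts per column
```

Note that .columns and .shape are attributes, while .info is a method and needs parentheses.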
Now we can start cleaning our data. First, let's check whether there are any null or missing values in the columns.
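One common way to do this check, sketched on invented data, is to chain isnull and sum so that we get the number of missing values per column. Whether the original notebook used exactly this call is an assumption.

```python
import pandas as pd

# Invented rows with a couple of gaps, for illustration only.
df = pd.DataFrame({
    "App": ["Sketch Pad", "Tune Box", "Run Tracker"],
    "Rating": [4.1, None, 4.5],
    "Type": ["Free", "Paid", None],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```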
Now we know that there are missing or null values in the dataset, so we can drop the rows that contain them.
Let's confirm that dropping the rows worked by checking again whether any missing or null values remain in the dataset.
After dropping the rows with missing or null values, let's check how many rows and columns we have left in the dataset.
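The drop, the re-check, and the new shape can be sketched together as follows. The data is invented, and using dropna with its default settings (drop a row if any column is missing) is an assumption about how the rows were removed.

```python
import pandas as pd

# Invented rows; two of the four contain a missing value.
df = pd.DataFrame({
    "App": ["Sketch Pad", "Tune Box", "Run Tracker", "Note Keeper"],
    "Rating": [4.1, None, 4.5, 3.9],
    "Type": ["Free", "Paid", None, "Free"],
})

# Drop every row that has at least one missing value.
cleaned = df.dropna()

# Re-check: every per-column count should now be zero.
print(cleaned.isnull().sum())

# Compare the shape before and after the drop.
print(df.shape, "->", cleaned.shape)
```

Dropping rows is the simplest option, but it also throws away the valid values in those rows, so it is worth comparing the two shapes to see how much data was lost.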
There is a Genres column in our dataset. Let's check how many apps there are in each genre.
Using the .head method, we check the first 10 rows of the Genres column.
Using the .tail method, we check the last 10 rows of the Genres column.
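A rough sketch of counting apps per genre and looking at both ends of the result is shown below. It assumes that .head and .tail are applied to the per-genre counts from value_counts, and it uses made-up counts. The toy data has only five genres, so it looks at 2 rows instead of 10.

```python
import pandas as pd

# Made-up number of apps per genre, expanded into one row per app.
apps_per_genre = {"Tools": 5, "Education": 4, "Action": 3, "Music": 2, "Music & Audio": 1}
df = pd.DataFrame(
    {"Genres": [genre for genre, n in apps_per_genre.items() for _ in range(n)]}
)

# value_counts() counts the rows per genre, sorted from most to least common.
genre_counts = df["Genres"].value_counts()

print(genre_counts.head(2))  # the most common genres (head(10) on the real data)
print(genre_counts.tail(2))  # the rarest genres (tail(10) on the real data)
```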
Many genres contain only a few records, which may bias the analysis. So I decided to group them into bigger genres by ignoring the sub-genre (the part after the ";" sign).
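One possible way to ignore the sub-genre is to split each value on ";" and keep only the first part, as sketched here. The genre strings are illustrative and the exact code used in the original notebook is not known.

```python
import pandas as pd

# Illustrative genre values; some carry a sub-genre after ";".
df = pd.DataFrame(
    {"Genres": ["Education;Creativity", "Education", "Casual;Pretend Play", "Tools"]}
)

# Split on ";" and keep the first piece, which is the main genre.
df["Genres"] = df["Genres"].str.split(";").str[0]

print(df["Genres"].value_counts())
```

Values without a ";" are left unchanged, because the split returns a single-element list for them.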
Now let's check again how many types of Genres there are in the dataset.
We can also group the Genres that have only a small number of apps. For example, the Music & Audio genre has only 1 app. Let's replace it with the other genre, Music.
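A minimal sketch of this merge uses the Series replace method, which swaps exact matches of one value for another. The rows are invented.

```python
import pandas as pd

# Invented rows: one lone "Music & Audio" app next to two "Music" apps.
df = pd.DataFrame({"Genres": ["Music", "Music & Audio", "Tools", "Music"]})

# Replace the rare label with the larger genre it belongs to.
df["Genres"] = df["Genres"].replace("Music & Audio", "Music")

print(df["Genres"].value_counts())
```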
Now we have another column in our dataset, Last Updated. Its datatype is object, but we know that the data in this column is in date format. Let's change the datatype of this column from object to datetime.
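The conversion can be sketched with pd.to_datetime. The sample dates and the "Month day, year" format string are assumptions about how the column is written in the real file.

```python
import pandas as pd

# Sample date strings in an assumed "Month day, year" layout.
df = pd.DataFrame({"Last Updated": ["January 7, 2018", "May 20, 2018", "August 3, 2018"]})
print(df["Last Updated"].dtype)  # stored as text before the conversion

# Parse the strings into proper datetime values.
df["Last Updated"] = pd.to_datetime(df["Last Updated"], format="%B %d, %Y")
print(df["Last Updated"].dtype)  # a datetime64 dtype after the conversion
```

Once the column is a datetime, parts of the date are available through the .dt accessor, for example df["Last Updated"].dt.year.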
The column has now been changed to the datetime datatype. With that, our dataset is clean enough for analysis.
Thanks, everyone, for reading this story. Hopefully you learned something from it.
Okay, bye bye. See you all in the next part of Data Analysis of Google Play Store Apps, where we discuss exploratory analysis and visualization of the data.