Microsoft Windows Store apps (Data analysis using Orange GUI)

Microsoft Store

Objective: Analyzing a dataset of apps in the Microsoft Windows Store and extracting information through plots.

Description: Microsoft Windows Store, a digital distribution platform owned by Microsoft, started as the app store for Windows 8 and Windows Server 2012 as the primary means of distributing Universal Windows Platform apps. With Windows 10, Microsoft merged its other distribution platforms (Windows Marketplace, Windows Phone Store, Xbox Music, Xbox Video, Xbox Store, and a web storefront also known as “Microsoft Store”) into Microsoft Store, making it a unified distribution point for apps, console games, and digital videos. Digital music was included until the end of 2017, and e-books were included until 2019. Some content is available free of charge from the store.

In 2015, over 669,000 apps were available on the store. Categories containing the largest number of apps are “Books and Reference”, “Education”, “Entertainment”, and “Games”. The majority of the app developers have one app.

As with other similar platforms, such as Google Play and the Mac App Store, Microsoft Store is curated, and apps must be certified for compatibility and content. In addition to the user-facing Microsoft Store client, the store has a developer portal with which developers can interact. Microsoft takes 30% of the sale price for apps. Prior to January 1, 2015, this cut was reduced to 20% after the developer’s profits reached $25,000.

Hypothesis:

Free apps that provide good functionality should expect a decent rating on the Microsoft Store.

Understanding the Project:

After reading this post, a reader will be able to tell:

  1. What average rating an app in the Books category can expect, given the number of people who voted for it in the Microsoft Store.
  2. Which apps were rated by the most people and which apps received a high rating.
  3. How old an app is, from the date it was posted, and how many apps were posted in each year.

Acknowledgements:

My team member Hritik AgraHari and I, Shardul Patil, created this notebook/blog as part of the coursework for the “Pandas, bamboolib & Orange workshop” at Suven, under the mentorship of Rocky Jagtiani.

Suven: https://datascience.suvenconsultants.com

Rocky Jagtiani: https://www.linkedin.com/in/rocky-jagtiani-3b390649/

Dataset:

This data consists of apps in the Microsoft Windows Store with their corresponding details. It has the following variables:

  1. Name: Name of the app
  2. Rating: Rating for the app
  3. No of People Rated: Number of people who rated the app
  4. Category: Category of the app
  5. Date: Date when the app was posted
  6. Price: Price of the app

Windows Store dataset

We have taken 50 observations (rows), from which we extract information through exploratory data analysis and visualization.

Features: 2 categorical, 3 numeric

Meta attributes: 1 text

Some facts:

  1. App rated by the most people: Crafting Planner (992 votes)
  2. App with the lowest rating: IPC Sections (rating of 1)
  3. Weighted average rating of all apps: 3.545519 (calculated with the weighted mean formula)
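
The weighted mean mentioned above can be sketched in a few lines of Python. This is only an illustration: we assume the weight of each app is its number of people rated (the usual choice for this kind of figure), and the three (rating, votes) pairs below are made up, not rows from the dataset.

```python
# Hypothetical (rating, number of people rated) pairs, NOT rows from the dataset.
apps = [(4.0, 800), (3.5, 600), (2.0, 200)]


def weighted_mean(pairs):
    """Return sum(rating * votes) / sum(votes) for (rating, votes) pairs."""
    total_votes = sum(votes for _, votes in pairs)
    if total_votes == 0:
        raise ValueError("total weight must be positive")
    return sum(rating * votes for rating, votes in pairs) / total_votes


# Assumption: each rating is weighted by how many people gave it.
print(weighted_mean(apps))  # 3.5625
```

Compared with a plain average of the three ratings (about 3.17), the weighted mean is pulled towards the ratings of the apps that more people voted on.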

Histograms:

Rating: The following histogram shows the frequency of each rating in the dataset.

From this we can see that the most common rating is 4 (frequency: 12), while the ratings 1, 1.5 and 2 each occur for only one app.

People can choose the apps that have a good rating and skip the apps with a poor rating.
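
The counting behind such a histogram can be sketched in plain Python. The ratings below are invented for illustration and are not the 50 rows of the dataset; Orange draws the histogram for us, so this only shows the idea.

```python
from collections import Counter

# Hypothetical ratings, NOT the actual dataset.
ratings = [4.0, 3.5, 4.0, 3.0, 4.5, 4.0, 1.0, 3.5, 5.0, 3.0]

# Count how many apps share each rating value.
frequency = Counter(ratings)

# Print a simple text histogram, one row per rating value.
for rating in sorted(frequency):
    print(f"{rating}: {'#' * frequency[rating]}")

# The tallest bar is the most common rating.
most_common_rating, count = frequency.most_common(1)[0]
```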

No of people rated: The following histogram shows how many apps fall into each range of the number of people who rated them.

From this we can see that many of the apps have around 680–700 people giving their ratings.

People mainly prefer apps that more people have rated, since more ratings suggest that more people have used them.

Boxplots:

Boxplot of Rating

From this boxplot of Rating we can see that the mean rating of the apps is 3.59184, the standard deviation is 0.7605, the median (2nd quartile) is 3.5, the 1st quartile is 3 and the 3rd quartile is 4.

People can see whether the app they are using is above, below or equal to the average rating of all apps.
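
The numbers a boxplot reports can also be computed by hand. Below is a minimal sketch with Python's `statistics` module on made-up ratings (not the dataset). We assume the sample standard deviation and the "inclusive" quartile method; Orange may use different conventions, so small differences from its boxplot would not be surprising.

```python
import statistics

# Hypothetical ratings, NOT the actual dataset.
ratings = [1.0, 3.0, 3.0, 3.5, 3.5, 4.0, 4.0, 4.0, 4.5, 5.0]

mean = statistics.mean(ratings)
stdev = statistics.stdev(ratings)  # sample standard deviation (assumption)

# n=4 splits the data into quartiles and returns the three cut points.
q1, median, q3 = statistics.quantiles(ratings, n=4, method="inclusive")

print(f"mean={mean:.3f} stdev={stdev:.3f} Q1={q1} median={median} Q3={q3}")
```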

Boxplot of no of people rated

From this boxplot we can see that the mean of No of people rated is 533.76, the standard deviation is 238.7, the median (2nd quartile) is 550, the 1st quartile is 355 and the 3rd quartile is 692.

A person can check whether the app they are using has been rated by many people or by very few.

Mosaic Plot:

From the mosaic plot we can see that the free apps rated by the largest number of people have ratings between 3.75 and 4.25, and that the highest ratings occur when 541.5–689.5 people have rated the app (highlighted in green).

Pie Charts:

Pie chart of Rating

The following pie chart shows how the apps are distributed across the different ratings. The ratings 4, 3.5 and 3 have the largest shares, whereas 1 and 2 have the smallest.

People can choose the apps that have a good rating and skip the apps with a poor rating.

Pie chart of No of people voted

The following pie chart shows how the apps are distributed by the number of people who voted. The range 600–700 has the largest share, whereas 100–200 and 200–300 have the smallest.

People mainly prefer apps that more people have rated, since more ratings suggest that more people have used them.

Correlations:

Correlation measures the strength and direction of the linear relationship between two variables. A positive correlation means the variables tend to move together (as one increases, the other tends to increase), whereas a negative correlation means they tend to move in opposite directions (as one increases, the other tends to decrease). In this case, all the pairwise correlations are negative. The strongest one is between No of people Rated and Rating, at -0.136, which is still a weak relationship. The stronger the correlation between two variables, the more clearly the trend shows in their scatterplot.
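
For readers who want to see what the coefficient measures, here is a small sketch of the Pearson correlation in plain Python. We assume Pearson's coefficient is the one reported (Orange can also report Spearman's), and the vote counts and ratings below are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, NOT the actual dataset.
votes = [100, 300, 500, 700, 900]
ratings = [4.5, 4.0, 4.0, 3.0, 3.5]

r = pearson(votes, ratings)
print(f"r = {r:.3f}")  # negative: ratings tend to fall as votes rise
```

A value close to -1 or +1 means the points lie close to a straight line, while a value near 0 (such as -0.136) means there is hardly any linear trend.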

Scatterplots:

The scatterplots were drawn from a 70% sample of the data rather than from the entire dataset.
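
Taking such a sample can be sketched as follows. The helper name `sample_fraction` and the fixed seed are our own choices for illustration; in Orange the sampling is done through the GUI, and we assume it samples without replacement.

```python
import random


def sample_fraction(rows, fraction=0.7, seed=0):
    """Return a random sample (without replacement) of the given fraction."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = round(len(rows) * fraction)
    return rng.sample(rows, k)


# Stand-in for the 50 rows of the dataset: just the row indices 0..49.
rows = list(range(50))
subset = sample_fraction(rows)
print(len(subset))  # 35
```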

Rating vs No of people Rated

A scatterplot of Rating against No of people Rated was drawn to check for outliers, because the negative correlation between these two variables is the strongest among all pairs. Most of the apps have a rating above 3, and most were rated by around 400–700 people.

Rating vs Date (color on basis of no of people rated)

A scatterplot was drawn between Rating and Date, colored by No of people rated. Most of the apps were posted between 2014 and 2016. The colors are scattered rather than clustered, so the points do not form clear groups.

Rating vs No of people Rated (color on basis of Date)

A scatterplot was drawn between Rating and No of people Rated, colored by Date. The colors are scattered rather than clustered, so the points do not form clear groups.

Pivot table:

From the pivot table of Price vs Rating, we can see that the largest group, 12 free apps, has a rating of 4, followed by 11 with ratings of 3,5 and 4.5. Only one free app each has a rating of 1 and of 2.

From the pivot table of Category vs Rating, we can see that the largest group, 12 Books apps, has a rating of 4, followed by 11 with ratings of 3,5 and 4.5. Only one Books app each has a rating of 1 and of 2.
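
A pivot table of this kind is simply a count of rows for every (Price, Rating) combination. Here is a minimal sketch in plain Python; the rows are invented for illustration and do not come from the dataset.

```python
from collections import Counter

# Hypothetical (price, rating) rows, NOT the actual dataset.
rows = [("Free", 4.0), ("Free", 4.0), ("Free", 3.5), ("Free", 1.0), ("Free", 4.5)]

# Count how many apps fall into each (price, rating) cell.
counts = Counter(rows)

prices = sorted({price for price, _ in rows})
ratings = sorted({rating for _, rating in rows})

# Print the table: one row per price, one column per rating.
print("Price".ljust(8) + "".join(str(r).rjust(6) for r in ratings))
for price in prices:
    print(price.ljust(8) + "".join(str(counts[(price, r)]).rjust(6) for r in ratings))
```

Swapping the price for the category in each row gives the Category vs Rating table in the same way.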

Classification Tree:

Classification tree with Rating as target

The classification tree has Rating as its target. It has the following parameters:

  1. Induce binary tree: enabled
  2. Min. number of instances in leaves: 2
  3. Do not split subsets smaller than: 5
  4. Limit the maximal tree depth to: 100

Classification stops when the majority reaches 95%.
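
To make these parameters concrete, here is a rough sketch of how such stopping rules are usually checked while a tree is grown. It is not Orange's implementation; the function name `should_stop` is ours, and we assume the third parameter means that nodes with fewer than 5 instances are not split any further.

```python
from collections import Counter


def should_stop(labels, depth, min_split=5, max_depth=100, majority=0.95):
    """Return True if a node holding these class labels should become a leaf."""
    # Too few instances to split, or the depth limit has been reached.
    if len(labels) < min_split or depth >= max_depth:
        return True
    # Stop once the most common class covers at least `majority` of the node.
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels) >= majority


# The "min instances in leaves: 2" rule is not checked here: it constrains
# which candidate splits are allowed (each child must keep >= 2 instances).

print(should_stop(["4"] * 20, depth=1))              # True: a single class
print(should_stop(["4"] * 5 + ["3"] * 5, depth=1))   # False: classes are mixed
```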

People can get a detailed analysis of how apps are classified when Rating is the target. If they are looking for an app with a particular rating, then this is the best way to search.

Classification Tree with No of people Voted as target

The classification tree has No of People Voted as its target. It has the following parameters:

  1. Induce binary tree: enabled
  2. Min. number of instances in leaves: 2
  3. Do not split subsets smaller than: 5
  4. Limit the maximal tree depth to: 100

Classification stops when the majority reaches 95%.

People can get a detailed analysis of how apps are classified when No of People Voted is the target. If they are looking for an app on the basis of the number of people who voted for it, then this is the best way to search.

Vote of Thanks:

I would like to humbly and sincerely thank my mentor Rocky Jagtiani. He is more of a friend to me than a mentor. The data analytics he taught, and the various assignments we did and are still doing, are the best way to learn and build skills in the data science field.

Recommended: https://datascience.suvenconsultants.com/

Shardul Patil
Studying Computer Science at National Institute of Technology Silchar, Data Science Enthusiast