Apple Store Application Analysis (SQL Project)

Taofeek Salaudeen
5 min read · Nov 27, 2023

The App Store is a digital storefront that allows users to download apps and games for their Apple devices. It is the default app store for iOS, iPadOS, macOS, watchOS, and tvOS.

The App Store was launched in 2008 with 552 apps. As of October 2023, it has over 2 million apps available for download, with over 100 billion downloads to date. The App Store is the largest and most profitable app store in the world, generating over $64 billion in revenue in 2022.

The App Store is organized into different categories, such as Games, Entertainment, Education, Business, and Productivity. Users can browse the App Store by category or use the search bar to find specific apps.

Overall, the App Store is a valuable resource for Apple users. It provides a convenient way to find and download apps, and it helps to ensure that users only have access to high-quality apps.

The dataset contains information about various apps in different categories, such as Games, Productivity, Weather, Shopping, and more. We will discuss some interesting insights and trends from the dataset.

Click link to view the Dataset

Data Structure

The dataset has the following columns:

  • id: A unique identifier for each app
  • track_name: The name of the app
  • size_bytes: The size of the app in bytes
  • currency: The currency in which the price is listed (e.g., USD, EUR, JPY)
  • price: The price of the app
  • rating_count_tot: The total number of ratings for the app
  • rating_count_ver: The number of ratings for the current version of the app
  • user_rating: The average user rating for the app across all versions
  • user_rating_ver: The average user rating for the current version of the app
  • ver: The latest version of the app
  • cont_rating: The content (age) rating of the app (e.g., 4+, 9+, 12+, 17+)
  • prime_genre: The primary genre of the app (e.g., Games, Productivity, Weather)
  • sup_devices_num: The number of devices the app supports
  • ipadSc_urls_num: The number of iPad screenshots shown for the app
  • lang_num: The number of languages supported by the app
  • vpp_lic: Whether VPP (Volume Purchase Program) device-based licensing is enabled for the app

The tool used for analysis

SQLiteOnline: SQLiteOnline is a web-based tool that allows users to create, edit, and manage SQLite databases online.

1. Exploratory Data Analysis (EDA): This is the process of analyzing and summarizing a dataset to gain insights into its characteristics. EDA is an important step in the data analysis process because it helps to identify trends, patterns, and relationships in the data.

This SQL query checks the total number of apps in the dataset, which is 7197.

The query above returns the number of unique apps in the dataset.
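The count check described above can be sketched as follows. This is a minimal sketch, not necessarily the author's exact query: the table name `AppleStore` and the three sample rows are assumptions, and Python's built-in `sqlite3` module is used only so the example runs on its own. The SQL string itself can be pasted into SQLiteOnline.

```python
import sqlite3

# Hypothetical three-row sample standing in for the real 7197-row dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AppleStore (id INTEGER, track_name TEXT)")
conn.executemany(
    "INSERT INTO AppleStore VALUES (?, ?)",
    [(1, "App A"), (2, "App B"), (3, "App C")],
)

# Count all rows and the distinct app ids; equal numbers mean no duplicate ids.
COUNT_QUERY = """
SELECT COUNT(*)           AS total_rows,
       COUNT(DISTINCT id) AS unique_app_ids
FROM AppleStore
"""
total_rows, unique_app_ids = conn.execute(COUNT_QUERY).fetchone()
print(total_rows, unique_app_ids)
```

If the two numbers differ, some `id` values are repeated and the duplicates should be inspected before further analysis.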

2. Data Cleaning: This involves checking for missing values, duplicates, and outliers in the dataset. It is important to ensure that the data is clean and ready for analysis. The check shows that there are no NULL or missing values in the dataset.

The Dataset has 0 Missing Values.
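One way to run a missing-value check like this is to count the rows where any key column is NULL. The sketch below assumes a table named `AppleStore` and checks only three columns (`track_name`, `user_rating`, `prime_genre`); both the choice of columns and the sample rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample rows; the real dataset has many more columns and rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AppleStore "
    "(id INTEGER, track_name TEXT, user_rating REAL, prime_genre TEXT)"
)
conn.executemany(
    "INSERT INTO AppleStore VALUES (?, ?, ?, ?)",
    [(1, "App A", 4.5, "Games"), (2, "App B", 3.0, "Weather")],
)

# Count rows that have a NULL in any of the checked columns.
MISSING_QUERY = """
SELECT COUNT(*) AS missing_values
FROM AppleStore
WHERE track_name  IS NULL
   OR user_rating IS NULL
   OR prime_genre IS NULL
"""
missing_values = conn.execute(MISSING_QUERY).fetchone()[0]
print(missing_values)
```

A result of 0 means none of the checked columns contain NULLs. More columns can be added to the `WHERE` clause in the same way.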

3. Descriptive Statistics: This involves calculating summary statistics such as mean, median, mode, standard deviation, and range for each column in the dataset. This can help to identify the distribution of the data and any outliers.

By calculating the number of apps per genre, we can see how the data is distributed and identify patterns or trends that may be useful for further analysis.

Finding out the number of applications per genre

Number of Apps per Genre
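A per-genre count of this kind is typically a `GROUP BY` on `prime_genre`. The following is a sketch under assumptions: the table name `AppleStore` and the sample rows are hypothetical, and the Python wrapper exists only to make the query runnable here.

```python
import sqlite3

# Hypothetical sample rows covering three genres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AppleStore (id INTEGER, prime_genre TEXT)")
conn.executemany(
    "INSERT INTO AppleStore VALUES (?, ?)",
    [
        (1, "Games"), (2, "Games"), (3, "Games"),
        (4, "Entertainment"), (5, "Entertainment"),
        (6, "Weather"),
    ],
)

# Number of apps in each genre, largest genre first.
GENRE_QUERY = """
SELECT prime_genre, COUNT(*) AS num_apps
FROM AppleStore
GROUP BY prime_genre
ORDER BY num_apps DESC, prime_genre
"""
apps_per_genre = conn.execute(GENRE_QUERY).fetchall()
for genre, num_apps in apps_per_genre:
    print(genre, num_apps)
```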

Get an overview of app ratings.
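A rating overview can be obtained with the minimum, maximum, and average of `user_rating`. This is a minimal sketch, assuming a table named `AppleStore`; the four sample ratings are made up for illustration.

```python
import sqlite3

# Hypothetical sample ratings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AppleStore (id INTEGER, user_rating REAL)")
conn.executemany(
    "INSERT INTO AppleStore VALUES (?, ?)",
    [(1, 0.0), (2, 3.0), (3, 4.5), (4, 4.5)],
)

# Summary statistics for the user rating column.
RATING_QUERY = """
SELECT MIN(user_rating) AS min_rating,
       MAX(user_rating) AS max_rating,
       AVG(user_rating) AS avg_rating
FROM AppleStore
"""
min_rating, max_rating, avg_rating = conn.execute(RATING_QUERY).fetchone()
print(min_rating, max_rating, avg_rating)
```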

4. Correlation Analysis: This involves examining the relationship between an app's pricing (free or paid) and its user rating. Correlation analysis can help to identify relationships between variables and can be useful in predicting future trends.

To check this relationship, the query compares the average user rating of free apps with that of paid apps.

The query result shows that free apps have an average user rating of 3.39, while paid apps have an average user rating of 3.74.
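The free-versus-paid comparison can be sketched with a `CASE` expression that labels each app by price and then averages the ratings per label. The table name `AppleStore` and the sample rows are assumptions, so the averages here will not match the 3.39 and 3.74 reported for the real dataset.

```python
import sqlite3

# Hypothetical sample: two free apps (price 0) and two paid apps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AppleStore (id INTEGER, price REAL, user_rating REAL)")
conn.executemany(
    "INSERT INTO AppleStore VALUES (?, ?, ?)",
    [(1, 0.0, 3.0), (2, 0.0, 4.0), (3, 0.99, 4.0), (4, 2.99, 5.0)],
)

# Label each app as Free or Paid, then average the rating per label.
PRICE_QUERY = """
SELECT CASE WHEN price > 0 THEN 'Paid' ELSE 'Free' END AS app_type,
       AVG(user_rating) AS avg_rating
FROM AppleStore
GROUP BY app_type
ORDER BY app_type
"""
avg_by_type = dict(conn.execute(PRICE_QUERY).fetchall())
print(avg_by_type)
```

Note that comparing group averages shows a difference between the two groups; it is not a correlation coefficient.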

5. Correlation Analysis: This involves examining the relationship between the number of languages an app supports and its user rating. Correlation analysis can help to identify relationships between variables and can be useful in predicting future trends.

From the result, apps that support more languages tend to have better popularity and user ratings. Apps that support 10–30 languages make a strong impression on users and can attract more users to the app.
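A comparison like this can be made by grouping apps into language-count buckets and averaging the rating per bucket. In the sketch below, the bucket boundaries (under 10, 10 to 30, over 30) are an assumption based on the 10–30 range mentioned above, and the table name `AppleStore` and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample: two apps in each language bucket.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AppleStore (id INTEGER, lang_num INTEGER, user_rating REAL)")
conn.executemany(
    "INSERT INTO AppleStore VALUES (?, ?, ?)",
    [(1, 1, 3.0), (2, 5, 3.5), (3, 12, 4.5), (4, 25, 4.0), (5, 40, 3.5), (6, 45, 4.0)],
)

# Bucket apps by number of supported languages, then average ratings per bucket.
LANG_QUERY = """
SELECT CASE
         WHEN lang_num < 10 THEN '<10 languages'
         WHEN lang_num BETWEEN 10 AND 30 THEN '10-30 languages'
         ELSE '>30 languages'
       END AS language_bucket,
       AVG(user_rating) AS avg_rating
FROM AppleStore
GROUP BY language_bucket
ORDER BY avg_rating DESC
"""
rating_by_bucket = conn.execute(LANG_QUERY).fetchall()
for bucket, avg_rating in rating_by_bucket:
    print(bucket, avg_rating)
```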

Final Recommendation
1. Paid apps generally achieve higher ratings than free apps. Users may perceive more value when they pay for an app and engage with it more, which leads to better user ratings.
2. Language support: apps that support a high number of languages (roughly 20 and above), or that focus on key languages, tend to have higher engagement and user ratings.
3. Finance and book apps have lower ratings, possibly because user needs are not fully met.
4. The length of the app description correlates positively with user rating. Users likely appreciate having a clear understanding of the app's features and capabilities before they download.
5. A new app should aim for an average rating of 3.5 or above to stand out against the competition.
6. The Games and Entertainment genres have a very high volume of apps, which makes them very competitive. Entering these spaces may be challenging, but the high volume also points to high demand.
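A recommendation such as the one about lower-rated genres can be checked by averaging ratings per genre and listing the lowest first. This is a sketch under assumptions, not the author's query: the table name `AppleStore`, the genre labels, and the sample ratings are all hypothetical.

```python
import sqlite3

# Hypothetical sample: two apps in each of three genres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AppleStore (id INTEGER, prime_genre TEXT, user_rating REAL)")
conn.executemany(
    "INSERT INTO AppleStore VALUES (?, ?, ?)",
    [
        (1, "Finance", 2.0), (2, "Finance", 3.0),
        (3, "Book", 2.5), (4, "Book", 3.5),
        (5, "Games", 4.0), (6, "Games", 4.5),
    ],
)

# Average rating per genre, lowest-rated genres first.
LOW_GENRE_QUERY = """
SELECT prime_genre, AVG(user_rating) AS avg_rating
FROM AppleStore
GROUP BY prime_genre
ORDER BY avg_rating ASC
LIMIT 10
"""
lowest_rated_genres = conn.execute(LOW_GENRE_QUERY).fetchall()
for genre, avg_rating in lowest_rated_genres:
    print(genre, avg_rating)
```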
