Product Sales Analysis and Visualisations using Python.

Emmanuel Ochieng
10 min read · Nov 25, 2022

Libraries used [Pandas, NumPy, Matplotlib, Seaborn]

You can find the whole Python project on my GitHub here

Introduction

Every modern company that sells online or maintains a specialized e-commerce website aims to work out precisely what its clients need in order to increase its chances of making sales.

What is Sales Analysis?

For each product sold by your business, it is recommended that you perform a product sales analysis to compare the profit contribution of different products. Product sales analysis is a judgment on the market performance of a product.

How often should you do Sales Analysis?

That’s what we’re going to share here.

As you’ll see, product sales data analysis provides a wealth of intelligence about your product sales strategy, the performance of your team, and much more. It’s a competitive advantage you can’t afford to miss out on. So let’s get started with the basics.

Image source(link)

Purpose of Analysis

The purpose of a product analysis report can be broadly broken down into three major facets:

1. Internal Analysis: which focuses on how you, as the business, can better improve, tweak and market your product.

2. External Analysis: which focuses on your potential customers, analysing how you can convince them that your product is worth buying, and why they should choose it over a similar competitor’s product.

3. Cost Analysis: which focuses on the end-to-end costs involved from manufacturing to sale — allowing you to analyse where you can potentially cut costs while still maintaining the quality of your product.

source (link)

Things To Consider before doing a Sales Analysis

i.) Understanding the Business Model

Business model refers to a company’s plan for making a profit. It identifies the products or services the business plans to sell, its identified target market, and any anticipated expenses.

image source (BMI Lab)

ii.) Problem we are trying to solve (Problem Analysis)

Problem analysis is the process of understanding real-world problems and users’ needs and proposing solutions to meet those needs. The goal of problem analysis is to gain a better understanding of the problem being solved before developing a solution.

Some important suggestions for creating problem trees

•Involve stakeholders who can contribute relevant technical and local knowledge

•Complete several problem tree exercises with different stakeholder groups, to help determine different perspectives and differing priorities

•Recognize that the process is as important as the product. The exercise should be presented as a learning experience for all those involved, and as an opportunity for different views and interests to be presented and discussed. However, don’t expect complete agreement from all stakeholders about the problems and their relative importance

•Recognize that the product (the problem tree diagram) should provide a simplified but nevertheless robust version of reality

•Aim for simplicity. If the exercise is too complicated, it is likely to be less useful in providing direction to subsequent steps in the analysis

image source (Toyota)

iii.) How is the output going to be consumed by the consumer?

Understanding how consumers will use the output of your model will allow you to create features targeted to them. For example, are you building models that serve internal users and influence company strategy, or are you building models that are customer-facing?

iv.) The economic impact of this project

It is essential in the permitting process to show decision makers the benefits a project will have on a product (e.g., increased revenue, sales, etc.). Alternatively, the report may be used to illustrate the economic impact on the company if a product were to be done away with.

image source: (ufi.org)

v.) What type of decisions will our data drive?

Data-driven decision-making (sometimes abbreviated as DDDM) is the process of using data to inform your decision-making process and validate a course of action before committing to it.

image source: (png)

vi.) Target in mind to quantify success of the project

Measuring the success of a project once it’s brought to completion is a valuable practice. It provides a learning opportunity for future undertakings and the opportunity to assess the true effectiveness of the project. In order to have a holistic view, both objective and subjective criteria need to be considered.

image source (link)

Overview

This dataset gives us electronics sales data at Amazon. It contains user ratings for various electronics items sold, along with the category of each item and the time of sale. Link to Data Set(here)

We will use Python libraries (Pandas, NumPy, Matplotlib & Seaborn) to analyse and answer business questions for sales data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc. The dataset can be downloaded here. In this analysis, we will be using Jupyter Notebook.

STEP 1:

Exploratory Data Analysis [EDA]

This is the process by which we perform initial investigations of the data to discover patterns, spot anomalies, test hypotheses and check assumptions with the help of summary statistics and graphical representations.

It is how we get to understand the data we have and gather insights from it. It is about making sense of the data before working with it.

To start with, we import the libraries we are going to use. In this case these are Pandas, NumPy, Matplotlib and Seaborn.

Then we load the dataset.
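A minimal sketch of these two steps is given here. The original notebook cells are not reproduced in this text, so the column names and the few rows of inline CSV are made-up stand-ins chosen so the snippet runs on its own; with the real download you would pass the file path to `pd.read_csv` instead (the file name in the comment is an assumption).

```python
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Stand-in for the downloaded file: a few made-up rows with assumed column names.
# With the real data this would be something like pd.read_csv("electronics.csv"),
# where the file name is hypothetical.
sample_csv = io.StringIO(
    "item_id,user_id,rating,timestamp,category,brand,year\n"
    "0,101,5,1433116800,Headphones,Bose,2015\n"
    "1,102,4,1451606400,Camera & Photo,Mpow,2016\n"
    "2,103,1,1483228800,Headphones,Logitech,2017\n"
    "3,104,5,1514764800,Computers & Accessories,Bose,2018\n"
)

df = pd.read_csv(sample_csv)
print(df)
```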

To take a look at the first five rows we use the pandas function “.head()”. Similarly, “.tail()” returns the last five observations of the data set.
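As a small illustration, the two calls look like this on a made-up seven-row frame (the column names are assumptions, not the real dataset's):

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({
    "user_id": [101, 102, 103, 104, 105, 106, 107],
    "rating": [5, 4, 5, 1, 3, 5, 4],
})

first_rows = df.head()  # first five rows by default
last_rows = df.tail()   # last five rows by default

print(first_rows)
print(last_rows)
```

Both functions accept a row count, for example `df.head(10)`.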

Last five rows

To know the total number of rows and columns in the data set we use “.shape” as shown below.
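A quick sketch on a made-up frame (hypothetical columns): `.shape` is an attribute, not a method, and returns a `(rows, columns)` tuple.

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({"user_id": [101, 102, 103], "rating": [5, 4, 1]})

n_rows, n_cols = df.shape  # (number of rows, number of columns)
print(f"{n_rows} rows, {n_cols} columns")
```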

Data Shape

The dataset comprises 1,292,954 observations and 10 characteristics (columns).

Of these, one is the dependent variable and the remaining nine are independent variables.

It is also a good practice to know the columns and their corresponding data types, along with finding whether they contain null values or not.
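One way to get this overview is `DataFrame.info()`, sketched here on a made-up frame whose column names are assumptions:

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({
    "user_id": [101, 102, 103],
    "item_id": ["A1", "B2", "C3"],
    "rating": [5, 4, 1],
})

# Prints each column with its non-null count and dtype
df.info()

# The dtypes alone are also available as a Series
column_types = df.dtypes
print(column_types)
```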

Dataset Info

No column has null/missing values.

The columns discussed in this section are as follows:

1. User ID

2. Product ID

3. Rating

4. Timestamp

5. Category

The data types of the columns are as follows:

1. User ID — int64

2. Product ID — object

3. Rating — int64

4. Timestamp — int64

5. Category — object

We can see that the columns User ID and Rating are of int64 data type, while the columns Product ID and Category are of object data type. There are no null values in the dataset. The column Timestamp is of int64 data type, but it actually represents a point in time. We can convert it to a datetime using the following code:
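A minimal sketch of the conversion, assuming the integers are Unix epoch seconds (the text does not state the unit, so check this against the real data) and that the column is named `timestamp`:

```python
import pandas as pd

# Made-up sample values; assumed to be Unix epoch seconds
df = pd.DataFrame({"timestamp": [1433116800, 1451606400]})

# unit="s" tells pandas the integers count seconds since 1970-01-01
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")
print(df["timestamp"])
```

If the values turn out to be milliseconds, `unit="ms"` would be the matching choice.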

The column Product ID is of object data type, but it actually holds strings, and the same is true of the column Category.

We can convert them to strings using the following code:
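One possible version of this step uses pandas' dedicated `"string"` dtype. The column names `item_id` and `category` are assumptions, and the original notebook may have used a different approach, such as `astype(str)`.

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({
    "item_id": ["A1", "B2"],
    "category": ["Headphones", "Camera & Photo"],
})

# Convert both text columns to the pandas string dtype
df["item_id"] = df["item_id"].astype("string")
df["category"] = df["category"].astype("string")
print(df.dtypes)
```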

The column Rating is of int64 data type, but we treat it as a float. User ID is also of int64 data type, but it is an identifier rather than a quantity, so we treat it as a string.

We can convert them using the following code:
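A short sketch of both conversions, with `rating` going to float and `user_id` going to string (column names assumed, data made up):

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({"user_id": [101, 102], "rating": [5, 4]})

df["rating"] = df["rating"].astype("float64")   # numeric score as a float
df["user_id"] = df["user_id"].astype("string")  # identifier, not a quantity
print(df.dtypes)
```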

To get a better understanding of the dataset, we can also look at its statistical summary using the function “.describe()”. This includes the count, mean, median (or 50th percentile), standard deviation, minimum, maximum, and percentile values of the columns as shown below.
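For illustration, here is `.describe()` applied to a made-up rating column. The numbers it prints belong to this toy sample, not to the real dataset.

```python
import pandas as pd

# Made-up ratings, only here so the snippet runs on its own
df = pd.DataFrame({"rating": [5, 5, 4, 1, 5, 3]})

# count, mean, std, min, 25%, 50%, 75%, max for the numeric column
summary = df["rating"].describe()
print(summary)
```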

Statistical Summary

The statistical summary of the dataset gives us the following information:

1. The mean rating is 4.

2. The minimum rating is 1.

3. The maximum rating is 5.

4. The standard deviation of the ratings is 1.4.

5. The 25th percentile of the ratings is 4.

6. The 50th percentile of the ratings is 5.

7. The 75th percentile of the ratings is 5.

We can also see the number of unique users and items in the dataset.
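A sketch using `Series.nunique()` on made-up data (the column names `user_id` and `item_id` are assumptions):

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({
    "user_id": [101, 102, 101, 103],
    "item_id": ["A1", "A1", "B2", "B2"],
})

n_users = df["user_id"].nunique()  # distinct users
n_items = df["item_id"].nunique()  # distinct items
print(f"{n_users} unique users, {n_items} unique items")
```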

Number of Unique Users and Items

Dealing With Missing Values

There can be multiple reasons why certain values are missing from the data.

Reasons for the missing data from the dataset affect the approach of handling missing data. So it’s necessary to understand why the data could be missing.

Some of the reasons are listed below:

•Past data might get corrupted due to improper maintenance.

•Observations are not recorded for certain fields, for example because of a failure in recording the values due to human error.

•The user has intentionally not provided the values.

Checking the sum of null values
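The usual pandas idiom for this check is `isnull().sum()`. The sketch below runs it on a made-up frame that deliberately contains one missing brand so the count is visible; on the real dataset the result may differ.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one deliberately missing value
df = pd.DataFrame({
    "rating": [5, 4, 1],
    "brand": ["Bose", np.nan, "Mpow"],
})

# isnull() marks missing cells as True; sum() counts them per column
missing_per_column = df.isnull().sum()
print(missing_per_column)
```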

Finding Answers with the Data Using Visualizations

To make the results easier to understand, we are going to use the Matplotlib and Seaborn libraries we imported earlier to visualize them with simple bar charts. This makes it easier to answer questions that might arise from the data set.

i.) What was the best year of sales?

From the graph we just plotted we can see that 2015 had the best sales of all years. There was a steady increase in sales from 2007 to 2015, then a slight decline in 2016. The decline was much larger in the following years, 2017 and 2018.
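A chart of this kind could be produced roughly as follows. This is a sketch under two assumptions: that the data has a `year` column, and that each row counts as one sale (the original notebook may have measured sales differently). The data is made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({"year": [2014, 2015, 2015, 2015, 2016, 2016]})

# Assumption: one row = one sale, so the row count per year stands in for sales
sales_per_year = df["year"].value_counts().sort_index()
best_year = sales_per_year.idxmax()

fig, ax = plt.subplots()
ax.bar(sales_per_year.index.astype(str), sales_per_year.values)
ax.set_xlabel("Year")
ax.set_ylabel("Number of sales")
ax.set_title("Sales per year")

print("Best year:", best_year)
```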

ii.) Which was the best month for sales?

Best month for sales

The first month of the year (January) was when most sales were made across the product categories.
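As a sketch of how the monthly figure can be derived, the snippet below extracts the month name from a datetime column and counts rows per month. It assumes a `timestamp` column that has already been converted to datetime, and again treats one row as one sale; the dates are made up.

```python
import pandas as pd

# Made-up sample dates, only here so the snippet runs on its own
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2015-01-05", "2015-01-20", "2015-03-02", "2016-01-11", "2016-07-30",
    ])
})

# Count rows per calendar month, pooled across all years
sales_per_month = df["timestamp"].dt.month_name().value_counts()
best_month = sales_per_month.idxmax()
print(sales_per_month)
```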

iii.) Which brand sold the most in the highest-selling year (2015)?

Mpow was the brand with the most sales in 2015, followed by Bose.
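A ranking like this can be obtained by filtering to one year and counting rows per brand. The sketch assumes `year` and `brand` columns and uses made-up rows:

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2015],
    "brand": ["Mpow", "Bose", "Mpow", "Bose", "Mpow"],
})

# Keep only 2015 rows, then count rows per brand (sorted descending)
top_2015 = df.loc[df["year"] == 2015, "brand"].value_counts().head(10)
print(top_2015)
```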

iv.) Which brands sold the most in the last three years (2016, 2017 & 2018)?

One brand has consistently had the most sales over the last three years: Bose. The second best-selling brand was Logitech in 2016 and 2017, which was overtaken by Mpow in 2018.

  • 2016 (Bose and Logitech)
  • 2017 (Bose and Logitech)
  • 2018 (Bose and Mpow)

Brands with the most sales in 2016

Brands with the most sales in 2017

Brands with the most sales in 2018
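The per-year comparison can be sketched as a loop over the three years, taking the top two brands in each. The column names are assumptions and the rows are made up to mirror the pattern described above.

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({
    "year": [2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018, 2018],
    "brand": ["Bose", "Bose", "Logitech", "Bose", "Bose", "Logitech",
              "Bose", "Bose", "Mpow"],
})

top_by_year = {}
for year in (2016, 2017, 2018):
    # Row count per brand within the year, largest first
    counts = df.loc[df["year"] == year, "brand"].value_counts()
    top_by_year[year] = counts.head(2).index.tolist()

print(top_by_year)
```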

v.) What product by category sold the most?

We can see that the Headphones category sold the most, Computers & Accessories sold the second most, and Camera & Photo sold the third most, followed by Accessories & Supplies.

Product by Category that sold the most.
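A sketch of the category count, assuming a `category` column and one row per sale (the rows are made up). The same counts answer both the most-sold and the least-sold question.

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({
    "category": ["Headphones"] * 3
    + ["Computers & Accessories"] * 2
    + ["Security & Surveillance"],
})

sales_per_category = df["category"].value_counts()  # sorted, largest first
most_sold = sales_per_category.idxmax()
least_sold = sales_per_category.idxmin()
print(sales_per_category)
```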

vi.) What product by category sold the least?

We can see that the Security & Surveillance category sold the least, followed closely by Computers & Accessories.

Product by Category that sold the least

vii.) What product by brand name sold the least?

Koolertron sold the least, followed closely by DURAGADGET, as shown below.

Product by brand name sold the least
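For the bottom of the ranking, `nsmallest` on the per-brand counts is one option. The sketch assumes a `brand` column and uses made-up rows:

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({
    "brand": ["Bose"] * 4 + ["DURAGADGET"] * 2 + ["Koolertron"],
})

brand_counts = df["brand"].value_counts()
lowest_two = brand_counts.nsmallest(2)  # the two brands with the fewest rows
print(lowest_two)
```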

viii.) Ratings Distribution

Most products were rated 5.
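A distribution like this can be drawn with Seaborn's `countplot`, sketched here on made-up ratings (the column name `rating` is an assumption):

```python
import pandas as pd
import seaborn as sns

# Made-up ratings, only here so the snippet runs on its own
df = pd.DataFrame({"rating": [5, 5, 5, 4, 4, 1, 3, 5]})

# Tabulate the counts, then draw one bar per rating value
rating_counts = df["rating"].value_counts().sort_index()
ax = sns.countplot(data=df, x="rating")
ax.set_title("Ratings distribution")

print(rating_counts)
```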

ix.) Best rated brands

Best brands by rating

Plemo and Savage were the brands with the highest average ratings.
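One way to rank brands by rating is the mean rating per brand, sketched below on made-up rows with assumed `brand` and `rating` columns. Whether the original notebook used the mean or another measure is not stated.

```python
import pandas as pd

# Made-up sample data, only here so the snippet runs on its own
df = pd.DataFrame({
    "brand": ["Plemo", "Plemo", "Bose", "Bose", "Mpow"],
    "rating": [5, 5, 4, 5, 3],
})

# Mean rating per brand, highest first
avg_rating = df.groupby("brand")["rating"].mean().sort_values(ascending=False)
print(avg_rating)
```

A brand with only a handful of ratings can top such a list, so it may be worth also looking at the number of ratings behind each average.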

Conclusion

  • 2015 was the best year in terms of sales.
  • Headphones was the category with the most sales, followed by Computers & Accessories, while the fewest sales were made in the Security & Surveillance category.
  • There was a steady rise in sales from 2007 to 2015, a slight decline in 2016 and a sharper decline in 2017 and 2018.
  • The brand Bose sold the most, followed by Logitech.
  • The brand Koolertron sold the least, followed by DURAGADGET.
  • Most products were rated 5.
  • The best rated brand was Plemo.
