Empowering Retail Decision-Making: Harnessing Data Analytics for Contoso’s Sales Strategy Success.

Edwin Mbugua
10 min readApr 7, 2023

--

Power BI Dashboard

The objective of this task was to help a retail store make a data driven sales strategy based on it’s sales records over the years.

About the dataset

The Contoso BI Demo dataset is used to demonstrate DW/BI functionalities across the entire Microsoft Office product family. This dataset includes C-level, sales/marketing, IT, and common finance scenarios for the retail industry and support map integration. In addition, this dataset offers large volumes of transactions from OLTP and well-structured aggregations from OLAP, along with reference and dimension data.

Problem Definition
How well can we improve the sales in determining which products get recommended to customers given their profile and purchasing behavior?

The features used for the analysis included :

contoso_cleaned.columns

['CustomerKey', 'FullName', 'Gender', 'BirthDate', 'DateFirstPurchase',
'Age', 'YearlyIncome', 'Occupation', 'Education', 'MaritalStatus',
'TotalChildren', 'NumberChildrenAtHome', 'HouseOwnerFlag',
'NumberCarsOwned', 'CustomerType', 'FullDateLabel',
'CalendarDayOfWeekLabel', 'FiscalYear', 'FiscalYear.1',
'FiscalHalfYearLabel', 'FiscalQuarterLabel', 'CalendarMonthLabel',
'IsWorkDay', 'EuropeSeason', 'NorthAmericaSeason', 'AsiaSeason',
'ChannelName', 'SalesKey', 'SalesQuantity', 'UnitCost', 'UnitPrice',
'DiscountQuantity', 'DiscountAmount', 'SalesAmount',
'SalesProfitAmount', 'ProductName', 'ClassName', 'BrandName',
'StoreType', 'StoreName', 'Status', 'OnlineDiscountAmount',
'OnlineUnitCost', 'OnlineUnitPrice', 'OnlineTotalCost',
'OnlineSalesAmount', 'OnlineProfit', 'PromotionName', 'PromotionType',
'PromotionCategory', 'PromoDiscout%', 'PromoStartDate', 'PromoEndDate',
'PeriodBeforePromo(months)', 'PromoPeriod(months)', 'StateprovinceName',
'RegionCountryName', 'CustomerIsParent', 'CustomerFamilySize',
'CustomerFamilySizeAtHome', 'TotalCumulativeSales',
'TotalCumulativeProfit'],
dtype='object'

Database Entity Relationship Diagram

ER Diagram

Handling Missing Values

We don’t have missing data reason being we only obtained none null values from the database based on our query. Earlier we had handled missing values by dropping customers who there were no names and also where there were NAN values in numerical columns we replaced them with 0.

Database Query

Data Cleaning

This is an important step in data analysis and correct data formats need to be well put. In our case the date columns and Age needed to be properly set.

#Changing the date types to their correct data type
consoto_data[['FullDateLabel','PromoStartDate','PromoEndDate','BirthDate','DateFirstPurchase']]=consoto_data[['FullDateLabel','PromoStartDate','PromoEndDate','BirthDate','DateFirstPurchase']].apply(pd.to_datetime,format='%Y-%m-%d')
#Change Age to integer data type
consoto_data['Age']=consoto_data['Age'].apply(np.int64)
consoto_data['NumberChildrenAtHome']=consoto_data['NumberChildrenAtHome'].apply(np.int64)
print('Education Levels \n',consoto_data['Education'].value_counts())

Education Levels
High School 27603
Partial College 27127
Bachelors 20962
Graduate Degree 14288
Partial High School 10020
Name: Education, dtype: int64

Also in education we needed to compress the levels into 3 categories namely: High School ,Undergraduate and Post Graduate.

Partial High School :The students in the selective classes participate in separate English, mathematics and science classes. They generally join the non-selective students (High School)for classes in other subjects.

Bachelors A bachelor’s degree is typically a four-year degree program that is earned at the undergraduate level.

Graduate Degree Graduate education encompasses research, study and teaching beyond the bachelor’s degree. While undergraduate education leads to a bachelor’s degree, graduate education leads to master’s degree and doctorate, also called a doctoral degree.

Partial College The term college generally refers to an institution that offers only undergraduate degree programs.

Feature Engineering

#Finding Customers who are Parents
consoto_data['CustomerIsParent']=consoto_data['TotalChildren'].replace({0:0,1:1,2:1,3:1,4:1,5:1})

#Calculating the family size and the actual family size at home
consoto_data['CustomerFamilySize']=consoto_data['MaritalStatus'].replace({'M':2,'S':1})+consoto_data['TotalChildren']
consoto_data['CustomerFamilySizeAtHome']=consoto_data['MaritalStatus'].replace({'M':2,'S':1})+consoto_data['NumberChildrenAtHome']

#Calculating cumulative total Sales (Total Sales + OnlineTotalSales)
consoto_data['TotalCumulativeSales']=consoto_data['SalesAmount']+consoto_data['OnlineSalesAmount']
#Calculate cumulative total Profit
consoto_data['TotalCumulativeProfit']=consoto_data['SalesProfitAmount']+consoto_data['OnlineProfit']
  1. Creating an additional column in finding the number of customers who are parents and those who are not.Based on if a customer has a child or not.
  2. Creating an additional column to obtain the family size of the customer which will be getting the total number of children a customer has adding to their marital status (Married =2,Single =1).
  3. Creating another column obtaining the cumulative sales from both instore and online store.
  4. Creating another column obtaining the cumulative profit from the sales profit added to the online sales profit.

Checking for Outliers

We shall be looking at the outliers from the customer’s Age and Income and remove them as trying to remove outliers in all columns we might miss some key customer data points.

Checking for outliers from customer’s Income
Checking for outliers from customer’s Age
#Remove outliers
consoto_income=consoto_data["YearlyIncome"]<130000
consoto_age= consoto_data["Age"]<70
consoto_data=consoto_data[consoto_income & consoto_age]

After removing the outliers looking at the income distribution of the customers we find the minimum yearly income is $10,000 while the maximum yearly income is $120,000.The average yearly income is $46701.49.

When we look at the age distribution we find that 25% of them are of the age around 33 years while 75% of them around 48 years. The average age is around 42 years. The oldest customers are of the age of 69 years while the youngest are of the age of 23 years.

Handling Duplicate Field Entries

#Removing Duplicates from Germany 
consoto_data['RegionCountryName']=consoto_data['RegionCountryName'].replace({'Germany ':'Germany'})
#View Customer's Region Country Name
print('Customer Region Country Names\n',consoto_data['RegionCountryName'].value_counts().sort_values(ascending=False))
#View Customer's Region State province Name
print('\n Customer Region distribution by State Province Name \n','\n',consoto_data.groupby(['RegionCountryName'])['StateprovinceName'].value_counts().sort_values(ascending=False))
  • Looking at the sales dataset by region name we find that a majority of the customers come from United states followed united kingdom then Australia, Canada and Germany respectively.
  • Breaking it down by customer’s region state province name ,a majority of the customers from united states bought from Washington and also Texas who were the least from the listed states. Customers from England followed followed by those from New South Wales, then British Columbia, North Rhine-Westphalia Alberta and Bavaria.

Exploratory Data Analysis

Comparing the Income and Parental Status with Education Level

contoso_cleaned.groupby(['CustomerIsParent','Education'])['YearlyIncome'].sum().sort_values(ascending=False)

CustomerIsParent Education
1 Undergraduate 1.858040e+09
High School 1.545460e+09
0 Undergraduate 3.040000e+08
1 Postgraduate 2.984700e+08
0 Postgraduate 2.934500e+08
High School 2.830700e+08

contoso_cleaned[['CustomerIsParent','Education']].value_counts().sort_values(ascending=False)

CustomerIsParent Education
1 Undergraduate 37928
High School 25722
0 High School 11899
Undergraduate 8661
1 Postgraduate 8167
0 Postgraduate 5746
dtype: int64
  • When we compare the yearly customer income by parental status and education level we find that those with the highest yearly income are customers who are not parents and their education level is undergraduate .Followed by those who are parents with postgraduate, then those who are not parents with the same education level.
  • Looking at customers who are parents and are of the high school education level earn the least followed who are parents and have undergraduate education level.
  • Customers who are parents with an Undergraduate education level are the majority followed by those who are parents and have high school education level.Then customers who are not parents with high school and undergraduate education level respectively.
  • Customers who have post graduate education level are the minority especially those who are not parents.

Comparing the Marital and Parental Status with Yearly Income

contoso_cleaned.groupby(['MaritalStatus','CustomerIsParent'])['YearlyIncome'].sum().sort_values(ascending=False)

MaritalStatus CustomerIsParent
M 1 2.393510e+09
S 1 1.308460e+09
0 4.992600e+08
M 0 3.812600e+08
Name: YearlyIncome, dtype: float64

contoso_cleaned[['MaritalStatus','CustomerIsParent']].value_counts().sort_values(ascending=False)

MaritalStatus CustomerIsParent
M 1 43084
S 1 28733
0 15466
M 0 10840
dtype: int64

Size of the family in relation to Income and Education Levels

contoso_cleaned.groupby(['CustomerFamilySize','Education'])['YearlyIncome'].sum().sort_values(ascending=False)

CustomerFamilySize Education
3 Undergraduate 656970000.0
4 High School 624650000.0
5 High School 487780000.0
Undergraduate 406090000.0
4 Undergraduate 321440000.0
2 Undergraduate 320640000.0
6 Undergraduate 219030000.0
1 High School 211620000.0
2 Postgraduate 180780000.0
6 High School 171160000.0
1 Undergraduate 146110000.0
Postgraduate 141530000.0
7 High School 136900000.0
3 High School 104140000.0
2 High School 92280000.0
7 Undergraduate 91760000.0
3 Postgraduate 83980000.0
4 Postgraduate 67540000.0
5 Postgraduate 52970000.0
7 Postgraduate 36440000.0
6 Postgraduate 28680000.0
Name: YearlyIncome, dtype: float64
  • When looking at the top 3 yearly income total we find customer’s family size and education in relation to their income we find that those with a family size of 3 and with an undergraduate education level have the highest yearly income followed by those with a family size of 4 having high school education level followed by those with a family size of 5 with the same education level.
  • The largest family size based on customer data is 7 and the least yearly income come from customers with post graduate education level and with a family size 6.
  • Customers with a family size of 3 and have an education level of undergraduate are the majority followed by those with a family size of 2 with an undergraduate level ,then those with a family size of 1 with a high school education followed by those with a family size of 4 with the same education level ,then those with a family size of 5 with high school education level.
  • We find that the customer’s who have the highest yearly income have a small family size of 3,those who have the least yearly Income have a large family size of 6

Comparing the Sales Per Customer and Parental Status with Education Level

  • Comparing customer’s purchasing behaviour we find that customers who have high school education level spend more and than those who are married with the same education level.
  • This is followed by customers who have undergraduate education level and are single spend slightly more than those married.
  • Looking at customers with a postgraduate education we find that those who are married spend more than those who are single.
  • Looking at the cumulative sales all customers had cumulative purchases between 4000 and 5500.
  • Comparing customer purchases a majority of the customers are married and have an undergraduate education followed by those who are single and have high school education level then those who are single with undergraduate education closely followed by those who are married with high school education level.
  • The least number of customers are those who have a post graduate education level with those who are single being few than those married.

Comparing customer’s Cumulative Sales with Marital Status and Family Size At Home.

contoso_cleaned.groupby(['MaritalStatus','CustomerFamilySize'])['TotalCumulativeSales'].sum().sort_values(ascending=Fa

MaritalStatus CustomerFamilySize
M 3 8.781109e+07
S 1 7.271409e+07
M 2 5.452840e+07
5 5.339423e+07
S 3 4.650854e+07
2 4.241539e+07
4 4.109075e+07
M 4 3.002006e+07
6 1.998873e+07
7 1.494718e+07
S 6 1.254128e+07
5 9.712881e+06

contoso_cleaned[["Age","CustomerFamilySize","Education"]].value_counts().sort_values(ascending=False)[:10]

Age CustomerFamilySize Education
46 4 High School 2511
31 3 Undergraduate 1981
36 3 Undergraduate 1910
31 1 High School 1850
35 2 Postgraduate 1828
36 2 Undergraduate 1679
45 3 Undergraduate 1661
37 3 Undergraduate 1307
32 2 Undergraduate 1290
42 3 Undergraduate 1270
dtype: int64
  • Looking at the cumulative yearly sales ,customers who are single with a family size of 5 spend the most followed by those who are married with a family size of 3 ,then those who are single and with a family size of 1.
  • Looking at the customer’s family size and marital status in regards to their purchase we find that customers who are single with a family size of 6 spend the least, followed by those who are married with a family size of 7 and 6 respectively.
  • A majority of the customers have a high school education and are 46 years with a family size of 4.While the minority of them have an undergraduate education and are 42 years.
  • Customers who have post graduate education have a family size of 2 and are 35 years.

Customer Purchasing Behavior

  • The most selling product at the store is Fabrikam SLR Camera 35" X58 Blue followed by touch screen phones 4 wire/on wall M302 Grey then a budget movie maker from Fabrikam . The top 10 most selling products comprises of camera and phones together with their accessories.
  • Looking at the list of the 10 most frequent customers based on their age groups young adults spend more and form 70% of the list followed by middle aged adults.
  • When we look at the sales based on customer’s occupation we find that customers whose occupation is manual work spend more by purchasing Fabrikam cameras and have a family size of 1. Customer’s whose occupation is on a management level spend the least are middle aged and have a family size of 6 .

Sales Distribution Across the years

Customer Monthly Sales Distribution
  • Looking at the sales across the years we find that 2009 had the highest cumulative sales followed by 2008 then 2007.
  • In 2007 more sales happened during the weekend than week day which is different for 2008 and 2009 as most sales took place during the weekday.
  • Comparing the sales on weekends we find that 2007 had the highest while 2008 had the least sales ,while looking at the weekdays we find that 2009 had the highest while the still in the same year weekends sales were the least over the past two years.

Summary

  1. Most of customers are from Germany followed by United States then United kingdom. Despite United Kingdom having the least number customers they spend more than those from United States with Germany leading in sales.
  2. Looking at the sales overview we majority of the sales came from customers who are young adults with a family size of 2 and their prefer shopping online.
  3. In the year 2007 the retail store recorded the most sales at 320.14M with a profit of 197.56M followed by 2009 with total sales of 99.02M and a total profit 62.11M and then 2008 with total sales of 66.51M and a profit of 41.40M.

Recommendation

  1. Customers who are young adults purchase more in February, June and July especially on Monday’s. Stock more of Fabrikam SLR Camera 35" X58 during this period.
  2. Introduce European Holiday Promotion to other towns in Germany as a growth strategy targeting young adults and middle aged customers since it results in more sales.
  3. Increase the number of states in the North America Spring promotion with a focus on physical store and reseller store visit especially for the Old-Adults especially on weekends.
  4. Contoso Carrying Case sells more during the North America Holiday promotion it would be nice to increase the areas covered as it takes place in one state only.
  5. Introduce Christmas festive period product offering towards the end of the year to increase sales.

Links : Project Repository on Github ,Power BI visualization

Help a brother be better comment ,share feedback.

--

--

Edwin Mbugua

Loves to build things that solves day to day African challenges using technology currently biasharabook.com formerly moviemtaani.co.ke