# Data Analysis Project for Retail: Sales Performance Report Using SQL

## Analysis of DQLab Store sales by using MySQL

# About Dataset

This project provided by DQLab, here was given the dataset contains the transactions report from 2009 until 2012 consist 5500 rows, which is the order status field has the value ‘Order Finished’, ‘Order Returned’ and ‘Order Cancelled’. All data for this project are saved in the table called **dqlab_store_sales**. You can see the appearance of the dataset below :

#table information

DESC dqlab_sales_store;#first 5 records

SELECT *

FROM dqlab_sales_store

LIMIT 5;

# What the task given?

Through the data has given, the manager of DQLab Store wants to know :

- Order numbers and total sales from 2009 until 2012 which order status is finished
- Total sales for each sub-category of product on 2011 and 2012
- The effectiveness and efficiency of promotions carried out so far, by calculating the burn rate of the overall promotions by year
- The effectiveness and efficiency of promotions carried out so far, by calculating the burn rate of the overall promotions by sub-category of product on 2012
- The number of customers transactions for each year
- The number of new customers for each year

# Let’s solving this project !

**1. Order Numbers and Total Sales by Years**

`SELECT YEAR(order_date) years,`

SUM(sales) sales,

COUNT(order_status) 'number of order'

FROM dqlab_sales_store

WHERE order_status = 'Order Finished'

GROUP BY 1;

*order_date *field has a *date* format, we can get the year part of *date* format by using YEAR() function. Next, we use the SUM() function to get total sales, then COUNT() function to get the number of order. Don’t forget to add WHERE clause to filter *order_status *just for finished order. Finally, to get the value by years, we should put GROUP BY 1 on the query since year field is located in the first column.

output :

We can see, total sales of DQLab store are changed over the year. The highest total sales were in 2009 and it doesn’t get higher after that. But different from the number of order, it goes ride except in 2011. Although the change isn’t too significant over the years.

## 2. **Total Sales by Sub-Category on 2011 and 2012**

`SELECT *,`

ROUND((sales2012-sales2011)*100/sales2012, 1) 'growth sales (%)'

FROM(

SELECT product_sub_category,

SUM( IF( YEAR(order_date) = 2011, sales, 0)) sales2011,

SUM( IF( YEAR(order_date) = 2012, sales, 0)) sales2012

FROM dqlab_sales_store

WHERE order_status = 'Order Finished'

GROUP BY product_sub_category

) sub_category

ORDER BY 4 DESC;

Here the pivot table was used to compare the total sales in 2011 with 2012. We can use SUM() function followed by IF() function to do it. SUM() is used to get the total sales and IF() used to filter by year that we want to specify.

output :

Most the growth sales are lead the increases, shown by a positive value. But there are some sub-category products that got a decline in sales from 2011 to 2012 which shown by a negative value. *Labels, Copiers & Fax and tables* are the categories that got a decline in sales the most.

## 3. Promotion Effectiveness and Efficiency by Years

`SELECT YEAR(order_date) years,`

SUM(sales) sales,

SUM(discount_value) 'promotion value',

ROUND( SUM(discount_value)*100/SUM(sales), 2) 'burn rate (%)'

FROM dqlab_sales_store

WHERE order_status = 'Order Finished'

GROUP BY 1;

In this project, burn rate analysis is used to knowing the effectiveness and efficiency of promotions. It does by comparing the sum of promotion value

with total sales. Here DQLab hopes that the burn rate doesn’t exceed 4.5%. Burn rate calculating is shown below :

output :

The results inform us that burn rates are above 4.5% for each year as overall. This indicates that the promotions have been carried out haven’t been able to reduce the burn rate to a maximum of 4.5%. We can figure out what is the product which made a significant contribution causing the burn rate to be higher than expected by grouping the query by each product.

## 4. Promotion Effectiveness and Efficiency by Product Sub-Category

`SELECT product_sub_category,`

product_category,

SUM(sales) sales,

SUM(discount_value) promotion_value,

ROUND(SUM(discount_value)*100/SUM(sales),2) 'burn rate (%)'

FROM dqlab_sales_store

WHERE YEAR(order_date) = 2012

AND order_status = 'Order Finished'

GROUP BY product_sub_category,

product_category

ORDER BY 5;

output :

There are only five sub-category of product that have the burn rate bellow 4.5 %. It shown on the first five rows, they are starting on *Rubber Bands* to T*elephones and Communication*. Whereas the *Labels* have higher 0.02% from the maximum value of expected burn rate by DQLab Store.

It is very interesting because from these results we know that still many sub-category of product which have the burn rate higher than 4.5%.

## 5. Customers Transactions per Year

`SELECT YEAR(order_date) years,`

COUNT(DISTINCT customer) 'number of customer'

FROM dqlab_sales_store

WHERE order_status = 'Order Finished'

GROUP BY 1;

The calculation of customers number for each year doesn’t involve the duplicate value. That is why DISTINCT is used in this query for getting the unique value of customers number.

output :

The number of customers isn’t changing significantly overall. But fortunately, we didn’t get a significant decline in customers. We can see that the number of customers tends to be in the values around 580–590.

## 6. New customers over the years

`SELECT YEAR(first_order) years,`

COUNT(customer) 'new customers'

FROM(

SELECT customer,

MIN(order_date) first_order

FROM dqlab_sales_store

WHERE order_status = 'Order Finished'

GROUP BY 1) first

GROUP BY 1;

To get the number of new customers for each year, we only need the data that shows the first time transaction from each customer. We can get it by applying MIN() function on the *first_order* field then calculate the number of customers.

output :

The growth of new customers for each year is decreasing. It gets extreme in 2012 that only there 11 new customers. But if we back on the result before (fig.7) the number of customers tends to remain overall. This informs us that many previous customers still back to DQLab Store to do the transaction besides the new customers are decreasing.

# Summary

According to data analysis that we have done, we can conclude that :

- The total sales and order number of DQLab Store are fluctuated as overall, where the highest total sales happened in 2009. Meanwhile, the number of orders increased except in 2011 though the increase is not too significant.
- Total sales based on the sub-category of products got increasing between 2011 and 2012 as overall. But there some of them is got sales decreasing, they are Appliances, Bookcases, Tables, Labels, Copiers & Fax.
- The burn rate over the year still above 4.5%. This indicates that the effectiveness and efficiency of promotions haven’t been reaching as DQLab Store hopes.
- There are many products that have a burn rate above 4.5%. This causes the overall burn rate is above 4.5% for each year. The only products that have a burn rate bellow 4.5% are
*Rubber Bands, Envelopes, Chairs & Chairmats, Tables and Telephones & Communication*. - The number of customers over the years tends to remain, it is around 580–590.
- The number of new customers decreasing over the years, with the lowest number of new customers is 11 in 2012.