E-Commerce Analysis Project using Big Query and Tableau

Ardaaaa
11 min readAug 31, 2023

--

E-commerce is one of rapidly growing business in this era, especially in Indonesia, E-commerce usually have a data driven culture and I believe it played an important role to their success. that’s why data analysis is very important in E-commerce.

I want to share about my recently project about SQL and Data Visualization using Google Big query and Tableau Desktop. The aim for this project is to answer business question from E-commerce sales dataset.

Lately I got the case from my connection on Linkedin who shared some business questions for e-commerce dataset on Google Big Query then I was interested to answer those questions and practice my SQL and Tableau skills.

To generate visualizations, you can connect google big query with tableau desktop using your google account and write SQL based on the challenge.

The challenge was posted by Muhammad Taruna Aldiramadan and here is the challenge ecommerce_case.

There are 4 Section for this challenge

  1. Sales Analysis
  2. Customer Analysis
  3. Product Analysis
  4. Traffic Analysis

The dataset used is one of Big query public dataset, you can check the dataset on this link bigquery-public-data.thelook_ecommerce.

Ok now let’s go to the challenge

Sales Analysis

  1. Total revenues, profits, orders, customers, and average purchase sizes for each day of the year 2023 (January — June).
SELECT 
DATE(oi.created_at) as dates,
sum(oi.sale_price) as revenue,
sum((oi.sale_price-p.cost)) as profit,
count(distinct o.order_id) as count_orders,
count(distinct o.user_id) as count_users,
avg(o.num_of_item) as average_purchase
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
join `bigquery-public-data.thelook_ecommerce.products` p on oi.product_id=p.id
join `bigquery-public-data.thelook_ecommerce.orders` o on oi.order_id=o.order_id
where CONCAT( CAST(EXTRACT(YEAR FROM DATE (oi.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (oi.created_at)) AS STRING) ) between '2023-1' and '2023-6'
and oi.returned_at is null
group by dates
order by dates
result of no. 1

there was a fluctuation in each days for the revenue, number of orders and profit. but overall In Januari — June 2023, it shows a good trends which is increased gradually. it means that our sales perform good in the first half of 2023 and we can focus on keep doing the good work to keep this perform stand. the averages of quantity item per order was the same from Januari to June 2023.

2. Calculate total revenues, profits, orders, and customers of each month of the year 2023 (January — June) and compare it to last year data (January 2022 — June 2022).

SELECT 
DATE(oi.created_at) as dates,
sum(oi.sale_price) as revenue,
count(distinct o.order_id) as count_orders,
count(distinct o.user_id) as count_users,
avg(o.num_of_item) as average_purchase
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
join `bigquery-public-data.thelook_ecommerce.orders` o on oi.product_id=o.order_id
where CONCAT( CAST(EXTRACT(YEAR FROM DATE (oi.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (oi.created_at)) AS STRING) ) between '2023-1' and '2023-6'
or CONCAT( CAST(EXTRACT(YEAR FROM DATE (oi.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (oi.created_at)) AS STRING) ) between '2022-1' and '2022-6'
and oi.returned_at is null
group by dates
order by dates

The Monthly Revenue for the half of the year 2023 perform better than half of the year 2022. Monthly revenue in half of the year 2023 is much higher than half of the year 2022. It means that our sales have improved in a year.

3. Calculate total profits that are supposed to be generated in June 2023. Compare it to the data in June 2022.

SELECT 
CONCAT( CAST(EXTRACT(YEAR FROM DATE (oi.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (oi.created_at)) AS STRING) ) as dates,
sum(oi.sale_price) as revenue
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
join `bigquery-public-data.thelook_ecommerce.orders` o on oi.product_id=o.order_id
where CONCAT( CAST(EXTRACT(YEAR FROM DATE (oi.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (oi.created_at)) AS STRING) ) = '2022-6' or
CONCAT( CAST(EXTRACT(YEAR FROM DATE (oi.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (oi.created_at)) AS STRING) ) = '2023-6' and
oi.returned_at is null
group by dates
order by dates

This is comparation between profits that generated in June 2023 and June 2022. There are significant difference between the profit, and it shows that it has increased sharply to June 2023.

Customer Analysis

  1. For June 2023, calculate total revenues, profits, orders, and customers based on the customer’s origin country.
SELECT
u.country,
sum(oi.sale_price) as revenue,
sum((oi.sale_price-p.cost)) as profit,
count(distinct o.order_id) as count_orders,
count(distinct o.user_id) as count_customers,
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
join `bigquery-public-data.thelook_ecommerce.products` p on oi.product_id=p.id
join `bigquery-public-data.thelook_ecommerce.orders` o on oi.order_id=o.order_id
join `bigquery-public-data.thelook_ecommerce.users` u on o.user_id=u.id
where EXTRACT(YEAR FROM DATE (oi.created_at)) = 2023
and EXTRACT(MONTH FROM DATE (oi.created_at)) = 6
group by u.country

China, United States and Brazil are the top 3 country to contribute revenue, customers and orders in June 2023.

2. Focusing on the sales for the top 5 countries based on their profits contribution in June 2023, what is the proportion of the male and female customer for each country.

with top_5_profits_country as (
SELECT
u.country,
sum((oi.sale_price-p.cost)) as profit,
count(u.id) as sum_users
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
join `bigquery-public-data.thelook_ecommerce.products` p on oi.product_id=p.id
join `bigquery-public-data.thelook_ecommerce.orders` o on oi.order_id=o.order_id
join `bigquery-public-data.thelook_ecommerce.users` u on o.user_id=u.id
where EXTRACT(YEAR FROM DATE (oi.created_at)) = 2023
and EXTRACT(MONTH FROM DATE (oi.created_at)) = 6
group by u.country
order by profit desc LIMIT 5
)
select country, gender, count(id) as count_gender,
case
when gender = 'F' then count(id)/(count(id)+lead(count(id), 1) OVER (PARTITION BY country ORDER BY gender))
when gender = 'M' then count(id)/(count(id)+lag(count(id), 1) OVER (PARTITION BY country ORDER BY gender))
end as proportion
from `bigquery-public-data.thelook_ecommerce.users`
where country in (select country from top_5_profits_country)
group by country, gender

For every countries, the proportions of Female and Male is balanced, there are no domination for one gender, it shows a same proportions. it means that our customers are from Female and Male.

3. Compare the daily average purchase sizes of female and male customers from the highest contributing country based on the orders generated in June 2023.

with highest_country_june as (
select country, count(id) as total_orders
from `bigquery-public-data.thelook_ecommerce.orders` o
join `bigquery-public-data.thelook_ecommerce.users` u on o.user_id = u.id
where
CONCAT( CAST(EXTRACT(YEAR FROM DATE (o.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (o.created_at)) AS STRING) ) = '2023-6'
group by country
order by total_orders desc limit 1
)
select date(o.created_at) as dates, o.gender, avg(num_of_item) as average_purchase_size
from `bigquery-public-data.thelook_ecommerce.orders` o
join `bigquery-public-data.thelook_ecommerce.users` u on o.user_id = u.id
where country in (select country from highest_country_june)
and CONCAT( CAST(EXTRACT(YEAR FROM DATE (o.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (o.created_at)) AS STRING) ) = '2023-6'
group by dates, o.gender
order by dates

Graphic above shows that daily average purchase size of female and male in June 2023 was the same. again there are no gender domination for daily average purchase size in June 2023.

4. Calculate average purchase sizes grouped by customer’s age on orders created in June 2023.

select age, round(avg(num_of_item)) average_purchase_size
from `bigquery-public-data.thelook_ecommerce.orders` o
join `bigquery-public-data.thelook_ecommerce.users` u on o.user_id = u.id
where
CONCAT( CAST(EXTRACT(YEAR FROM DATE (o.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (o.created_at)) AS STRING) ) = '2023-6'
group by age
order by age

5. Focusing on the sales for the top 1 countries based on their revenue contribution in 2023, compare the distribution of customer’s age between male and female.

with country_distribution as (
select country, gender, sum(o.sale_price) as revenue
from `bigquery-public-data.thelook_ecommerce.order_items` o
join `bigquery-public-data.thelook_ecommerce.users` u on o.user_id = u.id
where EXTRACT(YEAR FROM DATE (o.created_at)) = 2023
group by country,gender order by revenue desc limit 2
)
SELECT
distinct gender,
PERCENTILE_CONT(age, 0.1) OVER(PARTITION BY gender) AS percentile10,
PERCENTILE_CONT(age, 0.25) OVER(PARTITION BY gender) AS percentile25,
PERCENTILE_CONT(age, 0.75) OVER(PARTITION BY gender) AS percentile75,
PERCENTILE_CONT(age, 0.9) OVER(PARTITION BY gender) AS percentile90,
PERCENTILE_CONT(age, 0.5) OVER(PARTITION BY gender) AS median,
FROM `bigquery-public-data.thelook_ecommerce.users` u
where country in (select country from country_distribution)

the distribution of each gender age of our customer from China (highest sales country in 2023 so far) is same. each genders have a same range of age.

6. Calculate the average time taken from the moment the order is created until the order is delivered for each country.

with avg_time as (
select country, timestamp_diff(o.delivered_at, o.created_at, day) as delivered_time
from `bigquery-public-data.thelook_ecommerce.order_items` o
join `bigquery-public-data.thelook_ecommerce.users` u on o.user_id = u.id
where timestamp_diff(o.delivered_at, o.created_at, day) > 0
)
select country, round(avg(delivered_time)) as avg_delivered_day
from avg_time
group by country

Product Analysis

  1. Calculate revenues and quantity of orders for each product category in June 2023.
select category, sum(o.sale_price) as revenue, count(order_id) as count_order
from `bigquery-public-data.thelook_ecommerce.order_items` o
join `bigquery-public-data.thelook_ecommerce.products` p on o.product_id = p.id
where
CONCAT( CAST(EXTRACT(YEAR FROM DATE (o.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (o.created_at)) AS STRING) ) = '2023-6'
group by category
order by revenue desc, count_order desc

2. Comparing the all time’s top 5 categories in terms of order item quantity between male and female.

with top5_categories as (
select category, sum(num_of_item) as qty_item
from `bigquery-public-data.thelook_ecommerce.order_items` oi
join `bigquery-public-data.thelook_ecommerce.products` p on oi.product_id = p.id
join `bigquery-public-data.thelook_ecommerce.orders` o on oi.order_id = o.order_id
group by category
order by qty_item desc limit 5
)
select category, gender, count(order_id) as item_quantity
from `bigquery-public-data.thelook_ecommerce.order_items` o
join `bigquery-public-data.thelook_ecommerce.products` p on o.product_id = p.id
join `bigquery-public-data.thelook_ecommerce.users` u on o.user_id=u.id
group by category, gender
having category in (
select category from top5_categories
)
order by category

3. Compare the profit of each product category in June 2022 and June 2023. Provide your analysis.

select 
CONCAT( CAST(EXTRACT(YEAR FROM DATE (o.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (o.created_at)) AS STRING) ) as months,
category, sum((o.sale_price-p.cost)) as profit
from `bigquery-public-data.thelook_ecommerce.order_items` o
join `bigquery-public-data.thelook_ecommerce.products` p on o.product_id = p.id
where CONCAT( CAST(EXTRACT(YEAR FROM DATE (o.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (o.created_at)) AS STRING) ) = '2022-6' or
CONCAT( CAST(EXTRACT(YEAR FROM DATE (o.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (o.created_at)) AS STRING) ) = '2023-6'
group by months,category
order by category

4. Analyse the cancelled and returned data of each product category in June 2023.

select
category,status, count(order_id) as num_of_item
from `bigquery-public-data.thelook_ecommerce.order_items` o
join `bigquery-public-data.thelook_ecommerce.products` p on o.product_id = p.id
where status = 'Cancelled' or status = 'Returned'
and CONCAT( CAST(EXTRACT(YEAR FROM DATE (o.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (o.created_at)) AS STRING) ) = '2023-6'
group by category,status
order by category

The highest category that returned or cancelled in June 2023 is Intimates category.

5. Create a weekly cumulative revenue data for the highest revenue-generating brand in 2022. Visualise the query’s result with a bar chart.

with best_brand as (
select brand, sum(o.sale_price) as revenue
from `bigquery-public-data.thelook_ecommerce.order_items` o
join `bigquery-public-data.thelook_ecommerce.products` p on o.product_id = p.id
group by brand
order by revenue desc limit 1
), weekly_revenue as (
select EXTRACT(WEEK FROM DATE (o.created_at)) as week,
sum(o.sale_price) as revenue
from `bigquery-public-data.thelook_ecommerce.order_items` o
join `bigquery-public-data.thelook_ecommerce.products` p on o.product_id = p.id
where brand in (select brand from best_brand)
and EXTRACT(YEAR FROM DATE (o.created_at)) = 2022
group by week
order by week
)
select week, revenue, sum(revenue) over (ORDER BY week) as cumulative_revenue
from weekly_revenue

Traffic Analysis

  1. Calculate the monthly total number of customers for each traffic source from January to June 2023.
select EXTRACT(MONTH FROM DATE (e.created_at)) as bulan,
e.traffic_source, count(e.user_id) as num_of_customers
from `bigquery-public-data.thelook_ecommerce.events` e
join `bigquery-public-data.thelook_ecommerce.users` u on e.user_id = u.id
where CONCAT( CAST(EXTRACT(YEAR FROM DATE (e.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (e.created_at)) AS STRING) ) between '2023-1' and '2023-6'
group by bulan,e.traffic_source
order by bulan

Our amount of customers are increased from Januari to June 2023 and if we break down from the traffic source, Email shows the highest traffic source that obtain customers during the half of 2023.

2. Create age groups with these rules: Kids (<15), Teenager (15–24), Adult (25–50), Elderly (>50). Calculate the all-time total number of customers for each traffic source within each age group

with age_groups as (
select id, age,
case
when age < 15 then 'Kids'
when (age >= 15 and age < 24) then 'Teenager'
when (age >= 24 and age <= 50) then 'Adult'
when age > 50 then 'Elderly'
end as age_group
from `bigquery-public-data.thelook_ecommerce.users`
)
select e.traffic_source, age_group, count(e.user_id) as count_users
from `bigquery-public-data.thelook_ecommerce.events` e
join age_groups ag on e.user_id = ag.id
group by e.traffic_source, age_group
order by e.traffic_source

3. Calculate the monthly total revenue generated from each traffic source from January to June 2023.

select EXTRACT(MONTH FROM DATE (e.created_at)) as bulan,
e.traffic_source, sum(o.sale_price) as revenue
from `bigquery-public-data.thelook_ecommerce.order_items` o
join `bigquery-public-data.thelook_ecommerce.users` u on o.user_id = u.id
join `bigquery-public-data.thelook_ecommerce.events` e on u.id = e.user_id
where CONCAT( CAST(EXTRACT(YEAR FROM DATE (e.created_at)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (e.created_at)) AS STRING) ) between '2023-1' and '2023-6'
group by bulan,e.traffic_source
order by bulan

The monthly total revenue also increased in January to June 2023, and again email is the highest traffic source of those revenues.

This is the end of this page, I have done this challenge and i hope it can be my portofolio and of course i hope you guys find it useful to help you learn SQL, visualization and analysis.

If you guys have other thoughts about this, please feel free to comment and advice me, i will appreciate that as a new insight for me.

Thank You

--

--