Recent years, more and more companies started e-commerce business through Internet. To monitoring marketing and revenue performance, companies collect the data from their website and use data software such as SQL to analyze and improve their e-commerce strategy.
Which page is the landing page for most customers? How many customers finally place orders on the website? How could the company improve their website to increase the conversion rate? There are many questions for analysts to explore. For this article, I found a virtual case data file for this topic.
This Maven Fuzzy Factory database contains typical e-commerce records in several tables which includes website activity, products and order refund related data.
The database schema shows below, we could see primary key and foreign key for each table and understand the relationship between tables well.
Our task is to use MySQL to understand how customers access and interact with the site, build traffic analysis, calculate bounce rate and funnel conversion report to show the performances of pages.
We would like to use the first 8 months data to tell the story of the growth, explain the details of the current performance and quantify the revenue impact of some of the company wins.
The logic of this analysis story will from extracting the simple monthly trend of the most popular source, splitting into different device type, channels and campaign then diving to the conversion rate calculate and revenue estimation.
Show the monthly trend of biggest business driver
We would started from the page view table and find where the customers are coming from and which channels contribute the highest quality views.
Since when business run paid marketing campaigns, they often obsess over performance and measure many aspects such as how much they spend. Paid traffic is always tagged with utm parameters and added to the URLs which allow people to track back. By using the following query, we could find out which channel contribute the highest sessions.
select count(distinct website_session_id), utm_source from website_sessions
group by utm_source;
As we see, ‘gsearch’ seems to be the biggest driver, so we would like to dig into this channel. Since we have to find out the monthly trend of the business growth, the number of the orders could be a good measurement.
select count(distinct website_sessions.website_session_id) as sessions,
count(distinct orders.order_id) as orders,
year(website_sessions.created_at) as yea,
month(website_sessions.created_at) as monthly,
count(distinct orders.order_id)/ count(distinct website_sessions.website_session_id) as conversion
left join orders
where website_sessions.created_at < ‘2012–11–27’
group by yea,
order by monthly;
Note, to get the monthly trend, we have to group by both year and month.
From the result we could see both the sessions and orders are increasing during the first 8 months. For the conversion rate, we see a increasing from 3% to 4%，which means more and more sessions turns into payment income. This shows the company business is in a good growing path.
Moreover, we would like to break down to campaign level to see more specific information. Since there are nonbrand and brand two kinds, we would like to find out if the most traffic comes from brand, if so, it tells the company is always rely on the paid traffic.
By using the following query, we break the result into nonbrand and brand for comparison.
count(distinct case when website_sessions.utm_campaign=’nonbrand’ then website_sessions.website_session_id else null end) as nonbrandsessions,
count(distinct case when website_sessions.utm_campaign=’brand’ then orders.order_id else null end) as brandorders
As we know, brand campaign represent someone going into search engine and explicitly looking for this business. From the result we see the fact is brandsessions increased dramatically, it is a good sign for the investors.
We may also wanna know, what kinds of device do our customers use to get access to our website. This could be a benefit for the company to improve the user interface of website on different device.
Using the similar query and make sure the result is limited in ‘gsearch’ and ‘nonbrand’, we got the result as below.
count(distinct case when website_sessions.device_type=’mobile’ then website_sessions.website_session_id else null end) as mobilesessions,
count(distinct case when website_sessions.device_type=’mobile’ then orders.order_id else null end) as mobileorders,
count(distinct case when website_sessions.device_type=’desktop’ then website_sessions.website_session_id else null end) as desktopsessions,
count(distinct case when website_sessions.device_type=’desktop’ then orders.order_id else null end) as desktoporders
We could see a lot more desktop sessions from the beginning to the end. For November, the desktop sessions is almost 3 times of mobile sessions. Then we look at the orders, it is even more drastic. At the beginning, we have a 5 to 1 ratio of desktop to mobile but at the end we have a 10 to 1 ratio.
This is the thing for the company could use to show the board that their analysts really dive into business and help them understand it at a granular level.
As we mentioned above, there are large percent of traffic from gsearch, we would like to pull monthly trend of gsearch alongside with each other channels’ trend.
select utm_source,http_referer, utm_campaign from website_sessions
group by utm_source, utm_campaign,http_referer;
First of all, we would like to find out all possible distinct combination as below.
Note, we are easily seeing the paid search as gsearch and bsearch for both nonbrand and brand but we also find direct search which all three columns are null and organic search(coming from search engine but not tagged with our paid parameters) which only has value in http_referer.
count(distinct case when website_sessions.utm_source=’gsearch’ then website_sessions.website_session_id else null end) as gpaidsearchsessions,
count(distinct case when website_sessions.utm_source=’bsearch’ then website_sessions.website_session_id else null end) as bpaidsearchsessions,
count(distinct case when website_sessions.utm_source is null and http_referer is not null then website_sessions.website_session_id else null end) as organicsessions,
count(distinct case when website_sessions.utm_source is null and http_referer is null then website_sessions.website_session_id else null end) as directsessions
This is like a pivot table in excel.
From the result, we see both organic sessions and direct sessions are growing well which will make company very excited because these growing are not paid. For gsearch and bsearch, there are cost for customer acquisition for any orders come in and paying for that marketing spend it eats into your margin. By contrast, the organic search and direct search sessions that’s all margin when you sell orders there without additional variable costs for paying for that traffic.
A/B Test for Website Pages
Monthly Conversion Rate
We would like to check all sessions monthly trends at first as below.
This company made an A/B experiment for the gsearch nonbrand landing page version(‘/home’, ‘/lander-1’) for one month and they would like to check the result for the conversion rate and estimate the revenue test earn them.
To get the exact time of test version (‘/lander-1’) put into use, we created following query and get the result.
min(website_pageview_id) as min_pv_id
By reviewing the database diagram, we know that we have to use the result for the fist time test using to connect with website_sessions table and get the session id alongside with the pageview id for the landing action.
create temporary table gsearchlander
min(website_pageviews.website_pageview_id) as min_pageviewid
where website_pageviews.created_at <’2012–07–28' and website_pageviews.website_pageview_id>23504
group by website_pageviews.website_session_id;
After we got the result, we need to save it into a temporary table for next step using. So we use create query to realize.
Having the landing page id for each sessions, we have to connect ‘url’ information into our result, since this A/B test is created for the comparison for different url(‘/home’ , ‘/lander-1’). So we join back to website_pageviews table with our temporary table by using the landing pageview id and the pagevire id, then save it in a temporary table for next step use as below.
create temporary table landing_page1
website_pageviews.pageview_url as landing_page
left join website_pageviews
where website_pageviews.pageview_url in( ‘/home’ , ‘/lander-1’);
For now, we got the landing page url for every single sessions. We are ready for analysis for orders.
Extract the session id and page url to left join the orders table to get the order id for each landing action sessions and save into a temporary table. Then we can easily select information we need from the last temporary table.
create temporary table nonbrandorders
left join orders
From the result, we find that the ‘/lander-1’ is fairly better about 0.88 % additional orders per session. Then we are going to find the most recent page view for gsearch nonbrand where the traffic was sent to ‘/home’.
select max(website_sessions.website_session_id) as mostrecent,
max(website_sessions.created_at) as t
left join website_pageviews
where utm_source=’gsearch’ and utm_campaign=’nonbrand’ and pageview_url=’/home’
This maximum ID is where we had gsearch nonbrand traffic going to ‘/home’. Since then all of the traffic has been rerouted elsewhere so the next what we can do is we can see how many sessions that we’ve had since that test. So we are using that MAX that we just found to limit our date range and we can simply pull this:
where website_sessions.created_at <’2012–11–27'
and utm_source=’gsearch’ and utm_campaign=’nonbrand’;
So we can do quick math by multiplying 22972 sessions by the incremental conversion rate which is 0.88% and we get 202 incremental orders since that test concluded.
Generally, we could say that we have basically improved the performance of the website by changing over to the new page. We quantify that improvement which is above amount of lift in terms of orders generated per session and we find the incremental orders that were generated from the test. We could also check it per month. For almost 4 months, this generated almost 50 extra orders per month.
Full Conversion Funnel Analysis
Next we would like to dive more for the landing page test and we will show full conversion funnel from each of the two pages to orders during the same period we used above.
Before we start querying, I would like to show a complete payment path here. Customers will start from visit landing page, to the products page, to some specific page(‘the-original- mr-fuzzy’), then to the cart and shipping, and finally to the billing. Each complete orders will come to the last ‘Thank you ’ page. So we will built our funnel by using these information.
case when pageview_url =’/home’ then 1 else 0 end as home_page,
case when pageview_url =’/lander-1' then 1 else 0 end as lander_page,
case when pageview_url =’/products’ then 1 else 0 end as products_page,
case when pageview_url=’/thank-you-for-your-order’ then 1 else 0 end as thankyou_page
left join website_pageviews
where website_sessions.created_at between ‘2012–06–19’ and ‘2012–07–28’
and utm_source=’gsearch’ and utm_campaign=’nonbrand’
order by 1,2
Then we use nested select query to extract the website session id and each maximum value of each steps in our funnel shows above from the query above. After we get the flag table as below, we save it into a temporary table for next steps.
To show the full conversion rate for each steps, we use case to split the two testing landing page version and use count and simple division to get the value as following:
case when home=1 then ‘home’ else ‘lander-1’ end as traffic,
count(distinct case when product_made_it = 1 then website_session_id else null end)/count(distinct website_session_id) as lander_click_rate,
count(distinct case when thankyou_made_it = 1 then website_session_id else null end) /count(distinct case when billing_made_it=1 then website_session_id else null end) as billing_click_rate
group by 1;
From the result, we can clearly see each step performance in payment actions. We could realize where is our pain point to lose most of customers.
For example, the products click rate is quite good but the mrfuzzy click rate is not good at all. Note, n step click rate equals to the n+1 sessions id count divide n step count sessions id. We could dive into this point and try to improve cart page such as increase the user interface to make it more easier to continue their ordering to improve the conversion rate. Also, we have to improve our billing page, since many people quit their orders during that steps which is really a pity for our business!
Billing Test Impact Analysis
As we mentioned above, there might be some problems in our billing pages which is reducing the whole revenue. This company also made an A/B experiment for different billing page version. All we have to analyze is quite similar with landing page A/B testing analysis and estimate the impact of revenue generated from the test to finish a page version evaluation.
create temporary table billing
website_pageviews.pageview_url as billing_version_see,
left join orders
where website_pageviews.created_at between ‘2012–09–10’ and ‘2012–11–10’
and website_pageviews.pageview_url in (‘/billing’,’/billing-2');
The table shows billing page version and order information for each session as below. Then we just extract the following data to get the result.
count(distinct website_session_id) as sessions,
sum(price_usd)/COUNT(distinct website_session_id) as revenue_per_billing_page_seen
From the result, we see the billing version 2 produces 31.34 dollars per billing page which is more than 8 dollar than that original page. We could say there is a major lift coming from this change.
Last thing we could do is looking how many sessions we’ve had where somebody hit the billing page in the past month
select count(website_pageviews.website_session_id) as pastmonth
where website_pageviews.created_at between ‘2012–10–27’ and ‘2012–11–27’
and website_pageviews.pageview_url in (‘/billing’,’/billing-2');
We see 1193 sessions hit that page, by simple math, we know for each billing page session is now worth more than 8 dollars more to us, so 1193*8 which more than 10 thousand dollars per month. This is a really good improvement for company to sell more product and make more money.
This article use the data to built a simple e-commerce website analysis by using MySQL.
By analyzing traffic source, we could shift budget towards the engines and campaigns driving the strongest conversion rate, we could compare user behavior patterns across the traffic sources to inform creative strategy and identify opportunities to eliminate wasted spend.
By analyzing website performance, we could help company to optimize the marketing budget and identify the most viewed version of pages to improve our business income. Also, analyzing A/B test result will help company to identify the top opportunities and help them improve bounce rates or conversion rates. Funnel conversion analysis tells us where is the pain point for the online business, it will help company to figure out the problems and improve the online process to gain more revenue.
However, this data still has many aspects to dig with, please look forward my new part of analysis report in the future!
My SQL query is here.
Thank you for reading my Medium! My name is Sydney Chen and I am currently a Master student in Arizona State University majoring in Business Analytics.
Contact info: firstname.lastname@example.org