E-commerce Traffic and Website Analysis by Using MySQL — (1)

Sydney Chen
Apr 5, 2020 · 12 min read

Simple case study for e-commerce marketing analysis

https://images.app.goo.gl/WFD43zY9FiFchRrq9

Recent years, more and more companies started e-commerce business through Internet. To monitoring marketing and revenue performance, companies collect the data from their website and use data software such as SQL to analyze and improve their e-commerce strategy.

Which page is the landing page for most customers? How many customers finally place orders on the website? How could the company improve their website to increase the conversion rate? There are many questions for analysts to explore. For this article, I found a virtual case data file for this topic.

Case Description

The database schema shows below, we could see primary key and foreign key for each table and understand the relationship between tables well.

Our task is to use MySQL to understand how customers access and interact with the site, build traffic analysis, calculate bounce rate and funnel conversion report to show the performances of pages.

We would like to use the first 8 months data to tell the story of the growth, explain the details of the current performance and quantify the revenue impact of some of the company wins.

Story Telling

Show the monthly trend of biggest business driver

channels

We would started from the page view table and find where the customers are coming from and which channels contribute the highest quality views.

Since when business run paid marketing campaigns, they often obsess over performance and measure many aspects such as how much they spend. Paid traffic is always tagged with utm parameters and added to the URLs which allow people to track back. By using the following query, we could find out which channel contribute the highest sessions.

select count(distinct website_session_id), utm_source from website_sessions
group by utm_source;

As we see, ‘gsearch’ seems to be the biggest driver, so we would like to dig into this channel. Since we have to find out the monthly trend of the business growth, the number of the orders could be a good measurement.

select count(distinct website_sessions.website_session_id) as sessions,
count(distinct orders.order_id) as orders,
year(website_sessions.created_at) as yea,
month(website_sessions.created_at) as monthly,
count(distinct orders.order_id)/ count(distinct website_sessions.website_session_id) as conversion
from website_sessions
left join orders
on orders.website_session_id=website_sessions.website_session_id
where website_sessions.created_at < ‘2012–11–27’
and website_sessions.utm_source=’gsearch’
group by yea,
monthly
order by monthly;

Note, to get the monthly trend, we have to group by both year and month.

From the result we could see both the sessions and orders are increasing during the first 8 months. For the conversion rate, we see a increasing from 3% to 4%,which means more and more sessions turns into payment income. This shows the company business is in a good growing path.

campaigns

Moreover, we would like to break down to campaign level to see more specific information. Since there are nonbrand and brand two kinds, we would like to find out if the most traffic comes from brand, if so, it tells the company is always rely on the paid traffic.

By using the following query, we break the result into nonbrand and brand for comparison.

count(distinct case when website_sessions.utm_campaign=’nonbrand’ then website_sessions.website_session_id else null end) as nonbrandsessions,
…….
count(distinct case when website_sessions.utm_campaign=’brand’ then orders.order_id else null end) as brandorders

As we know, brand campaign represent someone going into search engine and explicitly looking for this business. From the result we see the fact is brandsessions increased dramatically, it is a good sign for the investors.

device type

We may also wanna know, what kinds of device do our customers use to get access to our website. This could be a benefit for the company to improve the user interface of website on different device.

Using the similar query and make sure the result is limited in ‘gsearch’ and ‘nonbrand’, we got the result as below.

count(distinct case when website_sessions.device_type=’mobile’ then website_sessions.website_session_id else null end) as mobilesessions,
count(distinct case when website_sessions.device_type=’mobile’ then orders.order_id else null end) as mobileorders,
count(distinct case when website_sessions.device_type=’desktop’ then website_sessions.website_session_id else null end) as desktopsessions,
count(distinct case when website_sessions.device_type=’desktop’ then orders.order_id else null end) as desktoporders

We could see a lot more desktop sessions from the beginning to the end. For November, the desktop sessions is almost 3 times of mobile sessions. Then we look at the orders, it is even more drastic. At the beginning, we have a 5 to 1 ratio of desktop to mobile but at the end we have a 10 to 1 ratio.

This is the thing for the company could use to show the board that their analysts really dive into business and help them understand it at a granular level.

channels comparison

As we mentioned above, there are large percent of traffic from gsearch, we would like to pull monthly trend of gsearch alongside with each other channels’ trend.

select utm_source,http_referer, utm_campaign from website_sessions
where created_at<’2012–11–27'
group by utm_source, utm_campaign,http_referer;

First of all, we would like to find out all possible distinct combination as below.

Note, we are easily seeing the paid search as gsearch and bsearch for both nonbrand and brand but we also find direct search which all three columns are null and organic search(coming from search engine but not tagged with our paid parameters) which only has value in http_referer.

count(distinct case when website_sessions.utm_source=’gsearch’ then website_sessions.website_session_id else null end) as gpaidsearchsessions,
count(distinct case when website_sessions.utm_source=’bsearch’ then website_sessions.website_session_id else null end) as bpaidsearchsessions,
count(distinct case when website_sessions.utm_source is null and http_referer is not null then website_sessions.website_session_id else null end) as organicsessions,
count(distinct case when website_sessions.utm_source is null and http_referer is null then website_sessions.website_session_id else null end) as directsessions

This is like a pivot table in excel.

From the result, we see both organic sessions and direct sessions are growing well which will make company very excited because these growing are not paid. For gsearch and bsearch, there are cost for customer acquisition for any orders come in and paying for that marketing spend it eats into your margin. By contrast, the organic search and direct search sessions that’s all margin when you sell orders there without additional variable costs for paying for that traffic.

A/B Test for Website Pages

https://images.app.goo.gl/eTYiknMvjXUiLoee7

Monthly Conversion Rate

We would like to check all sessions monthly trends at first as below.

This company made an A/B experiment for the gsearch nonbrand landing page version(‘/home’, ‘/lander-1’) for one month and they would like to check the result for the conversion rate and estimate the revenue test earn them.

To get the exact time of test version (‘/lander-1’) put into use, we created following query and get the result.

select
min(created_at),
min(website_pageview_id) as min_pv_id
from website_pageviews
where pageview_url=’/lander-1';

By reviewing the database diagram, we know that we have to use the result for the fist time test using to connect with website_sessions table and get the session id alongside with the pageview id for the landing action.

create temporary table gsearchlander

select website_sessions.website_session_id,
min(website_pageviews.website_pageview_id) as min_pageviewid
from website_pageviews
join website_sessions
on website_sessions.website_session_id=website_pageviews.website_session_id
where website_pageviews.created_at <’2012–07–28' and website_pageviews.website_pageview_id>23504
and utm_source=’gsearch’
and utm_campaign=’nonbrand’
group by website_pageviews.website_session_id;

After we got the result, we need to save it into a temporary table for next step using. So we use create query to realize.

Having the landing page id for each sessions, we have to connect ‘url’ information into our result, since this A/B test is created for the comparison for different url(‘/home’ , ‘/lander-1’). So we join back to website_pageviews table with our temporary table by using the landing pageview id and the pagevire id, then save it in a temporary table for next step use as below.

create temporary table landing_page1
select gsearchlander.website_session_id,
website_pageviews.pageview_url as landing_page
from gsearchlander
left join website_pageviews
on website_pageviews.website_pageview_id=gsearchlander.min_pageviewid
where website_pageviews.pageview_url in( ‘/home’ , ‘/lander-1’);

For now, we got the landing page url for every single sessions. We are ready for analysis for orders.

Extract the session id and page url to left join the orders table to get the order id for each landing action sessions and save into a temporary table. Then we can easily select information we need from the last temporary table.

create temporary table nonbrandorders
select landing_page1.website_session_id,
landing_page1.landing_page,
orders.order_id
from landing_page1
left join orders
on orders.website_session_id=landing_page1.website_session_id;

From the result, we find that the ‘/lander-1’ is fairly better about 0.88 % additional orders per session. Then we are going to find the most recent page view for gsearch nonbrand where the traffic was sent to ‘/home’.

select max(website_sessions.website_session_id) as mostrecent,
max(website_sessions.created_at) as t
from website_sessions
left join website_pageviews
on website_sessions.website_session_id=website_pageviews.website_session_id
where utm_source=’gsearch’ and utm_campaign=’nonbrand’ and pageview_url=’/home’
and website_sessions.created_at<’2012–11–27';

This maximum ID is where we had gsearch nonbrand traffic going to ‘/home’. Since then all of the traffic has been rerouted elsewhere so the next what we can do is we can see how many sessions that we’ve had since that test. So we are using that MAX that we just found to limit our date range and we can simply pull this:

select count(website_sessions.website_session_id)
from website_sessions
where website_sessions.created_at <’2012–11–27'
and website_sessions.website_session_id>17145
and utm_source=’gsearch’ and utm_campaign=’nonbrand’;

So we can do quick math by multiplying 22972 sessions by the incremental conversion rate which is 0.88% and we get 202 incremental orders since that test concluded.

Generally, we could say that we have basically improved the performance of the website by changing over to the new page. We quantify that improvement which is above amount of lift in terms of orders generated per session and we find the incremental orders that were generated from the test. We could also check it per month. For almost 4 months, this generated almost 50 extra orders per month.

Full Conversion Funnel Analysis

Next we would like to dive more for the landing page test and we will show full conversion funnel from each of the two pages to orders during the same period we used above.

Before we start querying, I would like to show a complete payment path here. Customers will start from visit landing page, to the products page, to some specific page(‘the-original- mr-fuzzy’), then to the cart and shipping, and finally to the billing. Each complete orders will come to the last ‘Thank you ’ page. So we will built our funnel by using these information.

select website_sessions.website_session_id,
website_pageviews.pageview_url,
case when pageview_url =’/home’ then 1 else 0 end as home_page,
case when pageview_url =’/lander-1' then 1 else 0 end as lander_page,
case when pageview_url =’/products’ then 1 else 0 end as products_page,
…………………..
case when pageview_url=’/thank-you-for-your-order’ then 1 else 0 end as thankyou_page
from website_sessions
left join website_pageviews
on website_sessions.website_session_id=website_pageviews.website_session_id
where website_sessions.created_at between ‘2012–06–19’ and ‘2012–07–28’
and utm_source=’gsearch’ and utm_campaign=’nonbrand’
order by 1,2

Then we use nested select query to extract the website session id and each maximum value of each steps in our funnel shows above from the query above. After we get the flag table as below, we save it into a temporary table for next steps.

To show the full conversion rate for each steps, we use case to split the two testing landing page version and use count and simple division to get the value as following:

select
case when home=1 then ‘home’ else ‘lander-1’ end as traffic,
count(distinct case when product_made_it = 1 then website_session_id else null end)/count(distinct website_session_id) as lander_click_rate,
………………..
count(distinct case when thankyou_made_it = 1 then website_session_id else null end) /count(distinct case when billing_made_it=1 then website_session_id else null end) as billing_click_rate
from pageflag
group by 1;

From the result, we can clearly see each step performance in payment actions. We could realize where is our pain point to lose most of customers.

For example, the products click rate is quite good but the mrfuzzy click rate is not good at all. Note, n step click rate equals to the n+1 sessions id count divide n step count sessions id. We could dive into this point and try to improve cart page such as increase the user interface to make it more easier to continue their ordering to improve the conversion rate. Also, we have to improve our billing page, since many people quit their orders during that steps which is really a pity for our business!

Billing Test Impact Analysis

As we mentioned above, there might be some problems in our billing pages which is reducing the whole revenue. This company also made an A/B experiment for different billing page version. All we have to analyze is quite similar with landing page A/B testing analysis and estimate the impact of revenue generated from the test to finish a page version evaluation.

create temporary table billing
select website_pageviews.website_session_id,
website_pageviews.pageview_url as billing_version_see,
orders.order_id,
orders.price_usd
from website_pageviews
left join orders
on orders.website_session_id=website_pageviews.website_session_id
where website_pageviews.created_at between ‘2012–09–10’ and ‘2012–11–10’
and website_pageviews.pageview_url in (‘/billing’,’/billing-2');

The table shows billing page version and order information for each session as below. Then we just extract the following data to get the result.

count(distinct website_session_id) as sessions,
sum(price_usd)/COUNT(distinct website_session_id) as revenue_per_billing_page_seen

From the result, we see the billing version 2 produces 31.34 dollars per billing page which is more than 8 dollar than that original page. We could say there is a major lift coming from this change.

Last thing we could do is looking how many sessions we’ve had where somebody hit the billing page in the past month

select count(website_pageviews.website_session_id) as pastmonth
from website_pageviews
where website_pageviews.created_at between ‘2012–10–27’ and ‘2012–11–27’
and website_pageviews.pageview_url in (‘/billing’,’/billing-2');

We see 1193 sessions hit that page, by simple math, we know for each billing page session is now worth more than 8 dollars more to us, so 1193*8 which more than 10 thousand dollars per month. This is a really good improvement for company to sell more product and make more money.

Summary

By analyzing traffic source, we could shift budget towards the engines and campaigns driving the strongest conversion rate, we could compare user behavior patterns across the traffic sources to inform creative strategy and identify opportunities to eliminate wasted spend.

By analyzing website performance, we could help company to optimize the marketing budget and identify the most viewed version of pages to improve our business income. Also, analyzing A/B test result will help company to identify the top opportunities and help them improve bounce rates or conversion rates. Funnel conversion analysis tells us where is the pain point for the online business, it will help company to figure out the problems and improve the online process to gain more revenue.

However, this data still has many aspects to dig with, please look forward my new part of analysis report in the future!

My SQL query is here.

About Me

Contact info: xchen378@asu.edu

The Startup

Get smarter at building your thing. Join The Startup’s +800K followers.

Sydney Chen

Written by

Machine Learning Learner | Data Analyst | Data Science Interest | LinkedIn: linkedin.com/in/sydneychen-/ | Engineering Background | github.com/SydneyChen2

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +800K followers.

Sydney Chen

Written by

Machine Learning Learner | Data Analyst | Data Science Interest | LinkedIn: linkedin.com/in/sydneychen-/ | Engineering Background | github.com/SydneyChen2

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +800K followers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store