eCommerce SQL Project Part-1. Web Traffic Analysis

Minyen Hsieh
Minyen Hsieh
Published in
10 min readAug 3, 2021

Traffic Source / Bounce Rates / AB Testing / Conversion Funnels

Introduction

In this SQL project, I “acted” as an eCommerce Database Analyst for Maven Fuzzy Factory, an online retailer which had just launched its first product. Along with the CEO, Marketing Manager and Website Manager in this startup team, I put myself in the shoes of an analyst and helped steer the decisions of a business by analyzing marketing channels, measuring website performance, and exploring the product portfolio.

Overview of the Maven Fuzzy Factory Database

EER Diagram, created in MySQL Workbench

Above are the 6 most critical and fundamental tables that I was working with. They are stripped down and simplified a little but still host the key data (i.e. website activity, products, orders and refunds) that an e-commerce database analyst would typically be working with on a daily basis.

Before jumping into the project, I would like to briefly cover some web/digital marketing buzzwords:

UTM (Urchin Tracking Module) Tracking Parameters

When businesses run paid marketing campaigns, they often obsess over performance and measure everything - how much they spend, how well traffic converts to sales, etc. Paid traffic is commonly tagged with tracking (UTM) parameters, which are appended to URLs. It allows us to tie website activity back to specific traffic sources and campaigns.

In the above URL, “?” tells the browser that everything coming next is NOT going to impact where the browser should look to find the page. It is simply on there for tracking purposes. The parameter value pairs (yellow part) are separated by ampersands(&).

In today’s database,

  • UTM sources: include gsearch(Google) and bsearch(Bing).
  • UTM campaigns: include nonbrand and brand. The nonbrand group are people looking for a product category i.e. “Teddy Bears”, “ Buy Toys Online”, etc. The brand group are people searching for your company specifically by name i.e. “Maven”, “Fuzzy Factory”.
  • UTM contents: include g_ad_1, g_ad_2, b_ad_1, b_ad_2. A lot of places will use utm content to store the name of a specific ad unit that they’re running.

Project Background

The SITUATION: Maven Fuzzy Factory has been live for ~8months, and the CEO is due to present company performance metrics to the board next week. I as an analyst, will be the one tasked with preparing relevant metrics to show the company’s promising growth.

The OBJECTIVE: Use SQL to extract and analyze website traffic and performance data from the Maven Fuzzy Factory database to quantify the company’s growth, and to effectively communicate the story of how we have been able to generate that growth.

8 Analyses Requests From the CEO

1.Gsearch seems to be the biggest driver of our business. Could you pull monthly trends for gsearch sessions and orders so that we can showcase the growth there?

[Results/Insights]: Clearly, the session volume is growing over time. In the first month, there were only 1860 sessions, now there are 8895.

Orders are advancing substantially as well. If we look at the order volume from March or the first full month of April at 92(orders), then now we are already at about 4 times that order volume.

The conversion rate started off from roughly 2.6% over the first 3 months. By the final month, we are up over 4%. That is a pleasant improvement for the business.

2. Next, it would be great to see a similar monthly trend for Gsearch, but this time splitting out nonbrand and brand campaigns separately. I am wondering if brand is picking up at all? If so, this is a good story to tell.

[Results/Insights]: Brand campaigns represent someone going into search engines, explicitly looking for our business, and hit the our Ad (- yes, companies bid on their own terms, largely to box out competitors). The board wants to know whether the company will always have to rely on paid traffic or if the brand is building over time. This is one of those ways that will show whether or not the company is getting brand traction. Here, the result showing that brand sessions have increased dramatically is a good sign. The investors are going to like it.

3. While we are on Gsearch, could you dive into nonbrand, and pull monthly sessions and orders split by device type? I want to flex our analytical muscles a little and show the board we really know our traffic sources.

[Results/Insights]: what we see here is a lot more desktop sessions than mobile. From the beginning, it was a little less than a 2:1 ratio (desktop:mobile), but at the end of this time period, we’ve got more than a 3:1 ratio. If we look at the orders, it’s even more drastic. We had a 5:1 ratio at the beginning of the period, and by the end, we’re almost looking at a 10:1 ratio.

4. I’m worried that one of our more pessimistic board members may be concerned about the large % of traffic from Gsearch. Can you pull monthly trends for Gsearch, alongside monthly trends for each of our other channels.

  • Here, we have UTM source values of Gsearch(Google) and Bsearch(Bing). Within those, we have the UTM campaign values of nonbrand and brand. These are our paid channels, but we would also like to dig into other channels as well i.e direct type-in and organic search.
  • Direct type-in traffic: When UTM source and UTM campaign(paid parameters) are NULL. At the same time, the http referrer is also null, meaning no referring domain - all 3 are NULL. then we know it’s a direct type-in traffic.
  • Organic search traffic: When paid parameters(utm_source, utm_campaign) are NULL, but we have a search engine as the refer(http_referer), then we know it is organic search traffic. It’s coming from the search engine, but it’s not tagged with our paid parameters. That’s how we’ll identify that as organic search traffic.

[Results/Insights]: In particular, the board and CEO are going to be very excited about organic search and direct type in sessions building up because these represent sessions that the company is not paying for. In contrast, with the gsearch/bsearch paid sessions, there is a cost of customer acquisition for any orders that come in. That marketing spend eats into the margin. However, there’s no additional variable cost for direct type-in and organic search traffic.

5. I’d like to tell the story of our website performance improvements over the course of the first 8 months. Could you pull session to order conversion rates, by month?

[Results/Insights]: The conversion rate started from about 2.6%, 3.5%, and gradually climbing up into the 4%.

6. For the gsearch lander test, please estimate the revenue that test earned us.

* the website manager ran a new custom landing page (/lander-1) in a 50/50 A/B testing against the homepage(/home) for the gsearch nonbrand traffic from Jun 19 - July 28.

In other words: Compare the new lander with the previous lander, which one makes us more money? How much more? Quantify this in terms of monthly revenue.

[Results/Insights]: The result table shows the CVR for both pages. lander-1 is fairly better — not quite a whole percentage point better, but approximately 0.0088% additional orders per session. After we get the lift, we need to find the most recent pageview for gsearch nonbrand where the traffic was sent to ‘/home’.

In marketing, “lift” represents an increase in sales in response to some form of advertising or promotion. Monitoring, measuring, and optimizing lift may help a business grow more quickly.

“most_recent_gsearch_nonbrand_home_session” is 17145. This is the maximum session id where we had gsearch nonbrand traffic going to ‘/home/’ page. Since then, all of the traffic has been rerouted to ‘/lander-1’. Then next, we can try to see how many sessions we‘ve had since this traffic reroute.

[Results/Insights]: We had 22977 sessions since after rerouting all traffic to lander-1. 22972 (sessions) * 0.0088 (incremental conversion rate) = 202 (incremental orders) since the home page A/B test concluded.

If we talk about it in terms of orders per month, this is generating an extra 50 orders per month. Roughly 4 months (7/29 ~ 11/27), so roughly 50 extra orders per month.

Incremental Conversion: When we talk about “incremental conversion”, we are talking about how well a new version of something converts compared to the previous version. Conversion Rate B — Conversion Rate A = incremental conversion.

7. For the landing page test you analyzed previously, it would be great to show a full conversion funnel from each of the two pages to orders. You can use the same time period you analyzed last time (Jun 19 — Jul 28).

Below I first tried to find out which version of the pages people saw, and how far they made it in the conversion funnel.

Here, we’ve got the website session and every pageview url for that particular session id, and a flag(1/0) to say whether this was the home page, lander-1, etc. We will use this as a subquery below.

Above result displays: for each home page, did they see the home page or did they see the custom lander(‘/lander-1’)? And we have various made-it rate to different points in the funnel.

The final table reveals what percentage of people click through from the lander, products, … etc.

8. I’d love for you to quantify the impact of our billing page AB test as well. Please analyze the lift generated from the test (Sep 10 — Nov 10), in terms of revenue per billing page session, and then pull the number of billing page sessions for the past month(10/27–11/27) to understand monthly impact.

* the website manager ran a new custom billing page (/billing-2) in a 50/50 A/B test against the original billing page(/billing) from Jun 19 — July 28.

[Results/Insights]:

  • $22.8 revenue per billing page seen for the old version.
  • $31.34 for the new version.
  • There is a major lift in revenue coming from this change. Lift: $8.54 per billing page view - Every time a customer sees the billing page, we are now making $8.54 more than we were previously.

Lastly, we’re going to look at how many sessions we’ve had where somebody hit the billing page within the past month (10/27–11/27).

[Results/Insights]:

  • 1193 billing sessions within the past month
  • Lift: $8.54 per billing session
  • Value of this billing test: 1193 * $8.54 = 10188.22 over the past month.

Conclusion:

This wrapped up part 1 of the project. Part 2 will come in the next article. I am absolutely loving this project so far. It provided me with an opportunity to go through the life of this business as if I was really a part of it. I was able to see what happened and how the business evolved based on the analyses I conducted. Also, it allowed me to solidify my web knowledge and get a chance to practice my SQL skills in a simulated real-world situation.

--

--