SQL Notebooks — The EASIEST Way to Build and Visualise a Conversion Funnel

Byzer SQL Notebooks: Combining the power of Jupyter and SQL editors for data analytics

Lori Lu
6 min readMar 23, 2022

What is a Conversion Funnel?

by DAN LEBLANC

A conversion funnel, also referred to as a site funnel, is the path to purchase in an eCommerce store or site. In some ways, it can be compared to the traditional marketing funnel, but different than a traditional funnel, most of the steps are occurring on your site.

- Conversion Funnel: How to Build, Analyze & Optimize, Dan LeBlanc

In this tutorial, we will run a funnel analysis in Byzer. By the end of this tutorial, you’ll see how seamlessly Byzer Notebook integrates with SQL and Python. Anyone with basic SQL and Python knowledge can complete this tutorial with ease.

A simple analogy of Byzer

You can Do More with Less in Byzer !

Without further ado, let’s get started.

Use Case

We will use this e-commerce website dataset throughout this tutorial. Please download the files to your laptop beforehand.

The site is very simple and has just 4 pages:

The first page is the home page. When you come to the site for the first time, you can only land on the home page as a first page.

From the home page, the user can perform a search and land on the search page.

From the search page, if the user clicks on a product, she will get to the payment page, where she is asked to provide payment information in order to buy that product.

If she does decide to buy, she ends up on the confirmation page.

The company CEO isn’t very happy with the volume of sales and, especially, of sales coming from new users. Therefore, she asked you to investigate whether there is something wrong in the conversion funnel or, in general, if you could suggest how conversion rate can be improved.

Step 1: Install Byzer on your laptop

Please follow this guide to complete the installation.

Step 2: Use Java to find JAVA_HOME

-- For Linux and macOS, we'll use grep:
java -XshowSettings:properties -version 2>&1 > /dev/null | grep ‘java.home’

Step 3: Update .mlsql.config

-- add this line to .mlsql.config file and save it 
-- JDK 8 is recommended
java.home= <Your JAVA_HOME PLACEHOLDER>

Step 4: Start Ray runtime

conda activate dev
ray start --head

Step 5: Import sample dataset

Create a new folder called sample_data/funnel_analysis, and move the downloaded files to this folder.

Step 6: Load CSV files into Byzer Engine

Let’s create a micro called loadCSV.

Macro are pieces of code that can be reused multiple times — they are analogous to “functions” in other programming languages, and are extremely useful if you find yourself repeating code across multiple places.

In this case, there are five files to be loaded. To make our life easier, we’ll create a micro called loadCSV and then call this micro to load all files separately.

--create a micro called loadCSV 
set loadCSV = '''
load csv.`./sample_data/funnel_analysis/{0}.csv`
where header="true"
as {1}
''';
--load each CSV file into Byzer engine as a table object
!loadCSV home_page_table home_page;
!loadCSV payment_confirmation_table payment_confirmation;
!loadCSV payment_page_table payment_page;
!loadCSV search_page_table search_page;
!loadCSV user_table user;

Now is the time to introduce Byzer’s core concept — Everything is a table.

Once the data is loaded into Byzer engine, you can see them as a flat table and access and manipulate data by thinking in SQL.

Byzer — Everything is a table

Step 7: SQL Time

-- Step 1: join event data with user dataselect user.*,case when home_page.page = 'home_page' then 1 else 0 end as step1,
case when search_page.page = 'search_page' then 1 else 0 end as step2,
case when payment_page.page = 'payment_page' then 1 else 0 end as step3,
case when payment_confirmation.page = 'payment_confirmation_page' then 1 else 0 end as step4
from user left join home_page
on user.user_id = home_page.user_id
left join search_page
on user.user_id = search_page.user_id
left join payment_page
on user.user_id = payment_page.user_id
left join payment_confirmation
on user.user_id = payment_confirmation.user_id
as user_events;-- Step 2: Calculate the user count for each stage of the funnelselect
"step" as step,
sum(step1) as step1_visitors,
sum(step2) as step2_product_search,
sum(step3) as step3_willing_to_pay,
sum(step4) as step4_pay
from user_events
as user_events_summary;
user_events_summary table

Fun Time — Calculate drop-off at each step

You’ll likely want to know where the drop-off between steps is the worst. To make it easier to calculate drop-off, you need to transpose columns to rows with Spark SQL stack()function. This function really comes in handy since you don't need write tons of union queries!!! Then you can use the lag() function to compare the output of each step against the one prior and the one after:

Transpose columns to rows with Spark SQL function

Here is the query:

-- Step 3: Transpose columns to rows with Spark Stack function
select
stack(4,"step1_visitors",`step1_visitors`,
"step2_product_search",`step2_product_search`,
"step3_willing_to_pay",`step3_willing_to_pay`,
"step4_pay",`step4_pay`) as (`step`,`unique_user_count`),
lag(unique_user_count,1) over (order by unique_user_count desc) as previous_step,
lag(unique_user_count,-1,step4_pay) over (order by unique_user_count desc) as next_step
from user_events_summary
as pre_funnel;
-- Step 4: Calculate drop-off at each stepselect *,
unique_user_count - next_step as drop_off_count,
round((unique_user_count - previous_step)/previous_step,2) as drop_off
from pre_funnel
as funnel;
the final output

This query suggests we should first focus on the drop-off from Homepage Visit to Product View. The drop-off is substantial and it’s near the top of our funnel. Any improvement at the top of the funnel positively impacts all down-funnel metrics.

Step 8: Funnel visualization with Python in Byzer

The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases. This package contains everything you need to write figures to standalone HTML files.

Let’s check how to draw a funnel chart with Plotly.

First things first, let’s open up a terminal and install plotly using pip:

$ pip install plotly==5.6.0

Here is the Python code:

#%python
#%schema=st(field(content,string),field(mime,string))
#%env=source activate dev
#%runIn=driver
#%input=funnel
from pyjava.api.mlsql import RayContext,PythonContext
import plotly.express as px
import pandas as pd
context:PythonContext = context
ray_context = RayContext.connect(globals(),None)
df = ray_context.to_pandas()
stages = ["Homepage Visit", "Product View", "Enter Checkout", "Purchase"]
df_nextstep = pd.DataFrame(dict(number=list(df['next_step']), stage=stages))
df_nextstep['User Count'] = 'Potential Buyer Count'
df_dropoff = pd.DataFrame(dict(number=list(df['drop_off_count']), stage=stages))
df_dropoff['User Count'] = 'Drop Off User Count'
df = pd.concat([df_nextstep, df_dropoff], axis=0)
fig = px.funnel(df, x='number', y='stage', color='User Count')
fig.show()

You can copy & paste the Python code into a new code cell and hit cell run:

Voilà, FUNnel!!!

Notes:

#%python 
=> indicate this cell is a Python code cell
#%schema=st(field(content,string),field(mime,string))
=> specify the schema of a Python script's output is a table
=> for files, use #%schema=file
#%env=source activate dev
=> specify the Python virtual environment
#%runIn=driver
=> specify where the code will be excuted, driver or executor
#%input=funnel
=> specify table name for input data

What’s next

More interesting use cases coming up! Stay tuned!

Leaving feedback:

Please leave a comment here or join Slack to ask questions, get help, or discuss all things Byzer!

Last but not least, please share Byzer with data enthusiasts around you if you like this open-source project!

RELATED ARTICLE

Byzer 101 — PART 1

BI Dashboards are Creating a Technical Debt Black Hole

https://www.baeldung.com/find-java-home

Thanks for reading!

Please share, subscribe to my email list, or follow me on Medium for upcoming blogs.

--

--