SQL Notebooks — The EASIEST Way to Build and Visualise a Conversion Funnel
Byzer SQL Notebooks: Combining the power of Jupyter and SQL editors for data analytics
What is a Conversion Funnel?
A conversion funnel, also referred to as a site funnel, is the path to purchase in an eCommerce store or site. In some ways, it can be compared to the traditional marketing funnel, but different than a traditional funnel, most of the steps are occurring on your site.
- Conversion Funnel: How to Build, Analyze & Optimize, Dan LeBlanc
In this tutorial, we will run a funnel analysis in Byzer. By the end of this tutorial, you’ll see how seamlessly Byzer Notebook integrates with SQL and Python. Anyone with basic SQL and Python knowledge can complete this tutorial with ease.
You can Do More with Less in Byzer !
Without further ado, let’s get started.
Use Case
We will use this e-commerce website dataset throughout this tutorial. Please download the files to your laptop beforehand.
The site is very simple and has just 4 pages:
The first page is the home page. When you come to the site for the first time, you can only land on the home page as a first page.
From the home page, the user can perform a search and land on the search page.
From the search page, if the user clicks on a product, she will get to the payment page, where she is asked to provide payment information in order to buy that product.
If she does decide to buy, she ends up on the confirmation page.
The company CEO isn’t very happy with the volume of sales and, especially, of sales coming from new users. Therefore, she asked you to investigate whether there is something wrong in the conversion funnel or, in general, if you could suggest how conversion rate can be improved.
Step 1: Install Byzer on your laptop
Please follow this guide to complete the installation.
Step 2: Use Java to find JAVA_HOME
-- For Linux and macOS, we'll use grep:
java -XshowSettings:properties -version 2>&1 > /dev/null | grep ‘java.home’
Step 3: Update .mlsql.config
-- add this line to .mlsql.config file and save it
-- JDK 8 is recommended
java.home= <Your JAVA_HOME PLACEHOLDER>
Step 4: Start Ray runtime
conda activate dev
ray start --head
Step 5: Import sample dataset
Create a new folder called sample_data/funnel_analysis, and move the downloaded files to this folder.
Step 6: Load CSV files into Byzer Engine
Let’s create a micro called loadCSV.
Macro are pieces of code that can be reused multiple times — they are analogous to “functions” in other programming languages, and are extremely useful if you find yourself repeating code across multiple places.
In this case, there are five files to be loaded. To make our life easier, we’ll create a micro called loadCSV and then call this micro to load all files separately.
--create a micro called loadCSV
set loadCSV = '''
load csv.`./sample_data/funnel_analysis/{0}.csv`
where header="true"
as {1}
''';--load each CSV file into Byzer engine as a table object
!loadCSV home_page_table home_page;
!loadCSV payment_confirmation_table payment_confirmation;
!loadCSV payment_page_table payment_page;
!loadCSV search_page_table search_page;
!loadCSV user_table user;
Now is the time to introduce Byzer’s core concept — Everything is a table.
Once the data is loaded into Byzer engine, you can see them as a flat table and access and manipulate data by thinking in SQL.
Step 7: SQL Time
-- Step 1: join event data with user dataselect user.*,case when home_page.page = 'home_page' then 1 else 0 end as step1,
case when search_page.page = 'search_page' then 1 else 0 end as step2,
case when payment_page.page = 'payment_page' then 1 else 0 end as step3,
case when payment_confirmation.page = 'payment_confirmation_page' then 1 else 0 end as step4from user left join home_page
on user.user_id = home_page.user_idleft join search_page
on user.user_id = search_page.user_idleft join payment_page
on user.user_id = payment_page.user_idleft join payment_confirmation
on user.user_id = payment_confirmation.user_idas user_events;-- Step 2: Calculate the user count for each stage of the funnelselect
"step" as step,
sum(step1) as step1_visitors,
sum(step2) as step2_product_search,
sum(step3) as step3_willing_to_pay,
sum(step4) as step4_pay
from user_eventsas user_events_summary;
Fun Time — Calculate drop-off at each step
You’ll likely want to know where the drop-off between steps is the worst. To make it easier to calculate drop-off, you need to transpose columns to rows with Spark SQL stack()
function. This function really comes in handy since you don't need write tons of union queries!!! Then you can use the lag()
function to compare the output of each step against the one prior and the one after:
Here is the query:
-- Step 3: Transpose columns to rows with Spark Stack function
select
stack(4,"step1_visitors",`step1_visitors`,
"step2_product_search",`step2_product_search`,
"step3_willing_to_pay",`step3_willing_to_pay`,
"step4_pay",`step4_pay`) as (`step`,`unique_user_count`),lag(unique_user_count,1) over (order by unique_user_count desc) as previous_step,
lag(unique_user_count,-1,step4_pay) over (order by unique_user_count desc) as next_stepfrom user_events_summary
as pre_funnel;-- Step 4: Calculate drop-off at each stepselect *,
unique_user_count - next_step as drop_off_count,
round((unique_user_count - previous_step)/previous_step,2) as drop_offfrom pre_funnel
as funnel;
This query suggests we should first focus on the drop-off from Homepage Visit to Product View. The drop-off is substantial and it’s near the top of our funnel. Any improvement at the top of the funnel positively impacts all down-funnel metrics.
Step 8: Funnel visualization with Python in Byzer
The plotly
Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases. This package contains everything you need to write figures to standalone HTML files.
Let’s check how to draw a funnel chart with Plotly.
First things first, let’s open up a terminal and install plotly
using pip
:
$ pip install plotly==5.6.0
Here is the Python code:
#%python
#%schema=st(field(content,string),field(mime,string))
#%env=source activate dev
#%runIn=driver
#%input=funnelfrom pyjava.api.mlsql import RayContext,PythonContext
import plotly.express as px
import pandas as pdcontext:PythonContext = context
ray_context = RayContext.connect(globals(),None)
df = ray_context.to_pandas()stages = ["Homepage Visit", "Product View", "Enter Checkout", "Purchase"]
df_nextstep = pd.DataFrame(dict(number=list(df['next_step']), stage=stages))
df_nextstep['User Count'] = 'Potential Buyer Count'df_dropoff = pd.DataFrame(dict(number=list(df['drop_off_count']), stage=stages))
df_dropoff['User Count'] = 'Drop Off User Count'df = pd.concat([df_nextstep, df_dropoff], axis=0)
fig = px.funnel(df, x='number', y='stage', color='User Count')
fig.show()
You can copy & paste the Python code into a new code cell and hit cell run:
Voilà, FUNnel!!!
Notes:
#%python
=> indicate this cell is a Python code cell#%schema=st(field(content,string),field(mime,string))
=> specify the schema of a Python script's output is a table
=> for files, use #%schema=file#%env=source activate dev
=> specify the Python virtual environment #%runIn=driver
=> specify where the code will be excuted, driver or executor#%input=funnel
=> specify table name for input data
What’s next
More interesting use cases coming up! Stay tuned!
Leaving feedback:
Please leave a comment here or join Slack to ask questions, get help, or discuss all things Byzer!
Last but not least, please share Byzer with data enthusiasts around you if you like this open-source project!
RELATED ARTICLE
BI Dashboards are Creating a Technical Debt Black Hole
https://www.baeldung.com/find-java-home
Thanks for reading!
Please share, subscribe to my email list, or follow me on Medium for upcoming blogs.