Using Javascript to Create an Acquisition Funnel Dataset
I work as a data analyst on the growth team at Echo, where a key goal is to improve signup conversion rates. We use acquisition funnel datasets to get visibility into where users are dropping off in the sign up process. To manually write all of the SQL queries needed for this would be time consuming and difficult to manage. Using a combination of SQL and JavaScript makes this process much easier.
I use Dataform to do this, which provides a place to write and save the queries, and runs them on schedules which publish up to date data every hour to a table in our warehouse (BigQuery). This table can then be queried for analyses and to create visualisations.
This post aims to explain the steps to create this dataset.
Step 1: Define the Funnel Steps
Firstly, we need to set up a data structure that defines each stage of the funnel.
In a file called funneldata.js
, a list is created that will contain one entry for each funnel. Each funnel has a name (e.g. signup_funnel), and a list of steps (e.g. landing, start_signup, verify_email). Each step has a name, and a list of tracks that indicate the user has reached that stage of the funnel. In our case, we use specific frontend tracks that we get from Segment. Some steps have multiple tracks because there are multiple ways a user could reach that step - for example on the landing step, users can either reach it through landing on the homepage, on web, or installing the app on mobile.
const funnels = [
{
name: "signup_funnel",
steps: [
{
name: "landing",
events: ["application_installed", "home_page"]
},
{
name: "start_signup",
events: ["signup_started"]
},
{
name: "verify_email",
events: ["email_verified"]
},
{
name: "signup_complete",
events: ["signup_complete"]
}
]
}
];module.exports = funnels;
Step 2: Create One Table per Step
Next, we create another file called funnel_steps.js
. This creates one table per funnel step, which has one row per user, and records when they first reached that step of the funnel.
- First, we reference the
funneldata.j
s file that we created in the previous step.
const funnels = require("./funneldata.js");
2. Then we use the ForEach
JavaScript function to loop through each funnel (we only have one for now, but we may add more later) and then loop through each step in the funnel.
funnels.forEach(funnel => {
funnel.steps.forEach((step, i) => {
3. For each step, we use the publish and config functions (Dataform-specific functions) to create a table in our data warehouse with a unique name (the funnel name & step name combined).
funnels.forEach(funnel => {
funnel.steps.forEach((step, i) => {
4. We add a variable to the SQL query which means for every step after the first, the where clause will have an additional filter on the user id being in the previous step. This ensures that the funnel is sequential, i.e. users have to have hit the first step to be recorded in the second step, and so on. The ${ctx.ref( )}
syntax is another Dataform-specific feature, which ensures that the previous step’s table is published before the current table, to ensure data is up to date.
let where = "";
if (i > 0) {
let previousStep = `${funnel.name}_${funnel.steps[i - 1].name}`;
where = `and user_id in (SELECT user_id from ${ctx.ref(previousStep)})`;
}
5. Lastly we write the SQL query, plugging in the variables. The bulk of the query is hard-coded as it’s the same for each funnel step. The variables are the parts that can change for each step — the step name, step number, the frontend tracks for the step that we defined in funneldata.js, and the optional additional filter using the where clause.
return `
SELECT * EXCEPT(row_number) FROM (
SELECT
user_id,
"${step.name}" as step_name,
${i} as step_number,
timestamp as step_started_at,
row_number() OVER (PARTITION BY user_id ORDER BY timestamp asc) as row_number
FROM ${ctx.ref("segment")}
where lower(event) in ("${step.events.join('", "')}")
${where}
GROUP BY 1,2,3,4
)
where row_number = 1
`
}
);
});
}
This is the SQL that the JavaScript generates for the first step:
SELECT * EXCEPT(row_number) FROM (
SELECT
user_id,
"landing" as step_name,
0 as step_number,
timestamp as step_started_at,
row_number() OVER (PARTITION BY user_id ORDER BY timestamp asc) as row_number
FROM `api.segment`
WHERE lower(event) in ("application_installed", "home_page")
GROUP BY 1,2,3,4
)
where row_number = 1
This is an example of the SQL generated by the JavaScript for the subsequent steps:
SELECT * EXCEPT(row_number) FROM (
SELECT
user_id,
"start_signup" as step_name,
1 as step_number,
timestamp as step_started_at,
row_number() OVER (PARTITION BY user_id ORDER BY timestamp asc) as row_number
FROM `api.segment`
where lower(event) in ("signup_started")
and user_id in (SELECT user_id from `funnel_data.signup_funnel_landing`)
GROUP BY 1,2,3,4
)
where row_number = 1
Step 3: Combine the Data Into One Funnel Table
Finally, we combine the data from each of the step tables into one table that summarises the funnel — this is the one we query for analyses. We do this in a file called funnel_combine.js
.
Again, we start by referencing the funneldata.js
file. Then we use the ForEach
function to loop through each funnel (we only have one for now). A select *
statement for each step is created, and they're joined with union all
. The publish
and config
functions are again used to create a table in the warehouse.
const funnels = require('./funneldata.js');funnels.forEach(funnel => {
publish(`${funnel.name}`, {
type: "table",
schema: "funnel_test"
})
.query(ctx =>
funnel.steps
.map(step => `select * from ${ctx.ref(`${funnel.name}_${step.name}`)}`)
.join(`\n union all \n`)
);
})
This is the SQL that the Javascript generates:
select * from `funnels.signup_funnel_landing`
union all
select * from `funnels.signup_funnel_start_signup`
union all
select * from `funnels.signup_funnel_verify_email`
union all
select * from `funnels.signup_funnel_signup_complete`
Running the Javascript in the final file publishes a table that looks like this to the data warehouse:
This dataset can be used to quickly get insights on activity during sign up, such as what % of users that start the funnel get to each step, and how long users spend on each step. Future funnels can be created easily too — all that needs to be done is to define the steps in the funneldata.js
file, and to run the funnel_combine.js
script. Anyone that wants to create a funnel dataset can do so now - no prior SQL experience is needed!
Dataform makes it super easy to use JavaScript to define simple scripts, constants or macros that can be reused across your project, saving you time writing out long SQL queries. Sign up for a free Dataform account here.
Originally published at https://dataform.co.