How Do I Create a Closed Funnel in Google Analytics for Firebase Using BigQuery?

Todd Kerpelman
Firebase Developers
13 min readSep 4, 2018

--

Big Important Update: As of August 1st, 2019, you can now upgrade your Firebase project to the full Google Analytics experience which means that you can create closed funnels for your Firebase project in the GA console without ever having to do anything in BigQuery. Still, this post might come in handy if you want to do more advanced analysis, or are just trying to figure out to use BigQuery.

Lesson 1: Let’s build a two-step funnel!

So there are a lot of nice reports that you get out of the box using Google Analytics for Firebase. Everything from the Dashboard to the real-timey StreamView to their latest release, the… uh… Latest Release report.

But sometimes you need to dig deeper into the data than what you get from the Firebase Console. And for that, the folks at Firebase have always recommended linking your Analytics data to BigQuery, so you can slice-and-dice all the raw event data to your heart’s content. Of course, that’s all well and good, but going from “Clicking on a link to see pretty graphs” to “Writing a SQL query that will parse gigs of data” is a bit of a step up in difficulty.

So let’s see if we can learn a little bit of BigQuery querying by covering some common tasks you might want to perform, and we’ll start with one of the most common: Creating a Closed Funnel.

Open vs Closed Funnels

If you look at the funnel reports in that you get today from the Firebase console, they’re all open funnels, meaning that a user who triggers event B in the funnel will still be counted, even if they never triggered event A. So I tend to think of them less as funnels and more like a bunch of event counts that are conveniently displayed next to each other.

Now, in many cases, these are still quite useful. If you’ve got a “Tutorial started” and “Tutorial complete” event, it’s probably going to be impossible for a user to trigger tutorial_complete without triggering the tutorial_started event first. So your results will be the same whether or not you have an open or closed funnel, and creating an open funnel from the Firebase console is a great solution.

But in some cases, they can be a little less intuitive. Let’s take a look at one example from our public demo project, a game called Flood-it.

She might not look like much, but she’s got it where it counts, kid.

One day, our producer comes in and wants to know how many times users encounter a “no more extra steps” dialog (marked by the no_more_extra_steps event), and then decide to spend virtual currency (spend_virtual_currency).

Now if you went ahead and tried building this funnel in the Firebase console, you’d get results that look a little like this:

Clearly, this report is misleading. The issue is that we’re counting every time a user spent virtual currency, whether or not they did it after seeing the “No more extra steps” dialog. We need to only count events where users first encountered the dialog, and then spent money.

So let’s see if we can delve into BigQuery and figure this out.

For this example, I’m going to be running all of our queries against our public Analytics data set. You can go ahead and make these same queries yourself against this same data set, or run a modified version against your own app’s data if you want something more relevant to you.

Note: All of the queries in this blog post are run using Standard SQL, which is the default in the fancy new UI. If you’re still using the old UI, make sure you uncheck “Use legacy SQL” from the options.

Let’s start by taking a look at our user’s journey as they use this app. Go ahead and run this query, and browse the results.

SELECT event_name, user_pseudo_id, event_timestamp
FROM `firebase-public-project.analytics_153293282.events_20180720`
ORDER BY 2,3
LIMIT 1000

What you get is something like this:

user_pseudo_id is a unique ID created for any instance of an app running on a particular device. It remains the same for all gameplay sessions, but changes from app to app and from device to device. You can think of it like a user ID that’s been built in a privacy-respecting way that still lets you group events together that belong to the same user.

By browsing this data, ordered first by this user ID, then the time the event occurred, you can kind of get a sense of what our users are up to, and the events that they trigger as they play our game.

Note: As you’re browsing our sample data, you might notice that the order of events don’t always seem that logical. This is because the sample data is the “Sanitized for public consumption” version of the data. And that means we’ve deleted some values, scrambled others, and, in the case of our timestamps, added a few thousand milliseconds of “fuzz” to make our lawyers happy. So while I promise these code samples are correct, the results in the public sample data can sometimes be a little wonky.

This is the start of something helpful, but let’s focus a bit more on building our funnel. Run this query to look at just the two events we’re interested in.

SELECT event_name, user_pseudo_id , event_timestamp
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE (event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency")
ORDER BY 2,3

So this is a good start — we can still see the individual user journeys through this table, but now we’re focused on just these two events. In fact, it looks like our good friend user 2A50F2A460... hit a “no more extra steps” event, then spent some virtual currency shortly afterwards in rows 4 and 5.

But browsing through our table manually like this isn’t going to scale. How can we get BigQuery to do this for us?

Well, let’s take a look at my favorite function for building funnels, LEAD OVER. Give this query a try:

SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (ORDER BY user_pseudo_id, event_timestamp) AS next_event
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE (event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency")
ORDER BY 2,3

What LEAD OVER is doing is taking your current results and saying, “Okay, if I were to order these results by user_pseudo_id and event_timestamp, what would the value of event_name be if I looked at the row one space ahead of the current one?” Well, it turns out that order is the same order that I’m using in my main results, so we basically get to see the name of the event that occurs one row down.

As you can see here in the output, it’s basically telling you what event the user triggers next as they’re using your app.

This works nicely, except that it kinda fails when we switch users. User 15955C277.. in row 1, for instance, doesn’t trigger spend_virtual_currency followed by spend_virtual_currency. That second event belongs to a completely different user.

Luckily, LEAD OVER has a way to fix this with the PARTITION BY argument. Try out this query instead:

SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE (event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency")
ORDER BY 2,3

This says, “Okay, if I were to take these results and split them up into lots of tiny little tables, broken up by user_pseudo_id, then order those tables by their timestamp, what’s the value of event_name one row ahead?”

So you can see in the results that we now have a table that lists every no_more_extra_steps or spend_virtual_currency event the user encountered, along with the one immediately after that, as long as it belongs to the same user.

Well, if you think about it, this is basically all the information we need to build our funnel. If we consider step 1 of our funnel “Every no_more_extra_steps event”, and step 2 of our funnel “Every spend_virtual_currency event that was preceded by a no_more_extra_steps event”, we can essentially get that info in our table with a simple if clause. Give this query a try:

SELECT *,
IF (event_name = "no_more_extra_steps", 1, 0) AS funnel_1,
IF (event_name = "no_more_extra_steps" AND next_event = "spend_virtual_currency", 1, 0) AS funnel_2
FROM (
SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency"
ORDER BY 2,3
)

Notice that we’ve got a nested query going here. This is another technique I end up using a lot with BigQuery. It basically takes the results from one query and treats it just like another table that you can then query from. A lot of our official sample queries are nested queries, and while they sometimes look big and scary at first, once you break them down into their pieces, they’re actually pretty easy to understand.

Also, note that we don’t need the * wildcard as part of the SELECT statement in our outermost query. I’m just including that in my sample because it makes the output easier to understand. But in general, we should get into the habit of only grabbing the data we need to keep queries fast and pricing low. So in the next step, I’m going to remove it.

So voila! We now have a list of all of our funnel_1 and funnel_2 events in our app! Getting totals is as simple as summing them up…

SELECT sum(funnel_1) as funnel_1_total, sum(funnel_2) as funnel_2_total FROM (
SELECT
IF (event_name = "no_more_extra_steps", 1, 0) as funnel_1,
IF (event_name = "no_more_extra_steps" AND next_event = "spend_virtual_currency", 1, 0) AS funnel_2
FROM (
SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency"
ORDER BY 2,3
)
)

…or if you want one fewer nested query, you can replace the funnel_1 / funnel_2 definitions with a COUNTIF function that adds ’em up.

SELECT 
COUNTIF (event_name = "no_more_extra_steps") as funnel_1_total,
COUNTIF (event_name = "no_more_extra_steps" AND next_event = "spend_virtual_currency") AS funnel_2_total
FROM (
SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency"
ORDER BY 2,3
)

According to our results, out of the 25 times our users got the “no more extra steps” dialog, they spent virtual currency 22 times afterwards.

So this is a good way to calculate a two-step funnel, but there are three changes here that I might want to make, depending on the situation and the specific circumstances in my app.

Change #1: Let’s count users!

These results are counting total events, not total users. This is what our producer asked for, but we should be prepared to count total users as well in case they change their mind later.

To count total users, I think the easiest way would be to replace our funnel_1 and funnel_2 values with the user ID instead of some simple 1 and 0s…

SELECT *,
IF (event_name = "no_more_extra_steps", user_pseudo_id, NULL) as funnel_1,
IF (event_name = "no_more_extra_steps" AND next_event = "spend_virtual_currency", user_pseudo_id, NULL) AS funnel_2
FROM (
SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency"
ORDER BY 2,3
)

Then we can use COUNT(DISTINCT xxx) which will count the distinct non-null values from each of these two columns.

SELECT COUNT(DISTINCT funnel_1) AS f1_users, 
COUNT(DISTINCT funnel_2) AS f2_users
FROM (

SELECT
IF (event_name = "no_more_extra_steps", user_pseudo_id, NULL) as funnel_1,
IF (event_name = "no_more_extra_steps" AND next_event = "spend_virtual_currency", user_pseudo_id, NULL) AS funnel_2
FROM (
SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency"
ORDER BY 2,3
)
)

Change #2: Make our funnels time-sensitive

Our funnel doesn’t really take time into account. If our user runs out of extra steps, but 5 hours later in a completely different part of the game, decides to spend virtual coins, they’ll still be included in our funnel. Depending on the situation, it might be nice to only count events that are, say, less than a minute apart.

To add some time restrictions to our funnels, I’m going to do two things. First, let’s add a next_timestamp field the same way we added a next_event field:

SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event,
LEAD(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_timestamp
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency"
ORDER BY 2,3

Then, when determining whether an event qualified as a funnel_2 type of event, we can change our IF statement to look at not just the two events, but the difference in time between them. Timestamps are measured in microseconds, so one minute is 60 * 1000 * 1000.

SELECT SUM(funnel_1) as f1_total, SUM(funnel_2) as f2_total FROM (
SELECT
if (event_name = "no_more_extra_steps", 1, 0) AS funnel_1,
if (event_name = "no_more_extra_steps" AND next_event = "spend_virtual_currency" AND next_timestamp - event_timestamp < 60 * 1000 * 1000, 1, 0) AS funnel_2
FROM (
SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event,
LEAD(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_timestamp
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency"
ORDER BY 2,3
)
)

Notice that this decreased the second part of our funnel pretty significantly, but is probably more accurate in terms of finding out what our producer wanted to know.

And this can be easily modified to count users instead of events by using the same technique as above:

SELECT COUNT(DISTINCT funnel_1) as f1_users, COUNT(DISTINCT funnel_2) as f2_users FROM (
SELECT
if (event_name = "no_more_extra_steps", user_pseudo_id, NULL) AS funnel_1,
if (event_name = "no_more_extra_steps" AND next_event = "spend_virtual_currency" AND next_timestamp - event_timestamp < 60 * 1000 * 1000, user_pseudo_id, NULL) AS funnel_2
FROM (
SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event,
LEAD(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_timestamp
FROM `firebase-public-project.analytics_153293282.events_20180720`
WHERE event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency"
ORDER BY 2,3
)
)

Change #3: Let’s look at a 30-day window

In all of these queries, we’ve really only been looking at data from one day. In most situations, we’ll want to look at data across several days.

Luckily, searching over multiple days is super easy. We just replace our table date with a wildcard, and use the _TABLE_SUFFIX operator, which lets us (inclusively) select from multiple tables. Here’s a query that calculates our funnel for events across 30 days, while still keeping that one minute time limit:

SELECT SUM(funnel_1) as f1_total, SUM(funnel_2) as f2_total FROM (
SELECT
if (event_name = "no_more_extra_steps", 1, 0) AS funnel_1,
if (event_name = "no_more_extra_steps" AND next_event = "spend_virtual_currency" AND next_timestamp - event_timestamp < 60 * 1000 * 1000, 1, 0) AS funnel_2
FROM (
SELECT event_name, user_pseudo_id , event_timestamp,
LEAD(event_name, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_event,
LEAD(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS next_timestamp
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE (event_name = "no_more_extra_steps" OR event_name = "spend_virtual_currency")
AND _TABLE_SUFFIX BETWEEN '20180629' AND '20180728'
ORDER BY 2,3
)
)

Keep in mind that BigQuery charges you for the amount of data it processes during a query, so I generally prefer to develop and test my queries against a single day’s worth of data, then run it against a month of data once I’m ready to run my query “for real”. Also, if you’re serious about keeping your BigQuery costs low (and you probably should be), consider setting some custom quotas against your project.

So there ya go! That’s how I’ve been able to build a closed funnel in BigQuery, and I think it’s general-purpose enough that you could apply it to your own apps. But if there’s anything I’ve learned about BigQuery and SQL in general, it’s that there’s more than one way of doing things, and the way I’m doing it is probably inefficient. So if you think you have a better solution or just a different approach with different pros-and-cons, let me know! I’d love to hear about it.

Quick Edit: Interested building a multi-part funnel? Check out part 2 of this series!

--

--

Todd Kerpelman
Firebase Developers

Former Game Designer. Current Developer Advocate at Google. Dad. Often sleep deprived.