Google Analytics sequence segments with BigQuery

Edward Bunzl
Converteo

--

BigQuery is a great tool to bring Google Analytics analysis to the next level. GA360 users have the possibility to extract GA’s raw data daily to a BigQuery table. In GA4 (the new Google Analytics version), this feature will also be available to free users.

The raw data extract lets you manipulate easily all the dimensions, metrics and concepts you would find in Google Analytics. There are a few exceptions though like custom channel groupings and segments.

We will focus on this article on how to recreate sequence segments in BigQuery. This article requires a first understanding of querying Google Analytics data from BigQuery. Some useful links are provided at the end of the article to help you getting started.

Use case presentation

Our objective is to recreate a sequence segment that checks the number of sessions that visit two pages:

  • The login page, identified by its page name “/login_page”
  • A dashboard page, for which the page name contains the pattern “dashboard”

Condition only segments

First of all, let’s see how to retrieve users or sessions in BigQuery:

  • For users we have the fullVisitorId field that matches GA
  • For sessions a concatenation between fullVisitorId and visitId is necessary

From now on we will write session-based queries but you can use them with fullVisitorId if you want to obtain users.

Our next step is to check for each session if it matches our segment’s condition. Since this check is done on all pageview hits we need

  • to extract all hits via the UNNEST(hits) command
  • to use an aggregation function. The one you can use is LOGICAL_OR, if you are more confident with manipulate 0/1 than true/false you could use an equivalent with MAX(IF()) syntax that I added as a comment in the SQL query

To obtain our final segment result i.e. a number of sessions, we will have to sum all rows. The aggregation function to be used differs depending on if you have used LOGICAL_OR() or MAX(IF()) previously:

If you are proficient in SQL you can also opt for shorter code version that checks hit by hit if the condition is meant and returns the sessionId, before counting the number of distinct sessionIds that are left:

Sequence segments

Now to the fun part.

We will cover here sequence segments that allow to count sessions that visit the login page and the dashboard page later during navigation, without the need of those two pages to be consecutive. This is often how sequence segments on pages are used (for consecutive pages there is the previous page path dimension in native reports that can be used instead).

The Google Analytics segment configuration is the following:

As you can see for the first page we are looking for an exact match and for the second one the page path only needs to contain “dashboard”, this is to bring some variety to our SQL syntax.

Time of condition sub-tables

Our first task is to isolate for each session a table with the times of completion of our first and second conditions.

To get an exact timestamp you can use the below formula. Note that visitStartTime is in seconds whereas hits.time is in milliseconds, hence the multiplication. Also, we cannot use hits.time alone since it only holds the information of the time elapsed from the beginning of the session.

visitStartTime * 1000 + hits.time

From there, we can work our way to getting the timestamps for each hit that satisfies our first condition:

Joining event table and comparing timestamps

We can now create a similar second table and join both on SessionId and if the timestamp of our first event is before the second.

Please note that we cannot use USING in our JOIN as it can only be used to match equality conditions.

And the result checks out with Google Analytics!

Closing notes

Writing this article I stumbled on a few limits and tricks that are good to keep in mind

  • When checking GA data with my segment, native report could present an incorrect number of users due to sampling
  • Do not forget to use totals.visits = 1 when checking sessions in BigQuery as explained in this article

Some useful resources for using BigQuery with GA data

--

--

Edward Bunzl
Converteo

I’m a Digital & Data Manager & Converteo, working on lean tracking solutions and tailor made data-engineering and reporting implementations.