Google Analytics sequence segments with BigQuery
BigQuery is a great tool to bring Google Analytics analysis to the next level. GA360 users have the possibility to extract GA’s raw data daily to a BigQuery table. In GA4 (the new Google Analytics version), this feature will also be available to free users.
The raw data extract lets you manipulate easily all the dimensions, metrics and concepts you would find in Google Analytics. There are a few exceptions though like custom channel groupings and segments.
We will focus on this article on how to recreate sequence segments in BigQuery. This article requires a first understanding of querying Google Analytics data from BigQuery. Some useful links are provided at the end of the article to help you getting started.
Use case presentation
Our objective is to recreate a sequence segment that checks the number of sessions that visit two pages:
- The login page, identified by its page name “/login_page”
- A dashboard page, for which the page name contains the pattern “dashboard”
Condition only segments
First of all, let’s see how to retrieve users or sessions in BigQuery:
- For users we have the fullVisitorId field that matches GA
- For sessions a concatenation between fullVisitorId and visitId is necessary
From now on we will write session-based queries but you can use them with fullVisitorId if you want to obtain users.
Our next step is to check for each session if it matches our segment’s condition. Since this check is done on all pageview hits we need
- to extract all hits via the UNNEST(hits) command
- to use an aggregation function. The one you can use is LOGICAL_OR, if you are more confident with manipulate 0/1 than true/false you could use an equivalent with MAX(IF()) syntax that I added as a comment in the SQL query
To obtain our final segment result i.e. a number of sessions, we will have to sum all rows. The aggregation function to be used differs depending on if you have used LOGICAL_OR() or MAX(IF()) previously:
If you are proficient in SQL you can also opt for shorter code version that checks hit by hit if the condition is meant and returns the sessionId, before counting the number of distinct sessionIds that are left:
Sequence segments
Now to the fun part.
We will cover here sequence segments that allow to count sessions that visit the login page and the dashboard page later during navigation, without the need of those two pages to be consecutive. This is often how sequence segments on pages are used (for consecutive pages there is the previous page path dimension in native reports that can be used instead).
The Google Analytics segment configuration is the following:
As you can see for the first page we are looking for an exact match and for the second one the page path only needs to contain “dashboard”, this is to bring some variety to our SQL syntax.
Time of condition sub-tables
Our first task is to isolate for each session a table with the times of completion of our first and second conditions.
To get an exact timestamp you can use the below formula. Note that visitStartTime is in seconds whereas hits.time is in milliseconds, hence the multiplication. Also, we cannot use hits.time alone since it only holds the information of the time elapsed from the beginning of the session.
visitStartTime * 1000 + hits.time
From there, we can work our way to getting the timestamps for each hit that satisfies our first condition:
Joining event table and comparing timestamps
We can now create a similar second table and join both on SessionId and if the timestamp of our first event is before the second.
Please note that we cannot use USING in our JOIN as it can only be used to match equality conditions.
And the result checks out with Google Analytics!
Closing notes
Writing this article I stumbled on a few limits and tricks that are good to keep in mind
- When checking GA data with my segment, native report could present an incorrect number of users due to sampling
- Do not forget to use totals.visits = 1 when checking sessions in BigQuery as explained in this article
Some useful resources for using BigQuery with GA data
- The following website lists all queries to obtain usual dimensions and metrics
- The official documentation is really well written. I recommend the aggregate function page and any page that lists functions you can use with a particular data type (example with strings)
- You should also give a look to the Google Analytics export schema