How to Blend UA and GA4 Data in looker studio
You might have tried to create a Data Visualization Report utilizing both UA and GA4 Data simultaneously, and realized it wouldn't be possible due to the differences in the exportation formats between UA and GA4 for example:
GA4:
UA:
Due to these differences merging the two datasets together based on the exported data will not be possible.
Looker Studio blend:
Looker Studio allows you to blend multiple data sources based on the same logic of SQL joins.
To set the stage for a clearer explanation lets set the theme of our task:
Say we are trying to get the yearly traffic data based on the default channels, but the data is divided between both UA and GA4 (as UA is deprecated as of July 2023), how can we get the data in one chart? you are tasked to retrieve the data from UA up to 2021 and to retrieve the data from GA4 in 2022 and 2023.
Important: This solution allows you to join the data from UA and GA4, but you have to check if there are huge data discrepancies between UA and GA4 on the same Timeline, if there are comparing the data in a continuous timeline wouldn't be logical. For example, if UA recorded 200k Traffic users in 2022 and GA4 recorded 50k Traffic users in 2022, it wouldn't be logical to merge the data from both sources as the data discrepancy on the same timeline is too large to make any analysis.
Step 1: Matching Fields
Create a matching field in each of UA and GA4 data sources (Resource →Manage Added Data Sources →Edit →Add a Field), format them both to have the same values and the same field id (they can have different names).
Formatting both fields to have the same values allows the chart to join the data from the different sources into one.
Formatting both fields to have the same field ID, allows you to filter both fields using the same data control in Looker Studio (we will get it more into this later on)
Step 2: Data Blend
Create a data blend (Resource → Manage blends → Add a blend), blending will allow you to join multiple data sources into one, choose the first table to be your universal analytics and the second to be GA4, based on our task (Yearly Traffic Data by default channel) we will add the Year time variable as a dimension and the second dimension will be the field we created to join the data on. After setting the dimensions we have to choose the matching metrics from each data source (Users from UA and Total Users from GA4) (New users and sessions from both UA and GA4).
Step 3: Join Configuration
So how will the blend work? If you are familiar with SQL joins you might already have an idea, if you are not, I will provide a simple explanation of joins before we dive deeper into the join of our choice for this solution.
Traffic_ua Table:
Traffic_ga Table:
SQL joins allow you to join different tables based on a mutual condition or multiple mutual conditions, for example:
select * from traffic_ua inner join traffic_ga on traffic_ua.default_channel =traffic_ga.default_channel and traffic_ua.year_calculated = traffic_ga.year_calculated;
what this statement means:
- * = select all columns from the created joined table
- from traffic_ua inner join traffic_ga, here we state which tables are in the join and which type of join we want to use (inner join).
- on traffic_ua.default_channel =traffic_ga.default_channel and traffic_ua.year_calculated = traffic_ga.year_calculated; , here we start giving our join conditions we want to join the rows only if the default_channel value from traffic_ua table matches the default_channel value from traffic_ga table and the year_calculated value from traffic_ua table matches the year calculated value from traffic_ga table.
If we want to join these 2 datasets using SQL join there are different joins we could use:
Inner Join (or simply join): will only retrieve rows where the values of the columns used in the join condition are equal in both tables.
Left Join: will retrieve all rows from the left table anyway and only the rows from the right table where the values of the columns used in the join condition match the ones in the left table.
Right Join: The inverse of the left join, will retrieve all rows from the right table and only the rows from the left table where the values of the columns used in the join condition match the ones in the right table.
Full Outer Join: The Union of both Left and Right Joins, will retrieve all rows from both tables regardless of whether they have matching values in the join conditions or not, if they do have matching values in the join conditions they will be joined in the same row, otherwise they will have null values in the opposite table columns.
In this solution we will utilize full outer joins as we need all the data from both tables (UA and GA4):
Step 4: Chart Creation
Step A: Go to the chart Data Source and choose the blend data source that you created.
Step B: Go to the Dimension and select (Year of the UA table in our example Year of table 1) as UA table will contain all the year values while GA4 wont, as it started collecting data later on.
Step C:
Click on Breakdown Dimension, then choose Create Field.
Step D:
Format the Year DateTime as a number and create a case to use UA data( UA_blend_default_channel column from the blended table) if the year is less than 2022 and to use the GA4 data elsewise (GA4_blend_default_channel column from the blended table).
CASE
WHEN CAST(FORMAT_DATETIME("%Y",Year (Table 1) ) AS NUMBER ) < 2022 THEN UA_blend_default_channel
ELSE GA4_blend_session_default_channel
END
Step E:
Do the same thing for the metric and create a case to use UA data (Users column from the blended table) if the year is less than 2022 and to use the GA4 data elsewise (Total Users column from the blended table).
You can add other metrics and do the same process.
CASE
WHEN CAST(FORMAT_DATETIME("%Y",Year (Table 1) ) AS NUMBER ) < 2022 THEN Users
ELSE Total users
END
Tips:
- Creating the fields with the same field ID, will allow you to filter both the created fields ( UA_blend_default_channel and GA4_blend_session_default_channel in this example) using one Control field.
- Do not use Data Blends for values aggregated by the average (Sessions per user for example), as the values created in the blended table will be treated as normal values (not an average of other values) so when you calculate the average, it will just give you the average of the average which isn't accurate:
For example:
Date: 2019, Users: 10000, Sessions: 50000, Sessions per user: 5
Date: 2020, Users: 20000, Session: 300000 Sessions per user: 15
Correct calculation for the sessions per user over 2019 and 2020:
300000+50000 / (20000+10000) = 11.6
How Looker Studio Data blend will calculate it:
(5+15)/2 = 10
- In general, there will be a slight data difference between (UA, GA4) and looker studio data, in metrics like total users due to data aggregation, the example below will further explain this:
How GA4 and UA calculates the Total Users over X years:
They will account for duplicate users over the years, for example if User A appeared in 2019 and 2020, they will be counted once.
How looker Studio calculates the total Users Over X years:
It doesn't account for duplicated users over the years, as it will first join the two tables by the year, then simply sum all the values that has the same year value, unable to tell whether it contains duplicate users or not.
Overall, it causes slight differences in data.