Unifying Your Google Analytics Data — From UA to GA4 and Beyond

ali izadi
13 min read · May 28, 2024

--

In marketing, data is everything. It’s the fuel that powers our campaigns, the compass that guides our decisions, and ultimately the measure of our success. However, the landscape of analytics platforms is constantly evolving, presenting marketers and analysts with the challenge of navigating fragmented data sources and ensuring the continuity of their insights.

We’ve spent years meticulously collecting data in Universal Analytics, only to learn that come July 1st, it will vanish into the digital ether, never to be seen again. Meanwhile, in Google Analytics 4 (GA4), we’re left waiting hours for insights that should be instantaneous, hindering our ability to trust reports and optimize campaigns on the fly.

It’s a marketer’s nightmare, but there’s a solution on the horizon.

Today, we’re unveiling a comprehensive solution that tackles these data integration challenges head-on.

Enter Continuous Analytics Bridge (CAB) — a robust data unification platform designed to address the challenges of integrating analytics data from UA, GA4, and beyond. CAB aims to provide a seamless and efficient way to preserve historical data, enable real-time analysis, and unlock the full potential of your analytics insights.

In this article, we will explore the technical intricacies of CAB, delving into its architecture, data integration strategies, and the powerful tools it leverages to deliver a unified analytics experience. We will also discuss the benefits of adopting CAB and address common concerns related to data integration, cost-effectiveness, and scalability.

Facing Data Head-On: A Marketer’s Dilemma

Before we uncover the solution, let’s address the obstacles marketers encounter daily. Imagine waking up July 1st to find years of Universal Analytics data have vanished into thin air. But there’s more.

1. The UA Data Deadline: Don’t Lose Your Insights!

Your trusty old Universal Analytics data is on borrowed time, set to disappear forever on July 1st. It’s like losing a key to a vault filled with years of marketing insights.

CSV exports? About as helpful as a broken compass in a storm — vague, confusing, and utterly useless.

Saying goodbye to your data? Not on your watch.

2. Real-Time Frustration: Waiting for GA4 Insights

In some cases, every minute counts. But what if your insights are stuck in a time warp, taking ages to materialize in Google Analytics 4 (GA4)? It’s like waiting for a letter in the age of email — slow, outdated, and downright frustrating. Sure, BigQuery can help speed things up, but what about the crucial data before you made the connection?

3. The Missed Connection: GA4 BigQuery Connector Woes

GA4’s BigQuery connector sounds like a dream come true, right? But what if you miss the connection on day zero? How do you fill in the gaps? It’s like missing the train to your dream destination — frustrating, disappointing, and potentially disastrous for your marketing campaigns.

4. The Data Retention Limit: The Clock is Ticking in GA4

Just when you thought the challenges with GA4 couldn’t get more daunting, there’s another hurdle: data retention limits. Unlike the seemingly endless archives of Universal Analytics, the free version of GA4 retains event-level data for only 2 months by default, extendable to a maximum of 14 months.

This means that, at best, you cannot access granular data or build explorations on anything older than 14 months. This restriction can feel like a major setback, as if historical insights — crucial for long-term trend analysis and strategic planning — are slipping through your fingers.

Introducing the Ultimate Solution: Continuous Analytics Bridge

Fragmented analytics platforms and disappearing insights are a marketer’s worst nightmare. But don’t worry, because the era of uncertainty ends today. Meet Continuous Analytics Bridge — a comprehensive data unification platform that’s set to revolutionize the way you report on key web metrics.

First, let’s address the past. You may recall my previous GitHub repositories — Export from Universal Analytics to BigQuery and Backfill-GA4-to-BigQuery. These open-source solutions were designed to backfill the gaps between analytics platforms, ensuring your data remained intact and accessible.

While they served their purpose admirably, I realized something crucial: complexity breeds confusion.

That’s where Continuous Analytics Bridge steps in. Gone are the days of navigating technical scripts and disparate data sources. With CAB, complexity is replaced by simplicity, offering marketers an all-encompassing data storage and reporting solution that covers everything from day zero of your Universal Analytics history to just a minute ago.

But there’s more. Continuous Analytics Bridge isn’t just about bridging data gaps — it’s about elevating your entire analytics experience.

Picture this: a unified view of your data, seamlessly integrated in BigQuery and effortlessly accessible through Looker Studio — from UA, to the GA4 data you never exported to BigQuery, to your current GA4 data, all the way to real-time reporting, up to the minute.

CAB dashboard_ PageView Report

No more waiting for insights, no more lost connections. Just a clear, comprehensive dashboard that puts the power of data at your fingertips.

And here’s the kicker: Continuous Analytics Bridge isn’t just a one-time fix — it’s a dynamic solution that evolves with you. With real-time updates, your dashboard is always up-to-date, ensuring that you have the freshest insights at your fingertips, every time you refresh.

Switch from static reports to a robust, future-proof analytics solution with Continuous Analytics Bridge. Start leveraging the power of real-time data to enhance your marketing efforts today. Try CAB here.

Continuous Analytics Bridge: Functional Overview

Preserving Historical Insights:

By retaining your historical UA data, you’re not just holding onto numbers and figures; you’re preserving valuable insights into past campaigns, audience behavior, and trends. This historical context is like a compass, guiding future marketing strategies and helping you avoid repeating past mistakes.

Not to mention other reasons — organizational data retention policies among them, as Ameet discussed in this workshop.

Uninterrupted Analysis:

With CAB, you can maintain continuous access to the historical data you collected in Universal Analytics. Imagine having comprehensive insights from your entire analytics journey, right up to the present moment, at your fingertips. No more gaps — just a seamless flow of insights ready for analysis.

Faster Decision-Making:

Timing is everything. With real-time access to your data, you can make informed decisions on the fly — uncovering issues, seizing opportunities, and optimizing critical campaigns in the moment. No more waiting hours or days for insights to materialize; with this solution, the data you need is just a click away.

Improved Campaign Performance:

Armed with a comprehensive view of your data, you can fine-tune your campaigns with precision. Identify what’s working, what’s not, and pivot strategies accordingly — all without missing a beat to achieve improved ROI, higher conversion rates, and happier stakeholders.

Enhanced Reporting and Visualization:

CAB isn’t just about data integration; it’s about making that data accessible. With intuitive dashboards and customizable reports powered by Looker Studio, you can turn your raw data into compelling visual narratives. From executive summaries to granular insights, you’ll have everything you need to communicate results effectively and drive decision-making across your organization.

Cost-Effective and Scalable:

CAB isn’t just about providing powerful insights; it’s about doing so in a cost-effective and scalable manner. Leveraging cutting-edge technology, we ensure affordability without compromising on performance. And as your data needs grow, our solution scales seamlessly to accommodate future growth, ensuring your analytics infrastructure keeps pace with your business.

Breaking Down All-in-One Analytics: The CAB Data Journey

CAB provides a unified approach to data analysis, offering a seamless and dynamic integration of Universal Analytics (UA) and Google Analytics 4 (GA4) data. The platform leverages a robust architecture that ensures data accuracy, freshness, and accessibility, empowering you to gain deeper insights and make data-driven decisions. Here’s how CAB seamlessly integrates UA and GA4 data for comprehensive analytics:

Continuous Analytics Bridge Technical Breakdown

Data Extraction and Preparation:

  • CAB uses the Google Analytics Reporting API (for UA) and the Google Analytics Data API (for GA4) to extract detailed data from both platforms.
  • We create a Python workflow to process this data, ensuring proper data retrieval and transformation.
  • We meticulously select relevant dimensions and metrics, ensuring data quality and accuracy.
  • The workflow efficiently handles data limitations, allowing for comprehensive data analysis.

Data Integration and Consolidation:

  • CAB integrates GA4 data seamlessly with the UA data, leveraging BigQuery as a central data warehouse.
  • We create unified tables in BigQuery, eliminating the need for complex data joins and enhancing data accessibility.
  • We map and unify UA and GA4 metrics and dimensions, ensuring consistency and accurate comparisons (a mapping sketch follows below).
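
To make the mapping step concrete, here is a minimal, hypothetical sketch in Python. The field pairs shown are common UA-to-GA4 equivalences; CAB’s actual mapping covers far more dimensions and metrics and accounts for scope and definition differences between the two schemas.

```python
# Illustrative only: a simplified mapping between UA (Reporting API) fields and
# their closest GA4 (Data API) equivalents. Names and coverage are assumptions;
# the real CAB mapping is broader and handles scope differences explicitly.
UA_TO_GA4_FIELD_MAP = {
    # dimensions
    "ga:date":           "date",
    "ga:pagePath":       "pagePath",
    "ga:sourceMedium":   "sessionSourceMedium",
    "ga:deviceCategory": "deviceCategory",
    # metrics
    "ga:users":              "totalUsers",
    "ga:sessions":           "sessions",
    "ga:pageviews":          "screenPageViews",
    "ga:transactions":       "transactions",
    "ga:transactionRevenue": "purchaseRevenue",
}

def to_unified_name(ua_field: str) -> str:
    """Return the unified (GA4-style) column name for a UA field."""
    return UA_TO_GA4_FIELD_MAP.get(ua_field, ua_field.removeprefix("ga:"))
```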

Data Transformation and Reporting:

  • Dataform plays a crucial role in transforming raw data from UA and GA4 into a standardized format.
  • It performs complex data transformations and ensures efficient data management through incremental updates.
  • Dataform generates staging and reporting tables, providing a comprehensive view of analytics data for analysis and visualization.

Real-Time Data Processing and Updates:

  • CAB leverages BigQuery’s events_* and events_intraday_* export tables to capture real-time data from GA4.
  • We employ a robust data pipeline, powered by Dataform, to process intraday event data for continuous data updates.
  • Dataform ensures real-time data analysis and provides a continuous flow of insights, eliminating data lag.

Visualization and Insights:

  • CAB leverages Looker Studio to visualize data insights effectively, creating interactive dashboards.
  • These dashboards allow you to explore high-level performance metrics and granular user activity data, providing a clear understanding of your audience and business performance. Here are some reports available for you to access in your CAB dashboard.
Continuous Analytics Bridge Report Components

Diving Deep into the CAB Engine Room

At the core of Continuous Analytics Bridge (CAB) is a sophisticated architecture that leverages Google Cloud Platform (GCP) technologies to deliver seamless and dynamic analytics integration. Below is a detailed explanation of each tool and its function within our system.

Analytics APIs, BigQuery

Analytics APIs

We use the Google Analytics Reporting API (for UA) and the Google Analytics Data API (for GA4) to programmatically extract detailed data from both platforms. The APIs allow us to query specific dimensions and metrics, ensuring comprehensive data retrieval while minimizing sampling issues. This data is then fed into our processing pipeline for further manipulation.
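
As a rough illustration of the GA4 side of this extraction, here is a minimal sketch using the official google-analytics-data Python client. The property ID, date range, and field lists are placeholders, and the production workflow additionally handles pagination, retries, and the equivalent UA pull through the Reporting API v4.

```python
# A minimal sketch of the GA4 side of the extraction. Property ID, dates, and
# the dimension/metric lists below are placeholders, not CAB's actual config.
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
)

def fetch_ga4_report(property_id: str = "123456789"):
    client = BetaAnalyticsDataClient()  # uses application default credentials
    request = RunReportRequest(
        property=f"properties/{property_id}",
        date_ranges=[DateRange(start_date="2023-01-01", end_date="yesterday")],
        dimensions=[Dimension(name="date"), Dimension(name="pagePath")],
        metrics=[Metric(name="sessions"), Metric(name="screenPageViews")],
        limit=100000,
    )
    response = client.run_report(request)
    # Flatten the response into plain dicts ready for loading into BigQuery.
    return [
        {
            "date": row.dimension_values[0].value,
            "page_path": row.dimension_values[1].value,
            "sessions": int(row.metric_values[0].value),
            "page_views": int(row.metric_values[1].value),
        }
        for row in response.rows
    ]
```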

BigQuery

BigQuery serves as our central data warehouse. It provides a scalable and efficient platform for storing massive amounts of data. By storing UA and GA4 data in unified tables within BigQuery, we enable fast querying and analysis, leveraging its integration with other GCP tools and visualization sources. CAB creates three datasets: the first preserves all backfilled data so nothing is ever lost, while the other two hold Dataform’s consolidated output — staging tables and reporting views.
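
A hedged sketch of that three-dataset layout, using the google-cloud-bigquery client; the dataset names and location below are illustrative rather than the names CAB actually provisions.

```python
# Sketch of the dataset layout described above. Project ID, dataset names, and
# location are placeholders; CAB's setup also applies labels and access rules.
from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project")  # placeholder project ID

for name in ("analytics_backfill",   # raw UA + GA4 API backfills, kept forever
             "analytics_staging",    # Dataform staging tables
             "analytics_reporting"): # Dataform reporting views for Looker Studio
    dataset = bigquery.Dataset(f"{client.project}.{name}")
    dataset.location = "US"
    client.create_dataset(dataset, exists_ok=True)  # no-op if it already exists
```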

Dataform

Dataform is our chosen ETL tool, designed to simplify the creation, management, and orchestration of SQL-based data workflows. It allows us to define complex SQL transformations, manage dependencies, and automate data pipeline tasks efficiently. Within CAB, Dataform is responsible for:

  • Transforming raw data from Universal Analytics (UA), Google Analytics 4 (GA4), and the native GA4 export (events_*) tables into a unified schema.
  • Managing incremental updates to ensure that data remains current.
  • Handling staging and reporting queries to maintain comprehensive and up-to-date datasets.

Reporting and Staging Queries

Staging Queries: Staging queries in Dataform process the raw data extracted from UA, GA4, and the native events_* export tables. These queries are responsible for:

  • Initial Data Transformation: Transforming raw data into a standardized format based on the mapping between UA, GA4, and BigQuery native GA4 schema.
  • Incremental Updates: Updating staging tables with the latest data. This process is triggered by messages from Pub/Sub, which are generated whenever new tables are created and detected by Logs Explorer and Router Sinks.
  • Data Consolidation: Ensuring that new data is seamlessly integrated with existing datasets to provide a complete and up-to-date view (a sketch of this incremental logic follows below).
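
To illustrate the incremental idea, here is a simplified sketch that merges a single newly created daily export table into a staging table. All project, dataset, and column names are placeholders, and in CAB this logic lives in Dataform’s incremental models rather than in ad-hoc Python.

```python
# Simplified illustration of the incremental update pattern: merge only the
# newly created daily export table (whose ID arrives via Pub/Sub) into the
# staging table. Dataset, table, and column names are placeholders.
from google.cloud import bigquery

def merge_new_export_table(table_id: str, project: str = "your-gcp-project"):
    """table_id is e.g. 'events_20240527', taken from the decoded log message."""
    client = bigquery.Client(project=project)
    query = f"""
    MERGE `{project}.analytics_staging.unified_events` AS t
    USING (
      SELECT
        PARSE_DATE('%Y%m%d', event_date) AS event_date,
        event_name,
        COUNT(*) AS event_count
      FROM `{project}.analytics_123456789.{table_id}`
      GROUP BY event_date, event_name
    ) AS s
    ON t.event_date = s.event_date AND t.event_name = s.event_name
    WHEN MATCHED THEN
      UPDATE SET event_count = s.event_count
    WHEN NOT MATCHED THEN
      INSERT (event_date, event_name, event_count)
      VALUES (s.event_date, s.event_name, s.event_count)
    """
    client.query(query).result()  # waits for the merge job to finish
```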

Reporting Queries: Reporting queries in Dataform focus on creating unified reporting tables that integrate data from both historical and real-time sources. These queries are responsible for:

  • Daily Incremental Updates: Updating the unified tables incrementally each day by incorporating new data from the staging tables.
  • Integration of Intraday Events: Merging intraday event data with staging tables to provide real-time insights. This ensures that the reporting tables reflect data from day 0 up to 1 minute before the current time.
  • Comprehensive Reporting: Generating final reporting tables that provide a unified view of analytics data, making it accessible for analysis and visualization in tools like Looker Studio (a sketch of such a unified view follows below).
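
For a sense of what the final reporting layer looks like, here is an illustrative (not production) view that stitches the backfilled UA data, the backfilled GA4 API data, the daily GA4 export, and today’s intraday export into one unified page-view table. Every dataset and table name is a placeholder.

```python
# Illustrative shape of the final reporting layer: one view stitching together
# UA backfill, GA4 API backfill, the daily GA4 export, and today's intraday
# export. All dataset/table names are placeholders for your own project.
from google.cloud import bigquery

UNIFIED_VIEW_SQL = """
CREATE OR REPLACE VIEW `your-gcp-project.analytics_reporting.unified_pageviews` AS
SELECT date, page_path, page_views, 'ua_backfill' AS source
FROM `your-gcp-project.analytics_backfill.ua_pageviews`
UNION ALL
SELECT date, page_path, page_views, 'ga4_backfill' AS source
FROM `your-gcp-project.analytics_backfill.ga4_api_pageviews`
UNION ALL
SELECT date, page_path, page_views, 'ga4_export' AS source
FROM `your-gcp-project.analytics_staging.ga4_daily_pageviews`
UNION ALL
SELECT date, page_path, page_views, 'ga4_intraday' AS source
FROM `your-gcp-project.analytics_staging.ga4_intraday_pageviews`
"""

bigquery.Client(project="your-gcp-project").query(UNIFIED_VIEW_SQL).result()
```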

Logs Explorer and Router Sinks:

  • Logs Explorer: Logs Explorer continuously monitors BigQuery for new event table creations. Whenever a new table is created, Logs Explorer captures the event in real time (an example filter is sketched below), ensuring that we are immediately aware of any changes in our data landscape.
  • Router Sinks: Router Sinks act as the routing mechanism for the event logs captured by Logs Explorer. They direct these logs to Pub/Sub, ensuring that every new data event is captured and processed promptly.
CAB_LogsExplorer
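
The snippet below shows the general shape of such a filter, wrapped in Python only so the surrounding tooling can reuse it. The audit-log field paths and the service account are assumptions that vary by project and log format, so verify them against a real log entry in your own Logs Explorer before creating the sink.

```python
# Assumption: the general shape of a Logs Explorer / log-sink filter that
# catches creation of new GA4 export tables. Field paths and the service
# account are not guaranteed; inspect a real audit-log entry in your project.
NEW_EXPORT_TABLE_FILTER = """
resource.type="bigquery_dataset"
protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob"
protoPayload.authenticationInfo.principalEmail="firebase-measurement@system.gserviceaccount.com"
protoPayload.metadata.tableCreation.table.tableName:"events_"
"""
```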

Pub/Sub:

  • Message Broker: Pub/Sub (Publish/Subscribe) serves as the messaging backbone of our system. It receives event logs from Router Sinks and distributes them to the Workflow component. Pub/Sub ensures high-throughput, reliable, and asynchronous message delivery, which is critical for maintaining the flow of data through our pipeline. The log message payload is later decoded in the Workflow to run Dataform incrementally.
CAB_ Pub/Sub

Workflow:

  • Task Orchestration: The Workflow component acts as the conductor of our data processing pipeline. Upon receiving messages from Pub/Sub, Workflow decodes them to determine which BigQuery tables were created or updated (see the decoding sketch below). It then triggers the appropriate Dataform jobs to transform and consolidate the data.
  • Error Handling and Retry Mechanism: Workflow includes robust error handling and retry mechanisms. If any data processing task fails, Workflow automatically retries the task, ensuring data integrity and reliability.
CAB_ Workflow
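
What that decoding step amounts to is shown below in Python for clarity; in CAB it is expressed in the Workflows YAML syntax instead, and the exact audit-log path to the table name is an assumption to verify against your own logs.

```python
# A Pub/Sub push delivers the log entry base64-encoded in message.data;
# decoding it yields the JSON audit log, from which we pull the new table's ID.
import base64
import json

def extract_table_id(pubsub_message: dict) -> str:
    """pubsub_message is the 'message' object from a Pub/Sub push request body."""
    payload = json.loads(base64.b64decode(pubsub_message["data"]).decode("utf-8"))
    # The exact path below depends on the audit-log format routed by the sink;
    # treat it as an assumption and inspect a real log entry in your project.
    table = payload["protoPayload"]["metadata"]["tableCreation"]["table"]["tableName"]
    return table.split("/")[-1]  # e.g. "events_intraday_20240528"
```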

Cloud Scheduler:

  • Periodic Triggers: Cloud Scheduler is responsible for scheduling and triggering workflows at predefined intervals. It allows us to set up cron jobs that periodically activate the Workflow component to process intraday tables, ensuring that data is always current.
  • Customizable Schedules: With Cloud Scheduler, we can customize the frequency of data updates. For example, we can schedule updates every 15 minutes during peak business hours and less frequently during off-peak times, optimizing both data freshness and system efficiency (sample schedules are sketched below).
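
For reference, here are the kinds of cron expressions you might hand to Cloud Scheduler for the intraday refresh; the intervals themselves are purely illustrative.

```python
# Example cron expressions for Cloud Scheduler jobs that trigger the intraday
# refresh. Pick intervals that match your own traffic and cost tolerance.
INTRADAY_SCHEDULES = {
    "peak_hours":   "*/15 9-18 * * 1-5",  # every 15 minutes, 09:00-18:59, Mon-Fri
    "off_peak":     "0 * * * *",          # hourly, on the hour
    "daily_rollup": "30 6 * * *",         # once a day at 06:30
}
```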

Unlocking Value Without Breaking the Bank: How We Keep Your Bill in Check

Ever worried about sky-high bills from cloud services? Fear not: with Continuous Analytics Bridge, we’ve designed a solution that maximizes value without emptying your pockets.

Here’s why:

API Backfills

One of the perks of our solution is that API backfills won’t make a dent in your budget. The API calls themselves are free, and the storage costs for the backfilled data are negligible — less than the price of a bottle of water per month. With Continuous Analytics Bridge, you can enjoy the benefits of data integration without the hefty price tag.

Cost-Efficient GCP Tools

Dataform and the other GCP tools mentioned above are largely free to use. The only meaningful cost comes from the BigQuery queries that Dataform runs — and rest assured, it’s a small price to pay for the insights gained.

Initial Setup Considerations

During the initial setup phase, we do incur some processing costs, particularly when performing a full refresh to create a unified table from UA, GA4 API, and native BigQuery export data. However, this processing volume typically remains under 20 GB for most properties. And here’s the kicker: with 1 TB of free query processing at your disposal every month, you’re unlikely to even scratch the surface.

Incremental Updates

Once the initial setup is complete, we switch to incremental updates, where we only update the exact tables created and append to the existing data. This minimizes processing volume and keeps costs in check. Even for hourly intraday runs, the costs remain low, as intraday tables are typically small. However, for larger properties, we offer the flexibility to adjust refresh intervals to balance cost savings with data freshness.

Cost Optimization Guarantee

Rest assured, we’ve fine-tuned every aspect of Continuous Analytics Bridge to ensure it’s the most cost-optimized solution on the market. With us, you can unlock valuable insights without breaking the bank. It’s the ultimate win-win. 😌

To ensure full transparency and control over your analytics costs, our solution includes a time series chart directly in your dashboard that tracks all your BigQuery expenses. Thanks to the incremental methods described above, our tests on properties with up to 10 million events per month have shown that costs can be astonishingly low — less than $2 per month. This feature lets you monitor and manage your expenses effectively, ensuring you get the most out of your analytics investment without any surprises on your bill.
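
For the curious, the chart is driven by the kind of query sketched below, which sums bytes billed per day from BigQuery’s INFORMATION_SCHEMA job metadata and converts them to an approximate cost. The region qualifier and the on-demand rate are assumptions; substitute your own region and current pricing.

```python
# Hedged sketch of a cost-tracking query: daily bytes billed from the
# INFORMATION_SCHEMA jobs view, converted to an approximate on-demand cost.
from google.cloud import bigquery

COST_QUERY = """
SELECT
  DATE(creation_time) AS usage_date,
  SUM(total_bytes_billed) / POW(1024, 4) AS tib_billed,
  -- assumed on-demand rate; check the current price for your region
  SUM(total_bytes_billed) / POW(1024, 4) * 6.25 AS approx_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY usage_date
ORDER BY usage_date
"""

for row in bigquery.Client().query(COST_QUERY).result():
    print(row.usage_date, f"{row.approx_cost_usd:.4f} USD")
```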

CAB_Cost Monitoring

Data Dynamo: Your Ticket to Analytics Excellence!

You’ve just unearthed the ultimate secret to mastering analytics. But why stop now when the adventure has only just begun? Dive deeper with our exclusive guide or immersive webinar, and unlock the hidden treasures of data-driven success.

Imagine the thrill of being at the forefront of analytics innovation, armed with insider tips and expert guidance to supercharge your marketing strategies. Don’t let this opportunity slip through your fingers — embrace the power of data and watch your business soar to new heights!

Join the ranks of data dynamos who are revolutionizing the game. Your journey to analytics excellence starts now — seize it with both hands and let the adventure begin!

FAQ:

Q: Is the solution compatible with all types of websites?

A: Yes, our solution is designed to be adaptable to a wide range of websites, whether they’re eCommerce or non-eCommerce.

Q: How long does it take to implement the solution?

A: Implementation times may vary depending on the complexity of your analytics setup, but our team is here to assist you every step of the way to ensure a smooth transition.

Q: Will I need technical expertise to use the solution?

A: While some technical knowledge may be helpful, our solution is designed to be user-friendly and accessible to marketers of all skill levels. Plus, our support team is always on hand to provide assistance whenever needed.

Q: What if I want to only perform backfills and don’t need to unify data?

A: If your primary interest is in backfilling data from Universal Analytics to GA4 without the need to unify data across platforms, you can do so completely free of charge. My open-source GitHub repositories can guide you through the DIY process of setting up these backfills.

--

ali izadi

Marketing Data Analyst and Engineer with 3+ years in tech. Specializes in data, automation, team collaboration, and continuous improvement.