Finding the Most Valued Organic Keywords: A Data Story with GSC, GA4, and BigQuery

ali izadi
9 min readFeb 16, 2024

--

In the world of digital marketing, finding the right keywords is like discovering gold. These keywords are the keys to getting more people to see your website, and ultimately, make a purchase. Yet, a significant challenge exists: the integration between Google Search Console (GSC) and Google Analytics 4 (GA4) falls short, leading to a loss of crucial insights on how keywords drive sales and attract customers.

This gap hinders our ability to identify the most impactful keywords and optimize our online presence for increased sales. But, there’s hope. By employing a simple modeling technique in BigQuery, we can bridge this gap, merging GSC and GA4 data effectively.

This guide will show you how, enabling you to pinpoint the sales-driving power of each keyword with precision. It’s all about leveraging solid data to enhance your SEO and marketing strategies, turning hidden gems into tangible business value. Let’s unlock the full potential of your keywords.

Keyword Insights Lost: The Marketer’s Dilemma with GSC and GA4

Imagine you’re a detective in the digital marketing world. Your mission? To uncover the most valuable keywords that lead not just to clicks, but to actual conversions and sales. You have two main tools at your disposal: Google Search Console (GSC) and Google Analytics 4 (GA4). These should be your best allies, but there’s a twist in the tale. Despite their power, when you try to connect them directly, they’re like puzzle pieces from different sets that just don’t fit perfectly together.

The Gap in the Map

Visibility vs. Value: GSC shows us which keywords bring traffic, like clues leading to a hidden treasure. But it stops short of telling us which of these paths end with the ultimate prize — a conversion.

End of the Journey, Missing Start: GA4 reveals the treasure chest — conversions and sales. However, it’s like finding a chest without a map, leaving us guessing which keywords led the seekers to the gold.

Unanswered Questions

This leaves us marketers with puzzles:

Which keywords are not just popular but profitable?

How do I link my traffic to actual sales?

Marketers face a challenge with GA4 and GSC integration due to missing keywords and unclear conversion attribution. This makes it hard to identify valuable keywords that lead to customer actions. There’s a need for a solution that clearly maps the journey from search to sale. Get set for a data dive? Your SEO success story starts now!

From Clicks to Conversions: SQL Connects the Dots

Now, we pivot from challenge to solution. Armed with SQL, we'll decode the mysteries of GA4 and GSC data, pinpointing the keywords that matter most. Setting the stage: What we need to begin.

Setting Up:

First Things First: Make sure you’ve linked your Google Search Console (GSC) and Google Analytics 4 (GA4) to BigQuery. This step is crucial because the magic starts from the day you connect these tools — no earlier. All the data you’ll use is collected from this connection day forward.

SQLify Your SEO:

-- Declaration of date range for analysis
DECLARE start_date DATE DEFAULT '2024-02-09';
DECLARE end_date DATE DEFAULT '2024-02-13';

-- Custom function to transform URLs based on specified type
CREATE TEMP FUNCTION transform_url(page STRING, type STRING) AS (
CASE
WHEN type = 'parameters' THEN REGEXP_REPLACE(page, '\\?.*$|#.*', '')
WHEN type = 'protocol' THEN REGEXP_REPLACE(page, '^(https?://|ftp://)', '')
WHEN type = 'domain' THEN REGEXP_REPLACE(page, '^(https?://|ftp://)?[^/]+/', '/')
WHEN type = 'all' THEN REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(page, '\\?.*$|#.*', ''), '^(https?://|ftp://)', ''), '^[^/]+/', '/')
END
);

-- CTE for extracting conversions from GA4 dataset
WITH conversions AS (
SELECT
user_pseudo_id,
-- Extracting session ID
(SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id') AS ga_session_id,
-- Calculating purchase value with a combination of potential value types
(SELECT IFNULL(value.int_value, 0) FROM UNNEST(event_params) WHERE KEY = 'value') +
(SELECT IFNULL(value.float_value, 0.0) FROM UNNEST(event_params) WHERE KEY = 'value') +
(SELECT IFNULL(value.double_value, 0.0) FROM UNNEST(event_params) WHERE KEY = 'value') AS purchase_value
FROM
`GA4_table_id` -- Customize with your GA4 table ID
WHERE
event_name = 'purchase'
AND _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', start_date) AND FORMAT_DATE('%Y%m%d', end_date)
),

-- CTE for extracting session starts filtered by organic traffic from Google
session_starts AS (
SELECT
user_pseudo_id,
-- Extracting session ID
(SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id') AS ga_session_id,
-- Extracting page location
(SELECT value.string_value FROM UNNEST(event_params) WHERE KEY = 'page_location') AS page_location
FROM
`GA4_table_id` -- Customize with your GA4 table ID
WHERE
event_name = 'session_start'
AND concat (collected_traffic_source.manual_source,'/',collected_traffic_source.manual_medium) = 'google/organic'
AND _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', start_date) AND FORMAT_DATE('%Y%m%d', end_date)
),

-- CTE for merging sessions and conversions based on user ID and session ID
filtered_sessions AS (
SELECT
ss.user_pseudo_id,
ss.ga_session_id,
ss.page_location,
sus.purchase_value
FROM
session_starts ss
JOIN
conversions sus
ON
ss.user_pseudo_id = sus.user_pseudo_id AND ss.ga_session_id = sus.ga_session_id
),

-- CTE for aggregating Google Analytics data by transformed page location
ga_data AS (
SELECT
transform_url(page_location, 'all') AS transformed_page_location,
COUNT(*) AS num_sessions,
SUM(purchase_value) AS total_purchase_value
FROM
filtered_sessions
GROUP BY
transformed_page_location
),

-- CTE for aggregating Google Search Console data by query and transformed URL
gsc_data AS (
SELECT
query AS search_query,
transform_url(url, 'all') AS transformed_url,
SUM(impressions) AS total_impressions,
SUM(clicks) AS total_clicks
FROM
`GSC_table_id` -- Customize with your GSC table ID
WHERE
clicks > 0
AND data_date BETWEEN start_date AND end_date
AND query IS NOT NULL
AND url IS NOT NULL
GROUP BY
search_query,
transformed_url
),

-- CTE for calculating percentage of total clicks and share of voice conversion
pot_calculation AS (
SELECT
gsc.search_query,
gsc.transformed_url,
gsc.total_impressions,
gsc.total_clicks,
ga.num_sessions,
ga.total_purchase_value,
ROUND(gsc.total_clicks / SUM(gsc.total_clicks) OVER (PARTITION BY gsc.transformed_url), 2) AS percent_of_total_clicks,
ROUND(ga.num_sessions * gsc.total_clicks / SUM(gsc.total_clicks) OVER (PARTITION BY gsc.transformed_url), 2) AS sov_conversion
FROM
gsc_data gsc
LEFT JOIN
ga_data ga ON gsc.transformed_url = ga.transformed_page_location
WHERE
ga.num_sessions > 0
)

-- Final query aggregating the share of voice conversion and purchase value by search query
SELECT
search_query,
ROUND(SUM(sov_conversion),2) AS total_sov_conversion,
ROUND(SUM(sov_conversion * total_purchase_value),2) AS total_purchase_value
FROM
pot_calculation
GROUP BY
search_query;

Here is a visual representation of the results:

Most Valued Organic Keywords
Most Valued Organic Keywords

SQL Solution Breakdown: From Data to Insights

This SQL code connects GSC and GA4 data in BigQuery to reveal valuable organic keywords. Let’s break down how it works in a straightforward manner:

1. Setting the Scene with Date Ranges:

I begin by defining the period I am interested in analyzing with `start_date` and `end_date`. This ensures we’re only looking at data relevant to our selected timeframe.

2. Tailoring URLs for Analysis:

A custom function `transform_url` is created to modify URLs in various ways (like removing parameters or protocols). This standardization is crucial for accurately matching and comparing data from GSC and GA4. Depending on your URL structure, you can use each of the user-defined functions. Just ensure that the function produces the same URLs for both GSC and GA4 URLs.

3. Identifying Valuable Actions with Conversions:

The `conversions` section pulls data from GA4 to find events labeled as ‘purchase’, along with their value. This tells us when users have completed valuable actions and how much those actions are worth. It can be replaced with the conversions you want.

4. Tracking the Starting Point with Session Starts:

In `session_starts`, I filter GA4 data for sessions that began with organic searches from Google. It helps me filter only organic traffic from Google.

5. Merging Insights with Filtered Sessions:

`filtered_sessions` combines session starts with conversions by matching device IDs and session IDs. This step is key to linking the start of a user’s journey (finding the site through search) with their valuable actions (making a purchase).

6. Aggregating GA Data:

`ga_data` aggregates this combined data by the standardized page locations, counting sessions, and summing up purchase values. This highlights which pages are not just attracting visitors but also driving revenue.

7. Combining Forces with GSC Data:

`gsc_data` gathers search query information from GSC, including the number of impressions and clicks for specific queries. This part focuses on how users find the site through search.

8. Calculating Keyword Value:

`pot_calculation` merges GSC and GA data to calculate the share of voice conversion and the percentage of total clicks. This innovative step assesses the impact of specific search queries on organic conversions and purchase value for all pages. To learn more about calculating the percentage of total in BigQuery, visit https://www.youtube.com/watch?v=siNzbxtAqD0.

9. Revealing the Treasure:

The final `SELECT` statement aggregates these calculations to present each search query’s total share of voice conversion and associated purchase value. This outcome directly links organic search queries with their contribution to business goals.

The Mechanics of SQL: Assigning Conversions with Modeling Method

This modeling method is a powerful approach for SEO specialists and marketers aiming to understand the true value of their organic keywords. It works by first ensuring all Google organic traffic to a page, captured by Google Search Console (GSC), and the conversions from these pages, tracked by Google Analytics 4 (GA4), are accounted for in BigQuery. Here’s a simplification of what this method achieves and the significance of the retrieved columns:

  • Counting Organic Traffic: The method starts by counting all visits to a page that came from Google’s organic search results. This is crucial because it sets the stage for understanding which pages are attracting the most attention from potential customers.
  • Extracting Organic Conversions: Next, it identifies how many of these visits resulted in a conversion — be it a purchase, a sign-up, or another valuable action — as recorded in GA4. This step links the initial interest (traffic) to tangible outcomes (conversions).
  • Calculating Share of Voice: For each keyword, the method calculates its share of voice in terms of clicks from organic search. This involves figuring out what percentage of total clicks each keyword represents, giving insight into the keyword’s relative importance or dominance in driving traffic.
  • Assigning Conversions to Keywords: It then assigns the conversions extracted from GA4 to the keywords, based on their share of voice. This means if a keyword accounted for 50% of the clicks to a page, it’s assumed to contribute to 50% of the conversions from that page.
  • Handling Multiple Page Rankings: If a keyword ranks for several pages, the query calculates its conversion impact on each page separately. It then sums up these impacts across all pages, offering a comprehensive view of the keyword’s overall contribution to conversions.

Retrieved Columns Meaning:

  • search_query: The specific keyword that users searched for in Google.
  • total_sov_conversion: The total share of voice conversion, indicating the keyword’s proportionate contribution to conversions, based on its share of clicks.
  • total_purchase_value: This represents the sum of the conversion values (e.g., purchase amounts) attributed to the keyword, offering a direct link between search behavior and revenue generation.

In essence, this approach not only highlights which keywords are bringing traffic but, more importantly, which of these are actually driving conversions and contributing to the bottom line. It’s a nuanced, data-driven way to allocate SEO efforts and budget toward keywords with proven impact on business goals.

Maximizing Impact: Strategic Applications of Solution

Dive into a sea of opportunities with this SQL-driven approach, transforming the way you harness keyword data for strategic advantage. Here’s how you can pivot your strategies to not just chase, but capture and convert your digital audience effectively:

Fine-Tuning Google Ads Campaigns

Imagine funneling your budget into keywords that are not just guesses but guaranteed performers. By identifying organic keywords that are conversion goldmines, you can craft Google Ads campaigns that are precision-targeted to draw in traffic that’s ready to convert. It’s like placing bets where you know you’ll win, ensuring every dollar spent on ads is a dollar well invested.

Content Optimization

Turn your content into a conversion powerhouse. With insights on which keywords lead directly to sales, your content strategy transforms from shooting in the dark to hitting bullseyes. Whether it’s blog posts, landing pages, or videos, you’ll know exactly what topics to tackle to attract visitors who are more likely to take action. It’s about creating content that doesn’t just rank well but also resonates and converts.

Refining SEO Efforts for Maximum ROI

Prioritize your SEO tasks by focusing on optimizing for high-converting keywords. This methodical approach ensures that you’re not just improving rankings but are also attracting quality traffic with a higher propensity to convert. It’s a strategic shift from playing the numbers game to playing the value game, aligning your SEO efforts with tangible business outcomes.

By integrating these SQL insights into your strategy, you’re not just making informed decisions; you’re setting the stage for unprecedented growth and efficiency. This isn’t just about doing digital marketing; it’s about doing it smarter, with a laser focus on what truly moves the needle for your business.

Better Insights Aren’t Enough: Mastering Keyword Impact

Unlocking keyword insights is just the start of our Analytics mastery journey. By integrating Google Search Console, Google Analytics 4, and BigQuery, we dive deeper into how searches fuel business success. But the adventure doesn’t stop there. In the ever-changing digital landscape, your strategies need to evolve. Sharing our journeys enriches us all, pushing the boundaries of digital marketing. Let’s connect on Linkedin and explore the possibilities together. Your input can help shape the future of digital marketing, turning insights into lasting impact.

💡 Open-Source BigQuery Scripts for GA4 & GSC💡

I’ve curated 70+ SQL queries to simplify working with GA4 and Google Search Console data . Whether you’re diving into user behavior, tracking ad performance, or analyzing search data, this repository is here to help.Your feedback and contributions are welcome to make it even better!

👉 Access the GitHub repo here — always free to use!

--

--

ali izadi
ali izadi

Written by ali izadi

Analytics Engineer with 5+ years of experience, specializing in the Google stack. Focused on optimizing data workflows and solving marketing challenges.

Responses (3)