Simplifying Marketing Analytics with Google Cloud: A Beginner’s Guide
Author: Syed Ali Naqi at DreamAI Software
In today’s digital world, leveraging marketing analytics can significantly enhance the effectiveness of your advertising efforts. Google Cloud offers an extensive suite of tools and services tailored for this purpose, known as the Marketing Analytics Jumpstart. This guide aims to demystify the process of setting up a robust marketing analytics framework using Google Cloud, ensuring clarity and simplicity for beginners.
Understanding the Marketing Analytics Jumpstart
At its core, the Marketing Analytics Jumpstart is designed to optimize your digital marketing strategy through data-driven insights. Utilizing Terraform, it allows you to build, modify, and version your marketing infrastructure effortlessly.
Key Features:
- Scheduled ETL Jobs: Automates the extraction, transformation, and loading of data from Google Analytics 4 and Google Ads, simplifying data management.
- Machine Learning Pipelines: Includes ready-to-use models for purchase propensity, customer lifetime value, and audience segmentation.
- Interactive Dashboard: Provides a user-friendly interface for data visualization and insights.
- Activation Pipeline: Integrates machine learning predictions into Google Analytics 4, enriching your analytics data.
This solution includes pre-built machine learning models for three key areas:
- Purchase Propensity: Predicting how likely someone is to buy something.
- Customer Lifetime Value: Estimating the total amount a customer will spend over time.
- Audience Segmentation: Grouping customers into categories based on their behavior or other characteristics.
Please note that this tutorial covers only Audience Segmentation; we will cover Purchase Propensity and Customer Lifetime Value in separate future posts.
Setting Up: A Step-by-Step Guide
Setting Up and Exporting Google Analytics 4 Data to BigQuery
The first step involves setting up a GA4 account and configuring data streams to funnel data from your website or app into it. After setting up the account, we will link GA4 with BigQuery, Google's multi-cloud data warehouse.
Setting Up a GA4 Account:
- First, log in with your Google account and go to https://analytics.google.com/.
- Click the Start measuring button; it will take you to the account creation page.
- Under account details, enter an account name (e.g., DemoAccount). Scroll down and press the Next button. It will take you to the property creation page.
- In order to measure your web and app data, you need to create a GA4 property.
- Under property details, enter a property name (e.g., demoproperty), select the reporting time zone and currency, and press the Next button.
- Now it will take you to the business details page, where you can specify the type of business you are running. Select your industry category from the dropdown, select your business size, and hit Next.
- Choose the business objectives for reports that are personalized to your business. To get multiple types of reports, select Get baseline reports.
- Once you press the Create button, a popup will show the Google Analytics terms and conditions. Read the terms and conditions and accept them.
- After accepting the terms and conditions, it will take you to the data collection page, where you choose the type of platform you want to collect data from.
- For our demo purposes, we will select the Web platform.
- Provide the URL of the website you want to collect data from and give the stream a name. We are collecting data from our company's website, https://www.dreamai.io, and naming the stream DreamAI_Web_Stream.
- Once you create the stream, a web stream details page will appear where you can see details related to your data stream.
- Now, finally, to collect data from your website, you need to install the Google tag on your website. Click the View tag instructions button to view the tag details.
- Under installation instructions, you can see the Google tag code. Copy and paste the tag code to every page of your website immediately after the <head> element.
- Once you do that, you will start getting data from your website to GA4. It may take up to 48 hours before your property starts collecting data.
- Now that your GA4 account is set up, you are ready to export GA4 data to BigQuery. In your Google Cloud Platform project, enable the BigQuery API so that you can configure the Google Analytics export to BigQuery.
GA4 Export to BigQuery:
- In your Google Analytics account, go to the Admin panel and, under the Product links section, click BigQuery links to configure the BigQuery link.
- Click Link, then click Choose a BigQuery project to display a list of projects you have access to. In our case, we will select our company's project.
- Select a location for the data (e.g., us-central1) and click Next.
- Select Configure data streams and events to select which data streams to include with the export and specific events to exclude from the export.
- You can exclude events by either clicking Add to select from a list of existing events or by clicking Specify event by name to choose existing events by name or to specify event names that have yet to be collected on the property.
- Click Apply.
- Select a Daily (once a day) export, a Streaming (continuous) export, or both.
- Click Next.
- Review your settings, then click Submit.
- Once the linkage is complete, data should start flowing into your BigQuery project within 24 hours.
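Once the first daily export lands, you can sanity-check it from Cloud Shell with the bq CLI. The GA4 export writes events_YYYYMMDD tables into a dataset named analytics_<property_id>; the project and property IDs below are placeholders to replace with your own:
bq query --use_legacy_sql=false 'SELECT event_name, COUNT(*) AS events FROM `<your-project>.analytics_<property_id>.events_*` GROUP BY event_name ORDER BY events DESC LIMIT 10'
If the query returns rows of event names and counts, the link is working.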
Finally, handle the sensitive values securely: the next section shows how to locate your Measurement ID and generate an API Secret in your GA4 account's admin panel.
GA4 Configurations and Permissions:
The activation application uses sensitive information from your Google Analytics property: the Measurement ID and the API Secret.
To find the Measurement ID, follow these steps:
- Go to the Admin panel of your GA4 account.
- Under Data collection and modification, click Data streams.
- Select the Web tab.
- Click the stream name that you created earlier (e.g., in our case, DreamAI_Web_Stream).
- Find the Measurement ID in the first row of the stream details.
An API Secret will not be present initially, so you first need to generate one. To generate an API Secret, follow these steps:
- Go to the Admin panel of your GA4 account.
- Under Data collection and modification, click Data streams.
- Click Web, then click your web data stream (e.g., DreamAI_Web_Stream).
- In the web stream details, click Measurement Protocol API Secrets.
- Review and accept the User Data Collection Acknowledgement.
- Click Create.
- Enter a nickname for the secret (e.g., in our case, dreamai_web_data_api_secret), then click Create.
- Once the API secret is created, it will be listed under Measurement Protocol API Secrets, where you can find the secret value.
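As an optional sanity check (not part of the Jumpstart setup), you can verify that the Measurement ID and API Secret work together by sending a test event through the GA4 Measurement Protocol from Cloud Shell; the measurement_id, api_secret, and client_id values below are placeholders:
curl -X POST "https://www.google-analytics.com/debug/mp/collect?measurement_id=G-XXXXXXXXXX&api_secret=<your-api-secret>" \
  -H "Content-Type: application/json" \
  -d '{"client_id": "test.client.id", "events": [{"name": "tutorial_test_event"}]}'
The /debug/mp/collect endpoint only validates the payload and returns any validation messages; switch the path to /mp/collect to actually record the event.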
Cloning Dataform Git Repository
The marketing data store (MDS) uses Dataform as the tool to run the data transformations. Dataform uses a private Git repository to store the SQL transformation scripts, so you will need to create a repository and copy the SQL scripts from a companion GitHub repo before running the Terraform scripts.
To do that, follow these steps:
- Create a private, empty repository on GitHub or GitLab and check out the blank repository to your computer.
- Clone the MDS Dataform scripts from GitHub using the following command:
git clone https://github.com/googlecloudplatform/marketing-analytics-jumpstart-dataform.git
- Now you need to push the contents to your private repository. To do that, navigate to the cloned directory (i.e., marketing-analytics-jumpstart-dataform) and add your private repo as a remote using the command:
git remote add copy https://github.com/<your-account>/<repo>.git
- Rename the current branch to main using the command:
git branch -M main
- Push to your repository using the command:
git push -u copy main
- Clean up the checkout: return to the previous directory and remove the cloned directory using the command:
rm -rf marketing-analytics-jumpstart-dataform
- Generate a GitHub personal access token for Dataform access. If you are not sure how to create one, follow the instructions at https://docs.github.com/en/enterprise-server@3.9/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens.
- Provide the Git URL and access token to the Terraform scripts using a Terraform variable.
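For convenience, the whole mirroring sequence above can be run in one pass; this recap just strings together the commands already shown (replace <your-account>/<repo> with your private repository path):
git clone https://github.com/googlecloudplatform/marketing-analytics-jumpstart-dataform.git
cd marketing-analytics-jumpstart-dataform
git remote add copy https://github.com/<your-account>/<repo>.git
git branch -M main
git push -u copy main
cd ..
rm -rf marketing-analytics-jumpstart-dataform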
Creating a Dataform Repository on GCP and Connecting It to Your Private Git Repository
Once your private repo is created, you need to create a Dataform repository on GCP and connect it to your private repo.
To create a Dataform repository, follow these steps:
- In your Google Cloud console, go to the Dataform page and enable the Dataform API. When you enable the Dataform API, you will land on the Dataform page.
- Click Create repository and, on the Create repository page, enter a unique ID in the Repository ID field (e.g., marketing-analytics-jumpstart).
- In the Region drop-down list, select a region (e.g., us-central1) for storing the repository and its contents.
- In the Service account drop-down, select Default Dataform service account. Click Create, and then click Done.
Now that your Dataform repo is created, you can connect it to your private Git repository. To do that, follow these steps:
- In your Google Cloud console, go to the Dataform page and select the repository you want to connect to.
- On the repository page, click the Settings tab, then click Connect with Git.
- In the Link to remote repository pane, set the Remote Git repository protocol to HTTPS and, in the Remote Git repository URL field, enter the URL of your remote Git repository, ending with .git.
- In the Default remote branch name field, enter the name of the main development branch of the remote Git repository.
- In the Secret drop-down, select your secret for the remote Git repository. The secret holds your GitHub personal access token; you can enter it manually or create it in Secret Manager.
- Finally, click Link. If everything goes well, your private Git repo will be linked to the Dataform repository.
- You also need to grant the role roles/secretmanager.secretAccessor to the default Dataform service account. To do that, go to the Cloud Shell and run the following command:
gcloud projects add-iam-policy-binding "<your-project-id>" --member="serviceAccount:<default-dataform-service-account-name>" --role="roles/secretmanager.secretAccessor"
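If you are not sure of the default Dataform service account's name, it is typically derived from your project number as service-<project-number>@gcp-sa-dataform.iam.gserviceaccount.com. Here is a small sketch that looks the number up and grants the role in one go (assuming gcloud is authenticated against your project):
PROJECT_ID="<your-project-id>"
PROJECT_NUMBER=$(gcloud projects describe "${PROJECT_ID}" --format="value(projectNumber)")
gcloud projects add-iam-policy-binding "${PROJECT_ID}" --member="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-dataform.iam.gserviceaccount.com" --role="roles/secretmanager.secretAccessor"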
Installing the MDS, ML Pipelines, the Feature Store, and the Activation Pipeline
We will use the Terraform scripts to create the infrastructure that starts data ingestion into BigQuery and to create the feature store, the ML pipelines, and the Dataflow activation pipeline. To run the Terraform scripts, we will use the Google Cloud Shell and perform all the remaining steps there.
Here are the steps you need to follow:
- First go to the GCP Cloud Shell and clone the marketing-analytics-jumpstart repository.
- Enter the following commands into the Cloud Shell:
REPO="marketing-analytics-jumpstart"
cd $HOME
git clone https://github.com/GoogleCloudPlatform/${REPO}.git
- Once the repo is cloned successfully, install Python's Poetry and set Poetry to use a Python version between 3.8 and 3.10.
- In Cloud Shell run the following commands:
curl -sSL https://install.python-poetry.org | python3 -
- Verify that poetry is on your $PATH variable by running the command:
poetry --version
- If it fails, add it to your $PATH variable using the following command:
export PATH="$HOME/.local/bin:$PATH"
- Set Poetry to use your latest Python 3 using the command:
poetry env use python3
- Authenticate with additional OAuth 2.0 scopes to use the Google Analytics Admin API by running the following command:
gcloud auth application-default login --quiet --scopes="openid,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/sqlservice.login,https://www.googleapis.com/auth/analytics,https://www.googleapis.com/auth/analytics.edit,https://www.googleapis.com/auth/analytics.provision,https://www.googleapis.com/auth/analytics.readonly,https://www.googleapis.com/auth/accounts.reauth"
Note: You will receive an error message informing you that the Cloud Resource Manager API has not been used or enabled for your project, similar to the following:
ERROR: (gcloud.auth.application-default.login) User [@.com] does not have permission to access projects instance [<gcp_project_ID>:testIamPermissions] (or it may not exist): Cloud Resource Manager API has not been used in project <gcp_project_id> before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/cloudresourcemanager.googleapis.com/overview?project=<gcp_project_id> then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.
In the next step, the Cloud Resource Manager API will be enabled, and your credentials will then work.
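Alternatively, if you prefer to enable the API from Cloud Shell yourself instead of clicking the console link in the error message, a single command does it (assuming gcloud points at the affected project):
gcloud services enable cloudresourcemanager.googleapis.com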
- Run the following script to create a Terraform remote backend.
SOURCE_ROOT=${HOME}/${REPO}
cd ${SOURCE_ROOT}
scripts/generate-tf-backend.sh
- Create the Terraform variables file by making a copy of the template, then set the Terraform variables. To do that, run the following commands:
TERRAFORM_RUN_DIR=${SOURCE_ROOT}/infrastructure/terraform
cp ${TERRAFORM_RUN_DIR}/terraform-sample.tfvars ${TERRAFORM_RUN_DIR}/terraform.tfvars
- Now open the terraform.tfvars file and edit the variables according to your GCP project ID and Google Analytics account. You can open the file with nano; make sure every variable is updated correctly.
- Our website does not serve ads, so we declared an empty variable for the Ads export data in terraform.tfvars (i.e., source_ads_export_data = []); if your site also runs ads, update this variable accordingly. A sketch of the edited file follows.
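As a rough sketch, the edited part of terraform.tfvars might look like the lines below. Only source_ads_export_data and deploy_pipelines are names taken from this guide; treat the other variable names as illustrative and match them against the names actually present in terraform-sample.tfvars:
project_id = "<your-gcp-project-id>" # illustrative name; check the sample file
ga4_property_id = "<your-ga4-property-id>" # illustrative name; check the sample file
source_ads_export_data = [] # empty because our site serves no ads
deploy_pipelines = true # set to false later to undeploy the ML pipelines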
- After updating the variables, run Terraform to create the resources. To do that, run the following commands:
terraform -chdir="${TERRAFORM_RUN_DIR}" init
terraform -chdir="${TERRAFORM_RUN_DIR}" apply
- If the execution does not succeed end to end on the first attempt, re-run the apply until everything is deployed successfully.
- Now run the following command in the terminal:
terraform output
and it will print the link to the Looker Studio dashboard where you can view your data.
Post Installation Instructions
After all assets are deployed successfully in your Google Cloud project, trigger the Cloud Workflow to start the data flow, or wait for its scheduled execution.
Data requirements:
- Looker Studio Dashboard: deploy it only after the previous installation steps have completed successfully.
- ML pipeline: requires the views and tables to exist so that it can read their schemas and apply column transformations.
To start the data flow manually, run the Cloud Workflow by following these steps:
- Go to Google Cloud console > Workflows page.
- Find the dataform-prod-incremental workflow, then click the three dots > Execute Workflow.
- For large datasets (>XXX GBs), processing may take a while. Ensure the workflow completes before starting the next steps.
- Invoke the BigQuery stored procedures for backfilling:
- Go to the BigQuery page and run queries that invoke the procedures for the customer lifetime value (LTV) tables, purchase propensity tables, and audience segmentation tables, following the pattern sketched after this list.
- Backfilling a large GA4 BigQuery dataset can take several hours. Ensure the procedures start without errors.
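As a purely illustrative pattern for invoking a backfill procedure from Cloud Shell, a BigQuery CALL statement via the bq CLI looks like this; the project, dataset, and procedure names are placeholders, so use the actual names listed in the repository's documentation:
bq query --use_legacy_sql=false 'CALL `<your-project>.<feature-store-dataset>.<backfill-procedure-name>`()'
You can also paste the same CALL statement directly into the BigQuery console's query editor.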
- Redeploy ML pipelines with Terraform:
- In the code editor, change deploy_pipelines from true to false in ${TERRAFORM_RUN_DIR}/terraform.tfvars.
- Undeploy the ML pipelines with the command:
terraform -chdir="${TERRAFORM_RUN_DIR}" apply
- To redeploy, revert the change in terraform.tfvars and apply the Terraform configuration again.
- Run the session-resume.sh script to set up a new session with the necessary variables and credentials, and follow the authentication workflow if prompted.
Resources Created
At this time, the Terraform scripts in this folder perform the following tasks:
- Enables the required APIs.
- Creates the IAM bindings needed for the GCP services used.
- Creates a secret in GCP Secret Manager for the private GitHub repo.
- Creates a Dataform repository connected to the GitHub repo.
- Deploys the marketing data store (MDS), feature store, ML pipelines, and activation application.
Looker Studio Dashboard
The Looker Studio dashboard is intended to be a starting point for the visualization of your Marketing Analytics Jumpstart output using some common descriptive analytics for GA4.
When you click the Looker Studio link (which you got from the terraform output), it will take you to the Looker Studio page, where you can visualize all the details.
A default dashboard design will be shown, where you can visualize some basic details. If you want to modify the design and add more information to it, click the Edit and share button at the top right of the page.
After clicking the Edit and share button, a popup will appear that will ask you to review the data source configuration settings. Review the data source configuration settings and then click on Acknowledge and Save to continue.
Once editing is enabled, you can add or remove data from the existing charts. You can also add new charts or update the existing ones according to the data coming in from your BigQuery dataset.
Key Takeaways
The Marketing Analytics Jumpstart on Google Cloud equips businesses with the tools necessary to transform their digital marketing data into actionable insights. By following this guide, you can set up a comprehensive marketing analytics solution that leverages machine learning to predict customer behaviors and preferences, ultimately leading to more effective advertising strategies.
Reference Section
For detailed instructions and further exploration, the following resources are invaluable:
- Google Cloud Documentation
- Terraform by HashiCorp
- Google Analytics 4 Help Center
- BigQuery Documentation
This guide provides a foundational understanding of setting up marketing analytics on Google Cloud. By leveraging these tools and services, businesses can gain deeper insights into their marketing efforts and drive more value from their digital advertising budgets.