Digital Analytics

BigQuery | CXL Review

Gain insights with real-time and predictive analytics

Sandra Simonovic
Plus Marketing

--

Google BigQuery official logo

BigQuery course is a part of CXL’s Digital Analytics Minidegree Program. The instructor is Khrystyna Grynko.

The course duration is two hours, and it’s divided into six lessons. The goal of this course is to teach you to consolidate your data in one place and automate your marketing reports. It will help you develop the skillset to become a data-driven marketer. You will learn more about Google Cloud and its possibilities and discover the full analytics process from data collection to data visualization.

Table of content:

  • Introduction to the Google Cloud Platform
  • Introduction to the Google BigQuery
  • Get your data into BigQuery
  • Work with data in BigQuery
  • Query and analyze your Google Analytics data with BigQuery
  • Build automated reports and dashboards using your BigQuery data
  • Closing

Introduction to the Google Cloud Platform

Google Cloud consists of a set of physical assets (computers and hard disk drives) and virtual resources (virtual machines) stored in Google’s data centers worldwide.

The Cloud Console is the graphical interface you can use to create and manage BigQuery resources and run SQL queries.

Google Cloud uses a pay-as-you-go pricing model, which means you only pay for the services you use.

The list of available Google Cloud services is too long to be listed here (also, they keep extending it). For data analytics, there are:

  • BigQuery,
  • Looker,
  • Dataproc,
  • Dataflow,
  • Dataprep,
  • Cloud Data Fusion, and others.
Image source: CXL

Introduction to the Google BigQuery

The BigQuery sandbox gives you free access to the power of BigQuery within the sandbox’s limits. The BigQuery sandbox is available to any Google Cloud customer.

Note that the BigQuery sandbox has some limitations:

  • All BigQuery Quotas and limits apply.
  • You have the same free usage limits as the free tier: 10 GB of active storage and 1 TB of processed query data per month.
  • All datasets have the default table expiration time and the default partition expiration set to 60 days. Any tables, views, or partitions in partitioned tables automatically expire after 60 days.

Also, be aware that Sandbox projects do not support:

  • Streaming data
  • Data Manipulation Language (DML) statements
  • The BigQuery Data Transfer Service

Upgrade your project and remove or update the expiration time for your resources to eliminate the sandbox limits. Let me remind you that after upgrading from the sandbox BigQuery Quotas and limits still apply.

Learn more:

More about BigQuery SandBox

BigQuery public datasets

Build your data schema

Get your data into BigQuery

Google Services for BigQuery data import

  • BigQuery Data Transfer Service: Google Play, Google Ads (2.5$ per month per customerID), Youtube (5$ per month per channel)
  • Google Analytics 360
  • Cloud Storage (no charges, only for storage)
  • Google Cloud Dataflow (pipeline to write data to BigQuery)
  • Google Cloud Function

Also, you can do BigQuery data import from the Local Machine.

Load data from a readable data source:

  1. You can load data manually
  2. You can use API or Client Libraries (create connectors or use paid connectors)

API is a set of clearly defined methods of communication among various components — applications, software

Batch task automation without manual work

BigQuery Data Transfer

Load data from Google (except Analytics) and Amazon sources:

  1. In BigQuery go to Transfers
  2. Choose data source
  3. Fill in all the data source and data destination information
  4. Connect to your Google Account
  5. Save

Note:

  • You can delete/pause your data transfer at any time
  • Check Run History to make sure that everything works well

BigQuery data streaming

Load data immediately without delaying (for real-time reports):

  • You can create your custom solution
  • Use Google solutions (Google Dataflow)
  • Or use paid solutions

Use-cases: mobile application — sending errors events in real-time

BigQuery DataFlow and pub/sub

Pub/Sub is a fully-managed real-time messaging service that allows you to send and receive messages between independent applications. Cloud Pub/Sub Documentation

Google Cloud Dataflow is a cloud-based data processing service for both batch and real-time data streaming applications. It enables developers to set up processing pipelines for integrating, preparing, and analyzing large data sets, such as those found in Web analytics or big data analytics applications. What is Google Cloud Dataflow?

Streaming into BigQuery:

  1. Create Cloud Pub/Sub topic
  2. Read from Pub/Sub in Dataflow
  3. Compute windowed aggregates
  4. Stream into BigQuery

Work with data in BigQuery

To work with data in BigQuery, you will have to learn some basic SQL language

SQL is a programming language that gives you the possibility to communicate with your data.

If you are not familiar with SQL, you will have to dedicate some time to master it. Here is the video that can help you start:

Learn more:

Creating views | BigQuery

Query syntax in Standard SQL | BigQuery

What is BigQuery BI Engine?

Expressions, functions, and operators in Standard SQL | BigQuery

Sample CSV Files / Data Sets for Testing (till 5 Million Records) — Sales

Query and analyze your Google Analytics data with BigQuery

Google Analytics 360 and Google Analytics 4 give you the possibility to store your data in Google BigQuery. If you have a Google Analytics standard version (known as Universal Analytics), you will need to use external connectors to pull your data from Google Analytics to BigQuery.

A partitioned table is a table divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the number of bytes read by a query.

Denormalization is a strategy that database managers use to increase the performance of database infrastructure. It involves adding redundant data to a normalized database to reduce certain types of problems with database queries that combine data from various tables into a single table.

Learn more:

BigQuery Export for Analytics 360

Google Analytics sample dataset for BigQuery

Introduction to partitioned tables | BigQuery

Specifying nested and repeated columns

BigQuery Export schema

Build automated reports and dashboards using your BigQuery data

To create your reports in Google Spreadsheet, you have two possibilities:

  • Use a natural data connector with BigQuery (available only for G Suite users)
  • Use an add-on OWOX BI

If you have a paid version of Google Drive, you can connect to BigQuery directly from your Google Spreadsheet:

  1. Click on Connect to BigQuery,
  2. Select a project for which the billing is activated,
  3. You will see the query editor where you can edit a query,
  4. Click on Insert results.

Visualize your BigQuery data in Google Data Studio

VIDEO: How to visualize data in BigQuery in Data Studio

Closing

Thanks to this course, you will master consolidating your data in one place and automating your marketing reports. By learning more about Google Cloud solutions you will become a competent data-driven marketer.

CXL BigQuery course will help you discover the full analytics process, from data collection to data visualization.

--

--