Unlock the Power of Machine Learning Without Coding: A Beginner’s Guide to BigQuery ML

Demystifying machine learning for data analysts — build predictive models directly within your data warehouse

Brian Ling
Google Cloud - Community
4 min readApr 1, 2024

--

A vibrant illustration of a data warehouse with colorful machine learning algorithms swirling around it, representing the ease of integration

As a data analyst, you’re constantly seeking insights to drive better business decisions. But traditional machine learning often means complex coding, separate environments, and a reliance on specialized skills that your team might not possess. What if you could tap into the power of predictive modeling without leaving the comfort of your familiar data warehouse?

Introduction

BigQuery ML (BQML) opens the door to machine learning for those who are experts in SQL. It bridges the gap between data analysts and machine learning specialists, allowing you to create, train, and deploy a variety of powerful machine learning models directly within Google Cloud’s BigQuery.

Purpose

This blog post will guide you through a hands-on exploration of BigQuery ML. We’ll cover the basics, walk you through a practical use case, and discuss its potential to revolutionize how you use your data.

Use Cases

  • Predicting customer churn: Identify customers at risk of leaving.
  • Fraud detection: Uncover unusual patterns in financial transactions.
  • Demand forecasting: Predict future sales to optimize inventory.
  • Sentiment analysis: Understand customer feedback trends.

Skill Prerequisites

  • Basic understanding of SQL.
  • Familiarity with BigQuery and Google Cloud Platform (GCP).

Disclaimer

BigQuery ML is a powerful tool, but it’s important to use it responsibly. Ensure your data is unbiased and representative of real-world scenarios to avoid inaccurate or discriminatory predictions.

Step-by-Step Walkthrough

Prerequisites

  • A Google Cloud Platform project with billing enabled.
  • BigQuery access and the necessary IAM permissions.
  • A dataset in BigQuery to train your model.

Architecture Diagram

Creating a Machine Learning Model in BigQuery ML

  1. Create your dataset
  • To create a dataset, click on the View actions icon next to your project ID and select Create dataset.
  • Name your Dataset ID bqml_lab and click Create dataset.

2. Create a model

  • Go to BigQuery EDITOR, paste the following query to create a model that predicts purchase likelihood:
CREATE OR REPLACE MODEL bqml_lab.sample_model
OPTIONS(model_type='logistic_reg') AS
SELECT
IF(totals.transactions IS NULL, 0, 1) AS label,
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(geoNetwork.country, "") AS country,
IFNULL(totals.pageviews, 0) AS pageviews
FROM
bigquery-public-data.google_analytics_sample.ga_sessions_*
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170631'
LIMIT 100000;

Explanations:

  • bqml_lab is the dataset, sample_model is the model name.
  • We’re using binary logistic regression (model_type=’logistic_reg’).
  • label is what we aim to predict (purchases).
  • Features include device OS, mobile status, country, and pageviews.

3. Evaluate your model:

  • Replace the previous query with the following and click Run:
SELECT
*
FROM
ML.EVALUATE(MODEL `bqml_lab.sample_model`, (
SELECT
IF(totals.transactions IS NULL, 0, 1) AS label,
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(geoNetwork.country, "") AS country,
IFNULL(totals.pageviews, 0) AS pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));
  • When the query is complete, click the Results tab below the query text area. You should see a table similar to this:

Explanations:

  • Want to know how well your model performs? Check out these key terms: precision, recall, accuracy, f1_score, log_loss, roc_auc: You can consult the machine learning glossary for definitions.

4. Use your model to predict outcomes

  • With this query you will try to predict the number of transactions made by visitors of each country, sort the results, and select the top 10 countries by purchases:
SELECT
country,
SUM(predicted_label) as total_predicted_purchases
FROM
ML.PREDICT(MODEL `bqml_lab.sample_model`, (
SELECT
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(totals.pageviews, 0) AS pageviews,
IFNULL(geoNetwork.country, "") AS country
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10;
  • When the query is complete, click the Results tab below the query text area. The results should look like the following:

🎊Congratulations! You used BigQuery ML to create a binary logistic regression model, evaluate the model, and use the model to make predictions.

Outro

Recap key takeaways:

  • BigQuery ML allows you to build machine learning models using SQL.
  • No specialized machine learning expertise is required.
  • BQML models are easily integrated into your existing BigQuery workflows.

Call to Action/Next Steps

  • Explore other BQML model types: Experiment with classification, time series forecasting, and more.
  • Dive deeper into model evaluation and optimization techniques.

👋 By the way, if you happen to be a startup owner who is actively seeking to propel your business to new heights with Cloud:

We invite you to join our exclusive virtual live workshops (links below), where you’ll gain hands-on guidance from Google Cloud experts and discover how to seamlessly integrate GCP into your operations. Don’t miss this limited-time opportunity to empower your startup with the knowledge and expertise needed to thrive in the cloud-driven world. Register now and secure your spot!

⚒️Startup Onboarding Workshop

⛑️Generative AI Workshop

🔑Data Analytics Workshop

🔐Security Workshop

📠Modern Applications Workshop

--

--

Brian Ling
Google Cloud - Community

Engineer at Google. Passionate about AI/ML, Data Analytics