Integrating BigQuery ML & Tableau

Matt Collins
Oct 4, 2018 · 6 min read

Using BigQuery ML and Tableau for dynamic model predictions.

Source: rawpixel on Unsplash

Data Science Democratization

A trend that has emerged over the past couple of years in the analytics space is that of democratizing data science. In other words, enabling business users to garner insights from data that, many years ago, might have taken a room full of experts to extract.

At Google Next ’18, Google announced a new entrant in this democratization race — BigQuery ML. The ML component builds onto BigQuery, Google’s serverless data warehouse. As Google tells it:

BigQuery ML enables users to create and execute machine learning models in BigQuery using standard SQL queries. BigQuery ML democratizes machine learning by enabling SQL practitioners to build models using existing SQL tools and skills. BigQuery ML increases development speed by eliminating the need to move data.
Google Cloud Platform Website

This is an exciting prospect, as it will enable those who are familiar with SQL, but who may have never tried an object-oriented programming language such as Python or R, to utilize machine learning.

According to StackOverflow’s 2018 Developer Survey, in which over 100,000 developers responded to questions about their experiences and backgrounds, SQL ranked as the 4th most commonly used programming language. This is compared to Python at #7 and R at #19.

Source: StackOverflow

Given these results, it would appear that a technology like BigQuery ML could go a long way in enabling a large number of users.

BigQuery ML pairs well with another popular analytics tool — Tableau. Connecting a BigQuery ML model to Tableau creates some exciting opportunities for understanding and acting on machine learning predictions.

In this post, we’ll look at:

  • Creating a model in BigQuery
  • Understanding predictions in Tableau
  • Using Tableau parameters for dynamic predictions

To demonstrate a business use case, let’s explore using BigQuery ML to create a model to predict customer churn. We’ll use logistic regression to predict a binary value of whether we expect a given customer to discontinue service. Then we’ll use Tableau to understand the predictions and see how different actions might affect those predictions.

(Data used is from IBM Analytics Communities Sample Data Sets)

Creating a Model in BigQuery

Creating a machine learning model in BigQuery only requires a few commands in addition to a typical SELECT statement.

Here’s what the syntax looks like:

The bare minimum needed here is: naming the model; selecting linear or logistic regression; and setting the label (unless its actual name is already “label”). Not too bad, eh?

Following the example above, the code for our model would be as follows:

Here, we are training the model on every feature in our dataset except for our ID field — customerID.

Note: This post will not cover some very important stages of the machine learning process such as feature engineering, hyper-parameter tuning, cross validation or model performance monitoring. We will only cover what is needed to stand up an initial model.

After running the query, BigQuery allows you to view information on your model such as size, training statistics, and the model schema:

Understanding Predictions in Tableau

Now that we have a working model to predict customer churn, we can bring it into Tableau and predict outcomes for a new sample of customers.

The dataset we want to predict is also loaded into BigQuery and contains data on separate customers from those in our training data. We’ll bring this data into Tableau and use Tableau’s Custom SQL feature to call our model and predict churn for this new subset of customers.

The above Custom SQL query is all it takes to call our model and use it to make predictions on our new test data. The first column in the data preview window — Churn? — is our newly predicted value.

Now that we have predicted values, we can use Tableau to visualize these predictions and see how they relate to our data.

Using the predicted values, I created a simple dashboard to view some metrics related to predicting customer churn:

Looking at this dashboard, we can see that about 17.2% of our customers are expected to churn the next cycle. We’ll look at how different actions might effect these predictions a bit later.

Another thing you’ll notice in the dashboard above is that all of the customers predicted to churn are on month-to-month contracts. Additionally, looking at churn rates by tenure at the bottom right of the dashboard, we can see that we expect to lose almost half — 44% — of our customers that have a tenure between zero and six months. This rate drops significantly, however, once you pass the 6-month and 1-year tenure mark.

Based on this knowledge, it seems that converting subscribers off month-to-month plans and into plans that result in them staying longer than one year could have an impact on our predicted churn rates.

Using Tableau Parameters for Dynamic Predictions

Given that we want to get customers into longer-term contracts, an option may be to offer a special 1- or 2-year promotional pricing plan targeted to current month-to-month customers. Using parameters in Tableau, we can dynamically see how a response like this would impact our predicted churn.

Below, I have created another version of the dashboard we looked at earlier:

On the right-hand side, there are now two parameter options to select. The options I’ve included here are:

  • % to Convert — the % of current month-to-month customers we expect to convert based on previous promotion results
  • Convert to Contract — the contract length we will offer to current month-to-month customers

Let’s see how changing the parameters can impact our predictions:

As you can see above, converting users off of month-to-month contracts has a positive impact on reducing our predicted churn rates. Let’s say a conservative estimate would be to convert 20% of our month-to-month customers to a 1-year contract. Using those parameters, we could expect to see our overall predicted churn rate drop from 17.2% to 14.9%.

Bigger Picture

In closing, the capabilities illustrated in this post provide a glimpse of how ML can be combined with other elements of BI and how these capabilities will continue to shape the analytics landscape. The integration of tools like BigQuery ML and Tableau provide an exciting outlook on the current state of machine learning and its continued march into the forefront of business intelligence and decision making.

Matt Collins

Written by

Data Science @ Slalom | NBA Enthusiast

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade