Why BigQuery ML is Such a Big Deal

Spiros P
Published in Reblaze Blog
Jul 27, 2018

During the Wednesday morning keynote at Google Cloud Next ’18, Google had several major product announcements, including BigQuery Machine Learning (BQML). It’s an extremely important innovation in Machine Learning (ML).

Reblaze Technologies played a significant role in this announcement, in more ways than one. For example, later in that same keynote, Google announced Cloud Armor — a robust new framework for web application security. That announcement included a live DDoS attack on a site protected by Reblaze and Cloud Armor. And although it wasn’t mentioned during the demo, Reblaze was using BQML as part of its threat detection engine.

Back to the BQML announcement itself. Months ago, Google invited Reblaze to participate in the BQML closed pre-alpha. Reblaze’s use cases are excellent examples of mission-critical, real-time analyses that push the limits of what’s possible with Machine Learning. (More on this below.)

During the alpha, Reblaze personnel submitted numerous feedback items and feature requests. As of Wednesday, BQML was ready for, and has been released into, public beta.

So what is BQML? Imagine a version of SQL that makes a Big Data trove as accessible as any other backend database, and that also lets you apply Machine Learning to it without worrying about the underlying complexity of putting an ML process in place. That’s the power available via BQML.

Serverless Big Data capabilities have been available for some time now. (Google Data Lake and BigQuery are obvious examples of these.) They offer many benefits:

  • Users can capture and access as much data as desired, without needing to design, build, or maintain any infrastructure.
  • Distributed storage provides redundancy, scalability, global access, and other advantages.
  • Users also get the other advantages of SaaS, including a low cost structure, remote management, and more.

These capabilities have made it easy to capture and store enormous quantities of data. The difficulty arises when actually trying to make use of that data, and analyze it to obtain useful business insights.

With BQML, Google is automating the application of Machine Learning to Big Data troves. There is now a single environment for analytics, feature engineering, storing training data, and running predictions, and its syntax is based on SQL. If you can use SQL, you will feel very comfortable with BQML.

With BQML, the overall ML process is much smoother. Everything is stored in the same place: source data, training data, and results. Everything is run in the same environment: training, feature engineering, and predictions.
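To make the "train and predict in one place" idea concrete, here is a minimal sketch in Python that only builds the two SQL statements involved: a CREATE MODEL statement for training and an ML.PREDICT query for scoring. The dataset, table, model, and column names are hypothetical and chosen purely for illustration; they are not Reblaze's actual schema.

```python
# Minimal sketch: build the two BQML statements as plain SQL strings.
# Every dataset, table, model, and column name here is hypothetical.

def create_model_sql(model: str, source_table: str, label: str, features: list[str]) -> str:
    """Return a CREATE MODEL statement that trains a binary logistic regression."""
    columns = ", ".join(features + [label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type='logistic_reg', input_label_cols=['{label}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`"
    )


def predict_sql(model: str, input_table: str, features: list[str]) -> str:
    """Return a query that scores new rows with an already trained model."""
    columns = ", ".join(features)
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT {columns} FROM `{input_table}`))"
    )


# Hypothetical per-session traffic features and label.
FEATURES = ["requests_per_minute", "distinct_paths", "error_ratio"]

train_stmt = create_model_sql(
    "security.threat_model", "security.labeled_sessions", "is_attack", FEATURES
)
predict_stmt = predict_sql("security.threat_model", "security.new_sessions", FEATURES)

print(train_stmt)
print(predict_stmt)
```

Both statements read from and write to the same BigQuery project, so the source data, the trained model, and the predictions never leave that environment.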

This new product is not merely a great tool for data scientists; it also offers tremendous benefits to real-world applications. For example, Reblaze is a cloud-based web application security platform. It uses ML to constantly analyze global web traffic, processing over three billion HTTP/S transactions per day in order to recognize and harden itself against new attack patterns. Obviously, any innovation in ML can be beneficial for a use case such as this.

But Reblaze has additional challenges that make BQML especially useful. Reblaze is a single-tenant platform: unlike other cloud security solutions, Reblaze provides a unique Virtual Private Cloud for every account. (This eliminates the multi-tenancy vulnerabilities that other security providers suffer from.) Each customer gets an entire dedicated stack, for that customer’s exclusive use alone.

Therefore, in addition to a unique cluster for real-time traffic processing, each customer has also required an entire ML facility of its own: Kubernetes clusters for data analysis, processes for loading data from BigQuery, analyzing it, and pushing back the results, and so on.

BQML makes this unnecessary. Each customer can now receive the benefits of Machine Learning for adaptive, accurate threat detection, without needing a separate environment for it. All the ML can occur within the same environment that the data itself is stored in. And it can all be automated, and operated programmatically.
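As a rough illustration of what "automated, and operated programmatically" could look like for a single-tenant platform, the sketch below loops over customers and submits one training statement per customer. It assumes a hypothetical layout of one dataset per customer named tenant_<id>, and it takes the query runner as a parameter so the example runs without any cloud access. In real use, that callable might wrap the google-cloud-bigquery client, for example `lambda sql: client.query(sql).result()`.

```python
from typing import Callable, Iterable


def training_statement(customer_id: str) -> str:
    """Build the retraining statement for one customer.

    Assumes a hypothetical layout: one dataset per customer, named tenant_<id>,
    holding a labeled_sessions table with an is_attack label column.
    """
    dataset = f"tenant_{customer_id}"
    return (
        f"CREATE OR REPLACE MODEL `{dataset}.threat_model`\n"
        "OPTIONS (model_type='logistic_reg', input_label_cols=['is_attack']) AS\n"
        f"SELECT * FROM `{dataset}.labeled_sessions`"
    )


def retrain_all(customer_ids: Iterable[str], run_query: Callable[[str], None]) -> int:
    """Submit one training statement per customer and return how many were submitted."""
    count = 0
    for customer_id in customer_ids:
        # Reject IDs that could break out of the backtick-quoted identifier.
        if not customer_id.isalnum():
            raise ValueError(f"unexpected customer id: {customer_id!r}")
        run_query(training_statement(customer_id))
        count += 1
    return count


# Stand-in runner that just records the SQL instead of sending it anywhere.
submitted: list[str] = []
retrained = retrain_all(["acme", "globex"], submitted.append)
print(retrained, "training statements submitted")
```

Passing the runner in as a parameter keeps the per-customer logic separate from the BigQuery connection, which makes the loop easy to test and to schedule.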

(Note: this isn’t yet universal. Currently, BQML supports only two model types: linear regression with regularization, and binary logistic regression with regularization. Until clustering is added to BQML, hopefully soon, Reblaze will still use its own facilities for it.)
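The two model types and their regularization settings map onto the OPTIONS clause of a CREATE MODEL statement. The sketch below builds that clause and rejects anything outside the two types named above. The validation is this example's own guard, not something BQML requires of callers, and the label column name is hypothetical.

```python
# The two model types available in the BQML beta described in this article.
SUPPORTED_MODEL_TYPES = {"linear_reg", "logistic_reg"}


def model_options(model_type: str, label: str, l1_reg: float = 0.0, l2_reg: float = 0.0) -> str:
    """Build the OPTIONS clause for a CREATE MODEL statement.

    l1_reg and l2_reg set the strength of L1 and L2 regularization.
    """
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model type: {model_type!r}")
    if l1_reg < 0 or l2_reg < 0:
        raise ValueError("regularization strengths must be non-negative")
    return (
        f"OPTIONS (model_type='{model_type}', input_label_cols=['{label}'], "
        f"l1_reg={l1_reg}, l2_reg={l2_reg})"
    )


# Hypothetical label column for a linear regression with L2 regularization.
options = model_options("linear_reg", "bytes_sent", l2_reg=0.1)
print(options)
```

A request for a clustering model fails fast in this sketch, which mirrors the limitation noted above: that kind of analysis still has to run elsewhere.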

That’s an overview of some of the benefits that BQML offers for ML-intensive companies such as Reblaze. Stay tuned for future articles, where we’ll look at specific examples and in-depth case studies of our use of BQML.

If you’re a Reblaze customer, full BQML support is going out now in rolling updates.

If you aren’t currently a Reblaze customer, you can try Reblaze for free via Google Cloud Marketplace, where you can deploy Reblaze directly from your Cloud Console in just a few clicks.

Or you can learn more about Reblaze by contacting our sales team, who will be happy to schedule a demo or answer any questions you have. Just send an email to hello@reblaze.com, or fill out the form at https://www.reblaze.com/contact/. Someone will be in touch with you soon.
