Google Cloud Platform for SQL Practitioners

Mike Kahn
Google Cloud - Community
6 min readJun 6, 2019
Image borrowed from the Google Cloud Platform blog

The purpose of this article is to give an overview of the SQL features in Google Cloud Platform (GCP). This article is intended for anyone who currently uses SQL and is looking to understand the current options for using their SQL skills on Google Cloud Platform.

MS SQL Server on Windows Server

If you want to run MS SQL Server on GCP you have a few options, IaaS BYOL, IaaS, and fully managed. A few benefits of running MSSQL Server on Windows Server on GCP:

Bring your own license (IaaS BYOL)

If your organization has license requirements that limit physical hardware usage or a service provider license agreement (SPLA), you can use Google Compute Engine (GCE) sole-tenant nodes. These instances allow you to import a custom image into GCE, use it to start an instance, and enable in-place restarts so the VM restarts on the same physical server. By using sole-tenant nodes you ensure your VMs run on hardware fully dedicated to your use while limiting physical core usage. To prove your server usage for reporting Stackdriver monitoring allows determine server usage for reporting and export server IDs. More on using existing Microsoft Application Licenses.

Pay license with instance cost (IaaS)

If you do not have a physical hardware license requirement you can pay for SQL Server on Google Cloud Platform with your instance cost. This is the easiest way to get started with SQL Server on GCP with no upfront costs and per second billing. This is your flexible license option that allows you to pay for licensing the same way you pay for cloud infrastructure, pay as you go, pay only for what you use.

SQL Server Images on Google Cloud Platform (June 2019)

Cloud SQL for SQL Server (Fully Managed)

If you are looking to run SQL server in a fully managed service where the provider takes care of managing the instance for you Cloud SQL for SQL Server is being released and is in alpha as of writing this article. Cloud SQL takes care of backups, replication, patches and updates, and has features such as read replicas for scaling out. Cloud SQL has limits of 10TB for storage, so if you need beyond 10TB you’ll need to consider other options (or clean up your DB :)).

More on SQL server on Google Cloud Platform here.

Data Warehousing and Processing with SQL

We previously covered MSSQL server relational transaction processing use cases for SQL. Now we will cover analytics processing with SQL on GCP.

Bigquery is a very powerful serverless, highly scalable, data warehouse with in-memory BI Engine and machine learning built in on Google Cloud Platform. The idea with BigQuery is to focus on the analytics, not your infrastructure.

Enterprises use BigQuery to unlock insights and run blazing-fast SQL queries on gigabytes to petabytes of data. To learn more about BigQuery check out the product page here. We’ll go over some of the more specific features of BigQuery that may be interesting to SQL practitioners in the remainder of this article.

BigQuery Standard SQL

If you are an experienced SQL administrator you can use your SQL experience to operate BigQuery. BigQuery has a mode called Standard SQL which is compliant with the SQL 2011 standard, supports nested and repeated data, allows for user defined functions, and DML.

If you are a SQL administrator you can use SQL queries for data transformation, analytics, and even machine learning in BigQuery.

The syntax and features of both MS SQL Server and Standard SQL are very similar and in most cases you will be changing a function or data type such as DATEADD vs DATE_ADD in your query.

Sqllines.com has a good resource of SQL Server to MySQL migration reference here.

SQL server:

WHEN (PurchaseDate < DATEADD(MM,+1,GETDATE()) AND (PERIOD_DATE__A >= DATEADD(MM,+1,GETDATE()) OR A.PERIOD_DATE__A IS NULL))

SQL Standard:

WHEN (DATE(PurchaseDate) < DATE_ADD(Current_Date, Interval 1 month) AND ( a.PERIOD_DATE__A >= CAST(DATE_ADD(current_date, Interval 1 month) AS string) OR a.PERIOD_DATE__A = ‘’) )

More on Standard SQL Reference here.

BigQuery Machine Learning (BQML)

BigQuery ML (BQML) enables users to create and execute machine learning (ML) models in BigQuery using Standard SQL queries. BQML makes ML accessible to data analysis teams and enables SQL practitioners to build models using existing skills.

ML on large datasets typically requires python experience and knowledge of ML frameworks. These requirements have been restrictive in organizations to a small group of individuals and in many cases organizations just do not have these skill sets yet in existing data teams. Many of these organizations have data analysts who understand their companies data but have limited or aspiring machine learning and programming skillsets.

Analysts can use BigQuery ML to build and evaluate ML models in BigQuery. Analysts no longer need to export small amounts of data into spreadsheets or other applications or wait for limited resources from a data science team.

So what type of insights and machine learning can be done within BigQuery by SQL practitioners? To start understand in BigQuery ML, a model can be used with data from multiple BigQuery datasets for training and prediction.

Types of machine learning models that are supported by BigQuery ML:

  • Linear regression for forecasting. Example, sales of an item on a given day in the future.
  • Binary logistic regression for classification. Example, determining whether a customer will make a purchase.
  • Multiclass logistic regression for classification. Example, predict where an input is low, medium, or high-value.
  • K-means clustering for data segmentation. Example, identifying customer segments.

After you have a model (dataset) and you have ran the ML.EVALUATE function query, you can use your model to predict outcomes using the ML.PREDICT function.

Example query:

#standardSQL
SELECT
country,
SUM(predicted_label) as total_predicted_purchases
FROM
ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
SELECT
IFNULL(device.operatingSystem, “”) AS os,
device.isMobile AS is_mobile,
IFNULL(totals.pageviews, 0) AS pageviews,
IFNULL(geoNetwork.country, “”) AS country
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN ‘20170701’ AND ‘20170801’))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10

Result:

This is giving you total predicted purchases based on Google analytics data.

More on BigQuery ML here.

Cloud Dataflow SQL

Cloud Dataflow SQL lets you use SQL queries to develop and run Cloud Dataflow Jobs in the BigQuery UI. Dataflow SQL uses Beam SQL and this is big for SQL practitioners because previously you could only write Apache Beam or Dataflow pipelines with Java, Python, or Go languages. This functionality opens up a whole new area of data processing for SQL practitioners to explore.

What you can do with this:

  1. Develop and run streaming pipelines from the BigQuery UI in SQL
  2. Join streams (from Cloud PubSub) with snapshotted datasets (BigQuery tables)
  3. Write results into a BigQuery table for analysis and dashboards

Currently you can only read from a PubSub topic or BigQuery table and write to a BigQuery table however I am sure more data sources and destinations will be enabled in the future.

Some possible use cases for Cloud Dataflow SQL

  • Join application events in AppEngine via PubSub with other monitoring system datasets in BigQuery for richer system insights
  • Join the Google Analytics Real Time Reporting API with CRM or sales table data in BigQuery for marketing analytics

If you are experienced with SSIS and looking for a UI based ETL service, also consider Cloud Data Fusion. While it does not include SQL capabilities at this time it is a code free GCP native ETL pipeline service.

This article was intended to give an overview of some of the SQL options in GCP to show how SQL is supported in many ways on Google Cloud. Lastly, this week Google Cloud databases named a Leader in The Forrester Wave Database-as-a-Service, Q2 2019.

I hope this post was helpful to show some of the options that SQL practitioners can try on Google Cloud Platform!

--

--

Mike Kahn
Google Cloud - Community

Field Engineering Manager, Databricks. All views and opinions are my own. @mkahn5