Artificial Intelligence on Google Cloud Platform

Srivatsan Srinivasan
DataDrivenInvestor


There is little doubt that the future of AI is in the cloud. The cloud, together with the data that fuels business knowledge, brings a new degree of accessibility to AI technology.

Why use the cloud for AI in general?

Scale: Instant access to hundreds of compute instances.

Speed: Easy availability of specialized devices such as GPUs and TPUs that help accelerate AI development.

Cloud AI APIs: A quick start on complex tasks rather than building from scratch. For cases such as speech-to-text or language translation, an enterprise may also lack the data needed to build models as accurate as those available in the cloud.

Cloud AutoML: Train high-quality models specific to business needs, with the work done by citizen data scientists or even business users.

Cloud Bursting: With advances in hybrid cloud, start small in the local data center and use the cloud to scale AI compute.

In this article, we focus on AI and related services offered by Google Cloud Platform. Let us start by looking at the Google Cloud AI building blocks following the recent announcements at Next '19.

Some of the key additions to Google Cloud Platform's AI capabilities were AI Platform, which enables seamless creation of end-to-end machine learning pipelines; AutoML Tables, which automatically builds and deploys machine learning models on structured data; and support for new ML algorithms in BigQuery ML.

Below is a summary of the AI capabilities added or enhanced as part of the Google Next '19 announcements.

If data scientists are the lifeblood of today's data-driven enterprise, then data engineers are the veins carrying the clean blood that machine learning algorithms need to be useful.

AI development and training is a relatively small fraction of the end-to-end machine learning life cycle. Data ingestion, data engineering, feature engineering, data analysis and validation, model performance monitoring, and model deployment are where a typical data engineer and data scientist spend more than 90% of their time.

While this article focuses on AI capabilities, let us quickly look at how different but integrated Google services make end-to-end ML possible.

An interesting aspect of these services is how well they integrate with each other to create a seamless pipeline. One example is TensorFlow Transform, which makes full passes over the input data during model training and exports the resulting transformations as a TensorFlow graph that is applied to single instances during serving. This prevents training-serving skew, because the same transformations are applied during both stages.
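The idea behind avoiding training-serving skew can be sketched in a few lines of plain Python. This is a conceptual illustration only and does not use the TensorFlow Transform API; the `analyze` helper, the `pageviews` column, and the sample rows are all hypothetical. A full pass over the training data computes the statistics once, and the returned function carries them unchanged into serving.

```python
import math
from typing import Callable, Dict, List

Row = Dict[str, float]


def analyze(training_rows: List[Row], column: str) -> Callable[[Row], Row]:
    """Full pass over the training data: compute mean and std once, then
    return a transform function with those statistics frozen in."""
    values = [row[column] for row in training_rows]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance) or 1.0  # fall back to 1.0 for a constant column

    def transform(row: Row) -> Row:
        scaled = dict(row)
        scaled[column] = (row[column] - mean) / std
        return scaled

    return transform


# Hypothetical training data with a single numeric feature.
training_rows = [{"pageviews": 2.0}, {"pageviews": 4.0}, {"pageviews": 6.0}]
transform = analyze(training_rows, "pageviews")

# Training time: the transform is applied to the whole dataset.
train_scaled = [transform(row) for row in training_rows]

# Serving time: the same frozen function handles a single instance,
# without recomputing any statistics from the request.
serving_scaled = transform({"pageviews": 4.0})
print(train_scaled[1] == serving_scaled)
```

Because the serving path reuses the function produced by the training-time pass, a value seen in training and the same value seen in a request are scaled identically.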

Let us now discuss the key GCP capabilities recently announced at Google Next.

AutoML Tables

AutoML Tables enables an entire team of data scientists, analysts, and developers to automatically build and deploy state-of-the-art machine learning models on structured data at massively increased speed and scale. Much of the ML workflow is automated, including:

  • Detecting schema and class distribution
  • Detecting missing values and outliers
  • A codeless interface that makes it easier for a wide range of personas, not only data scientists, to build models
  • Seamless deployment of machine learning models
  • Model interpretation using model output and a feature importance graph
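To make the first two items concrete, here is a minimal sketch of the kind of checks involved: class distribution, missing-value counts, and outlier flagging. It is not how AutoML Tables implements them. The function names, the 1.5 × IQR outlier rule, and the sample data are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Optional


def class_distribution(labels: List[int]) -> Dict[int, float]:
    """Fraction of rows that fall into each class."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def profile_column(values: List[Optional[float]]) -> Dict[str, object]:
    """Count missing values and flag outliers with the 1.5 * IQR rule."""
    present = sorted(v for v in values if v is not None)
    missing = len(values) - len(present)
    if len(present) < 4:  # too few points for meaningful quartiles
        return {"missing": missing, "outliers": []}
    half = len(present) // 2
    q1 = median(present[:half])   # median of the lower half
    q3 = median(present[-half:])  # median of the upper half
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers}


# Hypothetical data: one missing value and one extreme value.
pageviews = [1.0, 2.0, None, 3.0, 2.0, 4.0, 250.0, 3.0]
labels = [0, 0, 0, 1, 0, 0, 0, 0]

report = profile_column(pageviews)
dist = class_distribution(labels)
print(report)
print(dist)
```

A skewed class distribution like the one in this sample is worth knowing about before training, since it affects which evaluation metrics are meaningful.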

The diagram below summarizes the simplicity of AutoML Tables. Once you have your model dataset, most of the activity is UI-guided with minimal or no coding.

AutoML Tables also supports automated feature engineering for most data types.

Currently, it runs the following algorithms against the input dataset, based on the selected configuration parameters:

  • Logistic and Linear Regression
  • Feedforward Deep Neural Network
  • Wide and Deep NN
  • Gradient Boosted Decision Trees (GBDT)
  • DNN + GBDT

Depending on the complexity of the data, it might also run neural and tree architecture search.

BigQuery ML

BigQuery ML brings ML to the data. Models are trained and accessed in BigQuery using SQL. BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets.

While AutoML Tables requires no knowledge of ML, BigQuery ML requires a basic understanding of it.

This illustration shows the different ML capabilities along with their user personas. Note that Cloud ML Engine is now called AI Platform Training.

Building a model in BigQuery ML is a simple three-step process.

Step 1: Create a Model

-- Train a logistic regression model on sessions from 2016-08-01 through 2017-06-30.
CREATE MODEL `bqml_tutorial.sample_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
  -- Label is 1 when the session recorded a transaction, 0 otherwise.
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'

Step 2: Evaluate the created model

-- Evaluate on sessions from 2017-07-01 through 2017-08-01, outside the training range.
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IF(totals.transactions IS NULL, 0, 1) AS label,
      IFNULL(device.operatingSystem, "") AS os,
      device.isMobile AS is_mobile,
      IFNULL(geoNetwork.country, "") AS country,
      IFNULL(totals.pageviews, 0) AS pageviews
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))

Step 3: Predict using the final model

-- Sum the predicted purchase labels per country and return the top 10 countries.
SELECT
  country,
  SUM(predicted_label) AS total_predicted_purchases
FROM
  ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IFNULL(device.operatingSystem, "") AS os,
      device.isMobile AS is_mobile,
      IFNULL(totals.pageviews, 0) AS pageviews,
      IFNULL(geoNetwork.country, "") AS country
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10

Model performance and metrics can be tracked in the BigQuery UI, which provides details such as the confusion matrix, the ROC curve, and the precision-recall curve, among others.
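For readers less familiar with these metrics, the following sketch shows how a confusion matrix, precision, and recall are derived from binary labels. It is a plain Python illustration with made-up labels and hypothetical function names, not the computation BigQuery itself runs.

```python
from typing import Dict, List, Tuple


def confusion_matrix(actual: List[int], predicted: List[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(counts: Dict[str, int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: 1 = session with a purchase, 0 = session without one.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_matrix(actual, predicted)
precision, recall = precision_recall(counts)
print(counts, precision, recall)
```

Recall is also the true positive rate plotted on the ROC curve, so each classification threshold contributes one point to that curve and one to the precision-recall curve.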

Finally, most of the steps shown below happen behind the scenes during the three-step model creation.

Below are the supported algorithms and the roadmap, as highlighted at Google Next '19.

AI Platform

AI Platform provides seamless creation of an end-to-end ML pipeline, from ingesting data to preparing, discovering, training, and deploying ML models. The images below summarize the AI Platform end-to-end process.

AI Platform comes with managed notebook instances that are integrated with BigQuery, Cloud Dataproc, and Cloud Dataflow, making it easy to go from data ingestion to pre-processing and exploration, and eventually to model training and deployment.

AI Platform supports Kubeflow, which lets you build portable ML pipelines that you can run on-premises or on Google Cloud without significant code changes. Below are the services available as part of AI Platform that help build an end-to-end machine learning pipeline.

You also have access to AI technologies such as TensorFlow and TensorFlow Extended (TFX) tools as you deploy your AI applications to production. If you want to know more about TFX, check my multi-part series on this topic.

Keep watching for future installments of the TFX series.

There were a few other announcements in the AI space. Here is a quick rundown of the key ones.

AutoML Natural Language: Custom entity extraction lets you identify custom fields in the input text.

AutoML Vision: Object detection identifies multiple objects and provides bounding box coordinates.

Cloud AI solutions: Introduced Recommendations AI and Document Understanding AI.

Document Understanding AI enables companies to digitize, classify, and extract knowledge from documents. It also helps to organize and store knowledge graphs and other extracted data for easy search, query, consumption, and actionable insights.

A nice representation of the Document Understanding AI solution architecture is below.

A few other products with new enhancements or capabilities are highlighted below. You can check the References section below for more information on the newly added features.

References
