Artificial Intelligence on Google Cloud Platform
There is little doubt that the future of AI is in the cloud. Combined with the data that fuels business knowledge, the cloud brings a new degree of accessibility to AI technology.
Why use the cloud for AI?
Scale: Instant access to hundreds of compute instances.
Speed: Easy access to specialized hardware such as GPUs and TPUs that can accelerate AI development.
Cloud AI APIs: A quick start on complex tasks instead of building from scratch. For use cases such as speech-to-text or language translation, an enterprise may also lack the data needed to build models as accurate as those available in the cloud.
Cloud AutoML: Train high-quality models tailored to business needs, built by citizen data scientists or even business users.
Cloud Bursting: With advances in hybrid cloud, start small in the local data center and use the cloud to scale AI compute.
In this article, we focus on AI and related services offered by Google Cloud Platform. Let us start by looking at the Google Cloud AI building blocks following the recent announcements at Google Cloud Next ’19.
Key additions to Google Cloud Platform's AI capabilities include three items. The first is AI Platform, which enables seamless creation of end-to-end machine learning pipelines. The second is AutoML Tables, which automatically builds and deploys machine learning models on structured data. The third is support for new ML algorithms in BigQuery ML.
Below is a summary of the AI capabilities added or enhanced as part of the Google Cloud Next ’19 announcements.
If data scientists are the lifeblood of today’s data-driven enterprise, then data engineers are the veins. They carry the clean blood that machine learning algorithms need in order to be useful.
AI development and training are a relatively small fraction of the end-to-end machine learning life cycle. A typical data engineer and data scientist spend 90% or more of their time on data ingestion, data engineering, feature engineering, data analysis and validation, model deployment, and model performance monitoring.
While this article focuses on AI capabilities, let us quickly look at how distinct but integrated Google services make end-to-end ML possible.
An interesting aspect of these services is how well they integrate with one another to form a seamless pipeline. One example is TensorFlow Transform. During training, it makes full passes over the input data to compute the statistics its transformations need. It then exports those transformations as a TensorFlow graph that is applied to single instances during serving. This prevents training-serving skew, because the same transformations are applied in both stages.
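To make the full-pass idea concrete, here is a minimal plain-Python sketch of the analyze/transform split. It is not the TensorFlow Transform API. TensorFlow Transform runs its analysis as a full pass with Apache Beam and exports a TensorFlow graph, while this sketch uses stand-in functions with the hypothetical names `analyze` and `transform`. Statistics are computed once over the whole training set. The same frozen statistics are then reused for a single serving instance, so both stages scale the value identically.

```python
import statistics
from typing import Dict, List


def analyze(train_values: List[float]) -> Dict[str, float]:
    """Full pass over the training data: compute the statistics the transform needs."""
    return {
        "mean": statistics.fmean(train_values),
        "std": statistics.pstdev(train_values),
    }


def transform(value: float, stats: Dict[str, float]) -> float:
    """Z-score one value using statistics frozen at training time."""
    std = stats["std"] or 1.0  # guard against a constant column
    return (value - stats["mean"]) / std


# Training: one full pass to compute statistics, then transform every row.
train_pageviews = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = analyze(train_pageviews)
train_features = [transform(v, stats) for v in train_pageviews]

# Serving: a single instance reuses the same frozen statistics, so it is
# scaled exactly as it would have been during training.
serving_feature = transform(9.0, stats)
print(stats, serving_feature)
```

A skew-prone alternative would recompute the mean and standard deviation from whatever data is available at serving time. A single instance cannot supply those statistics, which is why they are frozen during training.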
Let us now discuss the key GCP capabilities recently announced at Google Cloud Next.
AutoML Tables
AutoML Tables enables an entire team of data scientists, analysts, and developers to automatically build and deploy state-of-the-art machine learning models on structured data at massively increased speed and scale. The workflow is largely automated, including:
- Detecting the schema and class distribution
- Detecting missing values and outliers
- A codeless interface that lets a wide range of personas, not only data scientists, build models
- Seamless deployment of machine learning models
- Model interpretation using model output and feature importance graphs
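The profiling steps in the list above can be approximated in a few lines. This is a hypothetical plain-Python sketch, not how AutoML Tables implements them. It infers a coarse type for each column, counts missing values, reports the label's class distribution, and flags numeric outliers. The outlier check uses the 1.5 × IQR rule, which is an illustrative choice of this sketch. The function names `infer_type`, `iqr_outliers`, and `profile` are made up.

```python
import statistics
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def infer_type(values: List[object]) -> str:
    """Coarse type inference over the non-missing values of a column."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, bool) for v in present):
        return "boolean"
    if present and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "categorical"


def iqr_outliers(values: List[float]) -> List[float]:
    """Flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]


def profile(rows: List[Row], label: str) -> dict:
    report = {"schema": {}, "missing": {}, "outliers": {}}
    for col in rows[0].keys():
        values = [r.get(col) for r in rows]
        col_type = infer_type(values)
        report["schema"][col] = col_type
        report["missing"][col] = sum(v is None for v in values)
        if col_type == "numeric" and col != label:
            report["outliers"][col] = iqr_outliers([v for v in values if v is not None])
    report["class_distribution"] = dict(Counter(r[label] for r in rows))
    return report


rows = [
    {"pageviews": 3, "is_mobile": True, "country": "US", "label": 0},
    {"pageviews": 4, "is_mobile": False, "country": "IN", "label": 0},
    {"pageviews": 5, "is_mobile": True, "country": "US", "label": 1},
    {"pageviews": 4, "is_mobile": False, "country": None, "label": 0},
    {"pageviews": None, "is_mobile": True, "country": "UK", "label": 0},
    {"pageviews": 6, "is_mobile": False, "country": "US", "label": 1},
    {"pageviews": 4, "is_mobile": True, "country": "IN", "label": 0},
    {"pageviews": 120, "is_mobile": False, "country": "US", "label": 0},
]
report = profile(rows, label="label")
print(report)
```

On this made-up data, the session with 120 page views is flagged as an outlier. The report also counts one missing value each in `pageviews` and `country`.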
The diagram below summarizes the simplicity of AutoML Tables. Once you have your dataset, most of the activity is UI-guided, with minimal or no coding.
AutoML Tables also supports automated feature engineering for most data types.
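As an illustration of type-specific feature engineering, a timestamp column can be expanded into calendar features that a model can use directly. This is a hypothetical sketch with a made-up function name, `expand_timestamp`. The exact set of derived features AutoML Tables generates for each type is not assumed here.

```python
from datetime import datetime
from typing import Dict


def expand_timestamp(ts: str) -> Dict[str, int]:
    """Derive calendar features from an ISO-8601 timestamp string."""
    dt = datetime.fromisoformat(ts)
    return {
        "year": dt.year,
        "month": dt.month,
        "day_of_month": dt.day,
        "day_of_week": dt.weekday(),  # Monday = 0 ... Sunday = 6
        "hour": dt.hour,
    }


features = expand_timestamp("2017-07-01T14:30:00")
print(features)
```

A raw timestamp is rarely useful to a model as a single number. Splitting it this way lets the model learn weekly or hourly patterns.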
Currently, it trains models with the algorithms below against the input dataset, based on the selected configuration parameters:
- Logistic and Linear Regression
- Feedforward Deep Neural Network
- Wide and Deep NN
- Gradient Boosted Decision Trees (GBDT)
- DNN + GBDT Trees
Depending on the complexity of the data, it may also run a neural and tree architecture search.
BigQuery ML
BigQuery ML brings ML to the data. Models are trained and accessed in BigQuery using SQL. BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets.
While AutoML Tables requires no knowledge of ML, BigQuery ML requires a basic understanding of ML.
A helpful illustration of the different ML capabilities and their user personas is shown below. Note that Cloud ML Engine is now called AI Platform Training.
Building a model in BigQuery ML is a simple three-step process:
Step 1: Create a Model
-- Train a logistic regression model that predicts whether a session
-- ends in a transaction (label = 1) or not (label = 0).
CREATE MODEL `bqml_tutorial.sample_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
-- Training window: August 1, 2016 through June 30, 2017
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
Step 2: Evaluate the created model
-- Evaluate on sessions from July 1 to August 1, 2017, a window that
-- does not overlap the training data.
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IF(totals.transactions IS NULL, 0, 1) AS label,
      IFNULL(device.operatingSystem, "") AS os,
      device.isMobile AS is_mobile,
      IFNULL(geoNetwork.country, "") AS country,
      IFNULL(totals.pageviews, 0) AS pageviews
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
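For a binary logistic regression model such as this one, ML.EVALUATE returns metrics that include precision, recall, accuracy, F1 score, log loss, and ROC AUC. The plain-Python sketch below shows where most of those numbers come from. It derives the threshold-based metrics and log loss from labels and predicted probabilities. The sample data is made up, the 0.5 threshold is an assumption of the sketch, and `binary_metrics` is a hypothetical helper.

```python
import math
from typing import Dict, List


def binary_metrics(labels: List[int], probs: List[float], threshold: float = 0.5) -> Dict[str, float]:
    """Threshold-based classification metrics plus log loss for a binary classifier."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = len(labels) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    eps = 1e-15  # clip probabilities so log(0) can never occur
    total = 0.0
    for y, p in zip(labels, probs):
        q = min(max(p, eps), 1 - eps)
        total += -(y * math.log(q) + (1 - y) * math.log(1 - q))

    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / len(labels),
        "f1_score": f1,
        "log_loss": total / len(labels),
    }


labels = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]
metrics = binary_metrics(labels, probs)
print(metrics)
```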
Step 3: Predict using the final model
-- Predict a purchase label for each session, sum the predicted labels per
-- country, and return the 10 countries with the most predicted purchases.
SELECT
  country,
  SUM(predicted_label) AS total_predicted_purchases
FROM
  ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IFNULL(device.operatingSystem, "") AS os,
      device.isMobile AS is_mobile,
      IFNULL(totals.pageviews, 0) AS pageviews,
      IFNULL(geoNetwork.country, "") AS country
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10
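The aggregation in Step 3 is an ordinary group-by. Here it is in plain Python over a few made-up prediction rows. Each row is assumed to carry the `country` column passed through from the input query and the `predicted_label` column that ML.PREDICT adds. The helper name `top_predicted_purchases` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_predicted_purchases(rows: List[Dict[str, object]], limit: int = 10) -> List[Tuple[str, int]]:
    totals: Counter = Counter()
    for row in rows:
        # SUM(predicted_label) ... GROUP BY country
        totals[row["country"]] += row["predicted_label"]
    # ORDER BY total_predicted_purchases DESC LIMIT limit
    return totals.most_common(limit)


predictions = [
    {"country": "United States", "predicted_label": 1},
    {"country": "India", "predicted_label": 0},
    {"country": "United States", "predicted_label": 1},
    {"country": "Canada", "predicted_label": 1},
    {"country": "India", "predicted_label": 1},
    {"country": "United States", "predicted_label": 0},
]
top = top_predicted_purchases(predictions)
print(top)
```

Because `predicted_label` is 0 or 1, summing it per country counts the sessions predicted to end in a purchase.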
Model performance metrics can be tracked in the BigQuery UI. It shows the confusion matrix, the ROC curve, and precision and recall, among others.
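An ROC curve comes from sweeping a decision threshold over the predicted probabilities. At each threshold, the false positive rate and true positive rate are recorded. The area under that curve is the ROC AUC metric. Below is a plain-Python sketch, using made-up labels and scores, that computes the ROC points and integrates them with the trapezoidal rule. The helper names `roc_points` and `auc` are hypothetical, and the sketch assumes both classes are present.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) at every distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives  # assumes both classes are present
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a curve whose points are sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]
curve = roc_points(labels, scores)
print(curve, auc(curve))
```

Each point on the curve corresponds to one confusion matrix. The lowest threshold classifies everything as positive, which is why the curve ends at (1.0, 1.0).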
Finally, much of the work shown below happens behind the scenes during the three-step model creation.
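Two of the steps BigQuery ML handles automatically for a model like this one are feature preprocessing and training. The preprocessing standardizes numeric columns and one-hot encodes string columns. Below is a rough stdlib-only approximation of those steps. BigQuery ML's actual optimizer, regularization, and data-split settings differ, so treat this as an illustration of the idea rather than a reproduction. The helper names (`fit_preprocessor`, `train_logistic`, `predict_proba`), the L2 strength, the learning rate, and the sample sessions are all hypothetical.

```python
import math
import statistics
from typing import Callable, Dict, List, Sequence, Tuple


def fit_preprocessor(rows: List[Dict], numeric: Sequence[str], categorical: Sequence[str]) -> Callable[[Dict], List[float]]:
    """Learn standardization statistics and one-hot vocabularies from training rows."""
    stats = {}
    for col in numeric:
        values = [float(r[col]) for r in rows]
        stats[col] = (statistics.fmean(values), statistics.pstdev(values) or 1.0)
    vocab = {col: sorted({r[col] for r in rows}) for col in categorical}

    def encode(row: Dict) -> List[float]:
        features = [(float(row[c]) - stats[c][0]) / stats[c][1] for c in numeric]
        for c in categorical:
            # One indicator per known category; unseen categories become all zeros.
            features.extend(1.0 if row[c] == v else 0.0 for v in vocab[c])
        return features

    return encode


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: List[List[float]], y: List[int], lr: float = 0.5,
                   epochs: int = 500, l2: float = 0.01) -> Tuple[List[float], float]:
    """Batch gradient descent on mean log loss plus (l2 / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [l2 * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


sessions = [
    {"pageviews": 1, "country": "US", "label": 0},
    {"pageviews": 2, "country": "US", "label": 0},
    {"pageviews": 3, "country": "US", "label": 0},
    {"pageviews": 10, "country": "US", "label": 0},
    {"pageviews": 9, "country": "US", "label": 1},
    {"pageviews": 12, "country": "US", "label": 1},
    {"pageviews": 2, "country": "IN", "label": 0},
    {"pageviews": 4, "country": "IN", "label": 0},
    {"pageviews": 8, "country": "IN", "label": 1},
    {"pageviews": 15, "country": "IN", "label": 1},
]
encode = fit_preprocessor(sessions, numeric=["pageviews"], categorical=["country"])
X = [encode(r) for r in sessions]
y = [r["label"] for r in sessions]
w, b = train_logistic(X, y)


def predict_proba(row: Dict) -> float:
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, encode(row))))


print(predict_proba({"pageviews": 14, "country": "US"}))
```

The preprocessing statistics are learned once from the training rows and reused inside `encode`. This is the same idea behind avoiding training-serving skew.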
Below are the supported algorithms and the roadmap, as highlighted at Google Cloud Next ’19.
AI Platform
AI Platform enables seamless creation of end-to-end ML pipelines, from ingesting data to preparing, discovering, training, and deploying ML models. The images below summarize the AI Platform end-to-end process.
AI Platform comes with managed notebook instances that are integrated with BigQuery, Cloud Dataproc, and Cloud Dataflow. This makes it easy to go from data ingestion to preprocessing and exploration, and eventually to model training and deployment.
AI Platform supports Kubeflow, which lets you build portable ML pipelines that run on-premises or on Google Cloud without significant code changes. Below are the services available as part of AI Platform that help build an end-to-end machine learning pipeline.
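A Kubeflow pipeline is described as a graph of steps, each running in its own container. The sketch below does not use the Kubeflow SDK. Instead, it shows the underlying idea in plain Python: each hypothetical step declares its upstream dependencies, and a runner executes the steps in dependency order. The runner uses the standard library's `graphlib` (Python 3.9+). The step names, the toy "model", and `run_pipeline` itself are made up for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# A step is a function over earlier results plus the names of steps it depends on.
Step = Tuple[Callable[[Dict[str, object]], object], List[str]]


def run_pipeline(steps: Dict[str, Step]) -> Dict[str, object]:
    """Execute every step after its dependencies; results are shared by step name."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        fn, _ = steps[name]
        results[name] = fn(results)
    return results


# Hypothetical stages mirroring ingest -> prepare -> train -> evaluate -> deploy.
steps: Dict[str, Step] = {
    "ingest": (lambda r: [3, 1, 4, 1, 5, 9, 2, 6], []),
    "prepare": (lambda r: sorted(r["ingest"]), ["ingest"]),
    "train": (lambda r: {"threshold": r["prepare"][len(r["prepare"]) // 2]}, ["prepare"]),
    "evaluate": (lambda r: sum(x >= r["train"]["threshold"] for x in r["ingest"]), ["ingest", "train"]),
    "deploy": (lambda r: f"deployed model with threshold {r['train']['threshold']}", ["train", "evaluate"]),
}
results = run_pipeline(steps)
print(results["deploy"])
```

Because each step only declares its inputs, the same graph could be handed to a different runner. That is the portability argument behind pipeline tools such as Kubeflow.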
You also have access to AI technologies such as TensorFlow and TensorFlow Extended (TFX) as you deploy your AI applications to production. If you want to learn more about TFX, check out my multi-part series on the topic.
Keep watching for future installments of the TFX series.
There were a few other announcements in the AI space. Here is a quick rundown of the key ones:
AutoML Natural Language: Custom entity extraction lets you identify custom fields in the input text.
AutoML Vision: Object detection detects multiple objects in an image and provides their bounding box coordinates.
Cloud Solution AI: Introduced Recommendations AI and Document Understanding AI.
Document Understanding AI enables companies to digitize documents, classify them, and extract knowledge from them. It also helps organize and store knowledge graphs and other extracted data for easy search, query, consumption, and actionable insights.
The Document Understanding AI solution architecture is shown below.
A few other products that received new enhancements or capabilities are highlighted below. See the references section for more information on the newly added features.
References