Machine Learning in Google Cloud with BigQuery
BigQuery ML : Machine learning with Standard SQL
BigQuery ML lets you create and execute machine learning models in igQuery using standard SQL queries. BigQuery ML democratizes machine learning by letting SQL practitioners build models using existing SQL tools and skills. BigQuery ML increases development speed by eliminating the need to move data.
If you are new/want to explore GCP for first time, then use google sandbox account which is free and login with your account information and can access GCP toolkit with limited resource.
I would like to dive directly into the model development in BigQuery ML , as we discuss the basic points in the previous article on Quickstarts : BigQuery and BigQuery ML.
In this article we will be discussing:
- Steps to Create Models using Google BigQuery ML
- Sample models in BigQuery ML
- Benefits of BigQuery ML
1. Steps to Create Models using Google BigQuery ML :
So as we do in traditional ML , we can do all the below steps in BigQuery ML:
- Create and Explore the dataset
- Create/Train the model
- Evaluate the model
- Predict using the model
1.1 Create Dataset
Create the dataset in cloud console in order to use it for machine learning.I have explained this part in my previous article or more info can be found here.
1.2 Create / Train Model
As shown below write the query starts with ‘CREATE MODEL’ and in ‘OPTIONS’ we can set up the parameters of model such as model_type,learning rate etc.
1.3 Evaluate Model
Use ‘ML.EVALUATE’ function to provide statistics about model to evaluate the model,the below example shows the query returns the error matrics such as mean-absolute-error, mean-squared-error etc
1.4 Predict using Model
Use ‘ML.PREDICT’ to predict on given data from the table in BigQuery.
BigQuery ML Official interactive tutorial Documentation
2. Sample models in BigQuery ML :
As mentioned in my last article we can solve Regression, Classification, clustering, time series problems with BigQuery ML
Here we will see linear regression and logistic regression with the code snippet/query sample.
2.1 Linear regression
The CREATE MODEL
clause is used to create and train the model named bike_model.model
The OPTIONS(model_type='linear_reg', input_label_cols=['duration'])
clause indicates that you are creating a linear regression model. A linear regression is a type of regression model that generates a continuous value from a linear combination of input features.
Below query will be used to predict with the model that we trained in the above step.
#Predict with model
SELECT
*
FROM
ML.PREDICT(MODEL `bike_model.model`,
(
SELECT
*
FROM
`bigquery-public-data`.london_bicycles.cycle_hire
WHERE
duration IS NOT NULL
))
Use below query to evaluate your model:
#To Evaluate the model
SELECT
*
FROM
ML.EVALUATE(MODEL `bike_model.model`,
(
SELECT
*
FROM
`bigquery-public-data`.london_bicycles.cycle_hire
WHERE
duration IS NOT NULL))
BigQuery ML Linear Regression official documentation for more info.
2.2 Logistic regression
You can create and train a logistic regression model using the CREATE MODEL
statement with the option 'LOGISTIC_REG'
. The following query uses a CREATE MODEL
statement to train a new logistic regression model
Now the following code will return the evaluation metrics for the model. The metrics are “precision, “recall”, “accuracy”, “f1_score”, “log_loss”, “roc_auc”. The code is given below.
#Predict with model
SELECT
*
FROM
ml.EVALUATE(MODEL `bqml_codelab.sample_model`, (
SELECT
IF(totals.transactions IS NULL, 0, 1) AS label,
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(geoNetwork.country, "") AS country,
IFNULL(totals.pageviews, 0) AS pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));
Use below query to predict using the final model:
#To Evaluate the model
SELECT
country,
SUM(predicted_label) as total_predicted_purchases
FROM
ml.PREDICT(MODEL `bqml_codelab.sample_model`, (
SELECT
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(totals.pageviews, 0) AS pageviews,
IFNULL(geoNetwork.country, "") AS country
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10;
BigQuery ML Logistic Regression official documentation for more info.
3. Benefits of Google BigQuery ML :
Google BigQuery ML is fast to create, evaluate and execute various Machine Learning models easily using standard SQL queries. It offers many benefits over other Cloud Data Warehouses. Some of the benefits are listed below.
- You need not to know Python or any other language for managing Machine Learning models. Analyst should be expertise in SQL can train models and make predictions.
- The data export involves many steps, and it’s a time-consuming process. Google BigQuery ML saves time and resources by letting users use Machine Learning models in Google BigQuery.
- It allows users to run Machine Learning models on large datasets within minutes as it uses computation resources of Google BigQuery Data Warehouse.
- It features some automated Machine Learning models that reduce the workload to manipulate data manually. It saves time and allows users to quickly train and test models on the dataset
Conclusion :
So in this article we discussed about how can we do machine learning in BigQuery using standard SQL. it can be easily integrated in the SQL query. It eliminates the restriction Data Analysts face for using Machine Learning models in their analysis. Companies can deploy models directly on their datasets to make better business decisions.We also discuss about the regression and classification problem and how can we train, predict and evaluate the model in BQ ML.With BigQuery ML we can also train machine learning model for time series , clusttering, Recommentation Engine.
Thanks for Reading
Connect with me on Linkedin
Find me on Github
Visit my technical channel on Youtube
Support: Buy me a Cofee/Chai
If you find this article helpful, please give your support by a clap and stay tuned to read more interesting articles in datasciene and machine learning discipline. See you all soon in my next post.