Hyperparameter tuning BigQuery ML

BigQuery ML can use Vertex AI to tune common model parameters

Lak Lakshmanan
Google Cloud - Community
2 min readJul 13, 2021

--

What is BigQuery ML?

BigQuery ML allows you to quickly train ML models on data in BigQuery. For example, suppose you want to train a linear ML model to predict the duration of a bicycle rental, you can do that using:

CREATE OR REPLACE MODEL ch09eu.bicycle_model_linear
OPTIONS(
model_type='linear_reg', input_label_cols=['duration']
)
AS
SELECT
start_station_name,
CAST(EXTRACT(DAYOFWEEK FROM start_date) AS STRING) AS dayofweek,
CAST(EXTRACT HOUR FROM start_date) AS STRING) AS hourofday,
duration
FROM `bigquery-public-data.london_bicycles.cycle_hire`

BigQuery ML also supports feature engineering that will be automatically repeated during inference. A slightly more sophisticated deep neural network model might be:

CREATE OR REPLACE MODEL ch09eu.bicycle_model_dnn
TRANSFORM(
start_station_name,
CAST(EXTRACT(DAYOFWEEK FROM start_date) AS STRING) AS dayofweek,
ML.BUCKETIZE(EXTRACT(HOUR FROM start_date), [0, 6, 12, 18, 24]) AS hourofday,
duration
)

OPTIONS(
model_type='dnn_regressor', input_label_cols=['duration'],
hidden_units=[32, 8],
learn_rate=0.1,
dropout=0.25,
)
AS
SELECT
start_station_name, start_date, duration
FROM `bigquery-public-data.london_bicycles.cycle_hire`

Hyperparameter tuning

But what if you want to experiment with different different learning rates? This is called hyperparameter tuning and you can specify ranges or candidate values for the model parameters:

CREATE OR REPLACE MODEL ch09eu.bicycle_model_dnn_hparam
TRANSFORM(
start_station_name,
CAST(EXTRACT(DAYOFWEEK FROM start_date) AS STRING) AS dayofweek,
ML.BUCKETIZE(EXTRACT(HOUR FROM start_date), [0, 6, 12, 18, 24]) AS hourofday,
duration
)
OPTIONS(
model_type='dnn_regressor', input_label_cols=['duration'],
num_trials=10,
hidden_units=hparam_candidates([struct([32,8]), struct([32]), struct([64,32,4])]),
learn_rate=hparam_range(0, 0.5),
dropout=hparam_candidates([0, 0.1, 0.25, 0.4])

)
AS
SELECT
start_station_name, start_date, duration
FROM `bigquery-public-data.london_bicycles.cycle_hire`

BigQuery ML will use Vertex AI’s hyperparameter tuning service to choose the best parameters within the budget (here 10 trials) that you specify.

Note that each model type only allows a few of the parameters to be tuned. For example, for DNN models, you can tune the batch_size, dropout, learn_rate, optimizer, etc.

What if you want to change the bucket intervals? Or try with and without individual features? In that case, you should follow the approach in this article and build a Python scaffold. Such a Python scaffold will give you a lot of flexibility.

For tuning basic model parameters, however, use the built-in integration to Vertex AI.

Invoking best model

Once the training is complete, you can look up the evaluation metrics for each of the trials:

SELECT * FROM ML.EVALUATE(MODEL ch09eu.bicycle_model_dnn_hparam) 
ORDER BY mean_absolute_error ASC

and invoke the best model using:

SELECT * FROM ML.PREDICT(MODEL ch09eu.bicycle_model_dnn_hparam,
(
SELECT
CURRENT_TIMESTAMP() AS start_date,
'Waterloo Station 1, Waterloo' AS start_station_name
)
)

--

--

Lak Lakshmanan
Google Cloud - Community

articles are personal observations and not investment advice.