Training and deploying machine learning models on GCP ML-Engine using TensorFlow Estimators

What is Google Cloud ML-Engine?

Monark Unadkat
Dec 26, 2018 · 5 min read

1. Preparation

Part 1 → Getting the dataset

The dataset I am using here is Kaggle’s House Price Prediction dataset. You can download it from here.

Part 2 → Creating the python package

[Image: file tree showing the files we need to create.]
trainer/task.py is the entry point of the package. It:
  1. Loads data from the location specified and applies the preprocessing logic.
  2. Calls the model training logic (kept in model.py) with the parsed parameters.

Its main functions are:
  • get_args → defines all the arguments required from the user, such as --job-dir, --train-file, etc.
  • load_data → first downloads the data from GCS, then applies all the preprocessing; it returns the train/test data.
  • train_and_evaluate → defines all the estimator specs and exporter information, and finally calls tf.estimator.train_and_evaluate to start the training job.

trainer/model.py holds the two model-facing functions (both sketched below):
  1. input_fn(), which is used for passing input to your model. We will use TensorFlow's Dataset API to create a dataset iterator and return the next batch from it.
  2. serving_input_fn(), which defines the features to be passed to the model during inference, e.g. TensorFlow placeholders. It takes no arguments and returns a tf.estimator.export.ServingInputReceiver.
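Below is a minimal sketch of what task.py might look like. The model choice (a canned DNNRegressor), the two numeric features (GrLivArea and OverallQual predicting SalePrice), the exporter name "exporter", and all hyperparameters are assumptions for illustration, not the definitive implementation:

# trainer/task.py (illustrative sketch)
import argparse

import pandas as pd
import tensorflow as tf

from trainer import model  # model.py, sketched further below


def get_args():
    """Defines all the arguments required from the user."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--train-file', required=True,
                        help='Local path or gs:// URI of the training CSV.')
    parser.add_argument('--job-dir', required=True,
                        help='Where checkpoints and the SavedModel are written.')
    parser.add_argument('--train-steps', type=int, default=1000)
    parser.add_argument('--batch-size', type=int, default=64)
    return parser.parse_args()


def load_data(train_file):
    """Downloads the CSV (tf.gfile understands gs:// URIs), preprocesses it,
    and returns train/test splits."""
    with tf.gfile.Open(train_file) as f:
        df = pd.read_csv(f)
    # Illustrative preprocessing: keep two numeric features plus the label.
    df = df[['GrLivArea', 'OverallQual', 'SalePrice']].dropna().astype('float32')
    split = int(len(df) * 0.8)
    train, test = df[:split], df[split:]
    return (train.drop('SalePrice', axis=1), train['SalePrice'],
            test.drop('SalePrice', axis=1), test['SalePrice'])


def train_and_evaluate(args):
    """Defines the estimator, specs, and exporter, then starts training."""
    train_x, train_y, eval_x, eval_y = load_data(args.train_file)
    feature_columns = [tf.feature_column.numeric_column(c) for c in train_x.columns]
    estimator = tf.estimator.DNNRegressor(hidden_units=[64, 32],
                                          feature_columns=feature_columns,
                                          model_dir=args.job_dir)
    train_spec = tf.estimator.TrainSpec(
        input_fn=lambda: model.input_fn(train_x, train_y, args.batch_size),
        max_steps=args.train_steps)
    exporter = tf.estimator.FinalExporter('exporter', model.serving_input_fn)
    eval_spec = tf.estimator.EvalSpec(
        input_fn=lambda: model.input_fn(eval_x, eval_y, args.batch_size, shuffle=False),
        exporters=[exporter])
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)


if __name__ == '__main__':
    train_and_evaluate(get_args())

And a matching sketch of model.py, using the same assumed feature names:

# trainer/model.py (illustrative sketch)
import tensorflow as tf


def input_fn(features, labels, batch_size, shuffle=True):
    """Builds a tf.data pipeline and returns the next batch from its iterator."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    if shuffle:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(batch_size).repeat()
    return dataset.make_one_shot_iterator().get_next()


def serving_input_fn():
    """Declares the features the deployed model accepts at inference time."""
    inputs = {
        'GrLivArea': tf.placeholder(tf.float32, [None]),
        'OverallQual': tf.placeholder(tf.float32, [None]),
    }
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)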

2. Training

I. Training Locally

Before actually submitting the training job to the cloud, you can test your package locally on a dummy dataset to catch and debug any errors. To start a training job on your local machine, you can use either plain python or the gcloud command.

$ export JOB_DIR=/path/to/the/dir/
$ rm -rf $JOB_DIR
$ export TRAIN_FILE=/path/to/training/file
# TRAIN_FILE can be either a path to a local file or a GCS location.

$ python -m trainer.task \
    --train-file=$TRAIN_FILE \
    --job-dir=$JOB_DIR

$ gcloud ml-engine local train \
    --module-name=trainer.task \
    --package-path=trainer \
    -- \
    --train-file=$TRAIN_FILE \
    --job-dir=$JOB_DIR

II. Submitting the job to the cloud

If everything works fine, you can submit a training job to Cloud ML-Engine. You will first need to move your training data to a GCS bucket. Once the job is submitted, it trains your model on the dataset you provided and saves the model checkpoints and the TensorFlow SavedModel to GCS_JOB_DIR, which you can then use to deploy the model for serving. To submit a training job to the cloud, run the following commands.

$ export JOB_NAME=housing_job_1
$ export GCS_JOB_DIR=gs://cloud-ml-job-bucket
$ export TRAIN_FILE=gs://cloud-ml-data-storage-
$ export REGION=us-central1
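
A typical submit invocation looks like the following; the runtime version here is an assumption, so pick the one matching the TensorFlow version you developed against:

$ gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $GCS_JOB_DIR \
    --runtime-version 1.10 \
    --module-name trainer.task \
    --package-path trainer \
    --region $REGION \
    -- \
    --train-file $TRAIN_FILE

Note the lone -- that separates gcloud's own flags from the arguments forwarded to trainer.task.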

3. Deploying

Now that your model is successfully trained, it is time to deploy it to production so that other people can use it.

$ export MODEL_NAME=kaggle_housing_price_prediction
$ export MODEL_PATH=gs://cloud-ml-job-
$ gcloud ml-engine models create $MODEL_NAME
$ gcloud ml-engine versions create "version_1" --model $MODEL_NAME \
    --origin $MODEL_PATH
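
MODEL_PATH must point at the timestamped SavedModel directory that the exporter writes under your job dir, i.e. $GCS_JOB_DIR/export/<exporter-name>/<timestamp>/. If you are unsure of the exact path, you can list it with gsutil; the exporter name "exporter" below assumes the FinalExporter name from the sketch above:

$ gsutil ls $GCS_JOB_DIR/export/exporter/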

4. Predictions

For serving predictions, you will need to prepare your data and export it to a JSON file so that it can be sent to your deployed model.

import json

with open('test_data.json', 'w') as outfile:
    for instance in test:  # `test`: list of feature dicts; one JSON object per line
        outfile.write(json.dumps(instance) + '\n')
$ gcloud ml-engine predict --model kaggle_housing_price_prediction \
    --version version_1 --json-instances test_data.json
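
Each line of test_data.json carries one instance, keyed by the feature names your serving_input_fn declares. With the two illustrative features from the sketch above, a line might look like:

{"GrLivArea": 1710.0, "OverallQual": 7.0}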

Wrapping Up

I have explained the steps to define, train, and deploy custom machine learning models on Cloud ML-Engine. Overall, the steps are pretty straightforward, and I find GCP ML-Engine surprisingly accessible and flexible. Everything, from loading the dataset and preprocessing the data to training the model and exporting the trained model, happens on ML-Engine, which auto-scales based on the resources required. No matter how heavy our preprocessing computations are or how big our model or dataset is, it is all completely managed by ML-Engine.


Searce Engineering

We identify better ways of doing things!
