Same Classifier, Different Cloud Platform — Part 3: Google Cloud
This blog post is part of a series of articles about training and deploying the exact same model on three different cloud platforms: AWS, Azure and Google Cloud.
- Part 0: Introduction + Data Scraping
- Part 1: Amazon Web Services (SageMaker)
- Part 2: Azure (Azure ML)
- Part 3: Google Cloud (AI Platform)
Code available on GitHub.
The first thing you want to do is create a Google Cloud account. When you create your account, you get $300 of free credits to use over a one-year period, which I believe is the best deal among the three cloud platforms. Once you have an account, you need to either create a project or join an existing one. Once you have selected a project, make sure to link it to a billing account and to enable the APIs needed for model training and deployment: Cloud Machine Learning Engine and Compute Engine API.
To interact with Google Cloud from your terminal, you will need to download the Cloud SDK, available on this page. Once you have downloaded and installed it, initialize it by entering the following command in your terminal:
gcloud init
The program will ask for your login information and the project you want to work on. Once that’s done, your terminal is linked to your account.
As with AWS and Azure, the model can only access your data during training if you upload it to the platform’s storage service. Here we use Google Cloud Storage.
To interact with Google Cloud Storage from your computer, you can use gsutil in your terminal, which is installed by default with the Cloud SDK.
First create a bucket by specifying its region and its name (founder-classifier-storage):
gsutil mb -l europe-west1 gs://founder-classifier-storage
Then upload your data to the bucket:
gsutil cp ../data/train.json gs://founder-classifier-storage
Here mb stands for make bucket and cp stands for copy. Bucket names are globally unique, so make sure to use something distinctive. The -l flag sets the region of the bucket: europe-west1 is the data center located in St. Ghislain, Belgium, about 80 km from where I live (Brussels).
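In order to train a model on their AI Platform, Google recommends a specific file structure: a project folder that contains setup.py next to a trainer folder, which in turn holds __init__.py, model.py and task.py.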
For Google to train your model on their servers, your training application has to be packaged and stored in the bucket you just created. To package your code, just create a setup.py and an __init__.py in the same place as in the file structure above. The former specifies the dependencies required for training, and the latter is just an empty file that marks the folder to be packaged as a Python package. If you want to know more about how packaging works, there is a lot of material online.
You can take a look at the packages that are already installed on AI Platform here. If you need any packages that are not pre-installed, you can specify them in your setup.py file.
Unlike the single entry point script used when training on Azure, our code is now split into two modules, model.py and task.py. The first one is where we store the functions needed by tf.estimator (model_fn, train_input_fn, eval_input_fn and serving_input_fn). The second one is where we store our main method.
To launch the training, you submit a training job from the terminal by specifying its name (founder_classifier_training_job), the bucket where your package will be stored (--staging-bucket), the name of the module to run (--module-name), the path to the folder that you’ve packaged (--package-path), the region and the AI Platform runtime version.
You can also choose the type of instance you want for training. Here we set --scale-tier to basic-gpu, which corresponds to a single NVIDIA Tesla K80 GPU and costs $0.40/hour.
We can also pass extra arguments to the training job by typing ‘--’ followed by those arguments. They are collected by our task.py module. For example, data-folder is the place where we stored the data and also where the saved model is written at the end of the training. The rest are hyperparameters for our model, as well as the number of training steps.
gcloud ai-platform jobs submit training founder_classifier_training_job --staging-bucket gs://founder-classifier-storage --module-name trainer.task --package-path ./trainer --runtime-version 1.14 --region europe-west1 --scale-tier basic-gpu -- --data-folder gs://founder-classifier-storage --batch-size 32 --learning-rate 0.001 --steps 1000
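To illustrate how task.py might collect the arguments placed after ‘--’, here is a minimal sketch using argparse. The function name parse_args, the default values and the help texts are assumptions made for this example and are not taken from the actual repository; only the four flag names come from the command above.

```python
import argparse


def parse_args(argv=None):
    """Parse the user arguments that follow the bare `--` in the gcloud command."""
    parser = argparse.ArgumentParser(description="Founder classifier training task")
    parser.add_argument("--data-folder", required=True,
                        help="Bucket path holding the data and receiving the saved model")
    parser.add_argument("--batch-size", type=int, default=32)        # assumed default
    parser.add_argument("--learning-rate", type=float, default=0.001)  # assumed default
    parser.add_argument("--steps", type=int, default=1000)           # assumed default
    # parse_known_args ignores flags this sketch does not declare
    # instead of exiting with an error.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Usage: the same values as in the gcloud command above.
example = parse_args([
    "--data-folder", "gs://founder-classifier-storage",
    "--batch-size", "32",
    "--learning-rate", "0.001",
    "--steps", "1000",
])
print(example.data_folder, example.batch_size, example.learning_rate, example.steps)
```

Note that argparse turns the dashes in flag names into underscores, so --data-folder is read back as args.data_folder.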
Once you have submitted your training job, you can follow its progress by going to Google Cloud Platform > AI Platform > Jobs > founder_classifier_training_job > View Logs.
At the end of the training, the outputs of the model and the tf.estimator itself are saved in the bucket.
Once your tf.estimator is trained and saved in your bucket, you can deploy it on a remote instance. First, you need to create a model using:
gcloud ai-platform models create founder_classifier_model --description "Model that classifies a picture of Jeff Bezos, Bill Gates or Larry Page" --regions europe-west1
Then create a version of that model, specifying the directory of your saved model:
gcloud ai-platform versions create "version_1" --model founder_classifier_model --origin gs://founder-classifier-storage/model/1570723003 --runtime-version 1.13 --python-version 3.5
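The last part of the --origin path (1570723003) looks like a Unix timestamp, which is how exported estimators are usually named. Assuming each export lands in such a timestamped subfolder, a small helper like the one below could pick the most recent one from a listing of the bucket. The function name latest_export_dir and the second timestamp in the usage example are hypothetical.

```python
def latest_export_dir(paths):
    """Return the path whose last component is the largest integer timestamp."""
    candidates = []
    for path in paths:
        clean = path.rstrip("/")
        name = clean.rsplit("/", 1)[-1]
        if name.isdigit():  # keep only timestamp-like folder names
            candidates.append((int(name), clean))
    if not candidates:
        raise ValueError("no timestamped export directory found")
    return max(candidates)[1]


# Usage with a made-up listing (only the first path appears in this article).
listing = [
    "gs://founder-classifier-storage/model/1570723003/",
    "gs://founder-classifier-storage/model/1570700000/",  # hypothetical older export
    "gs://founder-classifier-storage/model/checkpoint",
]
print(latest_export_dir(listing))
```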
For now, Google deploys your model by default on an mls1-c1-m2 machine, which has a single-core CPU. The pricing can be seen here. Also, you are only charged when you make requests to your model.
To make requests to your model and ask it which founder is in a picture, we need to send that picture as a JSON-serialized file. So, I have built a small method that takes an image directory as input, converts the image to grayscale, crops the face of the individual, resizes it and saves it to a JSON file. Of course, if you want to make several predictions at a time, you can store several pictures in your JSON file.
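As a rough sketch of the serialization step only, the snippet below converts RGB pixels to grayscale and writes one JSON object per line, which is the layout gcloud expects for --json-instances. Face cropping and resizing are left out. The function names, the "image" input key and the scaling of pixel values to [0, 1] are assumptions for this example; the real key and scaling must match whatever your serving_input_fn expects.

```python
import json


def to_grayscale(rgb_pixels):
    """Convert rows of (r, g, b) pixels to rows of grayscale floats in [0, 1]."""
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in row]
        for row in rgb_pixels
    ]


def write_json_instances(images, path, input_key="image"):
    """Write one JSON object per line, one line per image to predict."""
    with open(path, "w") as f:
        for pixels in images:
            f.write(json.dumps({input_key: pixels}) + "\n")


# Usage with a tiny 2x2 stand-in image instead of a real photo.
tiny_image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
write_json_instances([to_grayscale(tiny_image)], "image.json")
```

Passing several images in the list produces several lines in the file, and therefore several predictions in a single request.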
Once your JSON image file is ready, you can send it to your deployed model by specifying the name and version of the model as well as the path to your JSON file:
gcloud ai-platform predict --model founder_classifier_model --version version_1 --json-instances image.json
If it’s the first time that you query your model or it hasn’t been queried recently, you might experience a warmup stage where it takes about 1 minute for your model to output a prediction. After this warmup stage, predictions are instantaneous. Here’s an example with a picture of Bill Gates giving a speech at the Sixth World Fund Conference in Lyon on October 10, 2019. The picture was taken after the training of our model. The output is class 0, which is Bill Gates’ label in our dataset.