Same Classifier, Different Cloud Platform — Part 0: Introduction

Aymeric de la Brousse
Oct 17, 2019 · 5 min read


This blog post is part of a series of articles about training and deploying the exact same model on three different cloud platforms: AWS, Azure and Google Cloud.

  1. Part 0: Introduction + Data Scraping
  2. Part 1: Amazon Web Services (SageMaker)
  3. Part 2: Azure (Azure ML)
  4. Part 3: Google Cloud (AI Platform)

In this four-part series, my goal is to show you how to train and deploy a model on AWS, Azure or Google Cloud. To do this, I will create a simple image classifier using a custom dataset and the TensorFlow Estimator API. In this part, I will introduce the model and show you how I gathered the data. Of course, all the code is available on my GitHub.

The logo of each cloud platform was constructed using its founder’s face

Founder Classifier

In order to show you how to train and deploy a tf.estimator model on these three platforms, I first thought about using an already available dataset such as fruits360. In most cases, you will learn to build models using well-known, widely used datasets. For this project, however, I wanted to create my own in order to experiment with the various data scraping tools available.

To build my own dataset, I first needed to decide what my model should classify. What better way to test each of the cloud platforms than by classifying its company’s founder? So, I created a model that, given an image, classifies the person in it as either Bill Gates, Jeff Bezos or Larry Page.

The classifier we used is one of the most famous convolutional network architectures: two convolutional (and pooling) layers extract the features, and two fully connected layers learn to classify the founder. This architecture, even though really simple, has proven very effective on the MNIST dataset. In the end, regardless of the model’s performance, the goal here is really to focus on how to train and deploy it on the cloud. The architecture can be seen in the image below:

Famous architecture used for our classifier
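
The article leaves the exact layer sizes to the figure; below is a minimal model_fn sketch in the TensorFlow 1.x Estimator style, using the filter and unit counts from the classic MNIST tutorial (32/64 filters, a 1024-unit dense layer), which are assumptions on my part:

```python
import tensorflow as tf  # TensorFlow 1.x

def model_fn(features, labels, mode):
    # Two conv/pool blocks extract features from 28x28 grayscale crops
    net = tf.reshape(features["x"], [-1, 28, 28, 1])
    net = tf.layers.conv2d(net, 32, 5, padding="same", activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(net, 2, 2)
    net = tf.layers.conv2d(net, 64, 5, padding="same", activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(net, 2, 2)
    # Two fully connected layers classify the founder
    net = tf.layers.dense(tf.layers.flatten(net), 1024, activation=tf.nn.relu)
    logits = tf.layers.dense(net, 3)  # Gates, Bezos, Page

    predictions = {"class": tf.argmax(logits, axis=1),
                   "probabilities": tf.nn.softmax(logits)}
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    if mode == tf.estimator.ModeKeys.TRAIN:
        train_op = tf.train.AdamOptimizer(1e-4).minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

    metrics = {"accuracy": tf.metrics.accuracy(labels, predictions["class"])}
    return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)

classifier = tf.estimator.Estimator(model_fn=model_fn, model_dir="founder_model")
```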

Scrape It Till You Make It

The quickest and most straightforward way to build an image dataset is to use an image search engine such as Google Images or Bing Images (too bad Amazon does not have its own).

Google Images
To download images in bulk from Google Images, you can install google_images_download using pip. All you have to do then is pass it the necessary arguments, such as the search terms (keywords), the number of images you need (limit) and the output directory. You can even specify the type of image you are looking for and the file format of the pictures.
Additionally, if you want to retrieve more than 100 images, you need to download ChromeDriver on your computer and specify the path to its executable. ChromeDriver lets you control Google Chrome from a Python script and is a very useful tool for data scraping in general. You can download it from here.
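
Putting it together, a minimal sketch of the download call might look like this (the keyword list, limit and paths are placeholders, not the values used in the actual project):

```python
# pip install google_images_download
from google_images_download import google_images_download

downloader = google_images_download.googleimagesdownload()
downloader.download({
    "keywords": "Bill Gates,Jeff Bezos,Larry Page",  # one search per founder
    "limit": 400,                        # >100 images requires chromedriver
    "output_directory": "data/google",
    "format": "jpg",
    "type": "face",                      # restrict results to faces
    "chromedriver": "/path/to/chromedriver",
})
```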

Bing Images
To download images from Bing, we use the Image Search API available here. In order to use it, you need to sign up for a free Azure account and retrieve the API key. We can only get the URLs of 150 images at a time, so we use the offset argument to download the pictures in four batches, each starting from a different position. Similarly to google_images_download, we can specify that we are looking for “faces”, which curates our dataset by filtering out unsuitable pictures. Once we have gathered the URLs, we can use requests to download the pictures.
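
For illustration, a sketch of those requests, assuming the Bing Image Search v7 endpoint and its imageContent=Face filter (the key, query and output paths are placeholders):

```python
import requests

URL = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
HEADERS = {"Ocp-Apim-Subscription-Key": "YOUR_AZURE_KEY"}

# Four batches of 150 URLs, each starting from a different offset
urls = []
for offset in range(0, 600, 150):
    params = {"q": "Jeff Bezos", "count": 150, "offset": offset,
              "imageContent": "Face"}  # keep only pictures of faces
    response = requests.get(URL, headers=HEADERS, params=params)
    response.raise_for_status()
    urls += [img["contentUrl"] for img in response.json()["value"]]

# Download every URL with requests, skipping the ones that fail
for i, url in enumerate(urls):
    try:
        picture = requests.get(url, timeout=10)
        with open(f"data/bing/jeff_bezos_{i}.jpg", "wb") as f:
            f.write(picture.content)
    except requests.RequestException:
        pass
```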

Results
So between Google and Bing, which search engine gave us the most suitable images for our classifier? In our case, the best dataset should contain a lot of pictures, and each picture should include only one face. Thus, we compared both search engines on these two metrics. To identify the number of faces in each picture, we used OpenCV’s Haar cascade (frontal face) and counted the pictures where more than one face was detected.
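
Counting faces with OpenCV takes only a few lines; here is a sketch (the scaleFactor and minNeighbors values are common defaults, not necessarily the ones used for the project):

```python
import glob
import cv2

# OpenCV ships the pre-trained frontal-face Haar cascade
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def count_faces(path):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces)

paths = glob.glob("data/bing/*.jpg")
multi_face = sum(count_faces(p) > 1 for p in paths)
print(f"{multi_face}/{len(paths)} pictures contain more than one face")
```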

The results are plotted in the charts below:

As you can see, the Bing API seems to perform better for this type of data collection. Also, the number of pictures retrieved by Bing is the same for each of our founders, which means we don’t have to deal with class imbalance. Because I assume there is a lot of overlap between the images retrieved from Bing and Google, I decided to keep only the Bing images in my dataset.

Image Preprocessing

In order to feed images to a CNN, each image needs to have the same dimensions. We used the Haar cascade face detector to find a square bounding box around each detected face and crop it. After that, all cropped faces were converted to grayscale and resized to 28x28 pixels. These are the same dimensions as the MNIST digits dataset, on which our model performs really well.
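
A sketch of that cropping pipeline, reusing the cascade from above (the detection parameters are again illustrative):

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_faces(path, size=28):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    # The frontal-face cascade returns square (x, y, w, h) boxes,
    # so cropping and resizing preserves the aspect ratio
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [cv2.resize(gray[y:y + h, x:x + w], (size, size))
            for (x, y, w, h) in boxes]
```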

The images have been converted to grayscale and cropped using the Haar cascade frontal-face detector
Some easily spotted false positives in the Jeff Bezos dataset

As you can see in the sampled images from our Jeff Bezos dataset above, there are some incorrect images here and there. Either we retrieved faces of the wrong person, or the errors came from the Haar cascade face detector. The latter could be improved by tweaking the arguments given to the detector: parameters like the minimum dimensions of a face, or the scale at which the image is reduced between detection passes, can decrease the rate of false positives. But then again, the goal of this article isn’t to build a perfect classifier but rather to show the process of training one on the cloud, so a few mislabeled images are no big deal.
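
For reference, a hypothetical stricter configuration of the same detector might look like this (the values are picked for illustration only):

```python
# Smaller scale steps, more neighbor confirmations and a minimum face
# size all make detections more conservative, at the cost of recall
boxes = cascade.detectMultiScale(gray, scaleFactor=1.05,
                                 minNeighbors=8, minSize=(60, 60))
```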

The last step of the data processing is to split the data into a train and a test set, and then store these images and labels in a JSON file.
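
A minimal sketch of that step (the file name, the 80/20 split and the variable names are assumptions on my part):

```python
import json
from sklearn.model_selection import train_test_split

# `faces` is a list of 28x28 numpy arrays, `labels` the founder ids (0-2)
train_x, test_x, train_y, test_y = train_test_split(
    faces, labels, test_size=0.2, stratify=labels, random_state=42)

with open("dataset.json", "w") as f:
    json.dump({"train_images": [x.tolist() for x in train_x],
               "train_labels": train_y,
               "test_images": [x.tolist() for x in test_x],
               "test_labels": test_y}, f)
```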

Now that the dataset is ready, you can continue to the next parts to learn how to train and deploy the model on the cloud.
