Object Detection, Image Classification and Semantic Segmentation using AWS Sagemaker

Shivani Kolungade
Aug 3, 2020 · 9 min read


Today, Artificial Intelligence and Machine Learning go hand in hand. With these advances, machines can learn new patterns and solve problems on their own. Image classification, object detection, and semantic segmentation are branches of the same tree.

In this post, we will learn about these three techniques — classification, detection, and segmentation — and how to use AWS SageMaker to train and deploy the respective machine learning models.

But before we move ahead, let’s understand the difference between them. Let’s first go over two basic definitions that will help you as you learn about each technique:

  • Labeling: taking a dataset of unlabeled images and adding meaningful, informative tags that describe what is in each image.
  • Bounding boxes: a type of labeling where a box is drawn around an object; the box can be created, edited, and deleted, and is assigned concepts (classes).

What is Image Classification?

This technique is useful if you just need to identify general things like “Is this a beach or is it a pool?” It refers to a type of labeling where an image/video is assigned certain concepts, with the goal of answering the question, “What is in this image/video?” Image classification can also be described as the task of extracting information classes from a multiband raster image. It is a supervised learning problem.

The Amazon SageMaker image classification algorithm is a supervised learning algorithm that supports multi-label classification. It takes an image as input and outputs one or more labels assigned to that image. It uses a convolutional neural network (ResNet).

What is Object Detection?

This technique is useful if you need to identify particular objects in a scene, like the cars parked on a street, versus the whole image. It is a computer vision technique for locating instances of objects in images or videos. Object detection algorithms typically leverage Machine learning or deep learning to produce meaningful results. It uses bounding boxes to tell us where each object is in an image/video. Face detection is one form of object detection.

The Amazon SageMaker Object Detection algorithm detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm. Each object is categorized into one of the classes in a specified collection, with a confidence score that it belongs to that class. Its location and scale in the image are indicated by a rectangular bounding box. It uses the Single Shot MultiBox Detector (SSD) framework.

What is Semantic Segmentation?

Segmentation is a type of labeling where each pixel in an image is labeled with given concepts. Here, whole images are divided into pixel groupings which can then be labeled and classified, with the goal of simplifying an image or changing how an image is presented to the model, to make it easier to analyze. The divided parts of an image are called segments. It’s not a great idea to process the entire image at the same time as there will be regions in the image which do not contain any information. By dividing the image into segments, we can make use of the important segments for processing the image.

The Amazon SageMaker semantic segmentation algorithm provides a fine-grained, pixel-level approach to developing computer vision applications. It tags every pixel in an image with a class label from a predefined set of classes.

While human beings have always been able to do all the above in the blink of an eye, it’s taken many years of research, trial, and error to allow computers to emulate us. Nevertheless, today, thanks to computer vision, our devices are finally catching up to our needs. Let’s understand how to build and deploy these image processing models using AWS services.

For this project, we have used Amazon SageMaker and S3 buckets.

AWS SageMaker

There are many cloud platforms available to data scientists for developing and deploying machine learning models. Amazon SageMaker is one of the most popular platforms for building, training, and deploying machine learning models. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models.

S3 — Simple Storage Service

S3 stands for Simple Storage Service. It is a storage service offered by Amazon that lets you store and retrieve any amount of data, at any time, from anywhere on the web, and it provides cloud storage for various types of web applications. To upload your data (photos, videos, documents, etc.) to Amazon S3, you must first create an S3 bucket in one of the AWS Regions. You can then upload any number of objects to the bucket.

About the dataset: the data is taken from the popular Oxford-IIIT Pet dataset.

The architecture of the project

Steps:

1. Create a Notebook Instance on AWS SageMaker

To begin, go to the AWS Management Console and type “Amazon SageMaker” in the Find Services box. Click on Amazon SageMaker.

On the next page, in the left menu find and click on Notebook Instances.

Next, enter a name for the notebook instance and choose an IAM role: you can create a new one, or, as we did, reuse a role created earlier. Keep the rest of the settings as default.

Then click on “Create notebook instance”.

On the next page, you will see the status is pending. It will take a few minutes.

After a few minutes, the status will change to “InService”. Now, click on “Open Jupyter” to open the notebook.

Once the notebook is open, we can write code in Python. Here we will perform the next steps.

2. Download the Data
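Roughly, the download cell looks like this. The URLs are the standard Oxford-IIIT Pet download links, but verify them on the dataset page before running:

```python
import urllib.request

# Assumed download URLs for the Oxford-IIIT Pet dataset (images + annotations);
# check https://www.robots.ox.ac.uk/~vgg/data/pets/ for the current links.
urls = [
    "https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz",
    "https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz",
]

for url in urls:
    filename = url.split("/")[-1]
    print(f"Downloading {filename} ...")
    urllib.request.urlretrieve(url, filename)
```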

3. Extract the Annotation
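A minimal sketch of this step, assuming the usual archive layout of the pet dataset (an `annotations/list.txt` file that maps each image name to a class id):

```python
import tarfile

# Extract both archives downloaded in the previous step;
# this creates ./images and ./annotations.
for archive in ["images.tar.gz", "annotations.tar.gz"]:
    with tarfile.open(archive) as tar:
        tar.extractall()

# Parse the master list file: each non-comment line reads
# "<image_name> <class_id> <species> <breed_id>".
labels = {}
with open("annotations/list.txt") as f:
    for line in f:
        if line.startswith("#"):
            continue
        name, class_id, _species, _breed = line.split()
        labels[name] = int(class_id) - 1  # make class ids zero-based

print(f"{len(labels)} labeled images, {len(set(labels.values()))} classes")
```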

4. Visualize the data
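For example, a quick sanity check that plots a few random images with their class ids (it uses the `labels` dictionary built in the previous step):

```python
import random

import matplotlib.pyplot as plt
from PIL import Image

# Plot four random images with their class ids.
samples = random.sample(list(labels.items()), 4)
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, (name, class_id) in zip(axes, samples):
    ax.imshow(Image.open(f"images/{name}.jpg"))
    ax.set_title(f"class {class_id}")
    ax.axis("off")
plt.show()
```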

5. Setting up Sagemaker

Here we set up the linkage and authentication to AWS services. There are three parts to this (a minimal sketch of the setup cell follows the list):

  • The IAM role used to give training and hosting access to your data. This is obtained automatically from the role used to start the notebook.
  • The S3 bucket that you want to use for training and model data.
  • The Amazon SageMaker image classification Docker image, which need not be changed.
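Here is a minimal sketch of that setup cell. It uses the v1-style `get_image_uri` helper that was current when this post was written; in SageMaker Python SDK v2 the equivalent call is `sagemaker.image_uris.retrieve("image-classification", region)`:

```python
import sagemaker
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

role = get_execution_role()   # role attached to the notebook instance
sess = sagemaker.Session()

bucket = "petsdataset"        # the S3 bucket created for this project
prefix = "image-classification"

# Built-in image classification algorithm container for the current region.
training_image = get_image_uri(sess.boto_region_name, "image-classification",
                               repo_version="latest")
```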

6. Upload the data to S3

Now we will upload the data to the S3 bucket. First, we have to create a bucket in Amazon S3 to store the dataset.

Come back to the AWS Management Console and search for S3.

Click on the “Create bucket” button on the S3 console page.

Next, give the bucket a name; we used “petsdataset” in our case. Click Next and then create the bucket.

Now we will push the data from the temporary storage to the S3 bucket that we created above.
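One way to do that from the notebook is with the session’s `upload_data` helper. The local folder names and key prefixes below are illustrative and assume the data has already been split into training and validation sets:

```python
# Push the prepared data from the notebook's local storage to S3.
s3_train = sess.upload_data(path="train", bucket=bucket,
                            key_prefix=f"{prefix}/train")
s3_validation = sess.upload_data(path="validation", bucket=bucket,
                                 key_prefix=f"{prefix}/validation")
print(s3_train, s3_validation)
```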

7. Sagemaker Estimator

Now we will configure the SageMaker estimator: set the training instance type, the maximum runtime, and the path where we will save our model.

In our case, we have saved the model in the s3 bucket under the output folder.
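A sketch of the estimator cell, using v1-style argument names (`train_instance_*`; SDK v2 renames them to `instance_count`/`instance_type`). The instance type and runtime below are typical choices, not the only valid ones:

```python
s3_output_location = f"s3://{bucket}/{prefix}/output"

ic = sagemaker.estimator.Estimator(
    training_image,                      # built-in algorithm container from the setup cell
    role,
    train_instance_count=1,
    train_instance_type="ml.p2.xlarge",  # GPU instance for training
    train_volume_size=50,                # GB of EBS storage for the training job
    train_max_run=360000,                # maximum runtime in seconds
    input_mode="File",
    output_path=s3_output_location,      # the model artifact lands here
    sagemaker_session=sess,
)
```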

8. Hyperparameter tuning

We will set the following hyperparameters for the model (a sketch of the corresponding code follows this list):

  • num_layers: The number of layers (depth) of the network. We use 18 in this example, but other values such as 50 or 152 can be used.
  • num_classes: The number of output classes for the new dataset. The pet dataset has 37 output classes.
  • epochs: The number of training epochs.
  • learning_rate: The learning rate for training.
  • num_training_samples: The total number of training samples.
  • mini_batch_size: The number of training samples used for each mini-batch. In distributed training, the number of training samples used per batch is N * mini_batch_size, where N is the number of hosts on which training runs.
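Putting those together on the estimator (the concrete values are illustrative; `num_training_samples` must match the number of images in your training channel):

```python
ic.set_hyperparameters(
    num_layers=18,                # ResNet depth
    image_shape="3,224,224",      # channels, height, width expected by the network
    num_classes=37,               # one class per pet breed
    num_training_samples=3680,    # set this to the actual size of your training set
    mini_batch_size=32,
    epochs=10,
    learning_rate=0.001,
    use_pretrained_model=1,       # start from an ImageNet-pretrained ResNet
)
```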

9. Data Channels
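The data channels map the S3 locations to the algorithm’s named inputs. Here is a sketch using the v1-style `s3_input` wrapper (SDK v2 uses `sagemaker.inputs.TrainingInput`), assuming the data was packed into RecordIO files; raw images plus `.lst` files also work, with content type `application/x-image` and extra `train_lst`/`validation_lst` channels:

```python
from sagemaker.session import s3_input

train_data = s3_input(s3_train, content_type="application/x-recordio")
validation_data = s3_input(s3_validation, content_type="application/x-recordio")

# Channel names are fixed by the built-in algorithm.
data_channels = {"train": train_data, "validation": validation_data}
```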

10. Training the model

You can also view information about and the status of a training job in the AWS SageMaker console, under the “Training jobs” section.
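Kicking off the job itself is a single call:

```python
# Launch the training job; logs stream into the notebook until the job finishes.
ic.fit(inputs=data_channels, logs=True)
```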

11. Deploying model
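Deploying the trained model creates a real-time endpoint; the instance type below is a typical CPU choice for inference:

```python
ic_classifier = ic.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
)
```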

12. Predicting the results
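A sketch of a single prediction against the endpoint (v1-style: set `content_type` on the predictor; in SDK v2 you would attach an `IdentitySerializer("application/x-image")` instead). The file name is just an example image from the dataset:

```python
import json

with open("images/Abyssinian_1.jpg", "rb") as f:
    payload = f.read()

ic_classifier.content_type = "application/x-image"
probabilities = json.loads(ic_classifier.predict(payload))  # one probability per class
predicted_class = probabilities.index(max(probabilities))
print(f"Predicted class id: {predicted_class}")
```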

Image Classification outcome

When we’re done with the endpoint, we can just delete it and the backing instances will be released.
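That cleanup is one call on the predictor:

```python
# Tear down the endpoint so the backing instance stops incurring charges.
ic_classifier.delete_endpoint()
```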

We have used the same dataset for Object Detection and Semantic Segmentation, and the first few steps for both are quite similar.

We will download and extract the data and create notebook instances as done above.
While preparing the data for SageMaker, we will perform the following steps (an illustrative sketch of the annotation formats follows):

Preparing data for Object detection
Preparing data for Semantic Segmentation
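To give a feel for what this preparation produces: the object detection algorithm can consume one JSON annotation per image, and the semantic segmentation algorithm expects PNG label masks alongside the images. The field names follow the built-in algorithms’ documented formats; the concrete values and S3 layout below are illustrative, not the exact ones from this project.

```python
# Object detection: one JSON annotation per training image. Bounding boxes
# for the pets come from the Pascal VOC XML files shipped with the dataset.
example_annotation = {
    "file": "Abyssinian_1.jpg",
    "image_size": [{"width": 600, "height": 400, "depth": 3}],
    "annotations": [
        {"class_id": 0, "left": 111, "top": 134, "width": 210, "height": 180}
    ],
    "categories": [{"class_id": 0, "name": "cat"}],
}

# Semantic segmentation: images and single-channel PNG masks (the trimaps in
# the pets dataset) go into parallel S3 prefixes, e.g. (illustrative layout):
#   s3://petsdataset/segmentation/train/                 JPEG images
#   s3://petsdataset/segmentation/train_annotation/      PNG label masks
#   s3://petsdataset/segmentation/validation/
#   s3://petsdataset/segmentation/validation_annotation/
```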

We will further upload the data to the S3 buckets and perform hyperparameter tuning (a hedged sketch of typical settings follows the two steps below).

Hyperparameter Tuning for Object Detection
Hyperparameter Tuning for Semantic Segmentation
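For reference, here is a sketch of what those two cells typically set. It reuses `role`, `sess`, and `bucket` from the setup cell and builds fresh estimators against the “object-detection” and “semantic-segmentation” containers; all values are illustrative, not necessarily the exact ones used in this project.

```python
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri

def built_in_estimator(algorithm):
    """Estimator for a built-in algorithm container (reuses role, sess, bucket)."""
    image = get_image_uri(sess.boto_region_name, algorithm, repo_version="latest")
    return sagemaker.estimator.Estimator(
        image, role,
        train_instance_count=1,
        train_instance_type="ml.p3.2xlarge",
        output_path=f"s3://{bucket}/{algorithm}/output",
        sagemaker_session=sess,
    )

od_model = built_in_estimator("object-detection")
ss_model = built_in_estimator("semantic-segmentation")

# Object detection: SSD with a ResNet-50 base network.
od_model.set_hyperparameters(
    base_network="resnet-50",
    use_pretrained_model=1,
    num_classes=37,
    epochs=30,
    learning_rate=0.001,
    mini_batch_size=16,
    image_shape=512,
    num_training_samples=3680,
)

# Semantic segmentation: FCN decoder on a ResNet-50 backbone. num_classes is 3
# here because the raw trimap masks label pet / background / border pixels.
ss_model.set_hyperparameters(
    backbone="resnet-50",
    algorithm="fcn",
    use_pretrained_model=True,
    num_classes=3,
    epochs=30,
    learning_rate=0.0001,
    crop_size=240,
    num_training_samples=3680,
)
```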

We will further train, deploy, and predict the outcomes as done above.

The outcomes for the object detector are as follows:

Prediction for Object Detection
Output

The outcomes of semantic segmentation are as follows:

The outcome for Semantic Segmentation

Please note that all three projects were performed in separate notebook instances of Amazon SageMaker.

The GitHub link for the project is https://github.com/skolungade/AWS-Sagemaker-Image-Classification-Object-Detection-and-Semantic-Segmentation-

References which might be helpful:

Amazon S3 Documentation. (2020). Retrieved 3 August 2020, from https://docs.aws.amazon.com/s3/?id=docs_gateway

Train Models — Amazon SageMaker. (2020). Retrieved 3 August 2020, from https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html

Image Segmentation in Deep Learning: Methods and Applications — MissingLink.ai. (2020). Retrieved 3 August 2020, from https://missinglink.ai/guides/computer-vision/image-segmentation-deep-learning-methods-applications/
