AutoML using Amazon SageMaker Autopilot | Multiclass Classification

Vaibhav Malpani
Vaibhav Malpani’s Blog
5 min read · Jul 14, 2020

AWS has always aimed at “putting machine learning in the hands of every developer”. It launched Amazon SageMaker in 2017, which helps developers build machine learning models with no prior ML knowledge. At re:Invent 2019, AWS added a major new feature to SageMaker called “SageMaker Studio”: an IDE for all your machine learning needs.

Introduction to SageMaker Studio:

It helps you preprocess the data, train your machine learning models using the algorithm best suited to your data, deploy the model, monitor the deployed model, and debug it.

SageMaker Autopilot

SageMaker Autopilot automatically inspects raw data, applies feature processors, picks the best set of algorithms, trains and tunes multiple models, tracks their performance, and then ranks the models based on performance, all with just a few clicks.

SageMaker Autopilot can help you build a classification or regression model on tabular data with no machine learning expertise. It is also quick: you can build a model in a day, compared to the week or two a data scientist might need.

Problem Statement:

We have a dataset of news titles and descriptions classified into 4 categories: 1-World, 2-Sports, 3-Business, 4-Sci/Tech. This is a multiclass classification problem, and we will try to solve it using SageMaker Autopilot. (I will be using a subset (20% of the complete dataset) to reduce the training time.)
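Taking a 20% subset of the training data before uploading it is just a few lines of Python. Here is a minimal sketch; the row layout shown (class index, title, description) mirrors the dataset described above, but the helper name and the seed are my own choices:

```python
import random

def sample_subset(rows, fraction=0.2, seed=42):
    """Return a reproducible random subset of the dataset rows."""
    random.seed(seed)
    k = max(1, int(len(rows) * fraction))
    return random.sample(rows, k)

# Hypothetical rows in the (class_index, title, description) layout
rows = [(i % 4 + 1, f"title {i}", f"description {i}") for i in range(100)]

subset = sample_subset(rows, fraction=0.2)
print(len(subset))  # 20
```

Fixing the seed makes the subset reproducible, so repeated Autopilot experiments train on the same sample.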

Caution ⚠ : Autopilot is not included in the free tier, so it will cost you to try it out.

Steps to build using Autopilot

  1. Upload data to an S3 bucket
  2. Create an Experiment in SageMaker Studio
  • Preprocess data
  • Create a model
  • Deploy the model
  3. Test the deployed model

Upload data to S3

This is a very simple step. We will create a new bucket in S3 and make 2 folders, ‘input’ and ‘output’. We will then upload the training data to the input folder.
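The same upload can be done from code with boto3. A small sketch follows; the bucket name and file path are placeholders for your own, and the actual upload needs AWS credentials configured:

```python
def s3_uri(bucket, local_path, key_prefix="input"):
    """Build the destination URI: s3://<bucket>/<key_prefix>/<filename>."""
    filename = local_path.split("/")[-1]
    return f"s3://{bucket}/{key_prefix}/{filename}"

def upload_training_data(bucket, local_path, key_prefix="input"):
    """Upload the training CSV into the bucket's input folder."""
    import boto3  # imported here so the URI helper works without the AWS SDK
    key = f"{key_prefix}/{local_path.split('/')[-1]}"
    boto3.client("s3").upload_file(local_path, bucket, key)
    return s3_uri(bucket, local_path, key_prefix)

# Requires AWS credentials; bucket name is hypothetical:
# print(upload_training_data("my-autopilot-news-bucket", "data/train.csv"))
print(s3_uri("my-autopilot-news-bucket", "data/train.csv"))
```

Keeping the ‘input’/‘output’ prefixes separate matters later: Autopilot reads from one and writes all its artifacts to the other.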

Create an Experiment in SageMaker Studio

If you have not created your SageMaker Studio yet, you can go here and create it. After the studio is created, click ‘Open Studio’.

On the left side, open the SageMaker Experiment list and select ‘Create Experiment’.

After that, you will get a window in the middle of your screen. Enter your experiment name and enter the path to your dataset that is stored in S3.

Once you have selected your dataset, enter the name of the attribute you want to predict; in our case, it is ‘class_index’. We also need to give a path to store the output data; we will use our output folder as the output data location.

In our case, we know it is going to be multiclass classification, but let’s select Auto and see whether Autopilot can infer our problem type on its own.

After this, just hit ‘Create Experiment’.
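The same experiment can be launched programmatically with the SageMaker `create_auto_ml_job` API instead of the Studio form. Below is a sketch that builds the request; the job name, bucket, and role ARN are placeholders, and leaving out `ProblemType` corresponds to picking Auto in the UI:

```python
def autopilot_job_config(job_name, input_s3_uri, output_s3_uri,
                         target="class_index",
                         role_arn="arn:aws:iam::123456789012:role/SageMakerRole"):
    """Build the request dict for sagemaker.create_auto_ml_job.

    The role ARN above is a placeholder; use an execution role with
    S3 and SageMaker permissions.
    """
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                            "S3Uri": input_s3_uri}},
            "TargetAttributeName": target,
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # No "ProblemType" key: Autopilot infers it, like 'Auto' in Studio
        "RoleArn": role_arn,
    }

config = autopilot_job_config(
    "news-classification",                     # hypothetical job name
    "s3://my-autopilot-news-bucket/input/",    # hypothetical bucket
    "s3://my-autopilot-news-bucket/output/",
)

# To actually launch it (requires AWS credentials):
# import boto3
# boto3.client("sagemaker").create_auto_ml_job(**config)
print(config["InputDataConfig"][0]["TargetAttributeName"])  # class_index
```

This is handy when you want to kick off Autopilot runs from a script or CI pipeline rather than clicking through Studio.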

We can now see that this pipeline has been created. Wait for it to complete. Once the Analyzing Data step is complete, you will see two new options at the top right corner: ‘Open candidate generation notebook’ and ‘Open data exploration notebook’.

Candidate generation notebook: This notebook contains the generated code for data transformation, SageMaker training job pipelines, and automatic hyperparameter tuning, as well as the code to deploy the best model. (Do open this file; it’s really interesting.)

Data exploration notebook: Once you open this, you can see all the details about our dataset. It has also understood that our dataset is for multiclass classification and has 4 classes: 1, 2, 3, 4.

Once the pipeline is completed, you will see details about the jobs that ran, with each job’s accuracy shown as ‘Objective’.

Once you see this screen, select the best-performing job and click ‘Deploy model’. Enter an endpoint name and select an appropriate instance type depending on your expected usage. You can also choose to save the request and response data sent to the endpoint. After this, hit the ‘Deploy model’ button.

On the left side, click on the SageMaker endpoint list as shown in the image below. At first, you will see the status as ‘Creating’. Wait for it to turn into ‘InService’; once it is ‘InService’, your model has been deployed and is ready to be tested. You can deploy multiple models on multiple endpoints as shown below.
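Instead of refreshing the Studio panel, you can poll the endpoint status from code. A small sketch, where the describe function is passed in (boto3's `describe_endpoint` in real use, a stub in tests) and the endpoint name is a placeholder:

```python
import time

def wait_for_in_service(describe_fn, endpoint_name, poll_seconds=30):
    """Poll the endpoint status until it reaches 'InService'.

    describe_fn is boto3's sagemaker client describe_endpoint
    (or any stand-in with the same call shape).
    """
    while True:
        status = describe_fn(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status in ("Failed", "OutOfService", "Deleting"):
            raise RuntimeError(f"Endpoint '{endpoint_name}' ended in status {status}")
        time.sleep(poll_seconds)

# Real usage (requires AWS credentials; endpoint name is hypothetical):
# import boto3
# sm = boto3.client("sagemaker")
# wait_for_in_service(sm.describe_endpoint, "news-classification-endpoint")
```

Failing fast on ‘Failed’ beats waiting out the full creation timeout when a deployment goes wrong.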

Test Deployed Model:

You can test the endpoint locally using the code below.

import boto3

endpoint_name = ""  # name of the endpoint you deployed

sagemaker_client = boto3.Session().client('runtime.sagemaker')

response = sagemaker_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='text/csv',
    Body="Flintoff, Collingwood lead England to win, Andrew Flintoff hit a typically brutal 99 and took the final wicket Friday as England cruised to a comfortable series-clinching victory over India in the second one-day cricket international at the Oval.")

# response['Body'] is a streaming object; read and decode it to get the prediction
print(response['Body'].read().decode('utf-8'))

And you should get the output 2 for the above call, as the news is related to Sports.

What did we learn?

  • Building a machine learning model on tabular data with no prior knowledge, using SageMaker Autopilot.
  • Deploying models in a highly scalable environment.

If you liked this post, please 👏👏 for it on the left, and follow me if you want to read more such posts!

Twitter: https://twitter.com/IVaibhavMalpani
LinkedIn: https://www.linkedin.com/in/ivaibhavmalpani/




Google Developer Expert for Google Cloud. Python Developer. Cloud Evangelist.