Deploy Machine Learning Model using Amazon SageMaker (Part 3)
In this article, we will continue our series on deploying a machine learning model using Amazon SageMaker.
This is part 3 of a series:
Set up SageMaker — Part 1
Data Preprocessing — Part 2
Train the Model — Part 3 (You are here)
So far, we have set up an Amazon SageMaker instance, preprocessed our data, and uploaded the data into Amazon S3. Now we are going to train our model.
Choose the Training Algorithm
Ideally, we would evaluate different models to find the one most suitable for our data. We could let SageMaker Autopilot find an appropriate model for this tabular dataset.
Amazon SageMaker Autopilot automates the machine learning workflow by automatically building, training, and tuning the best models based on the dataset.
For simplicity, we won’t be using SageMaker Autopilot here; instead, we will use the built-in SageMaker XGBoost algorithm.
Run a model training job
The first thing we have to do is to run a model training job.
The Amazon SageMaker Python SDK provides framework estimators and generic estimators to train your model while orchestrating the machine learning (ML) lifecycle. Under the hood, the estimators access SageMaker’s training features and the underlying AWS infrastructure, such as Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic Compute Cloud (Amazon EC2), and Amazon Simple Storage Service (Amazon S3).
We start by importing the Amazon SageMaker Python SDK and using it to retrieve basic information from the current SageMaker session.
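A minimal sketch of what that cell could look like (the variable names are illustrative):

```python
import sagemaker

# Create a SageMaker session and read its basic information
session = sagemaker.Session()
region = session.boto_region_name        # AWS Region the notebook instance is running in
role = sagemaker.get_execution_role()    # IAM role attached to the notebook instance

print(region)
print(role)
```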
This returns two pieces of information. The first is the AWS Region where the SageMaker notebook instance is currently running; in this example, it’s the EU West 3 (Paris) Region.
The second output is the IAM role used by the notebook instance we created earlier.
We can check the SageMaker Python SDK version by running the following command.
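In a notebook cell, this is a one-liner:

```python
import sagemaker

sagemaker.__version__
```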
The current version is 2.86.2. If your SageMaker SDK version is less than 2.20, you will have to update it by running the following command.
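For example, from a notebook cell (restart the kernel afterwards so the new version is picked up):

```python
# Upgrade the SageMaker Python SDK in place
!pip install --upgrade sagemaker
```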
Next, we are going to create an XGBoost estimator using the sagemaker.estimator.Estimator class.
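A sketch of that setup follows; here, bucket and prefix are assumed to hold the S3 location from Part 2, region and role come from the session cell above, and the XGBoost container version (1.2-1) is an assumption:

```python
import sagemaker
from sagemaker.debugger import Rule, rule_configs
from sagemaker.estimator import Estimator

# S3 location where SageMaker will store the model artifact
# (bucket and prefix come from Part 2)
s3_output_location = "s3://{}/{}/xgboost_model".format(bucket, prefix)

# Retrieve the URI of the built-in XGBoost training container
container = sagemaker.image_uris.retrieve("xgboost", region, "1.2-1")

xgb_model = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size=5,
    output_path=s3_output_location,
    sagemaker_session=sagemaker.Session(),
    rules=[Rule.sagemaker(rule_configs.create_xgboost_report())],
)
```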
In the code above, the XGBoost estimator is named xgb_model. Now, to construct the SageMaker estimator, we need to specify some parameters.
image_uri specifies the training container image. In this example, the SageMaker XGBoost training container URI is retrieved using the sagemaker.image_uris.retrieve function.
role is the AWS Identity and Access Management (IAM) role that SageMaker uses to perform tasks on your behalf. Some of these tasks include reading training data, retrieving model artifacts from Amazon S3, and writing training results to Amazon S3.
instance_count and instance_type specify the number and type of Amazon EC2 ML compute instances to use for model training. For this training exercise, we use a single ml.m5.xlarge instance (you can choose a different instance type if you like).
volume_size is the size (in GB) of the EBS storage volume to attach to the training instance. This must be large enough to store the training data if you use File mode, which is the default, so we don’t have to worry about that.
output_path is the path to the S3 bucket where SageMaker stores the model artifact and training results.
sagemaker_session is the session object that manages interactions with SageMaker API operations and other AWS services that the training job uses.
rules specifies a list of SageMaker Debugger built-in rules. In this example, we use the create_xgboost_report rule, which generates an XGBoost report providing insights into the training progress and results. We will check this report out later.
Set the hyperparameters
Next, we set the hyperparameters for the XGBoost algorithm by calling the set_hyperparameters method of the estimator. (The SageMaker documentation has a complete list of XGBoost hyperparameters.)
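For example (these values are illustrative starting points, not tuned for any particular dataset):

```python
# Example hyperparameters for a binary classification task
xgb_model.set_hyperparameters(
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.7,
    objective="binary:logistic",
    num_round=1000,
)
```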
You can also tune hyperparameters using the SageMaker hyperparameter optimization features.
Configure data input flow for training
Next, we use the TrainingInput class to configure a data input flow for training.
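A sketch of that configuration, assuming the train.csv and validation.csv files were uploaded under the bucket and prefix used in Part 2:

```python
from sagemaker.inputs import TrainingInput

# Point TrainingInput at the CSV files uploaded to S3 in Part 2
train_input = TrainingInput(
    "s3://{}/{}/data/train.csv".format(bucket, prefix), content_type="csv"
)
validation_input = TrainingInput(
    "s3://{}/{}/data/validation.csv".format(bucket, prefix), content_type="csv"
)
```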
The code above shows how to configure TrainingInput objects to use the training and validation datasets we uploaded to Amazon S3 in the Split the Dataset into Train, Validation, and Test Datasets section of Part 2.
Train the model
Now that we have configured the training job, we are finally going to train the model.
Here, we call the estimator’s fit method with the training and validation datasets. By setting wait=True, the fit method displays progress logs and waits until training is complete.
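A minimal sketch of that call, reusing the train_input and validation_input objects defined above:

```python
# Launch the training job; block until it finishes and stream the logs
xgb_model.fit(
    {"train": train_input, "validation": validation_input},
    wait=True,
)
```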
The output of the code above shows that the training job has started, which can take a while.
Once training is done, the log reports that the training job completed, along with the training seconds and billable seconds.
Download an XGBoost Training Report
After the training job has finished, you can download an XGBoost training report and a profiling report generated by SageMaker Debugger.
The XGBoost training report offers you insights into the training progress and results, such as the loss function with respect to iteration, feature importance, confusion matrix, accuracy curves, and other statistical results of training.
We run the following code to build the S3 bucket URI where the Debugger training reports are generated and to check that the reports exist.
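A sketch of that check, assuming the xgb_model estimator and training job from the previous steps (rule-output is SageMaker Debugger's default output suffix):

```python
# Build the S3 URI where the Debugger rule outputs are written
rule_output_path = (
    xgb_model.output_path
    + "/"
    + xgb_model.latest_training_job.job_name
    + "/rule-output"
)

# List the report files to confirm they exist
!aws s3 ls {rule_output_path} --recursive
```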
Next, we download the Debugger XGBoost training and profiling reports to the current workspace.
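For example, from a notebook cell:

```python
# Copy the generated reports from S3 into the notebook's working directory
!aws s3 cp {rule_output_path} ./ --recursive
```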
Then, we run the following IPython script to get the file link of the XGBoost training report.
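A sketch of that script; the report path assumes the create_xgboost_report rule's default output folder:

```python
from IPython.display import FileLink, display

# The training report lands in the CreateXgboostReport rule's output folder
display(
    "Click the link below to view the XGBoost training report",
    FileLink("CreateXgboostReport/xgboost_report.html"),
)
```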
Now, the following script returns the file link of the Debugger profiling report, which shows summaries and details of the EC2 instance’s resource utilization, system bottleneck detection results, and Python operation profiling results.
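A sketch of that lookup, assuming the same xgb_model estimator; it finds the profiler rule's name in the training job's rule summary and links to the downloaded report:

```python
from IPython.display import FileLink, display

# Look up the profiler rule's configuration name from the training job
profiler_report_name = [
    rule["RuleConfigurationName"]
    for rule in xgb_model.latest_training_job.rule_job_summary()
    if "Profiler" in rule["RuleConfigurationName"]
][0]

display(
    "Click the link below to view the profiler report",
    FileLink(profiler_report_name + "/profiler-output/profiler-report.html"),
)
```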
Once you run this block of code, you can click the link to view the profiling report. The report shows a rules summary covering checks such as GPU memory increase, CPU bottlenecks, max initialization time, low GPU utilization, and load balancing. You can also see the job’s start time, end time, and duration.
View the location of the model artifact
Now that we have a trained XGBoost model, SageMaker stores the model artifact in your S3 bucket. To find the location of the model artifact, run the following code to print the model_data attribute of the xgb_model estimator.
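In a notebook cell:

```python
# S3 URI of the trained model artifact (model.tar.gz)
xgb_model.model_data
```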
In the next part, we will Deploy the Model to Amazon EC2.
Here is a link to the notebook on GitHub.