Machine Learning with SageMaker
“Creating and deploying an ML model is a cumbersome job,” said Anuran, a friend of mine who recently entered the Data Science field. Being a cloud developer on the AWS platform, I took it upon myself to help him out. This article is all about how to create and deploy ML models using SageMaker.
Before we begin, a little intro to what SageMaker is. SageMaker is a managed ML platform that lets you build and deploy models easily, with the underlying infrastructure taken care of by AWS. It gives you a nice GUI to manage the workflow end to end, from training through model creation and deployment. If plug and play is your way of doing things, SageMaker supports that as well: you can also run each of these components independently.
We will start exploring it in the console and each term will be explained as and when necessary. Below is the console for SageMaker. Click on Get Started.
Hosted Jupyter Notebooks
This takes us to the Notebook Instance page. Click on Create notebook instance, and the screen below will be displayed. Before we start, a few words about what we mean by a notebook instance: we are essentially launching an EC2 instance (a server) with the Jupyter Notebook application and its dependencies (Python, Anaconda, etc.) installed. Let's enter the details.
Notebook instance name: A user-friendly name to identify your notebook.
Notebook instance type: Determines the RAM, CPU, and other server specs.
Elastic Inference: This attaches low-cost GPU acceleration to the notebook instance so you can test and evaluate the inference performance of your model while building it. We are not selecting one here.
Lifecycle configuration allows us to run scripts when the notebook instance is created or started. We will leave this blank.
In the Permissions tab, we will choose to create a new role, which allows the notebook application on EC2 to access the S3 buckets holding the training, validation, and test data. This bucket will also be used to store models.
These settings are all optional and we will go with defaults. Click on Create notebook instance.
This will start creating the Notebook application. Once done we will be able to see the below screen.
So SageMaker gave us a fully configured and customized Jupyter Notebook.
Here we see two options: “Open Jupyter | Open JupyterLab”. JupyterLab is the next-generation web-based user interface for notebook applications.
We will use Jupyter Notebook. On the console, under the Conda tab, we can see the environment. This is where we can see installed as well as available packages.
Under the SageMaker tab, we have sample notebooks provided by AWS. For the demo, we will use these.
This also gives us the option to open a terminal. Let's log in to it and check.
The output from the terminal snapshot above shows that the notebook application is running on EC2 and the user is ec2-user. We can also see all the files and sample algorithm folders.
Let's select one of the algorithms. I chose linear_learner_mnist.
Distributed Training Jobs
In the bucket variable, I specified one of my buckets.
All the code above uses boto3, the AWS client library for Python. Here we divided the data into training, validation, and test sets, and stored them in my S3 bucket. They will be downloaded to the training instances. If you don't want to incur the costs associated with this download, you can store the data in recordio-protobuf format and it will be streamed from S3. This is called Pipe mode.
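The staging step can be sketched as below. This is a minimal illustration, not the sample notebook's exact code: the split helper and `upload_numpy` are names I made up, and the bucket/key values you pass in would be your own.

```python
import io

import numpy as np


def split_dataset(data, labels, train_frac=0.8, valid_frac=0.1, seed=42):
    """Shuffle the dataset and cut it into train/validation/test partitions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(len(data) * train_frac)
    n_valid = int(len(data) * valid_frac)
    train, valid, test = np.split(idx, [n_train, n_train + n_valid])
    return [(data[i], labels[i]) for i in (train, valid, test)]


def upload_numpy(bucket, key, array):
    """Serialize an array in memory and store it under s3://bucket/key."""
    import boto3  # imported lazily; only needed when actually uploading

    buf = io.BytesIO()
    np.save(buf, array)
    buf.seek(0)
    boto3.client("s3").upload_fileobj(buf, bucket, key)
```

You would call `upload_numpy` once per partition (e.g. under `train/`, `validation/`, and `test/` prefixes) so each channel can be pointed at its own S3 prefix later.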
AWS stores pre-built popular algorithms, such as linear regression, logistic regression, principal component analysis, text classification, and object detection, as Docker images in ECR. The methodology and engineering of these algorithms differ from the open-source versions: the AWS team has tuned them to take advantage of distributed training and GPU acceleration, and claims roughly 10x better performance than the base implementations. AWS also gives us the option to bring our own algorithm; we create a container image of the algorithm and push it to ECR. For the full list, visit here.
Here we are finding the container which holds the code for linear-learner. The output is something like
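The lookup can be sketched with the SageMaker Python SDK as below (the v2 `image_uris` API; older notebooks used a `get_image_uri` helper instead). The region is an assumption; use the one your notebook runs in.

```python
def linear_learner_image(region="us-east-1"):
    """Return the ECR URI of the built-in linear-learner training container."""
    from sagemaker import image_uris  # needs the `sagemaker` SDK installed

    # The SDK resolves the account/repository/tag for the given region
    # from its bundled registry tables; no AWS call is made here.
    return image_uris.retrieve(framework="linear-learner", region=region)
```

The returned string is an ECR image URI of the form `<account>.dkr.ecr.<region>.amazonaws.com/linear-learner:<tag>`.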
Once we have the container, it is passed as an argument to SageMaker's Estimator. Fitting it creates a training job, which in turn creates a model and stores it in S3.
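In outline, the training step looks like the sketch below. The instance type, hyperparameters, and S3 paths are placeholders, not the sample notebook's exact values; `container`, `role`, and `bucket` come from the earlier steps.

```python
def run_training_job(container, role, bucket):
    """Launch a SageMaker training job for the built-in linear-learner."""
    import sagemaker  # needs the `sagemaker` SDK installed
    from sagemaker.estimator import Estimator

    linear = Estimator(
        image_uri=container,            # ECR URI found in the previous step
        role=role,                      # IAM role with S3 access
        instance_count=1,               # raise this for distributed training
        instance_type="ml.c4.xlarge",
        output_path=f"s3://{bucket}/linear-learner/output",  # model.tar.gz lands here
        sagemaker_session=sagemaker.Session(),
    )
    # linear-learner requires feature_dim and predictor_type;
    # 784 matches the 28x28 MNIST images.
    linear.set_hyperparameters(
        feature_dim=784,
        predictor_type="binary_classifier",
        mini_batch_size=200,
    )
    # fit() creates the training job and blocks until it finishes.
    linear.fit({"train": f"s3://{bucket}/linear-learner/train"})
    return linear
```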
The image above shows the job. Clicking on it takes us to the CloudWatch logs, where we can see what's happening behind the scenes.
From the console, we can see that Amazon launched an ECS cluster and ran containers on it to create the model.
For a different algorithm, xgboost_abalone, it actually created a Hadoop cluster and used YARN to distribute the model-creation tasks among multiple instances.
Once the job finishes, we can see the model generated and stored in the S3 bucket. The same can be seen in the SageMaker console. See below.
If we want, we can download/export this model and use it outside AWS. Alternatively, we can bring a model built elsewhere, upload it to S3, and create a new endpoint from it.
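Registering an externally built artifact can be sketched with boto3's `create_model` call. This is an illustration only: the function name is mine, and the model name, serving image, S3 path, and role ARN are all placeholders you would replace.

```python
def register_external_model(name, image_uri, model_data_url, role_arn):
    """Register a model.tar.gz already in S3 so SageMaker can serve it."""
    import boto3  # imported lazily; only needed when actually calling AWS

    sm = boto3.client("sagemaker")
    return sm.create_model(
        ModelName=name,
        PrimaryContainer={
            "Image": image_uri,              # serving container in ECR
            "ModelDataUrl": model_data_url,  # e.g. s3://<bucket>/.../model.tar.gz
        },
        ExecutionRoleArn=role_arn,           # role SageMaker assumes to fetch the artifact
    )
```

Once registered, the model can back an endpoint configuration just like one produced by a training job.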
Now that we have the model, we can deploy it. Below is the command for deploying.
To deploy, we can either pull the images from ECR and run them ourselves, which gives us more control over the deployment, or go directly with an AWS-managed endpoint.
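The AWS-managed route is a one-liner on the fitted estimator; a minimal sketch, assuming `estimator` is the Estimator from the training step and with a placeholder instance type:

```python
def deploy_model(estimator):
    """Create an endpoint config and an endpoint; return a Predictor."""
    predictor = estimator.deploy(
        initial_instance_count=1,      # number of serving instances to start
        instance_type="ml.m4.xlarge",  # serving hardware; pick per your load
    )
    return predictor
```

Behind the scenes this single call creates both the endpoint configuration and the endpoint, which is exactly what we observe in the console next.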
The same endpoint configuration and endpoint can also be created from the console.
Once the command succeeds, AWS creates an endpoint configuration and one endpoint from that configuration.
Lastly, we verified the endpoint by calling it with test and batch data. In the logs we saw that it created four instances of the serving app (the number of workers is 4) and returned the results. The whole technical process is explained by the diagram below.
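Calling the endpoint from outside the notebook can be sketched with boto3's runtime client. The endpoint name is a placeholder, and the CSV payload format is an assumption based on linear-learner accepting `text/csv` input.

```python
def invoke(endpoint_name, csv_row):
    """Send one CSV record to a SageMaker endpoint and parse the JSON reply."""
    import json

    import boto3  # imported lazily; only needed when actually calling AWS

    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # built-in linear-learner accepts CSV input
        Body=csv_row,            # e.g. "0.0,0.0,...,0.5" (784 pixel values)
    )
    return json.loads(resp["Body"].read())
```

The response body for linear-learner contains the predictions (score and, for classifiers, the predicted label) as JSON.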
Hope this helped.