MLOps

How to deploy a Machine Learning model on the Cloud

A guide on how to perform a lightweight deployment

Michelangiolo Mazzeschi
Plain Simple Software

--

Deploying a Machine Learning model is considered the baptism of fire for many developers. While building a rudimentary Machine Learning model may only require a few lines of code, even the lightest deployment requires you to interact with complex tools and deploy your code on your Virtual Private Cloud, ideally creating an endpoint so that it can be accessed by end users.

MLOps pipeline

This makes it much more complex than working with code alone, because the tools you will need relate to software architecture, and many of them are now no-code interfaces. If you do not yet know anything about software architecture, this publication may be a good starting point, as well as this article.

In how many ways can you deploy a trained model?

Once a model has been trained, it can be deployed on the Cloud, and its prediction costs are negligible compared to its training costs. Depending on the size of the model, its complexity, and its need to scale, you might need to implement different solutions. In this article, I will focus on AWS technology, but because I am explaining how to deploy a model using Virtual Machines, the principles will also apply to other Clouds, like Azure or Google Cloud Platform.

If you are familiar with serverless Lambda functions, you will know that they are a convenient way to run a snippet of code on the Cloud while only being charged for the time it takes to run. Unfortunately, as of now, there is no straightforward Lambda-based setup that works for Machine Learning.

Using Sagemaker (AWS) or any other MLOps tool

Each Cloud offers a tool that helps you manage your entire Machine Learning pipeline (MLOps). Unfortunately, these tools are usually built for heavy and expensive models and require a high level of expertise to be run.

Amazon Sagemaker logo

The closest solution AWS has been able to provide is called Serverless Inference, a Lambda-like option created specifically for Machine Learning. However, deploying it is much more complex than anything we could ever create in a notebook, as it requires Sagemaker. Essentially, you would need to (a rough code sketch follows the list):

  1. Create a Sagemaker instance
  2. Send your model to S3
  3. Load back your model and create an artifact with a Sagemaker Class
  4. Send the containerized artifact to ECR
  5. Create a serverless endpoint
  6. Activate the serverless inference
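
For concreteness, here is a rough sketch of what steps 3 to 6 can look like with the SageMaker Python SDK, assuming a scikit-learn model whose artifact has already been uploaded to S3 (step 2). The role ARN, bucket path, and configuration values are placeholders, and the SDK's prebuilt scikit-learn container covers the ECR step behind the scenes:

```python
# Rough sketch of steps 3-6 with the SageMaker Python SDK
# (role ARN, S3 path, and config values are placeholders)
import sagemaker
from sagemaker.sklearn.model import SKLearnModel
from sagemaker.serverless import ServerlessInferenceConfig

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

# Step 3: load the artifact back from S3 and wrap it in a SageMaker Model class
model = SKLearnModel(
    model_data="s3://my-bucket/models/model.tar.gz",  # placeholder S3 path
    role=role,
    entry_point="inference.py",   # script defining how to load and call the model
    framework_version="1.2-1",
    sagemaker_session=session,
)

# Steps 5-6: create a serverless endpoint and activate serverless inference
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,
    max_concurrency=5,
)
predictor = model.deploy(serverless_inference_config=serverless_config)
print(predictor.endpoint_name)  # the endpoint can now be invoked for predictions
```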

Not only does the entire Sagemaker procedure require far more work than it would take to create a simple Lambda, but the feature is also experimental.

Using a Virtual Machine

What I found to be the best alternative to the huge complexity of Sagemaker is using a Virtual Machine. There are plenty of Operating Systems to choose from; I prefer Linux because it is lighter and less expensive, but Windows Server is also a valid option:

How a Microsoft Virtual Machine looks

Compared to Sagemaker, there are pros and cons:

First, it is appropriate for small models, because managing your own virtual machine does not let you scale computing power on demand. If the machine suddenly needs to be accessed by tens of thousands of users at the same time, it might crash unless you have already planned a way to scale it. With Sagemaker, this would not be an issue, because the underlying infrastructure is managed by AWS.

The code, on the other hand, is much simpler than what you would need on Sagemaker. You can run your model with plain Python files and store the artifacts on a service like S3, or keep them on the machine itself and make them available to whoever needs access. Essentially, the code you would write in a Python notebook is the same code you would run on the Virtual Machine.
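
As a minimal sketch of that idea, here is notebook-style code that trains a small scikit-learn model, saves the artifact, and copies it to S3 with boto3; the bucket name and paths are placeholders:

```python
# Minimal sketch: train a small model, save the artifact, push it to S3
# (bucket name and object key are placeholders)
import boto3
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small model exactly as you would in a notebook
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100).fit(X, y)

# Save the artifact locally on the Virtual Machine
joblib.dump(model, "model.joblib")

# Optionally copy the artifact to S3 so it survives the instance
s3 = boto3.client("s3")
s3.upload_file("model.joblib", "my-ml-artifacts-bucket", "models/model.joblib")
```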

The cost is also minimized using a Virtual Machine. You can choose to run it on one of the lighter instance types, a t2.medium, for example. You only have to be careful that the model does not crash your Virtual Machine, which usually happens when the RAM limit is reached.
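
One simple guard, sketched below under the assumption that you can install psutil on the instance: check the available memory before loading the artifact and fail early instead of letting the machine freeze. The threshold and artifact size check are only a rough heuristic:

```python
# Sketch: refuse to load the artifact if the instance is low on memory
# (the threshold is an arbitrary example value, not a recommendation)
import os
import joblib
import psutil

MIN_FREE_BYTES = 500 * 1024 * 1024  # keep roughly 500 MB of headroom

artifact_path = "model.joblib"
available = psutil.virtual_memory().available

if available < MIN_FREE_BYTES + os.path.getsize(artifact_path):
    raise MemoryError("Not enough RAM to load the model safely on this instance")

model = joblib.load(artifact_path)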

However, you will need to manage and schedule your entire pipeline yourself. This means that you will likely need to schedule your VM, run the Python processes that activate the Machine Learning pipeline, and make your models available to external users.
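
How you schedule the instance is up to you. One common option, sketched below with a placeholder instance ID, is a small boto3 script (run from an EventBridge-triggered Lambda or a cron job on another machine) that starts the instance for its weekly job and stops it afterwards:

```python
# Sketch: start the EC2 instance for the scheduled job and stop it when done
# (the instance ID and region are placeholders)
import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder
ec2 = boto3.client("ec2", region_name="us-east-1")

def start_training_vm():
    # Boot the instance and wait until it is running
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

def stop_training_vm():
    # Shut the instance down so it stops accruing hourly cost
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
```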

Deploying a model on EC2

Below, you can find the architecture of a lightweight model I built for a client, along with the other tools connected to the EC2 instance to implement a working pipeline:

The architecture of a lightweight model deployed on an EC2 instance

The client wanted to run this machine once a week, so that a new model would be trained each time and used to make predictions on unlabeled data. The architecture works as follows:

  1. The Big Data necessary to create a model is passed through an API to the EC2 instance
  2. The EC2 instance runs a Python file that trains an artifact on the data (steps 2-4 are sketched below)
  3. The artifact is then used to make predictions on unlabeled data
  4. Once the predictions have been made, they are stored in a DynamoDB table
  5. Through API Gateway and a Lambda function, the data can be accessed by the client
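
Here is a condensed sketch of what steps 2 to 4 can look like on the instance. The feature format, the RandomForest model, and the "predictions" table name are assumptions for illustration, not the client's actual setup:

```python
# Condensed sketch of steps 2-4 on the EC2 instance
# (model choice, data format, and the "predictions" table name are assumptions)
import uuid
from decimal import Decimal

import boto3
import joblib
from sklearn.ensemble import RandomForestClassifier

def weekly_job(X_train, y_train, X_unlabeled):
    # Step 2: train a fresh artifact on the data received through the API
    model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
    joblib.dump(model, "model.joblib")  # keep this week's artifact

    # Step 3: use the artifact to make predictions on unlabeled data
    predictions = model.predict(X_unlabeled)

    # Step 4: store the predictions in DynamoDB, ready to be served
    # through API Gateway and a Lambda function (step 5)
    table = boto3.resource("dynamodb").Table("predictions")
    with table.batch_writer() as batch:
        for features, label in zip(X_unlabeled, predictions):
            batch.put_item(Item={
                "prediction_id": str(uuid.uuid4()),
                "features": [Decimal(str(v)) for v in features],
                "label": str(label),
            })
```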

Once this instance is active, it costs approximately 0.05 USD per hour, and it only requires a few minutes to complete its job. As a lightweight solution, this is much better than using any pre-built MLOps pipeline. Make sure to follow Plain Simple Software for more software articles!
