How to Deploy Any Machine Learning Model to the Cloud in One Line of Code
I still remember how excited I was the first time I created a machine learning model that solved a real-world problem. What really discouraged me, however, was that the engineering team then spent over two months deploying it into the application. What slowed the deployment down was not the engineers’ skill set but the difficulty of collaborating and iterating across teams.
What if there were a tool that let data scientists deploy a model themselves as soon as the model is ready? Starting from this idea, my team and I began the journey of building Aibro.
Colab Demo Link & Our Website & Inference Documentation
About Aibro
Aibro is a serverless MLOps tool that helps data scientists train and deploy AI models on cloud platforms in two minutes. It also adopts an exclusive cost-saving strategy built for machine learning that reduces cloud costs by 85%. I will explain the details of the cost-saving strategy in a later post.
If you are more interested in serverless training, see my previous post: Train Neural Network on Cloud with One Line of Code.
Deployment Workflow
In general, all you need to do is upload a formatted machine learning model repository and select a hosting server from the Aibro marketplace. Aibro then creates an inference API whose computing power and cost scale automatically with workload.
Here is an example of the formatted ML model repository: https://github.com/AIpaca-Inc/AIbro_model_repo.
Step 1: Install aibro python library
pip install aibro
Step 2: Prepare a formatted model repository
The repo should be structured in the following format:
repo
|__ predict.py
|__ model
|__ data
|__ requirement.txt
|__ other artifacts
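As a quick sanity check, the skeleton above can be scaffolded locally before you fill in the pieces. This is just a convenience sketch; the `scaffold_repo` helper is hypothetical and not part of the aibro library:

```python
from pathlib import Path

def scaffold_repo(root: str) -> Path:
    """Create the empty directory/file skeleton in the format Aibro expects."""
    repo = Path(root)
    (repo / "model").mkdir(parents=True, exist_ok=True)  # saved model goes here
    (repo / "data").mkdir(exist_ok=True)                 # inference inputs go here
    (repo / "predict.py").touch()                        # entry point (filled in below)
    (repo / "requirement.txt").touch()                   # environment packages
    return repo
```

From there you only need to drop your saved model into `model/`, your input into `data/`, and implement the two methods in `predict.py`.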
predict.py
This is the Aibro entry point. predict.py should contain two methods:
load_model():
import tensorflow as tf

def load_model():
    # Portuguese-to-English translator saved in the "model" folder
    translator = tf.saved_model.load("model")
    return translator
This method should load and return your machine learning model from the model folder. The example repo uses a transformer-based Portuguese-to-English translator.
run():
import json

def run(model):
    # Load the input from the "data" folder
    with open("./data/data.json", "r") as fp:
        data = json.load(fp)
    sentence = data["data"]
    # Run inference and return a JSON-serializable result
    result = {"data": model(sentence).numpy().decode("utf-8")}
    return result
This method takes the model as input, loads data from the “data” folder, runs the prediction, and returns the inference result.
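The run() method above assumes data/data.json wraps the input under a "data" key. A minimal sketch of writing such a file and reading it back the same way run() does (the helper names and the sample sentence are illustrative):

```python
import json
from pathlib import Path

def write_input(data_dir: str, sentence: str) -> None:
    # Write the payload in the {"data": ...} shape that run() expects.
    path = Path(data_dir)
    path.mkdir(parents=True, exist_ok=True)
    with open(path / "data.json", "w") as fp:
        json.dump({"data": sentence}, fp)

def read_input(data_dir: str) -> str:
    # Mirror of the loading logic inside run().
    with open(Path(data_dir) / "data.json") as fp:
        return json.load(fp)["data"]
```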
Local test tip: predict.py should be able to return an inference result via run(load_model()).
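One way to run that local check without waiting on a deployment is a tiny smoke test. The harness below is a hypothetical sketch, with dummy stand-ins for load_model() and run() so the contract is clear; in practice you would import the real functions from your predict.py:

```python
def smoke_test(load_model, run):
    """Call run(load_model()) exactly as Aibro will, and fail loudly if it breaks."""
    result = run(load_model())
    assert result is not None, "run() returned nothing"
    return result

# Dummy stand-ins for a real predict.py, used only to demonstrate the check:
def fake_load_model():
    return lambda sentence: sentence.upper()  # toy "model"

def fake_run(model):
    return {"data": model("hello")}
```

Running `smoke_test(fake_load_model, fake_run)` returns `{"data": "HELLO"}`; swapping in your real `load_model` and `run` verifies the repo end to end on your own machine.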
model and data folders
There is no format restriction on the model and data folders as long as predict.py is structured correctly.
requirement.txt
Before deploying the model, packages from requirement.txt are installed to set up the environment.
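For the example translator repo, requirement.txt might look like the fragment below. The package names and version pins are illustrative, not copied from the actual repo:

```
tensorflow==2.8.0
tensorflow-text==2.8.1
```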
Other Artifacts
All other files/folders that may be used by predict.py.
Step 3: Create an inference API
from aibro import Inference
api_url = Inference.deploy(
model_name = "my_fancy_transformer",
machine_id_config = "c5.large.od",
artifacts_path = "./aibro_repo",
)
Assuming the formatted model repo is saved at the path “./aibro_repo”, we can now use it to create an inference job. The model name must be unique among all currently active inference jobs under your profile.
In this example, we deploy a public custom model from “./aibro_repo” called “my_fancy_transformer” on machine type “c5.large.od”, using an access token for authentication.
Once the deployment is finished, an API URL is returned with the syntax:
{client_id}: if your inference job is public, {client_id} is filled by the string "public". Otherwise, {client_id} is filled by one of your clients’ IDs.
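Once you have the URL, calling it is ordinary HTTP. Below is a hypothetical client using only the standard library; the exact request/response schema is not documented here, so the {"data": ...} payload simply mirrors the shape predict.py uses and is an assumption:

```python
import json
from urllib import request

def build_payload(sentence: str) -> bytes:
    # Assumed payload shape: the same {"data": ...} wrapper predict.py reads.
    return json.dumps({"data": sentence}).encode("utf-8")

def call_inference(api_url: str, sentence: str) -> dict:
    # POST the JSON payload to the returned API URL and decode the JSON reply.
    req = request.Request(
        api_url,
        data=build_payload(sentence),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A client would then call something like `call_inference(api_url, "este é um problema")` and read the translation out of the returned dict.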
Step 4: Complete the inference job
from aibro import Inference
Inference.complete(job_id)
Once the inference job is no longer needed, please remember to shut down the API with Inference.complete() to avoid unnecessary costs.
The End
It’s as simple as that! If you find Aibro helpful, please join our community at aipaca.ai, where we share the latest Aibro updates. We welcome any feedback!