Productionize ML/DL Models as Easy as a Pie: BentoML

Vishnu Priya Vangipuram · Published in Whispering Wasps · Dec 12, 2021 · 7 min read

BentoML saving the day!

The ML/DL model is trained and ready, but is it really ready to be moved to production?

Data science teams in an organisation come up with the ML/DL models needed for their use cases, building either custom models or fine-tuned pre-trained ones, with a reasonably quick turnaround.

However, it takes almost forever for those models to be shipped to production (in most cases, productionizing a model takes longer than developing it). I hear this often from my peers in the ML community, and it has largely been my personal experience as well.

It was then that I bumped into BentoML. My primary aim was to find a way to deploy the ML/DL model our team had developed; instead, I stumbled upon an open-source treasure trove that offered more than I had anticipated.

When we think about taking an ML/DL model to production, we usually choose one (or both) of two approaches: model deployment and model serving.

Model Deployment vs Model Serving:

Model Deployment refers to the process of taking a trained ML/DL model and making its predictions available to users or other systems.

Model Serving does something similar, but it lets you host machine learning models from a model registry as REST endpoints that are updated automatically as new model versions and stages become available. It usually means the model is deployed as a web service that other services can call for predictions, using the results to make their own decisions.

All in all, there is not much of a difference between the two, except that in model serving we have a server specialised for running model predictions; the idea is that a single model server can serve multiple endpoints.

Also, as the number of models and model versions grows, model deployment becomes complex and managing them all becomes difficult, whereas model serving eases things for us.

Role of BentoML:

So, what exactly is the role of BentoML here, you may ask.

To be simple and precise, BentoML makes shipping ML/DL models to production faster. It makes it easy to test and deploy models by extending DevOps practices to the AI world, effectively bridging the gap between data science and DevOps.

This sounds great, but is my framework covered? The good news is, Yes. Here are a few frameworks that BentoML supports out-of-the-box.

(Image: ML frameworks supported by BentoML)

Core Elements:

Now that we have an idea of what BentoML does, I will give a high-level overview of the core elements in the BentoML workflow, with examples wherever applicable, to help us get up and running with model serving.

Element 1: Model Store

BentoML provides easy-to-use stores for creating and managing our ML/DL models.

BentoML Local Store: models can be saved to the local store using the ML-framework-specific APIs. (For example, a scikit-learn model can be saved to the BentoML local store by importing the bentoml library and calling its sklearn save API.)

Imported from a framework-specific model registry: models can also be imported from external model registries (for example, from the MLflow Model Registry).
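As a rough illustration, the snippet below trains a scikit-learn model and saves it into the local BentoML model store. Exact function names vary between BentoML releases (newer versions use save_model, and the MLflow import helper in the comment is assumed from the newer API), so treat this as a sketch rather than a version-specific reference.

import bentoml
from sklearn import svm, datasets

# Train a simple scikit-learn model
iris = datasets.load_iris()
clf = svm.SVC(gamma="scale")
clf.fit(iris.data, iris.target)

# Save it into the local BentoML model store via the sklearn-specific API
# (newer BentoML releases name this bentoml.sklearn.save_model)
bentoml.sklearn.save("iris_clf", clf)

# Models can also be brought in from an external registry such as MLflow, e.g.:
# bentoml.mlflow.import_model("iris_clf_mlflow", model_uri="models:/IrisClf/1")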

Element 2: Bento Artifact

BentoML’s model artifact API allows us to specify the trained models required by a Bento Service.

import bentoml
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.frameworks.xgboost import XgboostModelArtifact

# Declare the trained models this service depends on,
# using the framework-specific artifact classes
@bentoml.env(infer_pip_packages=True)
@bentoml.artifacts([
    SklearnModelArtifact("model_a"),
    XgboostModelArtifact("model_b")
])
class MyPredictionService(bentoml.BentoService):
    ...  # full service definition shown under Element 3 below

BentoML has framework-specific artifacts, and the model artifact handles serialization and deserialization automatically according to the artifact chosen.

We also have the option of building our own custom BentoML artifact for ML frameworks that are not supported out-of-the-box.

Element 3: Bento Service

Once we have a way to create and manage our models, we move on to defining the model-serving logic using a Bento Service.

Using a Bento Service, we define the properties specific to the inference API and pack the model with BentoML. The example below chains two models in a single service, and the bullet points after it call out some of the properties being handled:

import bentoml
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.frameworks.xgboost import XgboostModelArtifact

@bentoml.env(infer_pip_packages=True)
@bentoml.artifacts([
    SklearnModelArtifact("model_a"),
    XgboostModelArtifact("model_b")
])
class MyPredictionService(bentoml.BentoService):
    @bentoml.api(input=DataframeInput(), batch=True)
    def predict(self, df):
        # assume the output of model_a will be the input of model_b in this example
        df = self.artifacts.model_a.predict(df)
        return self.artifacts.model_b.predict(df)
  • Handling the input and output data types expected by the endpoint (using BentoML's input and output adapters)
  • Handling the dependencies, if any (using infer_pip_packages)
  • Handling how the model consumes the input data and how the output is generated (with pre-processing and post-processing logic), and more.

Element 4: Web UI Customisation

With the @web_static_content decorator, we can attach our frontend project directory to our BentoService class.

from bentoml import env, artifacts, api, web_static_content, BentoService
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('model')])
@web_static_content('./static')
class IrisClassifier(BentoService):

    @api(input=DataframeInput(), batch=True)
    def predict(self, df):
        # Serve predictions; the ./static directory is bundled and hosted alongside the API
        return self.artifacts.model.predict(df)

BentoML will automatically bundle all the web UI files for us, and host them when starting the API server.

An example Web UI customization using BentoML

Element 5: Save as Bento

After writing our model training/evaluation code and the BentoService definition, we save everything as a Bento, which can then be used to serve the model. The steps involved are:

  1. Model training
  2. Creating a BentoService instance
  3. Packing the trained model artifacts with pack()
  4. Saving to a Bento with save()
from sklearn import svm
from sklearn import datasets

# IrisClassifier is the BentoService defined above
# (in a real project it would typically be imported from its own module)

# 1. Model training
clf = svm.SVC(gamma='scale')
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)

# 2. Create a BentoService instance
iris_classifier_service = IrisClassifier()

# 3. Pack the trained model artifacts
iris_classifier_service.pack("model", clf)

# 4. Save to a Bento
saved_path = iris_classifier_service.save()

Element 6: Model Serving

Once a BentoService is saved as a Bento, it is ready to be deployed for many different types of serving workloads.

There are three main types of model serving that BentoML provides (a couple of example commands for online serving follow the list):

  • Online Serving — clients access predictions via API endpoints in near real-time, with a high-performance API server that can load a saved Bento and expose a REST API for client access
  • Offline Batch Serving — pre-compute predictions and save results in a storage system, with tools to load the Bento and feed it with a batch of inputs for offline inference
  • Edge Serving — distribute the model and run it on mobile or IoT devices, e.g. model serving in a router or a Raspberry Pi.
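For online serving, a saved Bento can be started locally with the BentoML CLI and queried over REST. A minimal sketch, assuming the IrisClassifier Bento saved above and the default port 5000 (as in the Docker example below):

# Start an API server for the latest saved IrisClassifier Bento
bentoml serve IrisClassifier:latest

# Query the REST endpoint with a sample Iris record
curl -X POST "http://127.0.0.1:5000/predict" \
     -H "Content-Type: application/json" \
     -d '[[5.1, 3.5, 1.4, 0.2]]'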

Element 7: Production-ready API Server Dockerization

With Docker:

Once we are done with the BentoML workflow and have a Bento ready, a Docker image containing our model API server can easily be created with BentoML.

When a Bento is saved, BentoML also generates a Dockerfile in the saved bundle's directory.

For those new to Docker: a Dockerfile is a text document containing all the commands required to assemble an image, and the docker build command builds an image from it.

# Find the local path of the latest version IrisClassifier saved bundle
saved_path=$(bentoml get IrisClassifier:latest --print-location --quiet)
# Build docker image using saved_path directory as the build context, replace the
# {username} below to your docker hub account name
docker build -t {username}/iris_classifier_bento_service $saved_path
# Run a container with the docker image built and expose port 5000
docker run -p 5000:5000 {username}/iris_classifier_bento_service
# Push the docker image to docker hub for deployment
docker push {username}/iris_classifier_bento_service

On Kubernetes:

Once the Docker image is built and pushed to a registry such as Docker Hub, it can be referenced in a Kubernetes manifest to deploy the model server to a cluster.

The following is an example YAML file specifying the resources required to run and expose a BentoML model server in a Kubernetes cluster. Replace {docker_username} with your Docker Hub username and save the file as iris-classifier.yaml.

# iris-classifier.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: iris-classifier
  name: iris-classifier
spec:
  ports:
  - name: predict
    port: 5000
    targetPort: 5000
  selector:
    app: iris-classifier
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: iris-classifier
  name: iris-classifier
spec:
  selector:
    matchLabels:
      app: iris-classifier
  template:
    metadata:
      labels:
        app: iris-classifier
    spec:
      containers:
      - image: {docker_username}/iris-classifier
        imagePullPolicy: IfNotPresent
        name: iris-classifier
        ports:
        - containerPort: 5000
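With the manifest saved, the Service and Deployment can be created with a single command (assuming kubectl is configured against the target cluster):

# Apply the manifest to the cluster
kubectl apply -f iris-classifier.yaml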

In addition to the above, BentoML provides fully automated deployment management for AWS EC2, AWS Lambda, AWS SageMaker, and Azure Functions, offering the basic model deployment functionality with minimal setup; detailed guides for each platform are available in the BentoML documentation.

BentoML also makes it easy to deploy our models to other cloud platforms or to in-house custom infrastructure; deployment guides for popular cloud services and open-source platforms are available in the BentoML documentation.

That brings my brief walkthrough of BentoML and its workflow to an end.

Some examples of BentoML model serving can be found in the bentoml/gallery repository, grouped by the main ML training framework used in each project; it serves as a handy companion guide to BentoML model serving.

So the next time you are looking to deploy or serve your ML/DL model, look no further than BentoML!

Do let me know your feedback and comments, which are always welcome :)
