Serve Huggingface Sentiment Analysis Task Pipeline using MLflow Serving

Jagane Sundar
InfinStor
Mar 30, 2021 · 4 min read

Huggingface (huggingface.co) offers a collection of pretrained models that are excellent for Natural Language Processing tasks. They also have the notion of ‘tasks’, which are prebuilt pipelines for common jobs such as sentiment analysis and NER (Named Entity Recognition).

MLflow is a very popular open source Machine Learning Operations platform. It can be used for experiment tracking, model management, and more. The open source version of MLflow does not include capabilities such as authentication and authorization. Commercially supported versions of MLflow, such as Databricks, InfinStor, and Azure Machine Learning, add these essential enterprise capabilities.

While there is some MLflow integration in Huggingface, there is hardly any support for serving Huggingface pipelines using MLflow’s model serving capabilities. This write-up describes a small piece of open source software that I wrote to serve Huggingface pipelines using the MLflow model serving feature. The software can be found on GitHub: the project is infinstor/huggingface-sentiment-analysis-to-mlflow.

There are three steps in this project:

  1. Run a script that logs the huggingface sentiment-analysis task as a model in MLflow
  2. Serve the model locally, i.e. at 127.0.0.1:5000
  3. Use ‘curl’ to POST an input to the model and get an inference output

Run a script that logs the Huggingface sentiment-analysis task as a model in MLflow

$ python ./log_model.py

The above simple command logs the Huggingface ‘sentiment-analysis’ task as a model in MLflow. Note that your Python or conda environment should have pytorch, mlflow and transformers installed.
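If those packages are not already present, something like the following should install them (a minimal sketch; exact package names and versions may vary with your platform and Python version):

$ pip install torch transformers mlflow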

The above script has a wrapper class called SentimentAnalysis that wraps the Huggingface sentiment-analysis task pipeline:

import mlflow.pyfunc

class SentimentAnalysis(mlflow.pyfunc.PythonModel):
    def __init__(self):
        # Load the pretrained Huggingface sentiment-analysis pipeline
        from transformers import pipeline
        self.nlp = pipeline('sentiment-analysis')

    def do_nlp_fnx(self, row):
        # Run the pipeline on one row's 'text' value; return its label and score
        s = self.nlp(row['text'])[0]
        return [s['label'], s['score']]

    def predict(self, context, model_input):
        # MLflow calls this with a pandas DataFrame; append label and score columns
        print('model_input=' + str(model_input), flush=True)
        model_input[['label', 'score']] = model_input.apply(self.do_nlp_fnx, axis=1, result_type='expand')
        return model_input

The SentimentAnalysis class initializes the sentiment-analysis pipeline and stores it in the instance variable self.nlp.

The function do_nlp_fnx is called for every row in the input pandas DataFrame. The format of the input is described later in this article.
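To illustrate the row-wise mechanism in isolation, here is a small sketch (the DataFrame and the fake_nlp_fnx stand-in are made up for demonstration); pandas apply with result_type='expand' turns the two-element return value into two new columns:

import pandas as pd

# Hypothetical stand-in for the sentiment pipeline: returns [label, score] per row
def fake_nlp_fnx(row):
    return ['POSITIVE' if 'great' in row['text'] else 'NEGATIVE', 0.99]

df = pd.DataFrame({'text': ['This is great weather', 'This is meh weather']})
df[['label', 'score']] = df.apply(fake_nlp_fnx, axis=1, result_type='expand')
print(df)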

The function predict is MLflow’s entry point for calling the model class when inferences are required. The input schema is defined by the ModelSignature that we create in the following code segment:

import json
from mlflow.models.signature import ModelSignature

inp = json.dumps([{'name': 'text', 'type': 'string'}])
outp = json.dumps([{'name': 'text', 'type': 'string'},
                   {'name': 'label', 'type': 'string'},
                   {'name': 'score', 'type': 'double'}])
signature = ModelSignature.from_dict({'inputs': inp, 'outputs': outp})

As you can see, the input is defined to be a pandas DataFrame with at least one column named text of type string.

The output format is a pandas DataFrame with three columns: text, which is the same text as in the input; a string column called label, which can take the value NEGATIVE or POSITIVE; and a column called score of type double.
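As a point of reference, an equivalent signature can also be built from example DataFrames with MLflow’s infer_signature helper; the sketch below uses made-up sample rows and is not part of the original script:

import pandas as pd
from mlflow.models.signature import infer_signature

# Made-up example rows matching the schema described above
example_input = pd.DataFrame({'text': ['This is great weather']})
example_output = pd.DataFrame({'text': ['This is great weather'],
                               'label': ['POSITIVE'],
                               'score': [0.9998]})
signature = infer_signature(example_input, example_output)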

Logging the model to the MLflow database is accomplished as follows:

with mlflow.start_run():
    mlflow.pyfunc.log_model('model',
                            loader_module=None,
                            data_path=None,
                            code_path=None,
                            conda_env=None,
                            python_model=SentimentAnalysis(),
                            artifacts=None,
                            registered_model_name=None,
                            signature=signature,
                            input_example=None,
                            await_registration_for=0)

You can now look in the MLflow UI and determine the run id of the above script run. A screen capture of the MLflow GUI with the run id is shown below.
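If you would rather not open the UI, a small variation of the logging code can print the run id directly; this is a sketch, not part of the original script:

import mlflow

with mlflow.start_run() as run:
    # ... call mlflow.pyfunc.log_model(...) as shown above ...
    print('run id:', run.info.run_id)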

Serve the model locally:

We use standard MLflow commands to serve the model. Here’s an example of serving the model locally. Note the use of the run id that we determined from the UI.

$ mlflow models serve -m runs:/0-351e1a1e91334d9ca5cb704b0792d9b3/model --no-conda

The program prints out something like the following:

2021/03/30 14:42:53 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2021/03/30 14:43:09 INFO mlflow.pyfunc.backend: === Running command 'gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2021-03-30 14:43:09 -0700] [37322] [INFO] Starting gunicorn 20.0.4
[2021-03-30 14:43:09 -0700] [37322] [INFO] Listening at: http://127.0.0.1:5000 (37322)
[2021-03-30 14:43:09 -0700] [37322] [INFO] Using worker: sync
[2021-03-30 14:43:09 -0700] [37325] [INFO] Booting worker with pid: 37325

The IP address and port where the MLflow serving program is listening are displayed above as 127.0.0.1:5000. We will use them in the inference step next.

Use ‘curl’ to POST an input to the model and get an inference output

curl -X POST -H "Content-Type:application/json" --data '{"dataframe_split": {"columns":["text"],"data":[["This is meh weather"], ["This is great weather"]]}}' http://127.0.0.1:5000/invocations

As you can see, we asked for inference on two statements: ‘This is meh weather’ and ‘This is great weather’.

The program responds with:

{"predictions": [{"text": "This is meh weather", "label": "NEGATIVE", "score": 0.7533413171768188}, {"text": "This is great weather", "label": "POSITIVE", "score": 0.9998377561569214}]}

And there you have it, folks: sentiment analysis using the Huggingface sentiment-analysis task pipeline, stored as an MLflow Model and served on the local machine using ‘mlflow models serve’.

If you want the capability to serve models from a VM in the cloud, our InfinStor Multicloud ML Platform enables you to do that. Visit https://infinstor.com for more information.
