TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

End-to-End NLP Project with Hugging Face, FastAPI, and Docker

10 min read · Mar 7, 2024


Photo by Joshua Hoehne on Unsplash

Many AI projects fail, according to various reports (e.g. Harvard Business Review). I speculate that part of the barrier to AI project success is the technical step from having built a model to making it widely available for others in your organization.

So how do you make your model easily available for consumption? One way is to wrap it in an API and containerize it so that your model can be exposed on any server with Docker installed. And that’s exactly what we’ll do in this tutorial.

We will take a sentiment analysis model from Hugging Face (an arbitrary choice just to have a model that’s easy to show as an example), write an API endpoint that exposes the model using FastAPI, and then we’ll containerize our sentiment analysis app with Docker. I’ll provide code examples and explanations all the way.

The tutorial code has been tested on Linux, and should work on Windows too.

Step 1: Create Hugging Face model pipeline

We will use the Pipeline class from Hugging Face’s transformers library. See Hugging Face’s tutorial for an introduction to the Pipeline if you’re unfamiliar with it.

The pipeline makes it very easy to use models such as sentiment models. Check out Hugging Face’s sentiment analysis tutorial for a thorough introduction to the concept.

You can instantiate the pipeline with several different constructor arguments. One way is to pass in the type of task:

from transformers import pipeline

pipe = pipeline(task="sentiment-analysis")

This will use Hugging Face’s default model for the provided task.

Another way is to pass the model argument specifying which model you want to use. You don’t have to provide a task if the model on the Hugging Face Hub already defines it.

from transformers import pipeline

pipe = pipeline(model="roberta-large-mnli")

To use the pipe, simply call it directly by passing in the string you want to analyze:

from transformers import pipeline

pipe = pipeline("text-classification")
pipe("This restaurant is awesome")

It’s also possible to pass in a list:

from transformers import pipeline

pipe = pipeline("text-classification")
pipe(["This restaurant is awesome", "This restaurant is awful"])

Now, with the basic model code in place, let’s move on to writing an API endpoint.

Step 2: Write API endpoint for Hugging Face model with FastAPI

We will use the Python library FastAPI to write an API endpoint that exposes the model we implemented in the first step.

Here’s the full API code. It’s a pretty standard way of writing an API in FastAPI. Below, I’ll go through it piece by piece.

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

# For other task options, see https://huggingface.co/docs/transformers/en/main_classes/pipelines
sentiment_pipeline = pipeline("sentiment-analysis")

app = FastAPI()

# Optional warm-up / sanity check of the pipeline at startup
data = ["I love you", "I hate you"]
sentiment_pipeline(data)


class RequestModel(BaseModel):
    input_string: str


@app.post("/analyze")
def your_function(request: RequestModel):

    input_string = request.input_string

    sentiment = sentiment_pipeline(input_string)
    return {"result":
            {"sentiment": sentiment[0]["label"],
             "score": sentiment[0]["score"]}
            }

First, you’ll see the pipeline that we introduced in the previous section.

Then, we instantiate an object called app. This is required, as it is the main entry point for using FastAPI.

In the next code block above, you’ll notice a class called RequestModel. It inherits from Pydantic’s BaseModel class and is thus, in Pydantic parlance, a so-called model (one might say a “data model”) that defines a schema for requests, i.e. what fields a request must contain. Pydantic also enforces this schema at runtime, which means that an error will be thrown if a request does not contain the fields and data types specified in the data model.
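As a small standalone sketch (not part of the API file), this is how Pydantic’s runtime validation behaves for the request model above:

from pydantic import BaseModel, ValidationError


class RequestModel(BaseModel):
    input_string: str


# Valid: the required field is present and is a string
print(RequestModel(input_string="This restaurant is awesome"))

# Invalid: the field is missing, so Pydantic raises a ValidationError
try:
    RequestModel()
except ValidationError as err:
    print(err)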

Then, we have a function called your_function(), which is decorated with the @app.post() decorator. The decorator tells FastAPI that the endpoint accepts POST requests. In APIs, requests are made to ask the API to perform a certain task, in this case to analyze a piece of text. A POST request is a type of request that submits data to be processed to a specified resource. Other common request types are PUT and GET requests, but they would be unconventional to use for our purpose.

It doesn’t matter how you name your_function(). It could be called something completely different. What matters is the path you specify in the @app.post() decorator, because that’s the name of the endpoint that users of your API send requests to. In the code above, we’ve called it “/analyze”. This means that when a user wants to send a request to your endpoint, they’ll send it to a URL of the form http://{address}:{port}/analyze, where the address could be localhost. Later, we’ll take a closer look at what the port part means.

Inside your_function() we implement the logic that handles the request and calls the Hugging Face pipe.

The return value of your_function() is a dictionary with a result key whose value is a dictionary with two keys: “sentiment” and “score”. You don’t have to define your return value like this. You could in theory return a string or a list instead, but I prefer to return a dictionary because it can easily be converted to JSON.

When you’ve coded your API and you want to run your app, open a terminal and run a command with this structure:

uvicorn <file name>:app --host <host> --port <port>

The --port flag tells uvicorn which port on the host system (e.g. your local workstation or a server VM) the API should listen on, and --host 0.0.0.0 makes the API reachable from outside the machine itself. API requests should be made to the host system’s port. So, assuming you’re running on your local machine and serving on port 8000, the URL would be http://localhost:8000/analyze. Mapping a host port to a different port inside a container is something we’ll handle with Docker in the next step.
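For example, if the API code above lives in a file called api.py (the file name the Dockerfile below also assumes), the command would be:

uvicorn api:app --host 0.0.0.0 --port 8000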

Step 3: Build Docker image and run container with your Hugging Face model and FastAPI endpoint

Now let’s consider how to “containerize” your app.

Containerization means that you put your app, e.g. your FastAPI app, in a container, e.g. a Docker container. A Docker container is an instance of a Docker image. A container has its own filesystem and user space (e.g. based on Ubuntu) and contains all the dependencies of your application. For instance, you can install Python in the container along with all the packages that your app requires. You can run as many Docker containers from a single image as your infrastructure allows.

So, to run a Docker container with our sentiment analysis app, we first need to build a Docker image from which we can run a Docker container (if Docker is not installed on your system, go to Docker’s website).

To build a Docker image, we need to write a so-called Dockerfile. The Dockerfile is the “recipe” for the Docker image. It tells Docker what to put into the Docker image. Below is the Dockerfile we’ll use for this project. It’s called Dockerfile in the repo.

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 8000 available to the world outside this container
EXPOSE 8000

# Run the API with uvicorn when the container launches
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]

I normally place the Dockerfile in the root of my project. Then, when you want to build your image, open a terminal, make sure you’re in the folder that contains your Dockerfile, and run this command:

docker build -t sentiment-app .

The -t is a flag that tells Docker to name the image whatever comes after the -t flag. So in this case, the image will be called “sentiment-app”. The dot tells Docker to look for a Dockerfile in the current directory.

Depending on the networking setup on your computer, you may need to add --network=host after build like this:

docker build --network=host -t sentiment-app .

This tells Docker to run the Dockerfile commands in the host’s network environment.

The build process will take a few moments. When it is done, you’re ready to run a container from the image. Here’s the command for that:

docker run -d -p 8000:8000 sentiment-app:latest

The -p 8000:8000 option in the docker run command is used to publish and map ports between the Docker container and the host system. This option follows the format `-p HOST_PORT:CONTAINER_PORT`.

HOST_PORT: This is the port on your host machine where you want to expose the service.

CONTAINER_PORT: This is the port inside the Docker container where your application is running.

So, in the case of `-p 8000:8000`, it means:

  • The application inside the Docker container is serving on port `8000`.
  • Expose that service on the host machine at port `8000` as well.

This allows you to access the application running in the Docker container through `http://localhost:8000` on your host machine. If your FastAPI application is configured to run on a different port inside the container, you would adjust the `CONTAINER_PORT` accordingly.

For example, if your FastAPI application inside the container is running on port 80, and you want to expose it on port 8000 on your host machine, you would use -p 8000:80.

The docker run command will print the container ID to the terminal. If you want to verify that the container is running, run this command which shows all running containers:

docker ps
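If you later want to inspect the container’s logs or stop it, two other standard Docker commands come in handy, where <container ID> is the ID printed by docker run (or shown by docker ps):

docker logs <container ID>
docker stop <container ID>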

Step 4: How to call your Hugging Face model API

Let’s suppose you now have a Docker container running on a server exposing the API endpoint from step 2. How do you call the API? That’s what I’ll show in this last section. We’ll see both how to call the API from a Python script and how to call it from the terminal.

First, here’s the full Python script from call_api.py in the repo. Below we’ll go through it step by step.

import requests

# Set the URL of your FastAPI endpoint
url = "http://localhost:8000/analyze"


messages = ["This tutorial is very useful", "I did not sleep well so I am grumpy"]


for message in messages:

    # Define the input data as a dictionary
    data = {"input_string": message}

    try:
        # Make a POST request to the endpoint
        response = requests.post(url, json=data)

        # Raise an error if the request was not successful (status code 200)
        response.raise_for_status()

        # Print the response from the server
        result = response.json()["result"]
        print(f"The sentiment is {result['sentiment']} with a score of {round(result['score'], 3)}")

    except Exception as err:
        # Print an error message if the request failed
        print(f"Error: {err}")

First we import the requests library. That’s a neat library for sending web requests such as API post requests.

Then, we specify the URL of the endpoint. If your API Docker container is running on the same host as the script that calls the API, you can leave the url as is. Otherwise, you’ll have to change “localhost” to the IPv4 address of the API container’s host.

Next, we have a list called “messages”. These are the messages we will analyze with our sentiment analysis model. You can imagine that this is a list of customer messages that you have pulled from a database, or some other realistic use case.

Next, we loop over each message in the list. Inside the loop, we define a dictionary called “data”. That’s the data we’ll send to our API. Recall that in the previous step, we used Pydantic to specify how the API input data should look, and we specified that it must have a field called “input_string”.

Then, we make the actual API call inside the try statement and print the result. In a realistic use case, you might collect the sentiment results in some object (e.g. a list or dictionary) so that you can work with them after the loop finishes.
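As a small sketch of that idea (not part of the repo’s script), you could for instance collect the responses in a list of dictionaries, reusing the url and messages variables from the script above:

results = []

for message in messages:
    data = {"input_string": message}
    response = requests.post(url, json=data)
    response.raise_for_status()

    # Store the original message together with the API's sentiment and score
    results.append({"message": message, **response.json()["result"]})

print(results)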

You can also call the API from your terminal like this if you just need to call it every now and then:

curl -X 'POST' \
  'http://localhost:8000/analyze' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ "input_string": "This tutorial is very useful" }'

The result should look like this:

{
  "result": {
    "sentiment": "POSITIVE",
    "score": 0.9992365837097168
  }
}

This concludes the essence of this tutorial. You can find the GitHub repository for this article HERE.

Further work

Now that you hopefully have a working API that exposes a Hugging Face sentiment analysis model running inside a Docker container, there are some additional steps that you may want to explore on your own, e.g.:

  1. Implement a queue to handle situations where multiple users send requests to your API
  2. Fine-tune the sentiment analysis model on your own data
  3. Alter the API such that it also accepts lists as input (see the sketch after this list)
  4. Write a new API that accepts a file as input and calls a model that works on files, e.g. Whisper on audio files
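For point 3, here is a minimal sketch of what a list-accepting version of the API could look like. The field name input_strings and the results key are my own choices, not part of the repo:

from typing import List

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

sentiment_pipeline = pipeline("sentiment-analysis")

app = FastAPI()


class BatchRequestModel(BaseModel):
    # A list of strings instead of a single string
    input_strings: List[str]


@app.post("/analyze")
def analyze(request: BatchRequestModel):

    # The pipeline accepts a list directly and returns one result per input string
    sentiments = sentiment_pipeline(request.input_strings)

    return {"results": [
        {"sentiment": s["label"], "score": s["score"]} for s in sentiments
    ]}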

Conclusion

I venture that you can increase the chances of organizational adoption of your machine learning models if you can wrap them in an API and Docker image yourself such that the model can easily be deployed. In this tutorial we explored how to do this with a simple sentiment analysis model. You may want to swap this model for something more advanced, but this tutorial hopefully gives you an idea of how to do that. Once you’re familiar with the steps involved, the process is a breeze. To wrap up, the steps were:

  1. Write model code
  2. Write API that exposes model
  3. Build Docker image and run a container from it
  4. Call the API

That’s it! I hope you enjoyed the story. Let me know what you think!

Follow me for more on AI and sustainability and subscribe to get my stories via email when I publish.

I also sometimes write about time series forecasting and green AI topics.

And feel free to connect on LinkedIn.
