How to deploy ML Models on AWS ECS using Docker and FastAPI

A highly secure, reliable, and scalable way to run ML models in containers!

Naman Gupta
6 min read · Jan 14, 2024
Photo by Venti Views on Unsplash

🎯Scenario:

For any given use case, you have built an ML model on your local machine using a Jupyter notebook. But as we know, a model that only lives in a notebook has no business value. From the business point of view, you need to deploy that model somewhere so that people or applications can use it, and in this case your business requirement says to deploy it on AWS ECS.

(Note: The focus of this article is not on how to develop the ML model, but rather on how to go from a Jupyter notebook to a production-ready app. With that in mind, I am not going to focus on the model building or training part.)

🌍In this article you will learn:

  1. How to build a FastAPI app for model inference.
  2. How to build a Docker image for the FastAPI app.
  3. How to push the Docker image to Docker Hub.
  4. How to run the Docker image on AWS ECS.

Prerequisites:

To implement the solution in this article, you should have an AWS account, have Docker installed (or be able to install it) on your local machine, and be familiar with AWS, Docker, and FastAPI concepts.

Solution Implementation:

Download the data and code from my GitHub repository.

1. Let’s build and train the ML model in a Jupyter Notebook

In this article, we are going to build a language detection model. The model will take a string (text) as input and predict the language of the input text, such as Italian, French, Hindi, etc.

Example: How we build ML Models locally in Jupyter Notebook

Once the model building is done, the next important thing is to save the model as a serialized file (in our case, the ‘lang_trained_pipeline.pkl’ file). We also have to keep track of the data transformation steps performed on the training data, because we have to apply those same steps to new prediction data.
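For context, here is a minimal sketch of what the notebook might contain. This is an assumption on my part: it uses a scikit-learn CountVectorizer + MultinomialNB pipeline and assumes a CSV with ‘Text’ and ‘Language’ columns; the actual notebook may differ.

# Minimal training sketch (assumed, not the article's exact notebook code)
import pickle

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

# Hypothetical dataset file with 'Text' and 'Language' columns
df = pd.read_csv("Language Detection.csv")

# Encode language names as integers; the inference script maps them back
le = LabelEncoder()
y = le.fit_transform(df["Language"])

# Vectorizer + classifier in one pipeline, so raw text can be passed in directly
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", MultinomialNB()),
])
pipeline.fit(df["Text"], y)

# Serialize the trained pipeline for the inference script
with open("lang_trained_pipeline.pkl", "wb") as f:
    pickle.dump(pipeline, f)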

2. Now, create the Model Inference script

Once we have saved the model, the Jupyter notebook work is done. Now we will create the model inference script. In that script, we will use the serialized file to make predictions, and before predicting we will also apply the same data transformation steps.

# Importing packages
import re
import pickle
from pathlib import Path

import warnings
warnings.simplefilter("ignore")


BASE_DIR = Path(__file__).resolve(strict=True).parent

# Load the serialized model pipeline saved from the notebook
with open(f"{BASE_DIR}/lang_trained_pipeline.pkl", "rb") as f:
    model = pickle.load(f)


# Class labels in the order used during training
# (the spellings match the labels in the training data)
classes = [
    "Arabic",
    "Danish",
    "Dutch",
    "English",
    "French",
    "German",
    "Greek",
    "Hindi",
    "Italian",
    "Kannada",
    "Malayalam",
    "Portugeese",
    "Russian",
    "Spanish",
    "Sweedish",
    "Tamil",
    "Turkish",
]


def predict_pipeline(text):
    # Apply the same cleaning steps used on the training data
    text = re.sub(r'[!@#$(),\n"%^*?\:;~`0-9]', " ", text)
    text = re.sub(r"[\[\]]", " ", text)  # strip square brackets
    text = text.lower()
    pred = model.predict([text])
    return classes[pred[0]]


if __name__ == "__main__":
    text = "Ciao, come stai?"
    detect = predict_pipeline(text)
    print("Prediction :", detect)

3. Now, create the FastAPI App

We will send the input text string as a POST request to one of the app’s endpoints, and the app will use the model inference script above to make the prediction.

# Importing packages
from fastapi import FastAPI
from pydantic import BaseModel
from app.model.model_inference import predict_pipeline

app = FastAPI()


# Request body: the raw text whose language we want to detect
class TextIn(BaseModel):
    text: str


# Response body: the predicted language name
class PredictionOut(BaseModel):
    language: str


@app.get("/")
def home():
    # Simple health check endpoint
    return {"health_check": "OK"}


@app.post("/predict", response_model=PredictionOut)
def predict(payload: TextIn):
    language = predict_pipeline(payload.text)
    return {"language": language}

4. Now, we will Dockerize the above FastAPI App

Create a new ‘Dockerfile’ in the project’s root directory or download it from here.

# Base image with Python 3.9, FastAPI, Uvicorn, and Gunicorn preinstalled
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.9

# Install the app's Python dependencies
COPY ./requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt

# Copy the application code (FastAPI app + model files)
COPY ./app /app/app
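The Dockerfile expects a requirements.txt in the project root. Here is a minimal guess at its contents (the exact pin is an assumption; the important point is to pin scikit-learn to the same version used in the notebook, because pickled pipelines are version-sensitive):

# requirements.txt (assumed; FastAPI and Uvicorn already ship with the base image)
scikit-learn==1.3.2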

Now start Docker on your local machine, go to the project’s root directory, and execute the commands below in the terminal.

docker build -t <your_docker_hub_username>/language-detection-app .

When pushing an image to Docker Hub, you must specify your Docker Hub username as part of the image name.

Here, ‘<your_docker_hub_username>/language-detection-app’ is the name of your Docker image, and “.” means the current directory (the build context).

Now, once the image is built, we can run the container using the command below.

docker run -p 80:80 <your_docker_hub_username>/language-detection-app

Now, go to ‘http://localhost:80’ in your browser and you should see the health check message. You can also open FastAPI’s Swagger UI at ‘http://localhost:80/docs’ to try the predict endpoint interactively.
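Alternatively, here is a quick smoke test of the predict endpoint from Python (assuming the container from the previous command is still running and mapped to port 80):

import requests

# Local smoke test against the running container (port 80 assumed)
resp = requests.post(
    "http://localhost:80/predict",
    json={"text": "Ciao, come stai?"},
)
print(resp.json())  # expected: {'language': 'Italian'}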

5. Now, push the above Docker image to the Docker Hub

First, we need to log in to Docker Hub from the command line. To do this, run the command below.

docker login

Once we are logged in, we can push our image to Docker Hub using the command below.

docker push <your_docker_hub_username>/language-detection-app

At this point, our development work is complete. We built the model, created the model inference script, built the FastAPI app, Dockerized the app, and pushed the image to Docker Hub. Now all we have to do is go to the AWS ECS service and create a container that pulls this image from Docker Hub.

6. Deploying Docker Hub image on AWS ECS

In order to deploy the Docker image to ECS, we have to do three things: first, create a new ECS cluster; then, create a new task definition; and finally, run that task definition as a task inside the cluster. The console walkthrough follows; if you prefer scripting these steps, see the boto3 sketch at the end of this section.

Creating the ECS Cluster

Now, create a new task definition

Note: In “Image URI”, give your Docker image path (here, <your_docker_hub_username>/language-detection-app) and set the container port to 80.

Now, run the above task definition as a new task in the cluster

  • Go to the cluster, then inside the cluster go to Tasks, and click on ‘Run new task’.
  • In ‘Networking’, create a new security group, and in the security group’s inbound rules allow traffic on port 80 (HTTP).

That’s all! We have deployed the Docker image on AWS ECS. Now just go to the task and note down its public IP address.
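For those who prefer scripting over the console, here is a rough boto3 sketch of the same three steps. This is an assumption on my part, not the article’s method: it uses Fargate, and the region, IAM execution role ARN, subnet, and security group IDs are all placeholders you would replace with your own values.

# Hypothetical boto3 equivalent of the console steps above
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")  # assumed region

# 1. Create the cluster
ecs.create_cluster(clusterName="language-detection-cluster")

# 2. Register a Fargate task definition pointing at the Docker Hub image
ecs.register_task_definition(
    family="language-detection-task",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    executionRoleArn="arn:aws:iam::<account_id>:role/ecsTaskExecutionRole",
    containerDefinitions=[{
        "name": "language-detection-app",
        "image": "<your_docker_hub_username>/language-detection-app",
        "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
    }],
)

# 3. Run the task definition as a task inside the cluster
ecs.run_task(
    cluster="language-detection-cluster",
    taskDefinition="language-detection-task",
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-xxxxxxxx"],          # placeholder
            "securityGroups": ["sg-xxxxxxxx"],       # placeholder (port 80 open)
            "assignPublicIp": "ENABLED",
        }
    },
)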

7. Model Inference (Invoking the AWS ECS endpoint)

As this is a FastAPI application, we can also test the endpoint using FastAPI’s Swagger UI (at http://<your_ecs_public_ip>/docs). But here we will see how to invoke the endpoint using Postman and Python’s requests module.

Way-1: Using Postman

Way-2: Using Request Module

import requests

# Note: include the http:// scheme, otherwise requests will raise an error
ecs_url = 'http://<your_ecs_public_ip>/predict'

json_body = {
    "text": "Ciao, come stai?"
}

req = requests.post(url=ecs_url, json=json_body)

print(req.text)  # expected output: {"language":"Italian"}

🍃Conclusion

In this article, we saw how to deploy an ML model on AWS ECS using Docker. This is just one way of doing things: instead of Docker Hub, you could push the image to AWS ECR, and instead of ECS, you could deploy the model on AWS Lambda, etc. The point is that there are multiple ways to deploy a model within a particular cloud provider, and at the end of the day you have to choose the deployment type according to your use case: whether you want a serverless application, or are willing to manage some infrastructure, and so on.

Thanks for reading!!
