Serving AI Models as APIs on the Edge — Part 1

Sanjay M
Sparque labs
Aug 4, 2023

Introduction

This is the first article in a multi-part series; it outlines how to package and containerize an AI model as a Docker image and run it on various target platforms.

The next article will show you how to deploy it onto an edge version of Kubernetes using k3s.

We will use gpt2 as an example.

The source for this example is here:

https://github.com/sparquelabs/ai-serving/tree/main/cogs/textgen-gpt2

Containerizing an AI model

Below we show how to package and containerize an AI model for serving inference.

We will use Cog, a tool that builds a Docker image for an AI model and exposes the model behind a predictable API interface.

What is Cog?

Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container. You can deploy your packaged model to your own infrastructure, or to Replicate.

  • With Cog, you define your environment with a simple configuration file and it generates a Docker image with all the best practices: Nvidia base images, efficient caching of dependencies, installing specific Python versions, sensible environment variable defaults, and so on.
  • Cog knows which CUDA/cuDNN/PyTorch/TensorFlow/Python combos are compatible and will set it all up correctly for you.
  • Define the inputs and outputs for your model with standard Python. Then, Cog generates an OpenAPI schema and validates the inputs and outputs with Pydantic.
  • Automatic HTTP prediction server: your model's types are used to dynamically generate a RESTful HTTP API using FastAPI.
  • Automatic queue worker: long-running deep learning models or batch processing is best architected with a queue. Cog models do this out of the box. Redis is currently supported, with more in the pipeline.
  • Ready for production: deploy your model anywhere that Docker images run.

Installing Cog

Below we show how to install cog locally.

# Download cog from https://github.com/replicate/cog
wget https://github.com/replicate/cog/releases/download/v0.8.5/cog_linux_x86_64
sudo mv cog_linux_x86_64 /usr/local/bin/cog
sudo chmod +x /usr/local/bin/cog
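
You can confirm the installation with the version flag (the exact output depends on the release you downloaded):

cog --version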

Using Cog to create a container image

Create a cog.yaml file for textgen-gpt2. This file contains everything needed to provision a container for our AI model.

Please note the python_packages section. Here we specify the Python package dependencies needed to run this particular AI model.

We also need to specify the Predictor class and the file in which it resides (a sketch of a minimal Predictor appears after the configuration below).

This allows us to both test and serve the AI model using Cog.

build:
  gpu: false
  python_version: "3.10"
  python_packages:
    - "torch==1.12.1"
    - "transformers==4.26.1"
    - "sentencepiece==0.1.97"
    - "accelerate==0.16.0"

predict: "predict.py:Predictor"

Download model weights

Download the model weights using the script/download_weights Python script.

cog run script/download_weights

This downloads the PyTorch model weights locally.
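
The script itself is not reproduced here; below is a minimal sketch of what such a download script might look like. The weights directory name and the exact transformers calls are assumptions, not taken from the repository.

# script/download_weights (sketch; see the repository for the actual script)
from transformers import GPT2LMHeadModel, GPT2Tokenizer

CACHE_DIR = "weights"  # assumed local directory for the downloaded weights

# Fetch and cache the GPT-2 model and tokenizer so the container can run offline
GPT2LMHeadModel.from_pretrained("gpt2", cache_dir=CACHE_DIR)
GPT2Tokenizer.from_pretrained("gpt2", cache_dir=CACHE_DIR)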

Testing the AI model with Cog

To run a single test prediction against the AI model, we can do the following:

cog predict -i prompt="The sailor sailed into the"

You can see that we pass the prompt input for the Predictor as part of the cog predict CLI call.

Build the cog model as a Docker container image

Now, we can build the AI model as a Docker container image, tagging it so that the name matches the docker run command below.

cog build -t textgen-gpt2
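
You can verify that the image now exists locally:

docker images textgen-gpt2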

Run the AI model in Docker container

Now, we run the AI model as a Docker container.

# run
docker run -d --rm -p 5000:5000 textgen-gpt2

# check
docker ps

It runs on port 5000 as a REST API service that can be invoked with an HTTP client.
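
To confirm the server started cleanly, you can tail the container logs; the Cog HTTP server also exposes a health-check endpoint (the endpoint path below comes from the standard Cog server, not from this example):

# check the container logs
docker logs $(docker ps -q -f ancestor=textgen-gpt2)

# query the server's health-check endpoint
curl -s http://localhost:5000/health-check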

Invoke the AI model to make an inference

We can now use curl to invoke the AI model and get an inference.

curl -s -X POST -H 'Content-Type: application/json' http://localhost:5000/predictions   -d '{"input": {"prompt":"The sailor sailed into the "}}' | jq '.output'

"[{'generated_text': 'The sailor sailed into the vernal darkness and began to scream.\\n\\n\"My dear, what\\'s happening?\"\\n\\n\"There\\'s a fire burning in the water…\"\\n\\n\"I can\\'t see anything.\"\\n\\n\"Oh God'}]"

We can see the generated output text from the GPT-2 model.
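
The same request can also be made from Python. Here is a small sketch using the requests library (not part of the original example); it sends the same JSON payload as the curl command above.

import requests

# POST the same JSON payload that the curl command above sends
resp = requests.post(
    "http://localhost:5000/predictions",
    json={"input": {"prompt": "The sailor sailed into the "}},
)
resp.raise_for_status()
print(resp.json()["output"])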

Summary

This article showed you how to quickly package and serve your AI model using a Cog-generated Docker container image.

We thus avoided having to write a Dockerfile by hand, since Cog generated the image and built everything inside the container for us.

This way, you can run your AI models without installing any dependencies locally or setting up virtual environments.

You also saw how to run commands inside the build environment with cog run and test your Predictor locally with cog predict before building the container.
