Zul Ahmed
3 min read · Apr 28, 2024

What is FastAPI and its LLM Applications?

FastAPI Logo

FastAPI is a “Modern web framework for building APIs in Python.”

GitHub link to the repository: https://github.com/tiangolo/fastapi

Core Features:

Supports asynchronous programming, generates interactive OpenAPI documentation automatically, and uses Pydantic with Python type hints for data validation.

FastAPI is a modern, fast web framework for building APIs with Python. It’s built on top of Starlette for web handling and Pydantic for data validation, making it particularly suited for creating robust and performant web services. Here’s how it can help in building an LLM application:

  1. Async Support: FastAPI supports asynchronous programming which can handle multiple requests efficiently. This is ideal for LLM applications that may need to manage multiple, simultaneous inference requests.
  2. Automatic Documentation: It generates interactive API documentation automatically using Swagger and ReDoc, which makes it easier to test and debug the API endpoints you expose for interacting with your LLM.
  3. Data Validation: FastAPI uses Pydantic for data validation, ensuring that the inputs to your LLM are correctly formatted and typed before processing, reducing errors and improving reliability.
  4. Performance: It is built to be fast. FastAPI can help minimize latency in requests to your LLM, which is crucial for applications requiring real-time responses.
  5. Easy Integration: FastAPI easily integrates with machine learning libraries and tools, facilitating the deployment of models like LLMs into a production environment where you can access them via HTTP requests.
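As a quick illustration of point 3, here is a minimal Pydantic sketch; the model name and field constraints are hypothetical, not part of FastAPI itself. Malformed input is rejected before any model code runs:

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical request model for an LLM endpoint; the field names
# and constraints here are illustrative.
class GenerationRequest(BaseModel):
    prompt: str = Field(min_length=1)
    max_tokens: int = Field(default=50, gt=0, le=512)

# Well-formed input is parsed and typed
req = GenerationRequest(prompt="Hello", max_tokens=100)
print(req.max_tokens)  # 100

# Malformed input raises a ValidationError before reaching the model
try:
    GenerationRequest(prompt="", max_tokens=-5)
except ValidationError:
    print("rejected")  # rejected
```

In a FastAPI route, this validation happens automatically: a request body that fails the model's constraints is answered with a 422 error, and your handler never runs.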

Asynchronous Support

FastAPI’s asynchronous support allows it to handle multiple requests concurrently without blocking the execution of other tasks. This is particularly beneficial for LLM applications that may receive a high volume of requests. By leveraging asynchronous programming, FastAPI can efficiently manage resources and provide faster response times.

For example, consider an LLM-powered chatbot application. When multiple users interact with the chatbot simultaneously, FastAPI’s asynchronous support ensures that each request is handled independently without blocking others. This allows the chatbot to respond to multiple users concurrently, improving the overall performance and user experience.

Automatic Documentation

FastAPI automatically generates interactive API documentation using Swagger and ReDoc. Swagger is an open-source framework that helps in designing, building, documenting, and consuming RESTful APIs. It provides a user-friendly web interface where developers can explore and test the API endpoints.

ReDoc, on the other hand, is a tool that generates documentation from OpenAPI specifications. It provides a clean and responsive documentation layout that is easy to navigate and understand.

By integrating Swagger and ReDoc, FastAPI automatically generates up-to-date and interactive documentation based on your API code. This saves time and effort in manually documenting the API and ensures that the documentation stays in sync with the actual implementation.

Serving an LLM with FastAPI

Here’s a simple example of how you can use FastAPI to serve an LLM:

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
nlp = pipeline("text-generation", model="gpt2")

class InputText(BaseModel):
    text: str

@app.post("/generate")
async def generate_text(input: InputText):
    generated_text = nlp(input.text, max_length=100, num_return_sequences=1)[0]["generated_text"]
    return {"generated_text": generated_text}

In this example, we create a FastAPI application and load a pre-trained GPT-2 model using the Hugging Face Transformers library. We define an input model InputText using Pydantic to validate the incoming request data.

The /generate endpoint accepts a POST request with the input text and generates a response using the GPT-2 model. The generated text is returned as a JSON response.

Image by Farhad Malik

I found this image from his blog helpful for understanding FastAPI:
https://towardsdatascience.com/build-and-host-fast-data-science-applications-using-fastapi-823be8a1d6a0

If you’re looking for a way to develop API endpoints to serve an LLM, this DataCamp tutorial is pretty good:
https://www.datacamp.com/tutorial/serving-an-llm-application-as-an-api-endpoint-using-fastapi-in-python

This guide is also good if you want to serve a local LLM:
https://plainenglish.io/blog/your-local-llm-using-fastapi

Zul Ahmed

SWE @ Microsoft | Colgate University senior from NYC. Looking to share insights from my academics and projects! https://www.linkedin.com/in/zul-ahmed/