Deploying LangChain Apps with LangServe on GCP

Tahreem Rasul
7 min read · Feb 25, 2024


In this article, we’ll explore how to deploy LangChain applications using LangServe, a framework designed to simplify the deployment and maintenance of conversational AI applications. We’ll focus on building a content summarization tool that lets users submit text to be summarized by OpenAI’s gpt-3.5-turbo-instruct model. We’ll cover both local deployment with LangServe and deployment on Google Cloud Platform’s Cloud Run.

LangServe Playground to test deployed applications

Understanding LangServe

LangServe is a Python framework designed to simplify the deployment of LangChain runnables and chains as REST APIs. A REST API is based on the HTTP protocol and uses HTTP requests to POST (create), PUT (update), GET (read), and DELETE data. These APIs are typically exposed at a base URL, and applications interact with them by making HTTP requests to URL/endpoint.

LangServe integrates with FastAPI, a modern web framework for building RESTful APIs in Python, facilitating route construction and web service building. Key features of LangServe include:

  1. Schema Detection Automation: LangServe automatically infers input and output schemas from your LangChain object, removing the necessity for manually defining schemas.
  2. Efficient API Endpoints: LangServe includes predefined API endpoints such as /invoke, /batch, and /stream for managing simultaneous requests efficiently.
  3. Performance Tracking: LangServe provides monitoring features to observe and assess the efficiency of your APIs once deployed.

In short, LangServe lets developers focus on building LangChain projects while it handles the deployment complexities. Additionally, it comes with an interactive playground for testing and refining deployed APIs. We will be using this playground to test our application.

Environment Setup

Create a new conda environment. We need python>=3.8 for LangServe.

conda create -n summarization_bot python=3.11

Activate your conda environment using:

conda activate summarization_bot

Install the requirements:

pip install -r requirements.txt

A quick word on the packages for this project: you need langchain-cli, which gives us access to the LangChain command line interface. It’s included in the project’s requirements file and will be installed with the command above. This CLI is quite useful when working with LangServe projects or with LangChain templates, although we won’t be using templates in this project.
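For reference, a minimal requirements.txt for this project would look something like the listing below. This is an illustrative sketch based on the imports used later in the article, not necessarily the exact file from the repository:

langchain-cli
langchain
langchain-openai
langserve[all]
fastapi
uvicorn
python-dotenv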

Step-by-Step Implementation

Step 1

Since we’ll be making requests to OpenAI’s gpt-3.5-turbo-instruct model, you’ll need an API key. You can sign up at OpenAI and obtain your own key to start making calls to the model. Create a .env file in your project and add your key:

OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

Step 2

We’ll now use the LangChain CLI, which we’ve installed, to create the desired directory structure for our LangServe application. This command also installs required dependencies, including LangChain itself and other necessary packages like FastAPI and Pydantic. Navigate inside your parent directory, and run the following command:

langchain app new .

When you run the above command, you’ll be asked if you would like to install any packages. This refers to LangChain templates. Since we are not working with those in this tutorial, you can press enter to continue.

Once the command is executed, you’d have the following directory structure inside your parent directory:

.
├── app/
│   ├── __init__.py
│   └── server.py
├── Dockerfile
├── packages/
│   └── README.md
├── pyproject.toml
├── README.md
└── requirements.txt

You’ll find some boilerplate application code inside the app/server.py script. We will be editing this file to add our own code.

Step 3

In your app/server.py file, begin by importing the necessary dependencies:

from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from fastapi import FastAPI
from langserve import add_routes
from dotenv import load_dotenv

load_dotenv()

Step 4

Next, let’s write a prompt to instruct our model to summarize user-provided text:

summarization_assistant_template = """
You are a text summarization bot. Your expertise is exclusively in
analyzing and summarizing user-provided texts.
Create a concise and comprehensive summary of the provided text,
retaining all crucial information in a shorter form.
Text for Summarization: {text_for_summarization}"""

summarization_assistant_prompt = PromptTemplate(
    input_variables=["text_for_summarization"],
    template=summarization_assistant_template
)

The prompt typically needs to be in a specific format for use inside LangChain. In the code snippet above, I’ve used LangChain’s PromptTemplate, which includes the expected input variable for the model and our summarization prompt.
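If you want to see exactly what the template produces, you can format it with a sample input before wiring it into a chain. This is just an optional sanity check; the text below is a made-up example:

print(summarization_assistant_prompt.format(
    text_for_summarization="LangServe lets you deploy LangChain chains as REST APIs."
))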

Step 5

Next, we need to define our LLM instance and a runnable object that chains together our prompt and model. In the context of LangChain, a runnable refers to a reusable unit of computation that encapsulates a specific task or series of tasks. These runnables can be chained together to form larger, more sophisticated computational processes called chains.

llm = OpenAI(model='gpt-3.5-turbo-instruct',
             temperature=0.5)
llm_chain = summarization_assistant_prompt | llm

While creating the LLM instance, I’ve set the model temperature to 0.5. This gives the model some room to be creative when writing summaries and to paraphrase content as necessary.
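Before serving the chain, you can sanity-check it directly from a Python shell. This is an optional local test and assumes your OPENAI_API_KEY is available via the .env file:

summary = llm_chain.invoke(
    {"text_for_summarization": "LangServe exposes LangChain runnables and chains as REST APIs, with endpoints for invoking, batching, and streaming."}
)
print(summary)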

Step 6

Next, we will initialize a FastAPI application instance to run our application:

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="Summarization App",
)

add_routes(
    app,
    llm_chain,
    path="/openai"
)

The add_routes function from LangServe maps LangChain runnables and chains to specific URL paths within a FastAPI application. When we call this function, we pass in the FastAPI application instance we previously created along with the runnable object or chain. Three common endpoints, /invoke, /batch, and /stream, become available under the path defined above; the path itself can be changed to whatever you prefer.
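Once the server is running, the /invoke endpoint accepts a JSON body whose input field matches the chain’s input schema. Here is a minimal client sketch using the requests library; the localhost URL assumes the server is running locally (see the next step):

import requests

response = requests.post(
    "http://localhost:8000/openai/invoke",
    json={"input": {"text_for_summarization": "Paste the text you want summarized here."}},
)
print(response.json()["output"])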

Step 7

Now that we have created the application, we can include code for serving it using uvicorn, an ASGI server for running Python web applications asynchronously.

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)

At the end, this is what your app/server.py file should look like:

from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from fastapi import FastAPI
from langserve import add_routes
from dotenv import load_dotenv

load_dotenv()

summarization_assistant_template = """
You are a text summarization bot. Your expertise is exclusively in
analyzing and summarizing user-provided texts.
Create a concise and comprehensive summary of the provided text,
retaining all crucial information in a shorter form.
Text for Summarization: {text_for_summarization}"""

summarization_assistant_prompt = PromptTemplate(
    input_variables=["text_for_summarization"],
    template=summarization_assistant_template
)

llm = OpenAI(model='gpt-3.5-turbo-instruct',
             temperature=0.5)
llm_chain = summarization_assistant_prompt | llm

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="Summarization App",
)

add_routes(
    app,
    llm_chain,
    path="/openai"
)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)

Running application locally

You can test your application by executing the following command:

langchain serve

This will serve the application at http://localhost:8000/openai/playground
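Besides the playground, you can also call the running server programmatically. LangServe provides a RemoteRunnable client that behaves like a local chain; here is a quick sketch against the local server:

from langserve import RemoteRunnable

remote_chain = RemoteRunnable("http://localhost:8000/openai/")
print(remote_chain.invoke(
    {"text_for_summarization": "Paste the text you want summarized here."}
))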

Dockerfile Adjustments

Deployment on most cloud services involves building a Docker container. A Dockerfile was shipped with the application source code when we scaffolded the app at the beginning. Go ahead and modify your Dockerfile:

FROM python:3.11
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["uvicorn", "app.server:app", "--host", "0.0.0.0", "--port", "8080"]

Here is a detailed breakdown of the code in the above file:

  1. FROM python:3.11: specifies the base image for the Dockerfile. It instructs Docker to use the official Python 3.11 image as the starting point for building the new image. This base image already contains a Python runtime environment.
  2. COPY . /app: copies the current directory (where the Dockerfile is located) into a directory named /app inside the Docker image.
  3. WORKDIR /app: sets the working directory within the Docker container to /app.
  4. RUN pip install -r requirements.txt: installs Python dependencies listed in the requirements file.
  5. CMD ["uvicorn", "app.server:app", "--host", "0.0.0.0", "--port", "8080"]: specifies the default command to run when a container is started from the image. It runs the Uvicorn ASGI server with the application specified in the server.py file.
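Before deploying, you can build and run the container locally to confirm it starts correctly. This is an optional check; the image name summarization-bot is just an example, and --env-file .env passes your OpenAI key into the container:

docker build -t summarization-bot .
docker run --rm -p 8080:8080 --env-file .env summarization-bot

The playground should then be reachable at http://localhost:8080/openai/playground.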

Deploying on GCP

Deploying on Google Cloud Platform (GCP) with LangServe is quite straightforward. Cloud Run in GCP is a managed compute platform that lets you run containers directly on top of Google’s scalable infrastructure. You can deploy code written in any programming language on Cloud Run if you can build a container image from it. In fact, you can deploy code directly from source without first building a container image; this source-based deployment option is available for a few programming languages, including Python, where container builds are handled for you. I’ll detail both approaches below, and you can use either one.

Option 1: source-based deployment

If you do not wish to build a Docker image yourself, you can deploy directly from source using the following command in your main project directory:

gcloud run deploy SERVICE --source . --port 8080 --project PROJECT_ID \
  --allow-unauthenticated --region REGION \
  --set-env-vars=OPENAI_API_KEY=$OPENAI_API_KEY

The gcloud run deploy command above deploys the source code in your directory without you having to explicitly build the Docker image. I have used a few flags above; here is a quick explanation of what they mean. For the full list of options, see the gcloud run deploy documentation:

  1. SERVICE : ID or identifier of the service. Replace with what you want to call your service; note that Cloud Run service names may only contain lowercase letters, digits, and hyphens, so I named this one summarization-bot.
  2. --project <PROJECT_ID> : project ID to use for this invocation. If omitted, then the current project is assumed; the current project can be listed using gcloud config list --format='text(core.project)'
  3. --allow-unauthenticated : allows unauthenticated (public) access to the service, so anyone with the URL can call it.
  4. --region <REGION> : Region in which the resource can be found. You can set this to us-central1. However, you can omit this flag and select your preferred region from a list during the deployment process.

Option 2: image-based deployment

If you want slightly more control over the deployment, you can first build your image and then deploy the application. Use the following command to build your docker image:

gcloud builds submit --tag gcr.io/PROJECT_ID/image_name .

The command above initiates the process of building a container image using Google Cloud Build and then pushes this image to the Google Container Registry (gcr.io) with the specified tag PROJECT_ID/image_name. You can give your image any name of your liking.

Once the image has been successfully built, go ahead and type:

gcloud run deploy SERVICE --image gcr.io/PROJECT_ID/image_name

The command above executes the deployment using the image we built previously, specified by the --image flag.
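In practice, you will usually also want to pass the same port, region, access, and environment variable flags as in the source-based deployment. A sketch, with SERVICE, PROJECT_ID, image_name, and REGION as placeholders:

gcloud run deploy SERVICE --image gcr.io/PROJECT_ID/image_name \
  --port 8080 --region REGION --allow-unauthenticated \
  --set-env-vars=OPENAI_API_KEY=$OPENAI_API_KEY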

Running either of the deployment commands should return a URL. This is where our service is being hosted. We can test it out by adding /openai/playground to the returned URL.

Demo

You can view a demo of the deployed application here:

Next Steps

We have deployed a simple summarization application using LangServe and discussed deployment to GCP. This application logic can be extended with other LangChain functionality. You can find the code from this tutorial on my GitHub.

You can follow along as I share working demos, explanations and cool side projects on things in the AI space. Come say hi on LinkedIn and X! I share guides, code snippets and other useful content there. 👋
