Boost your applications with 🦜🔗LangChain

Henry Lagarde
Published in Criteo Tech Blog
13 min read · Apr 3, 2024
Photo by Steve Johnson on Unsplash

With the rise of the Large Language Models (LLMs), today most of our apps have some AI capabilities in order to simplify our day-to-day jobs. But let’s be honest, most of these “revolutionary” applications (or startups) are a front-end calling the OpenAI API.

It’s in this new era that 🦜🔗LangChain appeared and proposed a new approach to developing applications with an LLM integration and allowing all software engineers to adopt this new technology.

Okay, but what is 🦜🔗LangChain?

As a software engineer, you don't want to build a new feature directly on top of one specific API, since any breaking change there will immediately impact you.

As its founder and CEO describes it, LangChain is a framework that interconnects with most vector stores, LLMs, and much more. It allows any software engineer to manipulate and combine LLMs with other tools into Chains (now you get why it’s called LangChain 😉). This normalized interface will ease any transition to a more powerful LLM (or less expensive) without breaking or re-implementing your whole application.

LangChain is an open-source project created on October 24, 2022, with a Python 🐍 and TypeScript (TS) package that skyrocketed in only one year to more than 5.5m monthly downloads and 75k GitHub stars ⭐️.

LangChain GitHub stats

No worries: even if you are not a Python or JS developer, you can still use LangChain thanks to the awesome community, which provides implementations of LangChain in most languages (Java, C#, Go, Rust, etc.).

🧱Fundamental modules

Today, LangChain is more than just the core package with all the modules; it’s truly an ecosystem that helps you build, publish, and monitor your chain.

LangChain consists of three distinct packages:

  • 🦜🔗LangChain modules: The core package to create your Chain
  • 🖥️LangServe: Publish your Chain as a RESTful API
  • 📡LangSmith: Monitor your chain
LangChain universe

🖥️LangServe

LangServe will help you publish your chain into a RESTful API, allowing you to call it from any app. It will expose several endpoints:

  • POST /invoke: Invoke the runnable on a single input.
  • POST /batch: Invoke the runnable on a batch of inputs.
  • POST /stream: Invoke on a single input and stream the output.
  • POST /stream_log: Invoke on a single input and stream the output, including the output of intermediate steps as it’s generated.
  • POST /astream_events: Invoke on a single input and stream events as they are generated, including from intermediate steps.
  • GET /input_schema: JSON schema for the input to the runnable.
  • GET /output_schema: JSON schema for the output of the runnable.
  • GET /config_schema: JSON schema for the config of the runnable.

This can be a great solution if you want to start using LangChain without rewriting your whole application, and it can also help prevent leakage and abuse of your app. However, there are some limitations. For instance, you can’t use callbacks (for now), which can become quite limiting at some point. Additionally, LangServe relies on FastAPI, which limits the generation of your OpenAPI docs since it doesn’t yet support Pydantic V2 (see the GitHub issue).
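
Calling a published chain from a TypeScript app doesn’t require crafting raw HTTP requests. Here is a minimal sketch, assuming a chain deployed at a hypothetical local URL, using the RemoteRunnable wrapper (the /invoke, /batch, and /stream endpoints are handled for you):

import { RemoteRunnable } from "@langchain/core/runnables/remote";

// Hypothetical URL of a chain published with LangServe
const remoteChain = new RemoteRunnable({
  url: "http://localhost:8000/my-chain",
});

const result = await remoteChain.invoke({
  question: "What is LangChain?",
});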

📡LangSmith

LangSmith will help you monitor, replay, and sandbox your runs with just a few lines of code. This is very practical if you want an easy way to ensure that your chain, once pushed to production, doesn’t go haywire and blow up your bill (we’ve all seen bad cloud configurations do exactly that)!

For a step-by-step walkthrough of LangSmith, see: https://medium.com/@lei.shang/getting-started-with-langsmith-a-step-by-step-walkthrough-a5ca09adca43
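
Enabling tracing doesn’t require changing your chain itself; it is mostly a matter of environment variables. A minimal sketch (the API key and project name below are placeholders):

import { ChatOpenAI } from "@langchain/openai";

// Enable LangSmith tracing before running anything; values are placeholders.
process.env.LANGCHAIN_TRACING_V2 = "true";
process.env.LANGCHAIN_API_KEY = "<your-langsmith-api-key>";
process.env.LANGCHAIN_PROJECT = "my-project"; // optional

const model = new ChatOpenAI({ temperature: 0 });
// This call (and every run after it) will show up as a trace in LangSmith.
await model.invoke("What is LangChain?");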

There is also a well-stocked hub with ready-to-use chains, agents, and prompt templates that you can copy into your application, and the LangChain community grows daily with new ideas and integrations. Even if you are not using LangChain, the Prompt Template Hub can help you with LLM interaction directly in any UI.
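
Pulling a prompt from the hub takes a couple of lines. A sketch, assuming the public rlm/rag-prompt entry (any public hub entry works the same way):

import { pull } from "langchain/hub";
import type { ChatPromptTemplate } from "@langchain/core/prompts";

// Pull a community prompt template from the LangChain Hub.
const ragPrompt = await pull<ChatPromptTemplate>("rlm/rag-prompt");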

🦜🔗LangChain Modules

This is the part that will interest us the most; this is where the magic happens.

From my point of view, we can split the modules into different parts:

  • LLM Integration
  • Agents & Chains
  • Memory
  • Prompt templating
  • Input/Output formatting
  • Document Loaders and Splitters
  • Vector Store integration
LangChain modules

🤖LLM Integration

LangChain offers integrations with most of the existing LLMs, making it easy to use them and to switch from one to another. Here are some of my favorites, but you can find the full list here (not always up to date, so also take a look directly at the GitHub repo):

  • GPT (any version)
  • Llama
  • Mistral
  • Hugging Face
  • Google AI

But if by any chance there is no integration for your LLM and you want to use a custom one, you can easily implement your own by extending the LLM class, as in the following example:

import { LLM, type BaseLLMParams } from "@langchain/core/language_models/llms";
import type { CallbackManagerForLLMRun } from "langchain/callbacks";
import { GenerationChunk } from "langchain/schema";

export interface CustomLLMInput extends BaseLLMParams {
  n: number;
}

export class CustomLLM extends LLM {
  n: number;

  constructor(fields: CustomLLMInput) {
    super(fields);
    this.n = fields.n;
  }

  _llmType() {
    return "custom";
  }

  async _call(
    prompt: string,
    options: this["ParsedCallOptions"],
    // Can pass runManager into sub runs for tracing
    _runManager: CallbackManagerForLLMRun
  ): Promise<string> {
    return prompt.slice(0, this.n);
  }

  async *_streamResponseChunks(
    prompt: string,
    options: this["ParsedCallOptions"],
    runManager?: CallbackManagerForLLMRun
  ): AsyncGenerator<GenerationChunk> {
    for (const letter of prompt.slice(0, this.n)) {
      yield new GenerationChunk({
        text: letter,
      });
      // Trigger the appropriate callback
      await runManager?.handleLLMNewToken(letter);
    }
  }
}

Once you have selected your favorite LLM, all you have to do is install its integration, and you are done: you can already call it with any prompt. Have fun!

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-0613",
  temperature: 0,
});
await model.invoke(
  "What are some theories about the relationship between unemployment and inflation?"
);

⛓️Chains

A Chain is a wrapper around multiple components that will be executed in a defined order. This is the core concept of LangChain. Thanks to this, you will be able to split your logic into multiple calls in a specific sequence to create a coherent application.

The most common way to define a chain is through the LangChain Expression Language (LCEL). You have plenty of examples ready to use that will kickstart your project.
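
As an illustration, here is a minimal LCEL chain where a prompt, a model, and an output parser are piped together (the prompt and model are just example choices):

import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

// A minimal LCEL chain: each component is piped into the next one.
const prompt = PromptTemplate.fromTemplate("Tell me a short joke about {topic}");
const model = new ChatOpenAI({ temperature: 0 });
const chain = prompt.pipe(model).pipe(new StringOutputParser());

const joke = await chain.invoke({ topic: "software engineers" });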

⚙️Agents

The main purpose of an Agent is to let an LLM decide the sequence of actions to take in order to accomplish a task. The chosen LLM serves as the reasoning part of your app and decides which tool to use and how to use it.

Agents will allow you to combine any LLM with any other module or tools that you define. The great thing about LangChain is the number of already existing integrations (Github, Confluence, Notion, Brave, Serper…).

// Note: exact import paths may vary slightly depending on your LangChain version.
import { ChatOpenAI } from "@langchain/openai";
import { DynamicTool, Serper } from "langchain/tools";
import { AgentExecutor, createOpenAIFunctionsAgent } from "langchain/agents";
import { pull } from "langchain/hub";
import type { ChatPromptTemplate } from "@langchain/core/prompts";

const search = new Serper();
const searchTool = new DynamicTool({
  name: "search_serper",
  description:
    "useful for when you need to ask with search. input should be a string of the search term",
  func: (input: string) => search.call(input),
});
const model = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-0613",
  temperature: 0,
});

// A ready-made prompt for OpenAI functions agents, pulled from the hub
const prompt = await pull<ChatPromptTemplate>("hwchase17/openai-functions-agent");

const agent = await createOpenAIFunctionsAgent({
  llm: model,
  tools: [searchTool],
  prompt: prompt,
});
const agentExecutor = new AgentExecutor({
  agent,
  tools: [searchTool],
  verbose: true,
  callbacks: [new MyCallbackHandler()], // your own callback handler (sketched below)
});
// "input" is the user question coming from your application
await agentExecutor.invoke({ input });

In this example, I give my LLM (ChatGPT in this case) the capability to search the Internet through the Serper API. When I give it a prompt, it can reason and decide to return links from its Internet research if I ask it to.

You can easily follow the actions of the agent thanks to the callbacks prop, which lets you pass a callback handler class whose methods are triggered depending on the agent’s current action. This can be useful if you need to perform specific actions based on the state of the agent (starting to use a tool, launching the chain, calling the LLM, etc.).
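
A sketch of such a handler (this is roughly what the MyCallbackHandler used above could look like; the method names come from the BaseCallbackHandler interface):

import { BaseCallbackHandler } from "@langchain/core/callbacks/base";
import type { Serialized } from "@langchain/core/load/serializable";

// A custom callback handler that simply logs the agent's activity.
class MyCallbackHandler extends BaseCallbackHandler {
  name = "MyCallbackHandler";

  async handleLLMStart(llm: Serialized, prompts: string[]) {
    console.log("LLM called with prompts:", prompts);
  }

  async handleToolStart(tool: Serialized, input: string) {
    console.log("Tool started with input:", input);
  }

  async handleChainStart(chain: Serialized) {
    console.log("Chain started:", chain.id);
  }
}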

💬Prompt Template

One of LangChain's key features is the Prompt Template, which is similar to the f-string in Python. It allows you to define prompts with variables that you can interpolate with values.

import { PromptTemplate } from "@langchain/core/prompts";

const getCurrentDate = () => {
  return new Date().toISOString();
};

const prompt = new PromptTemplate({
  template: "Tell me a {adjective} joke about the day {date}",
  inputVariables: ["adjective", "date"],
});

const partialPrompt = await prompt.partial({
  date: getCurrentDate,
});

const formattedPrompt = await partialPrompt.format({
  adjective: "funny",
});

console.log(formattedPrompt);

// Tell me a funny joke about the day 2023-07-13T00:54:59.287Z

Thanks to this feature, you can easily restrict what your users (or other tools) can inject into your prompt to well-defined variables. This can be seen as a type of safeguard against prompt injection and can help avoid incorrect usage of your chain.

Another important feature is the ability to distinguish between messages coming from Humans, Systems, the AI, or placeholders.

import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

const chatModel = new ChatOpenAI({ temperature: 0 });

const messages = [
  new SystemMessage("You're a helpful assistant"),
  new HumanMessage("What is the purpose of model regularization?"),
];
await chatModel.invoke(messages);
//AIMessage { content: 'The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model becomes too complex and starts to fit the noise in the training data, leading to poor generalization on unseen data. Regularization techniques introduce additional constraints or penalties to the model's objective function, discouraging it from becoming overly complex and promoting simpler and more generalizable models. Regularization helps to strike a balance between fitting the training data well and avoiding overfitting, leading to better performance on new, unseen data.' }

These message types will help you define past messages (more on that in the next part 😉) as well as specific behaviors or capabilities for your Agent.

🧠Memory

Today, LLMs are mainly used in conversational interfaces, meaning that the agent needs access to the chat history in order to answer better and tailor its responses.

Memory implementation from LangChain

The schema from the LangChain documentation explains quite well how the memory should work. The main idea is that we will add the history before the main prompt.

import { OpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { LLMChain } from "langchain/chains";
import { BufferMemory } from "langchain/memory";

const llm = new OpenAI({ temperature: 0 });

const template = `You are a nice chatbot having a conversation with a human.

Previous conversation:
{chat_history}

New human question: {question}
Response:`;

const prompt = PromptTemplate.fromTemplate(template);

// Notice that we need to align the `memoryKey` with the variable in the prompt
const llmMemory = new BufferMemory({ memoryKey: "chat_history" });

const conversationChain = new LLMChain({
  llm,
  prompt,
  verbose: true,
  memory: llmMemory,
});

// Notice that we just pass in the `question` variable.
// `chat_history` gets populated by the memory class
await conversationChain.invoke({ question: "What is your name?" });
await conversationChain.invoke({ question: "What did I just ask you?" });

In order to make the chat history work, you have to add the {chat_history} key to your prompt; this is where LangChain will place the history. The inconvenience with this approach is that if the history grows large, the input becomes too big for the LLM to handle. To solve this issue, LangChain provides a Conversation Summary Memory that summarizes the conversation into a shorter string, avoiding the token limit.
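
A minimal sketch of plugging it in (the same chain as above, only the memory class changes):

import { OpenAI } from "@langchain/openai";
import { ConversationSummaryMemory } from "langchain/memory";

// The summary memory uses an LLM to keep a running summary of the
// conversation instead of storing the full message history.
const summaryMemory = new ConversationSummaryMemory({
  llm: new OpenAI({ temperature: 0 }),
  memoryKey: "chat_history",
});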

🖨️Input and Output format

What we saw before is already pretty cool, but what happens if we need a specific type in our input or output? LangChain has thought about it and proposes an auto-formatter. You just have to provide the schema (in TypeScript, a Zod schema), and it’s done!

Under the hood, LangChain generates a precise prompt for your LLM that includes your schema and some examples to avoid any typing issues.

import { z } from "zod";
import { DynamicStructuredTool } from "langchain/tools";

const schema = z.object({
  mainTitle: z.string(),
  sections: z.array(
    z.object({
      title: z.string(),
      content: z.array(z.string()),
      images: z.array(z.string()),
      sources: z.array(
        z.object({
          url: z.string().url(),
          title: z.string(),
        })
      ),
    })
  ),
});

const powerpoint = new DynamicStructuredTool({
  name: "powerpoint_generator",
  description:
    "useful to generate a powerpoint presentation, it doesn't provide links and images.",
  // generatePPT is the application's own PowerPoint generation function
  func: async (toolInput) => generatePPT({ content: toolInput }),
  schema,
});

Thanks to the DynamicStructuredTool, LangChain infers the type from your Zod schema and strongly types the input of your tool function.

Okay, we saw how we could change the type of the input for some specific tools, but how can we shape the output of the chain in order to handle its answers correctly? Here again, LangChain comes with a set of tools that ease this process. By default, the output of the chain is a string (streamed or not), but here is the list of built-in output parsers (a quick sketch of one of them follows the list):

  • String
  • HTTPResponse (binary)
  • OpenAIFunction
  • CSV
  • OutputFixing (a wrapper around another output parser)
  • DateTime
  • Structured (any type, basically)
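
To give a feel for how these parsers work, here is a minimal sketch with the comma-separated list parser (the prompt and model are just example choices):

import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { CommaSeparatedListOutputParser } from "langchain/output_parsers";

// The parser tells the model how to format its answer,
// then turns the raw string into a string[].
const listParser = new CommaSeparatedListOutputParser();
const listPrompt = PromptTemplate.fromTemplate(
  "List five {subject}.\n{format_instructions}"
);
const listChain = listPrompt
  .pipe(new ChatOpenAI({ temperature: 0 }))
  .pipe(listParser);

const countries = await listChain.invoke({
  subject: "countries",
  format_instructions: listParser.getFormatInstructions(),
});
// e.g. [ "France", "Germany", "Italy", "Spain", "Portugal" ]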

We will focus on the structured output since it’s the least obvious one. We can describe the expected type directly as a plain object of names and descriptions, which works perfectly if you mainly need string fields.

import { StructuredOutputParser } from "langchain/output_parsers";

const parser = StructuredOutputParser.fromNamesAndDescriptions({
  answer: "answer to the user's question",
  source: "source used to answer the user's question, should be a website.",
});

But when you need a more complex object structure, Zod comes to the rescue and lets you type the output the way you want.

import { z } from "zod";
import { OpenAI } from "@langchain/openai";
import { RunnableSequence } from "@langchain/core/runnables";
import { StructuredOutputParser } from "langchain/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";

const parser = StructuredOutputParser.fromZodSchema(
  z.object({
    answer: z.string().describe("answer to the user's question"),
    sources: z
      .array(z.string())
      .describe("sources used to answer the question, should be websites."),
  })
);

const chain = RunnableSequence.from([
  PromptTemplate.fromTemplate(
    "Answer the users question as best as possible.\n{format_instructions}\n{question}"
  ),
  new OpenAI({ temperature: 0 }),
  parser,
]);

console.log(parser.getFormatInstructions());

/*
Once the format instructions are interpolated, the full prompt looks like this:

Answer the users question as best as possible.
You must format your output as a JSON value that adheres to a given "JSON Schema" instance.

"JSON Schema" is a declarative language that allows you to annotate and validate JSON documents.

For example, the example "JSON Schema" instance {{"properties": {{"foo": {{"description": "a list of test words", "type": "array", "items": {{"type": "string"}}}}}}, "required": ["foo"]}}}}
would match an object with one required property, "foo". The "type" property specifies "foo" must be an "array", and the "description" property semantically describes it as "a list of test words". The items within "foo" must be strings.
Thus, the object {{"foo": ["bar", "baz"]}} is a well-formatted instance of this example "JSON Schema". The object {{"properties": {{"foo": ["bar", "baz"]}}}} is not well-formatted.

Your output will be parsed and type-checked according to the provided schema instance, so make sure all fields in your output match the schema exactly and there are no trailing commas!

Here is the JSON Schema instance your output must adhere to. Include the enclosing markdown codeblock:
```
{"type":"object","properties":{"answer":{"type":"string","description":"answer to the user's question"},"sources":{"type":"array","items":{"type":"string"},"description":"sources used to answer the question, should be websites."}},"required":["answer","sources"],"additionalProperties":false,"$schema":"http://json-schema.org/draft-07/schema#"}
```

What is the capital of France?
*/

const response = await chain.invoke({
  question: "What is the capital of France?",
  format_instructions: parser.getFormatInstructions(),
});

console.log(response);
/*
{ answer: 'Paris', sources: [ 'https://en.wikipedia.org/wiki/Paris' ] }
*/

In this example from the LangChain documentation, we can see how LangChain generates a specific prompt to describe the format of your output. From this point, you can easily handle your chain's output and do whatever you want.

📄Document Loader and Splitter

Having an LLM is fine, but most of the time, you want it to be specific to your use case and your data. In that case, multiple solutions are available:

  • Fine-tuning it on your data, but this can be expensive and require some knowledge about LLMs and AI.
  • Providing a set of documents.

The second solution can be quite handy, especially when you have a large number of documents to load from different sources. LangChain provides many integrations (Confluence, Notion, GitHub, Google Drive, Figma…), so there is a high chance that the loader you need already exists, alongside loaders for more common files such as PDFs, text files, or CSVs.

import { PDFLoader } from "langchain/document_loaders/fs/pdf";

const loader = new PDFLoader("src/document_loaders/example_data/example.pdf");
const docs = await loader.load();

With this piece of code, you load the data from a PDF and split it into several documents (one per page by default). The Document interface in LangChain is common to all the providers, allowing you to regroup all the different documents into one set of documents.

export interface DocumentInterface<
  Metadata extends Record<string, any> = Record<string, any>
> {
  pageContent: string;
  metadata: Metadata;
}

The structure of the Document interface allows you to pass any kind of data in the metadata attribute (link, URL, title, etc.).

The downside of this kind of integration is that you can end up loading large documents that are hard for the LLM to search through, because each one covers too much ground. The text splitters help you solve that problem by letting you specify how the loaded files should be split:

  • Split by Character
  • Split code and markup
  • Recursive Character Text Splitter (the recommended one by default)
  • Token Text Splitter
  • Custom Text Splitter

I recommend that you check the different text splitters, although most of the time, the Recursive one will do the job perfectly and provide some good results.
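
A minimal sketch with the recursive splitter, reusing the docs loaded above (the chunk sizes are just example values to tune for your own documents):

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Split the loaded documents into overlapping chunks.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const splitDocs = await splitter.splitDocuments(docs);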

🗄️Vector Stores

In order to avoid loading and splitting your documents every time you call your Agent, you will want to save them somewhere your LLM can search. LangChain integrates with a lot of vector stores (Pinecone, Qdrant, Elasticsearch, etc.), allowing you to save a vectorized version of your documents.

// Note: exact import paths may vary slightly depending on your LangChain version.
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { ConversationalRetrievalQAChain } from "langchain/chains";

const model = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-0613",
});

const embeddings = new OpenAIEmbeddings();
// "index" is your existing Pinecone index, created with the Pinecone client
const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
  pineconeIndex: index,
});

const chain = ConversationalRetrievalQAChain.fromLLM(
  model,
  vectorStore.asRetriever(),
  { returnSourceDocuments: true }
);

const question = "What did the president say about Justice Breyer?";

const res = await chain.call({ question, chat_history: "" });
const response = await chain.call({
  question: "Was that nice?",
  chat_history: `${question}\n${res.text}`,
});
return {
  text: response.text,
  sourceDocuments: response.sourceDocuments,
};

Thanks to this Vector Store integration, you can directly use the vector store as a retriever, completing the requests of your LLM with links and relevant data.

🧑‍💻Some warnings about it

From my experience with 🦜🔗LangChain, we can easily create powerful apps with just a few lines of code. Thanks to the document loaders, I have been able to implement complex apps with large amounts of personal information easily without any need for fine-tuning or deep knowledge of Machine Learning or LLMs.

BUT, be careful! You are still manipulating LLMs, meaning that the output and the answers it provides can change every time. We’ve all faced a lazy GPT once that didn’t want to answer directly; it’s the same here, but with integrations with other tools (you can easily imagine the mess).

On the other hand, this framework allows us (Software Engineers) to easily understand how LLMs work and how we can ‘play’ with them. Since AI will be the future, don’t miss your chance to learn this new tool and its capabilities.

Try to create your own ChatGPT or smart documentation! 🤖
