Unlocking the Power of Retrieval Augmented Generation (RAG) with LangChainJS

AlamedaDev
Feb 6, 2024 · 17 min read

This is an overview focused on learning the basics of Large Language Model (LLM) applications using LangChainJS, a framework for building and operating LLM-powered applications.

LangChain.js is particularly suitable for building applications across multiple platforms, such as browser extensions, mobile apps with React Native, and desktop apps with Electron. It also benefits from the extensive use of JavaScript among developers, the ease of deployment, the scaling features, and the advanced tooling available in this ecosystem.

LangChain employs a special expression language to compose chains of components called runnables. These runnables define core methods and input and output types, and support common LLM application operations such as invoke, stream, batch, and runtime parameter modification.

Retrieval Augmented Generation, or RAG, is an exciting application of Large Language Models (LLMs) that is gaining widespread popularity. In this blog post, we will delve into what RAG is and explore some tools offered by LangChainJS, the JavaScript version of LangChain, to simplify its implementation, including document loaders and text splitters.

Basic Example:

import "dotenv/config";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

const model = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-1106"
});

await model.invoke([
  new HumanMessage("Tell me a joke.")
]);

### RESPONSE:

AIMessage {
lc_serializable: true,
lc_kwargs: {
content: "Why don't skeletons fight each other?\nThey don't have the guts!",
additional_kwargs: { function_call: undefined, tool_calls: undefined }
},
lc_namespace: [ "langchain_core", "messages" ],
content: "Why don't skeletons fight each other?\nThey don't have the guts!",
name: undefined,
additional_kwargs: { function_call: undefined, tool_calls: undefined }
}

Prompt Templates

Prompt templates in programming, particularly for Large Language Models (LLMs) like those used in LangChain, are standardized formats or structures used for creating prompts. They incorporate placeholders for variable input, making them reusable and adaptable for different queries. These templates streamline the process of prompt creation, enhance readability, and improve the maintainability of the code by centralizing changes to prompt logic.

In LangChain, prompt templates are implemented using classes like `PromptTemplate` and `ChatPromptTemplate`, which provide methods to create and format prompts with input variables. These templates support different templating engines and offer features like validation of input variables and flexibility in specifying input values.

import { ChatPromptTemplate } from "@langchain/core/prompts";

const prompt = ChatPromptTemplate.fromTemplate(
  `What are three good names for a company that makes {product}?`
);

await prompt.format({
  product: "colorful socks"
});

##>> "Human: What are three good names for a company that makes colorful socks?"

await prompt.formatMessages({
  product: "colorful socks"
});

##> [
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: "What are three good names for a company that makes colorful socks?",
additional_kwargs: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: "What are three good names for a company that makes colorful socks?",
name: undefined,
additional_kwargs: {}
}
]
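
LangChain also provides the plain `PromptTemplate` class mentioned above, which formats to a raw string instead of a list of messages. Here is a minimal sketch (the joke prompt and `topic` variable are just illustrative):

import { PromptTemplate } from "@langchain/core/prompts";

// A plain string prompt template (no chat message roles) with one input variable
const jokePrompt = PromptTemplate.fromTemplate(
  "Tell me a joke about {topic}."
);

// Returns the plain string "Tell me a joke about cats."
await jokePrompt.format({ topic: "cats" });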

LangChain Expression Language (LCEL)

LCEL is designed to connect different components, referred to as “runnables,” in a sequence. This allows for the creation of sophisticated workflows where the output of one component becomes the input for another. These are essentially building blocks that define core methods and are designed to accept specific input types and produce defined output types. This makes it easier to integrate different functionalities in a seamless manner.

Runnables come with common methods like ‘invoke’, ‘stream’, and ‘batch’. These methods are essential in LLM applications, allowing for various operations such as invoking the runnable to execute its function, streaming data for continuous processing, and batch processing for handling multiple inputs simultaneously.

const chain = prompt.pipe(model);

await chain.invoke({
  product: "colorful socks"
});

##> AIMessage {
lc_serializable: true,
lc_kwargs: {
content: "1. Rainbow Soles\n2. Vivid Footwear Co.\n3. Chromatic Sockworks",
additional_kwargs: { function_call: undefined, tool_calls: undefined }
},
lc_namespace: [ "langchain_core", "messages" ],
content: "1. Rainbow Soles\n2. Vivid Footwear Co.\n3. Chromatic Sockworks",
name: undefined,
additional_kwargs: { function_call: undefined, tool_calls: undefined }
}
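
Runnables also support runtime parameter modification. One common pattern is binding extra arguments, such as stop sequences, to a model before piping it into a chain. A minimal sketch (the exact options accepted depend on the model wrapper you use):

// Bind a runtime parameter (a stop sequence) to the model;
// the bound model is still a runnable and can be piped as usual.
const chainWithStop = prompt.pipe(model.bind({ stop: ["\n"] }));

await chainWithStop.invoke({ product: "colorful socks" });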

Output Parser

In certain scenarios, it’s more convenient to work with the raw string value of the output rather than its default format (e.g., an AI message object). LangChain addresses this need through the use of an ‘output parser’. An output parser is a tool within LangChain that transforms the chat model output into a different format, such as a simple string.

import { StringOutputParser } from "@langchain/core/output_parsers";

const outputParser = new StringOutputParser();
const nameGenerationChain = prompt.pipe(model).pipe(outputParser);

await nameGenerationChain.invoke({
  product: "fancy cookies"
});

##> "1. Gourmet Cookie Creations\n" +
"2. Delicate Delights Bakery\n" +
"3. Heavenly Sweet Treats Co."

Stream

Runnables in LangChain, as well as sequences of runnables, are equipped with a `.stream` method. This method is particularly useful for handling LLM responses that take a long time to generate. It returns the output in an iterable stream, allowing for a more responsive presentation of data, especially in situations where rapid feedback is important.

In this example, the chain is executed with a different product name (“really cool robots”), and the async iterator syntax (`for await…of`) is used to loop over and display the stream chunks. This results in getting individual string chunks of the output, such as names of companies, which can be displayed more quickly in a frontend application.

const stream = await nameGenerationChain.stream({
  product: "really cool robots",
});

for await (const chunk of stream) {
  console.log(chunk);
}

Batch

The `batch` method is useful for performing multiple operations concurrently. It accepts a list of inputs, such as “large calculators” and “alpaca wool sweaters”, and processes them simultaneously, returning one string output with company name suggestions for each product. This showcases the method’s ability to handle and generate responses for multiple queries at once.

const inputs = [
  { product: "large calculators" },
  { product: "alpaca wool sweaters" }
];

await nameGenerationChain.batch(inputs);

##> [
"1. GiantCalc Co.\n2. MegaMath Devices\n3. JumboCalculations Inc.",
"1. Alpaca Luxe\n2. Sweater Alpaca\n3. Woolly Alpaca Co."
]
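
If you need to limit how many of these requests run in parallel, `.batch` also accepts batch options. A small sketch, assuming the `maxConcurrency` option available on LangChain runnables:

// Process the same inputs, but with at most one request in flight at a time
await nameGenerationChain.batch(inputs, {}, { maxConcurrency: 1 });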

Understanding Retrieval Augmented Generation (RAG)

RAG, short for Retrieval Augmented Generation, is a technique that leverages the capabilities of LLMs to generate text while keeping contextual information in mind. It follows a basic workflow: load documents from your sources, split them into manageable chunks, embed those chunks and store them in a vector database, retrieve the chunks most relevant to a user’s query, and finally pass the retrieved context to the LLM so it can generate a grounded answer.

Document Loading with LangChainJS

LangChainJS offers a powerful suite of document loaders that can collect data from various sources, including the web and proprietary sources, specifically tailored for JavaScript applications. For instance, you can use the GitHub loader provided by LangChainJS to access JavaScript repositories and retrieve files. LangChainJS even allows you to customize the loading process by specifying parameters like file paths and exclusions.

To load a GitHub repository like LangChain.js using LangChainJS, you can initialize the loader, set options, and load the data. This flexibility makes it easy to incorporate JavaScript-based data from diverse sources into your RAG applications using LangChainJS.

Loading from a GitHub repo:

import { GithubRepoLoader } from "langchain/document_loaders/web/github";
// Peer dependency, used to support .gitignore syntax
import ignore from "ignore";

// Will not include anything under "ignorePaths"
const loader = new GithubRepoLoader(
  "https://github.com/langchain-ai/langchainjs",
  { recursive: false, ignorePaths: ["*.md", "yarn.lock"] }
);

const docs = await loader.load();
console.log(docs.slice(0, 3));

##> [
Document {
pageContent: "coverage:\n" +
" status:\n" +
" project:\n" +
" default:\n" +
" informational: true\n" +
" patch:\n" +
" default"... 151 more characters,
metadata: {
source: ".codecov.yml",
repository: "https://github.com/langchain-ai/langchainjs",
branch: "main"
}
},
Document {
pageContent: "# top-most EditorConfig file\n" +
"root = true\n" +
"\n" +
"# Unix-style newlines with a newline ending every file\n" +
"[*]"... 17 more characters,
metadata: {
source: ".editorconfig",
repository: "https://github.com/langchain-ai/langchainjs",
branch: "main"
}
},
Document {
pageContent: "* text=auto eol=lf",
metadata: {
source: ".gitattributes",
repository: "https://github.com/langchain-ai/langchainjs",
branch: "main"
}
}
]
Loading from a PDF:

// Peer dependency
import * as parse from "pdf-parse";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";

const loader = new PDFLoader("./data/MachineLearning-Lecture01.pdf");

const rawCS229Docs = await loader.load();
console.log(rawCS229Docs.slice(0, 5));

##>
[
Document {
pageContent: "MachineLearning-Lecture01 \n" +
"Instructor (Andrew Ng): Okay. Good morning. Welcome to CS229, the machin"... 2999 more characters,
metadata: {
source: "./data/MachineLearning-Lecture01.pdf",
pdf: {
version: "1.10.100",
info: {
PDFFormatVersion: "1.4",
IsAcroFormPresent: false,
IsXFAPresent: false,
Title: "",
Author: "",
Creator: "PScript5.dll Version 5.2.2",
Producer: "Acrobat Distiller 8.1.0 (Windows)",
CreationDate: "D:20080711112523-07'00'",
ModDate: "D:20080711112523-07'00'"
},
metadata: Metadata { _metadata: [Object: null prototype] },
totalPages: 22
},
loc: { pageNumber: 1 }
}
},
...

Splitting Documents for Contextual Clarity

To ensure that the generated content is coherent and contextually relevant within JavaScript applications, it’s crucial to split documents sensibly. LangChainJS offers different strategies for splitting data, specifically designed for JavaScript content.

For example, when dealing with JavaScript code snippets from GitHub repositories, LangChainJS allows you to split documents at code-specific delimiters, ensuring that related code remains together. This approach makes it easier for LLMs to understand and generate code snippets accurately in a JavaScript context.

In contrast, for textual content in JavaScript applications, you can use a recursive character text splitter provided by LangChainJS. This splitter intelligently splits text into paragraphs, maintaining the flow of ideas and helping LLMs generate coherent text within the JavaScript environment.

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

const splitter = RecursiveCharacterTextSplitter.fromLanguage("js", {
  chunkSize: 32,
  chunkOverlap: 0,
});
const code = `function helloWorld() {
const code = `function helloWorld() {
console.log("Hello, World!");
}
// Call the function
helloWorld();`;

await splitter.splitText(code);

##> [
"function helloWorld() {",
'console.log("Hello, World!");\n}',
"// Call the function",
"helloWorld();"
]


import { CharacterTextSplitter } from "langchain/text_splitter";

const splitter = new CharacterTextSplitter({
  chunkSize: 32,
  chunkOverlap: 0,
  separator: " "
});

await splitter.splitText(code);

##> [
"function helloWorld()",
'{\nconsole.log("Hello,',
'World!");\n}\n// Call the',
"function\nhelloWorld();"
]

const splitter = RecursiveCharacterTextSplitter.fromLanguage("js", {
  chunkSize: 64,
  chunkOverlap: 32,
});

await splitter.splitText(code);

##> [
'function helloWorld() {\nconsole.log("Hello, World!");\n}',
'console.log("Hello, World!");\n}\n// Call the function',
"}\n// Call the function\nhelloWorld();"
]

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 512,
  chunkOverlap: 64,
});
const splitDocs = await splitter.splitDocuments(rawCS229Docs);
console.log(splitDocs.slice(0, 5));

##> [
Document {
pageContent: "MachineLearning-Lecture01 \n" +
"Instructor (Andrew Ng): Okay. Good morning. Welcome to CS229, the machin"... 352 more characters,
metadata: {
source: "./data/MachineLearning-Lecture01.pdf",
pdf: {
version: "1.10.100",
info: {
PDFFormatVersion: "1.4",
IsAcroFormPresent: false,
IsXFAPresent: false,
Title: "",
Author: "",
Creator: "PScript5.dll Version 5.2.2",
Producer: "Acrobat Distiller 8.1.0 (Windows)",
CreationDate: "D:20080711112523-07'00'",
ModDate: "D:20080711112523-07'00'"
},
metadata: Metadata { _metadata: [Object: null prototype] },
totalPages: 22
},
loc: { pageNumber: 1, lines: { from: 1, to: 6 } }
}
},
...

Now, we move on to the next critical step: embedding these chunks and storing them in a vector database for efficient retrieval based on user queries.

Leveraging Text Embedding Models

To enable the retrieval of relevant chunks from your vector database, you’ll need a text embedding model. We’re using OpenAI’s hosted model, but you can easily replace it with an embedding provider of your choice. The vector database is a specialized database equipped with natural language search capabilities.

When a user submits a query, the system searches the vector database for an embedding similar to the query. This search results in a set of relevant chunks that the LLM uses to generate its final output.

To demonstrate how this works, we start with the first step: document ingestion. This process involves using the embeddings model, a specialized machine learning model, to convert document contents into vectors. We use OpenAI’s hosted embeddings for this demonstration.

Document Ingestion with LangChainJS

In LangChainJS, you’ll import the embeddings model and instantiate it. It’s essential to set up your environment variables to obtain the required authentication key for the hosted embeddings. For this demo, we use an in-memory vector store, which is not suitable for production but serves our illustrative purposes well.

The result of embedding a query is a vector, represented as an array of numbers. These numbers capture various abstract features of the embedded text, enabling efficient searching within the vector database.

import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings();

await embeddings.embedQuery("This is some sample text");

##> [
-0.010464199, 0.0023175413, -0.000743401, -0.010936564, -0.011474536,
0.02293595, -0.014735167, 0.0017451267, -0.01760872, -0.01938009,
0.0051369704, 0.03411526, -0.012274932, 0.0019403052, 0.0046383627,
0.01308845, 0.02470732, 0.0018664983, 0.0044677868, -0.006327724,
...
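
When ingesting many chunks at once, the embeddings model also exposes a batch method that returns one vector per input string. A minimal sketch using the same `embeddings` instance:

// Embed several document chunks in a single call
const chunkVectors = await embeddings.embedDocuments([
  "This is some sample text",
  "Another chunk of text to embed",
]);

console.log(chunkVectors.length); // one vector per input chunk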

Searching for Relevant Chunks

To illustrate the search process, we calculate the similarity between different embeddings using Cosine Similarity. We compare a vector generated from a query to two different vectors: one related and one unrelated. The similarity score helps us determine the relevance of the content.

Once we have prepared our documents by splitting them into smaller chunks, we initialize the vector store and add these chunks to it. This action populates a searchable vector store.

LangChainJS provides a convenient interface for searching directly with natural language queries. We use the similarity search method with a query like “What is deep learning?” and retrieve four related documents from our vector store. These documents contain content related to deep learning, machine learning, and learning algorithms.

import { similarity } from "ml-distance";

const vector1 = await embeddings.embedQuery(
  "What are vectors useful for in machine learning?"
);
const unrelatedVector = await embeddings.embedQuery(
  "A group of parrots is called a pandemonium."
);

similarity.cosine(vector1, unrelatedVector);

##> 0.6962143564715657

const similarVector = await embeddings.embedQuery(
  "Vectors are representations of information."
);

similarity.cosine(vector1, similarVector);

##> 0.8590074042429146
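
Under the hood, cosine similarity is simply the dot product of two vectors divided by the product of their magnitudes. A hand-rolled version, purely for illustration (in practice the `ml-distance` helper above is all you need):

// Illustrative cosine similarity: dot(a, b) / (|a| * |b|)
const cosineSimilarity = (a: number[], b: number[]): number => {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};

cosineSimilarity(vector1, similarVector);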

The Power of Retrievers

Now, let’s explore retrievers. Retrievers are a broader abstraction within LangChainJS that retrieve documents related to a given natural language query. Unlike vector stores, retrievers implement the invoke method and are expression language runnables. This means they can be seamlessly integrated into chains with other modules, such as LLMs, output parsers, and prompts.

We can easily instantiate a retriever from an existing vector store with a simple function call, making it a versatile tool for constructing retrieval chains.

// Peer dependency
import * as parse from "pdf-parse";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const loader = new PDFLoader("./data/MachineLearning-Lecture01.pdf");

const rawCS229Docs = await loader.load();

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 128,
  chunkOverlap: 0,
});

const splitDocs = await splitter.splitDocuments(rawCS229Docs);

const vectorstore = new MemoryVectorStore(embeddings);

await vectorstore.addDocuments(splitDocs);

const retrievedDocs = await vectorstore.similaritySearch(
  "What is deep learning?",
  4
);

const pageContents = retrievedDocs.map(doc => doc.pageContent);

pageContents;

##> [
"piece of research in machine learning, okay?",
"are using a learning algorithm, perhaps without even being aware of it.",
"some of my own excitement about machine learning to you.",
"of the class, and then we'll start to talk a bit about machine learning."
]

const retriever = vectorstore.asRetriever();
await retriever.invoke("What is deep learning?")

##> [
Document {
pageContent: "piece of research in machine learning, okay?",
metadata: {
source: "./data/MachineLearning-Lecture01.pdf",
pdf: {
version: "1.10.100",
info: {
PDFFormatVersion: "1.4",
IsAcroFormPresent: false,
IsXFAPresent: false,
Title: "",
Author: "",
Creator: "PScript5.dll Version 5.2.2",
Producer: "Acrobat Distiller 8.1.0 (Windows)",
CreationDate: "D:20080711112523-07'00'",
ModDate: "D:20080711112523-07'00'"
},
metadata: Metadata { _metadata: [Object: null prototype] },
totalPages: 22
},
loc: { pageNumber: 8, lines: { from: 2, to: 2 } }
}
},
...

Constructing the Retrieval Chain

Now, it’s time to build our retrieval chain, which will consist of several steps:

1. Document Retrieval Wrapper

We’ll create a step that handles document retrieval within a chain. While retrievers take string inputs, we prefer our chains to work with objects for flexibility. This step will accept an object with a “question” field as input and format the resulting documents’ page content as a single string.

import { RunnableSequence } from "@langchain/core/runnables";
import { Document } from "@langchain/core/documents";

const convertDocsToString = (documents: Document[]): string => {
  return documents.map((document) => {
    return `<doc>\n${document.pageContent}\n</doc>`;
  }).join("\n");
};

/*
{
  question: "What is deep learning?"
}
*/

const documentRetrievalChain = RunnableSequence.from([
  (input) => input.question,
  retriever,
  convertDocsToString
]);

const results = await documentRetrievalChain.invoke({
  question: "What are the prerequisites for this course?"
});

console.log(results);

##> <doc>
course information handout. So let me just say a few words about parts of these. On the
third page, there's a section that says Online Resources.
Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better?
Testing, testing. Okay, cool. Thanks.
</doc>
<doc>
of this class will not be very programming intensive, although we will do some
programming, mostly in either MATLAB or Octave. I'll say a bit more about that later.
I also assume familiarity with basic probability and statistics. So most undergraduate
statistics class, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna
assume all of you know what random variables are, that all of you know what expectation
is, what a variance or a random variable is. And in case of some of you, it's been a while
since you've seen some of this material. At some of the discussion sections, we'll actually
go over some of the prerequisites, sort of as a refresher course under prerequisite class.
...

2. Input and Output Coercion

To bridge the gap between retrievers’ string input and chains’ object input, we’ll use a runnable map. This map will handle the conversion of inputs and outputs, allowing us to seamlessly pass the question through the chain.

import { ChatPromptTemplate } from "@langchain/core/prompts";

const TEMPLATE_STRING = `You are an experienced researcher,
expert at interpreting and answering questions based on provided sources.
Using the provided context, answer the user's question
to the best of your ability using only the resources provided.
Be verbose!

<context>

{context}

</context>

Now, answer this question using the above context:

{question}`;

const answerGenerationPrompt = ChatPromptTemplate.fromTemplate(
  TEMPLATE_STRING
);

import { RunnableMap } from "@langchain/core/runnables";

const runnableMap = RunnableMap.from({
  context: documentRetrievalChain,
  question: (input) => input.question,
});

await runnableMap.invoke({
  question: "What are the prerequisites for this course?"
});

##> {
question: "What are the prerequisites for this course?",
context: "<doc>\n" +
"course information handout. So let me just say a few words about parts of these. On the \n" +
"third"... 3063 more characters
}

3. Augmented Generation

In this step, we’ll use a chat prompt template to generate human-readable responses. We’ll define a prompt and wrap it in a chat prompt template. To ensure the question is passed through the chain, we’ll use the runnable map we created earlier.

import { ChatOpenAI } from "@langchain/openai";
import { StringOutputParser } from "@langchain/core/output_parsers";

const model = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-1106"
});

const retrievalChain = RunnableSequence.from([
  {
    context: documentRetrievalChain,
    question: (input) => input.question,
  },
  answerGenerationPrompt,
  model,
  new StringOutputParser(),
]);

const answer = await retrievalChain.invoke({
  question: "What are the prerequisites for this course?"
});

console.log(answer);

##> The prerequisites for this course include familiarity with basic probability and statistics, as well as basic linear algebra. The instructor assumes that most undergraduate statistics and linear algebra classes would provide sufficient background knowledge. Specifically, students are expected to already have an understanding of random variables, expectation...
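
Because the assembled retrieval chain is itself a runnable, the `.stream` method we saw earlier works on it too, which is handy for showing partial answers in a UI. A short sketch:

// Stream the answer incrementally instead of waiting for the full response
const answerStream = await retrievalChain.stream({
  question: "What are the prerequisites for this course?"
});

for await (const chunk of answerStream) {
  console.log(chunk);
}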

Handling Follow-up Questions

Now that we have our retrieval chain in place, let’s see how it performs with a follow-up question. We’ll ask, “Can you list them in bullet point form?”, where “them” refers to the prerequisites of the course. However, we encounter a limitation: the response states that the provided context doesn’t contain a specific list to present in bullet point form. This happens because LLMs lack inherent memory, and without chat history context, they struggle to understand references like “them.”

To address this issue and enhance the model’s ability to handle follow-up questions effectively, you can modify the prompt to consider chat history. However, another challenge arises. The vector store needs to return relevant documents based on the reference in the follow-up query. Currently, the vector store has no knowledge of what “them” refers to, creating a gap in context.

const followupAnswer = await retrievalChain.invoke({
  question: "Can you list them in bullet point form?"
});

console.log(followupAnswer);

##> - No specific list or items were mentioned in the provided context. Therefore, I cannot list them in bullet point form.

const docs = await documentRetrievalChain.invoke({
  question: "Can you list them in bullet point form?"
});

console.log(docs);

##> <doc>
course information handout. So let me just say a few words about parts of these. On the
third page, there's a section that says Online Resources.
Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better?
Testing, testing. Okay, cool. Thanks.
</doc>
<doc>
into four major sections. We're gonna talk about four major topics in this class, the first
...

Addressing Conversation History

The problem arises when we ask follow-up questions that reference past information. For instance, if we previously asked for prerequisites and later ask, “Can you list them in bullet point form?” the model can’t understand what “them” refers to. To address this, we need to rephrase the follow-up question to make it standalone and reference-free.

Here’s a high-level overview of the solution:

1. Save Chat History: Each time we pass through the chain, we’ll store the user’s question as a human message and the LLM’s response as an AI message. Later, we’ll provide this chat history as additional context in a prompt for the LLM.

2. Rephrase the Question: We’ll use an LLM to rephrase user input that may contain references to past chat history into a standalone question. This standalone question can be understood by both the vector store and the LLM.

Let's split and load our documents into a vector store and create a retriever. Then we will convert its output to a string.

import "dote
import "dotenv/config";
import { loadAndSplitChunks } from "./lib/helpers.ts";

const splitDocs = await loadAndSplitChunks({
chunkSize: 1536,
chunkOverlap: 128
});
import { initializeVectorstoreWithDocuments } from "./lib/helpers.ts";

const vectorstore = await initializeVectorstoreWithDocuments({
documents: splitDocs,
});
const retriever = vectorstore.asRetriever();
import { RunnableSequence } from "@langchain/core/runnables";
import { Document } from "@langchain/core/documents";

const convertDocsToString = (documents: Document[]): string => {
return documents.map((document) => {
return `<doc>\n${document.pageContent}\n</doc>`
}).join("\n");
};

const documentRetrievalChain = RunnableSequence.from([
(input) => input.question,
retriever,
convertDocsToString
]);


Now that we have a retriever, let's build a retrieval chain.

import { ChatPromptTemplate } from "@langchain/core/prompts";

const TEMPLATE_STRING = `You are an experienced researcher,
expert at interpreting and answering questions based on provided sources.
Using the provided context, answer the user's question
to the best of your ability using only the resources provided.
Be verbose!

<context>

{context}

</context>

Now, answer this question using the above context:

{question}`;

const answerGenerationPrompt = ChatPromptTemplate.fromTemplate(
  TEMPLATE_STRING
);

import { ChatOpenAI } from "@langchain/openai";
import { StringOutputParser } from "@langchain/core/output_parsers";

const model = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-1106"
});

const retrievalChain = RunnableSequence.from([
  {
    context: documentRetrievalChain,
    question: (input) => input.question,
  },
  answerGenerationPrompt,
  model,
  new StringOutputParser(),
]);

Adding history

import { MessagesPlaceholder } from "@langchain/core/prompts";

const REPHRASE_QUESTION_SYSTEM_TEMPLATE =
  `Given the following conversation and a follow up question,
rephrase the follow up question to be a standalone question.`;

const rephraseQuestionChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", REPHRASE_QUESTION_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  [
    "human",
    "Rephrase the following question as a standalone question:\n{question}"
  ],
]);

const rephraseQuestionChain = RunnableSequence.from([
  rephraseQuestionChainPrompt,
  new ChatOpenAI({ temperature: 0.1, modelName: "gpt-3.5-turbo-1106" }),
  new StringOutputParser(),
]);

import { HumanMessage, AIMessage } from "@langchain/core/messages";

const originalQuestion = "What are the prerequisites for this course?";

const originalAnswer = await retrievalChain.invoke({
  question: originalQuestion
});

console.log(originalAnswer);

const chatHistory = [
  new HumanMessage(originalQuestion),
  new AIMessage(originalAnswer),
];

await rephraseQuestionChain.invoke({
  question: "Can you list them in bullet point form?",
  history: chatHistory,
});

Putting it all together

const convertDocsToString = (documents: Document[]): string => {
  return documents.map((document) => `<doc>\n${document.pageContent}\n</doc>`).join("\n");
};

const documentRetrievalChain = RunnableSequence.from([
  (input) => input.standalone_question,
  retriever,
  convertDocsToString,
]);
const ANSWER_CHAIN_SYSTEM_TEMPLATE = `You are an experienced researcher,
expert at interpreting and answering questions based on provided sources.
Using the below provided context and chat history,
answer the user's question to the best of
your ability
using only the resources provided. Be verbose!

<context>
{context}
</context>`;

const answerGenerationChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", ANSWER_CHAIN_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  [
    "human",
    "Now, answer this question using the previous context and chat history:\n{standalone_question}"
  ]
]);

import { HumanMessage, AIMessage } from "@langchain/core/messages";

await answerGenerationChainPrompt.formatMessages({
  context: "fake retrieved content",
  standalone_question: "Why is the sky blue?",
  history: [
    new HumanMessage("How are you?"),
    new AIMessage("Fine, thank you!")
  ]
});
import { RunnablePassthrough, RunnableWithMessageHistory } from "@langchain/core/runnables";
import { ChatMessageHistory } from "langchain/stores/message/in_memory";

const conversationalRetrievalChain = RunnableSequence.from([
  // Rephrase the (possibly referential) user question into a standalone one
  RunnablePassthrough.assign({
    standalone_question: rephraseQuestionChain,
  }),
  // Retrieve context relevant to the standalone question
  RunnablePassthrough.assign({
    context: documentRetrievalChain,
  }),
  answerGenerationChainPrompt,
  new ChatOpenAI({ modelName: "gpt-3.5-turbo" }),
  new StringOutputParser(),
]);

// Store chat history and inject it automatically on every invocation
const messageHistory = new ChatMessageHistory();

const finalRetrievalChain = new RunnableWithMessageHistory({
  runnable: conversationalRetrievalChain,
  getMessageHistory: (_sessionId) => messageHistory,
  historyMessagesKey: "history",
  inputMessagesKey: "question",
});

const originalQuestion = "What are the prerequisites for this course?";

const originalAnswer = await finalRetrievalChain.invoke({
  question: originalQuestion,
}, {
  configurable: { sessionId: "test" }
});

const finalResult = await finalRetrievalChain.invoke({
  question: "Can you list them in bullet point form?",
}, {
  configurable: { sessionId: "test" }
});

console.log(finalResult);

##> - Familiarity with basic probability and statistics
- Knowledge of random variables, expectation, variance
- Basic understanding of probability theory
- Recommended undergraduate statistics class such as Stat 116
- Familiarity with basic linear algebra
- Understanding of matrices, vectors, matrix multiplication
- Knowledge of matrix inversion
- Familiarity with eigenvectors (beneficial but not required)
- Recommended undergraduate linear algebra courses such as Math 51, 103, Math 113, or CS205

Conclusion

Retrieval Augmented Generation (RAG) is a powerful application of Large Language Models, especially when working with JavaScript applications using LangChainJS. By effectively loading documents from various sources and splitting them sensibly, you can harness the full potential of LLMs to generate contextually relevant and coherent text.

Note: This post specifically focuses on LangChainJS, the JavaScript version of LangChain. For information on the Python version of LangChain, please refer to our separate blog post dedicated to LangChain for Python applications here https://medium.com/alameda-dev/extract-useful-information-from-your-data-or-content-with-ai-rag-framework-a2310717b4fc

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

AlamedaDev provides full-service end-to-end software. Experts in modern software development and AI solutions.

From #barcelona

Website: www.alamedadev.com
Alameda-AI: www.alameda.dev
