RAG and Few-Shot Prompting in LangChain: Implementation

Shivam Sharma
Published in The Deep Hub
8 min read · Apr 3, 2024

LangChain is an innovative open-source orchestration framework for developing applications that harness the power of Large Language Models (LLMs). LangChain's core mission is to shift control from the LLM to structured prompting, producing more predictable and tailored interactions. LangChain adopts a modular approach that significantly minimizes code rewriting: developers can seamlessly reuse modules across different LLMs, improving reusability and scalability. LangChain also does much of the heavy lifting by providing LangChain Templates, deployable reference architectures for a wide variety of tasks such as a RAG chatbot or an OpenAI Functions agent. Finally, LangChain provides tools and resources for the entire application lifecycle (development, productionization, and deployment).

At the heart of LangChain's functionality lies the LangChain Expression Language (LCEL), which, simply put, can be expressed as "prompt + LLM". In this article, I delve into a practical demonstration of LangChain's capabilities: we will build a conversational chatbot using Retrieval Augmented Generation (RAG) and showcase the efficacy of few-shot prompting techniques.
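To make the "prompt + LLM" idea concrete, here is a minimal, hypothetical LCEL sketch (it assumes langchain-openai is installed and an OpenAI API key is available in the environment); the full RAG chain we build later follows the same pattern.

# Minimal LCEL sketch: a prompt piped into an LLM, then into an output parser.
# Assumes the OPENAI_API_KEY environment variable is set.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Summarize this clause in one sentence: {clause}")
llm = ChatOpenAI(model_name="gpt-4")
chain = prompt | llm | StrOutputParser()  # "prompt + LLM" expressed in LCEL

print(chain.invoke({"clause": "The agreement renews automatically every 12 months."}))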

Retrieval Augmented Generation (RAG)

Traditionally, the output of an LLM has relied only on the prompt and the training data inherent to the model. However, this approach poses limitations, particularly when dealing with large documents that exceed the model's token length constraints. To address this challenge, the Retrieval Augmented Generation (RAG) methodology enriches LLMs with external data sources.

Consider a scenario where a comprehensive 10-page PDF document contains contextual information that exceeds the token length limit. To leverage this data effectively, we break the document down into smaller text chunks and convert them into embeddings, i.e., vector representations of the text. These embeddings are then stored in a vector database. When a prompt is sent to the LLM, the application first queries this vector database to retrieve relevant contextual data, which is then appended to the prompt. This additional context equips the LLM with a deeper understanding, enabling it to generate more relevant and coherent outputs.

Retrieval Augmented Generation (RAG)

Now, let's delve into the implementation of RAG within the LangChain framework. In this example, we'll develop a chatbot tailored for negotiating Software as a Service (SaaS) agreements, a task that often involves lengthy documentation filled with policies and intricacies. To handle such complexities, we'll combine RAG with GPT-4, a powerful LLM capable of handling intricate negotiations.

Let's start by installing the dependencies in a Jupyter notebook.

!pip install langchain-openai
!pip install langchain
!pip install langchain_community
!pip install langchain_core
!pip install gradio
!pip install unstructured
!pip install unstructured[pdf]
!pip install chromadb
!pip install numpy==1.24.1

Import the libraries.
Additional: how the LangChain packages are distributed:

  • langchain-core: Base abstractions and the LangChain Expression Language.
  • langchain-community: Third-party integrations.
  • langchain: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.

import os
from google.colab import userdata
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.schema.runnable import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import FewShotChatMessagePromptTemplate
from operator import itemgetter

I keep my OpenAI API key in Colab secrets for safekeeping. We will extract the key, which will facilitate calls to "gpt-4". We then initialize the ChatOpenAI object to use OpenAI models with the API key. Although it offers multiple model choices, I have used "gpt-4" as the LLM for this use case.

# OpenAI API key from Colab secrets
OPENAI_API_KEY = userdata.get('openai_api')

# Initializing the LLM
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model_name="gpt-4",
)

Now we will load the document on which we are going to perform RAG. LangChain provides "UnstructuredFileLoader" to load unstructured files, supporting many file types such as .docx, .pdf, and .txt.

print("Loading Data")
loader = UnstructuredFileLoader('/content/SaaS Agreement 2.txt')
raw_doc = loader.load()

Once the document is loaded, we segment it into smaller chunks, here of size 300 characters. Embedding the text follows this segmentation, with the flexibility to choose embeddings tailored to the use case; in this instance, we use OpenAI embeddings. Chroma, our vector database, handles the indexing and storage of these embeddings.

# Splitting the document into chunks

text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0)
texts = text_splitter.split_documents(raw_doc)

# Document Embedding with Chromadb

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
docsearch = Chroma.from_documents(texts, embeddings)

By creating a connection to query the Chroma index through a retriever, we enable contextual information to be appended to prompts for more accurate results. The retriever uses search methods provided by the vector store, such as similarity search or MMR, to query the texts it holds. It also allows customization of search_kwargs such as score_threshold and k (which we've set to 4) for greater retrieval precision. We've also written a function to concatenate the retriever's outputs for streamlined processing.


# Connection to query the Chroma index using a retriever
retriever = docsearch.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 4}
)

# Function to join all docs returned by the retriever
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
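For illustration only, alternative retriever configurations (hypothetical parameter values, not used in this walkthrough) might look like this:

# Hypothetical alternatives: MMR search, or a similarity score threshold
mmr_retriever = docsearch.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 4, 'fetch_k': 20}
)

threshold_retriever = docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': 0.5, 'k': 4}
)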

Next, we’ll validate the retriever’s functionality by examining its output.

docs = retriever.get_relevant_documents("tell me about subscription term")
for d in docs:
    print(d.page_content)

Output

"""
Acumatica provides a subscription service to which you intend to subscribe. This Agreement

sets forth the terms pursuant to which you will be permitted access to the Service, and becomes

effective upon the earlier of your first use of the Service or execution of the Acumatica Order
“Subscriber Input” means suggestions, enhancement requests, recommendations or other

feedback provided by you, your Users and Authorized Parties relating to the operation or

functionality of the Service.

“Subscription Service Fees” means all amounts invoiced and payable by you for the Service.
(iii) as an institution of higher education, for use by staff and/or students, withoutSubscription

Service Fees, or (iv) when otherwise no Subscription Service Fees are charged for use of the

Service, then, subject to your compliance with the terms and conditions of this Agreement,
Schedule 1) for three (3) consecutive months. Upon any termination by you pursuant to this

section, Acumatica shall refund to you any prepaid Subscription Service Fees for the affected

Service that were to be provided after the effective date of termination."""

Few-Shot Prompting

Few-shot prompting is a simple yet powerful technique for steering LLMs toward specific tasks. The idea is to collect or craft examples of the desired input-output behavior and include them in the prompt so the LLM can mimic the pattern. Few-shot prompting is most effective when the examples are concise and specific. In the code, we compile a list of dictionaries containing the desired input-output pairs and, using ChatPromptTemplate and FewShotChatMessagePromptTemplate, construct the few-shot prompt, which can be seen in the output below.

few_shot_examples = [
{"input":"Could you please clarify the terms outlined in section 3.2 of the contract?",
"output":"Certainly, I will provide clarification on the terms in section 3.2."},
{"input":"We are interested in extending the payment deadline to 30 days instead of the current 15 days. Additionally, we would like to add a clause regarding late payment penalties.",
"output":"Our request is to extend the payment deadline to 30 days and include a clause on late payment penalties."},
{"input":"""The current indemnification clause seems too broad. We would like to narrow it down to cover only direct damages and exclude consequential damages.
Additionally, we propose including a dispute resolution clause specifying arbitration as the preferred method of resolving disputes.""",
"output":"""We suggest revising the indemnification clause to limit it to covering direct damages and excluding consequential damages.
Furthermore, we recommend adding a dispute resolution clause that specifies arbitration as the preferred method of resolving disputes."""},
{"input":"I believe the proposed changes are acceptable.",
"output":"Thank you for your feedback. I will proceed with implementing the proposed changes."}
]

few_shot_template = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}")
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=few_shot_template,
    examples=few_shot_examples,
)

print(few_shot_prompt.format())

OUTPUT

# Human: 
# AI: Thank you for uploading the document. I am analyzing it now….
# Human:
# AI: We would like to remove the auto-renewal for the Subscription term and just have a 1-year term. We also would like broader termination rights (e.g., due to SLA failures, privacy issues, and due to a decrease in service features or functionality). Can you also include the option to terminate the agreement for any reason as well as change the governing law to California?
# Human: The auto-renewal is there so we can create a long-standing business relationship and have reduced churn.
# We would prefer to leave it in. We cannot give you the option to terminate for any reason and we would
# like the governing law to be Texas because that is where we are headquartered. We can give you broader
# termination rights for our breach, SLA failures, privacy issues, and due to a decrease in service features or functionality.
# AI: Can we have a 60-day notice before the end of each term to remind us of the auto-renewal?
# We would like to add broader termination rights. We can agree to the governing law being Texas.
# Human: I can agree with those changes.
# AI: Great! Please make the changes to the contract and upload the new contract to this portal for me to review.

We have created our few-shot prompt and stored the embeddings of the uploaded document in the vector database. Now we are going to use the LangChain Expression Language (LCEL) to create a chain that acts like a pipeline. To understand this, you should know about the pipe "|" operator in LCEL, which takes the output of the component on its left and feeds it into the component on its right.
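As a toy illustration of the pipe operator (a hypothetical example, separate from our actual chain), two simple runnables can be composed like this:

from langchain_core.runnables import RunnableLambda

# Toy example of "|": the output of the left runnable feeds the right one
to_upper = RunnableLambda(lambda text: text.upper())
add_exclaim = RunnableLambda(lambda text: text + "!")

toy_chain = to_upper | add_exclaim
print(toy_chain.invoke("net 60 payment terms"))   # NET 60 PAYMENT TERMS!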

There are two important parts to creating a chain: first, creating the prompt, and second, creating the chain itself. The prompt includes a system prompt that instructs the LLM on its role, the few-shot prompt we developed earlier, the user question provided at runtime, and the context supplied by the retriever at runtime. As you can see, the prompt keeps placeholders for the runtime variables: the user question and the context.

The final step is creating the chain. First, we build a dictionary of context and question. As you can see, "context" passes the user's question to the retriever, whose output is fed to the format_docs function to be joined. Then, the context and question are fed into our main prompt, which adds further context via the few-shot prompt and the system role. The main prompt is then fed to the LLM, and finally, the LLM output is fed to StrOutputParser() to parse the output.

# Creating the prompt for the LCEL chain
negotiate_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a contract negotiation expert. Do your best to negotiate toward the target term and start with lower term then slowly increase with the supplier. Reply up to three sentences as it is for chat purpose."),
    few_shot_prompt,
    ("user", "{question}"),
    ("user", "{context}")
])

# LangChain Expression Language to call our LLM using the prompt template above
# RAG chain
negotiate_chain = (
    {"context": itemgetter("question") | retriever | format_docs,
     "question": itemgetter("question")}
    | negotiate_prompt
    | llm
    | StrOutputParser()
)

With the chain primed, we proceed to execute it, observing its functionality in action.

negotiate_chain.invoke({"question":"but I would like to add the term that SailPoint will be given the opportunity to cure the faulty or failing SaaS Services within 30 days of notice to SailPoint of the problem.  Also, we can do net 60."})

And here is the output

# We can agree to give SailPoint the opportunity to cure the faulty or 
# failing SaaS Services within 30 days of notice. However, we would like to
# add that if the problem is not resolved within the specified timeframe,
# we reserve the right to terminate the agreement. Regarding the payment
# terms, we can do net 60, but we need to ensure that late payments are
# subject to a maximum interest charge of 1.5% per month.
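
One thing worth noting: we imported MessagesPlaceholder but did not use it above. As a rough, hypothetical sketch (an extension, not part of the chain we built), chat history could be threaded into the prompt so the negotiation stays multi-turn:

from langchain_core.messages import HumanMessage, AIMessage

# Hypothetical extension: keep chat history in the prompt across turns
history_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a contract negotiation expert."),
    few_shot_prompt,
    MessagesPlaceholder(variable_name="history"),
    ("user", "{question}"),
    ("user", "{context}")
])

chat_history = [
    HumanMessage(content="We would like net 60 payment terms."),
    AIMessage(content="We can consider net 60 if late payments accrue 1.5% monthly interest."),
]

history_chain = (
    {"context": itemgetter("question") | retriever | format_docs,
     "question": itemgetter("question"),
     "history": itemgetter("history")}
    | history_prompt
    | llm
    | StrOutputParser()
)

history_chain.invoke({"question": "Can we also cap total liability at 12 months of fees?",
                      "history": chat_history})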

I trust this walkthrough has provided valuable insights. If you found it beneficial, please consider showing your support by clapping. Your encouragement fuels my motivation to produce more content.

You can follow me on LinkedIn: https://www.linkedin.com/in/shivamsharma00/
