ChatGPT Integration 101: Custom Data Queries in Your AI App — Part 2
In part 1, we talked about the use cases of LLMs, created the basic structure of our project, and laid out the blueprint of our main.py file.
In this part, we will use LangChain to play around with simple use cases such as getting answers to a question from a PDF document.
To get started, we need to install LangChain. Activate the virtual environment and then install it.
pip install langchain
We will be writing the code for the LLM functionality in llm_processing.py.
For this demo, since we will be interacting with OpenAI models, we need a third-party wrapper library called langchain-openai to use ChatGPT via LangChain. Install langchain-openai:
pip install langchain-openai
To interact with the OpenAI API, we require an API key, which can be generated from here. Once your API key has been created, we can start building the LLM part of our app.
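Hardcoding the key is fine for a quick demo, but it is safer to read it from an environment variable; langchain-openai will also pick up OPENAI_API_KEY from the environment automatically if no key is passed to ChatOpenAI. A minimal sketch:

```python
import os

def get_api_key() -> str:
    """Fetch the OpenAI API key from the environment instead of hardcoding it."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")
    return key
```

With the variable set, `ChatOpenAI(openai_api_key=get_api_key())` and a bare `ChatOpenAI()` behave the same way.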
In llm_processing.py:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(openai_api_key="your_key")  # add your API key here
llm.invoke("How many carbon emissions were emitted in 2022?")
A few pointers:
- ChatOpenAI refers to the ChatGPT model.
- The invoke method is used to ask the model a question.
- Currently, the model will answer incorrectly, since it has been trained on data only up to January 2022. To get an accurate answer, we will need to feed it a custom data source.
We have two options to get around this issue.
- We can provide the text which might contain the answer directly to the model.
- We can use embeddings to find the most similar match of a text and use it to answer the question.
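Before diving into option 1, the core idea behind option 2 can be sketched without any API calls: represent the question and each candidate passage as a vector, then pick the passage whose vector is most similar to the question's (highest cosine similarity). Below is a toy version that uses word-count vectors in place of real embeddings; a real app would call an embedding model (e.g. OpenAIEmbeddings) and a vector store instead.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector.
    A real app would call an embedding model instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_match(question: str, passages: list[str]) -> str:
    """Return the passage most similar to the question."""
    q = embed(question)
    return max(passages, key=lambda p: cosine(q, embed(p)))

passages = [
    "37.15 billion metric tons CO2e were emitted in 2022",
    "The Eiffel Tower is 330 metres tall",
]
print(best_match("How many carbon emissions were emitted in 2022?", passages))
# → 37.15 billion metric tons CO2e were emitted in 2022
```

The selected passage is then handed to the LLM as context, exactly as in option 1.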
Let’s explore both options one by one.
Option 1: Provide the text directly to the model:
In llm_processing.py:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.documents import Document

llm = ChatOpenAI(openai_api_key="your_key")  # add your API key here

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)

document_chain.invoke({
    "input": "How many carbon emissions were emitted in 2022?",
    "context": [Document(page_content="37.15 billion metric tons CO2e were emitted in 2022")]
})
A couple of pointers here:
- prompt is used as a template for passing into the LLM.
- context and input are variables that can be passed into the template when the document_chain.invoke() method is called.
- create_stuff_documents_chain helps in using the prompt when asking the LLM a question.
- The document_chain.invoke() method sends a request to the OpenAI API to get a response based on the question asked and the context provided.
- context is the information the model needs to accurately answer questions that are outside its training data or specific to a use case.
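"Stuffing" here just means concatenating the documents' text into the {context} slot of the prompt before it reaches the model. A rough, stdlib-only imitation of that formatting step (the real chain also handles message conversion and output parsing):

```python
TEMPLATE = """Answer the following question based only on the provided context:
<context>
{context}
</context>
Question: {input}"""

def stuff_prompt(template: str, docs: list[str], question: str) -> str:
    """Mimic the 'stuff' step: join all document texts into {context}."""
    return template.format(context="\n\n".join(docs), input=question)

filled = stuff_prompt(
    TEMPLATE,
    ["37.15 billion metric tons CO2e were emitted in 2022"],
    "How many carbon emissions were emitted in 2022?",
)
print(filled)
```

Printing `filled` shows the exact text the model receives: the question plus the supporting passage wrapped in the <context> tags.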
The API will now give the correct response, since we have provided the information needed to answer the question.
In the next part, we will explore how we can use an entire PDF instead of spoon-feeding the answer to the LLM.