Harnessing the Power of AI and Python for Conversational PDF Interaction: A Deep Dive
Introduction
In the rapidly evolving landscape of artificial intelligence and natural language processing, Streamlit has emerged as a popular Python library for building interactive web applications. This article showcases an application that combines Streamlit with several LLM-related libraries (LangChain, OpenAI, and FAISS) to create a conversational AI application for interacting with PDF documents. It examines the code in depth, highlighting best practices and how the different technologies work together.
Key Libraries and Their Roles
```python
import torch  # not called directly; only the optional local HuggingFace embeddings rely on PyTorch
import streamlit as st
from dotenv import load_dotenv
from PyPDF2 import PdfReader
import openai
from langchain.text_splitter import CharacterTextSplitter
# Older LangChain releases exposed these under langchain.embeddings:
# from langchain.embeddings import OpenAIEmbeddings, HuggingFaceInstructEmbeddings
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_community.embeddings.openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS  # requires the faiss-cpu (or faiss-gpu) package
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.chat_models import ChatOpenAI
from langchain_community.llms import HuggingFaceHub
from htmlTemplates import css, bot_template, user_template  # local module in this project
```
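- PyTorch and Streamlit: Streamlit is a framework for building interactive web applications in pure Python and provides the entire user interface here. PyTorch, a deep learning framework, is imported but not called directly; it matters only for the commented-out HuggingFace Instructor embeddings, which run locally.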
- dotenv: The python-dotenv package loads environment variables from a .env file, which keeps sensitive data such as API keys out of the source code.
- PyPDF2: A versatile PDF library in Python, used for reading PDF files and extracting text.
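- OpenAI and LangChain Community Libraries: The openai package provides access to OpenAI's models, while LangChain and langchain_community supply the text splitter, the embedding wrappers, the FAISS vector store integration, the conversation memory, and the retrieval chain used to build the conversational AI.
- HTML Templates: htmlTemplates is a local module of the project that provides a css string and the user_template and bot_template HTML snippets used to style and render chat messages.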
Core Functionalities Explained
Extracting Text from PDFs
The get_pdf_text function iterates through each page of the uploaded PDF documents, extracting and concatenating the text content.
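```python
def get_pdf_text(pdf_docs):
    """Concatenate the text of every page of every uploaded PDF."""
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)  # accepts a path or a file-like object such as an upload
        for page in pdf_reader.pages:
            # Pages without a text layer (e.g. scanned images) yield no text;
            # the newline keeps words on adjacent pages from running together.
            text += (page.extract_text() or "") + "\n"
    return text
```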
Text Chunking for Efficient Processing
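get_text_chunks uses LangChain's CharacterTextSplitter to split the extracted text at newline characters and merge the pieces into chunks of roughly 1,000 characters, with about 200 characters of overlap between consecutive chunks so that sentences straddling a boundary are not lost. Chunking is necessary because each chunk is embedded separately and only the most relevant chunks are later passed to the language model, whose context window is limited.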
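```python
def get_text_chunks(text):
    """Split the raw text into overlapping chunks suitable for embedding."""
    text_splitter = CharacterTextSplitter(
        separator="\n",       # split on line breaks
        chunk_size=1000,      # target chunk length in characters
        chunk_overlap=200,    # characters shared by consecutive chunks
        length_function=len,  # measure length in characters
    )
    chunks = text_splitter.split_text(text)
    return chunks
```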
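To make chunk_size and chunk_overlap concrete, here is a simplified sliding-window splitter in plain Python. It is a hypothetical helper (split_with_overlap), not LangChain's algorithm: CharacterTextSplitter first splits on the separator and then merges neighbouring pieces, whereas this sketch cuts at fixed character offsets. It only illustrates how each chunk repeats the tail of the previous one.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical sketch: cut text into fixed-size windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "abcdefghijklmnopqrstuvwxyz"
    for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=4):
        print(chunk)
```

With the article's settings (chunk_size=1000, chunk_overlap=200), each window in this simplified model would advance 800 characters, so every chunk shares its last 200 characters with the start of the next.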
Building a Vector Store for Text Retrieval
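The get_vectorstore function converts each text chunk into an embedding vector using OpenAI's embedding model and indexes the vectors in an in-memory FAISS vector store, which supports fast similarity search. The commented-out line shows a local alternative, the hkunlp/instructor-xl model from HuggingFace, which avoids OpenAI API calls but requires downloading and running a large model.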
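```python
def get_vectorstore(text_chunks, api_key):
    """Embed the chunks and index them in an in-memory FAISS store."""
    embeddings = OpenAIEmbeddings(openai_api_key=api_key)
    # Local alternative (no OpenAI calls, but a large model download):
    # embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vectorstore
```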
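Conceptually, a flat FAISS index answers a query by finding the stored vectors closest to the query vector, and LangChain's FAISS wrapper uses Euclidean (L2) distance by default. The sketch below reproduces that exhaustive nearest-neighbour search in plain Python. The three-dimensional vectors, the chunk labels, and the nearest_chunks helper are hypothetical stand-ins; real OpenAI embeddings have far more dimensions.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k labels whose vectors are closest to the query (brute-force search)."""
    ranked = sorted(index, key=lambda label: l2_distance(query, index[label]))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy "embeddings"; in the app they come from OpenAIEmbeddings.
    toy_index = {
        "chunk about invoices": [0.9, 0.1, 0.0],
        "chunk about payment terms": [0.8, 0.3, 0.1],
        "chunk about office hours": [0.0, 0.2, 0.9],
    }
    question_vector = [0.88, 0.15, 0.02]
    print(nearest_chunks(question_vector, toy_index, k=2))
```

FAISS performs the same exhaustive comparison for a flat index, but in optimized native code, which is what keeps retrieval fast as the number of chunks grows.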
Conversational AI Chain
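get_conversation_chain assembles the conversational AI chain from three parts: a chat model (ChatOpenAI), a retriever built from the vector store, and a ConversationBufferMemory that keeps the full chat history under the key chat_history. On each turn, ConversationalRetrievalChain uses the history to rephrase the new question as a standalone question, retrieves the most relevant chunks for it, and asks the language model to answer from those chunks.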
```python
def get_conversation_chain(vectorstore, api_key):
    """Wire an LLM, a retriever over the vector store, and chat memory together."""
    llm = ChatOpenAI(openai_api_key=api_key)
    # Hosted open-source alternative via the HuggingFace Hub (needs a HuggingFace API token):
    # llm = HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature": 0.9, "max_length": 512})
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory,
    )
    return conversation_chain
```
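The sketch below mimics, in plain Python, the two ideas the chain relies on: a buffer that keeps every human and AI message in order (the role ConversationBufferMemory plays) and a condense step that folds that history into a prompt asking the model to rewrite a follow-up question as a standalone one. The ChatBuffer class and build_condense_prompt function are hypothetical, and the prompt wording is illustrative rather than LangChain's exact default template.

```python
from dataclasses import dataclass, field


@dataclass
class ChatBuffer:
    """Hypothetical stand-in for a buffer memory: stores (role, text) pairs in order."""
    messages: list[tuple[str, str]] = field(default_factory=list)

    def add_turn(self, question: str, answer: str) -> None:
        self.messages.append(("Human", question))
        self.messages.append(("AI", answer))


def build_condense_prompt(buffer: ChatBuffer, follow_up: str) -> str:
    """Illustrative prompt asking an LLM to turn a follow-up into a standalone question."""
    history = "\n".join(f"{role}: {text}" for role, text in buffer.messages)
    return (
        "Given the conversation below, rephrase the follow-up question "
        "as a standalone question.\n\n"
        f"Chat history:\n{history}\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


if __name__ == "__main__":
    memory = ChatBuffer()
    # Hypothetical conversation, for illustration only.
    memory.add_turn("What is the notice period in the contract?", "Thirty days.")
    print(build_condense_prompt(memory, "Can it be shortened?"))
```

In the real chain, the rephrased standalone question (not the raw follow-up) is what gets sent to the retriever, which is how a vague follow-up such as "Can it be shortened?" can still match the relevant chunk.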
Input Handling Method
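handle_userinput sends the user's question to the conversation chain stored in st.session_state, saves the chat history it returns, and renders each message by substituting its text into the {{MSG}} placeholder of the user or bot HTML template. Because the history alternates between the user and the AI, even indices are user messages and odd indices are bot replies.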
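```python
def handle_userinput(user_question):
    """Send the question to the chain and render the whole conversation."""
    if st.session_state.conversation is None:
        # The chain only exists after PDFs have been processed.
        st.warning("Please upload and process your PDFs first.")
        return
    response = st.session_state.conversation({"question": user_question})
    st.session_state.chat_history = response["chat_history"]
    for i, message in enumerate(st.session_state.chat_history):
        if i % 2 == 0:  # user messages sit at even positions
            st.write(user_template.replace("{{MSG}}", message.content), unsafe_allow_html=True)
        else:  # AI replies sit at odd positions
            st.write(bot_template.replace("{{MSG}}", message.content), unsafe_allow_html=True)
```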
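Because messages are inserted into raw HTML and rendered with unsafe_allow_html=True, any markup inside a question or answer would be interpreted by the browser. One possible hardening, not part of the original code, is to escape the text with Python's html.escape before substitution. The render_messages helper and the two template strings below are hypothetical; the project's real templates live in htmlTemplates and only need to contain the {{MSG}} placeholder.

```python
import html

# Hypothetical minimal templates; the project's own templates use the same {{MSG}} placeholder.
USER_TEMPLATE = '<div class="chat-message user">{{MSG}}</div>'
BOT_TEMPLATE = '<div class="chat-message bot">{{MSG}}</div>'


def render_messages(contents: list[str]) -> list[str]:
    """Alternate user/bot templates (even index = user) and HTML-escape each message."""
    rendered = []
    for i, content in enumerate(contents):
        template = USER_TEMPLATE if i % 2 == 0 else BOT_TEMPLATE
        rendered.append(template.replace("{{MSG}}", html.escape(content)))
    return rendered


if __name__ == "__main__":
    for block in render_messages(["Is <b>bold</b> allowed?", "It is shown as plain text."]):
        print(block)
```

The trade-off is that any formatting the model returns as HTML is displayed literally instead of being rendered.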
Detailed Explanation of the Streamlit Application Implementation
Import and Environment Setup: The code begins by importing the necessary libraries, including streamlit (as st). The first call in main is load_dotenv from the python-dotenv package, which loads variables from a .env file into the environment so that secrets such as API keys can be kept out of the source code.
Streamlit Page Configuration
Page Setup: st.set_page_config is called before any other Streamlit element is rendered. It sets the browser tab title to "Chat with custom PDFs" and uses the books emoji (:books:) as the page icon, giving users a clear indication of the application's purpose.
User Interface Elements
API Key Input: A sidebar text input (st.sidebar.text_input) lets users enter their OpenAI API key. Because the widget uses type="password", the key is masked on screen, and because it is supplied at runtime, it never has to be hard-coded in the source.
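A common variation, not present in the original code, is to let the sidebar value take precedence and fall back to an OPENAI_API_KEY variable that load_dotenv may have loaded from a .env file. The resolve_api_key helper below is hypothetical; it takes the environment as a mapping so it can be tested without touching the real process environment.

```python
import os
from collections.abc import Mapping
from typing import Optional


def resolve_api_key(sidebar_value: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: prefer the typed key, else OPENAI_API_KEY from the environment."""
    key = sidebar_value.strip()  # ignore stray whitespace from copy-pasting
    if key:
        return key
    # load_dotenv() copies .env entries into os.environ, so they would show up here.
    return environ.get("OPENAI_API_KEY") or None


if __name__ == "__main__":
    print(resolve_api_key("", {"OPENAI_API_KEY": "key-from-dotenv"}))
```

In main, this would wrap the widget value, for example openai_api_key = resolve_api_key(st.sidebar.text_input("Enter your OpenAI API Key:", type="password")).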
Main Interface Components
CSS and Templates: st.write with unsafe_allow_html=True injects the custom CSS from htmlTemplates, which styles the chat messages and improves the visual appeal of the application.
Chat Interface: The main interface includes a header (st.header) that titles the section and a text input box (st.text_input) where users type their questions about the PDF documents.
Handling User Input and Document Processing
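Initial State Setup: The code checks whether the keys "conversation" and "chat_history" are missing from st.session_state and, if so, initializes them to None. This matters because Streamlit reruns the entire script on every interaction: ordinary variables are recreated each time, while st.session_state persists across reruns within a browser session. Without it, the conversation chain built when the user clicks "Process" would be lost as soon as they typed a question.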
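The effect of reruns can be simulated in plain Python. In the hypothetical sketch below, run_script plays the role of the Streamlit script and an ordinary dict stands in for st.session_state: the local variable starts over on every call, while the value stored in the shared dict survives.

```python
def run_script(session_state: dict) -> tuple[int, int]:
    """Simulate one Streamlit rerun; returns (local counter, persisted counter)."""
    local_counter = 0  # re-created from scratch on every rerun
    local_counter += 1

    if "counter" not in session_state:  # same guard pattern the app uses for its keys
        session_state["counter"] = 0
    session_state["counter"] += 1
    return local_counter, session_state["counter"]


if __name__ == "__main__":
    state: dict = {}
    for _ in range(3):
        print(run_script(state))
```

The local counter is 1 after every simulated rerun, while the persisted counter keeps increasing, which is exactly why the app stores its conversation chain in st.session_state.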
User Question Processing: When the user types a question and presses Enter, the handle_userinput function is called. It sends the query to the conversation chain, updates the conversation history, and displays the chat history using Streamlit's st.write method with the custom HTML templates.
PDF Document Upload and Processing
Document Upload Interface: In the sidebar, a file uploader (st.file_uploader) with accept_multiple_files=True lets users upload several PDF files at once.
Processing Button: A button (st.button) triggers the processing of the uploaded PDFs.
PDF Processing: When the user clicks the "Process" button, the application:
- Extracts text from the uploaded PDFs (get_pdf_text).
- Splits the text into chunks (get_text_chunks).
- Creates a vector store from these chunks (get_vectorstore).
- Initializes the conversational AI chain (get_conversation_chain).
```python
def main():
    load_dotenv()  # load variables from a .env file, if present
    st.set_page_config(page_title="Chat with custom PDFs", page_icon=":books:")
    st.write(css, unsafe_allow_html=True)  # inject the chat styling once

    # Input for OpenAI API key (masked in the UI)
    openai_api_key = st.sidebar.text_input("Enter your OpenAI API Key:", type="password")
    if openai_api_key:
        openai.api_key = openai_api_key  # module-level key for any direct openai calls

    # Persist the chain and history across Streamlit reruns
    if "conversation" not in st.session_state:
        st.session_state.conversation = None
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = None

    st.header("Chat with custom PDFs :books:")
    user_question = st.text_input("Ask a question about your documents:")
    if user_question:
        handle_userinput(user_question)

    with st.sidebar:
        st.subheader("Your documents")
        pdf_docs = st.file_uploader(
            "Upload PDFs here and click 'Process'", accept_multiple_files=True
        )
        if st.button("Process"):
            if not openai_api_key:
                st.warning("Please enter your OpenAI API key first.")
            elif not pdf_docs:
                st.warning("Please upload at least one PDF.")
            else:
                with st.spinner("Processing"):
                    # Get pdf text
                    raw_text = get_pdf_text(pdf_docs)
                    # Get the text chunks
                    text_chunks = get_text_chunks(raw_text)
                    # Create the vector store
                    vectorstore = get_vectorstore(text_chunks, openai_api_key)
                    # Create conversation chain
                    st.session_state.conversation = get_conversation_chain(
                        vectorstore, openai_api_key
                    )


if __name__ == "__main__":
    main()
```
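The processing steps described in the "PDF Processing" list can also be factored into one function that validates its inputs before doing any work. The process_documents function below is hypothetical: it receives the four step functions as parameters, so the sketch can be exercised with simple stand-ins instead of PyPDF2, FAISS, and the OpenAI API. It adds one check the app does not make, rejecting input that produced no text chunks.

```python
from typing import Any, Callable, Sequence


def process_documents(
    pdf_docs: Sequence[Any],
    api_key: str,
    extract: Callable[[Sequence[Any]], str],
    chunk: Callable[[str], list[str]],
    index: Callable[[list[str], str], Any],
    build_chain: Callable[[Any, str], Any],
) -> Any:
    """Hypothetical pipeline: extract -> chunk -> index -> build_chain, failing early."""
    if not api_key:
        raise ValueError("Please enter an OpenAI API key first.")
    if not pdf_docs:
        raise ValueError("Please upload at least one PDF.")
    text = extract(pdf_docs)
    chunks = chunk(text)
    if not chunks:  # e.g. scanned PDFs with no text layer
        raise ValueError("No extractable text was found in the uploaded PDFs.")
    vectorstore = index(chunks, api_key)
    return build_chain(vectorstore, api_key)
```

In the app, the four parameters would be get_pdf_text, get_text_chunks, get_vectorstore, and get_conversation_chain, and a caught ValueError could be shown with st.warning. The empty-chunk check matters because FAISS.from_texts cannot build an index from an empty list of texts.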
Conclusion
This application demonstrates a practical integration of Large Language Models: in a relatively small amount of Python, Streamlit, LangChain, OpenAI, and FAISS combine into an interactive chat interface for PDF documents. It shows how developers can leverage these tools to build intelligent applications, and as AI continues to advance, implementations like this will become increasingly important for bridging the gap between complex data and user-friendly interfaces.
https://github.com/enendufrankc/llm-project-1/blob/master/main.py