Harnessing the Power of AI and Python for Conversational PDF Interaction: A Deep Dive

Jan 11, 2024

Introduction

In the rapidly evolving landscape of artificial intelligence and natural language processing, Streamlit has emerged as a popular Python library for building interactive applications. This article walks through an application that combines Streamlit with several LLM libraries to create a conversational AI for interacting with PDF documents. It examines the code in depth, highlighting good practices and how the different technologies fit together.

First Implementation of an LLM

Key Libraries and Their Roles

import torch
import streamlit as st
from dotenv import load_dotenv
from PyPDF2 import PdfReader
import openai
from langchain.text_splitter import CharacterTextSplitter
# from langchain.embeddings import OpenAIEmbeddings, HuggingFaceInstructEmbeddings
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_community.embeddings.openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.chat_models import ChatOpenAI
from langchain_community.llms import HuggingFaceHub
from htmlTemplates import css, bot_template, user_template

PyTorch and Streamlit: PyTorch, a leading deep learning framework, is complemented by Streamlit, a powerful tool for building interactive web applications.
dotenv: A utility for managing environment variables, crucial for handling sensitive data like API keys.
PyPDF2: A versatile PDF library in Python, used for reading PDF files and extracting text.
OpenAI and LangChain Community Libraries: These libraries are instrumental in leveraging OpenAI’s language models and constructing conversational AI chains.
HTML Templates: For enhancing the UI with pre-defined CSS and HTML templates.

Core Functionalities Explained

Extracting Text from PDFs

The get_pdf_text function is a critical component. It iterates through each page of the provided PDF documents, extracting and concatenating the text content.

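def get_pdf_text(pdf_docs):
    """Concatenate the text of every page of every uploaded PDF."""
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)

        for page in pdf_reader.pages:
            # Guard against pages that yield no text (e.g. scanned images)
            text += page.extract_text() or ""

    # Return after the outer loop so every uploaded PDF is included
    return text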
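
The accumulation pattern can be tried without a PDF on disk. The sketch below is only an illustration: FakePage and FakeReader are hypothetical stand-ins for PyPDF2's page and PdfReader objects, and collect_text is an assumed helper name, not part of the project. It joins pages with a newline (useful because the splitter later separates on "\n") and guards against a page that yields no text.

```python
class FakePage:
    """Hypothetical stand-in for a PyPDF2 page object."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


class FakeReader:
    """Hypothetical stand-in for PdfReader: exposes a .pages list."""

    def __init__(self, texts):
        self.pages = [FakePage(t) for t in texts]


def collect_text(readers):
    """Join the text of every page of every reader, one page per line."""
    parts = []
    for reader in readers:
        for page in reader.pages:
            # Treat a page with no extractable text as an empty string.
            parts.append(page.extract_text() or "")
    return "\n".join(parts)


# Two "documents": the second page of the first one has no text.
readers = [FakeReader(["page one", None]), FakeReader(["page three"])]
combined = collect_text(readers)
print(repr(combined))
```

Collecting the pieces in a list and joining once also avoids repeated string concatenation, which can matter for long documents.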

Text Chunking for Efficient Processing

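get_text_chunks uses CharacterTextSplitter from LangChain to split the extracted text on newlines into chunks of roughly 1,000 characters, with a 200-character overlap between neighboring chunks. Smaller chunks keep each embedding request manageable, and the overlap reduces the chance that a passage is cut in half at a chunk boundary.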

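def get_text_chunks(text):
    text_splitter = CharacterTextSplitter(
        separator="\n",       # split on line breaks
        chunk_size=1000,      # target size of each chunk, in characters
        chunk_overlap=200,    # characters shared between neighboring chunks
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks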
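
To see what chunk_size and chunk_overlap mean in practice, here is a deliberately simplified sliding-window sketch. It is not how CharacterTextSplitter works internally (that class splits on the separator and merges the pieces); sliding_chunks is a hypothetical helper that only illustrates the overlap idea using plain string slicing.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
```

Each window starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next.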

Building a Vector Store for Text Retrieval

The get_vectorstore function illustrates an advanced use of embeddings and vector storage. It generates embeddings for text chunks using OpenAI's models and stores them in a FAISS vector store for rapid retrieval.

def get_vectorstore(text_chunks, api_key):
    embeddings = OpenAIEmbeddings(openai_api_key=api_key)
    # embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vectorstore
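
Conceptually, a vector store turns each chunk into a vector and returns the chunks whose vectors lie closest to the query vector. The toy sketch below illustrates that idea with the standard library only. It rests on loud assumptions: embed is a word-count "embedding" rather than an OpenAI model, the search is a brute-force cosine ranking rather than FAISS, and the sample chunks are invented.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (NOT an OpenAI embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Invoices are due within thirty days of receipt.",
    "The warranty covers parts and labor for one year.",
    "Support is available by email on weekdays.",
]
best = top_k("how long is the warranty", chunks)
print(best)
```

Real embeddings capture meaning rather than exact word overlap, which is why the application relies on a trained model instead of counts like these.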

Conversational AI Chain

get_conversation_chain assembles the conversational AI chain. It integrates a language model, a retriever mechanism (the vector store), and a memory buffer for dynamic conversation flow.

def get_conversation_chain(vectorstore, api_key):
    llm = ChatOpenAI(openai_api_key=api_key)
    # llm = HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature":0.9, "max_length":512})
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory
    )
    return conversation_chain
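
The division of labor between retriever, model, and memory can be sketched without any external service. The class below is a hypothetical illustration, not LangChain's implementation: ToyConversation, fake_retrieve, and fake_answer are invented names, and the "model" is a plain function. It shows only the flow: retrieve context, answer with the history so far, then append both messages to the buffer.

```python
class ToyConversation:
    """Hypothetical stand-in for a retrieval chain with buffer memory."""

    def __init__(self, retrieve, answer):
        self.retrieve = retrieve   # callable: question -> context string
        self.answer = answer       # callable: (question, context, history) -> reply
        self.chat_history = []     # alternating user / bot messages

    def ask(self, question):
        context = self.retrieve(question)
        # The model receives a copy of the history plus the retrieved context.
        reply = self.answer(question, context, list(self.chat_history))
        self.chat_history.extend([question, reply])
        return reply


def fake_retrieve(question):
    # Stand-in for a vector store lookup.
    return "warranty: one year"


def fake_answer(question, context, history):
    # Stand-in for a language model call.
    return f"(turn {len(history) // 2 + 1}) based on '{context}'"


chat = ToyConversation(fake_retrieve, fake_answer)
first = chat.ask("How long is the warranty?")
second = chat.ask("Does it cover labor?")
print(second)
```

Because the buffer stores messages in strict user/bot alternation, a renderer can tell the two apart by position alone, which is what the i % 2 check in the next section relies on.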

Input Handling Method

handle_userinput demonstrates how to process user input, manage conversation history, and render the chat interface using Streamlit and custom HTML templates.

def handle_userinput(user_question):
    response = st.session_state.conversation({'question': user_question})
    st.session_state.chat_history = response['chat_history']

    for i, message in enumerate(st.session_state.chat_history):
        if i % 2 == 0:
            st.write(user_template.replace("{{MSG}}", message.content), unsafe_allow_html=True)
        else:
            st.write(bot_template.replace("{{MSG}}", message.content), unsafe_allow_html=True)
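
Because the messages are written with unsafe_allow_html=True, any markup inside a question or answer is rendered as HTML. One possible precaution is to escape the message text before substituting it into the template. The sketch below assumes two simplified templates (the real ones live in htmlTemplates.py and are not shown in this article) and a hypothetical render_history helper that works on plain strings.

```python
import html

# Hypothetical, simplified templates; the project's real ones are in htmlTemplates.py.
user_template = '<div class="chat-message user">{{MSG}}</div>'
bot_template = '<div class="chat-message bot">{{MSG}}</div>'


def render_history(messages):
    """Alternate user/bot templates and escape each message before insertion."""
    rendered = []
    for i, content in enumerate(messages):
        template = user_template if i % 2 == 0 else bot_template
        rendered.append(template.replace("{{MSG}}", html.escape(content)))
    return rendered


rows = render_history(["What is <b>this</b>?", "A PDF about cats & dogs."])
for row in rows:
    print(row)
```

In the application, each rendered string would be passed to st.write with unsafe_allow_html=True, exactly as the original loop does.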

Detailed Explanation of the Streamlit Application Implementation

Import and Environment Setup: The code begins by importing necessary libraries including streamlit (as st). Additionally, load_dotenv from the dotenv package is used to manage environment variables, particularly for securely handling API keys.

Streamlit Page Configuration

Page Setup: st.set_page_config is called to configure the page. This function sets the page's title to "Chat with custom PDFs" and assigns the :books: emoji as the page icon, so the browser tab clearly states the application's purpose.

User Interface Elements

API Key Input: A sidebar text input (st.sidebar.text_input) is created for users to enter their OpenAI API key. Because the field uses type="password", the key is masked on screen, and it never has to be hard-coded in the source.
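
Since the application also calls load_dotenv, which loads values from a .env file into the process environment, one reasonable design is to prefer the key typed in the sidebar and fall back to an environment variable. The helper below is a hypothetical sketch: resolve_api_key is not part of the project, and the variable name OPENAI_API_KEY is an assumed convention.

```python
import os


def resolve_api_key(sidebar_value, env_var="OPENAI_API_KEY"):
    """Prefer the key typed by the user; otherwise fall back to the environment."""
    typed = (sidebar_value or "").strip()
    if typed:
        return typed
    # os.environ is where load_dotenv places values from a .env file.
    from_env = os.environ.get(env_var, "").strip()
    return from_env or None


print(resolve_api_key("  typed-key  "))
```

Returning None when no key is available lets the caller show a warning instead of sending an empty key to the API.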

Main Interface Components

CSS and Templates: The st.write method is used to inject custom CSS for styling the application. This enhances the visual appeal and user experience.

Chat Interface: The main interface includes a header (st.header) to title the section. A text input box (st.text_input) is provided for users to type their queries about the PDF documents.

Handling User Input and Document Processing

Initial State Setup: The code checks if certain keys ("conversation" and "chat_history") are not in st.session_state. If absent, they are initialized. This step is crucial for maintaining state in a Streamlit app, especially for keeping track of user interactions over time.
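
Streamlit reruns the script from top to bottom on every interaction, which is why the initialization must not overwrite existing values. The pattern can be sketched with a plain dict standing in for st.session_state (an assumption made so the snippet runs outside Streamlit); init_state is a hypothetical helper name.

```python
def init_state(state):
    """Create the expected keys only if they are missing."""
    for key in ("conversation", "chat_history"):
        if key not in state:
            state[key] = None
    return state


# First run: both keys are created with a None placeholder.
state = init_state({})
print(state)

# Later rerun: an existing conversation object must survive untouched.
state["conversation"] = "existing chain"
state = init_state(state)
print(state["conversation"])
```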

User Question Processing: When a user enters a question and submits it, the handle_userinput function is called. This function processes the user's query, updates the conversation history, and displays the chat history using Streamlit’s write method with custom HTML templates.

PDF Document Upload and Processing

Document Upload Interface: In the sidebar, a file uploader (st.file_uploader) is implemented, allowing users to upload multiple PDF files.

Processing Button: A button (st.button) is provided to trigger the processing of the uploaded PDFs.

PDF Processing: Upon clicking the ‘Process’ button, the application:

Extracts text from the uploaded PDFs using get_pdf_text.
Chunks the text for processing (get_text_chunks).
Creates a vector store from these chunks (get_vectorstore).
Initializes the conversational AI chain (get_conversation_chain).

def main():
    load_dotenv()
    st.set_page_config(page_title="Chat with custom PDFs",
                       page_icon=":books:")
    
    st.write(css, unsafe_allow_html=True)

    # Input for OpenAI API key
    openai_api_key = st.sidebar.text_input("Enter your OpenAI API Key:", type="password")

    if openai_api_key:
        openai.api_key = openai_api_key  # Set the OpenAI API key

    if "conversation" not in st.session_state:
        st.session_state.conversation = None
    
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = None

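    st.header("Chat with custom PDFs :books:")
    user_question = st.text_input("Ask a question about your documents:")
    if user_question:
        if st.session_state.conversation is None:
            # The chain only exists after PDFs have been processed
            st.warning("Upload and process your PDFs first.")
        else:
            handle_userinput(user_question)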


    with st.sidebar:
        st.subheader("Your documents")
        pdf_docs = st.file_uploader(
            "Upload PDFs here and click 'Process'", accept_multiple_files=True)

        if st.button("Process"):
            if not openai_api_key or not pdf_docs:
                # Nothing to embed without both a key and at least one file
                st.warning("Enter an API key and upload at least one PDF.")
            else:
                with st.spinner("Processing"):
                    # Get pdf text
                    raw_text = get_pdf_text(pdf_docs)

                    # Get the text chunks
                    text_chunks = get_text_chunks(raw_text)

                    # Create the vector store
                    vectorstore = get_vectorstore(text_chunks, openai_api_key)

                    # Create the conversation chain
                    st.session_state.conversation = get_conversation_chain(vectorstore, openai_api_key)


if __name__ == "__main__":
    main()

Conclusion

This article has walked through an application that integrates a Large Language Model with document retrieval and a web interface. It demonstrates how developers can combine these tools to create interactive, intelligent applications. As AI continues to advance, such implementations will become increasingly valuable in bridging the gap between complex data and user-friendly interfaces.

https://github.com/enendufrankc/llm-project-1/blob/master/main.py