Unlocking the Power of Intelligence: Building an Application with Gemini, Python, and FAISS for PDF-Driven Multi-Input Processing

Akash.
5 min read · Jan 19, 2024


In this blog, I delve into the world of advanced AI integration, showcasing a step-by-step guide on developing an application that efficiently processes multiple inputs from PDF files. Leveraging the cutting-edge Gemini AI model from Google and the versatility of Python, we explore the seamless integration of these technologies to transform data from PDFs into a robust vector database.

Discover how our application harnesses the capabilities of FAISS (Facebook AI Similarity Search) to organize and store vectors, enabling lightning-fast search and retrieval of information. Learn about the intricacies of building a powerful question-answering system that taps into the stored vectors, providing accurate and context-aware responses.

Whether you’re a seasoned developer or just stepping into the world of AI applications, this blog offers valuable insights, code snippets, and practical tips to guide you through the process. Elevate your application’s intelligence quotient and stay at the forefront of innovation with this exploration into the fusion of Gemini, Python, and FAISS.

Step 1: Get an API key from https://makersuite.google.com/app/apikey

Step 2: Install the requirements:

pip3 install streamlit google-generativeai python-dotenv langchain PyPDF2 chromadb faiss-cpu langchain_google_genai langchain-community

About the packages used:

1. streamlit: Streamlit is a Python library designed for creating web applications with minimal effort. It simplifies the process of turning data scripts into shareable web apps through an intuitive, easy-to-use interface. With Streamlit, developers can quickly build interactive and visually appealing applications for data analysis, visualization, and machine learning, making it an excellent tool for prototyping and deploying projects quickly.

2. google-generativeai: The Google AI Python SDK enables developers to use Google's state-of-the-art generative AI models (like Gemini and PaLM) to build AI-powered features and applications. This SDK supports use cases like:

• Generate text from text-only input
• Generate text from text-and-images input (multimodal, Gemini only)
• Build multi-turn conversations (chat)
• Embedding

3. python-dotenv: python-dotenv is a Python library that simplifies the management of environment variables in your Python projects. It allows you to store configuration settings, API keys, and other sensitive information in a separate file called .env and then access those variables in your Python code.

4. langchain: LangChain is a framework designed to simplify the creation of applications using large language models. As a language model integration framework, LangChain’s use-cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis.

5. PyPDF2: PyPDF2 is a Python library for working with PDF files. It allows you to manipulate and extract information from PDF documents. Some common tasks that PyPDF2 facilitates include merging and splitting PDFs, rotating pages, extracting text, and more.

6. chromadb: Embeddings databases (also known as vector databases) store embeddings and allow you to search by nearest neighbors rather than by substrings like a traditional database. By default, Chroma uses Sentence Transformers to embed for you, but you can also use OpenAI embeddings, Cohere (multilingual) embeddings, or your own. A short sketch of Chroma's API appears after this list.

7. faiss-cpu: faiss-cpu refers to the CPU version of FAISS (Facebook AI Similarity Search). FAISS is a library developed by Facebook that is specifically designed for efficient similarity search and clustering of dense vectors. It is widely used in machine learning and information retrieval applications. A minimal raw-FAISS sketch also follows this list.

8. langchain_google_genai: This module integrates Google’s Generative AI models, specifically the Gemini series, with the LangChain framework. It provides classes for interacting with chat models and generating embeddings, leveraging Google’s advanced AI capabilities.

9. langchain-community: LangChain Community contains third-party integrations that implement the base interfaces defined in LangChain Core, making them ready-to-use in any LangChain application.
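Although chromadb is installed above, this particular app ends up using FAISS as its vector store; Chroma is an alternative you could swap in. As a minimal sketch of its API (the collection name and documents here are made-up placeholders):

import chromadb

# Create an in-memory Chroma client and a collection
client = chromadb.Client()
collection = client.create_collection("demo_docs")

# Chroma embeds the documents for you (Sentence Transformers by default)
collection.add(documents=["Gemini is a multimodal model from Google."], ids=["doc1"])

# Search by meaning (nearest neighbors), not by substring
results = collection.query(query_texts=["What is Gemini?"], n_results=1)
print(results["documents"])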
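And here is what FAISS itself does beneath LangChain's wrapper (used in Step 7). This is a toy sketch with random vectors standing in for real embeddings:

import faiss
import numpy as np

d = 64  # dimensionality of the vectors
index = faiss.IndexFlatL2(d)  # exact index using L2 (Euclidean) distance

# Store 100 random vectors (stand-ins for real embeddings)
vectors = np.random.random((100, d)).astype("float32")
index.add(vectors)

# Retrieve the 5 nearest neighbors of a query vector
query = np.random.random((1, d)).astype("float32")
distances, ids = index.search(query, 5)
print(ids)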

Step 3: Storing API key

Create a file named .env in the project folder:

# Assigning value to variable
GOOGLE_API_KEY='Paste API key here'

Step 4: Creating app.py and importing the installed packages

import streamlit as st
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
import google.generativeai as genai
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv

load_dotenv() # Loads .env file

genai.configure(api_key=os.getenv("GOOGLE_API_KEY")) # Loads API key

Step 5: Extracting text from the PDFs

def get_pdf_text(pdf_docs):
    text = ""
    # Iterate through each uploaded PDF file
    for pdf in pdf_docs:
        # Create a PdfReader object for the current PDF document
        pdf_reader = PdfReader(pdf)
        # Iterate through each page in the PDF document
        for page in pdf_reader.pages:
            # Extract text from the current page and append it to the 'text' string
            text += page.extract_text()

    # Return the concatenated text from all PDF documents
    return text
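One caveat worth knowing: page.extract_text() returns an empty string for pages with no extractable text (for example, scanned image-only pages), so PDFs that are pure scans would need OCR before this approach works.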

Step 6: Getting Chunks

# The RecursiveCharacterTextSplitter takes a large text and splits it based on a specified chunk size.
def get_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
    chunks = text_splitter.split_text(text)
    return chunks

For a detailed explanation of RecursiveCharacterTextSplitter, visit https://dev.to/eteimz/understanding-langchains-recursivecharactertextsplitter-2846
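To see what the splitter actually does, here is a toy run with a deliberately tiny chunk_size (the app above uses 10000 with an overlap of 1000); the sample sentence is made up:

from langchain.text_splitter import RecursiveCharacterTextSplitter

sample = "LangChain tries to split on paragraphs first, then sentences, then words, so chunks stay semantically coherent."
splitter = RecursiveCharacterTextSplitter(chunk_size=40, chunk_overlap=10)
for chunk in splitter.split_text(sample):
    print(repr(chunk))  # each chunk is at most 40 characters, overlapping by up to 10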

Step 7: Creating a vector store using FAISS

def get_vector_store(text_chunks):
    # Create embeddings using a Google Generative AI model
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

    # Create a vector store using FAISS from the provided text chunks and embeddings
    vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)

    # Save the vector store locally with the name "faiss_index"
    vector_store.save_local("faiss_index")
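After this runs, a faiss_index folder appears in the project directory (containing the serialized index and the document store), which Step 9 loads back from disk.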

Step 8: Developing the Q&A chain


def get_conversational_chain():
    # Define a prompt template for answering questions from a given context
    prompt_template = """
    Answer the question as detailed as possible from the provided context. Make sure to provide all the details.
    If the answer is not in the provided context, just say, "answer is not available in the context"; don't provide a wrong answer.

    Context:\n{context}\n
    Question:\n{question}\n

    Answer:
    """

    # Initialize a ChatGoogleGenerativeAI model for conversational AI
    model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.3)

    # Create a prompt template with input variables "context" and "question"
    prompt = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )

    # Load a question-answering chain with the specified model and prompt
    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)

    return chain

Step 9: Taking user input

def user_input(user_question):
    # Create embeddings for the user question using the same Google Generative AI model
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

    # Load the FAISS vector store saved in Step 7
    # (newer LangChain versions also require allow_dangerous_deserialization=True here)
    new_db = FAISS.load_local("faiss_index", embeddings)

    # Perform a similarity search in the vector store based on the user question
    docs = new_db.similarity_search(user_question)

    # Obtain the conversational question-answering chain
    chain = get_conversational_chain()

    # Use the chain to generate a response from the user question and the retrieved documents
    response = chain(
        {"input_documents": docs, "question": user_question}, return_only_outputs=True
    )

    # Print the response to the console
    print(response)

    # Display the response in the Streamlit app
    st.write("Reply: ", response["output_text"])

Step 10: The main function of the application, where all the pieces come together


def main():
    st.set_page_config("Chat PDF")
    st.header("Chat with PDF using Gemini")

    user_question = st.text_input("Ask a Question from the PDF Files")

    if user_question:
        user_input(user_question)

    with st.sidebar:
        st.title("Menu:")
        pdf_docs = st.file_uploader(
            "Upload your PDF Files and Click on the Submit & Process Button",
            accept_multiple_files=True,
        )
        if st.button("Submit & Process"):
            with st.spinner("Processing..."):
                raw_text = get_pdf_text(pdf_docs)
                text_chunks = get_chunks(raw_text)
                get_vector_store(text_chunks)
                st.success("Done")


if __name__ == "__main__":
    main()
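To try it out, launch the app with Streamlit and open the local URL it prints:

streamlit run app.py

Upload one or more PDFs in the sidebar, click Submit & Process, and then ask questions in the text box.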

Complete code link — https://github.com/AkashHiremath856/GeminiStreamlitApp
