Document Querying with LLMs — Google PaLM API (Semantic Search With LLM Embeddings)

Syed Muhammed Hassan Ali
6 min read · Sep 8, 2023

Introduction

With Large Language Models (LLMs), we can integrate domain-specific data to answer questions. This is especially useful for data unavailable to the model during its initial training, like a company’s internal documentation or knowledge base.

This architecture is called Retrieval-Augmented Generation (RAG), or, less commonly, Generative Question Answering.

This article shows how to implement this architecture using an LLM and a vector database, an approach that can significantly reduce the hallucinations commonly associated with LLMs.

It fits a wide range of use cases and cuts down the time spent digging through documents. Instead of scanning search results yourself, the LLM finds the most relevant documents and uses them to generate the answer directly from your own content.

Apply LLMs to your domain-specific data

In this example, I will use the Google PaLM API to generate embeddings for document search, together with Chroma as the vector database.

What is a Vector Database?

Efficient data processing has become more crucial than ever for applications that involve large language models, generative AI, and semantic search.

All of these applications rely on vector embeddings, a data representation that carries semantic information the AI needs to understand content and to maintain a long-term memory it can draw upon when executing complex tasks.

Embeddings are generated by AI models (such as Large Language Models) and have a large number of attributes or features, making their representation challenging to manage. In the context of AI and machine learning, these features represent different dimensions of the data that are essential for understanding patterns, relationships, and underlying structures.

With a vector database, we can add advanced features to our AIs, like semantic information retrieval, long-term memory, and more.
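To make the idea concrete, here is a toy sketch (with made-up three-dimensional vectors, not real PaLM embeddings) of how semantic search boils down to a nearest-neighbour comparison between vectors:

import math

def cosine_similarity(a, b):
    # Similarity between two vectors: 1.0 means same direction, 0.0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend these are embeddings of two documents and of a query
doc_vectors = {
    "refund policy": [0.9, 0.1, 0.3],
    "holiday schedule": [0.1, 0.8, 0.2],
}
query_vector = [0.85, 0.15, 0.25]  # "how do I get my money back?"

best = max(doc_vectors, key=lambda d: cosine_similarity(doc_vectors[d], query_vector))
print(best)  # -> "refund policy"

A real vector database does exactly this kind of comparison, but over high-dimensional embeddings and with indexing structures that keep the search fast at scale.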

Let's jump to the Code

Step 1: Enable Needed APIs

Run gcloud init to authenticate with your GCP user and project.

Enable the necessary APIs:

gcloud services enable aiplatform.googleapis.com --async

Then generate a PaLM key from the console.
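One convenient pattern (my own suggestion, not a requirement of the API) is to keep the generated key in an environment variable instead of hard-coding it; the google-generativeai package used here is installed in the next step, and the variable name is just a placeholder:

import os
import google.generativeai as palm

# PALM_API_KEY is an arbitrary environment variable name chosen for this example
palm.configure(api_key=os.environ["PALM_API_KEY"])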

Step 2: Set Up a Flask Project

Although in this example I have used Flask, you can use any Python-based framework. You can also use this template.

Install the necessary packages:

pip install Flask Flask-Cors google-generativeai chromadb requests PyPDF2
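If you are not starting from the template, a minimal entry point that registers the blueprint defined in the next step could look like this; the file name, module path, and port are assumptions for illustration, not taken from the repository:

# app.py -- minimal Flask entry point (assumed layout)
from flask import Flask
from src.routes.reorderRoutes import reorderBlueprint  # blueprint defined in Step 3

app = Flask(__name__)
app.register_blueprint(reorderBlueprint)

if __name__ == "__main__":
    app.run(debug=True, port=5000)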

Step 3: Encode any PDF

The API we develop requires a base64-encoded PDF in the request body and a query (passed as a path parameter) that tells the API what to do with the document. Example:

curl -X POST "https://your-api-url.com/getSolutions/your%20question%20here" \
  -H "Content-Type: application/json" \
  -d '{"pdfContentEncoded": "<base64-encoded PDF>"}'
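To produce that body, a client can base64-encode the PDF and call the endpoint with the requests package installed above; the file path, URL, and question below are placeholders assuming a local run:

import base64
from urllib.parse import quote

import requests

# Encode a local PDF file to base64 (path is a placeholder)
with open("path/to/document.pdf", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# The query travels in the URL path, so percent-encode it
query = quote("What is the refund policy?")
resp = requests.post(
    f"http://localhost:5000/getSolutions/{query}",
    json={"pdfContentEncoded": encoded},
)
print(resp.json())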

In the routes file, define the API endpoint.

from flask import Blueprint
from src.controllers.reorderController import getSolutions

reorderBlueprint = Blueprint('reorderBlueprint', __name__)

# Suggest solutions by reading the PDF file
reorderBlueprint.route('/getSolutions/<query>', methods=['POST'])(getSolutions)

The controller forwards the API request to the service layer.

from flask import jsonify, request
from src.services.palm_api_Service import get_solutions


def set_response_headers(response):
    response.headers['Access-Control-Allow-Origin'] = '*'
    response.headers['Access-Control-Allow-Headers'] = 'Origin, X-Requested-With, Content-Type, Accept'
    return response


def getSolutions(query):
    try:
        pdf_content = request.get_data()
        solutions = get_solutions(pdf_content, query)
        structuredData = jsonify(data=solutions)
        response = set_response_headers(structuredData)
        return response, 200

    except Exception as e:
        return jsonify(error=str(e)), 500

Then comes the implementation of the service layer.

Decode the original n-page PDF to recover its content, then pass that content to the vector database by creating a collection that stores your embeddings, documents, and any metadata.

import base64
import io
import json

import PyPDF2


def get_solutions(pdf_content, query):
    try:
        # Decode the base64-encoded PDF content
        pdf_content_dict = json.loads(pdf_content)
        pdf_content_encoded = pdf_content_dict['pdfContentEncoded']
        pdf_content_decoded = base64.b64decode(pdf_content_encoded)

        # Read the content of each page of the PDF and concatenate them into a single string
        pdf_reader = PyPDF2.PdfReader(io.BytesIO(pdf_content_decoded))
        pdf_content = ""
        for page in range(len(pdf_reader.pages)):
            pdf_content += pdf_reader.pages[page].extract_text()

        # Set up the DB
        db = create_chroma_db([pdf_content], "exampleDBCollection")

        temperature = 0.65

        response = answer(text_model, query, db, temperature)

        # Return the generated answer
        return response
    except Exception as e:
        print("Error generating response: " + str(e))
        return str(e)

Configure PaLM with your API key and get the embedding and text generation models.

import google.generativeai as palm

palm.configure(api_key='YOUR_API_KEY')

# Pick a model that supports text embedding
models = [m for m in palm.list_models() if 'embedText' in m.supported_generation_methods]
model = models[0]

# Pick a model that supports text generation
text_models = [m for m in palm.list_models() if 'generateText' in m.supported_generation_methods]
text_model = text_models[0]

Next, create a custom function that performs embedding using the PaLM API: given a set of documents, it returns their vector embeddings.

from chromadb.api.types import Documents, Embeddings


def embed_function(texts: Documents) -> Embeddings:
    # Embed each document with the PaLM embedding model selected above
    return [palm.generate_embeddings(model=model, text=text)['embedding']
            for text in texts]

Note that the embedding function defined above is passed as an argument to create_collection.



import chromadb


def create_chroma_db(documents, name):
    chroma_client = chromadb.Client()
    db = chroma_client.create_collection(
        name=name, embedding_function=embed_function)
    for i, d in enumerate(documents):
        db.add(
            documents=d,
            ids=str(i)
        )
    return db

Step 4: Getting the relevant document

db is a Chroma collection object. You can call query on it to perform a nearest-neighbour search for similar embeddings or documents.

def get_relevant_passage(query, db):
    passage = db.query(query_texts=[query], n_results=1)['documents'][0][0]
    return passage

Now that you have found the relevant passage in your set of documents, you can use it to make a prompt to pass into the PaLM API.

def make_prompt(query, relevant_passage):
    escaped = relevant_passage.replace(
        "'", "").replace('"', "").replace("\n", " ")
    prompt = ("""You are a helpful and informative bot that answers questions using text from the reference passage included below. \
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. \
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and \
strike a friendly and conversational tone. \
If the passage is irrelevant to the answer, you may ignore it.
QUESTION: '{query}'
PASSAGE: '{relevant_passage}'

ANSWER:
""").format(query=query, relevant_passage=escaped)

    return prompt

def answer(model, query, db, temperature):
    passage = get_relevant_passage(query, db)
    prompt = make_prompt(query, passage)
    answer = palm.generate_text(prompt=prompt, model=model, candidate_count=3,
                                temperature=temperature, max_output_tokens=1024)
    return answer.candidates[0]['output']

The answer function generates a response to the query you pass in: it retrieves the relevant document and then calls the PaLM text generation API to produce the answer.
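To sanity-check the pipeline outside of Flask, you could feed the helpers a small in-memory document; the sample text, collection name, and question below are made up for illustration:

# Hypothetical local smoke test of the retrieval + generation flow
sample_docs = [
    "Our refund policy allows returns within 30 days of purchase "
    "with a valid receipt.",
]
db = create_chroma_db(sample_docs, "smokeTestCollection")
print(answer(text_model, "How long do I have to return an item?", db, temperature=0.65))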

Example API

Conclusion

Searching through documents to find answers is tedious and time-consuming. However, new AI techniques are emerging to automate this process. As outlined in this article, Large Language Models like Google’s PaLM can be leveraged to rapidly search documents and generate natural language answers to queries.

The key is encoding documents into vector embeddings that capture semantic meaning. These are indexed in a vector database to enable fast semantic search. When a question is asked, the most relevant passage is retrieved to provide context. PaLM takes this passage and generates a conversational response, eliminating the need to read through documents.

You can access the complete source code at https://github.com/Syed007Hassan/Document-Querying-With-VectorDB.

If you enjoyed this article, please click on the clap button 👏 and share to help others find it!
