Step-by-Step Guide to Integrate Azure Cognitive Search’s Vector search in Your ChatGPT-like App — Part 2

Akshay Kokane
10 min read · Sep 3, 2023


This is a continuation of Part 1, where I discussed the pros and cons of Semantic and Vector Search: https://medium.com/@akshaykokane09/building-knowledge-base-for-your-llm-powered-app-using-azure-cognitive-search-part-1-4686127c49cb

In this part, I will focus on the implementation of Vector Search for your LLM-based AI app.

Let's start by defining the objective:

Objective: Creating an HR Chatbot Utilizing an OpenAI Model and Azure Cognitive Search, Empowered by Vector Search, to Retrieve Pertinent Information from the Company's Documents and Provide Natural Language Responses to User Queries

This is the high-level design for the app:

Prerequisites:

  1. Python environment
  2. IDE/Notebook
  3. Azure Subscription
  4. OpenAI / Azure OpenAI API keys

Step 1: Creating Vector Index

Microsoft Learn has an excellent doc on how to create a vector index. Feel free to refer to that as well when creating your vector index.

Create an Azure Cognitive Search service: If you haven’t already, create an Azure Cognitive Search service in the Azure portal. Go to the Azure portal (portal.azure.com), click on “Create a resource,” search for “Azure Cognitive Search,” and follow the prompts to create a new service.

Create the index with a vector field and configure the vector. 1536 is the number of dimensions for the text-embedding-ada-002 model, which we will be using in our LLM application.

Create the vector field "contentVector" with 1536 dimensions
Configuration for the vector field. I kept everything at the defaults

I will also add one more field, "actualContent," which will store the actual document content. This field should be "Retrievable" and "Searchable." We need at least one searchable string field if we want to do hybrid search (Semantic + Vector). Yes, you heard that right: Azure Cognitive Search allows us to do hybrid search. Isn't that interesting?

Azure Cognitive Search opens the door to a spellbinding capability: hybrid search. Brace yourself to wield the dual might of Semantic and Vector Search simultaneously. It’s like having two aces up your sleeve, ready to dazzle your search experience!

Adding String field “actualContent”

We are ready to create the index now:

Index ready to be created
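
If you prefer to define the same index programmatically instead of clicking through the portal, here is a minimal sketch that calls the Azure Cognitive Search REST API with the requests library. The api-version and the exact vector-search schema are the preview versions current at the time of writing and may differ in newer releases; the service name, key, and index name are placeholders matching the values used later in this article.

import requests

# Placeholders - use the same values as in the rest of this walkthrough
service_endpoint = "https://<YOUR_ACS_INSTANCE_NAME>.search.windows.net"
admin_key = "<YOUR_ACS_INSTANCE_KEY>"
index_name = "medium-article-2"

# Index definition mirroring the portal setup: a key field, a searchable string field,
# and a 1536-dimension vector field with an HNSW configuration
index_definition = {
    "name": index_name,
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "actualContent", "type": "Edm.String", "searchable": True, "retrievable": True},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,
            "vectorSearchConfiguration": "my-vector-config",
        },
    ],
    "vectorSearch": {
        "algorithmConfigurations": [
            {"name": "my-vector-config", "kind": "hnsw"}
        ]
    },
}

# Create (or update) the index
response = requests.put(
    f"{service_endpoint}/indexes/{index_name}?api-version=2023-07-01-Preview",
    headers={"Content-Type": "application/json", "api-key": admin_key},
    json=index_definition,
)
print(response.status_code)  # 201 when created, 204 when updated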

Step 2: Prepare data for ingestion

For example purposes, I am using a GPT-generated dataset.

Disclaimer: The dataset presented in this context is solely created for illustrative and example purposes using OpenAI’s ChatGPT — chat.openai.com. It does not depict or represent any real-world data, individuals, or entities.

The dataset has two columns: DocumentName and DocumentContent.

Dataset Example
  1. Init: Install the azure-search-documents and openai Python packages using pip by running the following commands in your Python environment, then import the necessary Python libraries and set up the configuration.
#! pip install azure-search-documents --pre
#! pip install openai

import pandas as pd
import openai
import json
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import Vector
# Open AI Key
openai.api_key = "sk-<YOUR_OPEN_AI_KEY>"
# embedding model
embedding_model = "text-embedding-ada-002"

# ref: https://learn.microsoft.com/en-us/azure/search/search-security-api-keys?tabs=portal-use%2Cportal-find%2Cportal-query
service_endpoint = "https://<YOUR_ACS_INSTANCE_NAME>.search.windows.net"
key = "<YOUR_ACS_INSTANCE_KEY>"
index_name = "medium-article-2"
credential = AzureKeyCredential(key)
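
The snippets that follow assume the example dataset is already loaded into a pandas DataFrame named data with DocumentName and DocumentContent columns. A minimal sketch, assuming the dataset was saved as a CSV file (the file name is illustrative):

# Load the example dataset into a DataFrame with DocumentName and DocumentContent columns
# (the file name below is illustrative - point it at wherever you saved your dataset)
data = pd.read_csv("hr_documents.csv")
print(data.head())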

2. Utilize the OpenAI embedding model "text-embedding-ada-002" to transform our documents into vectors.

# Define a function to get text embeddings using OpenAI's text-embedding model.
def get_embedding(text, model="text-embedding-ada-002"):
    # Replace newline characters with spaces in the input text.
    text = text.replace("\n", " ")

    # Call OpenAI's text-embedding API to obtain embeddings for the input text.
    embeddings = openai.Embedding.create(input=[text], model=model)

    # Extract the embedding vector from the API response and return it.
    embedding_vector = embeddings['data'][0]['embedding']
    return embedding_vector

# Apply the get_embedding function to each document content in the 'data' DataFrame
# and store the resulting embeddings in a new column called 'embedding'.
data["embedding"] = data.DocumentContent.apply(lambda x: get_embedding(x, model=embedding_model))

3. Let's transform the data into the required index fields.

Index fields that we created

# DataFrame with assigned columns
assigned_df = data.assign(
    contentVector=data["embedding"],
    actualContent=data["DocumentContent"],
    id=data["DocumentName"]
)

# DataFrame with only the index columns
filtered_df = assigned_df.drop(
    ["embedding", "DocumentContent", "DocumentName"],
    inplace=False,
    axis=1
)
Your final data frame should look like this

4. Store the processed data in JSON format.

filtered_df.to_json('/output/data.json', orient='records')

Step 3: Ingest Data to your index

In Azure Cognitive Search, there are two ways to ingest data:

  1. Pull Model: The pull model in Azure Cognitive Search automates the process of fetching data from supported sources and importing it into your search index. This functionality is achieved through components called indexers. (A programmatic sketch follows the steps below.)
  • Step 1: Set up a Storage Account in your Azure subscription.
  • Step 2: Create a container in the Blob Storage of your Storage Account. Upload the JSON file that you generated in the previous step to this container.
  • Step 3: Establish a connection between your data source (Azure Blob Storage) and Azure Cognitive Search.
Adding Data source
  • Step 4: Create an Indexer, which is a tool that automatically pulls data from your Azure Blob Storage container and loads it into your Azure Cognitive Search index.
Creating Indexer
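
For reference, the same data source and indexer can also be created programmatically. Here is a minimal sketch using the REST API; the data source name, indexer name, connection string, container name, and api-version are illustrative, and the sketch reuses the service_endpoint, key, and index_name variables defined earlier.

import requests

headers = {"Content-Type": "application/json", "api-key": key}  # admin key defined earlier
api_version = "2023-07-01-Preview"

# Data source pointing at the blob container that holds data.json (values are illustrative)
datasource = {
    "name": "hr-docs-datasource",
    "type": "azureblob",
    "credentials": {"connectionString": "<YOUR_STORAGE_CONNECTION_STRING>"},
    "container": {"name": "<YOUR_CONTAINER_NAME>"},
}
requests.put(
    f"{service_endpoint}/datasources/hr-docs-datasource?api-version={api_version}",
    headers=headers, json=datasource,
)

# Indexer that pulls the JSON array from blob storage into the vector index
indexer = {
    "name": "hr-docs-indexer",
    "dataSourceName": "hr-docs-datasource",
    "targetIndexName": index_name,
    "parameters": {"configuration": {"parsingMode": "jsonArray"}},
}
requests.put(
    f"{service_endpoint}/indexers/hr-docs-indexer?api-version={api_version}",
    headers=headers, json=indexer,
)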

2. Push Model:

The push model is the programmatic approach to ingesting your data into the index.

# Open and read the 'data.json' file, which contains the documents to be uploaded and queried
with open('data.json', 'r') as file:
    documents = json.load(file)

# Create a SearchClient instance for uploading and querying data
search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)

# Upload the documents to the specified search index using the SearchClient
result = search_client.upload_documents(documents)

# Print the number of documents uploaded
print(f"Uploaded {len(documents)} documents")

If you want to learn more, check out this sample code from the Azure Cognitive Search vector search repo: https://github.com/Azure/cognitive-search-vector-pr/blob/main/demo-python/code/azure-search-vector-python-sample.ipynb

🎉 Congratulations! Now you’re all set to start querying and searching your data in Azure Cognitive Search.

View from Search Explorer

You can also query vector data from Search Explorer. See this Microsoft Learn page for more on vector search ranking: https://learn.microsoft.com/en-us/azure/search/vector-search-ranking

Step 4: Query Data from Semantic Kernel App

I created the LLM AI app using Semantic Kernel in my previous article.

def do_vector_search(query):
    # Initialize the search client with the appropriate service endpoint, index name, and credentials
    search_client = SearchClient(service_endpoint, index_name, credential=credential)

    # Get the vector representation of the query using the 'get_embedding' function
    vector = Vector(value=get_embedding(query), k=3, fields="contentVector")

    # Perform a vector search with the specified vector and retrieve relevant fields
    results = search_client.search(
        search_text=None,
        vectors=[vector],
        select=["actualContent", "id"],
        top=1
    )

    # Iterate through the search results and return the id and actualContent of the first result
    # You can also check the @search.score returned and decide on a threshold for the match
    # (cosine similarity is bounded between -1 and +1)
    for result in results:
        return result['id'], result['actualContent']
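
Because we kept actualContent searchable, the same client call can be turned into a hybrid (keyword + vector) query simply by passing the raw query text instead of None. A minimal sketch, assuming the same preview SDK version used above:

def do_hybrid_search(query):
    # Same client and vector as before, but search_text now carries the keyword query,
    # so Azure Cognitive Search combines keyword (BM25) scoring with vector similarity
    search_client = SearchClient(service_endpoint, index_name, credential=credential)
    vector = Vector(value=get_embedding(query), k=3, fields="contentVector")

    results = search_client.search(
        search_text=query,          # keyword part of the hybrid query
        vectors=[vector],           # vector part of the hybrid query
        select=["actualContent", "id"],
        top=1
    )

    # Return the id and actualContent of the best-scoring result
    for result in results:
        return result['id'], result['actualContent']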

Step 5: Add the search function to your chat function and prompt to do RAG

Build the Semantic Kernel and create a semantic function for the chatbot using the prompt below.

# Reference: https://github.com/microsoft/semantic-kernel/blob/main/python/samples/kernel-syntax-examples/chat.py

import semantic_kernel as sk
import semantic_kernel.connectors.ai.open_ai as sk_oai

sk_prompt = """
ChatBot can only answer questions from the information it has from this {{$document}}.
It can give explicit instructions or say 'I don't know'
when it doesn't know the answer.

User:> {{$user_input}}
ChatBot:>
"""

kernel = sk.Kernel()

kernel.add_chat_service(
    "chat-gpt", sk_oai.OpenAIChatCompletion("gpt-3.5-turbo", "sk-<REPLACE_WITH_YOUR_KEY>")
)

prompt_config = sk.PromptTemplateConfig.from_completion_parameters(
    max_tokens=2000, temperature=0.7, top_p=0.4
)

prompt_template = sk.PromptTemplate(
    sk_prompt, kernel.prompt_template_engine, prompt_config
)

function_config = sk.SemanticFunctionConfig(prompt_config, prompt_template)
chat_function = kernel.register_semantic_function("ChatBot", "Chat", function_config)

Now our prompt is ready to be used. Create the chat function with the following code:

# Define an asynchronous function to get an answer from the ChatBot
async def get_answer(input, context_vars: sk.ContextVariables) -> bool:
    # Set the user's input in the context variables
    context_vars["user_input"] = input

    # Perform a vector search to find relevant content
    id, content = do_vector_search(input)

    # Set the found content as the document in context variables
    context_vars["document"] = content

    # Use the kernel to asynchronously run the chat function and get an answer
    answer = await kernel.run_async(chat_function, input_vars=context_vars)

    # Print the ChatBot's response and the source ID
    print(f"ChatBot:> {answer}. ")
    print(f"Source {id}")

    # Return True to indicate successful completion
    return True

# Define an asynchronous chat function
async def chat(input) -> None:
    # Create a context with variables for the conversation
    context = sk.ContextVariables()

    # Call the get_answer function to interact with the ChatBot
    await get_answer(input, context)

✌️ And you are all set.

await chat("can i take 10 days PTO in a year?")

'''
ChatBot:> The information provided does not specify the exact number of PTO days an employee can take in a year. It states that PTO accrual rates will be outlined in the employee's offer letter or contract. I recommend referring to your offer letter or contract to determine the specific number of PTO days you are entitled to..
Source PTOPolicy
'''

await chat("provide advice in points for company's coding style?")

'''
ChatBot:> Sure! Here are some advice for your company's coding style:
1. Use descriptive names: Use meaningful and descriptive names for variables, functions, classes, and files to enhance code clarity and readability.

2. Follow naming conventions: Follow established naming conventions such as camelCase for variables and functions, PascalCase for class and type names, and prefix interfaces with "I" to maintain consistency.

3. Maintain consistent indentation and formatting: Use consistent indentation with 4 spaces for each level of code block. Place curly braces on their own lines for control structures and functions. Limit line length to 100 characters for improved readability.

4. Provide comments and documentation: Add comments to explain complex code, algorithms, or any non-obvious logic. Document public functions, methods, and classes using clear and concise descriptions. Use JSDoc-style comments for documenting JavaScript code.

5. Implement proper error handling: Always include proper error handling to ensure graceful failure and meaningful error messages. Use try-catch blocks for exception handling and avoid using empty catch blocks.

6. Break down complex logic: Break down complex logic into smaller, reusable functions or methods. Follow the Single Responsibility Principle (SRP) to keep functions and classes focused on specific tasks.

7. Utilize version control: Use version control systems like Git for all code repositories. Follow the established branching strategy and commit message conventions to ensure efficient collaboration and code management.

Remember, these guidelines are designed to align your software development practices with industry best practices and facilitate collaboration among your development teams..
Source CodingStyle

'''

Some closing thoughts:

UPDATE:

The Semantic Kernel now allows Vector Search with the Memory Connector (https://devblogs.microsoft.com/semantic-kernel/announcing-semantic-kernel-integration-with-azure-cognitive-search/).

This is a great development, as using the Semantic Kernel Memory abstracts out the embedding logic.

// Create a kernel with AzureCognitiveSearchMemoryStore as the memory storage
var kernel = new KernelBuilder()
    .WithAzureTextEmbeddingGenerationService(
        "text-embedding-ada-002",
        AZURE_OPENAI_ENDPOINT,
        AZURE_OPENAI_API_KEY)
    .WithMemoryStorage(new AzureCognitiveSearchMemoryStore(
        AZURE_SEARCH_ENDPOINT,
        AZURE_SEARCH_ADMIN_KEY))
    .Build();

// Save the data in the vector index
await kernel.Memory.SaveReferenceAsync(
    collection: "GitHubFiles",
    externalSourceName: "GitHub",
    externalId: entry.Key,
    description: entry.Value,
    text: entry.Value);

One drawback of the above approach is that the Semantic Kernel requires data to be indexed only in the MemoryRecord format, which includes fields such as ‘externalSourceName,’ ‘externalId,’ ‘description,’ ‘text,’ etc. Therefore, if you are ingesting data from an external source, you will need to convert the data into the required fields before ingesting it.


Disclaimer : This blog is not affiliated with, endorsed by, or sponsored in any way by Microsoft Corporation or any of its subsidiaries. Any references to Microsoft products, services, logos, or trademarks are used solely for the purpose of providing information and commentary. The views and opinions expressed on this blog are the author’s own and do not necessarily reflect the views or opinions of Microsoft Corporation
