Step-by-Step Guide to Integrate Azure Cognitive Search’s Vector search in Your ChatGPT-like App — Part 2

Akshay Kokane
10 min read · Sep 3, 2023


This is a continuation of Part 1, where I discussed the pros and cons of Semantic and Vector Search: https://medium.com/@akshaykokane09/building-knowledge-base-for-your-llm-powered-app-using-azure-cognitive-search-part-1-4686127c49cb

In this part, I will focus on the implementation of Vector Search for your LLM-based AI app.

Let's start by defining the objective:

Objective: Creating an HR Chatbot Utilizing an OpenAI Model and Azure Cognitive Search, Empowered by Vector Search, to Retrieve Pertinent Information from the Company's Documents and Provide Natural Language Responses to User Queries

This is the high-level design for the app:

Prerequisites:

  1. Python environment
  2. IDE/Notebook
  3. Azure Subscription
  4. OpenAI / Azure OpenAI API keys

Step 1: Creating Vector Index

Microsoft Learn has an excellent doc on how to create a vector index. Feel free to refer to that as well when creating your vector index.

Create an Azure Cognitive Search service: If you haven’t already, create an Azure Cognitive Search service in the Azure portal. Go to the Azure portal (portal.azure.com), click on “Create a resource,” search for “Azure Cognitive Search,” and follow the prompts to create a new service.

Create the index with a vector field and configure the vector. 1536 is the number of dimensions for the text-embedding-ada-002 model, which we will be using in our LLM application.

Create the vector field "contentVector" with 1536 dimensions
Configuration for the vector field. I kept everything at the defaults

I will also add one more field, "actualContent," which will store the actual document content. This field should be "Retrievable" and "Searchable." We need at least one searchable string field if we want to do hybrid search (Semantic + Vector). Yes, you heard that right: Azure Cognitive Search allows us to do hybrid search. Isn't that interesting?

Azure Cognitive Search opens the door to a spellbinding capability: hybrid search. Brace yourself to wield the dual might of Semantic and Vector Search simultaneously. It’s like having two aces up your sleeve, ready to dazzle your search experience!

Adding String field “actualContent”

We are ready to create the index now:

Index ready to be created
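
If you prefer to define the same index programmatically instead of clicking through the portal, here is a minimal sketch that calls the Azure Cognitive Search REST API with the requests library. The api-version and the exact vector-search schema are the preview versions current at the time of writing and may differ in newer releases; the service name, key, and index name are placeholders matching the values used later in this article.

import requests

# Placeholders - use the same values as in the rest of this walkthrough
service_endpoint = "https://<YOUR_ACS_INSTANCE_NAME>.search.windows.net"
admin_key = "<YOUR_ACS_INSTANCE_KEY>"
index_name = "medium-article-2"

# Index definition mirroring the portal setup: a key field, a searchable string field,
# and a 1536-dimension vector field with an HNSW configuration
index_definition = {
    "name": index_name,
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "actualContent", "type": "Edm.String", "searchable": True, "retrievable": True},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,
            "vectorSearchConfiguration": "my-vector-config",
        },
    ],
    "vectorSearch": {
        "algorithmConfigurations": [
            {"name": "my-vector-config", "kind": "hnsw"}
        ]
    },
}

# Create (or update) the index
response = requests.put(
    f"{service_endpoint}/indexes/{index_name}?api-version=2023-07-01-Preview",
    headers={"Content-Type": "application/json", "api-key": admin_key},
    json=index_definition,
)
print(response.status_code)  # 201 when created, 204 when updated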

Step 2: Prepare data for ingestion

For example purposes, I am using a GPT-generated dataset.

Disclaimer: The dataset presented in this context is solely created for illustrative and example purposes using OpenAI’s ChatGPT — chat.openai.com. It does not depict or represent any real-world data, individuals, or entities.

The dataset has two columns: DocumentName and DocumentContent.

Dataset Example
  1. Init: Install the azure-search-documents and openai Python packages using pip by running the following commands in your Python environment, then import the necessary Python libraries and set up the configuration.
#! pip install azure-search-documents --pre
#! pip install openai

import pandas as pd
import openai
import json
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import Vector
# Open AI Key
openai.api_key = "sk-<YOUR_OPEN_AI_KEY>"
# embedding model
embedding_model = "text-embedding-ada-002"

# ref: https://learn.microsoft.com/en-us/azure/search/search-security-api-keys?tabs=portal-use%2Cportal-find%2Cportal-query
service_endpoint = "https://<YOUR_ACS_INSTANCE_NAME>.search.windows.net"
key = "<YOUR_ACS_INSTANCE_KEY>"
index_name = "medium-article-2"
credential = AzureKeyCredential(key)
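
The snippets that follow assume the example dataset is already loaded into a pandas DataFrame named data with DocumentName and DocumentContent columns. A minimal sketch, assuming the dataset was saved as a CSV file (the file name is illustrative):

# Load the example dataset into a DataFrame with DocumentName and DocumentContent columns
# (the file name below is illustrative - point it at wherever you saved your dataset)
data = pd.read_csv("hr_documents.csv")
print(data.head())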

2. Utilize the OpenAI embedding model "text-embedding-ada-002" to transform our documents into vectors.

# Define a function to get text embeddings using OpenAI's text-embedding model.
def get_embedding(text, model="text-embedding-ada-002"):
    # Replace newline characters with spaces in the input text.
    text = text.replace("\n", " ")

    # Call OpenAI's text-embedding API to obtain embeddings for the input text.
    embeddings = openai.Embedding.create(input=[text], model=model)

    # Extract the embedding vector from the API response and return it.
    embedding_vector = embeddings['data'][0]['embedding']
    return embedding_vector

# Apply the get_embedding function to each document content in the 'data' DataFrame
# and store the resulting embeddings in a new column called 'embedding'.
data["embedding"] = data.DocumentContent.apply(lambda x: get_embedding(x, model=embedding_model))

3. Let's transform the data into the required index fields.

Index fields that we created

# DataFrame with assigned columns
assigned_df = data.assign(
    contentVector=data["embedding"],
    actualContent=data["DocumentContent"],
    id=data["DocumentName"]
)

# DataFrame with only the index columns
filtered_df = assigned_df.drop(
    ["embedding", "DocumentContent", "DocumentName"],
    inplace=False,
    axis=1
)
Your final data frame should look like this

4. Store the processed data in JSON format.

filtered_df.to_json('/output/data.json', orient='records')

Step 3: Ingest Data to your index

In Azure Cognitive Search, there are two ways to ingest data:

  1. Pull Model: The pull model in Azure Cognitive Search automates the process of fetching data from supported sources and importing it into your search index. This functionality is achieved through components called indexers. (A programmatic sketch follows the steps below.)
  • Step 1: Set up a Storage Account in your Azure subscription.
  • Step 2: Create a container in the Blob Storage of your Storage Account. Upload the JSON file that you generated in the previous step to this container.
  • Step 3: Establish a connection between your data source (Azure Blob Storage) and Azure Cognitive Search.
Adding Data source
  • Step 4: Create an Indexer, which is a tool that automatically pulls data from your Azure Blob Storage container and loads it into your Azure Cognitive Search index.
Creating Indexer
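
For reference, the same data source and indexer can also be created programmatically. Here is a minimal sketch using the REST API; the data source name, indexer name, connection string, container name, and api-version are illustrative, and the sketch reuses the service_endpoint, key, and index_name variables defined earlier.

import requests

headers = {"Content-Type": "application/json", "api-key": key}  # admin key defined earlier
api_version = "2023-07-01-Preview"

# Data source pointing at the blob container that holds data.json (values are illustrative)
datasource = {
    "name": "hr-docs-datasource",
    "type": "azureblob",
    "credentials": {"connectionString": "<YOUR_STORAGE_CONNECTION_STRING>"},
    "container": {"name": "<YOUR_CONTAINER_NAME>"},
}
requests.put(
    f"{service_endpoint}/datasources/hr-docs-datasource?api-version={api_version}",
    headers=headers, json=datasource,
)

# Indexer that pulls the JSON array from blob storage into the vector index
indexer = {
    "name": "hr-docs-indexer",
    "dataSourceName": "hr-docs-datasource",
    "targetIndexName": index_name,
    "parameters": {"configuration": {"parsingMode": "jsonArray"}},
}
requests.put(
    f"{service_endpoint}/indexers/hr-docs-indexer?api-version={api_version}",
    headers=headers, json=indexer,
)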

2. Push Model:

The push model is the programmatic approach to ingesting your data into the index.

# Open and read the 'data.json' file, which contains the documents to be uploaded and queried
with open('data.json', 'r') as file:
    documents = json.load(file)

# Create a SearchClient instance for uploading and querying data
search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)

# Upload the documents to the specified search index using the SearchClient
result = search_client.upload_documents(documents)

# Print the number of documents uploaded
print(f"Uploaded {len(documents)} documents")

If you want to learn more, check out this sample code from the Azure Cognitive Search vector search repo: https://github.com/Azure/cognitive-search-vector-pr/blob/main/demo-python/code/azure-search-vector-python-sample.ipynb

🎉 Congratulations! Now you’re all set to start querying and searching your data in Azure Cognitive Search.

View from Search Explorer

You can also query vector data from Search Explorer. See this Microsoft Learn page for more on vector search ranking: https://learn.microsoft.com/en-us/azure/search/vector-search-ranking

Step 4: Query Data from Semantic Kernel App

I created the LLM AI app using Semantic Kernel in my previous article.

def do_vector_search(query):
    # Initialize the search client with the appropriate service endpoint, index name, and credentials
    search_client = SearchClient(service_endpoint, index_name, credential=credential)

    # Get the vector representation of the query using the 'get_embedding' function
    vector = Vector(value=get_embedding(query), k=3, fields="contentVector")

    # Perform a vector search with the specified vector and retrieve relevant fields
    results = search_client.search(
        search_text=None,
        vectors=[vector],
        select=["actualContent", "id"],
        top=1
    )

    # Iterate through the search results and return the id and actualContent of the first result
    # You can also check the @search.score returned and decide on a threshold for the match
    # (cosine similarity is bounded between -1 and +1)
    for result in results:
        return result['id'], result['actualContent']
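
Because we kept actualContent searchable, the same client call can be turned into a hybrid (keyword + vector) query simply by passing the raw query text instead of None. A minimal sketch, assuming the same preview SDK version used above:

def do_hybrid_search(query):
    # Same client and vector as before, but search_text now carries the keyword query,
    # so Azure Cognitive Search combines keyword (BM25) scoring with vector similarity
    search_client = SearchClient(service_endpoint, index_name, credential=credential)
    vector = Vector(value=get_embedding(query), k=3, fields="contentVector")

    results = search_client.search(
        search_text=query,          # keyword part of the hybrid query
        vectors=[vector],           # vector part of the hybrid query
        select=["actualContent", "id"],
        top=1
    )

    # Return the id and actualContent of the best-scoring result
    for result in results:
        return result['id'], result['actualContent']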

Step 5: Add the search function to your chat function and prompt to do RAG

Build the Semantic Kernel and create a semantic function for the chatbot using the prompt below.

# Reference: https://github.com/microsoft/semantic-kernel/blob/main/python/samples/kernel-syntax-examples/chat.py

import semantic_kernel as sk
import semantic_kernel.connectors.ai.open_ai as sk_oai

sk_prompt = """
ChatBot can only answer questions from the information it has from this {{$document}}.
It can give explicit instructions or say 'I don't know'
when it doesn't know the answer.

User:> {{$user_input}}
ChatBot:>
"""

kernel = sk.Kernel()

kernel.add_chat_service(
    "chat-gpt", sk_oai.OpenAIChatCompletion("gpt-3.5-turbo", "sk-<REPLACE_WITH_YOUR_KEY>")
)

prompt_config = sk.PromptTemplateConfig.from_completion_parameters(
    max_tokens=2000, temperature=0.7, top_p=0.4
)

prompt_template = sk.PromptTemplate(
    sk_prompt, kernel.prompt_template_engine, prompt_config
)

function_config = sk.SemanticFunctionConfig(prompt_config, prompt_template)
chat_function = kernel.register_semantic_function("ChatBot", "Chat", function_config)

Now our prompt is ready to be used. Create the chat function with the following code:

# Define an asynchronous function to get an answer from the ChatBot
async def get_answer(input, context_vars: sk.ContextVariables) -> bool:
    # Set the user's input in the context variables
    context_vars["user_input"] = input

    # Perform a vector search to find relevant content
    id, content = do_vector_search(input)

    # Set the found content as the document in context variables
    context_vars["document"] = content

    # Use the kernel to asynchronously run the chat function and get an answer
    answer = await kernel.run_async(chat_function, input_vars=context_vars)

    # Print the ChatBot's response and the source ID
    print(f"ChatBot:> {answer}. ")
    print(f"Source {id}")

    # Return True to indicate successful completion
    return True

# Define an asynchronous chat function
async def chat(input) -> None:
    # Create a context with variables for the conversation
    context = sk.ContextVariables()

    # Call the get_answer function to interact with the ChatBot
    await get_answer(input, context)

✌️ And you are all set.

await chat("can i take 10 days PTO in a year?")

'''
ChatBot:> The information provided does not specify the exact number of PTO days an employee can take in a year. It states that PTO accrual rates will be outlined in the employee's offer letter or contract. I recommend referring to your offer letter or contract to determine the specific number of PTO days you are entitled to..
Source PTOPolicy
'''

await chat("provide advice in points for company's coding style?")

'''
ChatBot:> Sure! Here are some advice for your company's coding style:
1. Use descriptive names: Use meaningful and descriptive names for variables, functions, classes, and files to enhance code clarity and readability.

2. Follow naming conventions: Follow established naming conventions such as camelCase for variables and functions, PascalCase for class and type names, and prefix interfaces with "I" to maintain consistency.

3. Maintain consistent indentation and formatting: Use consistent indentation with 4 spaces for each level of code block. Place curly braces on their own lines for control structures and functions. Limit line length to 100 characters for improved readability.

4. Provide comments and documentation: Add comments to explain complex code, algorithms, or any non-obvious logic. Document public functions, methods, and classes using clear and concise descriptions. Use JSDoc-style comments for documenting JavaScript code.

5. Implement proper error handling: Always include proper error handling to ensure graceful failure and meaningful error messages. Use try-catch blocks for exception handling and avoid using empty catch blocks.

6. Break down complex logic: Break down complex logic into smaller, reusable functions or methods. Follow the Single Responsibility Principle (SRP) to keep functions and classes focused on specific tasks.

7. Utilize version control: Use version control systems like Git for all code repositories. Follow the established branching strategy and commit message conventions to ensure efficient collaboration and code management.

Remember, these guidelines are designed to align your software development practices with industry best practices and facilitate collaboration among your development teams..
Source CodingStyle

'''

Some closing thoughts:

UPDATE:

The Semantic Kernel now allows Vector Search with the Memory Connector (https://devblogs.microsoft.com/semantic-kernel/announcing-semantic-kernel-integration-with-azure-cognitive-search/).

This is a great development, as using the Semantic Kernel Memory abstracts out the embedding logic.

// Create a kernel with AzureCognitiveSearchMemoryStore as the memory storage
var kernel = new KernelBuilder()
    .WithAzureTextEmbeddingGenerationService(
        "text-embedding-ada-002",
        AZURE_OPENAI_ENDPOINT,
        AZURE_OPENAI_API_KEY)
    .WithMemoryStorage(new AzureCognitiveSearchMemoryStore(
        AZURE_SEARCH_ENDPOINT,
        AZURE_SEARCH_ADMIN_KEY))
    .Build();

// Save the data in the vector index
await kernel.Memory.SaveReferenceAsync(
    collection: "GitHubFiles",
    externalSourceName: "GitHub",
    externalId: entry.Key,
    description: entry.Value,
    text: entry.Value);

One drawback of the above approach is that the Semantic Kernel requires data to be indexed only in the MemoryRecord format, which includes fields such as ‘externalSourceName,’ ‘externalId,’ ‘description,’ ‘text,’ etc. Therefore, if you are ingesting data from an external source, you will need to convert the data into the required fields before ingesting it.


Disclaimer : This blog is not affiliated with, endorsed by, or sponsored in any way by Microsoft Corporation or any of its subsidiaries. Any references to Microsoft products, services, logos, or trademarks are used solely for the purpose of providing information and commentary. The views and opinions expressed on this blog are the author’s own and do not necessarily reflect the views or opinions of Microsoft Corporation
