
Azure AI Search and OpenAI together: Integrated Embeddings and Projection in Generative AI solutions

Mauro Minella

--

In recent months, we have been witnessing an impressively rapid development of Generative AI technologies, with the likes of OpenAI standing out prominently. These advanced technologies truly shine when they are paired with other, more mature AI technologies that have been in the market for several years. A notable example is the Azure AI Search technology, which has been refined with semantic search capabilities.

The fusion of these two technologies allows for the implementation of Retrieval Augmented Generation (RAG) solutions. In this setup, Azure Search takes the first step by extracting the “top documents”. Leveraging its ability to index millions or even billions of documents with remarkable efficiency, Azure Search overcomes the token limitations inherent to OpenAI’s technology.

Then, Azure OpenAI comes into play. It scrutinizes these top documents, performing a deep-dive search within them to extract precise answers. The responses generated in this way are not just accurate, but also clear and narrative, showcasing the unique capabilities of Azure’s AI.

In this article I will delve into the intricacies of this powerful tech alliance.

We will explore how combining the innovative prowess of Generative AI with the tried-and-tested robustness of Cognitive Search can open new frontiers in the AI landscape.

Up until now, we had grown accustomed to implementing our RAG solutions in two distinct steps: first, the extraction from Azure Search, followed by the invocation of OpenAI.

Moreover, nestled between these two steps, we began to incorporate embeddings (the transformation of text into arrays of decimal numbers), typically constructed through OpenAI's "text-embedding-ada-002" deployment. This innovation further enhanced the semantic search capability in the initial phase of retrieval.
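To make the contrast with what follows concrete, here is a minimal sketch of that manual embedding step using the openai Python package (v1.x style); the endpoint, key and API version are placeholders, not values taken from this article:

# Hypothetical sketch of the "manual" embedding step that the rest of the
# article makes unnecessary; endpoint, key and API version are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com/",
    api_key="<your-azure-openai-key>",
    api_version="2023-05-15",
)

response = client.embeddings.create(
    model="text-embedding-ada-002",  # the deployment name
    input="Azure AI Search and OpenAI together",
)
embedding = response.data[0].embedding  # a list of 1,536 floats
print(len(embedding))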

In recent weeks, technological advancements have propelled us further forward, introducing two new features.

  1. The automatic generation of embeddings (referred to as vectorization), made possible by a new Azure Search (now called "Azure AI Search") skill named AzureOpenAIEmbeddingSkill. This eliminates the need to compute them manually.
  2. The projection of arrays into a secondary index. Previously this was a manual task: Azure AI Search imports files and deposits each one in its entirety into a "search document", by default in its "content" field. This required the SplitSkill to break the content into chunks, which ended up in a collection-of-strings field (Collection(Edm.String)) but still within the same search document. We then had to read the chunks array and transfer each of its elements into a simple string field (Edm.String) of another index, so that our searches could return the "top chunks" instead of the "top documents" and stay within the limit on the number of tokens OpenAI can handle.

These advancements in Azure AI Search are ushering in a new era of efficiency and precision, reducing the amount of manual labor and enhancing the quality of search results. In this article, I will delve deeper into these exciting developments in the realm of Generative AI.

Preamble: This tutorial focuses on the advanced features of Azure AI Search and Azure OpenAI, and therefore presumes that we already have a basic understanding of these services. Without this foundational knowledge, this article could become overly lengthy. For in-depth exploration of these two topics, we can use the following starting points respectively: https://learn.microsoft.com/azure/search/ and https://learn.microsoft.com/azure/ai-services/openai/.

Now, let’s roll up our sleeves and get started :-). To guide you step by step through these processes, we’ll start with a simple example in which we vectorize “Title”, a string field (Edm.String type).

To achieve automatic text-to-embedding conversion, and then submit a question that is itself automatically transformed into an embedding and compared against all the vectors of the index through cosine similarity, we need to operate in three areas:

  1. Index: Three considerations are necessary:
    1.A) The index must have a field of type Collection(Edm.Single).
    1.B) The "vectorSearchProfile" parameter of that field must refer to a vector search profile.
    1.C) We will need to write the configuration of this vector profile in the index itself, specifying its name, the search algorithm and the vectorization service. More specifically:
    — As for the algorithm, the only two choices available today are exhaustive k-nearest neighbors (KNN), which effectively performs a brute-force search and is intended for scenarios where high recall is of utmost importance and we are willing to accept the trade-off in search performance, and Hierarchical Navigable Small World (HNSW), which is more efficient at query time (and more costly at indexing time), as it implements an approximate nearest neighbor (ANN) search in high-dimensional spaces.
    — Regarding the vectorization service, we will use Azure OpenAI of course, and specifically its "text-embedding-ada-002" deployment. So here is the most relevant part of the index definition supporting embeddings:
{
"name": "{{projection_index1_name}}",
"defaultScoringProfile": null,
"fields": [
{...},
{
"name": "title_embeddings",
"type": "Collection(Edm.Single)",
"searchable": true,
"filterable": false,
"retrievable": true,
"sortable": false,
"facetable": false,
"key": false,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": null,
"normalizer": null,
"dimensions": 1536,
"vectorSearchProfile": "my-vectorSearch-profile",
"synonymMaps": []
},
{...}
],
"vectorSearch": {
"profiles": [
{
"name": "my-vectorSearch-profile",
"algorithm": "my-vectorSearch-algorithm",
"vectorizer": "my-embeddings-vectorizer"
}
],
"algorithms": [
{
"name": "my-vectorSearch-algorithm",
"kind": "hnsw",
"hnswParameters": {
"m": 4,
"metric": "cosine",
"efConstruction": 400,
"efSearch": 500
}
}
],
"vectorizers": [
{
"name": "my-embeddings-vectorizer",
"kind": "azureOpenAI",
"azureOpenAIParameters": {
"resourceUri": "https://m*****04.openai.azure.com/",
"deploymentId": "text-embedding-ada-002",
"apiKey": "23b*******************3d"
}
}
]
}
}
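As a side note, here is a minimal sketch of how such an index definition can be pushed to the service through the REST API; the endpoint, admin key, file name and API version are placeholders, not part of the original walkthrough:

# Hypothetical helper that creates/updates the index defined above.
# Endpoint, key, file name and API version are placeholders.
import json
import requests

endpoint = "https://<your-search-service>.search.windows.net"
headers = {"Content-Type": "application/json", "api-key": "<your-admin-key>"}
params = {"api-version": "2023-10-01-Preview"}  # a preview version that supports vectorizers

with open("projection-index1.json") as f:  # the definition shown above, with real values
    index_definition = json.load(f)

r = requests.put(
    f"{endpoint}/indexes/{index_definition['name']}",
    headers=headers,
    params=params,
    data=json.dumps(index_definition),
)
print(r.status_code)  # 201 when created, 204 when updated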

2. Skills: We add to the skillset the specific skill that performs the embedding, which again relies on OpenAI's "text-embedding-ada-002" deployment.

A good question might be: why do we need to reference this service again?

The reason is that the model is needed at two different moments: the skill defined here computes the embeddings of all "Title" field values during the indexing phase, while the vectorizer configured in the index transforms the query into an embedding at search time, so that the cosine similarity search between vectors can be performed. Note that the "embedding" output produced by the skill is mapped to the "title_embeddings" field of the index, which was created in the previous step:

{
"@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
"description": "Connects a deployed embedding model.",
"resourceUri": "https://mm******04.openai.azure.com/",
"deploymentId": "text-embedding-ada-002",
"apiKey": "23b*******************3d",
"inputs": [
{
"name": "text",
"source": "/document/title"
}
],
"outputs": [
{
"name": "embedding",
"targetName": "title_embeddings"
}
]
}

3. Indexer: Lastly, in the outputFieldMappings section of the indexer, we set the mapping between

  • the output field of the skill (title_embeddings in the example above), which contains an array and is referenced as /document/title_embeddings/*, and
  • the title_embeddings field of type Collection(Edm.Single) in the index:
"outputFieldMappings": [
{...},
{
"sourceFieldName": "/document/title_embeddings/*",
"targetFieldName": "title_embeddings"
},
{...}
]

Well, this part is complete. As we can see from the result below, in addition to the "title" field that Azure AI Search has populated, we now have the "title_embeddings" field that contains its vectorization, an array of 1,536 elements (just four of them shown here):

This is a crucial step in the process as it enables us to translate our raw text data into a format that can be understood and utilized by our AI models. The “title_embeddings” field specifically represents a high-dimensional vector space where similar items are located close to each other. This embedded representation allows us to perform more complex queries and operations on our data such as semantic search, similarity matching, and clustering. Here is a graphical representation of the process we implemented above:

We have therefore learned how to create embeddings and configure the index to perform vector searches, letting the fully managed backend service solve these challenges in a scalable way without writing any custom code. We only pass textual information to the index (via documents) and we only ever submit textual questions to the search engine; under the hood, Azure AI Search transparently takes care of creating the embeddings, keeping them up to date, and running the search, always returning the answer in text format. I can assure you that for those who, as I did with some of my customers, implemented these mechanisms through code in recent months, this is a gigantic step forward.
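To give an idea of what this looks like from the client side, here is a minimal query sketch that relies on the vectorizer we just configured: we send plain text and the service computes the query embedding for us. Endpoint, key, index name and API version are placeholders; the vector field is the "title_embeddings" field defined above:

# Hypothetical query: with "kind": "text" the index vectorizer embeds the
# question, so the client never computes embeddings itself.
import json
import requests

endpoint = "https://<your-search-service>.search.windows.net"
headers = {"Content-Type": "application/json", "api-key": "<your-query-key>"}
params = {"api-version": "2023-10-01-Preview"}

payload = {
    "select": "title",
    "vectorQueries": [
        {
            "kind": "text",               # text in, embedding computed by the service
            "text": "papers about John von Neumann",
            "fields": "title_embeddings",
            "k": 5,
        }
    ],
}

r = requests.post(
    f"{endpoint}/indexes/<your-index-name>/docs/search",
    headers=headers,
    params=params,
    data=json.dumps(payload),
)
for doc in r.json()["value"]:
    print(round(doc["@search.score"], 3), doc["title"])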

OK, we're warmed up. But now we're ready to play hard: the scenario I want to show you next goes much further, covering the following key aspects:

1) We can't afford to embed the entire "content" field of every file that Azure AI Search has ingested, because this could exceed the token limit when we later pass that content to OpenAI. And even if OpenAI could handle that many tokens, the text would probably still be too long to get a good answer, compared to using only the most significant portions of it.

2) Since the solution to the previous point is to “break” the “content” field into multiple chunks, we will then have to perform the embedding not of a simple string, but of an array of strings.

3) Furthermore, we can't forget that Azure AI Search can add other spectacular skills to the skillset, such as entity extraction (people / places / e-mails / URLs…), key phrase extraction, translation, sentiment analysis and OCR, as well as the possibility to add custom skills. Now we will see how to combine these services into a single skillset and, as the cherry on the cake, split all the content enriched by our pipeline into chunks and then turn those chunks into embeddings!

Let's think for a moment about the sequence in which to order the objectives presented in these three points. Once it's clear that chunking is a fundamental operation, I'm sure we all agree that it would be useful to perform chunking not only on the textual content of the document, but on the complete content enriched with the text extracted from any images found in the document. For this reason, the first skill will be the OcrSkill, immediately followed by the MergeSkill (to actually "merge" the text of the images into its position within the document) and finally the SplitSkill. The LanguageDetectionSkill is also needed: it is a small detail to implement, but very important if our documents may be written in different languages.

Well, we have used the first four skills. Is that enough? Not at all! So let's move on to the next "enrichment skills", which we will apply to the outputs of the SplitSkill (which I will refer to as chunks).

More specifically, we'll leverage the EntityRecognitionSkill to extract entities such as people, e-mails, URLs and locations, and the KeyPhraseExtractionSkill to extract key phrases.

Now, let's think about the point I mentioned two paragraphs above: these last skills apply to the output of the SplitSkill (i.e. the chunks array), NOT to a single-value field; so we need to find a way to "project" all these values into a secondary index, where each "row" (the correct term is "search document") of the Azure AI Search index corresponds to a chunk. Well, this feature, namely Index Projection, is implemented in the skillset (attention: not in the indexer!), and specifically in its "indexProjections" section, as we can see in my skillset implementation:

{
"name": "{{projection_skillset_name}}",
"description": "Detect language, run OCR, merge content+ocr_text, split in chunks, extract entities and key-phrases and embed the chunks",
"skills": [
{
"@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
"name": "LanguageDetectionSkill",
"context": "/document",
"description": "If you have multilingual content, adding a language code is useful for filtering",
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "languageCode",
"targetName": "language"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
"name": "OcrSkill",
"description": null,
"context": "/document/normalized_images/*",
"textExtractionAlgorithm": null,
"lineEnding": "Space",
"defaultLanguageCode": "en",
"detectOrientation": true,
"inputs": [
{
"name": "image",
"source": "/document/normalized_images/*"
}
],
"outputs": [
{
"name": "text",
"targetName": "text_from_ocr"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.MergeSkill",
"name": "MergeSkill",
"description": null,
"context": "/document",
"insertPreTag": " ",
"insertPostTag": " ",
"inputs": [
{
"name": "text",
"source": "/document/content"
},
{
"name": "itemsToInsert",
"source": "/document/normalized_images/*/text_from_ocr"
},
{
"name": "offsets",
"source": "/document/normalized_images/*/contentOffset"
}
],
"outputs": [
{
"name": "mergedText",
"targetName": "merged_text"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.SplitSkill",
"name": "SplitSkill",
"context": "/document",
"textSplitMode": "pages",
"maximumPageLength": 5000,
"defaultLanguageCode": "en",
"inputs": [
{
"name": "text",
"source": "/document/merged_text"
},
{
"name": "languageCode",
"source": "/document/language"
}
],
"outputs": [
{
"name": "textItems",
"targetName": "chunks"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
"name": "LanguageDetectionSkill_by_chunk",
"context": "/document/chunks/*",
"description": "If you have multilingual content, adding a language code is useful for filtering",
"inputs": [
{
"name": "text",
"source": "/document/chunks/*"
}
],
"outputs": [
{
"name": "languageCode",
"targetName": "chunk_language"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
"name": "EntityRecognitionSkill",
"description": null,
"context": "/document/chunks/*",
"categories": [
"Person",
"URL",
"Email"
],
"defaultLanguageCode": "en",
"minimumPrecision": 0.5,
"modelVersion": null,
"inputs": [
{
"name": "text",
"source": "/document/chunks/*"
},
{
"name": "languageCode",
"source": "/document/chunks/*/chunk_language"
}
],
"outputs": [
{
"name": "persons",
"targetName": "persons"
},
{
"name": "urls",
"targetName": "urls"
},
{
"name": "emails",
"targetName": "emails"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
"defaultLanguageCode": "",
"modelVersion": "",
"name": "KeyPhraseExtractionSkill",
"description": "",
"context": "/document/chunks/*",
"inputs": [
{
"name": "text",
"source": "/document/chunks/*"
},
{
"name": "languageCode",
"source": "/document/chunks/*/chunk_language"
}
],
"outputs": [
{
"name": "keyPhrases",
"targetName": "key_phrases"
}
]
}
],
"cognitiveServices": {
"@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
"description": "{{COG_SERVICES_NAME}}",
"key": "{{COG_SERVICES_KEY}}"
},
"indexProjections": {
"selectors": [
{
"targetIndexName": "{{projection_index2_name}}",
"parentKeyFieldName": "ParentKey",
"sourceContext": "/document/chunks/*",
"mappings": [
{
"name": "title",
"source": "/document/title"
},
{
"name": "content",
"source": "/document/chunks/*"
},
{
"name": "language",
"source": "/document/chunks/*/chunk_language"
},
{
"name": "persons",
"source": "/document/chunks/*/persons"
},
{
"name": "urls",
"source": "/document/chunks/*/urls"
},
{
"name": "emails",
"source": "/document/chunks/*/emails"
},
{
"name": "key_phrases",
"source": "/document/chunks/*/key_phrases"
}
]
}
]
}
}

Yeah, I know, the above JSON is quite long, and I'm sorry for that. By the way, you may notice that it even contains the LanguageDetectionSkill twice, but that's not a mistake: I use it first on the whole document (its output feeds the SplitSkill), and then again on each single chunk, because when extracting key phrases and entities from a chunk it's good practice to pass the specific chunk language to those skills too, in case different chunks of the same document are written in different languages. To make it easier to read the skillset definition above, here is a pretty representation of its indexProjections section, showing how I mapped the skills outputs to the index fields:

There is an important detail to address now: every Azure AI Search index must have a “key” field, as we well know.

Indeed, in the first index I ensured the uniqueness of this field by taking the absolute path of the file, conveniently encoded in BASE64 format. That's fine for the first index, but this field is no longer suitable for the second index because, as we have seen, it wouldn't be unique any more, since each document may be broken down into several chunks. However, if we properly inform Azure AI Search about this point, it will use the following technique to secure key uniqueness: it will deterministically "build" the value of the key field by combining a hash of the document, the value of the key field of the primary index, and a sequential number corresponding to the chunk number within that document: voilà, uniqueness is assured!

And how do we “properly inform” Azure AI Search about this fact?

  • we add a field (I called it ParentKey) to the secondary index that will host the value of the key field (id) of the primary index,
  • we specify the name of this field in the configuration of the “Projection” section of the skillset (“parentKeyFieldName”: “ParentKey”),
  • we assign the "Keyword" value to the "Analyzer" property of the key field, as sketched below.
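Here is a minimal sketch of those two field definitions, expressed as Python dictionaries ready to be merged into the "fields" array of the secondary index; the field names are the ones used in this article, while the remaining attributes are assumptions:

# Sketch of the key field and of the ParentKey field of the secondary index.
# Field names follow the article; the other attributes are assumptions.
key_and_parent_fields = [
    {
        "name": "id",             # key field: its values are generated by the index projection
        "type": "Edm.String",
        "key": True,
        "searchable": True,
        "analyzer": "keyword",    # the "Keyword" analyzer mentioned above
    },
    {
        "name": "ParentKey",      # receives the key (id) of the parent document
        "type": "Edm.String",
        "searchable": True,
        "filterable": True,       # handy for retrieving all the chunks of a given document
        "retrievable": True,
    },
]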

In summary, this configuration ensures that, during the execution of the indexer, the values that would otherwise end up in an array are written to separate search documents in the secondary index.

Ok, but there’s still one element missing: the embedding of the chunk.

This complicates things a bit, because an embedding, as we know, is already an array (of 1,536 decimal numbers), and the chunks form an array too: as a result, the embeddings of the chunks are in fact "arrays of arrays". But that's not a problem, because in the secondary index we have prepared precisely a vector field for the embeddings, so the projection will simply send each "array of numbers" into the content_embedded field of the secondary index!
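For reference, here is a minimal sketch of that vector field in the secondary index; it mirrors the "title_embeddings" field of the first example, and the profile name is assumed to be the same one configured earlier:

# Sketch of the secondary-index field that receives each chunk embedding;
# the attribute values mirror the first example and are assumptions.
content_embedded_field = {
    "name": "content_embedded",
    "type": "Collection(Edm.Single)",
    "searchable": True,
    "retrievable": True,
    "dimensions": 1536,                               # output size of text-embedding-ada-002
    "vectorSearchProfile": "my-vectorSearch-profile",
}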

And how do we compute the embedding of each chunk? With the AzureOpenAIEmbeddingSkill of course, and we know how to do this because we used it in the first exercise at the beginning of this article, remember? Anyway, for your convenience I'm reporting here the skill you should add to the skills section of the skillset (please also remember to update the mappings section of the indexProjections.selectors branch of the skillset):

{
"@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
"name": "AzureOpenAIEmbeddingSkill",
"description": null,
"context": "/document/chunks/*",
"resourceUri": "https://m*********4.openai.azure.com",
"apiKey": "23*******************3d",
"deploymentId": "text-embedding-ada-002",
"inputs": [
{
"name": "text",
"source": "/document/chunks/*"
}
],
"outputs": [
{
"name": "embedding",
"targetName": "chunk_embedded"
}
],
"authIdentity": null
}

At this point, we might ask ourselves a Hamlet-like question: why would we still need the first index? :-|

And the answer is that it is actually not necessary at all, if our only aim is to use it to build the second index! That said, in certain contexts it may still make sense to maintain a "general index" with one search document per source file, where we collect higher-level information; this can be useful for small documents, for example. But I repeat, it is not mandatory at all.

So, if we don't want it, we simply specify (within the indexer) the name of the secondary (and, in this case, only!) index rather than the primary index name; furthermore, we must make sure to assign an empty array to the "outputFieldMappings" parameter of the indexer definition:

{
"name": "{{projection_indexer_name}}",
"dataSourceName": "{{projection_datasource_name}}",
"targetIndexName": "{{projection_index2_name}}",
"skillsetName": "{{projection_skillset_name}}",
"fieldMappings": [
{
"sourceFieldName": "metadata_storage_path",
"targetFieldName": "id",
"mappingFunction": {
"name": "base64Encode"
}
},
{
"sourceFieldName": "metadata_title",
"targetFieldName": "title"
}
],
"outputFieldMappings": [],
"parameters": {
"maxFailedItems": -1,
"maxFailedItemsPerBatch": -1,
"configuration": {
"dataToExtract": "contentAndMetadata",
"parsingMode": "default",
"imageAction": "generateNormalizedImages"
}
}
}

This is because the index won't be filled through the indexer: it will actually be filled through the indexProjections / selectors section of the skillset:

{
"name": "projection-skillset",
...
"knowledgeStore": null,
"indexProjections": {
"selectors": [
{
"targetIndexName": "{{projection_index2_name}}",
"parentKeyFieldName": "ParentKey",
"sourceContext": "/document/chunks/*",
"mappings": [
{
"name": "title",
"source": "/document/title"
},
{
"name": "content",
"source": "/document/chunks/*"
},
{
"name": "language",
"source": "/document/chunks/*/chunk_language"
},
{
"name": "persons",
"source": "/document/chunks/*/persons"
},
{
"name": "urls",
"source": "/document/chunks/*/urls"
},
{
"name": "emails",
"source": "/document/chunks/*/emails"
},
{
"name": "key_phrases",
"source": "/document/chunks/*/key_phrases"
},
{
"name": "content_embedded",
"source": "/document/chunks/*/chunk_embedded"
}
]
}
]
}
}

In essence, the decision to use one or two indices depends on our specific use case and requirements: a single index can simplify the configuration and management of our Azure AI Search solution, while maintaining a separate general index can be beneficial for storing high-level information and providing a broader view of our data, especially for smaller documents. With Azure AI Search, we have the flexibility to choose the approach that best suits our needs.

So this is the final result, where we can see, for each search document of the secondary (and possibly only) index, all the related elements we need, namely its embeddings, key phrases, entities, and the text extracted from the images contained therein:

By the way, in this case we have implemented a single field for embeddings; however, Azure AI Search allows for the creation of multiple embedding fields, thus enabling multi-dimensional analysis based on different types of embeddings.
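For example, nothing prevents us from defining two vector fields side by side, each bound to its own vector search profile and, potentially, to a different embedding deployment; the following is purely an illustrative sketch with made-up names:

# Illustrative only: two vector fields in the same index, each pointing to a
# hypothetical vector search profile of its own.
multiple_vector_fields = [
    {
        "name": "content_embedded",
        "type": "Collection(Edm.Single)",
        "searchable": True,
        "dimensions": 1536,
        "vectorSearchProfile": "content-vector-profile",
    },
    {
        "name": "title_embedded",
        "type": "Collection(Edm.Single)",
        "searchable": True,
        "dimensions": 1536,
        "vectorSearchProfile": "title-vector-profile",
    },
]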

We have completed all the tasks required to input all our documents into the index, divided into chunks and enriched through OCR and entity extraction. So, to wrap up the article, all that’s left is to see if it works — shall we give it a go?

Be warned, this is no trivial task, because our aim isn’t just to search the Azure Search index to get a bunch of results containing what we’re interested in. No, we want to ASK a QUESTION and get a SPECIFIC ANSWER, extracted from the most relevant information resulting from a search within the index.

To achieve this goal, we currently have two possible routes:

1. Two steps: we make a first call to Azure AI Search to extract the “top chunks” that are then passed to OpenAI through a second call. This is well illustrated in the Azure Cognitive Search & OpenAI Solution Accelerator overseen by my colleague Pablo Marin. I won’t go into the details now as they are very well explained in the third notebook and in the get_search_results function of utils.py, leveraging some LangChain libraries dedicated to retrieval-augmented generation. However, here is a Python snippet that allows us to quickly test our solution:

# Collect constants
import os
from dotenv import load_dotenv
load_dotenv("credentials_my.env")

# Text-based Indexes that we are going to query (from Notebook 01 and 02)
index = "projection-index2"

# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
os.environ["OPENAI_API_TYPE"] = os.environ["OPENAI_API_TYPE"]
MODEL = os.environ["COMPLETION432_DEPLOYMENT"]
COMPLETION_TOKENS = 1000

top_search_vector_k = 5

# the question to ask - enable just one
# QUESTION = "Tell me briefly what are the advantages of defining compositionality using MDL."
# QUESTION = "What is a meaning function?"
QUESTION = "Why is von Neumann is considered as one of fathers of modern computers?"



# Azure AI Search query
import requests, json

# Setup the Payloads header
headers = {'Content-Type': 'application/json','api-key': os.environ['AZURE_SEARCH_KEY']}
params = {'api-version': os.environ['AZURE_SEARCH_API_VERSION_NEW']} # NEW VERSION!!!

# search query payload
search_payload = {
"count": "true",
"speller": "lexicon",
"queryLanguage": "en-us",

"top": top_search_vector_k,
"select": "id, name, title, location, chunk",
"vectorQueries": [
{
"kind": "text",
"k": top_search_vector_k,
"fields": "chunkVector",
"text": QUESTION
}
]
}

r = requests.post(os.environ['AZURE_SEARCH_ENDPOINT'] + "/indexes/" + index + "/docs/search",
data=json.dumps(search_payload), headers=headers, params=params)



# Format Azure AI Search results as an ordered dictionary
from collections import OrderedDict

ordered_results = OrderedDict()

for result in r.json()['value']:
    if result['@search.score'] > 0.25:  # Show answers that are at least 25% of the max possible score=1
        ordered_results[result['id']] = {
            "title": result['title'],
            "name": result['name'],
            "location": result['location'],
            "index": index,
            "chunk": result['chunk'],
            "score": result['@search.score']
        }



# From the ordered dictionary, build the "Documents" array, needed by langchain "qa_with_sources" chain
from langchain.docstore.document import Document

top_docs = []
for key, value in ordered_results.items():
    location = value["location"] + os.environ['BLOB_SAS_TOKEN']  # to create "citations"
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location}))



# Prepare the Open AI deployment
from langchain.chat_models import AzureChatOpenAI

llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0, max_tokens=COMPLETION_TOKENS)



# Extract the specific answer from the Documents array, using Open AI through Langchain
from langchain.chains.qa_with_sources import load_qa_with_sources_chain

# Choose either "stuff" or "map_reduce", depending on how long the text is. Leave just one uncommented:
chain_type = "stuff"  # the LLM is able to manage the prompt + request + response in a single call
# chain_type = "map_reduce"  # the LLM needs to split the prompt + request into multiple calls

if chain_type == "stuff":
    chain = load_qa_with_sources_chain(llm, chain_type=chain_type)
elif chain_type == "map_reduce":
    chain = load_qa_with_sources_chain(llm, chain_type=chain_type, return_intermediate_steps=True)

response = chain({"input_documents": top_docs, "question": QUESTION, "language": "English"})



# Print the final result, including the citation(s)
from IPython.display import display, HTML, Markdown

if chain_type == "map_reduce":
    for step in response['intermediate_steps']:
        display(HTML("<b>Chunk Summary:</b> " + step))

display(HTML(f"<br/><br/><b>FINAL ANSWER with --{chain_type}-- chain type:</b>"))
display(Markdown(response['output_text']))

This generates the following output:

FINAL ANSWER with — stuff — chain type:

John von Neumann is considered as one of the “fathers” of modern computers due to his theoretical works and the development of the computer EDVAC between 1945 and 1950. He also participated in drawing the logic of quantum mechanics, which is now being used in the development of quantum computers. His work in these areas has contributed significantly to the field of computer science and the development of modern computers. SOURCES: https://myblobstorageaccount.blob.core.windows.net/arxivcs2/0001/0001001v1.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zC*****k%2B0%3D

2. We make a single call to OpenAI, passing in the payload all the information needed to access Azure Search (its endpoint, authentication method, index name…), so that it can extract the top chunks from the vector index we built in Azure AI Search.

At this point, the question arises: Who calls Who?

…and the answer is that OpenAI calls Azure Search: there's no doubt about it, as we've only invoked the Azure OpenAI endpoint. So here's how to implement this second method, which is certainly more practical than the previous one, even if potentially less flexible and controllable; to be frank, I haven't tested it that much, but I know we may encounter some challenges if Azure Search is located behind a private endpoint. However, here is a Python snippet that I tested successfully; as we can see, it's much shorter than the previous one, but it gets exactly the same answer we got earlier:

# the question to ask - enable just one
# QUESTION = "Tell me briefly what are the advantages of defining compositionality using MDL."
# QUESTION = "What is a meaning function?"
QUESTION = "Why is von Neumann is considered as one of fathers of modern computers?"

# Azure AI Search query
import requests, json

# Setup the Payloads header
headers = {'Content-Type': 'application/json','api-key': os.environ['AZURE_OPENAI_API_KEY']}
params = {'api-version': '2023-12-01-preview'}

# search query payload
search_payload = {
"messages": [
{
"role": "user",
"content": QUESTION
}
],
"temperature": 0,
"top_p": 1.0,
"max_tokens": 1000,
"dataSources": [
{
"type": "AzureCognitiveSearch",
"parameters": {
"endpoint": os.environ['AZURE_SEARCH_ENDPOINT'],
"key": os.environ['AZURE_SEARCH_KEY'],
"indexName": index
}
}
]
}

r = requests.post(
f"{os.environ['AZURE_OPENAI_ENDPOINT']}openai/deployments/{MODEL}/extensions/chat/completions",
data=json.dumps(search_payload), headers=headers, params=params
)

print(r.json()['choices'][0]['message']['content'])

We have arrived at the conclusions.

In this article, we discussed how we can leverage Azure AI Search to create embeddings, chunk data, perform entity extraction, and implement OCR, among other things. We looked at how the projection feature can be used to manage data at a granular level, allowing us to create a secondary index that holds the chunk and its related embeddings, key phrases, entities, and text extracted from images contained therein.

Finally, we discussed the decision to use one or two indices, and how it depends on the specific use case and requirements. And we closed the article by showing how to get the final result with a single call to OpenAI, rather than calling Azure AI Search separately: after creating the index, we saw how to configure Azure OpenAI to query it under the hood and provide the final answer back, without requiring the developer or data scientist to implement any intermediate steps.

To sum up, Azure AI Search is a powerful tool that enables advanced search and analysis capabilities on large volumes of textual data. By integrating AI, it provides a scalable solution to transform and analyze data, helping to generate more relevant and insightful results.

So, the only thing left for us to do is to put this knowledge into practice and start searching! I hope this guide has been helpful in understanding the potential and flexibility of Azure AI Search. Happy searching!

--


Mauro Minella

Big Data and AI Architect @ Microsoft. Happy to teach Statistics and Big Data at Cattolica University. Professional violinist in his first 25 years.