Unleashing the Power of Semantic Kernel and Azure Cognitive Search: A Step-by-Step Guide to Building Your Own ChatGPT-like App with Internal Data! — Part 1

Akshay Kokane
5 min read · Jun 30, 2023


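This article is divided into two parts. Part 1 (this post) covers creating the Azure Cognitive Search index. Part 2, linked at the end, covers building the ChatGPT-like app with Semantic Kernel on top of that index.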

Semantic Kernel is an open-source SDK that makes it easy to combine AI services like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C# and Python. By doing so, you can create AI apps that combine the best of both worlds.

In this blog post, we will show you how to use Semantic Kernel to build an AI-powered app. We will start by discussing the benefits of using Semantic Kernel, and then we will walk you through the steps involved in building an app. You can learn more about it here: https://learn.microsoft.com/en-us/semantic-kernel/overview/

Benefits of Using Semantic Kernel:

  • Fast integration: Semantic Kernel is designed to be embedded in any kind of application, making it easy for you to test and get running with LLM AI.
  • Extensibility: With Semantic Kernel, you can connect with external data sources and services — giving your apps the ability to use natural language processing in conjunction with live information.
  • Better prompting: SK’s templated prompts let you quickly design semantic functions with dynamic content and behavior.
  • Reusable code: Semantic Kernel’s skills are reusable, so you can easily build new apps by combining existing skills.

Prerequisites:

  1. Knowledge of semantic search and embeddings: https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
  2. An Azure subscription (you can use free credits), plus the endpoint and key of an Azure Cognitive Search instance
  3. An OpenAI endpoint and key
  4. .NET SDK 7 and Visual Studio Code

What is Semantic Search?

Semantic search is a way for an AI assistant to understand the meaning behind words and provide more relevant answers based on context.

What are Vectors?

Vectors, in the context of semantic search, are mathematical representations of words or documents. They capture semantic meaning and context by mapping words or documents to numerical vectors in a high-dimensional space.
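To make the idea of "closeness in a vector space" concrete, here is a minimal sketch in Python. The three-dimensional vectors are hand-made for illustration only; real embeddings come from a model and have hundreds or thousands of dimensions. Cosine similarity is one common way to compare vectors, not the only one.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hand-made toy "embeddings" (invented values, not produced by any model).
vectors = {
    "invoice": [0.9, 0.1, 0.0],
    "bill": [0.8, 0.2, 0.1],
    "holiday": [0.0, 0.2, 0.9],
}

# Related words should score higher than unrelated ones.
print("invoice vs bill:   ", cosine_similarity(vectors["invoice"], vectors["bill"]))
print("invoice vs holiday:", cosine_similarity(vectors["invoice"], vectors["holiday"]))
```

With these toy values, "invoice" and "bill" point in nearly the same direction, while "invoice" and "holiday" are almost orthogonal.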

What is VectorDB?

Vector DB is a database that stores data in a way that represents the relationships and similarities between different pieces of information, making it easier to retrieve relevant results in semantic search.
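As a rough mental model, a vector database stores (id, vector) pairs and returns the entries nearest to a query vector. The sketch below is a toy in-memory version with a brute-force scan; the class name `TinyVectorStore` and the sample vectors are made up for illustration, and real vector databases use approximate indexes rather than scanning every entry.

```python
import math


class TinyVectorStore:
    """Toy in-memory store: keeps vectors and ranks them by cosine similarity."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self._items.append((doc_id, list(vector)))

    def search(self, query, top_k=2):
        """Return up to top_k (doc_id, score) pairs, best match first."""
        scored = [(self._cosine(query, vec), doc_id) for doc_id, vec in self._items]
        scored.sort(reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:top_k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0


store = TinyVectorStore()
store.add("expense-policy", [0.9, 0.1, 0.0])
store.add("travel-policy", [0.6, 0.4, 0.1])
store.add("holiday-calendar", [0.0, 0.2, 0.9])

# A query vector close to the "expense" direction.
for doc_id, score in store.search([1.0, 0.0, 0.0], top_k=2):
    print(doc_id, round(score, 3))
```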

Why Azure Cognitive Search and not Vector DB?

Azure Cognitive Search offers a built-in semantic search capability that can serve as an alternative to a dedicated vector database. Its main advantage for this use case is that you do not need to generate, store, or manage embeddings yourself. The service is fully managed, so there is no search infrastructure for you to operate.

Vector embeddings represent data as a vector of numerical values, so that semantic similarity as perceived by humans translates into proximity in a vector space. With Azure Cognitive Search, you can use this kind of meaning-based matching to improve your search experience.

Goal:

Build a chatbot that uses the knowledge in your firm’s internal documents to answer questions quickly.

Architecture:

Part 1: Create Azure Cognitive Search Index

To create an Azure Cognitive Search index, follow these steps:

1. Create an Azure Cognitive Search service: If you haven’t already, create an Azure Cognitive Search service in the Azure portal. Go to the Azure portal (portal.azure.com), click on “Create a resource,” search for “Azure Cognitive Search,” and follow the prompts to create a new service.

2. Prepare your data: Before creating an index, you need to have your data ready. Azure Cognitive Search supports various data sources, including Azure Blob storage, Azure Cosmos DB, Azure SQL Database, and more. Ensure your data is structured and available in one of the supported sources.

Simple Pipeline Flow:

  1. Data Collection: Regularly collect information from the specified sources.
  2. Data Cleaning: You can tidy up the data using tools like Databricks/Synapse notebooks. If you prefer not to write code, you can use DataFlow for data cleaning.
  3. Data Chunking: Some documents are very long, so it helps to break them into smaller parts and store those parts separately for better performance.
  4. Data Transformation: Semantic Kernel needs data in a specific memory format. To meet this requirement, convert the data to the format expected in Azure Cognitive Search (ACS) and import it into ACS. You can check the format in the screenshot in Step 4.
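The chunking and transformation steps of the pipeline can be sketched as follows. This is a minimal illustration, not the pipeline used in this article: it splits on whitespace with a small overlap between chunks, and the record field names (`Id`, `Text`, `Description`, `ExternalSourceName`) are assumptions that you should replace with the fields shown in the Step 4 screenshot.

```python
def chunk_text(text, max_words=200, overlap=20):
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears with some context in the next chunk.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in the range [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def to_records(doc_id, text, source_name, **chunk_args):
    """Turn one document into a list of index-ready records, one per chunk.

    Field names are hypothetical; align them with your index schema.
    """
    return [
        {
            "Id": f"{doc_id}-{i}",
            "Text": chunk,
            "Description": f"Chunk {i} of {doc_id}",
            "ExternalSourceName": source_name,
        }
        for i, chunk in enumerate(chunk_text(text, **chunk_args))
    ]


sample = " ".join(f"w{i}" for i in range(10))
for record in to_records("handbook", sample, "hr-share", max_words=4, overlap=1):
    print(record["Id"], "->", record["Text"])
```

Word-based splitting is the simplest option; splitting on paragraphs or on token counts may fit your documents better.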

3. Define an index schema: An index schema defines the fields and attributes of your data that will be searchable. Determine the fields you want to include in your index, their data types, and any additional properties such as searchability, filtering, sorting, etc. You can define the schema using JSON or use the Azure portal’s visual designer.
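For readers who prefer to define the schema as JSON, here is a hedged sketch of what such a definition might look like, built as a Python dictionary. The index name `internal-docs` and the field names are assumptions chosen for illustration; the attribute names (`key`, `searchable`, `filterable`, `sortable`) and the `Edm.String` type follow the Azure Cognitive Search index definition format.

```python
import json

# Hypothetical index definition; adjust names and attributes to your data.
index_schema = {
    "name": "internal-docs",
    "fields": [
        {"name": "Id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "Text", "type": "Edm.String", "searchable": True},
        {"name": "Description", "type": "Edm.String", "searchable": True},
        {
            "name": "ExternalSourceName",
            "type": "Edm.String",
            "filterable": True,
            "sortable": True,
        },
    ],
}


def validate_schema(schema):
    """Basic sanity checks: unique field names and exactly one key field."""
    names = [field["name"] for field in schema["fields"]]
    if len(names) != len(set(names)):
        raise ValueError("field names must be unique")
    keys = [field["name"] for field in schema["fields"] if field.get("key")]
    if len(keys) != 1:
        raise ValueError("an index needs exactly one key field")
    return keys[0]


print("key field:", validate_schema(index_schema))
print(json.dumps(index_schema, indent=2))
```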

4. Create an index: There are multiple ways to create an index in Azure Cognitive Search, including the Azure Portal, the CLI, an SDK, or the REST API. I will use the Azure Portal for this article.

Screenshot of Azure Portal, when you Create Index
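If you would rather script this step than use the portal, the REST API route might look like the sketch below. It only builds the HTTP request and does not send it. The service name, key, and index definition are placeholders, and the `api-version` value is an assumption; check the current REST documentation before using it.

```python
import json
import urllib.request


def build_create_index_request(service_name, api_key, index_schema,
                               api_version="2020-06-30"):
    """Build (but do not send) a PUT request that creates or updates an index."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_schema['name']}?api-version={api_version}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(index_schema).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )


# Placeholder values for illustration only.
schema = {
    "name": "internal-docs",
    "fields": [
        {"name": "Id", "type": "Edm.String", "key": True},
        {"name": "Text", "type": "Edm.String", "searchable": True},
    ],
}
request = build_create_index_request("my-search-service", "<admin-key>", schema)
print(request.get_method(), request.full_url)

# To actually send it, with real credentials:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```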

5. Enable Semantic Search for your instance:

Screenshot of Portal, when you want to enable Semantic Search on your instance

6. Create Semantic Search Config on created index:

Screenshot of Azure Portal, when adding Semantic Configuration to your created index
Screenshot of Azure Portal, when Semantic Configuration is added to your index

Learn more about Semantic Configuration here.
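For reference, a semantic configuration can also be expressed as JSON inside the index definition. The sketch below builds that fragment in Python. The property names follow the preview REST API shape as I understand it, and the configuration and field names are hypothetical, so verify both against the documentation for the API version you use.

```python
import json


def semantic_config(name, title_field, content_fields, keyword_fields=()):
    """Build one semantic configuration entry for an index definition."""
    return {
        "name": name,
        "prioritizedFields": {
            "titleField": {"fieldName": title_field},
            "prioritizedContentFields": [{"fieldName": f} for f in content_fields],
            "prioritizedKeywordsFields": [{"fieldName": f} for f in keyword_fields],
        },
    }


# Fragment to merge into the index definition (names are illustrative).
semantic_section = {
    "semantic": {
        "configurations": [
            semantic_config(
                name="default-semantic-config",
                title_field="Description",
                content_fields=["Text"],
            )
        ]
    }
}

print(json.dumps(semantic_section, indent=2))
```

The content fields are listed in priority order, so put the field that holds the main body text first.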

7. Push data into the index: Once your index is created, you need to push your data into the index. Depending on your data source, you can use various methods such as data ingestion pipelines, Azure Data Factory, Azure Functions, or direct API calls to populate the index with your data.
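If you push data with direct API calls, documents are sent in batches, each wrapped in a `value` array with an `@search.action` per document. The sketch below only prepares those batch payloads; the record fields are placeholders, and the payloads would be POSTed to the index's `docs/index` endpoint with your own service details. A single request accepts at most 1,000 documents, which is why the helper caps the batch size.

```python
import json


def build_upload_batches(records, batch_size=1000, action="mergeOrUpload"):
    """Group records into indexing payloads of at most batch_size documents."""
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    batches = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        batches.append(
            {"value": [{"@search.action": action, **record} for record in batch]}
        )
    return batches


# Placeholder records for illustration only.
records = [{"Id": f"doc-{i}", "Text": f"chunk number {i}"} for i in range(5)]
payloads = build_upload_batches(records, batch_size=2)

print(len(payloads), "batches")
print(json.dumps(payloads[0], indent=2))
```

`mergeOrUpload` updates a document if its key already exists and inserts it otherwise, which makes repeated pipeline runs safe to re-execute.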

8. Monitor and refine the index: After data ingestion, monitor the index to ensure it is updated and reflects the latest changes in your data source. You can also refine the index by configuring search settings, custom analyzers, suggesters, scoring profiles, and more to optimize search results and relevance.

That’s it! You have successfully created an Azure Cognitive Search index. You can now start querying the index to perform semantic search for your AI application.
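As a quick check before moving on, you could send a semantic query to the new index. The sketch below builds only the request body; the property names (`queryType`, `semanticConfiguration`, `queryLanguage`) follow the preview REST API, some of them differ between API versions, and the configuration name is a placeholder for whatever you created in Step 6.

```python
import json


def build_semantic_query(question, semantic_configuration, top=3):
    """Build the JSON body for a semantic search request."""
    if top < 1:
        raise ValueError("top must be at least 1")
    return {
        "search": question,
        "queryType": "semantic",
        "semanticConfiguration": semantic_configuration,
        "queryLanguage": "en-us",
        "top": top,
    }


body = build_semantic_query(
    "How many vacation days do new employees get?",
    semantic_configuration="default-semantic-config",  # hypothetical name
)
print(json.dumps(body, indent=2))
```

This body would be POSTed to the index's `docs/search` endpoint with the same `api-key` header used for other requests.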

Part 2: https://medium.com/@akshaykokane09/how-to-build-chatgpt-like-app-with-semantic-kernel-and-azure-cognitive-search-on-internal-data-814e4694decb

References:

  1. https://learn.microsoft.com/en-us/semantic-kernel/overview/
  2. https://github.com/microsoft/semantic-kernel/blob/main/samples/notebooks/dotnet/06-memory-and-embeddings.ipynb
  3. https://learn.microsoft.com/en-us/azure/search/semantic-search-overview


Akshay Kokane

Software Engineer at Microsoft | Microsoft Certified AI Engineer & Google Certified Data Engineer