Azure Cognitive Search

brianmrush
Hitachi Solutions Braintrust
8 min read · Oct 11, 2019

Although this article is about Azure Cognitive Search capabilities, to me it’s really about a larger theme. With the cloud firmly taking hold (I know, duh!) and the enormous ecosystem of services and computing capabilities within it, new systems and approaches to solving problems are now emerging with great ease and at an unbelievable pace.

Is Anything Really Ever New?

Take the mobile phone as an analogy. If we stop to really think about it, the mobile phone was in some sense an extrapolation of a host of existing technologies. The telephone has been around for well over 100 years. The personal computer, GPS, cameras, headphones, lithium batteries, wireless, the internet, and more were likewise well established before the mobile phone arrived. The mobile phone, in essence, aggregated and extrapolated on these technologies and the rest is, well… history.

Tell Me Something New.

This analogy works well with cloud technologies. Entire new systems are being created by leveraging the vast ecosystem of services offered by the cloud. Examples include big data, AI, distributed systems, natural language processing, search indexing, security, efficient data storage, and the list goes on. In this article, I want to show how combining existing Azure services makes it simple to create new systems that used to require a lot of custom code. In short, we are combining and extrapolating on Azure’s cloud services to solve new types of problems, and to solve them easily. In this specific case, I want to discuss leveraging Azure’s Cognitive Search. I hope the takeaway is that you start to see how you can solve new problems by applying well-established services from Azure.

Azure Cognitive Skills

So what is Azure Cognitive Search? Azure Cognitive Search is an artificial intelligence (AI) feature of the Azure Search ecosystem. We can use this service to extract text from unstructured data sources and apply intelligence to their contents. The data sources include items such as images, blobs, and documents (Word, PDF, Excel, XML, etc.). Through Azure Cognitive Services we can gain valuable insights about the content in these documents. Azure Search then allows these insights to be indexed and searched.

Azure Search supports two general categories for enriching our content.

Natural Language Processing

This can be sentiment analysis, language analysis, key phrase and topic extraction, or entity recognition. These types of insights can be mapped into searchable and filterable fields in an Azure Search Index.

Image Processing

This can be things such as:

  1. Feature detection (facial recognition, category recognition such as landmarks and famous people)
  2. Attribute detection (colors, orientation)

These features are achieved by leveraging Azure’s Machine Learning APIs for text analysis and computer vision.

With all this being said, here is a picture of how unstructured data can flow through an Azure Search Index pipeline, get enriched by applying Azure Cognitive Services, and then added to an Azure Search Index:

Azure AI Enriched Search Pipeline

In this example, we are leveraging existing technologies such as Azure Machine Learning, Azure Cognitive Skills, and Azure Search. We are not writing the algorithms but rather leveraging them.

So What’s The Scenario?

So let’s walk through an example and put this pipeline into action. Here is a business scenario. We have a repository containing a large number of blog posts. We want to open the blog posts, read their contents, and use Cognitive Services to categorize them and extract the key words they use. In addition, we want to see the names of the people, organizations, and locations mentioned in these blogs. Once we have that information, we want to feed the enriched output into an Azure Search index. For the sample I describe below, I used a large set of blog posts from this set of training data.

Here is the high-level approach on how to accomplish our business goal:

Phase 1: Data Source and Cracking

In this phase, we upload the unstructured blog posts into blob storage and then “crack” them open, reading the contents of the source data into the AI-enriched Azure Search pipeline.

Phase 2: AI Enrichment

In this phase, we leverage Azure AI capabilities and algorithms. More specifically, we apply Azure Cognitive skills to extract the following information from the documents:

  • Key Phrases — This skill uses a pre-trained model to detect important phrases based on term placement, linguistic rules, proximity to other terms, and how unusual the term is within the source data.
  • Language Detection — This skill uses a pre-trained model to detect which language is used (one language ID per document). When multiple languages are used within the same text segments, the output is the LCID of the predominantly used language.
  • Entity Recognition — This skill uses a pre-trained model to establish entities for a fixed set of categories: people, location, organization, emails, URLs, and datetime fields.

As the cognitive skills are applied, we take the results and feed them into an Azure Search Index.
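To make the three skills above a little more concrete, here is a minimal sketch of what a skillset definition for them might look like, built as a Python dictionary. The `@odata.type` values are the built-in skill identifiers from the Azure Search REST API as I understand them at the time of writing. The skillset name, the `targetName` values, and the helper `build_skillset` are assumptions made for illustration; the portal wizard may generate something different.

```python
import json


def build_skillset(name="blog-skillset"):
    """Return a skillset definition (hypothetical name) with three text skills."""
    # Every skill reads the text that was "cracked" out of the blob.
    text_input = {"name": "text", "source": "/document/content"}
    return {
        "name": name,
        "description": "Language, key phrases, and entities for blog posts",
        "skills": [
            {
                "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
                "context": "/document",
                "inputs": [text_input],
                "outputs": [{"name": "languageCode", "targetName": "language"}],
            },
            {
                "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                "context": "/document",
                "inputs": [text_input],
                "outputs": [{"name": "keyPhrases", "targetName": "keyphrases"}],
            },
            {
                "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
                "context": "/document",
                # Only the categories this article searches on.
                "categories": ["Person", "Organization", "Location"],
                "inputs": [text_input],
                "outputs": [
                    {"name": "persons", "targetName": "people"},
                    {"name": "organizations", "targetName": "organizations"},
                    {"name": "locations", "targetName": "locations"},
                ],
            },
        ],
    }


skillset = build_skillset()
print(json.dumps(skillset, indent=2))
```

Each skill's outputs land in the enriched document under the `targetName` given here, which is what later gets mapped into index fields.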

Phase 3: Index and Search

In this phase, Azure Search creates the storage schema and indexes all of our documents running through the pipeline. Once our documents are indexed, we can query them through a known API to retrieve the relevant documents.

Let’s Do It for Real

So let’s walk through the steps of creating the pipeline and then querying the search index for results:

Step 1: Create blob storage and upload the documents to be indexed

  1. Download sample blogs
  2. Create an Azure Blob container
  3. Once the container is created, upload the blog posts downloaded in item 1
Upload Blogs into Azure Blob Storage
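If you would rather script the upload than use the portal, the sketch below shows one way it might be done with the Blob Storage REST API (the Put Blob operation) and only the Python standard library. It assumes you have a SAS token with write permission on the container. The account name, container name, file name, and token shown are placeholders, and the helper names are my own.

```python
import urllib.request
from pathlib import Path
from urllib.parse import quote


def build_put_blob_request(account, container, blob_name, sas_token, data):
    """Build (but do not send) a Put Blob request authorized by a SAS token."""
    url = (
        f"https://{account}.blob.core.windows.net/"
        f"{container}/{quote(blob_name)}?{sas_token}"
    )
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        # Put Blob requires the blob type; BlockBlob suits ordinary files.
        headers={"x-ms-blob-type": "BlockBlob"},
    )


def upload_folder(folder, account, container, sas_token):
    """Upload every file in `folder` and return the names that succeeded."""
    uploaded = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        request = build_put_blob_request(
            account, container, path.name, sas_token, path.read_bytes()
        )
        with urllib.request.urlopen(request) as response:
            # Put Blob answers 201 Created on success.
            if response.status == 201:
                uploaded.append(path.name)
    return uploaded


# Placeholder values for illustration; nothing is sent here.
demo = build_put_blob_request(
    "myaccount", "blogs", "post 1.xml", "sv=...&sig=...", b"<Blog/>"
)
print(demo.get_method(), demo.full_url)
```

Calling `upload_folder("./blogs", account, container, sas_token)` with real values would push the whole folder. For a corpus of thousands of files, the Azure Storage SDK or AzCopy is likely a better fit.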

Step 2: Create the Azure Search Index Pipeline

1. In the Azure portal, add an Azure Search service:

2. Import Data into Azure Search

Once your search service has been deployed, import data into Azure Search by selecting the import button:

Import Data

3. Connect Your Pipeline to a Data Source

In this step, point your pipeline to the blog posts that were uploaded to blob storage:
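For readers who prefer the REST API to the portal, a data source pointing at a blob container might be defined roughly like this. This is a sketch under assumptions: the service name, data source name, container name, connection string, and admin key are all placeholders, and the API version is the one that was current when this article was written.

```python
import json
import urllib.request

API_VERSION = "2019-05-06"


def build_blob_data_source(name, connection_string, container):
    """Data source definition telling Azure Search where the blobs live."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def build_create_request(service, admin_key, definition):
    """Build (but do not send) the POST that creates the data source."""
    url = (
        f"https://{service}.search.windows.net/datasources"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(definition).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )


# Placeholder values for illustration only.
data_source = build_blob_data_source(
    "blog-datasource", "<storage connection string>", "blogs"
)
create_request = build_create_request("my-search-service", "<admin key>", data_source)
print(create_request.get_method(), create_request.full_url)
```

Passing `create_request` to `urllib.request.urlopen` would actually create the data source, so keep real keys out of source control.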

4. Incorporate Azure Cognitive Skills into the Pipeline

In the Azure portal, create an Azure Cognitive Service:

Create Cognitive Service

Next, attach Cognitive Services to the Azure Search service:

Attach Cognitive Services

Once Cognitive Services is attached, add enrichment skills. This is where the special sauce of the Azure Cognitive skills kicks in and extracts insights from each of the documents:

Add Cognitive Service Enrichment Skills

5. Set Up the Azure Search Index

In this step, we are creating a schema of fields that will be indexed in Azure Search:

Azure Search Index Schema Definition

In this case, we can see the fields that are added to the schema as a result of applying Cognitive Services. This is where the special sauce kicks in: we get insights into each of our documents based on the Azure Cognitive Services skills. For example, the Language field will store the results of the Detect Language cognitive skill. Similarly, the People field will store the names of people extracted from the blog documents. In essence, we are enriching our search index with data derived by applying Azure Cognitive AI skills.
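As a rough illustration of such a schema, the sketch below builds an index definition as a Python dictionary. The field attributes (`key`, `searchable`, `filterable`, `facetable`) and the `Edm.String` and `Collection(Edm.String)` types come from the Azure Search index format. The index name and the exact field names are assumptions, since the portal chooses its own defaults.

```python
import json


def field(name, edm_type, **attributes):
    """Small helper: one index field with whatever attributes are passed."""
    definition = {"name": name, "type": edm_type}
    definition.update(attributes)
    return definition


def build_index(name="blog-index"):
    """Index definition holding the raw text plus the enriched fields."""
    string_list = "Collection(Edm.String)"
    enriched = {"searchable": True, "filterable": True, "facetable": True}
    return {
        "name": name,
        "fields": [
            # Every index needs exactly one key field.
            field("id", "Edm.String", key=True, searchable=False),
            field("content", "Edm.String", searchable=True),
            # Fields below are filled by the cognitive skills.
            field("language", "Edm.String", **enriched),
            field("keyphrases", string_list, **enriched),
            field("people", string_list, **enriched),
            field("organizations", string_list, **enriched),
            field("locations", string_list, **enriched),
        ],
    }


index = build_index()
print(json.dumps(index, indent=2))
```

Entity and key phrase fields are collections because a single blog post can mention many people, places, or phrases.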

6. Configure the Indexer

In this step, we configure how often our Azure Search Index is updated:

Configure Indexer
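Expressed as an indexer definition, the schedule and the wiring between the pieces might look something like the sketch below. The property names (`dataSourceName`, `skillsetName`, `targetIndexName`, `schedule`, `fieldMappings`, `outputFieldMappings`) are from the Azure Search indexer format. The resource names, the two-hour interval, and the enriched field paths are assumptions made for the example.

```python
import json

# Hypothetical enriched fields, assumed to share names with the index fields.
ENRICHED_FIELDS = ["language", "keyphrases", "people", "organizations", "locations"]


def build_indexer(name, data_source, skillset, index, interval="PT2H"):
    """Indexer definition that runs on a schedule (ISO 8601 duration)."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "skillsetName": skillset,
        "targetIndexName": index,
        "schedule": {"interval": interval},
        "parameters": {
            # Crack both the text content and the blob metadata.
            "configuration": {"dataToExtract": "contentAndMetadata"}
        },
        "fieldMappings": [
            {
                # Blob paths contain characters a key cannot hold, so encode.
                "sourceFieldName": "metadata_storage_path",
                "targetFieldName": "id",
                "mappingFunction": {"name": "base64Encode"},
            }
        ],
        "outputFieldMappings": [
            {"sourceFieldName": f"/document/{name_}", "targetFieldName": name_}
            for name_ in ENRICHED_FIELDS
        ],
    }


indexer = build_indexer("blog-indexer", "blog-datasource", "blog-skillset", "blog-index")
print(json.dumps(indexer, indent=2))
```

The `outputFieldMappings` are what carry skill output into index fields; without them the enrichment would run but nothing would be stored.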

Once indexing is complete, we can query our index using the API to retrieve the documents that meet our criteria.

7. Search the Azure Search Index

Now that we have enriched our Azure Search service with an index, we can query that index however we see fit. The Azure Search service has a rich RESTful API for accessing the data, but for the purposes of showing results I am just going to use the Search explorer. I am not going into the details of the API, just showing a few examples.

Given that I had 20K blog posts, finding the relevant blogs based on some criteria could be time consuming. Now that they have been indexed, I can search these blogs in a more intelligent and efficient manner. For example, let’s say I want a list of the blogs that mention the NFL in the organization field. As a reminder, the organization field was enriched in our index by applying the entity recognition skill from Cognitive Services. Let’s run that query:

Searching Index for Blogs that have NFL in the Organization Field

We can see our RESTful query, which searches the index for all blog documents that have “NFL” as a value in the organization field. The result is 72 blogs. Also note the search score, which is an indicator of how relevant each blog is to the term “NFL” in the organization field.
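A query of this shape can also be built in code. The sketch below assembles a Search Documents URL using the `search`, `searchFields`, and `$count` parameters of the Azure Search REST API. The service name, index name, and the field name `organizations` are assumptions, so substitute whatever your own index uses.

```python
from urllib.parse import urlencode

API_VERSION = "2019-05-06"


def build_search_url(service, index, search, search_fields=None, count=True):
    """Build a GET URL that searches `index`, optionally scoped to some fields."""
    params = {"api-version": API_VERSION, "search": search}
    if search_fields:
        # Restrict matching to the listed fields instead of every searchable one.
        params["searchFields"] = ",".join(search_fields)
    if count:
        # Ask the service to report the total number of matches.
        params["$count"] = "true"
    return (
        f"https://{service}.search.windows.net/indexes/{index}/docs?"
        + urlencode(params)
    )


nfl_url = build_search_url(
    "my-search-service", "blog-index", "NFL", search_fields=["organizations"]
)
print(nfl_url)
```

Sending a GET to this URL with a query key in the `api-key` header returns the matching documents as JSON. Note that `urlencode` percent-encodes the `$` in `$count`, which the service decodes like any other URL.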

Let’s look at another example. Let’s search for any blog entries that have the term “weather” as a key phrase.

Search for “weather” key phrase

Our result is 1,699 blog entries, along with a pointer to where they reside in blob storage.
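To show how a result like this could be consumed in code, here is a small sketch that reads the match count, the relevance score, and the blob pointer out of a search response. `@odata.count`, `@search.score`, and `metadata_storage_path` are names used by Azure Search and its blob indexer. The sample response itself is made up for illustration, and it assumes the index kept a `metadata_storage_path` field.

```python
import json

# Made-up response in the shape Azure Search returns; not real results.
SAMPLE_RESPONSE = """
{
  "@odata.count": 2,
  "value": [
    {
      "@search.score": 0.42,
      "metadata_storage_path": "<path to blob A>",
      "keyphrases": ["weather", "weekend"]
    },
    {
      "@search.score": 1.37,
      "metadata_storage_path": "<path to blob B>",
      "keyphrases": ["weather", "storm warning"]
    }
  ]
}
"""


def summarize(response_text):
    """Return (total matches, [(score, blob path), ...]) best match first."""
    body = json.loads(response_text)
    hits = sorted(
        body["value"], key=lambda doc: doc["@search.score"], reverse=True
    )
    pointers = [(doc["@search.score"], doc["metadata_storage_path"]) for doc in hits]
    # The count is only present when the query asked for it.
    return body.get("@odata.count"), pointers


total, pointers = summarize(SAMPLE_RESPONSE)
print(total)
for score, path in pointers:
    print(score, path)
```

The service already returns hits ordered by score by default, so the sort here is only a safeguard for the example.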

So What Did We Find?

In this article, we saw how we could leverage existing services in Azure to enrich an Azure Search Index by applying Azure Cognitive Skills. We described the analogy of the mobile phone and how it aggregated a host of existing technologies to create a new solution. Similarly, in Azure we can assemble a series of services to create new solutions. In this specific case, we assembled a collection of cloud services, namely Azure Blob Storage, Azure Search, and Azure Cognitive Services, which supply the AI algorithms. The result was a searchable repository of our blog articles, produced with minimal effort.
