Discovering Azure’s Computer Vision and Cloud Search Services — Part 1

Nicholas Hurt
11 min read · Aug 8, 2018


Background

I recently heard about a client’s challenge around matching images of receipts to store and product metadata in their database. They had already built a rather sophisticated solution in C#, which ran queries against the database to find the best match based on a number of permutations. It reminded me of a similar challenge I once faced in another life, where we had to write some crazy complex matching against some really large tables. There were two main lessons I took away from that experience:

  • Writing one’s own matching algorithm is hard and can take time, a lot of time. At times it can seem like a never-ending rabbit hole of fine-tuning.
  • You inevitably face issues of scale and throughput, especially if the algorithm is tightly coupled to a database that’s not distributed. The more complex the algorithm, the longer it will take to return a result, and the more CPU cycles it will “steal” from other operational processes.

Based on that experience, I would now always recommend moving intensive workloads outside of the database where possible, particularly if they originate outside of the database or are on a one-way ticket to another system. This may sometimes require learning new skills or embracing new technologies, but you should give your database the “head room” to do what it was originally designed for. Choose the correct tool for the job and harness the power of the cloud. In this case, the Microsoft Azure cloud!

Feeling rather inspired, I decided to build a small proof-of-concept to solve this problem. I knew the two services which I thought would be suitable for the task at hand, but it often requires a simple and inexpensive prototype to validate your decision.

The Business Challenge

In its simplest form, the challenge is to recognise text from an image and match it to some “internal” metadata.

Let’s assume our company, Acme (a subsidiary of Contoso), wants to migrate their on-prem receipt matching software to the Azure cloud. Let’s build a little POC to demonstrate the capabilities of the platform. This challenge could be relevant to any company that is extracting metadata from an image and/or searching against their internal system for known artefacts in order to derive insights from these positively identified instances.

I have set myself a few goals when building this POC. The solution must:

  • use a serverless & event driven architecture
  • be scalable & cost-effective, ideally the POC should be completed within a free trial/tier
  • use boilerplate code where possible to minimise custom code
  • demonstrate a working prototype in a few days

A tough challenge? Not with pre-built services :)

Microsoft’s Cognitive Services are a fantastic set of pre-built AI services, ranging from language to vision to search. These intelligent services could save you the time and effort of training your own ML models, especially if you don’t have the skills to begin with. They give you, the citizen data scientist or the database developer, an opportunity to “infuse AI” into the solutions you build. This is part of Microsoft’s strategy: to democratise AI.

Azure Search is a fully managed and scalable search service. It reduces the complexity of building and running your own search algorithms and has a rich set of capabilities for text analysis, scoring and filtering.

To see some of these services in action check out the JFK files or the Insurance bot demos!

In the first of this three-part series, I’ll start by solving what could be deemed the two main technical challenges: 1. text recognition: extracting the text from an image, and 2. search: matching the text to internal metadata. In the second post we’ll write code to orchestrate these two tasks, and in the final post we’ll define the architecture, complete the required infrastructure and run the end-to-end demo.

My objective in this post is to prove the solution “could” work before we start diving into any code. The idea being that, with enough curiosity and determination, anyone from any background could start to experiment with the awesome capabilities of the Azure platform, without spending a dime!

If you’re new to Azure and want to follow along, you can sign up for a free trial, which will simply suspend once you go over the free credit limit of $150. Once signed up, I would also highly recommend creating a new resource group and using it for each resource you create. Then when you’re done, all you need to do is delete the resource group rather than deleting each individual resource.

Secondly, I have hyperlinked many parts of this blog (probably too many), but this will allow you to quickly dive deeper, and some links reference step-by-step guides others have taken the time to produce. Now on with the show…

Text Recognition

In terms of image recognition, Azure Computer Vision has a powerful set of intelligent capabilities, from describing what it “sees” in the image, to character recognition (OCR), to identifying landmarks or celebrities. The standard OCR API will output the tokens (words) found in the image, but what we need for this solution is a digitised form of the receipt, which returns the snippets or phrases of text found in the (receipt) image. So for this we need to use the v2 API, which can even recognise handwriting! Please be sure to take note of the image requirements such as file size, format and dimensions. Looking at the pricing, I’ve chosen the most expensive of all the Vision services, but at $2.50 per 1000 transactions it’s still a bargain considering you’re not paying for the server, or the AI developer, or, worst case, a person manually transcribing 1000s of images.

To get a sneak preview of the capabilities, go to the homepage and check out some of the demos. Then scroll down to “read text in images” and upload an image of one of your receipts. Once your image has uploaded, you should see the result and “magically” your image has been digitised.

Nice! Now let’s explore the API a little further…

The Vision API requires either the image binary or a URL passed to it, and I’ve decided on the latter for simplicity. To reference the image via URL we’re going to upload it to Azure Storage and enable static website hosting. Do this by setting up a general purpose v2 storage account and enabling the static website feature. Take note of the primary endpoint provided once it’s enabled.

To upload the image into storage there are a number of ways to skin this cat, but let’s use Azure Storage Explorer. A recent update (version 1.3.1, July 2018) added support for the $web container used by your static website, so you should find it there under blob containers once you’ve signed in.

After the upload is complete, test your URL, which is simply the endpoint you took note of earlier with the image filename appended, e.g. https://receiptstogo.z5.web.core.windows.net/receipt1osm.jpg
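If you’d rather script the upload than use Storage Explorer, here’s a minimal sketch using the azure-storage-blob Python package (v12-style API; the connection string and filename are placeholders — grab the real connection string from the storage account’s Access keys blade):

```python
from azure.storage.blob import BlobServiceClient, ContentSettings

# Assumed placeholder: substitute your own storage connection string
conn_str = "<your-storage-connection-string>"
service = BlobServiceClient.from_connection_string(conn_str)

# Static website content lives in the special $web container
blob = service.get_blob_client(container="$web", blob="receipt1osm.jpg")
with open("receipt1osm.jpg", "rb") as f:
    blob.upload_blob(
        f,
        overwrite=True,
        content_settings=ContentSettings(content_type="image/jpeg"),
    )
```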

Next, let’s set up our OCR service, either by creating a new Computer Vision resource in the Azure portal or by signing up for the free 7-day trial. Either way, grab your keys and it’s ready to use. That simple!

Hop over to the recognize text section of the API documentation, which is also accessible from the API menu item on the homepage. Review the details of the API and you will notice that you actually need to make two calls to use this service: the first to obtain the Operation-Location ID, which you use in the second API call to retrieve the text found in the image.

Lower down on that page, choose a region, and on the next page specify the mode parameter and enter your key in the header. Scroll down to the request body and replace the test URL with the URL to your image. The HTTP request is a POST to the recognizeText endpoint, with your key in the Ocp-Apim-Subscription-Key header and the image URL in the JSON body.

When you click send, you should receive a 202 response with the operation location. Copy that hash-looking ID at the end of the URL, head over to the “Get Recognize Text Operation Result” section, enter the ID and your key, and click send.

Success! You should end up with a JSON response with the recognised text and bounding boxes.
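For reference, here’s a minimal Python sketch of the same two-step flow outside the API console (the key, the region endpoint and the polling interval are assumptions; substitute your own values):

```python
import time
import requests

# Assumed placeholders: your own key and your resource's region endpoint
key = "<your-computer-vision-key>"
endpoint = "https://westeurope.api.cognitive.microsoft.com"
image_url = "https://receiptstogo.z5.web.core.windows.net/receipt1osm.jpg"

# Call 1: submit the image; a 202 response carries an Operation-Location header
resp = requests.post(
    f"{endpoint}/vision/v2.0/recognizeText",
    params={"mode": "Printed"},
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"url": image_url},
)
resp.raise_for_status()
operation_url = resp.headers["Operation-Location"]

# Call 2: poll the operation until the recognition has finished
while True:
    result = requests.get(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key}
    ).json()
    if result["status"] in ("Succeeded", "Failed"):
        break
    time.sleep(1)

# Print each recognised line of text from the receipt
for line in result.get("recognitionResult", {}).get("lines", []):
    print(line["text"])
```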

In my case, this pre-trained model did a great job of recognising the text, except for the misidentified pound signs, but I think we can live with that for now. If you have a very specialised case or the accuracy is not sufficient for your requirements, then you would need to think about training your own custom AI model, but that will require significant time, effort and cost. Obviously, I am only testing with one image here, so further tests with a decent sample of images would be needed. For now though, I’ve proved we can obtain lines of text from the receipt by making two simple API calls. Now we’re ready to explore the next part of the solution…

Cloud Search

As the title suggests, for search we’ll look no further than Azure Search, which provides powerful search capabilities on top of your existing data. It uses the mighty Lucene library for full text search, and you can also add cognitive search capabilities to the indexing process, such as entity extraction, NLP and image processing skills.

You may be wondering why we need to use this service at all. Why can’t we use our own logic, SQL or full text search in the database? I can think of two good reasons, which I’ve mentioned before. Firstly, the matching logic you write is only going to get you so far before you need to add more logic and more libraries to expand the capabilities of your solution. The other reason is that it’s going to be slow and eat up valuable database cycles. So why not offload your search activities to the cloud, where you can perform

“specialized text processing features such as linguistic-aware text processing (stemming, lemmatization, word forms) in over 55 languages. It also supports autocorrection of misspelled words, synonyms, suggestions, scoring controls, facets, and custom tokenization.”

It even supports geospatial searching and filtering. Now try coding all of those capabilities yourself; you’ll be busy for months if not years! What about unstructured data, I hear you say? Well, here’s a blog which covers that.

If you’re still not convinced then I recommend you read the FAQ. If there’s still that killer feature you need, please submit your request via UserVoice. It’s safe to say I’m a fan of this technology, so on with the show…

In our scenario, Acme’s data resides in an Azure SQL database, but they haven’t supplied any sample data so we’ll need to simulate that. Incidentally, you can also index data from other sources, or even push data into the search index. Azure Search has crawler-like capabilities for certain data sources using indexers, which do not require a single line of code, and that’s where we’ll begin.

First get a database up and running, create a schema and populate some data. If you know how to do all of that, breeze over the next few paragraphs; otherwise it’s really simple, no prior SQL knowledge required, I promise :)

Log into the Azure portal and create a standard blank SQL Database.

Once created, you can conveniently connect to it in Visual Studio by using the connect with menu option in the overview blade.

This configures a firewall entry to allow you to access the database from your local machine and voilà! You are connected to your cloud database using your favourite IDE!

Some developers or DBAs would blacklist me forever after reading the next bit, but my goal here was to write little to no code, so I’m going to use the designer to help me create tables and populate them with some data…

I added tables, specified fields, slapped on a foreign key and clicked update — now my database has two tables, ensuring that products can only belong to a specific store. Next, refresh the tree view, find the tables, click view and punch in some data.
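If you’d prefer to script it instead of using the designer, here’s a minimal sketch of an equivalent schema using pyodbc (the server, credentials and the Products columns are assumptions; only the Id and Store fields are referenced later in this post):

```python
import pyodbc

# Assumed connection details: substitute your own server, database and credentials
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:<your-server>.database.windows.net,1433;"
    "DATABASE=<your-database>;UID=<user>;PWD=<password>"
)
cur = conn.cursor()

# Two tables with a foreign key so products can only belong to a known store
cur.execute("""
    CREATE TABLE Stores (
        Id INT IDENTITY PRIMARY KEY,
        Store NVARCHAR(100) NOT NULL
    )""")
cur.execute("""
    CREATE TABLE Products (
        Id INT IDENTITY PRIMARY KEY,
        StoreId INT NOT NULL REFERENCES Stores(Id),
        Product NVARCHAR(200) NOT NULL
    )""")

# Punch in some sample data
cur.execute("INSERT INTO Stores (Store) VALUES (?)", "John Lewis")
conn.commit()
```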

Next let’s create an Azure Search service…

Once the service is ready (usually seconds) go to the resource and click import data…

Select the database we just created as the data source and choose the stores table for now.

Skip the Cognitive Search step in this case, as we don’t want to enrich the data as it’s loaded into the index; we just want to import the data as is. In the Customize target index panel, keep the defaults but select retrievable and searchable for the store field.

Click OK. In the final panel, give your indexer a name and choose to run it once.
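Behind the scenes the wizard is simply creating an index definition for you. Here’s a minimal sketch of the REST equivalent (the index name and field list are assumptions mirroring the wizard’s defaults):

```python
import requests

# Placeholders: your search service name and admin key
service = "<your-search-service>"
api_key = "<your-admin-key>"

# Hypothetical index definition: Id as the key,
# Store marked retrievable and searchable
index_def = {
    "name": "stores-index",
    "fields": [
        {"name": "Id", "type": "Edm.String", "key": True, "retrievable": True},
        {"name": "Store", "type": "Edm.String",
         "searchable": True, "retrievable": True},
    ],
}
resp = requests.put(
    f"https://{service}.search.windows.net/indexes/stores-index",
    params={"api-version": "2017-11-11"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=index_def,
)
print(resp.status_code)  # 201 on create, 204 on update
```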

There are a couple of options to keep your data synchronised in the search index: you can choose between push (the most flexible approach) and pull mode, and with the latter you can use either a schedule or change tracking.
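To illustrate push mode, here’s a small sketch that uploads a document directly into the index over REST (the index name and field values are assumptions):

```python
import requests

# Placeholders: your search service name, index name and admin key
service = "<your-search-service>"
index = "stores-index"
api_key = "<your-admin-key>"

# mergeOrUpload inserts the document, or updates it if the key already exists
payload = {
    "value": [
        {"@search.action": "mergeOrUpload", "Id": "42", "Store": "John Lewis"}
    ]
}
resp = requests.post(
    f"https://{service}.search.windows.net/indexes/{index}/docs/index",
    params={"api-version": "2017-11-11"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=payload,
)
print(resp.json())  # per-document status in the "value" array
```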

When done, head over to the Search Explorer view to test out the search functionality. Enter the following query string, replacing the store name with one that was returned from your Vision API response earlier on…

search=john lewis&$select=Id,Store

Besides the JSON response, you’ll notice the request URL provided, so let’s use Postman to make that GET request, supplying the api-key in the header to authenticate. You’ll find the primary admin key in the Keys section of your search service menu.
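The same request is easy to reproduce in a few lines of Python (service name, index name and key are placeholders):

```python
import requests

# Placeholders: your search service name, index name and admin/query key
service = "<your-search-service>"
index = "<your-index-name>"
api_key = "<your-search-key>"

resp = requests.get(
    f"https://{service}.search.windows.net/indexes/{index}/docs",
    params={
        "api-version": "2017-11-11",
        "search": "john lewis",
        "$select": "Id,Store",
    },
    headers={"api-key": api_key},
)
print(resp.json())  # matching documents with their search scores
```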

Success? Hopefully yes!

Part 1 — Conclusion

Using a very simple test case, we’ve demonstrated that with these two services and three API calls we can retrieve the text from an image and run a search against our data which does not hit our source database. We’ve also done this with minimal code, all within the free tier.

In the next post we’ll need to roll up our sleeves, as we’re going to throw a curveball at the search service and dive into a bit of code which will “plumb” these two services together.
