Ran Gong
Jun 19 · 7 min read

This article is written in collaboration with Anthony Franklin, PhD from Microsoft

With the fast-paced development of technology, many of the transactions we deal with daily have migrated to the internet. Although raw scanned documents allow for digital storage, there typically remains a need to intelligently structure and parse the content within the form. Azure AI has best-in-class OCR services and AI tools that can be leveraged to intelligently extract information from handwritten documents. This blog will walk you through a specific use case leveraging both Azure Cognitive Services and Azure Databricks.

Use Case:

Hospital intake forms are required for all visitors, regardless of whether or not they’ve visited before. The hospital wants to utilize millions of filled forms for future research. Since this form is completed on demand, it is typically handwritten. This leaves the hospital’s patient services staff with a manual, costly form-processing workload. Most hospitals would find great value and time savings if they could automate this process. So, what should they do?

Why Azure Cognitive & Databricks:

Microsoft Azure provides a suite of ML and AI services for practitioners to leverage across a wide variety of use cases. Two of the most popular are Azure Cognitive Services and Azure Databricks. Azure Cognitive Services are APIs, SDKs and services providing pre-built AI models that allow developers to build cognitive features into their intelligent applications. The services are categorized into 6 modules: vision, speech, language, search, labs and knowledge. Any analytics practitioner can perform image classification by simply calling a RESTful API. Typically, the image classification process would require training a model with hundreds of thousands or millions of images, plus the corresponding infrastructure to deploy the model as a service. All of that is done for you (Thanks Azure!) On the other hand, Azure Databricks (ADB) is a fast, easy and collaborative Apache Spark-based analytics service. Azure Databricks provides an interactive workspace with a notebook-like environment for development.

In many analyses, the results of the image classification service are simply one step in the development, and that information can be used as an input to another process. In cases where further processing of the resulting information requires significant distributed computation, ADB can be the application of choice. Moreover, there are cases where ADB is being leveraged for data pipeline work and provides complementary information to the image classification results. In either scenario, there are clear use cases for coupling Azure Cognitive Services and Azure Databricks.


Prerequisites:

1. You have created a computer vision cognitive service. (Learn How)

2. You have created a Blob storage account with image files stored. (Learn How)

3. You have created an ADB workspace instance. (Learn How)

4. Form Sample: The sample form used in this scenario is in Image 1 below. The “patient” is a gentleman named Benjamin Button who was born on December 25th, 1965 and was hurt while playing basketball. Let’s see if our vision service can capture the same information.

This article is the first step (Part 1) in a series. In this part, we will walk through how to convert a large set of handwritten forms into a computer-readable data set. Future sections will cover how to format the dataset and analyze it.

How to?

Create ADB Cluster:

We start this example from a previously created ADB workspace instance. Next, we must create an Azure Databricks cluster with the following characteristics:

  • Cluster Name: <Your choice>
  • Autopilot Options: Terminate after 60 minutes of inactivity
  • Worker Type: Standard_DS3_v2 (Min Workers = 2, Max Workers = 4)
  • Remaining Options: Default
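For reference, the same settings can also be expressed as a payload for the Databricks Clusters REST API (POST /api/2.0/clusters/create). This is a hypothetical sketch; values such as the cluster name and `spark_version` are assumptions and should match what is available in your workspace:

```python
# Hypothetical sketch: the cluster settings above as a Clusters API payload.
cluster_spec = {
    "cluster_name": "ocr-demo-cluster",                 # <Your choice>
    "spark_version": "5.5.x-scala2.11",                 # assumed runtime version
    "node_type_id": "Standard_DS3_v2",                  # worker VM type
    "autoscale": {"min_workers": 2, "max_workers": 4},  # Min=2, Max=4
    "autotermination_minutes": 60,                      # terminate after 60 min idle
}

# To actually create the cluster, you would POST this payload with a
# personal access token, e.g.:
# requests.post("https://<region>.azuredatabricks.net/api/2.0/clusters/create",
#               headers={"Authorization": "Bearer <token>"}, json=cluster_spec)
```

The UI steps above achieve the same thing; the API form is simply easier to automate and version-control.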

Install Azure Cognitive Services Python SDK:

Once the cluster is created, we need to install the Azure Cognitive Services Computer Vision SDK. First, navigate to the workspace folder structure of this project, and right-click in the open space. Select the “Import” option from the menu. Select the “click here” hyperlink at the bottom for importing a library. That will bring you to a dialog box with options to input information about the library (see image below).

Next, select “Upload Python Egg or PyPi” as the source and enter the following PyPi name into the frame: “azure-cognitiveservices-vision-computervision”. Then select the “Install Library” button to complete the installation. Repeat this process to install the “requests” library from PyPi. Be sure to navigate to the appropriate cluster you created and attach these libraries to that cluster.

ADB Notebook:

We need to create a new Python notebook in your project folder. Attach the notebook to your active cluster (a green dot indicates the cluster is active).

We will detail the necessary code snippets to call the cognitive service within your ADB workspace notebook. The code snippets below can serve as a template for calling the computer vision service and can simply be inserted directly into a cell within your ADB Python notebook. The complete code can be found here.

Step 1 Import Modules

The first step is to import the Azure “ComputerVisionClient”, along with its corresponding credentials class “CognitiveServicesCredentials”. The computer vision client is a custom Azure class designed to extend the service client for making requests. The “CognitiveServicesCredentials” class creates a credentials object that uniquely identifies the client’s subscription. The “requests” library is a popular library for making HTTP requests.

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
import requests
import os
import time
import json

Step 2 Mount Blob Storage Container

Azure Databricks is not intended for permanent file storage. So, to access our data and files, we will set up a pointer to our previously created Azure Blob storage container. This pointer is referred to as a mount point. The code below walks through how to create a mount point using the appropriate configurations. First, we must specify the source and point it to the Blob URL. The mount point attribute is the folder name that will be used within Azure Databricks; it must start with “/mnt/…” for all mount points. The extra configs ensure the user can authenticate and connect to the Blob container.

dbutils.fs.mount(
  source = "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/",
  mount_point = "/mnt/<desired-folder-name>",
  extra_configs = {"fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net": "<your_account_key>"})

Step 3 Define classes

To simplify the use of the client, we suggest you create some custom classes to consolidate important information.

Define a class with two attributes:

url: endpoint of your computer vision cognitive service, for example (https://eastus.api.cognitive.microsoft.com/)

key: your cognitive service key. You can retrieve it from your computer vision service: under Resource Management, click on “Keys”:

class CognitiveConfig:
    def __init__(self, url, key):
        self.url = url
        self.key = key

Define a class for image path:

storagePath: folder path of your mounted blob. For example: “/dbfs/mnt/forms” (notice the appending of “/dbfs” to your mount location you specified above)

imgName: The image file’s name

function getFullPath: Join the folder path and image file path together to a valid file path

class Img:
    def __init__(self, storagePath, imgName):
        self.storagePath = storagePath
        self.imgName = imgName

    def getFullPath(self):
        return os.path.join(self.storagePath, self.imgName)

Define a class for image to text mode:

mode: Type of text to recognize. Possible values are “Handwritten” and “Printed”

isRaw: If true, returns the direct response alongside the deserialized response

class ImgToTextMode:
    def __init__(self, mode, isRaw):
        self.mode = mode
        self.isRaw = isRaw

Step 4 Initialize Variables

Next we will initialize the variables from the custom classes created above.

cogConfig = CognitiveConfig("https://eastus.api.cognitive.microsoft.com/", "<your_Cognitive_service_key>")
img = Img("<your_Azure_data_storage_folder_path>", "medical_form.jpg")
mode = ImgToTextMode("Handwritten", True)

Step 5 Define image to text function

In this function, we first initialize the “ComputerVisionClient”, which requires the account holder’s provisioned cognitive service endpoint and subscription key. The client object is the primary agent for calling the computer vision service; since we are using the OCR functionality, we pass the image as a stream to its “recognize_text_in_stream” method.

The timer here simply gives Computer Vision some time to process the image; you could also use a while loop to check the operation status.

def retrieve_text_from_img(cogConfig, img, imgToTextMode):
    client = ComputerVisionClient(cogConfig.url, CognitiveServicesCredentials(cogConfig.key))
    with open(img.getFullPath(), "rb") as image_stream:
        txt_analysis = client.recognize_text_in_stream(image_stream, mode=imgToTextMode.mode, raw=imgToTextMode.isRaw)
    headers = {"Ocp-Apim-Subscription-Key": cogConfig.key}
    url = txt_analysis.response.headers["Operation-Location"]
    time.sleep(20)
    return json.loads(requests.get(url, headers=headers).text)
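As an alternative to the fixed 20-second sleep, you can poll the Operation-Location URL until the service reports a terminal status. This is a minimal sketch; `poll_text_result` and `is_terminal` are hypothetical helpers, not part of the SDK:

```python
import time

def is_terminal(status):
    # The Recognize Text operation ends in one of these statuses
    return status in ("Succeeded", "Failed")

def poll_text_result(operation_url, subscription_key,
                     interval=1.0, max_attempts=30):
    """Poll the Operation-Location URL until recognition finishes."""
    import requests  # imported here; only needed when actually polling
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    for _ in range(max_attempts):
        result = requests.get(operation_url, headers=headers).json()
        if is_terminal(result.get("status")):
            return result
        time.sleep(interval)
    raise TimeoutError("Text recognition did not finish in time")
```

This avoids both waiting longer than necessary for small images and timing out on large ones.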

Step 6 Call Service Function:

words = retrieve_text_from_img(cogConfig, img, mode)
print(words)

If you execute the code outlined in the above steps, you will receive JSON-formatted output from the computer vision service. Now let’s observe the output of the call. The output is structured to separate each line into a bounding box with coordinates and text (the output below is truncated for space). Notice the highlighted text that matches our written text.

'status': 'Succeeded',
'recognitionResult': {
  'lines': [
    {'boundingBox': [783, ..., 525],
     'words': [
       {'boundingBox': [776, ..., 521], 'text': 'Patient'},
       {'boundingBox': [1032, ..., 526], 'text': 'Medical'}, ...],
     'text': 'Patient Medical History Form'},
    {'boundingBox': [325, ..., 627],
     'words': [
       {'boundingBox': [326, ..., 625], 'text': 'Please'}, ...],
     'text': 'Please provide your name and date of birth'},
    {'boundingBox': [327, ..., 709],
     'words': [...],
     'text': 'First Name :'},
    {'boundingBox': [...],
     'words': [...],
     'text': 'Reason for seeing the doctor*'},
    {'boundingBox': [...],
     'words': [...],
     'text': 'I hurt my leg playing basketball yesterday .'}, ...
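As a small preview of the parsing step, here is a minimal sketch (using a made-up, abbreviated sample shaped like the response above) that flattens `recognitionResult` into a plain list of text lines:

```python
def extract_lines(ocr_result):
    """Return the recognized text of each line, in reading order."""
    lines = ocr_result.get("recognitionResult", {}).get("lines", [])
    return [line["text"] for line in lines]

# Abbreviated sample shaped like the service response
sample = {
    "status": "Succeeded",
    "recognitionResult": {
        "lines": [
            {"text": "Patient Medical History Form"},
            {"text": "I hurt my leg playing basketball yesterday ."},
        ]
    },
}
print(extract_lines(sample))
# → ['Patient Medical History Form', 'I hurt my leg playing basketball yesterday .']
```

The same pattern, applied to the full `words` output, is the starting point for building a structured table.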

There you have it! All the questions and answers on the patient’s questionnaire are parsed nicely into JSON format. The next blog will describe how to parse the JSON output text and put it into a structured table format to be stored in your mounted data store. Give it a clap if you find it helpful, or feel free to leave your questions ;)

Slalom Technology

Thought leadership from technologists at Slalom.

Thanks to Shannon Montanez and Santosh Iyer

Written by Ran Gong, Consultant @Slalom
