Optical Character Recognition with Azure

Senura Vihan Jayadeva
Mar 10 · 6 min read

Hello, I’m Senura Vihan Jayadeva. In this article, I will guide you through the Azure OCR (Optical Character Recognition) cloud service.

First of all, let’s see what Optical Character Recognition is.

OCR (optical character recognition) is the use of technology to distinguish printed or handwritten text characters inside digital images of physical documents, such as a scanned paper document. The basic process of OCR involves examining the text of a document and translating the characters into code that can be used for data processing. OCR is sometimes also referred to as text recognition.

Optical Character Recognition can be used for a variety of applications, including:

  • Scanning printed documents into versions that can be edited with word processors, like Microsoft Word.
  • Automating data entry, extraction, and processing.
  • Deciphering documents into text that can be read aloud to visually-impaired or blind users.
  • Archiving historic information, such as newspapers, magazines, or phonebooks, into searchable formats.
  • Depositing cheques electronically without the need for a bank teller.
  • Placing important, signed legal documents into an electronic database.
  • Recognizing text, such as license plates, with a camera or software.
  • Translating words within an image into a specified language.

Computer Vision includes Optical Character Recognition (OCR) capabilities. You can use the new Read API to extract printed and handwritten text from images and documents. It uses deep learning-based models and works with text on a variety of surfaces and backgrounds. These include business documents, invoices, receipts, posters, business cards, letters, and whiteboards.

Briefly, when we upload an image or document, the service extracts the text lines, words, and metadata using deep learning and structures them into a JSON output. We then get the result by calling an endpoint.

Right, let’s start by creating a resource.

First, you have to click on Create a resource and then search for Computer Vision.

Then click on the Create button.

Next, fill in the input fields and then click on the Review + create button.

After a successful deployment, click on Go to resource and you will be taken to a page like the one below.

Now go to the API Console. You will be taken to a page with the API documentation. Here you can see a lot of services, such as Analyze Image, Describe Image, Detect Objects, OCR, and so on. In order to extract text from an image, we have to go with OCR.

Right, here you can see the full documentation for the API.

This is the Endpoint you have to call.

https://{endpoint}/vision/v3.0/ocr[?language][&detectOrientation]

{endpoint} : this depends on the location you selected.

For example, if you selected centralus as the location when creating the resource, {endpoint} will be centralus.api.cognitive.microsoft.com

Then, as request parameters, you can provide language and detectOrientation. Both of these parameters are optional.

According to the Azure Computer Vision documentation, it supports 27 different languages.

Supported languages:

  • unk (AutoDetect)
  • zh-Hans (ChineseSimplified)
  • zh-Hant (ChineseTraditional)
  • cs (Czech)
  • da (Danish)
  • nl (Dutch)
  • en (English)
  • fi (Finnish)
  • fr (French)
  • de (German)
  • el (Greek)
  • hu (Hungarian)
  • it (Italian)
  • ja (Japanese)
  • ko (Korean)
  • nb (Norwegian)
  • pl (Polish)
  • pt (Portuguese)
  • ru (Russian)
  • es (Spanish)
  • sv (Swedish)
  • tr (Turkish)
  • ar (Arabic)
  • ro (Romanian)
  • sr-Cyrl (SerbianCyrillic)
  • sr-Latn (SerbianLatin)
  • sk (Slovak)

detectOrientation controls detection of the text orientation in the image. With detectOrientation=true, the OCR service tries to detect the image orientation and correct it before further processing (e.g. if it’s upside-down).

Example Requested Endpoint :-

https://centralus.api.cognitive.microsoft.com/vision/v3.0/ocr?language=unk&detectOrientation=true
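A request URL like the one above can be assembled programmatically. The following is a minimal sketch; the `build_ocr_url` helper name is my own, and centralus is just the example region from earlier.

```python
# Build the v3.0 OCR endpoint URL with its two optional query parameters.

def build_ocr_url(region, language=None, detect_orientation=None):
    """Assemble the OCR request URL for a given Azure region."""
    base = f"https://{region}.api.cognitive.microsoft.com/vision/v3.0/ocr"
    params = []
    if language is not None:
        params.append(f"language={language}")
    if detect_orientation is not None:
        params.append(f"detectOrientation={'true' if detect_orientation else 'false'}")
    return base + ("?" + "&".join(params) if params else "")

print(build_ocr_url("centralus", language="unk", detect_orientation=True))
# https://centralus.api.cognitive.microsoft.com/vision/v3.0/ocr?language=unk&detectOrientation=true
```

Since both parameters are optional, calling `build_ocr_url("centralus")` with no extra arguments yields the bare endpoint with no query string.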

Now, as headers, we have to provide Content-Type and Ocp-Apim-Subscription-Key.

Give application/json as the Content-Type because we are passing the image URL in the request body. Therefore you first have to upload the image to some cloud storage (e.g. Azure Storage) and get its link. This link will be used in the request body.

Next, under RESOURCE MANAGEMENT, you can see a tab called Keys and Endpoint. There you can get the value for Ocp-Apim-Subscription-Key.

After that, in the request body, as I mentioned earlier, you have to provide the link to the image from which you are going to extract the text.

Example:-

{"url":"https://miro.medium.com/max/3000/1*6vZNaPXXZRXAZDGb2vGZ2g.png"}
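The same call can be made from code instead of the API console. The sketch below builds the headers, parameters, and body described above; the `build_request` helper name and the placeholder key are my own, and the commented-out `requests.post` line assumes you have the third-party requests package installed and a valid subscription key.

```python
# Assemble the pieces of the OCR call: headers, query parameters, and body.

def build_request(key, image_url, language="unk", detect_orientation=True):
    """Return the (headers, params, body) triple for an OCR request."""
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    params = {
        "language": language,
        "detectOrientation": "true" if detect_orientation else "false",
    }
    body = {"url": image_url}  # link to the uploaded image, as in the example above
    return headers, params, body

# Sending the request (needs a real key and the requests package):
#   import requests
#   OCR_URL = "https://centralus.api.cognitive.microsoft.com/vision/v3.0/ocr"
#   headers, params, body = build_request("<your-key>", "<your-image-link>")
#   resp = requests.post(OCR_URL, headers=headers, params=params, json=body)
#   print(resp.json())
```

Keeping the request assembly separate from the actual HTTP call makes it easy to inspect exactly what will be sent before spending a real API call.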

Right, now click on the Send button to extract the text from the image. Azure OCR will then analyze the image and give a response like the one below.

The OCR results are structured in a hierarchy of region/line/word. They include the recognized text along with bounding boxes for regions, lines, and words.

The following field descriptions are taken from the official documentation, so they should give you a good idea of the response format.

textAngle
The angle, in radians, of the detected text with respect to the closest horizontal or vertical direction. After rotating the input image clockwise by this angle, the recognized text lines become horizontal or vertical. In combination with the orientation property it can be used to overlay recognition results correctly on the original image, by rotating either the original image or recognition results by a suitable angle around the center of the original image. If the angle cannot be confidently detected, this property is not present. If the image contains text at different angles, only part of the text will be recognized correctly.

orientation
Orientation of the text recognized in the image, if requested. The value (up, down, left, or right) refers to the direction that the top of the recognized text is facing, after the image has been rotated around its center according to the detected text angle (see textAngle property).
If detection of the orientation was not requested, or no text is detected, the value is ‘NotDetected’.

language
The BCP-47 language code (user-provided or auto-detected) of the text detected in the image.

regions
An array of objects, where each object represents a region of recognized text. A region consists of multiple lines (e.g. a column of text in a multi-column document).

lines
An array of objects, where each object represents a line of recognized text.

words
An array of objects, where each object represents a recognized word.

boundingBox
Bounding box of a recognized region, line, or word, depending on the parent object. The four integers represent the x-coordinate of the left edge, the y-coordinate of the top edge, width, and height of the bounding box, in the coordinate system of the input image, after it has been rotated around its center according to the detected text angle (see textAngle property), with the origin at the top-left corner, and the y-axis pointing down.

text
String value of a recognized word.
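The field descriptions above can be put to work with a small parser. This is a sketch under my own naming; the sample response is abbreviated and invented for illustration, but it follows the region/line/word hierarchy and the "left,top,width,height" boundingBox string format described above.

```python
# Flatten the region/line/word hierarchy into plain text, and decode
# the comma-separated boundingBox strings into integer tuples.

def bounding_box(s):
    """Parse a 'left,top,width,height' string into a tuple of ints."""
    left, top, width, height = (int(v) for v in s.split(","))
    return left, top, width, height

def extract_text(ocr_result):
    """Join recognized words into lines, and lines into one string."""
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            lines.append(" ".join(w["text"] for w in line.get("words", [])))
    return "\n".join(lines)

# Abbreviated, invented sample in the documented response shape.
sample = {
    "language": "en",
    "orientation": "Up",
    "textAngle": 0.0,
    "regions": [{
        "boundingBox": "21,16,304,451",
        "lines": [{
            "boundingBox": "28,16,288,41",
            "words": [
                {"boundingBox": "28,16,94,37", "text": "HELLO"},
                {"boundingBox": "130,16,186,41", "text": "WORLD"},
            ],
        }],
    }],
}

print(extract_text(sample))           # HELLO WORLD
print(bounding_box("21,16,304,451"))  # (21, 16, 304, 451)
```

Note the `.get(..., [])` defaults: when no text is detected, the regions array can be empty, and the parser simply returns an empty string instead of raising an error.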

Hope you learned something from this article. Thank you!

MS Club of SLIIT

A student-driven community based on SLIIT aiming to bridge the skill gap between an Undergraduate and an Industry Professional

Senura Vihan Jayadeva

Written by

Software Engineering undergraduate of Sri Lanka Institute of Information Technology | Physical Science Undergraduate of University of Sri Jayewardenepura
