OCR Conversion in UIPath — Sedin Technologies

Manisha Jena
Sedin Technologies
Oct 23, 2019, 4 min read

UI PATH: HOW TO USE OCR (OPTICAL CHARACTER RECOGNITION) IN REAL TIME

The OCR activity is one of the most widely used activities today for extracting content from websites, images, scanned PDFs, handwritten text and so on.

Extracting information or data from images, scanned documents, or PDFs is a very tedious job. Normal activities are not recommended for these types of input. OCR takes a different approach: it recognizes the characters in the image itself.

There are mainly two OCR engines available in UiPath Studio: Microsoft OCR and Google OCR.

These OCR engines are available as individual activities and are also used internally by the screen scraping tool. You can select the engine that suits your purpose. We discuss them in detail in the rest of this blog.

Microsoft’s OCR is known as MODI, and Google’s OCR is called Tesseract. You are not limited to these two engines: many other OCR engines are available, for example as third-party activities.

Fig. — OCR engines in UI Path

MICROSOFT OCR:

It accepts only image variables as input, on which OCR activities such as Get OCR Text are performed.

The pre-processing profile can be set to one of the following:

  • None: Does not apply a pre-processing profile.
  • Screen: Pre-processing suitable for remote desktop applications.
  • Scan: Pre-processing suitable for scanned files.
  • Legacy: Uses the engine’s default settings for pre-processing images. This is the default option.

Other points to note:

  • Multiple languages are supported by default.
  • It is suitable for extracting text from a large area and works well when the scale is increased.
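To get a feel for what increasing the scale means, here is a minimal Python sketch that enlarges a tiny grayscale image with nearest-neighbour sampling. It is only a conceptual illustration: `upscale` is a hypothetical helper, the image is a plain list of rows, and this is not how UiPath implements its Scale property.

```python
def upscale(pixels, factor):
    """Enlarge a grayscale image (a list of rows) by an integer factor
    using nearest-neighbour sampling."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in pixels:
        # Repeat every pixel horizontally ...
        wide_row = [value for value in row for _ in range(factor)]
        # ... then repeat the widened row vertically.
        for _ in range(factor):
            scaled.append(list(wide_row))
    return scaled
```

For example, `upscale([[0, 255], [255, 0]], 2)` turns a 2x2 image into a 4x4 one, so each character stroke covers more pixels before recognition.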

GOOGLE OCR:

Google’s OCR is called Tesseract.

The properties of the Tesseract OCR engine are the same as those of Microsoft OCR, with a few additional options.

These are the options available for Tesseract OCR that are not present in Microsoft OCR:

  • Support for additional languages can be added to Google OCR.
  • It is suitable for extracting text from a small area.
  • It has full support for color inversion.
  • It can restrict the output to a set of allowed characters.
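The last two options can be pictured with a small Python sketch. Both functions are hypothetical helpers written for illustration: the engine applies these options during recognition, whereas this sketch inverts a plain list-of-rows grayscale image and filters text that has already been recognized.

```python
def invert_colors(pixels):
    """Invert an 8-bit grayscale image so light text on a dark
    background becomes dark text on a light background."""
    return [[255 - value for value in row] for row in pixels]


def keep_allowed(text, allowed):
    """Drop every character that is not in the allowed set.
    Whitespace is kept so that word boundaries survive."""
    allowed_set = set(allowed)
    return "".join(ch for ch in text if ch in allowed_set or ch.isspace())
```

Restricting the output to digits and separators, for instance `keep_allowed("Total: 12,450.00 USD", "0123456789.,")`, is useful when a field is known to hold only an amount.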

Microsoft Azure Computer Vision OCR:

This activity uses the Microsoft Azure Computer Vision OCR engine to extract the specified string from the image.

This OCR engine can extract text even from a non-classified image, for example one that contains handwritten text, graphs, pictures and so on.

  • It works reliably on classified images.
  • It works reasonably well even when the image is non-classified.
  • I used it to extract scanned handwritten text, and the results were accurate.
  • The Computer Vision features require an Azure account; once you have one, the API key and endpoint are easy to obtain.
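To show where the API key and endpoint fit in, here is a hedged Python sketch that prepares (but does not send) a request to the Azure Computer Vision OCR REST operation. The function name is hypothetical, and the `/vision/v3.2/ocr` path is an assumption about the API version; check which version your Azure resource supports. Inside UiPath the activity makes this call for you.

```python
import urllib.request


def build_azure_ocr_request(endpoint, api_key, image_bytes, language="en"):
    """Prepare a POST request carrying raw image bytes to the OCR operation."""
    # Assumed API version and path; adjust to the version you use.
    url = endpoint.rstrip("/") + "/vision/v3.2/ocr?language=" + language
    headers = {
        # The API key from the Azure portal goes into this header.
        "Ocp-Apim-Subscription-Key": api_key,
        # The image is sent as raw bytes in the request body.
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it; the endpoint is the URL shown for your Computer Vision resource.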

Microsoft Project Oxford Online OCR:

It extracts a string and its information from an indicated UI element or image using the MODI Microsoft Cloud OCR engine. It can be used with other OCR activities, such as Click OCR Text, Hover OCR Text, Get OCR Text etc.

This engine connects to the Microsoft cloud to perform the extraction. It helps extract the text more precisely, along with the position of the text.
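The position information is what makes activities such as Click OCR Text possible. As a rough Python sketch, assume the engine returns each word with a bounding box; the `WordBox` structure and `find_text_center` helper are hypothetical and only illustrate the idea of turning a recognized word into a click point.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """One recognized word and its bounding box in pixels (assumed layout)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def find_text_center(words, target):
    """Return the (x, y) centre of the first word matching target,
    or None when the text was not recognized."""
    for word in words:
        if word.text.lower() == target.lower():
            return (word.left + word.width // 2, word.top + word.height // 2)
    return None
```

With a word `WordBox("Submit", 200, 300, 60, 20)` in the results, `find_text_center(words, "submit")` gives the point at the middle of that button label.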

Google Cloud Vision OCR:

It extracts a string and its information from an indicated UI element or image using the Google Cloud OCR engine.

It gives faster and more precise results than the Tesseract OCR engine, and it runs in the cloud.
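For readers curious about what goes to the cloud, the following Python sketch builds the JSON body of a text-detection request for the Google Cloud Vision REST API (`images:annotate`). The helper name is hypothetical and authentication is left out; within UiPath the activity handles all of this.

```python
import base64
import json


def build_vision_request_body(image_bytes):
    """Return the JSON body for a single-image TEXT_DETECTION request."""
    body = {
        "requests": [
            {
                # The image is embedded as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # Ask only for text detection (OCR).
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }
    return json.dumps(body)
```

The body would be posted to `https://vision.googleapis.com/v1/images:annotate` together with your credentials.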

ResizeToMaxLimitIfNecessary: When selected, the engine attempts downsizing the target image so that it does not exceed the size limit of the Google Cloud Vision engine. By default, this check box is cleared.
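The idea behind this option can be sketched in a few lines of Python. This is an assumption-laden illustration: `fit_within` is a hypothetical helper, the limit is expressed here in pixels, and the actual limit enforced by the engine (and how UiPath checks it) may be defined differently.

```python
def fit_within(width, height, max_width, max_height):
    """Return dimensions scaled down to fit the limits while keeping the
    aspect ratio. Images already inside the limits are returned unchanged."""
    # Never scale up: the ratio is capped at 1.0.
    ratio = min(max_width / width, max_height / height, 1.0)
    return max(1, int(width * ratio)), max(1, int(height * ratio))
```

A 4000x3000 scan with a 2000x2000 limit would be reduced to 2000x1500, while an 800x600 screenshot would be left as it is.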

It works in the same way as the Microsoft Cloud OCR, performs better on smaller images, and is comparatively faster than the Microsoft OCR.

ABBYY OCR:

This is a third-party OCR engine known for extracting text more accurately and faster than the other available engines, with many options for different kinds of documents.

Correct Orientation: If selected, the page orientation is detected by the engine, and if needed, is corrected automatically. By default, this check box is selected.

Correct Skew: Detects whether the page is skewed and automatically corrects it. The drop-down contains three options:

  • Auto: deskews only images that are detected as being skewed.
  • Yes: forces deskew on all pages.
  • No: does not automatically deskew pages.

By default, this property is set to Auto.
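The three Correct Skew options amount to a simple decision, sketched below in Python. The function name, the angle input, and the 0.5 degree threshold are all assumptions made for illustration; the engine decides internally what counts as skewed.

```python
def should_deskew(mode, detected_angle, threshold=0.5):
    """Decide whether a page should be deskewed.

    mode is "Auto", "Yes" or "No"; detected_angle is the measured skew
    in degrees; threshold is a hypothetical cut-off for "skewed".
    """
    mode = mode.lower()
    if mode == "yes":
        return True  # force deskew on every page
    if mode == "no":
        return False  # never deskew
    if mode == "auto":
        return abs(detected_angle) > threshold  # only when skew is detected
    raise ValueError("mode must be Auto, Yes or No")
```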

Custom Recognition Profile Path: The full path to a custom-built Recognition Profile. This field supports only strings and String variables.

Predefined Recognition Profile: Specifies the Predefined Recognition Profile that is to be used when processing an image. This field supports only strings and String variables. The Predefined Recognition Profiles available in ABBYY are listed in the ABBYY documentation.

FineReader Version: Specifies which version of the FineReader Engine is to be used. The options are FineReader Engine 11 and FineReader Engine 12. By default, this property is set to FineReader Engine 12.

The other properties are similar to those of the other OCR engines available in UiPath.

  • This OCR engine gives accurate and fast results.
  • It can convert TIFF and JPEG files into searchable PDF and PDF/A, and extract data or text from photos or screenshots.
  • It supports multiple languages effectively and accurately.
  • The ABBYY FineReader Engine SDK is required.
  • The engine only works with a license distributed by the UiPath sales department.

ABBYY Cloud OCR:

This OCR engine is accessible only with a subscription to ABBYY Cloud, which then gives access to the features of the ABBYY Cloud platform.

This OCR engine gives better results and has many options for processing different types of documents.

CONCLUSION:

  • Among all the OCR engines, the cloud OCR engines produce the most accurate results.
  • These OCR engines can also be used with other OCR activities (Click OCR Text, Hover OCR Text, Double Click OCR Text, Get OCR Text, Find OCR Text Position).
  • These OCR engines are used in recording wizards such as Screen Scraping and Citrix.
  • Overall, the best OCR engines, offering many options along with speed and accuracy, are the ABBYY OCR engine and the Microsoft Azure Computer Vision OCR engine.

Blog Credits: Vashisht Devasasi, RPA Consultant

Originally published at https://sedintechnologies.com on October 23, 2019.
