What is the status quo of OCR? And how about its future?

artificial intelligence
Nov 18, 2022

1. The definition of OCR

OCR (optical character recognition), sometimes called text recognition, identifies printed or handwritten characters in digital images of physical documents, such as scanned paper documents. An OCR system combines hardware and software: it scans the text of a physical document and translates the characters into code that can be used for data processing. Postal and mail sorting services are a good example. OCR is core to their ability to read destination and return addresses quickly and sort mail faster and more effectively. The system does this in three steps.

1.1 Image Pre-processing

In step one, the hardware (usually an optical scanner) converts the physical document into an image, such as an image of an envelope. The goal is an accurate rendition with any unwanted distortions removed. The image is then converted to black and white and analyzed for light areas (background) versus dark areas (characters). If needed, the OCR system may also split the image into separate elements such as tables, text, or inset imagery.
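The black-and-white conversion can be sketched in a few lines of Python. This is only a minimal illustration, assuming a grayscale image stored as rows of 0-255 integers and a fixed threshold of 128; the `binarize` function and the sample `patch` are hypothetical, and real systems usually choose the threshold adaptively.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = dark (character), 0 = light (background)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# Hypothetical 3x4 grayscale patch: low values are dark ink, high values are paper.
patch = [
    [250, 30, 240, 245],
    [245, 25, 35, 250],
    [248, 20, 242, 251],
]

binary = binarize(patch)

# Show dark pixels as '#' and background as '.'.
for row in binary:
    print("".join("#" if value else "." for value in row))
```

The dark pixels marked here are what the recognition step goes on to analyze.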

1.2 Intelligent Character Recognition

AI analyzes the dark areas of the image to identify letters and numbers. Typically, AI targets one character, word, or block of text at a time, using one of the following methods.

The first is pattern recognition. Teams train the AI algorithm on a variety of text, text formats, and handwriting. The algorithm then compares the characters on the scanned envelope image with the characters it has already learned in order to find matches.

The second is feature extraction. To recognize new characters, the algorithm applies rules about specific character features, such as the number of angled, crossed, or horizontal lines and curves in a character. An “H”, for example, has two vertical lines with one horizontal line in between; the machine uses those features to identify every “H” on the envelope.

After the machine has identified the characters, they are converted to a character code such as ASCII that can be used for further processing.

1.3 Post-processing

In step three, AI corrects errors in the resulting file. One method is to train the AI on a specific lexicon of words expected in the document and restrict its output to those words or formats, so that no interpretation falls outside the lexicon. OCR has numerous applications; any business that manages physical paperwork stands to benefit from it.
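As a rough sketch of lexicon-based correction, the example below snaps each recognized token to the closest word in a small lexicon using Python's standard `difflib` module. The `correct_with_lexicon` function, the sample lexicon, and the 0.6 similarity cutoff are assumptions made for illustration, not part of any particular OCR product.

```python
import difflib


def correct_with_lexicon(tokens, lexicon, cutoff=0.6):
    """Replace each token with its closest lexicon word, if one is similar enough."""
    corrected = []
    for token in tokens:
        matches = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        # Assumption: a token with no close match is kept unchanged;
        # a stricter system could flag it for manual review instead.
        corrected.append(matches[0] if matches else token)
    return corrected


# Hypothetical lexicon for address text, and raw OCR output with typical confusions.
lexicon = ["street", "avenue", "road", "boulevard"]
raw_tokens = ["stre3t", "avenve", "r0ad"]

result = correct_with_lexicon(raw_tokens, lexicon)
print(result)
```

A smaller, more specific lexicon makes this kind of correction more reliable, because fewer candidate words compete for each noisy token.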

2. OCR makes recognizing images more efficient

Recognizing text with a scan is a function that has appeared in many applications in recent years. For example, when entering a bank card number, we can scan the card directly with the phone camera, and the software extracts the card information. The technology behind this is optical character recognition: using a machine to convert handwritten or printed text in an image into a format that a computer can process directly. As an important branch of computer vision, OCR is typically used to input information by recognizing the characters in an image. It also helps machines understand images better, because words and symbols carry rich semantic information. The white paper describes the current domestic OCR industry in detail, covering its development background, technological evolution, industrial status, technology standardization, and development trends, with the aim of accelerating the industrialization and sustainable development of OCR technology.

3. The application fields of OCR

By recognition scene, OCR can be roughly divided into two categories: dedicated OCR, which targets a specific scene, and general OCR, which handles many scenes. Document recognition and license plate recognition are typical examples of dedicated OCR. General OCR can be used in more complex scenarios and has greater application potential, but it is also harder, because the scenes in general pictures are not fixed and text layouts are diverse.

By image content, scenes can also be divided into two categories: simple scenes with clear, fixed patterns, and more complex natural scenes. Text recognition in natural scenes is extremely difficult. The image background is rich, and the text often suffers from low brightness, low contrast, uneven illumination, perspective distortion, and partial occlusion. The text layout may also be distorted, folded, or wrapped across lines, and the text may appear in various fonts and colors. For these reasons, text recognition in natural scenes is often treated as a field of its own, scene text recognition.

4. The principles of OCR

For OCR, the first step is to use a scanner to capture the physical document. Once all the pages are scanned, the OCR software converts the document to a black-and-white version. The bright and dark areas of the scanned image or bitmap are analyzed: dark areas are treated as characters to be recognized and bright areas as background. The dark areas are then processed further to find letters or numbers. OCR programs differ in their techniques, but they usually target one character, word, or block of text at a time, and then recognize characters with one of the following two algorithms.

4.1 Feature detection

The OCR program applies rules about the characteristics of specific letters or numbers to identify characters in a scanned document. Features can include the number of angled lines, crossed lines, or curves in a character. For example, the capital letter A can be stored as two diagonal lines that meet at the top, joined by a horizontal line across the middle.
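A toy version of feature detection might look like the following. It is a deliberately simplified sketch: it counts only fully dark columns (vertical strokes) and fully dark rows (horizontal strokes) in a tiny bitmap, so it cannot handle diagonals or curves such as those in the letter A. The `count_strokes` and `classify` functions and the `RULES` table are hypothetical names introduced for this example.

```python
def count_strokes(bitmap):
    """Return (vertical, horizontal): the number of fully dark columns and fully dark rows."""
    height, width = len(bitmap), len(bitmap[0])
    vertical = sum(
        all(bitmap[r][c] == "#" for r in range(height)) for c in range(width)
    )
    horizontal = sum(all(ch == "#" for ch in row) for row in bitmap)
    return vertical, horizontal


# Hypothetical rule table: (vertical strokes, horizontal strokes) -> character.
RULES = {
    (2, 1): "H",
    (1, 3): "E",
    (1, 0): "I",
}


def classify(bitmap):
    """Look up the stroke counts in the rule table; None means no rule matched."""
    return RULES.get(count_strokes(bitmap))


H = ["#...#", "#...#", "#####", "#...#", "#...#"]
E = ["#####", "#....", "#####", "#....", "#####"]

print(classify(H), classify(E))
```

Real feature detectors use far richer features, but the principle is the same: the character is identified from its measured properties rather than from a stored image.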

4.2 Pattern recognition

The OCR program is provided with examples of text in various fonts and formats, which are used to compare and recognize characters in scanned documents. When a character is recognized, it is converted into an ASCII code, which the computer system can use to handle further operations. Before saving the document for future use, users should correct basic errors, proofread and ensure that complex layouts are handled correctly.
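Pattern recognition in its simplest form is template matching, which can be sketched as below. The 3x3 `TEMPLATES`, the pixel-agreement `similarity` score, and the `recognize` function are assumptions chosen for illustration; production systems learn from many examples per character rather than from one stored bitmap each.

```python
# Hypothetical stored examples: one tiny bitmap per known character.
TEMPLATES = {
    "H": ["#.#", "###", "#.#"],
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
}


def similarity(a, b):
    """Fraction of pixel positions at which two equally sized bitmaps agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def recognize(glyph):
    """Return the best-matching character and its ASCII code."""
    best = max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))
    return best, ord(best)


# A scanned "H" with one pixel missing in the bottom-right corner.
noisy_h = ["#.#", "###", "#.."]

print(recognize(noisy_h))
```

Because the score tolerates a few mismatched pixels, a slightly damaged character can still be matched to the right template.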

5. The future development trend of OCR character recognition

At present, OCR starts with image processing: image cleaning, stain removal, and image correction. It then analyzes the layout, for example by segmenting the text and separating text from images. Finally, black-and-white binarization produces a binary encoding. Binarization and text feature extraction are the two steps that most affect the recognition rate.

Text feature extraction is mainly statistical. The image of a character is divided into several regions, and the counts of black and white points in each region are combined into a sequence of numbers. This is the mainstream approach to OCR text features, and its recognition rate can reach roughly 95% or more. Because Chinese characters evolved from pictographs, their features can also be extracted from their strokes.

Whichever recognition algorithm is used, the features computed from the input text must then be compared against a database of standard coded characters, or feature database. The database should contain every character to be recognized, together with feature groups obtained by the same extraction method that is applied to the input text. The accuracy of this database directly affects the accuracy of OCR character recognition.

The future of OCR character recognition therefore points in two directions. On the one hand, character coding databases will become more accurate. On the other hand, systems will move from a single algorithm to multiple algorithms, comparing their recognition results and choosing the best one. This will greatly improve the recognition rate of OCR.
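The region-counting features, the database comparison, and the idea of combining several algorithms can be sketched together as follows. Everything here is a simplified assumption: 4x4 bitmaps split into a 2x2 grid, a three-character reference set, a sum-of-absolute-differences distance, and a plain majority vote. The names `zone_features`, `match`, and `vote` are hypothetical.

```python
from collections import Counter


def zone_features(bitmap, zones=2):
    """Split a square bitmap into zones x zones regions and count dark pixels in each."""
    step = len(bitmap) // zones
    features = []
    for zr in range(zones):
        for zc in range(zones):
            features.append(sum(
                bitmap[r][c] == "#"
                for r in range(zr * step, (zr + 1) * step)
                for c in range(zc * step, (zc + 1) * step)
            ))
    return tuple(features)


# Hypothetical reference glyphs; the database is built with the SAME
# feature extraction that is later applied to the input.
REFERENCE = {
    "L": ["#...", "#...", "#...", "####"],
    "T": ["####", ".##.", ".##.", ".##."],
    "O": ["####", "#..#", "#..#", "####"],
}
DATABASE = {ch: zone_features(glyph) for ch, glyph in REFERENCE.items()}


def match(bitmap):
    """Return the database character whose features are closest to the input's."""
    features = zone_features(bitmap)
    return min(
        DATABASE,
        key=lambda ch: sum(abs(a - b) for a, b in zip(features, DATABASE[ch])),
    )


def vote(candidates):
    """Pick the result that most recognizers agree on."""
    return Counter(candidates).most_common(1)[0][0]


# An "L" with one extra dark pixel.
noisy_l = ["#...", "#...", "##..", "####"]

print(zone_features(noisy_l), match(noisy_l))
# Combine this result with two hypothetical other recognizers' outputs.
print(vote([match(noisy_l), "L", "T"]))
```

A coarse grid like this one confuses similar shapes easily, which is one reason a more accurate database and a combination of algorithms both help.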

For more information, please check: https://en.speechocean.com/Cy/587.html
