A short introduction to handwritten character recognition
Optical character recognition is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text. Normally, the system is given several grayscale images, some as training data and some as test data. The images go through binarization, component labeling, component analysis and feature extraction, and the test images are then recognized so that we can evaluate the overall performance of the system. I’ll first give some background on these procedures and then go into more detail about what I understood and learned at each step. Finally, I’d like to summarize the key factors that affect the overall recognition rate of handwritten characters.
In computer vision, an 8-bit grayscale image is an array of numbers ranging from 0 to 255: the brighter the pixel, the larger its value. To binarize an image is to divide it into a background part and a foreground part, setting the value of every background pixel to 0 and the value of every foreground pixel to 1. The background is assumed to contain no information about the target objects that we want to recognize in the image. The essential idea behind binarization is that, once we have a threshold, we can classify the pixels whose value is smaller than the threshold as foreground and the rest as background (this assumes dark characters on a lighter background). Thus, finding a good threshold is the key to binarizing an image. Another benefit of thresholding is that it removes some of the noise that interferes with our recognition of characters.
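As a minimal sketch of the thresholding step, the Python below binarizes an image stored as a list of rows of 0 to 255 values. The paragraph above does not say how the threshold is chosen; Otsu's method is used here only as one common choice, and the function names `otsu_threshold` and `binarize` are my own for illustration. Plain lists are used so the sketch has no dependencies.

```python
def otsu_threshold(image):
    """Pick a threshold in 1..255 that maximizes the between-class variance
    of the 'dark' (value < t) and 'bright' (value >= t) pixel groups."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark = 0  # number of pixels with value < t
    sum_dark = 0     # sum of those pixel values
    for t in range(1, 256):
        weight_dark += hist[t - 1]
        sum_dark += (t - 1) * hist[t - 1]
        weight_bright = total - weight_dark
        if weight_dark == 0 or weight_bright == 0:
            continue  # one class is empty, nothing to separate
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        between_var = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image, threshold):
    """Foreground (1) = pixels darker than the threshold, background (0) = the rest.
    Assumes dark characters on a lighter background."""
    return [[1 if value < threshold else 0 for value in row] for row in image]


# Tiny example: two dark "ink" pixels on a bright page.
page = [
    [250, 250, 250],
    [250,  20, 250],
    [250,  30, 245],
]
threshold = otsu_threshold(page)
binary_page = binarize(page, threshold)
```

Whatever threshold is picked here has to be reused unchanged on the test images, since the foreground shapes depend on it.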
After binarizing the images, we need to examine them further, especially the foreground part. A connected component is a set of foreground pixels that are connected to one another and have no connection with any other object in the image. We need to label the different connected components within an image because each component is possibly a character. There are several ways to identify a component, and further to trace its contour and calculate its geometric properties. Those geometric properties are useful for successful recognition.
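One of the several ways to label components is a flood fill, sketched below under my own assumptions: the input is a binary image as a list of rows of 0s and 1s, pixels count as connected when they touch horizontally, vertically or diagonally (8-connectivity), and the name `label_components` is hypothetical.

```python
from collections import deque


def label_components(binary):
    """Give every 8-connected group of foreground pixels its own label 1, 2, ...
    Returns (labels, count), where labels has the same shape as the input
    and background pixels keep the label 0."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or labels[r][c] != 0:
                continue  # background, or already part of a labeled component
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Two separate strokes: one at the top left, one on the right edge.
strokes = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
stroke_labels, stroke_count = label_components(strokes)
```

With 4-connectivity instead, thin diagonal strokes would split into several components, so the choice of connectivity affects how many candidate characters come out of this step.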
Component analysis helps detect problems in the work that has been done so far. For example, given an image, we know how many characters it contains in total. However, because of thresholding or noise, the number of components may be smaller or larger than the number of characters. To solve these problems, we need to know the properties of the abnormal components as well as the normal ones, and then find ways to get rid of the abnormal components and to strengthen the normal ones.
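A rough sketch of this kind of analysis follows. It covers only the case of too many components: it measures the area and bounding box of each labeled component and discards those below a minimum area, on the assumption that very small components are noise. The names `component_stats` and `remove_small_components` and the `min_area` parameter are hypothetical, and a suitable value for `min_area` would have to be found from the data.

```python
def component_stats(labels):
    """Return {label: {"area": pixel count, "bbox": (top, left, bottom, right)}}
    for every non-zero label in a labeled image (list of rows of ints)."""
    stats = {}
    for r, row in enumerate(labels):
        for c, label in enumerate(row):
            if label == 0:
                continue  # background
            entry = stats.get(label)
            if entry is None:
                stats[label] = {"area": 1, "bbox": (r, c, r, c)}
            else:
                top, left, bottom, right = entry["bbox"]
                entry["area"] += 1
                entry["bbox"] = (min(top, r), min(left, c),
                                 max(bottom, r), max(right, c))
    return stats


def remove_small_components(labels, min_area):
    """Set every component with fewer than min_area pixels back to background."""
    stats = component_stats(labels)
    keep = {label for label, entry in stats.items() if entry["area"] >= min_area}
    return [[label if label in keep else 0 for label in row] for row in labels]


# Component 1 is a 2x2 blob; component 2 is a single stray pixel.
labeled = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
]
stats = component_stats(labeled)
cleaned = remove_small_components(labeled, min_area=2)
```

Comparing the number of surviving components with the known number of characters in a training image is one way to check whether the threshold and `min_area` are reasonable.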
To recognize a character, we need to identify its distinguishing properties. A feature is a special kind of property of a character that acts as a descriptor of that character. It’s special because it is invariant to image translation, scaling and rotation. For example, when a character is rotated by 90 degrees, the feature of that character stays unchanged. Thus features play an important role in recognition.
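To make the invariance idea concrete, here is a small sketch that computes one such feature for a single component. The text above does not name a specific feature; I use the first Hu moment invariant (the sum of the two normalized second-order central moments) purely as an illustration, and the function names are my own. On a pixel grid, translation and 90-degree rotation leave the value unchanged up to floating-point rounding, while scale invariance holds only approximately.

```python
def first_hu_moment(binary):
    """First Hu invariant (eta20 + eta02) of the foreground pixels of a
    binary image given as a list of rows of 0s and 1s."""
    pixels = [(r, c)
              for r, row in enumerate(binary)
              for c, value in enumerate(row) if value == 1]
    area = len(pixels)
    if area == 0:
        raise ValueError("component has no foreground pixels")
    # Measuring from the centroid removes the dependence on position.
    mean_r = sum(r for r, _ in pixels) / area
    mean_c = sum(c for _, c in pixels) / area
    mu20 = sum((r - mean_r) ** 2 for r, _ in pixels)
    mu02 = sum((c - mean_c) ** 2 for _, c in pixels)
    # Dividing by area**2 normalizes the second-order moments for scale.
    return (mu20 + mu02) / area ** 2


def rotate_90(binary):
    """Rotate a binary image by 90 degrees clockwise (helper for the demo)."""
    return [list(row) for row in zip(*binary[::-1])]


# An L-shaped component, the same shape rotated, and the same shape shifted.
shape = [
    [1, 0],
    [1, 0],
    [1, 1],
]
rotated = rotate_90(shape)
shifted = [[0, 0, 0, 0]] + [[0, 0] + row for row in shape]

feature = first_hu_moment(shape)
feature_rotated = first_hu_moment(rotated)
feature_shifted = first_hu_moment(shifted)
```

A single number like this is far too weak to tell all characters apart, so in practice several features would be combined into a feature vector.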
In the recognition phase, we’re given some test images to evaluate how well the training worked. To get the recognition results for the test images, we need to apply exactly the same steps described above to them, with exactly the same parameters, such as the threshold. We then have two feature sets, one from the training images and another from the test images. As mentioned above, a feature is the descriptor of a character. If a feature from the test images is close to a feature from the training images, they’re probably describing the same character. There are several ways to quantify the similarity between two features.
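One simple way to measure that similarity, sketched here as an assumption rather than as the method actually used, is the Euclidean distance between feature vectors combined with a nearest-neighbor rule: the test character receives the label of the closest training feature. The function names, the labels and the numbers in the example are all made up for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(test_feature, training_set):
    """Return (label, distance) of the training feature closest to test_feature.
    training_set is a list of (label, feature_vector) pairs."""
    best_label, best_distance = None, math.inf
    for label, feature in training_set:
        distance = euclidean_distance(test_feature, feature)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


# Made-up two-dimensional features for two training characters.
training = [
    ("a", [0.10, 0.20]),
    ("b", [0.80, 0.90]),
]
label_1, _ = classify([0.15, 0.25], training)
label_2, _ = classify([0.70, 0.95], training)
```

Because the distance mixes all feature dimensions, features with a large numeric range would dominate it unless they are rescaled first, for example by dividing each dimension by its standard deviation over the training set.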