My Machine Learning Diary: Day 57
Today I got an overview of a cool application of machine learning.
Photo Optical Character Recognition (OCR)
Photo OCR is a technique for recognizing text in an image.
Pipeline
A machine learning pipeline breaks a problem down into smaller stages, each handled by its own component. This is very useful for problem solving and for splitting work across a team. For photo OCR, the pipeline looks as follows: text detection, then character segmentation, then character recognition.
Text Detection
In the text detection stage, given an image, we need to figure out where the text is. We use a technique called a sliding window: a classifier checks a small rectangular region of the image, then we slide the window and repeat the process all over the image. Finally, we expand the regions that the classifier flagged as text (shown in white) and put a bounding box around them.
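To make the sliding window idea concrete, here is a minimal sketch in Python with NumPy. It is not the course's implementation: `looks_like_text` is a hypothetical stand-in for a trained text/no-text classifier (it only checks brightness), the window size and stride are arbitrary, and I skip the expansion step and draw a single box around everything that was flagged.

```python
import numpy as np

def sliding_windows(image, window=(4, 4), stride=2):
    """Yield (row, col, patch) for every window position in a 2-D image."""
    height, width = image.shape
    win_h, win_w = window
    for row in range(0, height - win_h + 1, stride):
        for col in range(0, width - win_w + 1, stride):
            yield row, col, image[row:row + win_h, col:col + win_w]

def looks_like_text(patch, threshold=0.5):
    # Hypothetical stand-in for a trained classifier:
    # a patch counts as "text" if its mean brightness is above the threshold.
    return patch.mean() > threshold

def detect_text(image, window=(4, 4), stride=2):
    """Return a boolean mask marking every window flagged as text."""
    mask = np.zeros(image.shape, dtype=bool)
    for row, col, patch in sliding_windows(image, window, stride):
        if looks_like_text(patch):
            mask[row:row + window[0], col:col + window[1]] = True
    return mask

def bounding_box(mask):
    """Return (top, left, bottom, right) around all flagged pixels, or None."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())

# Toy image: dark background with one bright block standing in for text.
image = np.zeros((12, 12))
image[4:8, 2:10] = 1.0

mask = detect_text(image)
box = bounding_box(mask)
print(box)
```

A real detector would run a learned classifier on each patch and would draw one box per connected region rather than one box for the whole image.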
Character Segmentation
In the character segmentation stage, we want to split the detected text into individual characters. Again, we use a sliding window to do this, this time moving it along the line of text.
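Here is a rough sketch of how a one-dimensional sliding window could find the split points. Everything in it is an assumption for illustration: `is_split_point` is a hypothetical stand-in for a trained classifier (it just checks whether the middle column of the window has no ink), and the toy line image is made up.

```python
import numpy as np

def is_split_point(window):
    # Hypothetical stand-in for a trained classifier that decides whether
    # the window is centered on a gap between two characters.
    middle = window.shape[1] // 2
    return window[:, middle].sum() == 0

def find_split_columns(line_image, window_width=3):
    """Slide a window left to right and collect the columns flagged as gaps."""
    half = window_width // 2
    width = line_image.shape[1]
    splits = []
    for center in range(half, width - half):
        window = line_image[:, center - half:center + half + 1]
        if is_split_point(window):
            splits.append(center)
    return splits

def split_characters(line_image, window_width=3):
    """Cut the line image into chunks at the detected gap columns."""
    splits = set(find_split_columns(line_image, window_width))
    chunks, start = [], None
    for col in range(line_image.shape[1]):
        if col in splits:
            if start is not None:
                chunks.append(line_image[:, start:col])
                start = None
        elif start is None:
            start = col
    if start is not None:
        chunks.append(line_image[:, start:])
    return chunks

# Toy text line: two blocks of ink separated by blank columns.
line = np.zeros((5, 10))
line[:, 1:4] = 1.0  # first character, columns 1 to 3
line[:, 6:9] = 1.0  # second character, columns 6 to 8

chunks = split_characters(line)
print(len(chunks), [c.shape for c in chunks])
```

The window never centers on the outermost columns, so the blank margins stay attached to the first and last chunk in this sketch.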
Character Recognition
In the character recognition stage, we simply classify each segmented character image as a particular character.
Artificial Data Synthesis
With a low-bias algorithm, we can usually improve accuracy by training on more data. When we cannot collect more, we can synthesize it. For photo OCR, you can combine a character in a random font with a random background.
Another way to synthesize data for photo OCR is to add distortions to examples we already have.
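Both ideas can be sketched with NumPy alone. This is only an illustration under my own assumptions: `make_glyph` is a hypothetical stand-in for rendering a character in a random font (it draws a crude "L"), and the distortion is just a small random shift plus pixel noise, not whatever a real OCR system would use.

```python
import numpy as np

def make_glyph(size=8):
    # Hypothetical stand-in for rendering a character in a random font:
    # a crude "L" drawn with ones (ink) on zeros.
    glyph = np.zeros((size, size))
    glyph[1:size - 1, 2] = 1.0
    glyph[size - 2, 2:size - 2] = 1.0
    return glyph

def paste_on_background(glyph, rng):
    """Place the glyph on a random gray background."""
    background = rng.uniform(0.0, 0.4, size=glyph.shape)
    return np.where(glyph > 0, 1.0, background)

def distort(image, rng, max_shift=1, noise_std=0.05):
    """Apply a small random shift plus pixel noise, clipped to [0, 1]."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))
    noisy = shifted + rng.normal(0.0, noise_std, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def synthesize(n_examples, label="L", seed=0):
    """Build n_examples synthetic images that all share the same label."""
    rng = np.random.default_rng(seed)
    glyph = make_glyph()
    images = [distort(paste_on_background(glyph, rng), rng)
              for _ in range(n_examples)]
    return np.stack(images), [label] * n_examples

images, labels = synthesize(5)
print(images.shape, labels)
```

The label never changes because the distortions are mild enough that the character is still the same character, which is the whole point of synthesizing this way.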
Ceiling Analysis
Let’s say we have an overall accuracy of 72% for photo OCR. Which part of the pipeline should we spend the most time on to boost this accuracy? Ceiling analysis is a technique for finding this out.
We choose one component of the pipeline and manually make its output 100% correct. For example, if we choose text detection, we go through the test set and supply the correct bounding boxes by hand. Then we check how much the overall accuracy increases. We repeat the process for each component in turn, keeping the earlier ones perfect, and the component that gives the biggest jump is where our time is best spent. I don’t understand this part very well yet. I will dig deeper into it when I actually run into it.
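To make the arithmetic concrete for myself, here is a tiny sketch. Only the 72% baseline comes from the example above; the other accuracies are made-up numbers, and I am assuming the cumulative version of the analysis, where each stage is made perfect on top of the stages before it.

```python
def ceiling_analysis(baseline, accuracy_with_perfect_stage):
    """Return the gain in overall accuracy from perfecting each stage in order."""
    gains = {}
    previous = baseline
    for stage, accuracy in accuracy_with_perfect_stage.items():
        gains[stage] = round(accuracy - previous, 4)
        previous = accuracy
    return gains

baseline = 0.72  # overall accuracy of the untouched pipeline

# Made-up numbers: overall accuracy once this stage (and every stage
# before it) is replaced by hand-labeled, perfect output.
with_perfect = {
    "text detection": 0.89,
    "character segmentation": 0.90,
    "character recognition": 1.00,
}

gains = ceiling_analysis(baseline, with_perfect)
best_stage = max(gains, key=gains.get)
print(gains)
print(best_stage)
```

With these invented numbers, perfecting text detection gives the largest gain, so that is where the effort would go first, while character segmentation would barely be worth touching.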
That’s it for today. I finally finished the Coursera ML course! I will start another course tomorrow!