My Machine Learning Diary: Day 57

Junhong Wang
Dec 14, 2018

Day 57

Today I went through an overview of some cool applications of machine learning.

Photo Optical Character Recognition (OCR)

Photo OCR is a technique for recognizing text in an image.

Photo OCR recognizes four pieces of text in an image

Pipeline

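A machine learning pipeline breaks a problem down into smaller stages that can be built and improved separately. This is very useful both for problem solving and for dividing work within a team. For photo OCR, the pipeline looks as follows: text detection, then character segmentation, then character recognition.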

Photo OCR Pipeline
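To make the idea of chaining stages concrete, here is a minimal sketch of the pipeline as three functions called in order. Everything in it is a stand-in I made up for illustration: the "image" is just a string, and `detect_text`, `segment_characters`, and `recognize_character` are hypothetical placeholders, not real OCR models.

```python
from typing import List


def detect_text(image: str) -> List[str]:
    """Toy stage 1: treat each whitespace-separated chunk as a text region."""
    return image.split()


def segment_characters(region: str) -> List[str]:
    """Toy stage 2: split a text region into single characters."""
    return list(region)


def recognize_character(char_image: str) -> str:
    """Toy stage 3: 'classify' a character (here: just upper-case it)."""
    return char_image.upper()


def photo_ocr(image: str) -> List[str]:
    """Run the three stages in order and return the recognized words."""
    words = []
    for region in detect_text(image):
        chars = segment_characters(region)
        words.append("".join(recognize_character(c) for c in chars))
    return words


result = photo_ocr("open 24 hours")
```

Because each stage only depends on the output of the previous one, different people could work on each function independently, which is the teamwork benefit mentioned above.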

Text Detection

In the text detection stage, given an image, we need to figure out where the text is. We use a technique called a sliding window: a classifier checks a small rectangular region of the image and decides whether it contains text, and we repeat this at every position across the image. Finally, we amplify the signal by expanding the regions flagged as text (shown in white) and put a bounding box around each white region.

Text Detection
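The three steps (slide a window, expand the white regions, draw a box) can be sketched in plain Python. This is only a rough illustration under my own assumptions: the image is a small 2D list of brightness values, `is_text` is a hypothetical stand-in for a trained classifier, and `bounding_box` draws a single box around all white pixels, whereas handling several separate regions would need connected-component labeling.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Image = List[List[float]]
Mask = List[List[int]]


def sliding_windows(height: int, width: int, win_h: int, win_w: int,
                    stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the (top, left) corner of every window that fits in the image."""
    for top in range(0, height - win_h + 1, stride):
        for left in range(0, width - win_w + 1, stride):
            yield top, left


def detect_text_mask(image: Image, win_h: int, win_w: int, stride: int,
                     is_text: Callable[[Image], bool]) -> Mask:
    """Mark (with 1) every pixel covered by a window the classifier accepts."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for top, left in sliding_windows(height, width, win_h, win_w, stride):
        patch = [row[left:left + win_w] for row in image[top:top + win_h]]
        if is_text(patch):
            for r in range(top, top + win_h):
                for c in range(left, left + win_w):
                    mask[r][c] = 1
    return mask


def expand(mask: Mask, radius: int) -> Mask:
    """Grow white regions: every white pixel whitens its neighbours within radius."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return inclusive (top, left, bottom, right) around all white pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def is_bright(patch: Image) -> bool:
    """Hypothetical 'classifier': a patch is text if its mean brightness > 0.5."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0])) > 0.5


# A 6x8 dark image with a bright 2x2 blob standing in for text.
image = [[0.0] * 8 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4):
        image[r][c] = 1.0

mask = detect_text_mask(image, win_h=2, win_w=2, stride=1, is_text=is_bright)
box = bounding_box(expand(mask, radius=1))
```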

Character Segmentation

In the character segmentation stage, we want to split the detected text into individual characters. Again, we use a sliding window to do this, this time moving along the text in one dimension to find the gaps between characters.

Character Segmentation
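Here is a small sketch of how a one-dimensional sliding window could find the cuts between characters. It rests on assumptions of my own: the text line is a 2D list where 1 means ink, and `is_gap` is a hypothetical rule (the middle column of the window has no ink) standing in for a trained classifier.

```python
from typing import Callable, List

Image = List[List[int]]


def find_split_points(line_image: Image, win_w: int,
                      is_gap: Callable[[Image], bool]) -> List[int]:
    """Slide a window left to right; record the centre column of each gap window."""
    width = len(line_image[0])
    splits = []
    for left in range(0, width - win_w + 1):
        patch = [row[left:left + win_w] for row in line_image]
        if is_gap(patch):
            splits.append(left + win_w // 2)
    return splits


def split_characters(line_image: Image, splits: List[int]) -> List[Image]:
    """Cut the line at each split column, dropping the split columns themselves."""
    width = len(line_image[0])
    pieces = []
    start = 0
    for cut in splits + [width]:
        piece = [row[start:cut] for row in line_image]
        # Keep only non-empty pieces that actually contain ink.
        if cut > start and any(any(row) for row in piece):
            pieces.append(piece)
        start = cut + 1
    return pieces


def middle_column_blank(patch: Image) -> bool:
    """Hypothetical 'classifier': a gap if the window's middle column has no ink."""
    mid = len(patch[0]) // 2
    return not any(row[mid] for row in patch)


# Two rows, seven columns: ink at columns 0-1, 3-4 and 6; blanks at 2 and 5.
line = [
    [1, 1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 0, 1],
]
splits = find_split_points(line, win_w=3, is_gap=middle_column_blank)
characters = split_characters(line, splits)
```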

Character Recognition

In the character recognition stage, we simply classify each character image as a character.

Character Recognition

Artificial Data Synthesis

With a low-bias algorithm, we can improve accuracy by training on more data. If we only have a limited data set, we can synthesize more. For photo OCR, you can combine a character in a random font with a random background.

Synthesize w/ Random Fonts
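A minimal sketch of this idea, under assumptions I made up for illustration: the "fonts" are a tiny hypothetical dictionary of hand-drawn glyph bitmaps (`GLYPHS`), the background is random light-gray noise, and ink is drawn as 0.0. A real setup would render actual font files onto real background patches.

```python
import random
from typing import Dict, List, Tuple

Glyph = List[List[int]]
Image = List[List[float]]


def composite(glyph: Glyph, background: Image, ink: float = 0.0) -> Image:
    """Paste the glyph onto the background: ink where the glyph is 1."""
    return [
        [ink if g else b for g, b in zip(glyph_row, bg_row)]
        for glyph_row, bg_row in zip(glyph, background)
    ]


def synthesize_example(glyphs_by_font: Dict[str, Dict[str, Glyph]], label: str,
                       rng: random.Random) -> Tuple[Image, str]:
    """Pick a random font and a random background, and return (image, label)."""
    font = rng.choice(sorted(glyphs_by_font))
    glyph = glyphs_by_font[font][label]
    # Random light background, one value per pixel, same size as the glyph.
    background = [[rng.uniform(0.5, 1.0) for _ in row] for row in glyph]
    return composite(glyph, background), label


# Hypothetical 3x3 "fonts" containing only the letter I.
GLYPHS = {
    "thin": {"I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]]},
    "wide": {"I": [[1, 1, 1], [0, 1, 0], [1, 1, 1]]},
}

rng = random.Random(0)
dataset = [synthesize_example(GLYPHS, "I", rng) for _ in range(5)]
```

Each synthesized image comes with its label for free, since we chose the character ourselves.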

Another way to synthesize data for photo OCR is to add distortions to existing examples.

Synthesize w/ Distortions
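As one possible distortion, here is a sketch of a horizontal shear that slants a character image while keeping its label. The `shear` helper and the choice of shear as the distortion are my own assumptions; the image is a 2D list where 0 is ink and 1 is background.

```python
from typing import List

Image = List[List[int]]


def shear(image: Image, slope: float, fill: int = 1) -> Image:
    """Shift row r horizontally by round(slope * r) pixels, padding with fill."""
    width = len(image[0])
    out = []
    for r, row in enumerate(image):
        # Clamp so that a large slope simply pushes the row fully out of view.
        shift = max(-width, min(width, round(slope * r)))
        if shift >= 0:
            new_row = [fill] * shift + row[:width - shift]
        else:
            new_row = row[-shift:] + [fill] * (-shift)
        out.append(new_row)
    return out


# A vertical stroke in the first column (0 = ink, 1 = background).
original = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]

# Several slanted variants of the same character, all sharing one label.
variants = [(shear(original, slope), "I") for slope in (0.0, 1.0)]
slanted = variants[1][0]
```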

Ceiling Analysis

Let’s say we have an accuracy of 72% for photo OCR. Which part of the pipeline should we invest most of our time in to boost this accuracy? Ceiling analysis is a technique for finding this out.

Ceiling Analysis

We choose one component of the pipeline and manually set its accuracy to 100%. For example, if we choose text detection, we go to the test set and manually supply the correct bounding boxes. Then we check how much this increases the overall accuracy. We repeat the process for each component. I don’t fully understand this part yet. I will dig deeper into it when I actually encounter it.
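To make the bookkeeping concrete, here is a small sketch of the arithmetic. Only the 72% baseline comes from the text above; the other accuracies are made-up illustrative numbers, and I am assuming the components are made perfect one after another in pipeline order, so each gain is measured on top of the previous fixes.

```python
from typing import List, Tuple


def ceiling_analysis(baseline: int,
                     accuracy_with_perfect: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (component, gain) pairs, where gain is the accuracy added by
    making that component perfect on top of all earlier ones."""
    gains = []
    previous = baseline
    for component, accuracy in accuracy_with_perfect:
        gains.append((component, accuracy - previous))
        previous = accuracy
    return gains


# Overall accuracy (%) after each component is replaced by ground truth.
# Hypothetical values chosen for illustration only.
measurements = [
    ("text detection", 89),
    ("character segmentation", 90),
    ("character recognition", 100),
]

gains = ceiling_analysis(72, measurements)
best_component, best_gain = max(gains, key=lambda pair: pair[1])
```

With these made-up numbers, perfect text detection adds 17 points while perfect character segmentation adds only 1, so text detection would be the place to spend time first.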

That’s it for today. I finally finished the Coursera ML course! I will start another course tomorrow!
