Improving the reading order of Google OCR

giangpham
Published in SK Geek
Dec 20, 2019

Introduction

Read Aloud is a new project at my company. It helps blind people understand the information on a sheet of paper by reading it aloud to them.

The mission is simple: detect the text on the paper, then read it aloud.
Because we needed real-time results, we first tried several offline libraries.
But after trying them all, the results were not good.
Then we went online, and Google OCR turned out to be the best one.

Using Google OCR and Text To Speech (TTS), we put everything together.
But the story does not end happily there.

Problem

Google OCR is a very strong API for text detection; the problem is the reading order.
If you only detect a single text block, it works fine.
But on a whole page with many segments, the reading order falls apart.

Google OCR result; the green boxes are the text segments

Looking closely at the Google OCR result, I noticed that every detected word comes with its location in the image.

With that, I can sort out the reading order on my own.
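As a rough sketch of that starting point, assume the page was sent to the Cloud Vision API with the `DOCUMENT_TEXT_DETECTION` feature and the response is available as parsed JSON in its REST form, where `fullTextAnnotation` nests `pages`, `blocks`, `paragraphs` and `words`, and each word has `symbols` and a `boundingBox` with `vertices`. The `extract_words` name and the `(text, x0, y0, x1, y1)` tuple are hypothetical choices for illustration, not part of the API.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical word representation: (text, x0, y0, x1, y1) in image pixels.
Word = Tuple[str, int, int, int, int]


def extract_words(response: Dict[str, Any]) -> List[Word]:
    """Collect every word and its axis-aligned bounding box from a
    DOCUMENT_TEXT_DETECTION response that was parsed from JSON."""
    words: List[Word] = []
    annotation = response.get("fullTextAnnotation", {})
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # A word's text is the concatenation of its symbols.
                    text = "".join(s.get("text", "") for s in word.get("symbols", []))
                    vertices = word.get("boundingBox", {}).get("vertices", [])
                    # JSON output omits zero-valued fields, so default x/y to 0.
                    xs = [v.get("x", 0) for v in vertices]
                    ys = [v.get("y", 0) for v in vertices]
                    if xs and ys:
                        words.append((text, min(xs), min(ys), max(xs), max(ys)))
    return words
```

The four vertices are collapsed into a min/max rectangle, which is all the later steps need.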

The way I go

First, I draw all the word boxes on a blank image.

Black-and-white image containing only the word boxes

I noticed that nearby words usually belong to the same segment.
So before drawing each word box, I enlarge it by an appropriate margin so that nearby words become connected.

Blobs image
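Here is a minimal, dependency-free sketch of the drawing step, assuming word boxes are `(x0, y0, x1, y1)` pixel tuples with inclusive corners. `draw_word_mask` and its `pad` margin are hypothetical; the post only says each box is enlarged by an appropriate value. In practice you would more likely draw filled rectangles onto a NumPy array with OpenCV's `cv2.rectangle` than use nested lists.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel coordinates


def draw_word_mask(boxes: List[Box], width: int, height: int, pad: int) -> List[List[int]]:
    """Return a height x width grid of 0/1 where every word box, grown by
    `pad` pixels on each side, is filled with 1 on a blank (all-0) image."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Enlarge the box, then clip it to the image borders.
        left, top = max(x0 - pad, 0), max(y0 - pad, 0)
        right, bottom = min(x1 + pad, width - 1), min(y1 + pad, height - 1)
        for y in range(top, bottom + 1):
            row = mask[y]
            for x in range(left, right + 1):
                row[x] = 1
    return mask


# Two short words on the same row of a 10 x 3 image.
words = [(0, 0, 2, 0), (5, 0, 7, 0)]
tight = draw_word_mask(words, 10, 3, pad=0)
loose = draw_word_mask(words, 10, 3, pad=2)
```

With `pad=0` the two words stay separate (there is a gap at columns 3 and 4), while with `pad=2` their enlarged boxes touch and would form a single blob. The margin is the knob that decides how aggressively words are glued into segments.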

Now I have many blobs on the image, and each blob is a candidate segment.
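One way to turn the drawn image into blobs is plain connected-component labeling. The sketch below flood-fills 4-connected regions of 1s in a 0/1 grid and returns each blob's bounding box. `find_blobs` is a hypothetical helper; with OpenCV the same job is usually done with `cv2.findContours` plus `cv2.boundingRect`, or with `cv2.connectedComponentsWithStats`.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def find_blobs(mask: List[List[int]]) -> List[Box]:
    """Label 4-connected regions of 1s and return each region's bounding box."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs: List[Box] = []
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0, y0, x1, y1 = sx, sy, sx, sy
            while queue:
                x, y = queue.popleft()
                x0, y0, x1, y1 = min(x0, x), min(y0, y), max(x1, x), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            blobs.append((x0, y0, x1, y1))
    return blobs
```

Each returned box is one segment candidate; blobs are listed in the order their top-left-most scanned pixel is found.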

Some filters are applied to remove small blobs and merge overlapping blobs.
The remaining blobs become the segments.

Segments image
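The post does not spell out the exact filters, so the following is only one plausible version: drop blobs whose bounding box is smaller than a hypothetical `min_area`, then keep merging any two boxes that overlap until none do. The helper names are also hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def area(b: Box) -> int:
    return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)


def overlaps(a: Box, b: Box) -> bool:
    # Two rectangles overlap when they overlap on both axes.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def clean_blobs(blobs: List[Box], min_area: int) -> List[Box]:
    """Drop blobs smaller than min_area, then repeatedly merge any two
    overlapping boxes into their union until no overlaps remain."""
    boxes = [b for b in blobs if area(b) >= min_area]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlaps(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

The loop restarts after every merge because a freshly merged box can grow enough to overlap a box it previously missed. This is quadratic per pass, which is fine for the few dozen segments on a page.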

Then I go through all the words and put each one into the right segment.
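A simple way to decide which segment a word belongs to is to check which segment box contains the word's center. This is an assumed rule for illustration, and `assign_words` is hypothetical. A word whose center lands in no segment (for example because its blob was filtered out as too small) is simply dropped in this sketch.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]        # (x0, y0, x1, y1)
Word = Tuple[str, int, int, int, int]  # (text, x0, y0, x1, y1)


def assign_words(words: List[Word], segments: List[Box]) -> List[List[Word]]:
    """Put each word into the first segment that contains the word's center.
    The result has one list of words per segment, in segment order."""
    groups: List[List[Word]] = [[] for _ in segments]
    for word in words:
        _, x0, y0, x1, y1 = word
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        for idx, (sx0, sy0, sx1, sy1) in enumerate(segments):
            if sx0 <= cx <= sx1 and sy0 <= cy <= sy1:
                groups[idx].append(word)
                break
    return groups
```

Because the segments were grown from the word boxes themselves, almost every word center should fall inside exactly one segment.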

Now we have the right segments with the right words.
The last thing to do is sort the segments, then sort the words in each segment.

Sort segments

First, I sort the segments vertically, from top to bottom.
Then I separate the segments into lines.

In each line, I sort the segments horizontally, from left to right.
Then I separate the segments into columns.

In each column, I sort the segments vertically, from top to bottom.

Separating segments into lines and columns
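The line and column split can be sketched with one helper that sorts boxes along an axis and chains together boxes whose ranges on that axis overlap. Using overlapping ranges as the test for "same line" or "same column" is my assumption, since the post does not state the exact criterion; `group_by_overlap` and `layout` are hypothetical names.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def group_by_overlap(boxes: List[Box], lo: int, hi: int) -> List[List[Box]]:
    """Sort boxes by their start on one axis (lo/hi index x0/x1 or y0/y1)
    and chain consecutive boxes whose ranges on that axis overlap."""
    groups: List[List[Box]] = []
    end = 0
    for box in sorted(boxes, key=lambda b: b[lo]):
        if groups and box[lo] <= end:
            groups[-1].append(box)
            end = max(end, box[hi])
        else:
            groups.append([box])
            end = box[hi]
    return groups


def layout(segments: List[Box]) -> List[List[List[Box]]]:
    """Lines (top to bottom) -> columns (left to right) -> segments (top to bottom)."""
    result: List[List[List[Box]]] = []
    for line in group_by_overlap(segments, 1, 3):       # overlapping y-ranges form a line
        columns = group_by_overlap(line, 0, 2)          # overlapping x-ranges form a column
        result.append([sorted(col, key=lambda b: b[1]) for col in columns])
    return result


# A title, a two-column body and a footer, given in scrambled order.
segments = [(0, 70, 100, 80), (55, 45, 100, 60), (0, 0, 100, 10),
            (55, 20, 100, 40), (0, 20, 45, 50)]
order = [box for line in layout(segments) for col in line for box in col]
```

Flattening line by line, then column by column, gives the reading order: title, left column, both right-column blocks, then the footer.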

Then I try to merge two consecutive lines if the columns inside them match. (In this sample, no lines get merged.)

But suppose the segments came out like the example below.

A sample case in which lines need to be merged

Then lines 4 and 5 will be merged.
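One way to express the merge, assuming each line is a list of columns and each column is a top-to-bottom list of `(x0, y0, x1, y1)` segment boxes: treat two consecutive lines as matching when they have the same number of columns and each column's horizontal range overlaps the column at the same position. That matching rule is only a guess at what "match" means here, and all helper names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)
Line = List[List[Box]]           # columns, each a top-to-bottom list of segments


def x_range(column: List[Box]) -> Tuple[int, int]:
    return min(b[0] for b in column), max(b[2] for b in column)


def columns_match(a: Line, b: Line) -> bool:
    """Assumed rule: same column count, and each pair of columns at the
    same position overlaps horizontally."""
    if len(a) != len(b):
        return False
    for col_a, col_b in zip(a, b):
        (a0, a1), (b0, b1) = x_range(col_a), x_range(col_b)
        if a0 > b1 or b0 > a1:
            return False
    return True


def merge_lines(lines: List[Line]) -> List[Line]:
    """Fold each line into the previous one when their columns match, so a
    column keeps flowing downward before the next column starts."""
    merged: List[Line] = []
    for line in lines:
        if merged and columns_match(merged[-1], line):
            merged[-1] = [prev + cur for prev, cur in zip(merged[-1], line)]
        else:
            merged.append([list(col) for col in line])
    return merged


lines = [
    [[(0, 0, 100, 10)]],                       # full-width title
    [[(0, 20, 45, 30)], [(55, 20, 100, 30)]],  # two columns
    [[(0, 35, 45, 50)], [(55, 35, 100, 50)]],  # the same two columns continue
]
merged = merge_lines(lines)
```

After the merge, the left column of both lines is read before moving to the right column, which is the behavior a two-column page needs.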

After all these steps, I have the order of all segments.

Sort words

Sorting words works much like sorting segments, but it is simpler.
I sort the words vertically, from top to bottom.
Then I separate the words into lines.
In each line, I sort the words horizontally, from left to right.
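The word sorting can be sketched the same way, assuming words are `(text, x0, y0, x1, y1)` tuples. The rule for deciding that a word continues the current text line (its vertical center lies above the line's lowest bottom edge so far) is my own simple assumption, and `read_segment` is a hypothetical name.

```python
from typing import List, Tuple

Word = Tuple[str, int, int, int, int]  # (text, x0, y0, x1, y1)


def read_segment(words: List[Word]) -> str:
    """Sort words top to bottom, group them into text lines, sort each
    line left to right, and join the segment into one string."""
    lines: List[List[Word]] = []
    line_bottom = 0
    for word in sorted(words, key=lambda w: w[2]):
        center_y = (word[2] + word[4]) / 2
        # Assumed rule: the word continues the current line if its vertical
        # center is above the lowest bottom edge seen on that line so far.
        if lines and center_y <= line_bottom:
            lines[-1].append(word)
            line_bottom = max(line_bottom, word[4])
        else:
            lines.append([word])
            line_bottom = word[4]
    return "\n".join(
        " ".join(w[0] for w in sorted(line, key=lambda w: w[1])) for line in lines
    )


words = [("world", 50, 0, 90, 10), ("Hello", 0, 2, 40, 12), ("again", 0, 20, 40, 30)]
text = read_segment(words)
```

Here "world" sits slightly higher than "Hello", so a pure top-to-bottom sort would read it first; grouping into a line and then sorting left to right restores "Hello world".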

Bingo, we are done.

Improve result

Of course, this cannot handle every case in the real world, but it works for many cases, especially typical paper layouts.
