OCR Data Annotation Case Study

OCR (optical character recognition) is an important research direction in computer vision with a wide range of applications. Products now exist for many of them, including card recognition, ticket recognition, extraction of structured text from video, and text recognition in natural scenes.

OCR identifies the text content (in different languages and other character types) in many kinds of images. It mainly serves as the eyes of an AI system: it picks out characters from unstructured data so that subsequent processing, such as scanning and translation, can be carried out.

Here is an example of an OCR annotation project:

Content: OCR annotation of luggage tags and box delivery stickers

Detection annotation: label the four corner points of each field in clockwise order, starting from the upper-left corner and moving to the upper-right corner next.
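The corner-order convention can be checked or enforced automatically. The following is a minimal sketch, not part of the original project tooling: the function name `order_corners` is hypothetical, image coordinates are assumed (x grows to the right, y grows downward), and the "top-left" corner is chosen by the smallest x + y, a heuristic that can fail for strongly rotated boxes.

```python
import math


def order_corners(points):
    """Return four (x, y) points in clockwise order, top-left first.

    Assumes image coordinates: x grows right, y grows down.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four corner points")
    # Centroid of the quadrilateral.
    cx = sum(x for x, _ in points) / 4.0
    cy = sum(y for _, y in points) / 4.0
    # With y pointing down, ascending atan2 angle is clockwise on screen.
    ring = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # Heuristic (assumption): the top-left corner has the smallest x + y.
    start = min(range(4), key=lambda i: ring[i][0] + ring[i][1])
    return ring[start:] + ring[:start]


# Corners given in arbitrary order.
print(order_corners([(110, 20), (10, 60), (10, 20), (110, 60)]))
# -> [(10, 20), (110, 20), (110, 60), (10, 60)]
```

The result lists the upper-left, upper-right, lower-right, and lower-left corners, which matches the clockwise order described above.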

Identification: transcribe the content of each string (e.g. company name, PO, weight, delivery address).
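One way to store a detection and its transcription together is a small record per field. This is only an illustrative sketch: the class `FieldAnnotation`, its attribute names, the JSON layout, and the sample values are all assumptions made for the example, not the format used in the project.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class FieldAnnotation:
    label: str                 # hypothetical field name, e.g. "weight"
    points: List[List[float]]  # outline points, top-left first, clockwise
    text: str                  # transcription of the string


def to_json(annotations):
    """Serialize annotations, rejecting outlines with fewer than 4 points."""
    for ann in annotations:
        if len(ann.points) < 4:
            raise ValueError(f"{ann.label!r}: need at least four points")
    # ensure_ascii=False keeps non-Latin transcriptions readable.
    return json.dumps([asdict(a) for a in annotations], ensure_ascii=False)


# Made-up sample values for illustration only.
sample = FieldAnnotation(
    label="weight",
    points=[[10, 20], [110, 20], [110, 60], [10, 60]],
    text="12.5 KG",
)
print(to_json([sample]))
```

Keeping the points and the text in the same record makes it easy to review each transcription against its box.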

1、 For curved text, outline the text with evenly spaced points

2、 For occluded text, infer the hidden part and label the full extent of the text

3、 When pieces of text are close together, their boxes may overlap

4、 Whether occluded or blurred text is transcribed depends on whether it can be recognized with the naked eye

5、 Label line by line. Text that is not on the same line, or that is read in a different order, must not be labeled together

6、 Within the same line, two words that are far apart should be labeled separately
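For rule 1, the points on a curved outline can be resampled so that they are equally spaced along the curve. The sketch below reads "even points" as evenly spaced points, which is an interpretation, and the helper name `resample_evenly` is hypothetical. It takes a rough polyline (for example, one edge of a curved text outline) and returns `n` points at equal arc-length intervals.

```python
import math


def resample_evenly(polyline, n):
    """Return n points spaced evenly by arc length along a polyline."""
    if n < 2 or len(polyline) < 2:
        raise ValueError("need at least two input points and n >= 2")
    # Cumulative arc length at each input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0:
        raise ValueError("polyline has zero length")
    out = []
    seg = 0  # index of the segment that contains the current target
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment whose end is at or beyond the target.
        while seg < len(polyline) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = polyline[seg], polyline[seg + 1]
        # Linear interpolation inside the segment.
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```

For example, `resample_evenly([(0, 0), (10, 0), (10, 10)], 5)` places points every 5 units along the 20-unit path, including both end points.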


