Configuring Online Services Easily with OCR and TTS — Part 1 (Fusing OCR & CSS services)

Published in NAVER Cloud, Jan 26, 2021

Hello, this is NAVER Cloud Platform!

Today, we will connect NAVER Cloud Platform’s OCR service with its CSS product, a text-to-speech (TTS) service, to show you how to implement your own online TTS service!

Let’s briefly look at the service we are going to implement. OCR stands for Optical Character Recognition, and today we will build a simple service that reads out the text contained in an image file in the voice of a voice actor. As shown in <Figure 1> below, the image file containing the text is sent through the API Gateway. The service delivers the image to the OCR service through the Invoke URL, saves the recognized text as a file, and reads it out in an actor’s voice through the TTS service.

<Figure 1> Creating a non-contact service using OCR & TTS in NAVER Cloud Platform

Looking into NAVER Cloud Platform’s OCR Service

“ICDAR Robust Reading Competition” is the most prestigious competition in the field of OCR. AI companies such as Alibaba, Tencent, and NAVER participate every year, and in 2018, NAVER’s CLOVA OCR won First Place in four categories.

<Figure 2> NAVER broke the record in character recognition, surpassing the world’s strongest player, China.

The CLOVA OCR technology papers have been accepted by CVPR and ICCV, world-renowned conferences in the field of AI, and the work is recognized as being in the top 4% at ICCV. CLOVA OCR’s technology has been available to users through NAVER Cloud Platform since January 16, 2020.

NAVER Cloud Platform enables CLOVA OCR to operate its service stably in a cloud environment. As shown in <Figure 3>, CLOVA OCR analyzes text in images accurately and quickly.

CLOVA OCR is a technology that locates the text in images or photographs and recognizes characters, and features a unique set of character detection and recognition techniques to efficiently recognize various types of characters. It also lets you easily create a template and quickly extract the characters you need by defining a selection area.

The OCR service recognizes documents and accurately extracts text and data from a user-defined area to improve the quality of character recognition services.

Documents are automatically categorized using the templates provided by NAVER Cloud Platform’s OCR service, and the preset image process identifies the working area before adjusting and correcting the image. The trained model then detects the predefined area and accurately recognizes and extracts the text within the image.

<Figure 4> NAVER Cloud Platform’s OCR process

NAVER Cloud Platform’s OCR service has three main strengths.

(1) Precise Data Extraction
OCR recognizes characters of various shapes and forms thanks to training on extensive data. It uses our own text area detection and character recognition technology, and lets you quickly extract only the characters you need by creating a template and selecting the area to be recognized.

(2) Differentiated Model Utilizing NAVER’s AI Technology
Utilizing NAVER’s CLOVA AI technology, the CLOVA OCR service features a high-performance OCR model optimized for major business purposes. It supports Korean, English, and Japanese, and recognizes handwritten as well as printed characters at a high level to suit real user environments. In particular, with an OCR template you can define a recognition area, extract its text as a formatted value, and receive it in a structured document.

(3) Automated Document Processing, Made More Convenient
With the OCR service, documents that previously required manual classification can be classified automatically according to their measured similarity to preregistered templates. This makes it possible to design a more efficient workflow in which documents are classified with minimal user intervention.

It is tedious to open the images piled up in a folder one by one and check them by eye, but with OCR, metadata is saved automatically for convenient data management and use, further increasing the value of the data. NAVER Cloud Platform’s OCR service analyzes the image it receives through a RESTful API, then processes the text extraction result and delivers it in JSON (JavaScript Object Notation) format.

We are going to provide various types of image recognition and text extraction services as shown in <Table 1> below, and continuously update the function to extract the various objects contained in images.

<Table 1> Services provided by NAVER Cloud Platform’s OCR

Text Extraction through NAVER Cloud Platform OCR API

Introducing Main APIs

You can easily and quickly learn how to use NAVER Cloud Platform’s OCR service through the manuals shown in <Figure 5> through <Figure 8>.

1. OCR User Manual

<Figure 5> is a manual for the general use of NAVER Cloud Platform’s OCR service.

You will find detailed information about the portal and Console for using the service in the manual.

<Figure 5> NAVER Cloud Platform OCR User Manual (Source: https://docs.ncloud.com/ko/ocr/ocr-1-1.html)

2. OCR API Linkage Manual

<Figure 6> is a manual for linking APIs in NAVER Cloud Platform’s OCR service. You will find a detailed description of the formats, along with examples, of the API linkage for service implementation.

<Figure 6> NAVER Cloud Platform OCR API Linkage Manual (Source: https://docs.ncloud.com/ko/ocr/ocr-1-2.html)

3. OCR API Invoke Manual

<Figure 7> is a manual for invoking APIs in NAVER Cloud Platform’s OCR service. You will find a detailed description of the APIs that need to be invoked when implementing a service, along with examples.

<Figure 7> NAVER Cloud Platform OCR API Invoke Manual (Source: https://docs.ncloud.com/ko/ocr/ocr-1-4.html)

4. OCR Custom API Specification

This is the manual for using custom APIs in NAVER Cloud Platform’s OCR service.

<Figure 8> NAVER Cloud Platform OCR Custom API Manual (Source: https://apidocs.ncloud.com/ko/ai-application-service/ocr/ocr/)

You can implement and provide more reliable service APIs by linking with the API Gateway product, which adds capabilities such as API security and monitoring. Please refer to <Figure 9> and <Figure 10>.

<Figure 9> NAVER Cloud Platform API gateway (Source: https://docs.ncloud.com/ko/apigw/apigw-1.html)

<Figure 10> NAVER Cloud Platform API gateway architecture

Request Subscription for OCR Service

1. Read OCR Terms of Service and Request Subscription

You can use the OCR service after you have read and accepted NAVER Cloud Platform OCR Terms of Service.

<Figure 11> NAVER Cloud Platform OCR Service Terms of Service and request for subscription

2. Create and Register Domain

NAVER Cloud Platform’s OCR service provides two main categories of service, General and Template. In this exercise, we are going to implement a general-purpose service and provide it through an API. To begin, select General and create a domain.

With a few clicks, we have completed creating an OCR domain as shown in <Figure 13>.

<Figure 12, 13> NAVER Cloud Platform OCR domain registration

3. Register Text OCR

Click the [Integration Setting] button for the domain created in <Figure 13>. You can easily configure an API service through automatic API Gateway linkage. (If you need more detailed settings for the API Gateway, you can create the linkage manually instead.)

<Figure 14> Creating password and registering API gateway invoke URL

The password (Secret Key) and API Gateway URL created in <Figure 14> will be required for the service we are going to implement here, so you can use the copy buttons when needed.
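If you later call the API from a script instead of a test tool, it may help to keep these two values out of your source code. The following is only a minimal sketch: the environment variable names CLOVA_OCR_INVOKE_URL and CLOVA_OCR_SECRET_KEY are hypothetical, and the header name X-OCR-SECRET is an assumption taken from the CLOVA OCR API documentation, so please check it against <Table 2> and the API manual for your domain.

```python
import os


def build_ocr_headers(secret_key: str) -> dict:
    """Return the HTTP headers for a CLOVA OCR call.

    Assumption: the Secret Key is sent in a header named X-OCR-SECRET.
    Verify the exact header name against the OCR API manual.
    """
    if not secret_key:
        raise ValueError("The OCR Secret Key is empty")
    return {
        "Content-Type": "application/json",
        "X-OCR-SECRET": secret_key,
    }


if __name__ == "__main__":
    # Hypothetical environment variable names, chosen only for this sketch.
    invoke_url = os.environ.get("CLOVA_OCR_INVOKE_URL", "")
    secret_key = os.environ.get("CLOVA_OCR_SECRET_KEY", "")
    if invoke_url and secret_key:
        print("Invoke URL:", invoke_url)
        print("Header names:", sorted(build_ocr_headers(secret_key)))
    else:
        print("Set CLOVA_OCR_INVOKE_URL and CLOVA_OCR_SECRET_KEY first.")
```

Reading the values from the environment means the Secret Key never has to be pasted into a file that might be shared or committed.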

4. Check for Registration of API Gateway

Check if the text OCR service is properly registered in the API gateway. Although there weren’t any settings made in the API gateway, you can see the information on the created API through OCR automatic linkage.

Verifying Operation of OCR API and Running Tests

For the test web client, you can use whichever test tool you prefer. In this exercise, we ran the test with a commonly used tool, Postman.

<Figure 16> Run Postman and type in header information

<Table 2> Example of OCR API header key/value

In the Request Body field, enter as shown in <Figure 17> and save.

<Code 1>

{
  "images": [
    {
      "format": "png",
      "name": "medium",
      "data": null,
      "url": "https://kr.object.ncloudstorage.com/maso-storage/source_image.png"
    }
  ],
  "lang": "ko",
  "requestId": "string",
  "resultType": "string",
  "timestamp": {{$timestamp}},
  "version": "V1"
}
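For readers who prefer a script to Postman, the same request could be sketched in Python roughly as follows. This is an illustration under assumptions, not the official client: the function names build_ocr_body and send_ocr_request are hypothetical, the X-OCR-SECRET header name is assumed from the CLOVA OCR API documentation, and the timestamp is generated in seconds to mirror Postman’s {{$timestamp}} variable. The invoke URL and Secret Key are the values you copied in <Figure 14>.

```python
import json
import time
import urllib.request
import uuid


def build_ocr_body(image_url: str, image_format: str = "png",
                   name: str = "medium", lang: str = "ko") -> dict:
    """Build the same request body as <Code 1> as a Python dict."""
    return {
        "images": [
            {
                "format": image_format,
                "name": name,
                "data": None,       # null in JSON: the image is passed by URL
                "url": image_url,
            }
        ],
        "lang": lang,
        "requestId": str(uuid.uuid4()),   # any unique string per request
        "resultType": "string",
        "timestamp": int(time.time()),    # seconds, like Postman's {{$timestamp}}
        "version": "V1",
    }


def send_ocr_request(invoke_url: str, secret_key: str, body: dict,
                     timeout: float = 30.0) -> dict:
    """POST the body to the API Gateway invoke URL and return the parsed JSON."""
    request = urllib.request.Request(
        invoke_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-OCR-SECRET": secret_key,   # assumed header name, see lead-in
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_ocr_body(
        "https://kr.object.ncloudstorage.com/maso-storage/source_image.png"
    )
    # Only prints the body; call send_ocr_request() with your own URL and key.
    print(json.dumps(body, indent=2))
```

Building the body separately from sending it makes it easy to inspect the JSON before anything goes over the network.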

Complete the request body field and click [Send]. As shown in <Figure 18>, the text in the test image is analyzed and the result is returned in JSON format.
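To use the result in the next step, you will usually want only the recognized text. The sketch below assumes the response contains an "images" list whose items hold a "fields" list, with each field carrying the recognized string under "inferText". That layout is an assumption based on the CLOVA OCR API documentation, and the sample response here is made up for illustration, so compare it with the actual JSON you see in <Figure 18>. The helper names extract_text and save_text are hypothetical.

```python
from pathlib import Path


def extract_text(ocr_response: dict) -> str:
    """Join the recognized strings of all fields in reading order.

    Assumes the layout images[].fields[].inferText described above.
    """
    words = []
    for image in ocr_response.get("images", []):
        for field in image.get("fields", []):
            text = field.get("inferText", "")
            if text:
                words.append(text)
    return " ".join(words)


def save_text(text: str, path: str) -> None:
    """Save the recognized text as a UTF-8 file for the TTS step."""
    Path(path).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Made-up sample response, trimmed to the keys this sketch reads.
    sample_response = {
        "images": [
            {
                "name": "medium",
                "fields": [
                    {"inferText": "Hello"},
                    {"inferText": "NAVER"},
                    {"inferText": "Cloud"},
                ],
            }
        ]
    }
    print(extract_text(sample_response))
```

Using .get() with defaults keeps the helper from failing if an image comes back without any recognized fields.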