Configuring Online Services Easily with OCR and TTS — Part 1 (Fusing OCR & CSS services)
Hello, this is NAVER Cloud Platform!
Today, we will connect NAVER Cloud Platform’s OCR service with its CSS product, a text-to-speech (TTS) service, to show you how to implement your own online TTS service!
Let’s briefly look at the service we are going to implement. First of all, OCR stands for Optical Character Recognition. Today we’d like to implement a simple service that will read out the text contained within an image file with the voice of a voice actor. As shown in <Figure 1> below, the image file containing the text is transmitted through the API Gateway. You can create a service that delivers the image to an OCR service through Invoke URL, saves the analyzed text as a file, and reads it out in an actor’s voice through a TTS service.
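To make the flow above a little more concrete, here is a minimal Python sketch of the three steps (recognize the text, save it as a file, read it out). The names `read_image_aloud`, `recognize`, and `synthesize` are our own placeholders for illustration and are not part of any NAVER Cloud Platform SDK; the actual OCR and TTS calls are passed in as functions.

```python
from pathlib import Path
from typing import Callable


def read_image_aloud(
    image_url: str,
    recognize: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    text_path: str,
) -> bytes:
    """Run the OCR -> text file -> TTS flow for one image.

    recognize:  hypothetical function that sends the image URL to the OCR
                service and returns the recognized text.
    synthesize: hypothetical function that sends text to the TTS service
                and returns audio bytes.
    """
    # Step 1: extract the text contained in the image.
    text = recognize(image_url)
    # Step 2: save the analyzed text as a file.
    Path(text_path).write_text(text, encoding="utf-8")
    # Step 3: turn the saved text into speech.
    return synthesize(text)
```

Keeping the two service calls as parameters means the flow can be tried out with simple stand-in functions before the real services are connected.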
Looking into NAVER Cloud Platform’s OCR Service
“ICDAR Robust Reading Competition” is the most prestigious competition in the field of OCR. AI companies such as Alibaba, Tencent, and NAVER participate every year, and in 2018, NAVER’s CLOVA OCR won First Place in four categories.
The CLOVA OCR technology papers have been accepted by CVPR and ICCV, world-renowned conferences in the field of AI, and it is recognized as being in the top 4% at ICCV. CLOVA OCR’s technology has been available to users through NAVER Cloud Platform since January 16, 2020.
NAVER Cloud Platform enables CLOVA OCR to operate its service stably in a cloud environment. As shown in <Figure 3>, you can see that CLOVA OCR provides a technology that analyzes text in images accurately and quickly.
CLOVA OCR is a technology that locates the text in images or photographs and recognizes characters, and features a unique set of character detection and recognition techniques to efficiently recognize various types of characters. It also lets you easily create a template and quickly extract the characters you need by defining a selection area.
The OCR service recognizes documents and accurately extracts text and data from a user-defined area to improve the quality of character recognition services.
Documents are automatically categorized through various templates provided by NAVER Cloud Platform’s OCR service, and the preset image process recognizes the working area before performing image adjustment and correction processes. Through training, the service detects the predefined area, and then accurately recognizes and extracts the text within the images.
NAVER Cloud Platform’s OCR service has three main strengths.
(1) Precise Data Extraction
The OCR service recognizes characters of various shapes and forms, having been trained on extensive data. It uses our own text area detection and character recognition technology, and lets you quickly extract only the characters you need by creating a template and selecting the area to be recognized.
(2) Differentiated Model Utilizing NAVER’s AI Technology
Utilizing NAVER’s CLOVA AI technology, the CLOVA OCR service features a high-performance OCR model optimized for major business purposes. It supports Korean, English, and Japanese, and recognizes handwritten as well as printed characters at a high level to suit the actual user environment. In particular, with an OCR template you can define a recognition area, extract the text as formatted values, and receive it as a structured document.
(3) Automated Document Processing, Made More Convenient
With an OCR service, documents that previously had to be classified manually can be classified automatically, based on their verified similarity to preregistered templates. This makes it possible to design a more efficient workflow in which documents are classified with minimal user intervention.
It is difficult to open the images piled up in a folder one by one and check them with your own eyes, but with OCR, metadata is automatically saved for convenient data management and utilization, further increasing the value of the data. NAVER Cloud Platform’s OCR service accurately analyzes images received through its RESTful API, then processes and delivers the text extraction result in JSON (JavaScript Object Notation) format.
We are going to provide various types of image recognition and text extraction services as shown in <Table 1> below, and continuously update the function to extract the various objects contained in images.
Text Extraction through NAVER Cloud Platform OCR API
Introducing Main APIs
You can easily and quickly learn how to use NAVER Cloud Platform’s OCR service through the manual in <Figure 7>.
1. OCR User Manual
<Figure 5> is a manual for the general use of NAVER Cloud Platform’s OCR service.
You will find detailed information about the portal and Console for using the service in the manual.
2. OCR API Linkage Manual
<Figure 6> is a manual for linking APIs in NAVER Cloud Platform’s OCR service. You will find a detailed description of the formats of the API linkage for service implementation, along with examples.
3. OCR API Invoke Manual
<Figure 7> is a manual for invoking APIs in NAVER Cloud Platform’s OCR service. You will find a detailed description of the definitions of the APIs that need to be invoked when implementing a service, along with examples.
4. OCR Custom API Specification
This is the manual for using custom APIs in NAVER Cloud Platform’s OCR service.
You can implement and provide more reliable service APIs by linking with API gateway products that provide enhanced usability, such as user API security and monitoring. Please refer to <Figure 9> and <Figure 10>.
Request Subscription for OCR Service
1. Read OCR Terms of Service and Request Subscription
You can use the OCR service after you have read and accepted NAVER Cloud Platform OCR Terms of Service.
2. Create and Register Domain
NAVER Cloud Platform’s OCR service provides two main categories of service, General and Template. We are going to implement a very general service and provide it through an API in this exercise. To begin, select General and create a domain.
With a few clicks, we have completed creating an OCR domain as shown in <Figure 13>.
3. Register Text OCR
Click the [Integration Setting] button based on the domain created in <Figure 13>. You can easily configure an API service through automatic API gateway linkage. (If more detailed setting for the API gateway is needed, it can be created through manual linkage.)
The password (Secret Key) and API Gateway URL created in <Figure 14> will be required for the service we are going to implement here, so use the copy buttons to copy them when needed.
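Since the Secret Key is a password, it is a good idea to keep it out of your source code. Below is a minimal sketch that reads both values from environment variables. The variable names `OCR_INVOKE_URL` and `OCR_SECRET_KEY` and the function `load_ocr_settings` are our own choices for this example, not something defined by the OCR service.

```python
import os


def load_ocr_settings(env=None):
    """Return (invoke_url, secret_key) read from environment variables.

    OCR_INVOKE_URL and OCR_SECRET_KEY are hypothetical names chosen for
    this example; use whatever names fit your own environment.
    """
    env = os.environ if env is None else env
    invoke_url = env.get("OCR_INVOKE_URL", "").strip()
    secret_key = env.get("OCR_SECRET_KEY", "").strip()
    if not invoke_url or not secret_key:
        raise ValueError("Set OCR_INVOKE_URL and OCR_SECRET_KEY first.")
    if not invoke_url.startswith("https://"):
        # The secret travels with every request, so insist on HTTPS.
        raise ValueError("The API Gateway URL should start with https://")
    return invoke_url, secret_key
```

Paste the values copied from the Console into those two environment variables, and call `load_ocr_settings()` once when your service starts.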
4. Check for Registration of API Gateway
Check if the text OCR service is properly registered in the API gateway. Although there weren’t any settings made in the API gateway, you can see the information on the created API through OCR automatic linkage.
Verifying Operation of OCR API and Running Tests
For the test web client, you can use whichever test tool you prefer. In this exercise, we ran the test with a commonly used tool, Postman.
In the Request Body field, enter as shown in <Figure 17> and save.
<Code 1>
{
  "images": [
    {
      "format": "png",
      "name": "medium",
      "data": null,
      "url": "https://kr.object.ncloudstorage.com/maso-storage/source_image.png"
    }
  ],
  "lang": "ko",
  "requestId": "string",
  "resultType": "string",
  "timestamp": {{$timestamp}},
  "version": "V1"
}
Complete the request body field and click [Send]. As shown in <Figure 18>, the text in the test image is analyzed and the result is returned in JSON format.
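If you would rather run the same test from code instead of Postman, the request can be sketched in Python with only the standard library. The body mirrors <Code 1>; the timestamp uses seconds, like Postman's `{{$timestamp}}` variable. Please note our assumptions: the function names are our own, and we assume the Secret Key is sent in an `X-OCR-SECRET` header, so check the header name against the API manual before relying on it.

```python
import json
import time
import urllib.request
import uuid


def build_ocr_body(image_url, image_format="png", name="medium", lang="ko"):
    """Build a request body with the same fields as <Code 1>."""
    return {
        "images": [
            {"format": image_format, "name": name, "data": None, "url": image_url}
        ],
        "lang": lang,
        "requestId": str(uuid.uuid4()),  # any unique string for this request
        "resultType": "string",          # copied as-is from <Code 1>
        "timestamp": int(time.time()),   # seconds, like Postman's {{$timestamp}}
        "version": "V1",
    }


def build_ocr_request(invoke_url, secret_key, body):
    """Wrap the body in a POST request for the API Gateway Invoke URL."""
    return urllib.request.Request(
        invoke_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-OCR-SECRET": secret_key,  # assumed header name, see note above
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_ocr_request(invoke_url, secret_key, build_ocr_body(image_url)))` would play the role of clicking [Send], where `invoke_url` and `secret_key` are the values you copied from the Console.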
Learning more about OCR API
Please refer to the detailed API manual, as shown in <Figure 19>.
OCR API Request
You can check the details of the request as shown in <Figure 20>.
You can set up the header information required for the request, as shown in <Figure 21>.
The request body can be set up as shown in <Figure 22>.
OCR API Response
The response for the image recognition result is shown in <Figure 24>.
The detailed description of the image field, which is the main object used in the service implementation, is shown in <Figure 26>.
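Once the response arrives, the recognized text has to be collected and saved as a file before it can be handed to the TTS service. Here is a minimal sketch of that step. We assume each entry of `images` in the response carries a `fields` list whose items hold the recognized text in `inferText`; please confirm the field names against the response description in the API manual. The function names and the sample response are made up for illustration.

```python
from pathlib import Path


def extract_text(ocr_response):
    """Join the recognized text pieces of every image into one string.

    Assumes the layout images[].fields[].inferText (see the note above).
    """
    pieces = []
    for image in ocr_response.get("images", []):
        for field in image.get("fields", []):
            text = field.get("inferText", "")
            if text:
                pieces.append(text)
    return " ".join(pieces)


def save_text(text, path):
    """Save the extracted text as a UTF-8 file for the TTS step."""
    Path(path).write_text(text, encoding="utf-8")


# A made-up response used only to show the shape the code expects.
sample_response = {
    "version": "V1",
    "images": [
        {
            "name": "medium",
            "fields": [{"inferText": "Hello"}, {"inferText": "NAVER"}],
        }
    ],
}
sample_text = extract_text(sample_response)
```

Joining the pieces with a space is a simple choice on our part; depending on your images, you may prefer to join them with line breaks instead.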
In addition, the API guide provides explanations of each API version and detailed examples of requests and responses.
Has everyone been able to follow me so far?
This is it for Part 1 of configuring non-contact services easily with OCR and TTS! We will be back in the future with Part 2.
Thank you for reading!