Practical Serverless: A Scalable OCR Solution in 10 Minutes

Anjana Fernando
Ballerina Swan Lake Tech Blog
4 min read · Aug 3, 2020

In this article, we will show you how to create a serverless solution for a scalable Optical Character Recognition (OCR) system. In a system like this, scalability is a requirement: at times we can expect bursts of traffic, and we need to process all of these requests and communicate the results back to the users in a timely manner. To cater to this, we need a system that scales dynamically. One possible solution is to model the required workers and deploy them in a Kubernetes environment to achieve our scaling requirements.

That approach has been implemented and discussed in a separate article. Here, we will build the same solution using Azure Functions in Ballerina and show how it can be implemented with considerably fewer lines of code, which results in less complexity and better maintainability.

Architecture

Figure 1: Deployment Diagram

Figure 1 shows the deployment diagram of the solution we will be implementing. User input is taken through an HTTP endpoint, where the user provides the binary data for the image along with an email address as a query parameter. This HTTP endpoint is implemented using an HTTP trigger in Azure Functions, and from there, using its output binding mechanism, we store the image data and the job request information in blob storage and queue storage, respectively. The reason for taking an asynchronous processing approach is that it makes it easier to scale the required processing units independently. For example, the job submission function is not CPU bound; it simply performs a data storage operation. The image processing function, on the other hand, which reads from the blob and queue storage, has the more expensive and time-consuming task of doing the actual OCR operations. The serverless environment can therefore scale each function according to its own requirements.

In the same manner, the result publishing function is separated from the other tasks, since sending email can be a high-latency operation that should not block the others. It therefore has its own result queue, from which it retrieves result entries and sends them out at its own pace.

Implementation

Here, we will take a look at the Ballerina code used to implement the Azure Functions solution.

Job Submission

Listing 1: submitJob Function Implementation

The submitJob function is the entry point to the system. It defines an HTTP trigger to collect the user's email address and the image data, along with blob and queue output bindings to save the collected data. For the next function, it is just a matter of wiring its input bindings to the output bindings defined here.
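
As a rough sketch of what such a function could look like, assuming the ballerinax/azure_functions module's annotation-based bindings; the container and queue names (ocr-images, ocr-jobs) and the use of the email query parameter as the job key are illustrative assumptions rather than the original listing:

```ballerina
import ballerinax/azure_functions as af;

// Entry point: the image bytes arrive in the HTTP request body and the target
// email address as a query parameter. The image is stored through a blob
// output binding and the job entry through a queue output binding.
@af:Function
public function submitJob(@af:HTTPTrigger { authLevel: "anonymous" } af:HTTPRequest req,
        @af:BlobOutput { path: "ocr-images/{Query.email}" } af:BytesOutputBinding image,
        @af:QueueOutput { queueName: "ocr-jobs" } af:StringOutputBinding job)
        returns @af:HTTPOutput af:HTTPBinding {
    string email = req.query["email"] ?: "";
    // Store the raw image bytes in the blob container (keyed by the email
    // address here purely to keep the sketch self-contained; binary payloads
    // may need base64 handling in practice).
    image.value = req.body.toBytes();
    // Enqueue the job information for the processing function to pick up.
    json jobEntry = { jobId: email, email: email };
    job.value = jobEntry.toJsonString();
    return { statusCode: 202, payload: "Job accepted for " + email };
}
```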

Processing Images

Listing 2: processImage Function Implementation

The actual OCR operation happens in the processImage function. It starts off by defining a queue trigger to listen for the job entries added by the submitJob function. Additionally, it adds a blob input binding with a parameterized path that reads the job ID from the trigger data, and an output binding for the queue that will contain the final result along with the email address it should be sent to. The Azure Computer Vision connector is used here for the OCR operation; its API key is looked up through an environment variable, which is set using an application setting in Azure.
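
A minimal sketch of how this function could be wired up, again assuming the ballerinax/azure_functions module; the queue and container names, the JSON shape of the job entry, and the runOcr helper (standing in for the Azure Computer Vision connector call) are assumptions for illustration:

```ballerina
import ballerinax/azure_functions as af;
import ballerina/system;

// Picks up job entries from the jobs queue, reads the matching image from
// blob storage through a parameterized input binding, runs OCR, and writes
// the result together with the target email address to the results queue.
@af:Function
public function processImage(
        @af:QueueTrigger { queueName: "ocr-jobs" } string jobEntry,
        @af:BlobInput { path: "ocr-images/{jobId}" } byte[]? image,
        @af:QueueOutput { queueName: "ocr-results" } af:StringOutputBinding result)
        returns error? {
    json job = check jobEntry.fromJsonString();
    if (image is byte[]) {
        // The API key is read from an environment variable, which is set as an
        // application setting in Azure (os:getEnv in newer Ballerina releases).
        string apiKey = system:getEnv("CV_API_KEY");
        string text = check runOcr(image, apiKey);
        json resultEntry = { email: check job.email, text: text };
        result.value = resultEntry.toJsonString();
    }
}

// Hypothetical stand-in for the Azure Computer Vision connector's OCR call.
function runOcr(byte[] image, string apiKey) returns string|error {
    return "<recognized text>";
}
```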

After the results are published to the output queue, it is up to the next function to pick them up and send them out to the users.

Publish Results

Listing 3: publishResults Function Implementation

Here, the publishResults function defines a queue trigger to listen to the result publication queue. Whenever an entry becomes available, this function is invoked to send an email to the address given in the job information, containing the results of the OCR operation. We have used the Gmail connector to send out the email.
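
A corresponding sketch, with the results queue name and the sendEmail helper (standing in for the Gmail connector call) as illustrative assumptions:

```ballerina
import ballerinax/azure_functions as af;

// Listens on the results queue and emails the OCR text to the address
// recorded in the job entry.
@af:Function
public function publishResults(
        @af:QueueTrigger { queueName: "ocr-results" } string resultEntry)
        returns error? {
    json result = check resultEntry.fromJsonString();
    string email = (check result.email).toString();
    string text = (check result.text).toString();
    check sendEmail(email, "OCR Job Result", text);
}

// Hypothetical stand-in for sending the mail through the Gmail connector.
function sendEmail(string recipient, string subject, string body) returns error? {
    // e.g. a gmail:Client send operation in the actual implementation
}
```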

This marks the end of the process. We have seen how the functions are conveniently wired together through triggers and bindings, providing a highly usable approach to defining integrations between systems.

Deployment

Prerequisites

  • Azure Account
  • Azure CLI
  • Azure Storage Services
  • Azure Cognitive Services — create a “Computer Vision” service in “East US” region
  • Generate Gmail API keys — instructions found here

The full source code for the project can be found here.

Listing 4: Ballerina Build and Azure Functions Deployment

The Ballerina compiler automatically builds the Azure Functions zip artifact to be deployed through the Azure CLI. Listing 4 shows a sample execution of the compilation and deployment, along with a sample run. Finally, Figure 2 shows the email received by the user with the job result.
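
The commands below outline roughly what that looks like; the file name functions.bal, the resource group and function app names, and the sample request URL are placeholders, and the exact build command and artifact location can vary by Ballerina version:

```
# Build the Ballerina source; the compiler also emits an Azure Functions
# zip artifact for deployment.
$ ballerina build functions.bal

# Deploy the generated artifact with the Azure CLI.
$ az functionapp deployment source config-zip -g <resource_group> \
      -n <function_app_name> --src azure-functions.zip

# Sample run: post an image with the target email as a query parameter
# (route and auth settings depend on the function app configuration).
$ curl -X POST --data-binary "@sample.jpg" \
      "https://<function_app_name>.azurewebsites.net/api/submitJob?email=user@example.com"
```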

Figure 2: Email Result

Summary

The OCR scenario here merely simulates the kind of potentially time-consuming processing you would have to do. It provides a reference implementation and a pattern for easily creating complex workflows using a serverless framework such as Azure Functions, whose bindings concept in particular allows us to streamline our operations.

As a comparison, the Ballerina Azure Functions implementation is just a single source file of 60 lines, whereas the equivalent Kubernetes solution ended up as a multi-module project with 220 lines of code and multiple configuration files. So we can see how serverless frameworks have raised the abstraction level for developers, allowing them to simply concentrate on the business logic.

For more information on writing serverless functions in Ballerina, check out the following resources:
