Ballerina/Azure Cloud Case Study: Scalable Asynchronous Request Processing
Azure cloud provides a wide range of services that can be used to implement and deploy many kinds of scenarios. In this write-up, I will explore the scalable request processing capabilities of Azure cloud in conjunction with the Ballerina programming language.
Ballerina provides a natural approach for developing microservices and network-aware applications. This makes it an ideal choice to showcase the functionalities of Azure cloud. The following services of Azure cloud will be used:
- Azure Kubernetes Service (AKS)
- Azure Blob Service
- Azure Queue Service
- Azure Cognitive Service — Computer Vision
Use Case: Scalable Optical Character Recognition (OCR)
Figure 01 shows the deployment diagram of our scenario. Here, a user submits an image to a web service as binary data, together with an email address. The service processes the request asynchronously, and the text content found in the image is eventually emailed to the address that was given.
The full deployment is done in its own Kubernetes cluster. This consists of two programs:
- OCR Service (front-end): A Ballerina service deployed as a Kubernetes service. It takes in the incoming requests and stores the request data in a job queue, using the Azure Queue service. In the same step, it uploads the binary data to a binary data store (the image store), using the Azure Blob service. The queuing mechanism ensures that incoming requests do not block while waiting for processing; instead, the jobs are queued for asynchronous processing by independent workers, which we can scale up or down, or auto-scale, as needed.
- OCR Worker (back-end): Another container running in Kubernetes, which does not accept incoming network requests. It is simply a worker process that polls the job queue for scheduled jobs. If a job is available, the worker reads the job entry, loads the corresponding binary data from the image store, and then contacts the Azure CV service to do the required processing.
After the Azure CV OCR process returns the response, the worker will send the result to the email associated with the job entry. After these actions, the job entry will be discarded.
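The decoupling described above can be illustrated with a minimal in-memory sketch. It is written in Python purely as a stand-in and is not the code used in this deployment: `queue.Queue` plays the role of the Azure Queue service, threads play the role of worker containers, and `run_workers` and `handle` are hypothetical names introduced for the illustration. Unlike the real worker, which keeps polling, each worker here stops once the queue is empty so that the example terminates.

```python
import queue
import threading


def run_workers(jobs, num_workers, handle):
    """Drain `jobs` with `num_workers` threads and return {job_id: result}."""
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)  # front-end side: enqueue and return immediately

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = job_queue.get_nowait()  # poll for a scheduled job
            except queue.Empty:
                return  # nothing left; a real worker would sleep and poll again
            outcome = handle(job)
            with lock:
                results[job["id"]] = outcome
            job_queue.task_done()  # the job entry is done and can be discarded

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


jobs = [{"id": f"job-{i}", "email": "user@example.com"} for i in range(5)]
results = run_workers(jobs, num_workers=3, handle=lambda job: "done:" + job["id"])
```

Changing `num_workers` does not require any change on the submitting side, which is the property that lets the worker deployment be scaled independently of the front-end.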
OCR Service — Front-end
Listing 01 shows the Ballerina service source, which takes a binary payload as the POST request body and an email address as a value in the request path. The service contacts the Azure Blob service to store the binary data under a UUID that is generated to uniquely identify the job. The same UUID is used when storing the job description entry in the Azure Queue service. This entry contains the user’s email address, where the job result should be sent. Finally, the service returns the job id to the user as a JSON response.
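As a rough illustration of the request flow just described, here is a minimal Python sketch. It is not the Ballerina source from Listing 01: a plain dict stands in for the Blob container, a list stands in for the queue, and the function name `submit_job` and the JSON field names `jobId` and `email` are assumptions made for the example.

```python
import json
import uuid


def submit_job(image_bytes, email, blob_store, job_queue):
    """Store the image, enqueue the job description, return a JSON reply."""
    job_id = str(uuid.uuid4())  # uniquely identifies the job

    # Stand-in for uploading the binary data to the Blob container.
    blob_store[job_id] = image_bytes

    # Stand-in for adding the job description to the queue. The entry
    # carries the same id plus the address the result should be sent to.
    job_queue.append(json.dumps({"jobId": job_id, "email": email}))

    # The caller only gets the job id back; OCR happens later in a worker.
    return json.dumps({"jobId": job_id})


blobs, jobs = {}, []
reply = submit_job(b"\x89PNG...", "user@example.com", blobs, jobs)
```

Using one identifier for both the blob name and the queue entry is what lets a worker find the image that belongs to a job without any extra lookup table.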
The deployment-related Kubernetes annotations provide the configuration values for the Kubernetes deployment. This includes the Docker Hub username and password, given as environment variables, and the config map values, populated from the “ballerina.conf” file in the project directory, which holds the values required to initialize the cloud connectors.
OCR Worker — Back-end
Listing 02 contains the Ballerina worker process that is used to process the jobs submitted by users. This is a simple main program, which polls the job queue for new entries. If any entries are available in the job queue, it combines the information from the queue entry with the data from the blob storage, in order to get the job id, the binary data, and the user’s email address.
After the job data is retrieved, it contacts the Azure Computer Vision (CV) service in order to do OCR on the image data. The resultant text is extracted and, using the Ballerina GMail connector, the result is sent to the user. The user will see the job id in the mail subject, and the extracted text in the email body.
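The worker step can be sketched along the same lines. Again this is Python used as a stand-in, not the Ballerina code from Listing 02. The names `process_next_job`, `run_ocr` and `send_email` are hypothetical, the OCR and email calls are passed in as plain functions rather than real connectors, and `extract_text` assumes the OCR response has the nested `regions` / `lines` / `words` shape; check the response of the API version you use before relying on that.

```python
import json


def extract_text(ocr_response):
    """Flatten an OCR response (assumed regions -> lines -> words) to text."""
    lines = []
    for region in ocr_response.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return "\n".join(lines)


def process_next_job(job_queue, blob_store, run_ocr, send_email):
    """Handle one queued job; return False when the queue is empty."""
    if not job_queue:
        return False  # nothing scheduled; the caller would wait and poll again

    # Taking the entry off the list also discards it. A real queue client
    # would normally delete the message only after the work has succeeded.
    entry = json.loads(job_queue.pop(0))
    image_bytes = blob_store[entry["jobId"]]  # load the matching image data

    text = extract_text(run_ocr(image_bytes))

    # Job id goes in the subject, extracted text in the body.
    send_email(to=entry["email"], subject=entry["jobId"], body=text)
    return True
```

Passing `run_ocr` and `send_email` in as parameters keeps the polling logic testable without any cloud credentials; in the actual worker these would be calls to the Computer Vision and GMail connectors.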
Prerequisites
- Azure Account
- Azure AKS Service
- Azure Storage Services — create storage account “storage1777”
- “storage1777” account — create Blob container “ocrctn”
- “storage1777” account — create Queue “ocrqueue”
- Azure Cognitive Services — create “Computer Vision” service in “East US” region
- Generate GMail API keys — instructions found here
The full source code for the project can be found here.
There are two Ballerina projects available, “frontend” and “backend”. Before building each project, we need to update the “ballerina.conf” file in its directory.
The values you need to update in each file are as follows:
- STORAGE_ACCESS_KEY — navigate to the account “storage1777” in Azure dashboard page, where you can find the “Access keys” section. Here, copy either key value in “key1” or “key2”.
- CV_KEY — navigate to the computer vision service created, where you can find the “Keys” section. Here, copy either key value in “KEY 1” or “KEY 2”.
- GMAIL_CLIENTID, GMAIL_CLIENTSECRET, GMAIL_REFRESHTOKEN, GMAIL_ACCESSTOKEN — Fill in with the generated GMail API key values
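Before building, it can be handy to confirm that none of these values has been left empty. The following is a small Python sketch of such a check, not part of the project itself. It assumes the file consists of simple `KEY="value"` lines, and the helper name `unset_keys` is hypothetical; pass it only the keys that the particular project's file is expected to contain.

```python
def unset_keys(conf_text, required_keys):
    """Return the required keys that are missing or empty in conf_text."""
    values = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and anything not key=value
        # Split on the first "=" only, since key values may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return [key for key in required_keys if not values.get(key)]


sample = 'STORAGE_ACCESS_KEY="abc123=="\nCV_KEY=""\n'
print(unset_keys(sample, ["STORAGE_ACCESS_KEY", "CV_KEY", "GMAIL_CLIENTID"]))
```

With the sample text above, the check reports `CV_KEY` (present but empty) and `GMAIL_CLIENTID` (absent).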
After the prerequisites are fulfilled, the Ballerina projects can be built individually.
Listing 03 and Listing 04 show how the two Ballerina projects can be built and deployed to AKS using the kubectl command.
Listing 05 shows a sample curl command used to submit a job to the system, where binary image data is sent to our front-end service (sample image). This request gets a response from the front-end service with the unique identifier created.
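For readers who prefer to submit a job from a script, the same kind of request can be put together with the Python standard library. This is only a sketch and not the curl command from Listing 05: the base URL, the `/ocr` path and the helper name `build_submit_request` are assumptions, so substitute the external address and resource path of your own front-end service.

```python
import urllib.request
from urllib.parse import quote


def build_submit_request(base_url, email, image_bytes):
    """Build a POST request with the image as body and the email in the path."""
    url = base_url.rstrip("/") + "/" + quote(email, safe="@")
    return urllib.request.Request(
        url,
        data=image_bytes,  # raw binary image data as the request body
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


# Hypothetical address; 203.0.113.0/24 is reserved for documentation.
request = build_submit_request(
    "http://203.0.113.10/ocr", "user@example.com", b"...image bytes..."
)
```

Passing the request to `urllib.request.urlopen(request)` would send it, and the body of the reply can then be parsed with `json.loads` to read the job id.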
An email with the result will now be sent to the address given in the above request.
Figure 02 shows the email received, with the result processed by the worker.