Implementing Serverless Functions with Ballerina on AWS Lambda

Integration Services, Ballerina and AWS Lambda

Ballerina is approaching its initial GA release, steadily adding language features and connectors, improving performance, and fixing issues. Today, there are many reasons for choosing Ballerina over other integration languages: its high-performance mediation engine built on Netty, the asynchronous event-driven network application framework; its rich collection of mediation constructs and transports; its connectors for well-known APIs such as Google, AWS, Facebook, Twitter, Salesforce, JIRA, and etcd; and its lightweight runtime (currently around 14 MB in distribution size). In addition, for developers who prefer modeling integration workflows graphically to writing them in a text editor, Ballerina Composer provides an appealing set of tools in a web-based UI. Together, these aspects are making Ballerina one of the best platforms for implementing integration services.

Currently, there are two ways to expose integration workflows in Ballerina; the first is via services and the second is via main functions:

Exposing Integration Workflows via Ballerina Services

Figure 1: Ballerina Service Execution

A Ballerina service can expose integration workflows via HTTP 1.1, HTTP 2.0, WebSocket, JMS and FTP/FTPS/SFTP listeners. Similar to Node.js Express and Python Flask, Ballerina exposes service listeners within its own runtime using Netty, so services do not have to be deployed on a traditional server. Integration workflows can talk to multiple external endpoints via REST, SOAP, or a connector with a few lines of code, which is another strength of Ballerina. Refer to the service chaining sample in the Ballerina repository to experience this yourself. Ballerina services can be deployed on virtual machines or containers, depending on the deployment architecture of the solution that you are implementing. Because they are lightweight and self-contained, these services are particularly well suited to containers in a microservices-based architecture.

The following example shows how a simple hello world service is run in Ballerina:

$ ballerina run service helloWorldService.bal
ballerina: deploying service(s) in 'helloWorldService.bal'
ballerina: started server connector http-9090
$ curl -v http://localhost:9090/hello

> GET /hello HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 13
Hello, World!

Exposing Integration Workflows via Ballerina Main Functions

Figure 2: Ballerina Main Function Execution

The second approach of exposing integration workflows in Ballerina is using main functions. This is similar to Java and Golang where the main function is used for executing any logic via a binary. This approach in Ballerina allows integration workflows to be directly invoked via shell commands, without exposing them via service endpoints:

$ ballerina run main echo.bal "Hello Ballerina"
Hello Ballerina

As you may now understand, the same concept can be applied to exposing Ballerina functions in serverless environments. In a serverless architecture, functions are implemented in a protocol-neutral way and exposed through multiple service listeners in a loosely coupled manner. Therefore, services that already have listeners bound to specific protocols might not be directly deployable as functions. In this article, we will use a Ballerina main function to deploy an echo function on AWS Lambda and expose it via the Amazon API Gateway as a REST API.

An Introduction to AWS Lambda

AWS introduced AWS Lambda in 2014 with the rise of the serverless architecture. It provides the ability to deploy software applications as a composition of functions and expose them via various channels. For instance, Amazon API Gateway can be used for exposing functions as REST APIs, Amazon SNS can be used for triggering functions via pub/sub messaging, Amazon Kinesis streams can be used for invoking functions from streaming data, functions can be written as triggers in Amazon’s NoSQL database DynamoDB, functions can subscribe to Amazon S3 bucket events for processing content uploaded to S3 buckets, and functions can even expand Amazon Alexa’s skill set. The complete list of event sources is available in the AWS Lambda documentation.

The most important aspect of using Lambda functions on AWS is the pricing model. Following its on-demand deployment model, users are charged based on the amount of memory allocated to a function and the time it takes to execute each request. At the moment, CPU allocation cannot be controlled directly; it changes in proportion to the allocated memory. For example, if a function is deployed with 128 MB of memory and is executed 30 million times a month, with each run taking 200 ms, the total monthly compute cost would be $5.83. Additionally, users would need to pay $5.80 (29M * $0.2/M) for the 30 million requests served by the platform, since the first one million requests are free. The total infrastructure cost would therefore be $5.83 + $5.80 = $11.63 per month. Compared to the cost of deploying the same function on a VM or a container and running it continuously for a month, this price is quite low.
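
The arithmetic behind these figures can be reproduced with a short sketch. The request price ($0.20 per million) and the one million free requests come from the text above; the compute rate of $0.00001667 per GB-second and the 400,000 free GB-seconds per month are assumptions based on the published Lambda rates at the time of writing, so check the current AWS pricing page before relying on them. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Rough monthly AWS Lambda cost estimate (illustrative sketch, not an official calculator). */
final class LambdaCostEstimate {
    // Assumed rates: not stated in the article, verify against the AWS pricing page.
    static final BigDecimal PRICE_PER_GB_SECOND = new BigDecimal("0.00001667");
    static final BigDecimal FREE_GB_SECONDS = new BigDecimal("400000");
    // Rates taken from the article text.
    static final BigDecimal PRICE_PER_MILLION_REQUESTS = new BigDecimal("0.20");
    static final long FREE_REQUESTS = 1_000_000L;

    /** Compute charge: billable GB-seconds multiplied by the per GB-second rate. */
    static BigDecimal computeCost(long requests, long millisPerRequest, int memoryMb) {
        BigDecimal totalSeconds = BigDecimal.valueOf(requests)
                .multiply(BigDecimal.valueOf(millisPerRequest))
                .divide(BigDecimal.valueOf(1000));
        BigDecimal gbSeconds = totalSeconds
                .multiply(BigDecimal.valueOf(memoryMb))
                .divide(BigDecimal.valueOf(1024));
        BigDecimal billable = gbSeconds.subtract(FREE_GB_SECONDS).max(BigDecimal.ZERO);
        return billable.multiply(PRICE_PER_GB_SECOND).setScale(2, RoundingMode.HALF_UP);
    }

    /** Request charge: requests beyond the free allowance, priced per million. */
    static BigDecimal requestCost(long requests) {
        long billable = Math.max(0L, requests - FREE_REQUESTS);
        return BigDecimal.valueOf(billable)
                .divide(BigDecimal.valueOf(1_000_000L))
                .multiply(PRICE_PER_MILLION_REQUESTS)
                .setScale(2, RoundingMode.HALF_UP);
    }

    /** Total monthly charge under the assumptions above. */
    static BigDecimal totalCost(long requests, long millisPerRequest, int memoryMb) {
        return computeCost(requests, millisPerRequest, memoryMb).add(requestCost(requests));
    }
}
```

Calling LambdaCostEstimate.totalCost(30_000_000L, 200, 128) corresponds to the example in the text: 30 million runs of 200 ms at 128 MB amount to 750,000 GB-seconds, of which 350,000 are billable under the assumed free tier.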

Implementing AWS Lambda Ballerina Runtime

Figure 3: AWS Lambda Ballerina Runtime

At present, AWS Lambda only supports writing functions in Node.js, Python, Java, and C#. Support for any other language can be provided by implementing a wrapper function in one of these languages and spawning a process that executes the required runtime. The incoming request’s message body can be passed to the Ballerina function as a command line argument, and the output of the function can be captured via STDOUT and STDERR.

package org.ballerina.aws.lambda.runtime;
...

public class ApiGatewayFunctionInvoker implements RequestStreamHandler {
    ...
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
        ...
        // Read request body from input-stream
        ...
        CommandResult result = CommandExecutor.executeCommand(
                logger, env, "/tmp/ballerina/bin/ballerina",
                "run", "main", "/tmp/" + balFileName, body);
        ...
        // Write ballerina main function output to output-stream
        ...
    }
}

In addition to the above, it is important to note that Ballerina requires a Java runtime environment (JRE) for its execution. On AWS Lambda, the best way to get a JRE is to use Lambda's own Java runtime. If a different language were used for implementing the wrapper function, the JRE would also need to be packaged into the Lambda distribution. As illustrated in figure 3, I have implemented a Java wrapper function for invoking Ballerina functions via Amazon API Gateway, together with a Gradle build file that packages the Java wrapper function, the Ballerina runtime and the Ballerina code that implements the integration workflow into a zip file. This zip file can be uploaded to AWS Lambda as an all-in-one distribution for deploying Ballerina functions.
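
As a rough illustration of the process-launching step, the following sketch starts an external command with the standard ProcessBuilder API and captures its combined STDOUT and STDERR. It is not the repository's CommandExecutor; the ProcessRunner class, its Result type and the run method are hypothetical names, and merging STDERR into STDOUT is a simplification chosen to avoid blocking on two separate pipes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper that runs a command and captures its output (Java 8 compatible). */
final class ProcessRunner {

    /** Exit code plus everything the child wrote to STDOUT and STDERR. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge STDERR into STDOUT so a single read loop drains both streams.
        builder.redirectErrorStream(true);
        Process process = builder.start();
        // The child receives its input as arguments, so close its STDIN right away.
        process.getOutputStream().close();

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] chunk = new byte[4096];
            int read;
            // Blocks until the child closes its output, i.e. normally until it exits.
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        }
        int exitCode = process.waitFor();
        return new Result(exitCode, new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In the wrapper, the command list would hold the Ballerina binary path, "run", "main", the .bal file path and the request body, mirroring the arguments shown in the listing above. Passing the body as a separate list element avoids any shell quoting issues, because ProcessBuilder does not go through a shell.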

Exposing Functions via The API Gateway

Even though Lambda functions are implemented in a protocol-neutral way, when they are integrated with different channels the input and output messages need to be processed in a channel-specific way. For example, if a function is exposed via the API Gateway, it might need to read input parameters from HTTP query parameters, headers, and the body, depending on the function design. Amazon API Gateway sends incoming messages to Lambda functions in the following format:

{
    "resource": "Resource path",
    "path": "Path parameter",
    "httpMethod": "Incoming request's method name",
    "headers": {Incoming request headers},
    "queryStringParameters": {query string parameters},
    "pathParameters": {path parameters},
    "stageVariables": {Applicable stage variables},
    "requestContext": {Request context, including authorizer-returned key-value pairs},
    "body": "A JSON string of the request payload.",
    "isBase64Encoded": "A boolean flag to indicate if the applicable request payload is Base64-encoded"
}

Similarly, the response messages would need to be in the following format for the integration:

{
"isBase64Encoded": true|false,
"statusCode": httpStatusCode,
"headers": { "headerName": "headerValue", ... },
"body": "..."
}

The ApiGatewayFunctionInvoker has been designed to support the above message transformations and invoke Ballerina functions in a generic way. Therefore, the integration workflows can be implemented independently of the event source trigger. The only requirement is that both the request message body passed as a command line argument and the output the main function writes to STDOUT and STDERR are in JSON format.
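
To make the response side of this transformation concrete, here is a minimal sketch that wraps a function's output in the response format shown above, using only the standard library. The ProxyResponse class and its methods are hypothetical and are not taken from ApiGatewayFunctionInvoker; the fixed Content-Type header is also an assumption. A real implementation would more likely use a JSON library than hand-written escaping.

```java
/** Hypothetical builder for an API Gateway proxy integration response. */
final class ProxyResponse {

    /** Escapes a string so it can be embedded as a JSON string value. */
    static String escapeJson(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters need a \\uXXXX escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the response envelope; the body is carried as an escaped JSON string. */
    static String build(int statusCode, String body) {
        return "{\"isBase64Encoded\":false,"
                + "\"statusCode\":" + statusCode + ","
                + "\"headers\":{\"Content-Type\":\"application/json\"},"
                + "\"body\":\"" + escapeJson(body) + "\"}";
    }
}
```

For example, ProxyResponse.build(200, output) would wrap the echo function's JSON output so that API Gateway returns it as the HTTP response body. Note that the body field is a string, so the function's JSON output is escaped rather than nested as an object.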

Steps To Deploy

1. Clone the following Git repository and switch to the latest release tag:

$ git clone https://github.com/imesh/aws-lambda-ballerina-runtime
$ cd aws-lambda-ballerina-runtime
$ git checkout tags/<latest-version>

2. Download and extract the Ballerina runtime distribution from ballerinalang.org into the project root folder (you are already inside it after step 1):

$ wget http://ballerinalang.org/downloads/ballerina-runtime/ballerina-<version>.zip
$ unzip ballerina-<version>.zip

3. Remove the Ballerina zip file, strip the version from the Ballerina folder name, and delete the samples folder:

$ rm ballerina-<version>.zip
$ mv ballerina-<version>/ ballerina/
$ rm -rf ballerina/samples/

4. Copy your Ballerina main function file to the project root folder. To demonstrate how things work, let’s use the following echo.bal file:

import ballerina.lang.system;

function main(string[] args) {
    if (args.length == 0) {
        json error = { "error": "No input was found" };
        system:println(error);
        return;
    }
    system:println(args[0]);
}

Now the directory listing will be as follows:

$ ls
README.md ballerina/ build/ build.gradle echo.bal src/

5. Build the project using Gradle. This will create a distribution containing the Ballerina runtime, the echo.bal file and the Java wrapper function:

$ gradle build

The AWS Lambda Ballerina Runtime distribution is written to the build/distributions folder:

$ ls build/distributions/
aws-lambda-ballerina-runtime.zip

6. Now, log in to AWS and open the AWS Lambda page. Then, press the Get Started Now button to create a new function:

7. Select the “Blank Function” blueprint and go to the next step:

8. Select API Gateway as the source trigger of the Lambda function and provide the API details. Let’s call this API “EchoAPI” and leave security open to keep the POC simple:

9. Select Java 8 as the runtime and provide a name for the function:

10. Upload the function package file (aws-lambda-ballerina-runtime.zip) created in step 5 and provide the Ballerina file name via an environment variable:

11. Set the handler as “org.ballerina.aws.lambda.runtime.ApiGatewayFunctionInvoker::handleRequest” and create a new IAM Role for function execution with required policy templates:

12. Expand the “Advanced settings” section and set the memory to 1536 MB. Since CPU is allocated in proportion to memory, this raises the CPU allocation to its maximum level:

13. Then review the function summary and press the “Create function” button:

14. Click on “Actions -> Configure test event” and provide a sample input message as follows:

15. Thereafter, press the “Test” button and execute a test. If everything goes well an output similar to the following will be displayed:

16. Now, try to invoke the above function via the API Gateway. To do this, go to the Triggers tab, copy the API URL and execute a curl command:

$ curl -v -H 'Content-Type: application/json' -d '{"hello":"ballerina"}' https://81y9s6t1pj.execute-api.us-east-1.amazonaws.com/prod/BallerinaEchoFunction
...
> POST /prod/BallerinaEchoFunction HTTP/1.1
> Host: 81y9s6t1pj.execute-api.us-east-1.amazonaws.com
> User-Agent: curl/7.51.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 21
>
...
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 21
< Connection: keep-alive
< Date: Wed, 21 Jun 2017 12:05:53 GMT
...
{"hello":"ballerina"}

Conclusion

In this article, we went through a quick POC on deploying an echo function written in Ballerina on AWS Lambda. As you may have noticed, the execution time of the echo function is quite high: in this example, it was around 2052 ms even with the highest possible amount of resources allocated. I tested the same function locally by creating a Docker image using the Ballerina docker command, and the results were very similar. It seems that Ballerina main functions currently consume a considerable amount of CPU for some reason. This needs to be investigated and, if possible, improved in the future.

Moreover, a specific Java wrapper class is needed for each function source trigger, because processing incoming messages and preparing response messages is specific to the trigger source. Currently, the AWS Lambda Ballerina Runtime implementation provides a Java wrapper function for integrating Ballerina functions on AWS Lambda with Amazon API Gateway. Support for more trigger sources will be added in the future. If you are willing to contribute, please feel free to send a pull request.
