Battle of the Extensions

Efi Merdler-Kravitz
Melio’s R&D blog
8 min read · Apr 11, 2024

Two mages, one representing Rust and one representing LLRT, fighting each other in an epic battle

LLRT (Low Latency Runtime) is a lightweight JavaScript runtime crafted to meet the increasing demand for fast and efficient Serverless applications. It is a trendy topic these days, especially for its exceptional cold start performance.

In the lambda-perf project, LLRT demonstrates startup times on par with Rust and C++, under 20ms, marking a stark contrast to the typical Node.js runtime’s 150ms. This outstanding performance has sparked conversations about the potential of such a runtime to transform edge Lambdas, Lambda authorizers, and any scenario benefiting from minimal cold start times alongside satisfactory performance.

This post marks the first exploration into how LLRT can significantly enhance Lambda extensions, highlighting an extension developed with LLRT and comparing it to its Rust-developed counterpart.

As a staff engineer at Melio with a keen interest in Lambda extensions written in Rust, this was an opportunity for me to explore whether we could utilize LLRT to facilitate the efficient development of extensions.

The source code for the different extension implementations is shared in the following link.

The post is divided into three parts:

  1. A shallow dive into the workings of Lambda extensions.
  2. A review of the extension code developed using LLRT.
  3. The comparison numbers.

And without further ado, let’s delve into the inner workings of extensions.

Extension shallow dive

Lambda Execution Environment Architecture

An extension is a separate process under your control that runs alongside your Lambda handler, complete with its own lifecycle events. It offers two main advantages:

Additional parallel processing:

An extension provides a straightforward method to add concurrency to an application, thereby enhancing its speed. For instance, consider a scenario where you need to execute some resource-intensive process that can run parallel to your Lambda handler. Instead of incorporating it within your Lambda function, you can offload this task to an external process, or an extension, and then retrieve the results at the end of the Lambda invocation.

For instance, here you can see an example of an extension designed to accept tasks for calculating Fibonacci numbers. It computes them in parallel with the handler’s execution, and the handler then waits to receive the results at the end of its execution.

import json

import requests

tasks_service = "http://localhost:3001/tasks"

def lambda_handler(event, context) -> dict:
    try:
        response = requests.post(
            tasks_service,
            data=json.dumps(
                {
                    "action": "fibonacci",
                    "data": "1"
                }
            ),
            headers={"Content-Type": "application/json"},
        )
        # ...
        requests.get(
            tasks_service,
            data=json.dumps({"action_id": response.json()["action_id"]}),
        )
    except requests.RequestException as err:
        print(f"tasks service call failed: {err}")

Unique Lifecycle:

Unlike Lambda functions, an extension has the unique capability to continue running even after the Lambda function has completed its execution. This lifecycle allows the Lambda function to return a response while the extension remains active.

Consider workflows that involve post-processing tasks, such as sending analytics data to an external service. In situations like these, there’s no compelling reason for the Lambda client to endure additional waiting time for internal processes that do not impact the client directly.

Additional factors you’ll need to consider when developing an extension include:

Communication:

Depending on your extension’s purpose, if your Lambda function needs to communicate with it, you will have to establish some form of inter-process communication. Our LLRT extension utilizes HTTP for this purpose.

Efficiency:

Given that an extension operates as a separate process, it’s essential for it to consume as little memory and CPU cycles as possible, since it shares resources with your Lambda function. Therefore, choosing efficient runtimes for your extension, such as Go, Rust, or C++, is advisable.

Although the LLRT runtime handles JavaScript files, which are not native binaries, the LLRT binary itself has a significantly smaller footprint and less overhead compared to the comprehensive Node.js V8 engine. This efficiency makes LLRT an attractive option for a Lambda extension runtime.

Let’s examine what an LLRT extension might entail.

Product use case

Our Lambdas gather analytics data on their operations, which is then sent to an SQS queue for processing by a separate Lambda. Due to the operational nature of Lambda, our Lambda consumer had to wait for a successful send operation before we could issue a response. This process added to the product’s latency. Consequently, we’ve opted to leverage the unique lifecycle events of a Lambda extension. This allows us to return results to our consumers more swiftly while managing the analytics event sending within the extension.

LLRT Extension

High-level architecture

Extension Architecture
  1. To communicate with the extension, the extension exposes an HTTP server. Since LLRT lacks the HTTP module but includes the net module, we crafted a lightweight HTTP server using the net module. Whenever the Lambda function needs to log a new analytics event, it makes a simple POST HTTP call to localhost, providing the event details.
import json
import random
import string

import requests

analytics_service = "http://localhost:3001/analytics"

def lambda_handler(event, context) -> dict:
    try:
        response = requests.post(
            analytics_service,
            data=json.dumps(
                {
                    "action": "".join(
                        random.choice(string.ascii_lowercase) for i in range(10)
                    )
                }
            ),
            headers={"Content-Type": "application/json"},
        )
    except requests.RequestException as err:
        print(f"analytics call failed: {err}")

2. The internal HTTP server then stores the analytics event in an internal buffer.

server.post("/analytics", (req, res) => {
  logMessage("Posting to SQS");
  MESSAGES.push({
    action: req.body.action,
    timestamp: new Date().toISOString(),
  });
  res.ok();
});

3. The core logic of the extension is an infinite loop that waits for the next lifecycle event. After each event, the extension checks the internal buffer for any events.

while (true) {
  logMessage("next event");
  const event = await next(extensionId);
  switch (event.eventType) {
    case EventType.SHUTDOWN:
      handleShutdown(event);
      break;
    case EventType.INVOKE:
      handleInvoke(event);
      break;
    default:
      errorMessage(`unknown event: ${event.eventType}`);
  }

  await sendToSQS();
}
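The next(extensionId) call in the loop above polls the Lambda Extensions API, which also hands out the extension ID at registration time. The endpoint paths and headers below come from the Extensions API; the function bodies and error handling are a hedged sketch, not the extension's exact code.

```javascript
const API_VERSION = "2020-01-01";

function baseUrl() {
  // AWS_LAMBDA_RUNTIME_API is injected by the Lambda environment.
  return `http://${process.env.AWS_LAMBDA_RUNTIME_API}/${API_VERSION}/extension`;
}

async function register(extensionName) {
  const res = await fetch(`${baseUrl()}/register`, {
    method: "POST",
    headers: { "Lambda-Extension-Name": extensionName },
    body: JSON.stringify({ events: ["INVOKE", "SHUTDOWN"] }),
  });
  // The extension ID is returned in a response header, not in the body.
  return res.headers.get("lambda-extension-identifier");
}

async function next(extensionId) {
  // This call blocks until the runtime has a lifecycle event to deliver.
  const res = await fetch(`${baseUrl()}/event/next`, {
    headers: { "Lambda-Extension-Identifier": extensionId },
  });
  return res.json(); // e.g. { eventType: "INVOKE", ... }
}
```

Registering for SHUTDOWN as well as INVOKE is what gives the extension its final chance to flush the buffer before the environment is torn down.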

4. And forwards them to the designated SQS queue.

async function sendToSQS() {
  logMessage(`SQS loop started, messages: ${MESSAGES.length}`);
  while (MESSAGES.length > 0) {
    const message = MESSAGES.shift();
    logMessage(`Sending message to SQS: ${JSON.stringify(message)}`);
    const params = {
      QueueUrl: QUEUE_URL,
      MessageBody: JSON.stringify(message),
    };
    try {
      await SQS_CLIENT.send(new SendMessageCommand(params));
      logMessage("Message sent to SQS");
    } catch (err) {
      errorMessage(`Failed to send message to SQS: ${err}`);
    }
  }
}
```
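The sendToSQS loop above makes one request per buffered message. A possible refinement, which is my own assumption and not part of the original extension, is to drain the buffer into batches of up to 10 entries (the SendMessageBatch limit), cutting the number of round trips per flush:

```javascript
function drainIntoBatches(messages, batchSize = 10) {
  const batches = [];
  while (messages.length > 0) {
    // splice() empties the shared buffer as it goes, like MESSAGES.shift() above.
    const entries = messages.splice(0, batchSize).map((body, i) => ({
      Id: String(i), // batch-entry IDs only need to be unique within one batch
      MessageBody: JSON.stringify(body),
    }));
    batches.push(entries);
  }
  return batches;
}
```

Each resulting batch would then be sent with a single SendMessageBatchCommand instead of ten SendMessageCommand calls.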

Packaging

Building and deploying the extension is as straightforward as with any other Lambda extension. You package everything into a zip file and use the AWS CLI aws lambda publish-layer-version command to create a Lambda layer containing the extension code.

zip -r extension.zip . && \
aws lambda publish-layer-version --layer-name "llrt-extension" --zip-file "fileb://extension.zip" --compatible-architectures arm64 --output json --no-cli-pager && \
rm extension.zip

Please note that your extension needs a specific folder structure:

  • The initialization script, which calls your extension’s handler, should be placed within the extensions folder.
  • The LLRT executable should be located in a folder separate from the extensions folder to prevent it from being executed as well.
Folder structure

Unlike fully native options like Rust, working with LLRT also involves providing the LLRT runtime, which is included as part of the extension package. Ensure to select the correct LLRT binary architecture, whether it be arm64 or X86, to match your Lambda function’s architecture.

Each time the extension is initialized, it triggers the init script. In our scenario, this script is an executable bash script. This bash script then runs the LLRT executable to execute the extension handler, which is written in JavaScript.

#!/bin/bash
# ...
/opt/${LAMBDA_EXTENSION_NAME}/llrt-exec "/opt/${LAMBDA_EXTENSION_NAME}/index.js"

Numbers

I’ve created two extensions that implement the behavior described above: listening for analytics events and pushing them to an external SQS queue for further processing.

One extension is implemented using Rust, and another is implemented using LLRT. Additionally, I implemented the same flow without an extension, using purely Lambda code, to demonstrate the benefits of extensions in general for use cases like this. I am using Python as the Lambda runtime.

Warm Start

The biggest hurdle in measuring latency in Lambdas that utilize extensions in the manner we do — specifically, leveraging the unique lifecycle events — is the difficulty in capturing the latency experienced by the user who invoked the Lambda.

The Duration in the REPORT event in the logs includes both the Lambda duration and the extension duration. In our case, I wanted to isolate and view only the Lambda duration, excluding the extension duration, to represent what the Lambda consumer would experience.

To achieve this, I utilized the CloudWatch metric PostRuntimeExtensionsDuration to obtain the duration of the extension’s post-runtime activity. I then subtracted this number from the total Lambda invocation duration.
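The subtraction described above can be written as a tiny helper. This is only a sketch of the arithmetic; the plumbing that actually fetches the two numbers from CloudWatch is omitted, and the clamping to zero is my own assumption for noisy samples.

```javascript
function userPerceivedDuration(reportDurationMs, postRuntimeExtensionsDurationMs) {
  // The REPORT Duration includes post-runtime extension work, so removing
  // PostRuntimeExtensionsDuration approximates what the caller actually waited.
  return Math.max(0, reportDurationMs - postRuntimeExtensionsDurationMs);
}
```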

I employed the power-tuning state machine to execute the Lambda function 100 times. Here are the results:

Warm start performance

LLRT becomes quite comparable to Rust with higher memory configurations.

Another observed benefit of using the extension is a lower 90th percentile of maximum memory consumption.

The Lambda without the extension bundles boto3, which increases its total memory usage.

Cold Start

Cold start includes both the Lambda cold start and the extension initialization phase. Unlike the warm start, it’s easier to measure; simply looking at the Init Duration in the Report event provides the relevant number. To generate cold start invocations, I used the measure cold start state machine.

Cold start performance

LLRT is quite comparable to Rust. Additionally, it’s worth noting how the inclusion and initialization of the boto3 package affect cold start times. LLRT pre-packages the Node.js AWS SDK V3, which is quite efficient since it separates each AWS service into individual packages, making the initial package loading and initialization quick.

Conclusion

LLRT is a new JS-like runtime designed for edge compute services, and I believe it has the potential to be an excellent Lambda Extension runtime. It offers performance comparable to Rust for IO-bound flows, both in cold and warm starts. Above all, its familiar language syntax makes it easier to learn compared to Rust. Assuming LLRT will have an official release, it will be my preferred runtime for writing Lambda extensions.

Visit our career website


Serverless is not just server-less; it's a different mindset. AWS Serverless Hero. Staff Engineer @ Melio