Building Nami: A serverless framework for consuming webhooks at scale

Using AWS, Node, Docker, and MongoDB

Jun 17, 2019

Introduction

When events occur in batches, a wave of webhooks can overwhelm consumers, resulting in dropped data. Always-on systems must be provisioned to handle worst-case scenarios; however, this is inefficient, as those resources will sit idle most of the time.

We built Nami to solve this problem by providing an efficient and easily deployable architecture to consume bursty webhook traffic, relying on the elastic scaling power of AWS Lambda, AWS' Function as a Service (FaaS) platform. The framework abstracts away the complexities of cloud infrastructure and the effort required to deploy, configure, and choreograph the primarily serverless architecture.

Event-driven Architecture

Webhooks represent an event-driven approach to development. They allow applications to take in data about events as they happen, trading control of data flow for event-driven updates and asynchronicity. Nami’s event-driven architecture (EDA) embraces this trade-off.

In an EDA, events represent changes in state that have already occurred. “An event is an object or message that indicates something has happened;” it holds information about that happening, but does not contain any logic itself (1). Rather than requesting that work be done, the consumer responds to events that are announced after they have occurred.

The EDA paradigm is asynchronous because a caller is never sitting idle, waiting for a response to be returned. While validation of a command in a request-driven cycle needs to happen synchronously, an event consumer may consume the event asynchronously. The application producing the event needs to know nothing about the consumer(s) or how they will process the event. The producer only waits for acknowledgement that the message was received, but does not wait for the message to be processed (2). In an asynchronous, event-driven model, the dichotomy of client and server loses the meaning it has in the request-driven context, so we can instead use terms such as event producer (server sending the event notification) and event consumer (server receiving the event notification).

Decoupling between event producers and event consumers, one of EDA’s primary advantages, allows each to be implemented with different technologies, announce and process events at different times, and to be scaled separately, all without needing to know about one another’s implementation or whereabouts.

Webhooks

One way that the shift towards an event-driven paradigm manifests is in the use of webhooks. Webhooks are a mechanism for one system to notify another system of an event that has occurred.

Before webhooks became a popular way to move data, polling techniques were commonly used to retrieve event information from services. If your app needed information from a service, it would need to continually poll that service with HTTP requests at a certain time interval to check for updated information.

Polling an API for data

If that service does not have any updates, the resources used to continually poll it are wasted. Webhooks turn this situation around:

“don’t call us, we’ll call you.”

When a particular event occurs and triggers a webhook, the producer, or webhook sender, calls the receiver by making an HTTP request to the URL configured for the webhook subscription.

Webhook data is most commonly sent via a POST request, with the payload included in the request body. Webhooks can be sent as several different content types, including JSON, form-data, and XML. Once the request is successfully received by the consumer, a 200 status response is sent back to the webhook producer.
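
To make this concrete, a minimal webhook receiver might look like the following. This is a hypothetical Node.js (Express) endpoint, not part of Nami; the route and port are made up.

```javascript
// Minimal, hypothetical webhook consumer endpoint (not part of Nami).
// It accepts a JSON POST body and immediately acknowledges receipt with a 200.
const express = require('express');

const app = express();
app.use(express.json()); // parse application/json request bodies

app.post('/webhooks/inventory', (req, res) => {
  const payload = req.body; // the webhook event data sent by the producer
  console.log('Received webhook event:', payload);
  res.sendStatus(200); // acknowledge receipt to the webhook producer
});

app.listen(3000, () => console.log('Listening for webhooks on port 3000'));
```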

Example webhook payload for Shopify inventory item created event

A common use case for webhooks is real-time event notification. Systems that generate a lot of events can quickly ramp up the webhook traffic to the destination. If webhook traffic grows consistently, it is relatively easy to accommodate the extra load with additional infrastructure. More difficult, though, is inconsistent, bursty traffic.

Manual Solution

To handle these occasional spikes in traffic, the webhook consumer app could provision additional infrastructure. However, there are a few drawbacks to this approach. Servers are expensive and require ongoing maintenance. Sharing the same infrastructure also makes the core application more vulnerable to downtime caused by problems with webhook handling. Further, provisioning infrastructure that will sit idle and unused much of the time is an inefficient use of resources, especially for consumers receiving webhooks in volatile patterns.

Additionally, by nature of being event-driven, the consumer cannot control the flow of data. If webhook traffic arrives like a firehose and generates more requests than the application can handle, the result is an unintentional denial-of-service (DoS) attack. This is like a crowd of people blocking the entry door of a business, making it more difficult for legitimate customers to enter, or denying them service (3).

Considering the challenges with using on-premise infrastructure to handle webhooks at scale, these types of use cases fit well with auto-scaling compute services like AWS Lambda, AWS’ Function as a Service (FaaS) platform. The elastic scaling capability of Lambda allows for compute resources to scale up and down on demand.

Lambda elastic scaling

Serverless Computing

Before we examine AWS Lambda more closely, let’s zoom out and look at serverless as a whole. Server management options run a continuum from bare metal (physical, on-site servers) to completely serverless. Serverless computing (comprising both FaaS and Backend as a Service [BaaS]), which is in fact performed on servers somewhere, sits at the end of the spectrum where all provisioning and management concerns are abstracted away from the user; the tradeoff is a loss of control over how code is handled.

FaaS is offered by multiple cloud platforms, with popular services such as AWS Lambda, Google Cloud Functions, and Microsoft Azure Functions. AWS currently dominates the cloud computing market. In a FaaS environment, function code is uploaded to the cloud platform, which then deploys a compute container provisioned with the function code and the other resources required for the function to execute.

With FaaS, scaling is automatic, elastic to meet the demands of incoming requests, and fully managed by the provider. Compute containers are spun up only when needed, and then shut down as soon as the function has finished execution. The customer pays only for the actual execution time of the function, which can be a significant cost-savings over an always-on server.

Considering these benefits of a serverless solution, it’s clear that a self-managed server infrastructure wastes resources when enough servers must be provisioned to handle bursty webhook traffic generated by batched events. Use cases that consume webhooks with such traffic patterns, and need only perform a brief, simple acknowledgement of the request, lend themselves well to a serverless infrastructure.

Need for a Framework

While a serverless infrastructure offers a solution to the problem of bursty webhook traffic, another problem still remains: Developers should not have to spend their time setting up this scaffolding and learning the intricacies and gotchas of a cloud platform just to scale for webhooks. A framework is needed to streamline the system provisioning and deployment process.

Working within cloud platforms is not a trivial endeavor. Though AWS is the current industry leader in cloud computing, the complexity of AWS and its multitude of services means that the learning curve is steep to perform even simple tasks. Choreographing individual services to communicate and work in a coordinated way only compounds this problem.

Nami solves this complexity problem with a simple command line interface, making it easy to deploy an AWS endpoint and quickly stand up a pipeline for webhook data that scales up and down to match traffic patterns, without any intervention required from the user. Nami enables the user to deploy with one command what would otherwise take over 50 separate API calls to deploy, configure, and connect each piece of infrastructure. This pipeline, built using AWS services, serves to receive, acknowledge, process, and store payloads received from webhooks.

How it Works

When the event producer makes a request, API Gateway is the first port of call. It accepts the request containing webhook event data, which is encapsulated as an event object, and triggers invocation of the pre-queue Lambda.

The function pushes the event object onto the SQS queue and, if execution is successful, responds to the webhook producer’s POST request with a 200 status code.
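
A minimal sketch of what such a pre-queue handler could look like, assuming Node.js, the AWS SDK, an API Gateway proxy integration, and a queue URL supplied through an environment variable (the names here are illustrative, not Nami’s actual source):

```javascript
// Hypothetical pre-queue Lambda handler: enqueue the raw webhook payload, then acknowledge.
const AWS = require('aws-sdk');

const sqs = new AWS.SQS();

exports.handler = async (event) => {
  // With an API Gateway proxy integration, the webhook payload arrives as a string in event.body.
  await sqs.sendMessage({
    QueueUrl: process.env.QUEUE_URL, // assumed environment variable holding the SQS queue URL
    MessageBody: event.body,
  }).promise();

  // Acknowledge receipt to the webhook producer as quickly as possible.
  return { statusCode: 200, body: JSON.stringify({ message: 'Webhook received' }) };
};
```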

As event data messages enter the queue, they trigger invocation of the post-queue Lambda, which consumes those messages at a stable rate.

The post-queue Lambda then opens a connection to the database, a MongoDB image deployed with Docker on EC2. The function code retrieves the payload from the event object and writes it to the database.
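
The post-queue side could look roughly like the sketch below, assuming the Node.js MongoDB driver and an SQS event source; the database name, collection name, and connection string are placeholders, and the connection handling here is deliberately naive (connection reuse is covered later in this article).

```javascript
// Hypothetical post-queue Lambda handler: read each SQS record and persist it to MongoDB.
const { MongoClient } = require('mongodb');

exports.handler = async (event) => {
  const client = await MongoClient.connect(process.env.MONGODB_URI, {
    useNewUrlParser: true, // option for the 3.x-era driver's connection string parser
  });

  try {
    for (const record of event.Records) {
      const payload = JSON.parse(record.body); // the webhook payload enqueued upstream
      await client.db('nami').collection('webhooks').insertOne(payload);
    }
  } finally {
    await client.close(); // naive: connection reuse is discussed later in this article
  }
};
```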

A successful write to the database causes the queue to delete the message. This process repeats until the queue is empty. If the write is unsuccessful, or the function execution fails for another reason, the queue will retry sending the message up to 5 times before sending it to a Dead Letter Queue (DLQ).

Event data in MongoDB is persisted on an EBS volume, AWS’ persistent block storage for EC2. The data is now at rest in the database and ready for the user to retrieve.

Nami Framework Design Decisions

As we worked through engineering our solution, we considered tradeoffs to maximize scalability and minimize complexity for the user. Here we explore some of those tradeoffs and the design decisions made during development.

Receiving and Acknowledging Webhook Data

Unreserved Concurrency for Maximum Scaling

AWS Lambda functions are an ideal tool with which to handle bursty traffic. Many Lambda functions can be invoked concurrently, where concurrent executions are the number of executions of a function’s code happening in parallel at any given time. Nami deploys two Lambda functions: a pre-queue Lambda and a post-queue Lambda. The pre-queue Lambda serves as the webhook receiver, the front-facing function that ingests incoming webhook traffic. This core part of the architecture should be able to scale to its limit, since we rely on its elasticity to handle and respond to bursts of incoming requests. To enable this level of scaling, we configure the pre-queue Lambda function to use the AWS account-level unreserved concurrency limit of 1,000 concurrent Lambda executions. Although this number declines slightly after the post-queue Lambda is created and allocated 5 concurrent executions of its own, utilizing the unreserved concurrency pool maximizes scaling for the pre-queue Lambda function compared to reserving a specific function concurrency.

The following diagram depicts the scaling behavior of Lambda. Lambda functions are run in compute containers, with a single container able to handle multiple sequential executions. If all existing containers are currently executing functions when another request comes in, then another container must spin up to handle it. It is these multiple concurrent containers that enable parallel execution of functions. This is how Lambda scales horizontally.

Execution Logic

In an event-driven model, the event producer and consumer should not be aware of one another’s implementation. The consumer should simply receive the data, send it on for further processing, and then acknowledge receipt as quickly as possible. Therefore, the pre-queue Lambda function only receives the payload, enqueues it, and immediately returns a 200 success response to the sender. Additionally, the lean execution logic lowers latency, meaning fewer containers are spun up and there is less chance of reaching the concurrency limit. The user can also optionally add custom authentication logic to the function handler. Any kind of webhook event data processing is delegated to the post-queue Lambda, which reads the webhook payload from the Nami-deployed queue.
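
For example, if the webhook producer signs its requests, a user might add a check like the following before enqueueing. This is an illustrative HMAC comparison; the header name, secret, and encoding depend on the producer and are assumptions here.

```javascript
// Hypothetical signature check a user might add to the pre-queue handler.
const crypto = require('crypto');

function isAuthentic(event) {
  // Header name, secret, and encoding vary by webhook provider; these are illustrative.
  const signature = event.headers['X-Webhook-Signature'];
  const expected = crypto
    .createHmac('sha256', process.env.WEBHOOK_SECRET)
    .update(event.body, 'utf8')
    .digest('base64');
  if (!signature || signature.length !== expected.length) return false;
  return crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected));
}

// Inside the handler, reject unauthenticated requests before enqueueing:
// if (!isAuthentic(event)) return { statusCode: 401, body: 'Invalid signature' };
```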

Memory Allocation

During load testing, we aimed for a performance of around 5,000 requests per second (RPS) to compare with real-world examples that solved bursty webhook traffic issues using a serverless architecture similar to Nami. We found that increasing the function memory size to 256MB from the default of 128MB resulted in the best response times for this scenario, optimizing scaling power. An important consideration here is that provisioning a Lambda function with higher memory correlates with longer cold start times during the container initialization process. Despite this, function execution times were still well within tolerable levels for an expected webhook response time.

With this decision, we traded lower cost of minimum memory and shorter cold start times for lower execution duration and greater scaling capacity. If a Nami user finds that a different memory setting is optimal for their customized implementation, they can easily adjust this setting from the AWS Lambda console.
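
As an alternative to the console, the memory setting could also be adjusted programmatically with the AWS SDK; a sketch, with a placeholder function name:

```javascript
// Sketch: adjust a function's memory allocation via the AWS SDK instead of the console.
const AWS = require('aws-sdk');

const lambda = new AWS.Lambda({ region: 'us-east-1' }); // region is an assumption

lambda.updateFunctionConfiguration({
  FunctionName: 'nami-prequeue-example', // placeholder function name
  MemorySize: 256,                       // in MB
}).promise()
  .then(() => console.log('Memory size updated'))
  .catch(console.error);
```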

Multiple Framework Instances

We chose to support multiple instances of the framework, where each instance is unique to an individual webhook subscription. Rather than deploying a new API Gateway endpoint for each instance, we consolidate user endpoints as child resources under the parent, or root, resource of the API Gateway so as not to strain the limit on the number of individual API Gateways allowed on an account. Each time the user executes the deploy command to generate an additional API endpoint, a new resource is created that accepts webhook POST requests to the API Gateway. Each resource represents a unique API endpoint, created by appending the deployment name to the /nami path on the same host.
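
In AWS SDK terms, adding an endpoint for a new deployment amounts to creating a child resource under the existing /nami resource and attaching a POST method to it. The sketch below illustrates the idea; the IDs are placeholders and the Lambda integration setup is omitted.

```javascript
// Sketch: add a new webhook endpoint as a child resource under the existing /nami resource.
const AWS = require('aws-sdk');

const apigateway = new AWS.APIGateway({ region: 'us-east-1' }); // region is an assumption

async function addEndpoint(restApiId, namiResourceId, deploymentName) {
  // Creates e.g. /nami/<deploymentName> on the shared API Gateway.
  const resource = await apigateway.createResource({
    restApiId,                // ID of the shared API Gateway
    parentId: namiResourceId, // ID of the parent /nami resource
    pathPart: deploymentName, // e.g. 'shopify-inventory'
  }).promise();

  // Accept webhook POST requests on the new resource (Lambda integration setup omitted).
  await apigateway.putMethod({
    restApiId,
    resourceId: resource.id,
    httpMethod: 'POST',
    authorizationType: 'NONE',
  }).promise();

  return resource.id;
}
```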

Message Queue

We connect the Lambda receiving webhook requests to the data store via an SQS queue as message broker, reinforcing our design in several ways.

Evening Out Traffic Spikes

Because the queue allows us to scale the producer and consumer separately, and allows us to have multiple consumer Lambdas, we have a different capacity for throughput on each side. When the producer is producing messages faster than the consumer can handle, the queue buffers the messages until the consumer is ready to receive them.

An AWS SQS message queue is deployed in between the Lambda functions as a means to throttle the message flow, controlling the distribution of webhook data.

Another downstream resource, which could become overwhelmed without the throttling mechanism of the queue, is the data store itself. Each second, thousands of consumer Lambda functions could be attempting connections to the EC2 instance, which would quickly become overloaded. While a larger instance size with more RAM could be used to support more incoming connections, we chose to keep costs lower with a smaller instance size, as the queue’s throttling action is sufficient to keep the instance from being overwhelmed.

Decoupling

In the same way that webhooks decouple event producer from event consumer, placing the SQS queue at the center of our architecture decouples the pre-queue Lambda sending the message from the post-queue Lambda processing it. The producer does not need any awareness of the processing the consumer is doing, or even whether the consumer is available. The queue ensures that producer and consumer can be developed independently.

Isolating Failure

Decoupling producer and consumer on the Nami data pipeline additionally provides some fault tolerance to the system. Should the EC2 instance ever be unavailable, the queue, or (in the event of extended unavailability) the DLQ, will retain the messages until the system is able to heal.

Processing the Data

Reserved Concurrency for Throttling

When the post-queue Lambda function is invoked, it should provide throttling without generating errors while also maximizing concurrency available to the pre-queue Lambda function. To achieve this, we limited the reserved concurrency of the post-queue Lambda function to allow only 5 concurrent invocations at one time, a minimum concurrency level recommended by AWS. This means that from the potential 1,000 concurrent executions available to any given AWS account, only a maximum of 5 will be allocated to the post-queue Lambda. This ensures the pre-queue Lambda function still has a large pool from which to draw, the SQS queue is drained at a steady rate, and the two Lambda functions are never competing for resources.
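
In SDK terms, capping the post-queue function looks something like the following sketch (the function name is a placeholder):

```javascript
// Sketch: reserve 5 concurrent executions for the post-queue Lambda.
const AWS = require('aws-sdk');

const lambda = new AWS.Lambda({ region: 'us-east-1' }); // region is an assumption

lambda.putFunctionConcurrency({
  FunctionName: 'nami-postqueue-example', // placeholder function name
  ReservedConcurrentExecutions: 5,        // throttle to 5 parallel invocations
}).promise()
  .then(() => console.log('Reserved concurrency set'))
  .catch(console.error);
```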

Batch Size

When messages are received from the queue by the post-queue Lambda, it is possible for the function execution to fail even after a successful write to the database. This results in the message not being deleted from the queue, and another, successful function execution can then produce duplicate entries in the database. The problem is compounded when there are multiple messages in a batch, because if a batch fails midway, all of the messages from the batch remain in the queue to be processed again. To reduce the impact of this potential issue, we set the maximum batch size that the post-queue Lambda can process to 1, so a failed execution can duplicate at most one message. The downside of a batch size of 1 is more sends; however, we felt this was an acceptable trade-off, given that the potential for event data duplication begins upstream with the webhook producer, so we chose to minimize this potential from within Nami.
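
The batch size is applied where the SQS queue is wired to the post-queue Lambda as an event source; a sketch, with a placeholder queue ARN and function name:

```javascript
// Sketch: subscribe the post-queue Lambda to the SQS queue with a batch size of 1.
const AWS = require('aws-sdk');

const lambda = new AWS.Lambda({ region: 'us-east-1' }); // region is an assumption

lambda.createEventSourceMapping({
  EventSourceArn: 'arn:aws:sqs:us-east-1:123456789012:nami-queue-example', // placeholder queue ARN
  FunctionName: 'nami-postqueue-example',                                  // placeholder function name
  BatchSize: 1, // each invocation processes a single message
}).promise()
  .then(() => console.log('Event source mapping created'))
  .catch(console.error);
```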

User-Specific Logic

The post-queue Lambda function is the appropriate place to implement user-specific logic for data processing and/or sending the webhook payload to external systems. Performing data processing in this post-queue Lambda enables our system to maintain decoupling between webhook producer and consumer. Also, because the flow of webhooks has been slowed by the queue, we can do more “heavy lifting” with the data, as minimal execution durations are not critical to scaling as with the pre-queue Lambda.

Reusing Database Connections

We enabled the post-queue Lambda function to reuse database connections in order to both reduce the function’s execution duration, as well as efficiently use available database connections. This makes use of the Lambda container’s execution context, creating a closure around the database connection object, the value of which is accessible to all Lambda executions in the same container. If the connection does not exist, it is created and assigned to a variable in this context space for future Lambda executions to use.
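
A sketch of this pattern, again assuming the Node.js MongoDB driver (names are illustrative):

```javascript
// Sketch of connection reuse via the Lambda execution context.
const { MongoClient } = require('mongodb');

// Declared outside the handler, so the value survives across invocations in the same container.
let cachedClient = null;

async function getClient() {
  if (cachedClient) return cachedClient; // reuse an existing connection if one is available
  cachedClient = await MongoClient.connect(process.env.MONGODB_URI, { useNewUrlParser: true });
  return cachedClient;
}

exports.handler = async (event) => {
  const client = await getClient();
  const payload = JSON.parse(event.Records[0].body); // batch size is 1
  await client.db('nami').collection('webhooks').insertOne(payload);
  // The connection is intentionally left open for the next invocation in this container.
};
```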

While the function execution duration gains are minimal, this connection reuse plays a large role in allowing a relatively small EC2 instance size (t2.micro) to be used with Nami. Each open database connection uses memory, so the EC2 instance can only support up to a certain number of connections. This implementation prioritizes efficient use of those connections.

Security

To ensure that the data store is not accessible from the public internet, the post-queue Lambda function and EC2 instance are both placed within their own Virtual Private Cloud (VPC) and security group. This blocks all incoming traffic except what is needed for the Lambda function to connect to the instance and write to the database.
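
The kind of ingress rule involved looks roughly like the sketch below; the security group IDs are placeholders, and MongoDB’s default port of 27017 is an assumption.

```javascript
// Sketch: allow MongoDB traffic only from the post-queue Lambda's security group.
const AWS = require('aws-sdk');

const ec2 = new AWS.EC2({ region: 'us-east-1' }); // region is an assumption

ec2.authorizeSecurityGroupIngress({
  GroupId: 'sg-ec2-example', // placeholder: security group attached to the EC2 instance
  IpPermissions: [{
    IpProtocol: 'tcp',
    FromPort: 27017, // MongoDB's default port
    ToPort: 27017,
    UserIdGroupPairs: [{ GroupId: 'sg-lambda-example' }], // placeholder: the Lambda's security group
  }],
}).promise()
  .then(() => console.log('Ingress rule added'))
  .catch(console.error);
```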

Handling Dropped Messages

To handle repeated failures of the post-queue Lambda trying to write to MongoDB, a second SQS queue is added to serve as a dead letter queue (DLQ) for the main queue. Any message that fails to be written to the database 5 times (configurable) is diverted here for the user to process manually, instead of being lost. The DLQ is set to the maximum message retention of 14 days, deliberately giving the user ample time to manually process any failed messages.
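
Configuring this looks roughly like the following sketch (queue URL and DLQ ARN are placeholders):

```javascript
// Sketch: attach a dead letter queue to the main queue and set 14-day retention on the DLQ.
const AWS = require('aws-sdk');

const sqs = new AWS.SQS({ region: 'us-east-1' }); // region is an assumption

// All URLs and ARNs below are placeholders.
const mainQueueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/nami-queue-example';
const dlqUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/nami-dlq-example';
const dlqArn = 'arn:aws:sqs:us-east-1:123456789012:nami-dlq-example';

async function configureDlq() {
  // Divert messages to the DLQ after 5 failed receives.
  await sqs.setQueueAttributes({
    QueueUrl: mainQueueUrl,
    Attributes: {
      RedrivePolicy: JSON.stringify({ deadLetterTargetArn: dlqArn, maxReceiveCount: '5' }),
    },
  }).promise();

  // Keep failed messages for the maximum of 14 days (expressed in seconds).
  await sqs.setQueueAttributes({
    QueueUrl: dlqUrl,
    Attributes: { MessageRetentionPeriod: '1209600' },
  }).promise();
}

configureDlq().catch(console.error);
```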

Storing the Data

Deploying with Docker

Nami deploys a Docker container with MongoDB on an AWS EC2 instance to provide a stable destination for the user’s data. Deploying with Docker, as opposed to installing MongoDB directly, is well suited to the use of an EC2 instance in our framework because it allows the encapsulated container to be deployed on any system. Containerization disassociates the application running inside it from host system configurations. Once messages arrive, the post-queue Lambda functions can connect to the EC2 instance and write the webhook data to the MongoDB database for future retrieval.

Preserving Data during Downtime

One challenge with EC2 instances is that any data stored in memory is lost if the instance stops running or is terminated. To overcome this, Nami’s MongoDB container is pre-configured to store data on a persistent Elastic Block Store (EBS) volume rather than on the ephemeral Docker container or the EC2 instance itself. A specific filesystem mapping is declared during EC2 provisioning to make this happen. This ensures that webhook message data persists even if the container or instance goes offline, or is stopped or terminated.

Implementation Challenges

AWS Optimistic Request Object

A challenge we faced when deploying or destroying multiple AWS components in rapid succession was dependency conflicts: one component relied on another’s existence in order to be deployed, or a component could not be destroyed while another still depended on it. Using async/await syntax alone was not enough to overcome these timing errors, which occurred when attempting to deploy (or destroy) one component before its dependency was resolved.

Errors like this occur because, when creating or deleting components via the AWS SDK, the request returns an AWS Request object for which the registered callback returns an optimistic success response. Despite awaiting each API call, the next asynchronous operation is performed upon receipt of an optimistic response, not when the component is actually prepared to accept the next operation.

In development, this latency was periodically throwing errors, both when deploying dependent resources and when destroying them. For example, after calling the nami destroy <name> command, which terminates the EC2 instance associated with a resource, we received an error when trying to destroy the security group associated with it because the instance had not actually been terminated yet.

To overcome this, we implemented a simple while loop which used the AWS SDK to check the instance’s status code and only return from the function call after the instance is in the appropriate state.
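
A simplified version of that check, with an illustrative instance ID parameter and polling interval:

```javascript
// Sketch: poll the instance's state until termination has actually completed.
const AWS = require('aws-sdk');

const ec2 = new AWS.EC2({ region: 'us-east-1' }); // region is an assumption

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForTermination(instanceId) {
  let state = '';
  while (state !== 'terminated') {
    const data = await ec2.describeInstances({ InstanceIds: [instanceId] }).promise();
    state = data.Reservations[0].Instances[0].State.Name; // e.g. 'shutting-down', then 'terminated'
    if (state !== 'terminated') await sleep(5000); // back off before checking again
  }
}

// Only after waitForTermination() resolves is it safe to delete the associated security group.
```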

Automated Provisioning

The use of a Docker container on an EC2 instance presented a few challenges. How do we programmatically install the Docker container with appropriate port and volume mappings, and how do we handle database collection creation when the EC2 instance is launched?

Nami automatically installs, configures, and runs both Docker and MongoDB by passing in a shell script that runs automatically when the EC2 instance is instantiated (a sketch of how this is wired up follows the list below). This shell script performs the following operations:

  • updates package index with apt-get
  • installs Docker and the Docker CLI
  • downloads and runs the MongoDB Docker image
  • configures the MongoDB container to write to the EBS volume
  • publishes the same port MongoDB uses to the Ubuntu host
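
A sketch of how such a script can be passed as user data when launching the instance; the AMI ID, device name, volume size, and script body are placeholders standing in for Nami’s real values.

```javascript
// Sketch: launch the EC2 instance with a user-data script and a persistent EBS volume attached.
const AWS = require('aws-sdk');

const ec2 = new AWS.EC2({ region: 'us-east-1' }); // region is an assumption

// Placeholder user-data script; Nami's real script installs Docker, runs the MongoDB image,
// maps the data directory onto the EBS volume, and publishes MongoDB's port to the host.
const userDataScript = `#!/bin/bash
apt-get update
# ...install Docker, then run the MongoDB container with volume and port mappings...
`;

ec2.runInstances({
  ImageId: 'ami-0123456789abcdef0', // placeholder Ubuntu AMI ID
  InstanceType: 't2.micro',
  MinCount: 1,
  MaxCount: 1,
  UserData: Buffer.from(userDataScript).toString('base64'), // runs automatically on first boot
  BlockDeviceMappings: [{
    DeviceName: '/dev/sdf', // placeholder device name for the persistent EBS volume
    Ebs: { VolumeSize: 8, DeleteOnTermination: false },
  }],
}).promise()
  .then((data) => console.log('Launched instance', data.Instances[0].InstanceId))
  .catch(console.error);
```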

Another question we faced with running Docker on EC2 was: what happens if an EC2 instance temporarily stops running? Docker containers need to be running in order to be used, and they stop when their host system stops. To ensure that the container is always available as long as the EC2 instance is up, we configure a per-container restart policy, --restart=always, which guarantees that the container comes back up whenever the EC2 instance is running.

One command

We hope you enjoyed reading about the interesting problems we solved while building Nami. One command is all that’s needed to deploy an instance of the framework. Nami is published as an npm package for easy installation. Give it a try!

Read the full case study here: https://nami-serverless.github.io/

About Us

https://nami-serverless.github.io/#about-us

Our team of three software engineers collaborated remotely across North America and Europe to build Nami.

References
