Serverless Stream Consumers — Common Pitfalls and Best Practices

Jose Antonio Moreno
Capital One Tech
Jun 17, 2019

The idea of an always-available AWS Lambda-based stream consumer looks like an anti-pattern at first sight. How (or why) would you want to run a perpetual, high-volume stream consumer in an ephemeral container? From an operational efficiency perspective, the benefit is ease of infrastructure management. With serverless stream consumers, different teams across the same company can leverage, contribute to, and adopt a codebase without needing to manage infrastructure.

Companies like Netflix have already proven the benefits of a Stream Processing as a Service (SpaaS) model, exemplified in projects like their Keystone and Mantis platforms. In this model you want your engineering team focused on delivering solutions that provide business value, not duplicating existing infrastructure across teams within the same organization.

Why AWS Lambda?

The benefits of running stream consumers as AWS Lambda functions are closely tied to the benefits of serverless computing in general. Amazon offers a service called Amazon Kinesis, which allows developers to cost-effectively process streaming data. With just a few lines of code, you can set up a simple, event-driven app that performs actions as new events become available.
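
For example, a minimal Kinesis-triggered handler might look like the following sketch (assuming JSON payloads; the processing step is just a placeholder):

```python
import base64
import json


def handler(event, context):
    # Kinesis delivers a batch of records per invocation; each payload is
    # base64-encoded under record["kinesis"]["data"].
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        print(f"Processing event: {payload}")  # placeholder action
```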

If you expect low-frequency events, there is a significant cost advantage to letting your serverless app be “down” when no events are being processed, as opposed to paying for always-on, server-based infrastructure.

Event-Driven Consumer Architecture

While the combination of Kinesis and Lambda allows for ease of integration inside your VPC, the event-driven paradigm is not always the most efficient when you are trying to consume an enormous amount of data. In order to guarantee sequential processing, each Lambda consumer is assigned to a single Kinesis shard, which means that as you scale you may have to re-shard. Because you are charged on a per-shard basis, splitting shards increases your costs.

An event-driven design can also run into latency issues. In this scenario, each batch of records triggers a Lambda function, which has to read the records, conditionally perform an action on them, and then write them to a new stream. Additionally, you have to make sure data sink connections are handled properly before your function times out. And if events appear in your stream only sporadically, you are left keeping your Lambda functions warm to maintain availability, which adds another layer of infrastructure management overhead.

Constant-Poll Consumer Architecture

For these reasons, I’d like to explore a specific use case that instead leverages Apache Kafka using a constantly-polling group of Lambda functions, each of which is assigned to a single Kafka topic partition. The benefit of assigning a single Lambda function to each topic partition is simplicity of design and performance.

Imagine a case where each Lambda function isn’t explicitly assigned to a single topic partition. When the group of Lambdas is deployed, they don’t all appear at the same time, which means the Kafka broker acting as the group coordinator will attempt to rebalance the consumers every time a new consumer appears. Rebalancing latency largely depends on the size of your cluster and the number of consumers, and it can take a few minutes rather than a few seconds. That length of unavailability is unacceptable, especially when each Lambda function is only available for 15 minutes at a time.
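
One way to realize that one-function-per-partition design is with kafka-python’s assign() call, which pins a consumer to a partition without going through the subscribe()-based group-management protocol, so no rebalance fires when sibling functions start or stop. A minimal sketch, with placeholder broker, topic, partition, and group ID:

```python
from kafka import KafkaConsumer, TopicPartition

# Placeholder values; in practice these would come from the function's
# environment variables or deployment configuration.
BROKERS = ["broker-1:9092"]
TOPIC = "events"
PARTITION = 0  # this particular Lambda owns exactly this one partition

consumer = KafkaConsumer(
    bootstrap_servers=BROKERS,
    group_id="stream-consumer",  # used only for committing offsets
    enable_auto_commit=False,    # offsets are committed explicitly
)
# assign() pins the consumer to a specific partition, so no group rebalance
# is triggered as sibling Lambdas come and go.
consumer.assign([TopicPartition(TOPIC, PARTITION)])
```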

Instead, each Lambda function should act as a dedicated partition consumer, constantly polling for records, applying conditional actions in near real-time, and then sinking that data. The function is always available, invoked on a fixed schedule by a CloudWatch Event. The trade-off in this case is ease of integration versus performance. With that said, bypassing an AWS Kinesis and event-driven AWS Lambda consumer architecture and opting for a more nuanced Kafka and constant-polling Lambda design doesn’t come without difficulty.
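
As a rough sketch of that loop (the sink object and the one-minute shutdown buffer are assumptions), each invocation might poll until it nears its timeout:

```python
def consume(consumer, context, sink):
    # Poll until the invocation approaches its 15-minute limit, leaving a
    # buffer to commit offsets and shut down cleanly.
    while context.get_remaining_time_in_millis() > 60_000:
        batch = consumer.poll(timeout_ms=1000)  # {TopicPartition: [records]}
        for _, records in batch.items():
            for record in records:
                if record.value:              # placeholder conditional action
                    sink.write(record.value)
        consumer.commit()                     # checkpoint what has been sunk
```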

Pitfalls

There are certain limitations of the AWS Lambda service as a whole that require some engineering before Lambda functions can be used as robust, persistent data stream consumers. These are the four most pressing technical issues encountered:

  1. The consumer group must be continuously running in order to capture all data in a stream, yet AWS Lambda limits each function to a 15-minute lifespan.
  2. AWS Lambda functions experience cold starts when first invoked by an Amazon CloudWatch Event, meaning there may be some latency before they start running, especially if they run inside a virtual private cloud.
  3. Amazon CloudWatch Events guarantees at-least-once invocation, meaning that it’s entirely possible for the same Lambda function to be invoked more than once within a short span of time. Since Kafka tracks offsets per consumer group ID, this can be a major issue: when more than one consumer with the same group ID attempts to read messages from the same Kafka partition, offsets are not correctly tracked, which can lead to data loss.
  4. AWS sometimes reuses a recycled container to run subsequent invocations of the same Lambda function. If you don’t close all threads when a function terminates, those threads remain alive and may cause unintended behavior when the container is reused by a later invocation. Additionally, if your invocation schedule keeps a Lambda function always available (back-to-back invocations), consecutive invocations cannot share a container; instead, every other invocation reuses a recycled one, meaning that processes left running on unhandled threads may execute much later than anticipated.

Best Practices

It’s important to note that these “best practices” are general guidelines that are specific to AWS Lambda-based stream consumers acting over a Kafka topic. A performant, high-volume serverless stream consumer will probably be tuned to specific use cases and implementation-specific constraints. Hopefully, these suggestions will help inspire more robust architecture designs and implementation decisions:

  • To get around the most pressing concern, each Lambda function’s 15-minute lifespan, use an external store to track state. Each Lambda function should include a state manager that continuously polls a service like DynamoDB. You can leverage each Lambda container’s context object to get the invocation time, the AWS request ID (a unique identifier), and the name of the function itself. The state manager can poll the external store for state (Start, Stop, etc.), along with the previously mentioned fields, so that the runtime handoff between functions succeeds every 14 or so minutes. In other words, by the time the old Lambda stops executing, a new one has already been invoked and is ready to pick up consumption where the old one left off, almost immediately. (A sketch of this handoff appears after this list.)
  • Amazon CloudWatch Events lets you set up a cron job to periodically trigger Lambda functions. To minimize the latency impact of starting a Lambda function, set the invocation schedule to run every n - 1 minutes, where n is the execution timeout you specify for each function (Lambda currently supports a maximum of 15 minutes). If always-available infrastructure is a requirement, then a Lambda function must always be up and ready to consume records; waiting ~1 minute for a new one to spin up is not acceptable. By leveraging the external state store, you can write logic that allows a seamless handoff between sequential invocations of the same function. Overlapping invocations on a 14-minute schedule (15 minutes is not enough time for the new function to be available when the old one dies) ensures that a consumer is always ready to consume data and minimizes cold-start latency: Lambda A continues to consume data while AWS warms up Lambda B’s container. In this example, Lambda A constantly polls the state table. Once Lambda B is invoked, it immediately overwrites the unique ID and invocation time in the state store, signaling the now-outdated Lambda A to stop execution and spin down. Lambda A posts the offset of the last record it consumed, and Lambda B continues to consume from that point with minimal downtime.
  • As mentioned previously, Amazon CloudWatch Events guarantees at-least-once invocation, which means you have to account for AWS invoking more than one instance of the same function at roughly the same time. Kafka consumers are not thread-safe, so this can be a big problem. This is another reason to leverage an external store as a state manager: with the right logic, the duplicate instance will find that its (supposedly) unique identifier already exists in the table and will immediately spin down before Kafka registers it as a consumer. Disaster averted.
  • This suggestion goes without saying, but make sure to close your consumer’s Kafka connection and any secondary execution threads when the Lambda is ready to spin down and be replaced. Multi-threading issues are not fun to debug, especially when they’re AWS Lambda-specific and you have to redeploy infrastructure every time a code change is needed.
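
Putting these pieces together, here is a rough sketch of what the handoff and duplicate guard described above might look like in Python. It is illustrative only: the DynamoDB table name, broker address, topic and partition, the two-minute duplicate window, and the process() helper are all assumptions rather than part of any particular implementation.

```python
import time

import boto3
from kafka import KafkaConsumer, TopicPartition

TABLE_NAME = "consumer_state"    # hypothetical DynamoDB table, keyed by function name
DUPLICATE_WINDOW_SECONDS = 120   # assumption: at-least-once duplicates arrive within minutes

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)


def process(record):
    # Placeholder for the conditional action applied to each record.
    print(record.value)


# Invoked on a CloudWatch Events schedule, e.g. rate(14 minutes).
def handler(event, context):
    key = {"function_name": context.function_name}
    now = int(time.time())

    # Duplicate-invocation guard: if another instance registered itself only
    # moments ago, this is almost certainly the at-least-once duplicate of the
    # same scheduled event, so exit before Kafka ever sees a second consumer.
    existing = table.get_item(Key=key).get("Item")
    if existing and now - int(existing["invoked_at"]) < DUPLICATE_WINDOW_SECONDS:
        return

    # Register this invocation as the active consumer. The previous, still-running
    # invocation sees the new request ID on its next state poll and spins down.
    table.put_item(Item={
        "function_name": context.function_name,
        "request_id": context.aws_request_id,
        "invoked_at": now,
        "state": "Start",
    })

    consumer = KafkaConsumer(
        bootstrap_servers=["broker-1:9092"],  # placeholder broker
        group_id="stream-consumer",
        enable_auto_commit=False,
    )
    consumer.assign([TopicPartition("events", 0)])  # placeholder topic/partition

    try:
        # Poll until a newer invocation takes over or the 15-minute limit nears.
        while context.get_remaining_time_in_millis() > 60_000:
            owner = table.get_item(Key=key)["Item"]["request_id"]
            if owner != context.aws_request_id:
                break  # handoff: the new invocation now owns this partition

            for _, records in consumer.poll(timeout_ms=1000).items():
                for record in records:
                    process(record)
            consumer.commit()  # the committed offset is the handoff point
    finally:
        # Always release the Kafka connection and any secondary threads so a
        # recycled container cannot resurrect them on a later invocation.
        consumer.close()
```

In practice you would also want a conditional write (DynamoDB’s ConditionExpression) so the takeover is atomic, but the flow above captures the idea.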

Conclusion

Serverless technology, while still relatively early in its development curve, can be used to create powerful and flexible data stream consumers. While not ideal for every use case, serverless stream consumers can help organizations reduce cloud-provider costs and eliminate duplicated effort across engineering teams. As in any emerging field, best practices are constantly evolving. My hope is that these suggestions will help you avoid pitfalls and build more robust systems.

DISCLOSURE STATEMENT: © 2019 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.
