Does AWS Lambda keep its serverless marketing promise of continuous scaling?

OpsGenie’s migration to serverless hit a major bump in the road — we’ve exceeded AWS Lambda’s concurrent execution limit

Sezgin Kucukkaraaslan

Published in A Cloud Guru, Aug 18, 2017 (7 min read)

Does AWS Lambda really deliver on the promises of continuous scaling?

AWS Lambda automatically scales your application by running code in response to each trigger. Your code runs in parallel and processes each trigger individually, scaling precisely with the size of the workload.

Yeah, I know: that’s how you market a product …

OpsGenie has been heavily investing in serverless technologies. We are using AWS Lambda in application development, synthetic monitoring of production systems, cross-region replication of database systems, customization needs of customers, and more.

We recently shared some insight on why OpsGenie is switching to serverless technologies, and our engineers continue to blog about the journey with new lessons from the experience.

Instead of implementing microservices, we shared the rationale for moving OpsGenie to a serverless architecture. We found several advantages with using AWS Lambda instead of building dockerized applications on top of AWS ECS — or deploying them to a PaaS platform like Heroku.

Related post: OpsGenie is on a journey to reap the benefits of serverless architecture (read.acloud.guru)

The journey recently hit a major bump in the road — we’ve exceeded AWS Lambda’s concurrent execution limit.

AWS defines concurrency as the number of executions of your function code that are happening at any given time. The service throttles your functions if a certain concurrent execution count is exceeded.

The throttle differs based on whether your Lambda function is processing events from a stream-based event source versus a source that is not stream-based. The AWS Lambda Developer Guide provides much more detailed information on concurrent execution limits.

For invocations that are not stream-based, AWS uses the following formula to calculate concurrent Lambda execution counts:

events (or requests) per second * function duration

The usage metrics below show when AWS started throttling our requests.

Exhibit 1: Average Lambda Invocation Duration

During this period of time, we had a concurrent execution limit of 1000.

With an average of 200 events per second and an average execution duration of 2.8 seconds, it seemed unlikely that we had reached the limit that throttled our requests.
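To make the arithmetic concrete, here is a minimal sketch of the formula above applied to the numbers in this post. The function and variable names are made up for illustration; only the formula, the 200 events per second, the 2.8 second duration, and the limit of 1000 come from the text.

```python
def estimated_concurrency(events_per_second, avg_duration_seconds):
    """Concurrent executions for invocations that are not stream-based:
    events (or requests) per second * function duration."""
    return events_per_second * avg_duration_seconds


def max_duration_before_limit(events_per_second, concurrency_limit):
    """Average duration (seconds) at which the estimate reaches the limit."""
    return concurrency_limit / events_per_second


CONCURRENCY_LIMIT = 1000  # our account limit during the incident

load = estimated_concurrency(200, 2.8)  # roughly 560 concurrent executions
headroom = CONCURRENCY_LIMIT - load     # roughly 440 executions to spare
ceiling = max_duration_before_limit(200, CONCURRENCY_LIMIT)  # 5 seconds

print(f"estimated concurrency: {load:.0f}")
print(f"headroom: {headroom:.0f}")
print(f"average duration that would hit the limit: {ceiling:.1f}s")
```

By this estimate the average load sits at a bit more than half of the limit, and the average duration would have to approach 5 seconds at the same event rate before the estimate alone reaches 1000.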

I’m not here to argue about whether the calculation by AWS to throttle our requests was right or wrong. I’m here to rant — I mean, discuss — about some of the common issues that are challenging our journey to serverless.

This is what we discovered …

When our engineering team used CloudWatch metrics to look deeper, we realized that an operational task that sends AWS ALB logs to our Graylog server had started to suffer from latency, which in turn caused AWS Lambda to throttle our functions.

Latencies. Yeah — it seems so obvious … now!

Back-Pressure

Here’s the thing: any function can suffer latency because of network or performance problems in third-party systems. Any public API endpoint can receive a massive load, or an application function can experience DynamoDB throttles and retry its database calls a couple of times.

Eventually, any back-pressure happening in any part of your infrastructure can make your whole system go down.
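A rough sketch of why one slow function hurts everything else: the limit is shared across the account, so the estimated concurrency of every function adds up. The function names, event rates, and durations below are hypothetical and only illustrate the effect; they are not our production numbers.

```python
ACCOUNT_LIMIT = 1000  # account-wide concurrent execution limit


def account_concurrency(functions):
    """Sum events-per-second * duration over all functions in the account."""
    return sum(rate * duration for rate, duration in functions.values())


# name -> (events per second, average duration in seconds); all hypothetical
normal = {
    "api-handler": (200, 2.0),  # 400 concurrent executions
    "log-shipper": (50, 1.0),   # 50 concurrent executions
}

# Same traffic, but the log shipper now waits 15 seconds on a slow downstream.
degraded = {**normal, "log-shipper": (50, 15.0)}  # 750 concurrent executions

for label, functions in (("normal", normal), ("degraded", degraded)):
    total = account_concurrency(functions)
    status = "THROTTLED" if total > ACCOUNT_LIMIT else "ok"
    print(f"{label}: {total:.0f} of {ACCOUNT_LIMIT} ({status})")
```

In this model the traffic never changes. The only thing that moves is the latency of one background task, and that is enough to push the whole account over the limit and throttle the unrelated API function too.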

Many everyday situations can cause back-pressure, but AWS Lambda’s concurrent execution limit model makes it almost impossible to deal with it in a quick and convenient manner. As a workaround, you have to buffer loads in a queue or stream — and then call your Lambda functions with the buffered events.

At first, this issue seemed to almost destroy the value of all event-based Lambda integrations with services like API Gateway, S3, and SNS. To keep moving forward, our engineering team had to figure out what could be done as a short-term workaround.

Our immediate response was obvious — request a limit increase. It turns out that you don’t receive limit increases immediately — you have to wait a week or so!

For next steps, our team focused on buffering the load. There are two basic options for buffering loads in the AWS serverless world: Kinesis and SQS.

The cost of using Kinesis to buffer loads
Kinesis is Amazon’s fully managed and scalable streaming data solution. You can use it to capture and send any kind of streaming data such as application logs or metrics, website clicks, sensor data, player-game interaction — and you consume and analyze the data as it arrives.

The Kinesis scalability model is built on shards, the base throughput unit of a Kinesis stream. For writes, one shard supports up to 1,000 records per second, up to a maximum of 1 MB of data per second. If you need more throughput, you can dynamically add or remove shards via the Kinesis API.

For stream-based sources like Kinesis, the number of shards per stream is the unit of concurrency. If your stream has 10 active shards, there will be at most 10 Lambda function invocations running concurrently.

This approach protects you against concurrent execution limit threats — but it comes with a cost. In order to increase the throughput, you’ll need to increase the number of shards. If your logic is not appropriate for batch processing, a very large number of shards is required.
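As a back-of-the-envelope sketch of that trade-off, assume each shard runs one invocation at a time (as described above) and each invocation handles one batch. The function name, the batch sizes, and the 5 second batch duration are assumptions for illustration, not measurements.

```python
import math

SHARD_WRITE_LIMIT = 1000  # records per second one shard accepts


def shards_needed(events_per_second, batch_size, seconds_per_batch):
    """Shards required so consumers keep up with the incoming event rate.

    One shard drives one invocation at a time, so a shard drains
    batch_size / seconds_per_batch events per second.
    """
    # round() guards against floating point noise before taking the ceiling
    for_processing = math.ceil(
        round(events_per_second * seconds_per_batch / batch_size, 9)
    )
    for_ingest = math.ceil(events_per_second / SHARD_WRITE_LIMIT)
    return max(for_processing, for_ingest, 1)


# Logic that cannot batch: one event per invocation, 2.8 seconds each.
print(shards_needed(200, batch_size=1, seconds_per_batch=2.8))

# Batch-friendly logic: 100 events per invocation, 5 seconds per batch.
print(shards_needed(200, batch_size=100, seconds_per_batch=5.0))
```

Under these assumptions, the non-batching case needs as many shards as it would otherwise need concurrent executions, while the batching case gets away with a handful.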

The one thing about AWS Kinesis service I hate most is its pricing model. AWS forces you to pay $0.015 per shard per hour — even if you don’t send or consume any data!

And unlike most other streaming solutions, Kinesis does not support topic subscriptions. You’ll have to create a Kinesis stream for every Lambda function that you create! In a common local development and test environment model with 50 engineers, creating that many streams could be catastrophic.
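A quick sketch of what that pricing means for idle development environments. The $0.015 per shard per hour is the figure quoted above; the 20 functions per engineer and the 720 hour month are made-up assumptions, as is the function name.

```python
PRICE_PER_SHARD_HOUR = 0.015  # USD, the price quoted in this post


def idle_stream_cost(streams, shards_per_stream, hours):
    """Cost of keeping streams provisioned, even with zero traffic."""
    return round(streams * shards_per_stream * hours * PRICE_PER_SHARD_HOUR, 2)


engineers = 50
functions_per_engineer = 20  # hypothetical: one stream per function
streams = engineers * functions_per_engineer

monthly = idle_stream_cost(streams, shards_per_stream=1, hours=720)
print(f"{streams} single-shard streams: ${monthly:,.2f} per month")
```

Even with a single shard per stream, the bill in this scenario lands in the five-figure range per month before a single record is sent.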

The cost of AWS Kinesis might immediately disqualify it as a viable option for many organizations.

Using Simple Queue Service is still a (better) workaround
SQS is Amazon’s message queue service. Similar to Kinesis, it is fully managed and highly scalable. You can use it as messaging middleware to decouple your services and applications.

SQS is a very simple, lightweight, and cost-effective solution — which I happen to like very much (in case you hadn’t already guessed!).

The problem with SQS, however, is that AWS does not offer a native integration with Lambda. You have to provide your own solution for polling queues and submitting messages to relevant functions — then you must manually delete the messages you processed.
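The poll, invoke, and delete steps could look roughly like the sketch below. This is not our production poller: the function name is hypothetical, the clients are passed in (in practice they would come from `boto3.client("sqs")` and `boto3.client("lambda")`), message bodies are assumed to be JSON, and retries and throttling errors are left out.

```python
def drain_queue_once(sqs, lambda_client, queue_url, function_name):
    """Poll SQS once, invoke the function per message, delete on success.

    Returns the number of messages that were processed and deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling to avoid empty responses
    )
    processed = 0
    for message in response.get("Messages", []):
        result = lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="RequestResponse",  # wait for the function result
            Payload=message["Body"].encode("utf-8"),
        )
        if result.get("FunctionError"):
            # Leave the message in the queue; it becomes visible again
            # after the visibility timeout and can be retried.
            continue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed += 1
    return processed
```

Because the poller decides how many messages it hands over at a time, it caps the concurrency it generates. The price is that the poller itself has to run somewhere and be monitored like any other component.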

A key issue with these workarounds is that they unnecessarily increase the complexity of the infrastructure you have to manage. So, it made me wonder … what kind of services should AWS provide so development teams and organizations can design and deliver better solutions?

AWS — make it so
At a time when microservices and service-oriented architectures are gaining popularity, and when building decoupled systems by applying the separation of concerns pattern is inevitable, an account-based concurrency limit is unacceptable!

If AWS plans to keep its serverless marketing promise of continuous scaling — developers need a much better model for consuming Lambda functions.

Developers need fine-grained control over a single function or a group of functions.
Developers need the ability to programmatically increase concurrency limits for functions — with a single API call.
Developers need a CloudWatch metric that reports the container count of functions, so that it can trigger actions and immediate limit increase requests.

AWS already uses this approach for their DynamoDB service. A developer can create tables with different read and write capacities, create threshold metrics at CloudWatch, and get notified about load spikes in order to immediately provision capacity according to the load.
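For comparison, here is a sketch of what that DynamoDB setup could look like. The helper name, the alarm name, and the 80% threshold are assumptions; the returned dictionary is shaped as keyword arguments for the CloudWatch `put_metric_alarm` call in boto3, and the capacity itself would then be raised with the DynamoDB `update_table` call.

```python
def read_capacity_alarm(table_name, provisioned_rcu, sns_topic_arn,
                        ratio=0.8, period_seconds=60):
    """Build alarm settings that fire when consumed reads near the
    provisioned read capacity of one table."""
    return {
        "AlarmName": f"{table_name}-read-capacity-high",  # hypothetical name
        "Namespace": "AWS/DynamoDB",
        "MetricName": "ConsumedReadCapacityUnits",
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        # The metric is summed over the period, so scale the per-second
        # provisioned capacity by the period length.
        "Threshold": provisioned_rcu * period_seconds * ratio,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # notify so capacity can be raised
    }


alarm = read_capacity_alarm(
    "alerts",  # hypothetical table name
    provisioned_rcu=100,
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:capacity-alerts",
)
print(alarm["AlarmName"], alarm["Threshold"])
```

The point is the granularity: the capacity, the metric, and the alarm all belong to one table. That is the kind of per-function control we are asking for in Lambda.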

The engineering team at OpsGenie has already made these feature requests to AWS — and we don’t expect they’ll be implemented exactly as we propose. We just hope that AWS will deliver a solution that addresses our concerns — and quickly.

Even after that rant — I mean discussion — I truly believe that serverless is the future of computing and application development. OpsGenie is enjoying the journey and our overall experience (so far).

Our team is well aware that one of the costs of being an early adopter includes dealing with immaturity of services — like the lack of ideal tools and frameworks for local development, testing, and deployment.

I fully expect that we’ll face many more issues as we continue to evolve our serverless solution alongside AWS. I’m confident we’ll meet those challenges, continue to learn, and share our insights with the engineering community.

OpsGenie is really enjoying being a part of the serverless community, learning from others, and contributing to its growth and maturity. We hope that sharing our insights will help others with their journey.

Looking forward to hearing about your experiences in the comments below!

Written by Sezgin Kucukkaraaslan