AWS Lambdas Can Have Shared State

Step Functions allow lambdas to share state without involving DynamoDB or any other data storage service

AWS Lambda functions are widely regarded as stateless: whatever data they want to persist between invocations has to be stored somewhere other than the lambda's memory space. That constraint is fundamental, and it limits the applicability of lambda functions in domains where constant round-trips to data storage make the overall approach too slow and too expensive.

The stateless limitation is fodder for commentary such as Jonas Bonér's argument that Serverless Needs a Bolder, Stateful Vision, in which he makes the case for programming models beyond plain Functions as a Service (FaaS), precisely because FaaS functions are stateless.

Just like Jonas and everyone else, I've taken the stateless limitation for granted ever since I got started with AWS Lambda. It probably didn't feel like a big deal to me given my background with Ruby on Rails. Each request to a Rails app is stateless; with its horizontally scaling architecture, Rails simply has no reliable provision for storing data in process memory. You need stateful data? You put it in the database or a cache of some sort. I kind of sorta remember it being annoying back in 2005 when I was coming from a J2EE background, but then again, the J2EE world had the annoyance of sticky load balancers.

Shudder.

In a sense, Lambda has a programming model similar to Rails where statelessness is concerned. Rails controller actions usually handle requests/events by fetching data from a relational database (or cache); in the AWS serverless world, the preferred data store is DynamoDB.

Now listen, I'm not thrilled with having to use DynamoDB instead of a relational database. It's painful, but at the time I'm writing this blog post, there is no such thing as a completely elastic serverless relational database. (Aurora Serverless promises to be just that, but it's too bleeding edge for production use, in my opinion.)

So, getting back to the topic at hand, let's talk about state management for lambda functions. Given DynamoDB's steep learning curve and inherent limitations, trying to use it for all state management in your serverless microservices application is a recipe for unhappiness. You especially don't want to use it to keep all the transient state that enables collaboration between services with unavoidable domain coupling. If I'm going to the trouble of creating DynamoDB tables, it had better be for data that comprises significant domain entities in my system.

Besides that, I want my Lambda functions to be small units of code that do one thing, and do it well. I don’t want my need for shared state to lead me into merging multiple units of code into a single Lambda function invocation.

Step Functions to the Rescue

In case you haven't heard of it yet, the AWS Step Functions service lets you compose lambdas into serverless workflows. Workflows are made up of a series of steps, with the output of one step acting as input to the next. In other words, they let you build state machines. Despite limitations on the actual contents of that state, Step Functions workflows provide a measure of transient statefulness to lambdas that need it to operate efficiently.
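To make that concrete, here is a minimal sketch of a workflow definition in Amazon States Language, written as a TypeScript object. The two states and their lambda ARNs are made up for illustration; the thing to notice is that StepOne's return value automatically becomes StepTwo's input.

```typescript
// A minimal sketch of an Amazon States Language definition, expressed as a
// TypeScript object. The two Task states and their lambda ARNs are
// hypothetical; the point is that StepOne's return value becomes StepTwo's
// input automatically.
const definition = {
  Comment: "Two lambdas chained together; output of one is input to the next",
  StartAt: "StepOne",
  States: {
    StepOne: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:StepOne",
      Next: "StepTwo", // StepOne's returned value is passed to StepTwo as-is
    },
    StepTwo: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:StepTwo",
      End: true,
    },
  },
};
```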

For example, here's a Step Functions workflow from my latest startup, Demos Guru. This little state machine is part of the Submissions microservice, and it takes care of charging a customer for a demo submission. Doing so involves the collaboration of three lambda functions: DebitArtistAccount, FundingTimedOut, and FundingSuccess. (The other boxes represent states or decision points created by configuration as part of the workflow definition.)

You get these neat workflow diagrams for free in the AWS Step Functions Console.
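The post doesn't include the actual workflow definition, but a hypothetical reconstruction along these lines conveys the shape of it. The state names match the three lambdas; the $.debit_succeeded field and the ARNs are assumptions:

```typescript
// Hypothetical reconstruction of the charging workflow described above; the
// real definition isn't shown here. State names match the three lambdas,
// while the $.debit_succeeded field and the ARNs are assumptions.
const chargeSubmissionDefinition = {
  StartAt: "DebitArtistAccount",
  States: {
    DebitArtistAccount: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:DebitArtistAccount",
      Next: "DidFundingSucceed",
    },
    DidFundingSucceed: {
      // A Choice state is one of those "decision point" boxes in the diagram
      Type: "Choice",
      Choices: [
        {
          Variable: "$.debit_succeeded",
          BooleanEquals: true,
          Next: "FundingSuccess",
        },
      ],
      Default: "FundingTimedOut",
    },
    FundingSuccess: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:FundingSuccess",
      End: true,
    },
    FundingTimedOut: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:FundingTimedOut",
      End: true,
    },
  },
};
```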

DebitArtistAccount does what its name implies. It does so via a synchronous call to the Cashier microservice, which in our app encapsulates the state of the credits ledger. The Cashier service responds with success or failure, and that outcome is used to populate an event message passed along the workflow.

An actual output event from the DebitArtistAccount lambda
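Here is a hypothetical sketch of what such a handler might look like. The Cashier endpoint and the attribute names are assumptions, but the pattern of mutating the incoming event and returning it is the real mechanism:

```typescript
// A hypothetical sketch of DebitArtistAccount's handler. The Cashier URL and
// the attribute names on the event are assumptions; the pattern to notice is
// that the handler mutates the incoming event and simply returns it, and
// Step Functions carries that state to the next step.
export const handler = async (event: Record<string, unknown>) => {
  // Synchronous (request/response) call to the Cashier microservice
  const response = await fetch("https://cashier.example.com/debit", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ artistId: event.artist_id, amount: event.amount }),
  });

  // Record the outcome on the event itself; the workflow's Choice state
  // can branch on this attribute downstream
  event.debit_succeeded = response.ok;
  event.debited_at = new Date().toISOString();

  return event; // returning the event passes it to the next state
};
```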

I could describe the rest of the workflow, but it's a little beside the point for this particular article.

Incidentally, one of the things I love in practice about programming these things is that you get an event object passed into the lambda's handler function, and you can output it to whoever comes next simply by returning it. No callbacks required, which really cuts down on lines of code per function.

See those attributes that I set on the event object? That's stateful data that I don't have to manage myself; the platform takes care of it.

Recap: the Step Functions runtime abstracts away the storage and communication of stateful event data between collaborating lambdas, as long as you can model those collaborations as a state machine.

As briefly mentioned above, Step Functions also provides ready-made steps for your workflows. They can pass data to other steps, handle exceptions, add timeouts, make decisions, execute multiple paths in parallel, and even call out to other AWS services, but that's a subject for another blog post.
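Just as a taste, here is an illustrative Task state fragment (not taken from the Demos Guru workflow) showing how a timeout, automatic retries, and an exception handler are all a matter of configuration:

```typescript
// An illustrative fragment showing the kinds of ready-made behavior a Task
// state can be configured with: a timeout, automatic retries with backoff,
// and an exception handler that routes to another state. The ARN and state
// names are hypothetical.
const taskWithErrorHandling = {
  Type: "Task",
  Resource: "arn:aws:lambda:us-east-1:123456789012:function:SomeStep",
  TimeoutSeconds: 30, // fail the state if the lambda runs longer than this
  Retry: [
    {
      ErrorEquals: ["States.TaskFailed"],
      IntervalSeconds: 2,
      MaxAttempts: 3,
      BackoffRate: 2.0,
    },
  ],
  Catch: [
    {
      ErrorEquals: ["States.ALL"],
      Next: "HandleFailure", // a hypothetical failure-handling state
    },
  ],
  Next: "NextStep",
};
```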

Limitations of the Approach

The size of the event payload cannot exceed 32 kilobytes. Let me tell you, that is not much data! I have already run smack into that limit in anger a couple of times with systems I've built. If the data you're passing between states might exceed the limit, the recommended approach is to store the data in S3 and pass ARNs instead of the raw data. However, that negates the point of this blog post, because now you're managing state in S3.
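The workaround looks something like this sketch; the bucket name, key scheme, and event shape are assumptions:

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// A sketch of the S3 workaround: park the oversized payload in S3 and pass
// a pointer to it between states. The bucket name and key scheme are
// assumptions.
const s3 = new S3Client({});

export const handler = async (event: { execution_id: string; big_payload: object }) => {
  const key = `step-payloads/${event.execution_id}.json`;

  await s3.send(
    new PutObjectCommand({
      Bucket: "my-workflow-payloads", // hypothetical bucket
      Key: key,
      Body: JSON.stringify(event.big_payload),
    })
  );

  // Pass only the pointer along the workflow, staying under the size limit
  return { execution_id: event.execution_id, payload_s3_key: key };
};
```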

Step Functions also has a hard limit of 25,000 entries in the execution history. To avoid reaching this limit in long-running executions, you have to implement a design that splits ongoing work across multiple workflow executions. This, too, would negate the point of this blog post, by forcing you to keep transient execution state in some sort of external data storage.
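One common shape for that design is to have a late step in the machine start a fresh execution of the same state machine, carrying the remaining work in the new execution's input. A sketch, with the state machine ARN and event shape assumed:

```typescript
import { SFNClient, StartExecutionCommand } from "@aws-sdk/client-sfn";

// A sketch of the split-across-executions workaround: near the history
// limit, a final step hands the remaining work to a brand-new execution of
// the same state machine. The ARN and event shape are assumptions.
const sfn = new SFNClient({});

export const handler = async (event: Record<string, unknown>) => {
  await sfn.send(
    new StartExecutionCommand({
      stateMachineArn:
        "arn:aws:states:us-east-1:123456789012:stateMachine:LongRunningWork",
      input: JSON.stringify(event), // carry the transient state forward
    })
  );
  return event;
};
```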

Conclusion

With well-defined inputs and outputs, functions can be composed into larger workflows that share state without the need for external data storage. AWS Step Functions offers exactly that functionality, packaged as state machines.

As Jonas Bonér said in the above-mentioned article:

The output of one function can become the input of the other, making it possible to string them together just like Lego blocks, composing larger pieces of functionality from small reusable parts. Individual functions are by themselves not that useful because they (should) have a limited scope, providing a single well-defined piece of functionality. But by composing them into larger functions you can build rich and powerful components or services.

That’s exactly the kind of functionality that Step Functions gives you, except that it’s not really marketed that way. Maybe that will change over time?


To learn more about how to build systems using serverless microservices on AWS, you should read my book.
https://leanpub.com/serverless