AWS Lambda lifecycle and in-memory caching

The ephemeral nature of AWS Lambda functions might lead you to believe that techniques like in-memory caching can’t be used the way they would be on a conventional server, but that’s not the case! Let’s look at why.

To understand why, you first need to understand how the Lambda function lifecycle works. Suppose you have a function named “query”. The first time it’s invoked, AWS fetches your zipped source code from S3, launches a container in the cluster to represent your function, and passes the event to your handler, finally returning the response as shown below. Simple, right?

After the invocation, Lambda will “freeze” your container, preventing it from doing “async” or “background” work. A subsequent request will “thaw” the container and pass the new event to your function handler. The container remains in the cluster, ready to “thaw” and serve requests, as long as it isn’t idle for too long, after which it may be discarded entirely. AWS does not specify how long that is.
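The freeze and thaw behaviour can be observed from inside a function. The following is a minimal Python sketch, not an official AWS example: the names CONTAINER_STARTED_AT and invocation_count are made up for illustration. It relies only on the fact that module-level code runs once when the container starts, while the handler runs on every invocation.

```python
import time

# Module-level code runs once per container, during the cold start.
CONTAINER_STARTED_AT = time.time()
invocation_count = 0  # kept in memory while the container stays warm


def handler(event, context):
    """Entry point called for every event routed to this container."""
    global invocation_count
    invocation_count += 1
    return {
        "cold_start": invocation_count == 1,
        "invocations_in_this_container": invocation_count,
        "container_age_seconds": time.time() - CONTAINER_STARTED_AT,
    }
```

Calling the function repeatedly should report a cold start once and then a growing count, until the container is discarded and a fresh one starts again from zero.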

So far we’ve only looked at successive calls to a function. What about concurrent calls? This is where AWS Lambda gets a lot of its power. A single Lambda function container can only serve one invocation at a time, so concurrent requests cause AWS to fetch your code and launch additional containers to respond. These containers come and go as necessary to serve your traffic.
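To make the one-invocation-per-container rule concrete, here is a toy Python model. It is a simplification under stated assumptions: containers_needed is a hypothetical helper, containers are never discarded for being idle, and AWS's real placement logic is not public.

```python
def containers_needed(invocations):
    """Count containers launched for a list of (start, duration) invocations."""
    busy_until = []  # one entry per container: the time it becomes free
    for start, duration in sorted(invocations):
        for i, free_at in enumerate(busy_until):
            if free_at <= start:
                # A warm container is idle, so it is thawed and reused.
                busy_until[i] = start + duration
                break
        else:
            # Every container is busy, so a new one has to be launched.
            busy_until.append(start + duration)
    return len(busy_until)


sequential = containers_needed([(0, 1), (1, 1), (2, 1)])
overlapping = containers_needed([(0, 5), (1, 5), (2, 5)])
```

In this model back-to-back calls share a single container, while overlapping calls each force a new one.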

With those basics out of the way, back to the topic of caching. Conceptually you should consider each invocation to be completely isolated, but in reality this is simply not the case. It might be easy to write container reuse off as an implementation detail, but I can’t see it changing any time soon.

Because the container sits around waiting for subsequent requests, and the memory allocated to it does not magically disappear each time, we can store data for future requests. The diagram below shows how you may typically access an external resource such as S3. With a caching mechanism such as an LRU cache you can keep “hot” data in memory for immediate access, so requests to S3 that take 1–2s can quickly become milliseconds.

Note that you are of course limited by memory: the largest Lambda function at the time of writing allows for 1536 MiB, though I think that is plenty for most use-cases.

One thing to consider is that as your concurrency grows the hit rate will decrease, because each container has its own cache and the data you want may not exist in the particular container serving the request. You will have to take this into account for your use-case.
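A rough way to get a feel for this effect is a small simulation. The Python sketch below makes several assumptions that are not AWS behaviour: requests are routed to containers uniformly at random, keys are requested uniformly, caches are unbounded, and containers never expire. The function simulate_hit_rate and its parameters are illustrative only.

```python
import random


def simulate_hit_rate(containers, requests=10_000, distinct_keys=100, seed=1):
    """Route requests to random containers, each with its own private cache."""
    rng = random.Random(seed)
    caches = [set() for _ in range(containers)]
    hits = 0
    for _ in range(requests):
        cache = rng.choice(caches)  # the container that serves this request
        key = rng.randrange(distinct_keys)
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # a miss warms only this container's cache
    return hits / requests


for n in (1, 10, 50):
    print(n, "containers:", round(simulate_hit_rate(n), 3))
```

With the same traffic spread over more containers, every key has to be fetched once per container before it starts producing hits, so the overall hit rate drops.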

If the additional cost and latency are not a problem, a centralized cache such as Redis (via ElastiCache, for example) may be a better fit. In either case you might as well take advantage of the memory you already have available!
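The two approaches also combine naturally: check the container's own memory first, then the shared cache, then the origin. The Python sketch below illustrates that layering under assumptions. FakeSharedCache is a hypothetical in-process stand-in exposing only get and set, so the example runs without a Redis server, and make_lookup is an illustrative helper, not part of any library.

```python
class FakeSharedCache:
    """Hypothetical stand-in for a centralized cache client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None on a miss

    def set(self, key, value):
        self._data[key] = value


def make_lookup(local, shared, fetch_origin):
    """Build a lookup: container memory, then shared cache, then origin."""

    def lookup(key):
        if key in local:
            return local[key], "local"
        value = shared.get(key)
        source = "shared"
        if value is None:
            value = fetch_origin(key)  # slowest path, e.g. a read from S3
            shared.set(key, value)
            source = "origin"
        local[key] = value  # warm this container for the next request
        return value, source

    return lookup


# Two simulated containers with private memory and one shared cache.
shared = FakeSharedCache()
container_a = make_lookup({}, shared, lambda key: f"contents of {key}")
container_b = make_lookup({}, shared, lambda key: f"contents of {key}")
```

Here a key first requested through container_a comes from the origin, the same key requested through container_b comes from the shared cache, and any repeat within a container is served from local memory.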


This is relatively trivial but I hope it helps some people realize that Lambda is a bit more flexible than you may have previously thought.

