Pick the right memory size for your AWS Lambda functions
Getting it right can improve speed and decrease costs. But it’s not as straightforward as it seems.
The only performance parameter AWS exposes is the memory size. But what is the ideal size for your real-world function? To illustrate why this matters, we will look at a serverless API function that is part of ShortrLink and compare the effect of memory size on performance and costs.
AWS couples memory size and CPU power
Lambda only lets us directly configure the memory size. However, this metric is directly connected to the CPU power that you are getting. In short, the power you get scales linearly until ~1.7 GB of memory is reached. At this point, Lambda provisions 1 full vCPU core. After that, every additional MB still means more power — but on additional CPU cores. In theory, only multi-processing functions benefit from more than 1.7GB of memory!
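As a rough rule of thumb, the CPU share for a given memory setting can be estimated like this (a sketch based on the ~1.7GB-per-vCPU scaling described above; AWS documents the exact cutoff as 1,769MB per vCPU):

```python
def estimated_vcpus(memory_mb: int) -> float:
    """Estimate the vCPU share Lambda allocates for a given memory size.

    CPU power scales linearly with memory; roughly 1,769 MB buys one
    full vCPU core, and anything beyond that lands on additional cores.
    """
    return memory_mb / 1769
```

So a 128MB function runs on roughly 7% of a vCPU, which explains the dramatic speed-ups we will see in the lower memory regions.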
Now, this sounds pretty straightforward, but what does that mean in real-world terms? Most Lambda functions in the wild are not just calculating Fibonacci sequences by themselves but instead are connected with other services and constrained by the response time of these outside APIs. Have you ever deployed a function for a GET API that does nothing more than fetch an item from a database and perform some basic data transformations? If so, you’ve probably wondered whether you really need more CPU power.
Experimental design: A basic serverless GET API
To test for the point of diminishing returns, I am using a real-world function that:
- Receives an HTTP event from API Gateway
- Gets an item from DynamoDB (using PynamoDB)
- Publishes an event to EventBridge (using Boto3)
- Returns an HTTP response
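A skeleton of such a handler might look like this (a sketch, not ShortrLink’s actual code: the PynamoDB and Boto3 calls are replaced by stubs, and all names are illustrative):

```python
import json

def get_link(short_id):
    # Stub for the DynamoDB read, e.g. LinkModel.get(short_id) via PynamoDB.
    return {"short_id": short_id, "target": "https://example.com"}

def publish_click_event(item):
    # Stub for the EventBridge publish, e.g. a Boto3 events.put_events(...) call.
    pass

def handler(event, context=None):
    short_id = event["pathParameters"]["id"]   # 1. HTTP event from API Gateway
    item = get_link(short_id)                  # 2. item from DynamoDB
    publish_click_event(item)                  # 3. event to EventBridge
    return {"statusCode": 200,                 # 4. HTTP response
            "body": json.dumps(item)}
```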
As a result, we depend on two other AWS services within the same region of our Lambda function. We do not (directly) perform any calculations but have multiple transformation steps during the runtime. The Lambda configurations I am testing for are 128MB, 256MB, 512MB, 768MB, 1024MB, 1280MB, 1536MB, 1792MB, and 2048MB.
The Lambda function, written in Python, gets called 500 times from my local machine via the API Gateway HTTP endpoint. All requests are made in sequence so that each execution reuses the same runtime. That way, we are still testing for pure execution duration by limiting the number of cold starts we have to exclude from our analysis.
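The measurement loop itself is simple; here is a minimal sketch (the request function and the sample count are stand-ins for the actual API Gateway call):

```python
import statistics
import time

def benchmark(request_fn, n=500):
    """Call request_fn n times in sequence and return (p50, p90) latency in ms."""
    durations = []
    for _ in range(n):  # sequential calls keep reusing the same warm runtime
        start = time.perf_counter()
        request_fn()
        durations.append((time.perf_counter() - start) * 1000)
    # quantiles(n=10) yields nine cut points; index 8 is the 90th percentile
    return statistics.median(durations), statistics.quantiles(durations, n=10)[8]
```

With `request_fn` set to something like `lambda: urllib.request.urlopen(endpoint).read()`, this yields the end-to-end latency per memory configuration.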
Of course, in real-world applications, cold-starts are a relevant factor. Especially for a smaller project, you cannot expect every API endpoint to be called at least once every 5 minutes throughout the day. For that reason, there is a second round of executions that calls each configuration 20 times while forcing a cold start each time.
The main source of information is AWS CloudWatch Logs. In addition, I am also measuring the time from making the GET request to receiving an answer, thus capturing the responsiveness of the API from an end-user perspective. As a caveat, all resources are in AWS’s eu-west-1 region, while the machine making the API requests sits in South-East Asia. Therefore, when looking at the numbers, keep the general network latency in mind.
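The duration numbers come from the REPORT line Lambda writes to CloudWatch at the end of each invocation; a small parser for it might look like this (the pattern follows the standard REPORT log format):

```python
import re

# Matches the standard Lambda report line, e.g.:
# REPORT RequestId: ... Duration: 27.41 ms Billed Duration: 28 ms Memory Size: 768 MB ...
REPORT_RE = re.compile(
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB"
)

def parse_report(line):
    """Extract duration, billed duration, and memory size from a REPORT line."""
    m = REPORT_RE.search(line)
    return {k: float(v) for k, v in m.groupdict().items()} if m else None
```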
Results: 768MB optimizes for speed, 512MB for costs, 1024+MB when expecting many cold starts
With the experimental design out of the way, let’s have a look at the numbers:
In the lower regions, the effect of increasing the memory size (and therefore CPU) is extreme. While the median execution time with 128MB memory is a proud 226ms, 256MB brings this down by 60% to 91ms. However, there is also a clear elbow around the 512MB mark: doubling the memory from 512MB to 1GB only improves our performance by 30% to 27ms. In fact, the execution time stays the same at the p50 and p90 quantiles after reaching 768MB.
This also has implications for the cost of our Lambda functions:
While 768MB gives us the optimum in performance, we can see a clear cost minimum at 512MB. Here you will have to decide whether you are willing to pay more for faster response times. At the same time, we also observe that provisioning too few resources can drive up the costs.
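For reference, the cost per invocation follows directly from the billed duration and the GB-second price (a sketch using AWS’s public x86 on-demand prices, which you should double-check against the current pricing page; the durations below are approximate medians from the experiment):

```python
GB_SECOND_PRICE = 0.0000166667    # USD per GB-second (x86, on-demand)
REQUEST_PRICE = 0.20 / 1_000_000  # USD per request

def invocation_cost(memory_mb, billed_ms):
    """Cost of a single invocation in USD: compute charge plus request charge."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * GB_SECOND_PRICE + REQUEST_PRICE
```

At these price points, 512MB at ~39ms stays cheaper per call than 768MB at 27ms, while 128MB at 226ms ends up more expensive than both despite being the smallest configuration.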
The cold start problem
So far, we have only considered the best-case scenario: always-warm functions with all modules pre-loaded for Lambda to use. But what about the dreaded cold starts? Here, things look a little different:
Unlike our warmed-up functions, cold start durations continuously decrease with added memory/CPU cores. But even here, we see diminishing returns. Going from 1GB to 2GB only results in a 40% speed increase. Consequently, adding more resources is not always cost-efficient:
Again, we observe a local optimum, but this time it’s at 1024MB, twice as much as before. Therefore, analyzing your specific use case and estimating execution frequencies becomes a necessary part of your design process.
Don’t forget about the real-world experience
Lastly, what does this increase mean for our end-user? Given 9,500km between server and client, as you can imagine, not that much:
What looked like a big difference when observing only the Lambda durations becomes barely visible in our chart. Going from 128 to 256MB still decreases latency by ~15%, while going from 512 to 768MB only shaves off 6%.
Of course, this is an extreme example. However, it illustrates that while it’s easy to get obsessed with optimizing the performance of our core function, external factors can have an equal or even much higher effect on the experience of our end-users.
Conclusion: Know your use case
In the end, we are left with three options: If you know that your function will rarely go cold and you purely want to optimize execution duration, 768MB is our winner. However, if cost is your foremost concern, reducing the memory size to 512MB will cut costs by 7.4%, but also increase your execution times by a steep 36%.
In the case of rarely used functions, more is better. Sure, you could hit the cost optimum by choosing 1GB, but even at 3GB and an average of one execution per hour, your monthly costs will be below 1ct. Therefore, the recommendation would be to pick at least 3GB, potentially more (after testing for further scaling, of course).
The dependency on the latency of third-party services means that the numbers for your function will look different. Every time your function waits for the response of another service, it sits idle (unless threaded). In the end, you will have to test for yourself what your individual sweet spot looks like.