Choosing the right amount of memory for your AWS Lambda Function

With 46 values to choose from, what is the optimal memory configuration for your AWS Lambda Function?

With AWS Lambda there aren’t many options needed for your functions to run. Only two parameters affect runtime behavior: timeout and memory.

Timeout is a value between 1 second and 15 minutes. In my opinion, it makes no sense to set the timeout to less than the maximum value. If your code executes in less time, you get charged less. Why bother with less?

That leaves us with memory. You can set the memory in 64 MB increments from 128 MB to 3008 MB. AWS Lambda allocates CPU power proportional to the memory, so more memory means more CPU power. Right?

Well, I didn’t know, so I ran a little experiment.


Disclaimer

I did a microbenchmark. Please take all numbers with a large grain of salt. This is just plain number crunching. Your everyday applications do something else. Additionally, this code runs Java on a JVM. Other programming languages might show different results. I don’t claim to be an expert. I am not even a scientist. I am a programmer; I create bugs for a living.


Setting up the stage

My idea was to run a piece of code that solely relies on raw CPU power, measure the execution time for every possible memory setting and run it often enough to get some numbers.

If we refrain from touching memory, we avoid side effects that distort the execution time, such as heap memory allocations and garbage collection. Running this code over the course of several days, at different times, should give us sufficient data to investigate.

The Nth Prime Lambda Function

I ended up using a non-optimized Nth Prime Algorithm. This makes a nice number crunching AWS Lambda Function. It is simple enough to deploy and invoke.

On my 2.2 GHz Intel Core i7, computing the 10,000th prime (104729) takes 1.2 seconds on average and uses 8 MB of memory. I am not sure how much JVM startup time distorts the measurement, but it is a good reference point.
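For readers who want something concrete, here is a minimal sketch of what a non-optimized Nth Prime computation could look like in Java. It is not the code from the repository; the class and method names are made up for illustration, and the trial division deliberately checks every smaller divisor so the work stays CPU-bound without allocating memory.

```java
class NthPrime {
    /** Naive trial division: tries every divisor from 2 up to n - 1. */
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; d < n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns the n-th prime, counting from nthPrime(1) == 2. */
    static int nthPrime(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int count = 0;
        int candidate = 1;
        while (count < n) {
            candidate++;
            if (isPrime(candidate)) {
                count++;
            }
        }
        return candidate;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int prime = nthPrime(10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("10,000th prime = " + prime + " (" + elapsedMs + " ms)");
    }
}
```

The timing printed by main excludes JVM startup, so it will not match a wall-clock measurement of the whole process.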

Measurement procedure

After uploading the Nth Prime Algorithm to AWS Lambda, I wrote a shell script that conducts the experiment. Here is what it does:

For each of the 46 possible memory configurations starting with 128 MB:

  • Adjust the memory configuration to the new value
  • Invoke the function once to warm up the container
  • Invoke the function ten times and collect the reported execution time
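The actual experiment was driven by a shell script. The steps above can also be sketched in Java, purely as an illustration. The FunctionUnderTest interface below is hypothetical and is not an AWS SDK type; it stands in for whatever updates the memory setting and invokes the function. The main method uses a fake implementation so the sketch runs without AWS access.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MeasurementLoop {
    /** Hypothetical stand-in for the deployed function; not a real AWS API. */
    interface FunctionUnderTest {
        void setMemoryMb(int memoryMb);   // adjust the memory configuration
        double invokeAndGetDurationMs();  // invoke once, return the reported duration
    }

    static final int SAMPLES_PER_SETTING = 10; // ten measured invocations, as in the text

    /** Runs one pass over all memory settings from 128 MB to 3008 MB in 64 MB steps. */
    static Map<Integer, List<Double>> run(FunctionUnderTest fn) {
        Map<Integer, List<Double>> results = new LinkedHashMap<>();
        for (int mb = 128; mb <= 3008; mb += 64) {
            fn.setMemoryMb(mb);
            fn.invokeAndGetDurationMs(); // warm-up call, result discarded
            List<Double> durations = new ArrayList<>();
            for (int i = 0; i < SAMPLES_PER_SETTING; i++) {
                durations.add(fn.invokeAndGetDurationMs());
            }
            results.put(mb, durations);
        }
        return results;
    }

    public static void main(String[] args) {
        // Fake function with invented durations, only here to make the sketch runnable.
        FunctionUnderTest fake = new FunctionUnderTest() {
            private int memoryMb;

            @Override
            public void setMemoryMb(int memoryMb) {
                this.memoryMb = memoryMb;
            }

            @Override
            public double invokeAndGetDurationMs() {
                return 1_000_000.0 / memoryMb;
            }
        };
        Map<Integer, List<Double>> results = run(fake);
        System.out.println(results.size() + " settings, "
                + results.get(128).size() + " samples each");
    }
}
```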

I ran this script ten times in the AWS Region Frankfurt (eu-central-1) over a couple of days, at different times. In the end I had 100 execution times for each of the 46 memory configurations.

Code and data are in my GitHub repository.

Show me the numbers

Alright, let’s see what we got. Below is the minimum, maximum, mean and standard deviation of the execution time for every possible memory setting starting from 128 MB to 3008 MB.

+--------+---------+----------+-----------+-----------------+
| memory (MB) | min (ms) | max (ms) | mean (ms) | sample stddev (ms) |
+--------+---------+----------+-----------+-----------------+
| 128 | 3001.01 | 10450.78 | 7935.6063 | 3256.2320344706 |
| 192 | 3000.11 | 6952.83 | 5573.8957 | 1701.6498738484 |
| 256 | 3000.61 | 5245.3 | 4426.6759 | 945.70408936015 |
| 320 | 3000.47 | 4494.92 | 3712.9086 | 480.04802410647 |
| 384 | 3000.53 | 3904.63 | 3228.8366 | 183.35724265152 |
| 448 | 2621.3 | 3003.31 | 2885.9706 | 112.36530765896 |
| 512 | 2269.18 | 2604.81 | 2504.3341 | 80.455624234102 |
| 576 | 2081.46 | 2326.49 | 2237.709 | 55.097036630426 |
| 640 | 1904.89 | 2069.99 | 2016.4533 | 44.367618049927 |
| 704 | 1713.38 | 1877.5 | 1829.5894 | 41.276714785394 |
| 768 | 1564.92 | 1727.36 | 1679.7336 | 34.334527473202 |
| 832 | 1427.63 | 1587.1 | 1546.2559 | 39.962233840531 |
| 896 | 1321.59 | 1484.06 | 1438.0431 | 35.145737638173 |
| 960 | 1236.38 | 1481.55 | 1340.8897 | 37.069090651475 |
| 1024 | 1162.81 | 1347.85 | 1258.9367 | 31.85923363466 |
| 1088 | 1094.01 | 1409.83 | 1187.7633 | 36.408821360192 |
| 1152 | 1026.12 | 1185.37 | 1115.7696 | 31.037877801555 |
| 1216 | 976.31 | 1090.08 | 1057.3625 | 27.015196037031 |
| 1280 | 930.57 | 1143.32 | 1006.6163 | 29.690574760453 |
| 1344 | 883.66 | 1126.23 | 962.1384 | 28.602906743578 |
| 1408 | 846.55 | 938.79 | 915.182 | 20.839842134767 |
| 1472 | 813.88 | 943.67 | 892.8046 | 21.582735825121 |
| 1536 | 817.77 | 978.29 | 862.4818 | 19.140438411537 |
| 1600 | 766.74 | 918.02 | 824.4562 | 20.273283996432 |
| 1664 | 754.49 | 843.99 | 804.8531 | 19.96859295079 |
| 1728 | 732.68 | 840.46 | 799.0838 | 23.00475479091 |
| 1792 | 738.24 | 843.02 | 792.867 | 28.09364486531 |
| 1856 | 676.54 | 842.73 | 784.9241 | 45.456639503371 |
| 1920 | 684.7 | 842.71 | 787.0235 | 47.376128847722 |
| 1984 | 685.64 | 848.84 | 786.838 | 47.439545793046 |
| 2048 | 683.92 | 847.95 | 784.0749 | 47.270177744856 |
| 2112 | 681.11 | 847.67 | 793.2913 | 38.018518976436 |
| 2176 | 668.57 | 838.26 | 783.6352 | 47.121944521795 |
| 2240 | 670.63 | 849.99 | 787.01 | 48.075066596759 |
| 2304 | 668.56 | 854.77 | 768.8548 | 59.883440797751 |
| 2368 | 666.98 | 841.72 | 786.9664 | 49.060118706696 |
| 2432 | 670.24 | 846.8 | 763.6157 | 56.326999881643 |
| 2496 | 684.59 | 838.51 | 780.1577 | 44.356753960691 |
| 2560 | 673.01 | 908.11 | 769.2397 | 60.128025373054 |
| 2624 | 668.88 | 854.33 | 795.439 | 39.758556985074 |
| 2688 | 669.41 | 842.56 | 765.3316 | 57.196403004088 |
| 2752 | 683.71 | 902.19 | 770.5524 | 60.526736710383 |
| 2816 | 669.08 | 845.12 | 782.3667 | 46.900907338793 |
| 2880 | 666.96 | 845.42 | 774.8392 | 53.368037301499 |
| 2944 | 672.89 | 839.27 | 771.4477 | 50.380980755929 |
| 3008 | 684.99 | 842.87 | 771.2135 | 49.182884461872 |
+--------+---------+----------+-----------+-----------------+
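The four statistics in the table can be computed from the raw execution times along these lines. This is a sketch, not the tooling used for the table. It assumes the sstdev column is a sample standard deviation (dividing by n - 1), and the class name and the sample values in main are made up.

```java
import java.util.Arrays;

class DurationStats {
    final double min;
    final double max;
    final double mean;
    final double sampleStdDev;

    DurationStats(double[] durationsMs) {
        if (durationsMs.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        min = Arrays.stream(durationsMs).min().getAsDouble();
        max = Arrays.stream(durationsMs).max().getAsDouble();
        mean = Arrays.stream(durationsMs).average().getAsDouble();

        double squaredDeviations = 0.0;
        for (double d : durationsMs) {
            squaredDeviations += (d - mean) * (d - mean);
        }
        // Sample standard deviation: divide by n - 1 rather than n.
        sampleStdDev = Math.sqrt(squaredDeviations / (durationsMs.length - 1));
    }

    public static void main(String[] args) {
        // Invented durations in milliseconds, for illustration only.
        double[] sample = {801.2, 795.4, 812.9, 790.1, 805.6};
        DurationStats stats = new DurationStats(sample);
        System.out.printf("min=%.2f max=%.2f mean=%.4f sstdev=%.4f%n",
                stats.min, stats.max, stats.mean, stats.sampleStdDev);
    }
}
```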

Looking at raw numbers is no fun, but nonetheless we can spot some patterns:

With memory settings below 1024 MB the execution time varies a lot. At 128 MB, several runs took 10 seconds. A couple of days later, the same code took only 3 seconds to compute the 10,000th prime number. Pretty unpredictable if you ask me.

From around 1408 MB on, the Lambda function does not run much faster if we keep adding memory. There is also not much variance in the execution time. The code runs in around 800 ms on average. Which is great! Unpredictable execution time is not something we want in a serverless environment. Beyond that point, more memory doesn’t yield faster execution times. You only end up burning money.
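To put a rough number on the money part: Lambda bills duration in GB-seconds, that is, configured memory multiplied by execution time. The sketch below applies that to a few mean values copied from the table. It ignores billing granularity and per-request charges, and the class and method names are made up for illustration.

```java
class LambdaCostSketch {
    /** Billed compute in GB-seconds: configured memory (GB) times duration (s). */
    static double gbSeconds(int memoryMb, double durationMs) {
        return (memoryMb / 1024.0) * (durationMs / 1000.0);
    }

    public static void main(String[] args) {
        // {memory in MB, mean duration in ms}, copied from the table.
        double[][] rows = {
            {512, 2504.3341},
            {1024, 1258.9367},
            {1408, 915.182},
            {2048, 784.0749},
            {3008, 771.2135}
        };
        for (double[] row : rows) {
            int memoryMb = (int) row[0];
            System.out.printf("%4d MB: %.3f GB-seconds per invocation%n",
                    memoryMb, gbSeconds(memoryMb, row[1]));
        }
    }
}
```

For these means the product stays at roughly 1.25 to 1.26 GB-seconds for 512, 1024 and 1408 MB, then climbs to about 1.57 at 2048 MB and about 2.27 at 3008 MB. Up to the plateau you pay about the same per invocation and simply finish sooner; past it you pay more for the same speed.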

Conclusion

I measured the time it takes to compute the 10,000th prime for every possible memory setting. Execution times drop fast until we hit a plateau at around 1408 MB. Unless you really need the memory, you won’t get any further speed benefits from increasing it beyond this point.

AWS Lambda does not allocate CPU power proportional to memory; it allocates CPU time proportional to memory. With more memory, your chances of getting a bigger slice of CPU time increase, and at a certain threshold you more or less have the CPU to yourself.

What I did not do was run this experiment in a different AWS Region. I only measured in Frankfurt. Maybe things are faster in Tokyo? Slower? Who knows.

AWS Lambda natively supports Java, Go, PowerShell, Node.js, C#, Python, and Ruby code, and provides a Runtime API that allows you to use additional programming languages to author your functions.

Different programming languages produce different outcomes. I only tested with Java.

There might be some benefit if we use multiple threads. I am not sure what happens if we spawn multiple threads and measure the execution time. This is definitely something to figure out.
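One possible starting point for that follow-up, as an untested idea rather than a result: split a prime count across a fixed number of threads and compare timings for different thread counts at each memory setting. Everything in this sketch (class name, workload size, the interleaved split) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadedPrimeCount {
    /** Naive trial division, kept unoptimized so the work stays CPU-bound. */
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; d < n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    /** Counts primes in [2, limit] using the given number of worker threads. */
    static int countPrimes(int limit, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                // Worker t checks 2+t, 2+t+threads, ... (a simple interleaved split).
                parts.add(pool.submit(() -> {
                    int count = 0;
                    for (int n = 2 + offset; n <= limit; n += threads) {
                        if (isPrime(n)) {
                            count++;
                        }
                    }
                    return count;
                }));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("prime count failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("available processors: "
                + Runtime.getRuntime().availableProcessors());
        for (int threads : new int[] {1, 2, 4}) {
            long start = System.nanoTime();
            int count = countPrimes(50_000, threads);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): " + count
                    + " primes in " + elapsedMs + " ms");
        }
    }
}
```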

Feel free to try this code out for yourself. You can check everything in my GitHub repository. Use a different programming language, a different AWS Region, whatever you like. I’d love to hear your feedback!

Further Reading