Understanding how AWS Lambda scales
AWS Lambda service is simply, infrastructure that gets allocated to your function on-demand as per need. When the need increases new infrastructure is automatically created internally which executes your function. The size of the unit of infrastructure is defined by you when you create the function, AWS allows us to select memory for the function and CPU allocation is directly proportional to the memory that you chose, what this means is that if you choose 128MB of memory you get x CPU while choosing 256MB gives you 2x of the same.
Lambda scales on the basis of unit of work, while unit of work
varies on the Lambda source type. Each unit of work is executed on a
dedicated infrastructure while the said infrastructure is open for
re-use for subsequent calls but not while the current call is
executing, while you pay for only the duration of execution of individual requests proportional to the memory allocation of the function.
There are two types of Lambda function invocation type and the unit of work depends upon the said types:
- Streaming source typesAWS Lambda developer guide explains this part, perfectly as:“If you create a Lambda function that processes events from stream-based services (Amazon Kinesis Streams or DynamoDB streams), the number of shards per stream is the unit of concurrency. If your stream has 100 active shards, there will be 100 Lambda functions running concurrently. Then, each Lambda function processes events on a shard in the order that they arrive.”
- Synchronous source type Each request (or invocation) gets a dedicated infra to use at a time, i.e. at any given point of time your Lambda function will get a single request, having said that however, after a request is complete Lambda does reuse the containers for performance reasons to avoid overheads of repeatedly initializing code and re-establishing database connections*.
*AWS does not itself maintain such connections open, this is up to the code that is written.
AWS Lambda publishes the formula for how many concurrent executions to expect from a Lambda function:
Concurrent Invocations = events (or requests) per second * function duration (in secs)
Concurrent executions are the number of Lambda functions running side
by side catering to different requests / streams of data.
Consider this graph from my earlier post around Lambda’s performance with hosting .NET Web API:
Compare this with the number of invocations that happened during this period:
Gauge the graphs, those are meant to make us understand the nature of parallel executions in AWS Lambda.
Clearly, in order to save money and decrease the latency of our Lambda functions, we must decrease the running time of Lambda functions, the strategy to do that is two-fold:
- Choose the right size of the Lambda function, not necessarily the smallest.
- Follow best practices in coding and leveraging Lambda’s unit of the infrastructure lifecycle.
To reiterate, unit of infrastructure in AWS Lambda is the compute
resources set aside for each unit of work. The size of this unit of
infrastructure is determined by user configuration determined by
chosen memory size which has a proportional CPU attached to it. The
infrastructure is used for one unit of work at a time and is
generally reused to serve subsequent requests.
Lambda configuration best practices
Choosing the right timeout
While creating your Lambda function you can choose two things, memory, and timeout. Memory is the next topic, timeout is what it sounds like, AWS will terminate executing your Lambda function in case it runs for greater than the defined value for the same.
This has a subtle value, consider that you created a Lambda function which puts stuff into a database, let’s say Elasticsearch as an example. What happens if your ES cluster is hung for a few minutes (large index migration, faults etc)? Your Lambda function will try to connect to that until that connection times out, if not configured right, this will lead to your function taking over a minute for every log that needs to go into ES rather than the few milliseconds it used to take before, increasing your costs by about 1min / 200ms (bloated regular time to put stuff into ES) = 300 times!
Make the timeout to something acceptable like 4–5 seconds in the above case.
Choosing the ‘right’ size of the Lambda function
You can test how much memory your Lambda code is using up by running a
Test from the AWS Console*, see screenshot below:
* You can configure the test event to customize the input that your Lambda function receives by clicking on
Actions in the same screen and then
Configure test event
If you looked at the screenshot closely, Lambda helpfully gives the
Max memory used by your function in the test which you can use to adjust the memory allocation of the same.
Choosing the right memory size can be tricky, let’s consider two extreme schools of thought first:
I will choose the minimum amount of memory that my function needs in any given execution to save maximum cost
Remember that CPU is proportional to the memory, lowering the memory by half cuts the CPU by half. Your code might finish a lot faster with more memory than with less mitigating the effects of adding the memory (cost-wise) while decreasing the latency.
I will choose to give the maximum allowed memory to my Lambda function so that it executes the fastest
There is a limit beyond which memory, CPU will be useless if you have the money, this is the safest option albeit a wasteful one.
Coding best practices
Low startup time
Since infrastructure is allocated dynamically in Lambda and the unit of infrastructure is small to account only for one request at a time, its best to keep the startup time of the function to be lowest in order to avoid large fluctuations in latencies when new unit of infrastructures are created and to avoid costs of that startup itself.
Just to sear the new thought here, let’s consider an example. The typical methodology of increasing performance in the traditional server/VM model is to get all the intensive work done at startup e.g. reflection-based assembly scanning, database model formations (think ORM’s), loading up frequently used master data from the database, etc. You need to rethink all of these.
In my opinion, when going serverless, I wouldn’t think of using RDBMSs (although you can) and instead use databases like Elasticsearch, DynamoDB, Mongo, etc. I wouldn’t use an expensive ORM (startup–latency-wise) to connect to something which cannot horizontally scale. Having said this, this is my opinion and the shoe may not fit all.
From the definition of the unit of infrastructure, we can determine that the lifetime of the infra is not under our control, hence, in order for the code to be functionally correct, it needs to be stateless.
Stateless here means that the function should not depend upon state stored in memory or disk or anything local to the infrastructure itself. AWS gives a plethora of services to ensure you won’t need to place much effort into getting this ready.
Need fast querying without keeping stuff in memory? Use Elasticsearch, I wrote an article on getting started with Elasticsearch in case you are interested.
Need to do logging without effecting latency of the function and without worrying about the lifetime of the infrastructure? See the logging heading below in the article :-).
Although the unit of infrastructure is allocated dynamically which forces stateless programming, we can still use the fact that infra is in fact re-used for subsequent requests. The latter means that you can:
- Re-use external (cache server, database etc) connections across function calls (remember that you don’t have to worry about concurrency here).
- Utilize the functions of local storage which is 500MB in size in the /tmp directory. Code in such a way that if you don’t find it, you can always go to a central store to re-populate it. Don’t keep stuff which needs to be shipped out, remember that the storage is volatile and can be wiped out at any time after a function call ends… keep read-only data.
- Keep stuff in memory, keep this small, remember functions have very limited memory a lot of it would be required to execute your code efficiently.
In a scenario where infrastructure can be taken away from us at any time leaving any background thread which was still running, destroyed, it would render standard asynchronous logging useless, not to mention logging to files. Making synchronous calls to logging would be horrendous to latency (and hence, also cost) no matter how fast the receiving end of that logline and would be severely dependent on the number of log lines that you have (shudder).
Fortunately, AWS recognized the above problem and as part of the programming model in Lambda as long as you write your log lines in the console which is
console.log in node and
Console.Writeline in c# core, which would be synchronous and would take negligible time, AWS takes care of shipping those logs out to Cloudwatch for you… asynchronously while giving a guarantee of getting those logs there even if the said Lambda infrastructure is destroyed. How they do it is, of course, immaterial as long as they do it and we don’t suffer latency issues.
Cloudwatch is good and all, but I like my xyz log storage
The above statement is true for me as well, I like Elasticsearch better. Eventually, as the system matures I would like my logging to be happening somewhere other than Cloudwatch, you can use Cloudwatch streams to achieve this to put it wherever you want using another Lambda function. AWS gives a pre-cooked recipe to put stuff in Elasticsearch. See the screenshot below for where these options are:
The pricing for this streaming can be found on the AWS pricing page for Cloudwatch.
Here is an example for your convenience:
1.00 per million custom events generated + Lambda cost (lower function timeout to prevent surprises) + ES cost — logstash cost — maintenance
By the way, in case you missed it, AWS recently launched a nuget package to support deploying existing (or new) .Net core Web API projects directly into Lambda, I tested it and the performance it gave was pretty impressive especially considering the package was still in preview mode at the time of writing this article.
Author: Sumit Maingi, CTO @ Locusnine