AWS Lambda and the Node.js Event Loop
--
At radient.io we ❤️ serverless technologies. It helps us to be more productive, cost effective and ship our software faster. We use AWS Lambda in combination with Node.js and learned something interesting recently — how the AWS Lambda execution environment interacts with the Node.js event loop.
Before we dive in, lets quickly refresh the Node.js concurrency model 🤓. If you’re already familiar with the event loop and it’s mechanics, you can jump straight to the AWS Lamba section for the good stuff..
Concurrency Model
Node.js is single threaded and the event loop is the concurrency model that allows non-blocking I/O operations to be performed.
The event loop is what allows Node.js to perform non-blocking I/O operations — despite the fact that JavaScript is single-threaded — by offloading operations to the system kernel whenever possible.
How? Well, we’ll have to discuss the call stack and the task queue first 😅.
Call Stack
Function calls form a stack of frames, where each frame represents a single function call. Every time a function is called, it is pushed onto the stack (added) and when the function is done executing, it is popped off the stack (removed). The frames (elements) in a stack come off in a last-in-first-out (LIFO) order.
Each frame stores information about the invoked function. Like the arguments the function was called with and any variables defined inside the called function’s body.
When we execute the following code:
We can visualize its call stack like this:
1. When the script call-stack.js
starts executing, the call stack is empty.
2. main()
is called on line 15 and pushed onto the call stack.
3. While executing main
, console.log('main start')
is called on line 8 and pushed onto the call stack.
4. console.log
executes, logs "main start"
and is popped off the call stack.
5. main
continues executing, calls work()
on line 10 and is pushed onto the call stack.
6. While executing work
, console.log('work')
is called on line 4 and pushed onto the call stack.
7. console.log
executes, logs "do work"
and is popped off the call stack.
8. work
finishes executing and is popped off the call stack.
9. main
continues executing, calls console.log('main end')
on line 12 and is pushed onto the call stack.
10.console.log
executes, logs "main end"
and is popped off the call stack.
11.main
finishes executing and is popped off the stack. The call stack is empty again and our script finishes executing.
Task Queue
Any asynchronous work in the runtime is represented as a task in a queue, or in other words, a message queue. Each message can be thought of as a function that will be called in first-in-first-out (FIFO) order to handle said work. For example, the callback provided to the setTimeout
or Promise
API.
Additionally, each message is processed completely before any other message is processed. This means that whenever a function runs, it can not be interrupted. This behavior is called run-to-completion and makes it easier to reason about our JavaScript programs.
Messages get enqueued (added to the queue) and at some point messages will be dequeued (removed from the queue). When? How? This is handled by the Event Loop.
Event Loop
The event loop can be literally thought of as a loop that runs indefinitely and where every cycle is referred to as a tick. On every tick the event loop will check if there’s any work in the task queue. If there is, it will execute the task (function), but only if the call stack is empty.
The event loop can be described with the following pseudo code, taken from MDN:
while (queue.waitForMessage()) {
queue.processNextMessage();
}
To summarize:
- When code executes, function calls are added to the call stack.
- Whenever calls are made via asynchronous APIs (e.g.
setTimeout
orPromise
) the corresponding callbacks are eventually added to the task queue. - When the call stack is empty and the task queue contains one or more tasks, the event loop will remove a task every tick and push it onto the call stack. The function will execute and this process will continue until all work is done.
Great, with that covered we can explore how the AWS Lambda execution environment interacts with the Node.js event loop!
AWS Lambda
AWS Lambda invokes a Lambda function via a handler function that is exported, e.g. exports.handler
. When Lambda invokes this handler it calls it with 3 arguments, i.e. function handler(event, context, callback)
. The callback
argument may be used to return information to the caller and to signal that the handler function has completed so Lambda may end it. For that reason you don’t have to call it explicitly. Meaning, if you don’t call it Lambda will call it for you.
Additionally, when using Node.js version 8.10, you may also return a Promise instead of using the callback
function. In that case you can also use the async
keyword, because async
functions return Promises 😎.
Baseline
We’ll create a file called timeout.js
with the following contents:
When we execute this script locally with node timeout.js
the following will print:
> main start
> timeout start
> main end
> timeout cb fired after 5000 ms
The fourth message takes 5 seconds to print, but the script doesn’t stop executing before it does.
What happens in Lambda, stays in Lambda
Now lets modify the code from timeout.js
in order for it to be compatible with Lambda:
You can create a new function in the AWS Lambda console and paste in the code from above. Run it, sit back and enjoy..
Wait, what? Lambda just ended the handler function without printing the fourth message "timeout cb fired after 5000 ms”
. Lets run it again.
It now prints "timeout cb fired after 5000 ms”
first and then the other ones! So what’s going on here?
AWS ✨ Lambda ✨ Execution ✨ Model
AWS Lambda takes care of provisioning and managing resources needed to run your functions. When a Lambda function is invoked, an execution context is created for you based on the configuration you provide. The execution context is a temporary runtime environment that initializes any external dependencies of your Lambda function.
After a Lambda function is called, AWS Lambda maintains the execution context for some time in anticipation of another invocation of the Lambda function for performance benefits. It freezes the execution context after a Lambda function completes and may choose to reuse the same execution context when the Lambda function is called again (but doesn’t have to).
In the AWS documentation we can find the following:
Background processes or callbacks initiated by your Lambda function that did not complete when the function ended resume if AWS Lambda chooses to reuse the Execution Context.
As well as the following somewhat hidden message:
When the callback is called (explicitly or implicitly), AWS Lambda continues the Lambda function invocation until the event loop is empty.
Lets see if we can find more and search for lambda + callback + empty + event loop. Top result is about the context Object. Looking through it there’s actually a property called callbackWaitsForEmptyEventLoop
. Here’s what is does:
The default value is true. This property is useful only to modify the default behavior of the callback. By default, the callback will wait until the event loop is empty before freezing the process and returning the results to the caller.
OK, so with this information we can make sense of what happened. Lets break it down:
- Lambda starts executing our code, the call stack is empty.
main
is called and added to the call stack.- While executing
main
,console.log('main start')
is called on line 16 and pushed onto the call stack.
4. console.log
executes, logs "main start"
and is popped off the call stack.
5. main
continues executing, calls timeout(5e3)
on line 18 and is pushed onto the call stack.
6. While executing timeout
, console.log('timeout start')
is called on line 4 and pushed onto the call stack.
7. console.log
executes, logs "timeout start"
and is popped off the call stack.
8. timeout
continues executing, calls new Promise(callback)
on line 6 and is pushed onto the call stack.
9. While new Promise(callback)
executes, it interacts with the Promise API and passes the provided callback to it. The Promise API sends the callback to the task queue and now must wait until the call stack is empty before it can execute.
10. new Promise
finishes executing and is popped of the call stack.
11. timeout
finishes executing and is popped off the call stack.
12. main
continues executing, calls console.log('main end')
on line 20 and is pushed onto the call stack.
13. console.log
executes, logs "main end"
and is popped off the call stack.
14. main
finishes executing and is popped off the call stack. The call stack is empty.
15. The Promise callback (step 9) can now be scheduled by the event loop and is pushed onto the call stack.
16. The Promise callback executes, calls setTimeout(callback, timeout)
on line 7 and is pushed onto the call stack.
17. While setTimeout(callback, timeout)
executes, it interacts with the setTimeout
API and passes the corresponding callback and timeout to it.
18. setTimeout(callback, timeout)
finishes executing and is popped of the stack. At the same time the setTimeout
API starts counting down the timeout so that it can schedule the callback in the future.
19. The Promise callback finishes executing and is popped off the stack. The call stack is empty again.
At this point the call stack and task queue are both empty. At the same time a timeout is counting down (5 seconds), but the corresponding timeout callback has not been scheduled yet. As far as Lambda is concerned, the event loop is empty, it will freeze the process and return results to the caller!
The interesting part is that Lambda doesn’t immediately destroy it’s execution context. Because if we wait for +5 seconds and run the Lambda again (second run), we see the console message printed from the setTimeout
callback first.
This happens because after the Lambda ended (first run), the previous execution context was still around and after 5 seconds the setTimeout
API sent the corresponding callback to the task queue.
When we execute the Lambda again (second run), the call stack is empty with a message in the task queue, which can immediately be scheduled by the event loop.
This results in "timeout cb fired after 5000 ms"
being printed first, because it executed before any of the code in our Lambda function.
Doing it right
Obviously this is undesired behavior and you should not write your code in the same way we wrote the example code. Like stated in the AWS documentation, we need to make sure to complete processing all callbacks before our handler exits.
You should make sure any background processes or callbacks (in case of Node.js) in your code are complete before the code exits.
Therefore we’ll make the following change to our example code:
This change makes sure the handler function doesn’t stop executing until timeout
finishes. When we run our code with this change, all is well.
In Closing
I intentionally left out some details about the the task queue. There are actually two task queues! One for macrotasks (e.g. setTimeout
) and one for microtasks (e.g. Promise
). According to the spec, one macrotask should get processed per tick. After it finishes, all microtasks will be processed within the same tick. While these microtasks are processed, they can queue more microtasks, which will all be run in the same tick.
For more information see this article from RisingStack where they go more into detail. I highly recommend you read it.
Thanks for reading, I hope you learned something new! If I explained something incorrectly 🐞 or if you have something to add, please let me know in the comments.