The curious case of zombie lambdas
Friday evenings: that’s when you normally face the worst of your nightmares. For us, it’s usually the unpredictable behaviour of systems we aren’t familiar with, behaviour that never bothers to show itself during the week.
This was just one of them. I’ve been working with serverless for quite a while now and I still can’t get over how easy it makes things for us. I thought lambdas were as easy as it gets, until I ran into this scenario.
The code was a combination of promise chaining and async-await. That’s not at all recommended, but I didn’t want to make things more complicated for the team implementing the code by asking them to convert everything to async-await. Why, you ask? Valid question. One, the team was already working very hard to make sure things were on time, and I didn’t want to add one more thing to their plate. Two, the team wasn’t very familiar with promises. Moreover, since some third-party functions would have needed to be promisified, it didn’t seem fair to add complexity that was unnecessary in this case. We had to make it work with minimal changes.
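To give a feel for what “promisifying” a third-party function and mixing the two styles looks like, here is a minimal sketch. The names legacyFetchUser, fetchUser and greetUser are hypothetical and stand in for our actual code, which I’m not reproducing here. The wrapper is hand-rolled for illustration; Node also ships util.promisify for the same job.

```javascript
// Hypothetical callback-style third-party function (not a real library).
function legacyFetchUser(id, callback) {
  setTimeout(() => {
    if (id < 0) callback(new Error('invalid id'));
    else callback(null, { id, name: `user-${id}` });
  }, 10);
}

// Promisified wrapper: resolves with the result, rejects with the error.
function fetchUser(id) {
  return new Promise((resolve, reject) => {
    legacyFetchUser(id, (err, user) => (err ? reject(err) : resolve(user)));
  });
}

// Mixing styles: a .then() chain inside an async function is safe only
// because the whole chain is awaited before the function returns.
async function greetUser(id) {
  const greeting = await fetchUser(id).then((user) => `hello ${user.name}`);
  return greeting;
}
```

The mix itself isn’t the problem. The trouble starts when a chain like the one in greetUser is neither awaited nor returned.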
Once run, the code exhibited erratic behaviour. On some invocations it would run, while on others it would show no signs of execution at all. All of this came from the lambda logs. And as if this wasn’t enough, in some cases we saw unexpected logs completely unrelated to the scenario we were running.
My gut feeling was that it was a side effect of asynchronicity. But then why would the code work fine when executed on a local server? Local server: that was the hint! Server: that was the real hint! The only difference I could see between the two environments was the existence of a server on one, and the lack, or shall I say pseudo-existence, of a server on the other. No matter how you write your async code, the chances of it eventually getting executed on a server are high, since the event loop persists for as long as the server runs. I wanted to know how this translated to lambda in the case of an async miss. What would happen if you missed that await or forgot to fulfil the promise at the right place?
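Here is a minimal sketch of the kind of “async miss” I had in mind. It is not our actual code: saveRecord is a hypothetical stand-in for a slow third-party call, and the events array only exists so the order of things can be inspected.

```javascript
// Records what happened, in order, so the two handlers can be compared.
const events = [];

// Hypothetical slow operation (think of a database write).
function saveRecord(id) {
  return new Promise((resolve) => {
    setTimeout(() => {
      events.push(`saved ${id}`);
      resolve(id);
    }, 20);
  });
}

// Buggy: the promise is created but never awaited, so the handler
// returns before the write has happened.
async function handlerMissingAwait(event) {
  saveRecord(event.id); // <- missing await
  events.push(`returned ${event.id}`);
  return { statusCode: 200 };
}

// Fixed: the handler only returns once the write has finished.
async function handlerWithAwait(event) {
  await saveRecord(event.id);
  events.push(`returned ${event.id}`);
  return { statusCode: 200 };
}
```

On a long-running server the missing await mostly goes unnoticed, because the process is still alive when the timer fires and the write goes through a moment later. The buggy handler has already reported success by then, though, and nothing guarantees that whoever called it sticks around.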
We tried to pin down exactly where we might have missed or misplaced the resolution of a promise. And with a close guess, boom! The code worked wonders in the serverless environment too. As happy as I was to see it work just fine, I was curious to know what happens in a serverless environment when your lambda completes execution while unresolved promises are still pending.
There isn’t extensive documentation on this, but fortunately I could find a few threads, such as:
https://lumigo.io/blog/node-js-lambda-execution-leaks-a-practical-guide/
And that completely made sense. The root cause was, as anticipated, a problem in how the async code was implemented. When the handler returned without waiting for its pending async work, that work was left sitting in the event loop for later execution, which is why the code sometimes appeared not to run at all. The unexpected logs came from that same leftover work getting picked up from the event loop in cases where the lambda execution context was reused.
It sure felt great to have deciphered the cause behind this sudden issue. A Friday evening well spent!
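The effect can be imitated locally with a small simulation. This is only a sketch under assumptions: both “invocations” run back to back in one Node process to mimic a reused execution context, the pause between real invocations is not modelled, and leakyHandler, carefulHandler and the request ids are made-up names.

```javascript
const logs = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// First invocation: starts background work and returns without waiting for it.
async function leakyHandler(event) {
  logs.push(`[${event.requestId}] start`);
  // Fire-and-forget: nothing awaits this chain.
  sleep(30).then(() => logs.push(`[${event.requestId}] late work finished`));
  logs.push(`[${event.requestId}] end`);
}

// Second invocation: well behaved, awaits its own work.
async function carefulHandler(event) {
  logs.push(`[${event.requestId}] start`);
  await sleep(60);
  logs.push(`[${event.requestId}] end`);
}

// Mimics a reused context: same process, same event loop, two invocations.
async function simulateWarmReuse() {
  logs.length = 0; // start from a clean log on every run
  await leakyHandler({ requestId: 'req-1' });
  await carefulHandler({ requestId: 'req-2' });
  return logs.slice();
}
```

Calling simulateWarmReuse() gives a log in which the “late work finished” line from req-1 lands between the start and end lines of req-2. That is the same shape as the unrelated logs we saw, and if the context is never reused, that late line never shows up at all.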
Cheers!