Developers who enjoy serverless usually hate one thing about it: cold starts. We decided to tackle this issue and open source our solution.
X-Lambda will monitor your AWS Lambda functions, analyze past behavior and use statistical methods to forecast short-term invocation demand. Running on a scheduled basis, it will keep the right number of containers warm, mitigating cold start latency.
It is now in alpha. Watch the project on Github to receive updates as we move forward with development.
Where did the idea come from?
At Dashbird.io, we monitor more than 250,000 functions daily, and one of the metrics our service tracks is cold starts. Many developers ask us:
Ok, but how do I “solve” these cold starts?
We weren’t fully satisfied with the existing solutions and decided to create a better one. It’s open-sourced under the MIT License, so you can use and modify it freely for any purpose.
Contributions to the project are highly appreciated; we want to build this together with the serverless community. Please use the Github issues page to raise ideas or point out problems.
Under the hood
The project is implemented in Python 3.7 (whose ecosystem is a natural fit for statistical work), meant to run on AWS Lambda, and deployed with the Serverless Framework. It is themed after X-Men characters. Because, well, why not name a Lambda function after Wolverine?
Oh yeah, X-Men, that’s why the logo above looked familiar!
Four Lambda functions take care of everything:
Of course, we wouldn’t let Professor Xavier out of this mission. This function takes care of controlling the entire warming process, coordinating the other Lambdas.
Professor will get the list of functions to keep warm and fan-out tasks to three of his team members:
- Wolverine: retrieves the necessary Lambda metrics
- Jean: forecasts container demand
- Cyclops: fires up the right number of Lambda containers
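The fan-out above can be sketched as a set of asynchronous (`Event`-type) Lambda invocations, so Professor doesn't block waiting for results. This is a minimal illustration assuming boto3-style client semantics; the function names and payload shape are hypothetical, not the project's actual contract:

```python
import json

def fan_out(lambda_client, team):
    """Invoke each team member asynchronously.
    Function names and payload shape are illustrative assumptions."""
    for function_name, task in team.items():
        lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # fire-and-forget, no waiting for results
            Payload=json.dumps({"task": task}),
        )

# A stub client lets us exercise the logic without AWS credentials;
# in production this would be boto3.client("lambda").
class FakeLambdaClient:
    def __init__(self):
        self.calls = []
    def invoke(self, **kwargs):
        self.calls.append(kwargs)
        return {"StatusCode": 202}  # Lambda returns 202 for Event invocations

client = FakeLambdaClient()
fan_out(client, {
    "wolverine": "collect-metrics",
    "jean": "forecast-demand",
    "cyclops": "warm-containers",
})
print(len(client.calls))  # 3
```

Using the `Event` invocation type means the coordinator only pays for a few milliseconds of its own runtime, regardless of how long the workers take.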
Logan will gather all the information the other Lambdas need to accomplish their tasks in a timely and precise manner.
The data is retrieved from AWS APIs:
- CloudWatch (GetMetricData): a time series of Lambda concurrent requests;
- Lambda (GetFunctionConfiguration): basic info about the function, such as memory size, whether it runs inside a VPC, etc.
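A sketch of the metrics-gathering step, using the CloudWatch `GetMetricData` API. The metric namespace and name follow AWS's documented conventions for Lambda concurrency; the helper names and the choice of statistic are illustrative assumptions, not necessarily what the project uses:

```python
import datetime

def concurrency_query(function_name, period=300):
    """Build a GetMetricData query for a function's concurrent executions
    at 5-minute (300 s) resolution. The query Id is an arbitrary label."""
    return {
        "Id": "concurrency",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Lambda",
                "MetricName": "ConcurrentExecutions",
                "Dimensions": [
                    {"Name": "FunctionName", "Value": function_name},
                ],
            },
            "Period": period,
            "Stat": "Maximum",  # peak concurrency per slot (assumption)
        },
    }

def fetch_history(cloudwatch, function_name, hours=24):
    """Retrieve the recent concurrency time series for one function.
    `cloudwatch` would be a boto3.client("cloudwatch") in production."""
    now = datetime.datetime.utcnow()
    resp = cloudwatch.get_metric_data(
        MetricDataQueries=[concurrency_query(function_name)],
        StartTime=now - datetime.timedelta(hours=hours),
        EndTime=now,
    )
    return resp["MetricDataResults"][0]["Values"]
```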
All these data points influence the warming dynamics. Containers take longer to start inside a VPC, for example, and accounting for that startup time helps prevent containers from being reused during the warming process. Allocating more memory, on the other hand, speeds up both startup and execution.
Based on the Lambda concurrent invocation metrics, the Marvel Girl forecasts how many containers will be needed in the near-term.
An Exponential Smoothing model is used to extrapolate the forecasts, with parameters optimized by maximizing a log-likelihood function. There are dozens of time-series forecasting techniques out there. We chose Exponential Smoothing because it’s simple and weighs recent observations over older ones, which enables X-Lambda to react more quickly to unexpected spikes in demand.
Currently, we use historic concurrent requests provided by CloudWatch, with a 5-minute resolution, extrapolating the next 15 minutes (3 predicted slots of 5 minutes each).
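To make the forecasting idea concrete, here is a toy simple-exponential-smoothing pass over a concurrency series. The real model optimizes its smoothing parameters by maximum likelihood; here the smoothing factor `alpha` is fixed, and the flat three-slot extrapolation is a simplification for illustration:

```python
def simple_exp_smoothing(series, alpha=0.5):
    """One-step-ahead simple exponential smoothing.
    Higher alpha weighs recent observations more heavily."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

# Concurrent containers observed per 5-minute slot (made-up data).
history = [2, 3, 2, 4, 8]
forecast = simple_exp_smoothing(history)

# A flat forecast for the next three 5-minute slots (15 minutes).
print([round(forecast, 2)] * 3)  # [5.56, 5.56, 5.56]
```

Note how the recent spike to 8 pulls the forecast well above the series mean (3.8): that responsiveness to fresh data is exactly why the smoothing family was chosen.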
Scott Summers is responsible for adjusting his laser power and firing up the right number of containers for a given Lambda function.
Cyclops will invoke your Lambda functions with a specific payload that allows them to identify and adjust to X-Lambda requests. This way, the execution can be short-circuited to serve only as a warming request, skipping the function’s actual business logic. Functions can also sleep for the right period of time before terminating, to make sure containers are not reused during the warming process.
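On the receiving side, a function only needs a small guard at the top of its handler. The payload key and field names below are hypothetical placeholders; the project README documents the exact contract:

```python
import time

def handler(event, context):
    """Short-circuit warming invocations before any business logic runs.
    The "xlambda" payload shape here is an illustrative assumption."""
    warming = event.get("xlambda", {})
    if warming.get("action") == "warm":
        # Keep this container busy for a moment so concurrent warming
        # requests land on fresh containers instead of reusing this one.
        time.sleep(warming.get("sleep_seconds", 0))
        return {"warmed": True}
    # Regular requests fall through to the actual business logic.
    return {"warmed": False, "result": do_real_work(event)}

def do_real_work(event):
    # Placeholder for the function's real business logic.
    return "hello"
```

The guard costs a single dictionary lookup on real requests, so it adds effectively no overhead outside the warming cycle.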
In the project README we detail more about:
- Current limitations and what to expect from this alpha version
- The warming dynamics and how we prevent containers from being reused
- Relevant architectural aspects of the project
- Our vision for the project roadmap
Again, your ideas are very welcome. We want to perfect this project and make it stable and scalable for production environments as soon as possible. Please share them in the comments below or on the Github issues page.