Lambda or Containers
Like many other Kubernetes (K8S) users, perhaps you’ve recently found yourself waking up dehydrated with a severe headache. You realized that the technology isn’t that easy to operate, it can be costly, and, in many cases, you might not even require the fine-grained controls and the platform team that come with it.
I recently checked whether Lambdas could serve as a viable alternative. While my study focused on AWS and a specific workload, the general findings can easily be applied to another Cloud Service Provider (CSP) and deployment. Let’s quickly review the good, the bad, and the ugly.
Introduction
A container essentially functions as a lightweight Virtual Machine (VM). It involves bundling your application binaries with the required Operating System (OS) and middleware packages. The image you test on your local machine is the same one deployed in production. In a production environment, Kubernetes (K8S) takes charge of deploying and managing these containers at scale. Among its responsibilities, K8S ensures the appropriate number of hosts and containers, as well as load-balancing traffic.
However, this approach comes with its own set of considerations. Configuring and maintaining it correctly demands a significant investment of engineering time. Tasks include defining the host pool for container deployment, distributing them across different availability zones for resilience, and optimizing the pool size to control costs. Each K8S cluster incurs a base cost for running the control plane and any CNCF add-ons you decide to deploy, and regular patching of both containers and the control plane is necessary.
Lambda, on the other hand, operates on a serverless application model. It involves breaking down your application into functions, zipping the code and its dependencies, uploading it to AWS, and exposing it through the Amazon API Gateway. While offering fewer options, AWS takes care of several tasks for you. It deploys necessary infrastructure on demand, ensures the use of up-to-date OS and middleware, scales within a region as needed, and distributes the load across availability zones for resilience. Importantly, you only pay for the actual usage of your functions; if a function is not invoked, it costs you nothing.
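Concretely, the unit of deployment is just a handler that receives an event and returns a response. A minimal sketch in Python (the event shape mimics an API Gateway proxy request; the names are illustrative, not from any particular deployment):

```python
import json

def handler(event, context):
    # API Gateway passes the HTTP request as an event dict;
    # the returned dict becomes the HTTP response.
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local invocation with a fake event (the context object is unused here)
print(handler({"body": json.dumps({"name": "lambda"})}, None))
```

Zipped up with its dependencies and uploaded, a function like this costs nothing until API Gateway actually invokes it.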
Cost perspective
When utilizing containers, it’s essential to provision adequate capacity to handle incoming traffic. Although Kubernetes (K8S) assists in automatically starting and stopping hosts or containers to match demand, a fixed minimum cost and idle capacity persist — typically two to three hosts and containers spread across different availability zones, prepared to receive traffic even during quiet periods. Even under load, some unused capacity remains, as K8S initiates new containers or hosts before hitting capacity limits.
Lambdas are charged based on actual usage. AWS queues requests, launching and stopping execution environments to align with demand. Costs closely follow your traffic curve, rising with user connections and traffic and dropping to zero during periods of inactivity. This proves advantageous for environments with sporadic or limited traffic, such as test systems or those experiencing minimal nighttime activity.
However, there are limitations to this model. If your function has low CPU consumption, like a simple encoding/decoding function, and experiences a consistent call volume throughout the day, handling the traffic on a small number of containers may be more cost-effective. For example, for a K8S cluster running 500 services on 1,000 container instances, processing 10 billion service calls monthly in production and 2 billion in test, the analysis showed that migrating the services to Lambda would cut costs by a factor of three in the test system but increase them by 50% in production.
Further analysis revealed that 30 services captured 80% of the traffic, indicating a need to optimize by caching responses or adjusting the choreography. Implementing such changes using Lambda would result in a 60% cost reduction in production. In essence, altering the pricing model shifts optimization focus from reducing CPU usage to eliminating unnecessary function calls.
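A back-of-the-envelope model makes the crossover visible. The sketch below uses the published x86 list prices for Lambda at the time of writing ($0.20 per million requests, roughly $0.0000167 per GB-second); the workload figures are illustrative, not the ones from my study:

```python
def lambda_monthly_cost(invocations, avg_ms, mem_gb,
                        price_per_req=0.20e-6,         # $0.20 per 1M requests
                        price_per_gb_s=0.0000166667):  # x86 duration price
    # Rough Lambda bill: per-request fee plus GB-seconds of duration.
    gb_seconds = invocations * (avg_ms / 1000.0) * mem_gb
    return invocations * price_per_req + gb_seconds * price_per_gb_s

# A light test-system load: 2 billion calls/month, 50 ms each on 256 MB
test_cost = lambda_monthly_cost(2_000_000_000, 50, 0.25)
# A heavier production load: 10 billion calls/month, 50 ms each on 512 MB
prod_cost = lambda_monthly_cost(10_000_000_000, 50, 0.5)
print(round(test_cost), round(prod_cost))  # 817 6167
```

Because the bill is linear in invocations, halving the number of calls (via caching or reworked choreography) halves the cost, whereas on a fixed container fleet it would mostly free up idle capacity.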
Regardless, managing Lambdas and their underlying infrastructure is considerably simpler, leading to a significant reduction in maintenance costs.
Cold start and mitigation strategies
One caveat with Lambda is that each request requires its own execution environment. If a request arrives and there’s no available environment, AWS initiates a new one and loads your function code, resulting in what is known as a ‘cold start.’ AWS retains inactive execution environments for a period, aiming to reuse them and minimize additional latency.
In runtimes like Node.js or Python, the difference is generally acceptable (a few hundred milliseconds). With Java, however, the phenomenon is particularly significant, especially when using Spring libraries: due to different design assumptions, JVM launch, class loading, and dependency injection can take several seconds.
For example, bridging a Spring Boot controller through a Lambda adapter yielded a 6-second latency in my test, and this can easily exceed the 10-second mark with a more complex package. To address this, AWS introduced a Lambda option called SnapStart at the end of 2022, initially available only for Java. It captures a memory image of your JVM at the end of the loading phase and reuses it to speed up future launches.
While this approach isn’t fully transparent, since you need to make sure initialization work happens during the loading phase to maximize the benefit, it significantly reduces the cold start issue. In my simple test, latency dropped to around 200 ms, roughly 30 times faster.
Despite this improvement, ensuring that all classes are initialized before the image is captured and predicting the launch time for a new execution environment remain challenging. Running your own tests is crucial to control startup time, and it’s advisable to monitor these events in production to quickly identify Lambdas that require optimization.
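Independently of SnapStart, the standard mitigation is to pay initialization once per execution environment rather than once per request, by doing heavy setup at load time. A minimal Python sketch of the pattern (`expensive_setup` is a hypothetical stand-in for class loading, config parsing, or client creation):

```python
import time

def expensive_setup():
    # Stand-in for slow initialization work.
    time.sleep(0.05)  # simulate 50 ms of startup cost
    return {"ready": True}

# Module level: runs once during the cold start (and is captured by a
# SnapStart snapshot), then reused by every subsequent invocation.
RESOURCES = expensive_setup()

def handler(event, context):
    # Warm invocations skip expensive_setup entirely.
    return {"statusCode": 200, "warm": RESOURCES["ready"]}

start = time.perf_counter()
result = handler({}, None)
print(result, time.perf_counter() - start < 0.05)  # the call itself is fast
```

The first request still pays for `expensive_setup`, which is exactly what your cold start tests and production monitoring should measure.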
If you need to guarantee a consistent execution time, especially for a specific volume of concurrent requests, you can still provision execution environments ahead of time (provisioned concurrency). This moves away from the pay-per-use model, as you pay a fixed cost for the provisioned environments, but you keep the greatly simplified operational model.
Operational constraints
AWS takes charge of the scalability of your Lambda, dynamically spawning new execution environments as needed. While it may seem on paper that you can scale and utilize all available resources within a region, it’s crucial to approach these promises with caution. Since you’re working with shared infrastructure, AWS implements quotas to safeguard the region and your account from unrealistic demands, thereby limiting your usage.
One example is the concurrent execution quota for all Lambdas in a single account and region. By default, this quota is set to 1000, but it’s a soft limit and can be extended to tens of thousands upon request to the support organization.
However, be cautious about the reserved concurrency option. This allows you to claim a share of the quota for a specific Lambda, ensuring it can scale up to a designated number at any time. Even when not in use, it blocks the corresponding quota, reducing the quota available to other Lambdas. If you apply this setting to all your Lambdas, you may exhaust your overall quota, making it challenging to deploy and execute new Lambdas, even if your account isn’t processing any traffic.
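The arithmetic is worth spelling out: reserved concurrency is carved out of the account pool whether or not it is used, and AWS keeps a floor of 100 executions unreserved. A sketch with the default quota and made-up per-function reservations:

```python
ACCOUNT_QUOTA = 1000  # default concurrent-execution quota per region

# Hypothetical reservations across three Lambdas in the account
reservations = {"checkout": 300, "search": 300, "reports": 300}

reserved = sum(reservations.values())       # 900 slots blocked permanently
unreserved_pool = ACCOUNT_QUOTA - reserved  # what everything else shares
print(reserved, unreserved_pool)  # 900 100
```

Every other Lambda in the account now competes for those 100 remaining slots, even while checkout, search, and reports sit idle.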
On a more nuanced level, AWS also imposes limits on the rate at which you can create new execution environments — 1000 every 10 seconds. If you receive more than 1000 requests within this timeframe, the extra requests will be queued, awaiting an available execution environment. While most of the time, the noticeable effect is increased latency, some requests might time out and get rejected. It’s essential to recognize and live with this hard limit.
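A coarse model shows how long a fully cold burst takes to absorb under that ramp, assuming each environment serves one concurrent request (both are simplifications):

```python
import math

RAMP_PER_WINDOW = 1000  # new execution environments allowed per window
WINDOW_SECONDS = 10

def seconds_to_absorb(concurrent_requests, warm_envs=0):
    # Windows needed until enough environments exist for the burst.
    missing = max(0, concurrent_requests - warm_envs)
    windows = math.ceil(missing / RAMP_PER_WINDOW)
    return windows * WINDOW_SECONDS

print(seconds_to_absorb(3500))                # 40: four 10 s windows
print(seconds_to_absorb(800))                 # 10: fits in the first window
print(seconds_to_absorb(500, warm_envs=500))  # 0: already absorbed
```

Requests arriving faster than the ramp sit in the queue, which is where the extra latency, and eventually the timeouts, come from.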
A few points of attention
When considering the migration of existing microservices to a Lambda model, several key factors demand attention. As discussed earlier, the time required to launch an execution environment and load your code can lead to unacceptable latency. Additionally, a Lambda package is limited in size (250 MB unzipped), so include only essential dependencies, test thoroughly, and closely monitor cold starts to ensure optimal performance.
Although AWS retains your execution environment for a period, the benefits of in-process caching are limited, given how quickly environments are disposed of. Opting for a distributed cache like Redis, which stays warm between requests, is preferable. A database connection pool is redundant, since each environment processes only one request at a time and needs just a single connection. The common practice is to route requests through a proxy that maintains the database connections, and AWS offers this for RDS instances through RDS Proxy.
It’s crucial to note that execution time is limited, even when running your Lambda asynchronously. Termination occurs after a specific duration (e.g. 15 minutes), and access to choices related to compute power, memory, or network placement is restricted. Lambda may not be suitable for high-performance computing (HPC), complex computations, or large data processing. In such cases, standard EC2 instances or specialized services become necessary.
In summary, Lambdas provide basic building blocks that cover most application requirements, but there are scenarios where the simplicity trade-off may not be viable. Replatforming existing Java microservices requires careful rewriting and testing to address the aforementioned points. Despite the existence of some bridges, automatic translation is not a wise assumption.
Conclusion
The Lambda serverless programming model proves highly advantageous for greenfield development, offering operational simplicity with AWS handling more compared to a container-based approach like EKS. In many cases, it can lead to reduced running costs, especially for applications with lighter usage, during testing, or in their initial stages.
The simplicity and productivity of the model, however, come at the expense of fine-grained control. It’s essential to comprehend and weigh these trade-offs. Migrating an existing application to this model is a non-trivial task. The process requires modernizing services to fully harness the benefits, and some services may not align well, necessitating specific solutions.