AWS Lambda Performance Best Practices

A checklist for Cloud Engineers to live by

Thilina Ashen Gamage
Platform Engineer
11 min read · Jan 26, 2024


In this blog, let’s discuss best practices for optimizing AWS Lambda performance and getting peak efficiency out of your Lambda deployments. Whether you’re just getting started with AWS or are already experienced, these insights will help you maximize your application's performance while leveraging AWS's cutting-edge capabilities.

✅ Choose an efficient runtime

When comparing cold start times, Python runtimes are the fastest, and Node.js runtimes are nearly as fast, while Java can be roughly 3 times slower than Python. The usual cure for Java’s slower cold starts is to allocate more memory, which can cost roughly 2 times more than an equivalent Python or Node.js function. Therefore, even though Lambda supports multiple runtimes, make sure that you pick an efficient runtime like Python or Node.js unless you have a really good reason not to. As a rule of thumb, interpreted languages (e.g. Node.js, Python) usually perform better for Lambda, even though in some special cases (e.g. warm, subsequent requests), compiled languages (e.g. Java) can perform better.

The impact of cold starts in Lambda varies depending on which runtime a function is written in. For instance, Node.js and Python functions experience the shortest cold start durations whereas Java functions experience the longest. Java’s average cold start duration — which is nearly three times as long as Python’s — is likely due to the time it takes to load the Java Virtual Machine (JVM) and libraries Java requires to run. Source: datadoghq.com
The amount of memory allocated to Lambda functions varies by runtime. This is likely because increasing the memory allocated to a Lambda function also increases the amount of CPU allocated to it, which helps to reduce cold start durations. Runtimes that already have the shortest cold start durations — such as Python and Node.js — typically have less memory allocated to them than Java. Source: datadoghq.com

✅ Mitigate cold starts

Implement strategies like provisioned concurrency, reserved concurrency, Lambda warmers, proactive initialization, and Lambda SnapStart to minimize the impact of cold starts and reduce latency.

Provisioned Concurrency vs. Reserved Concurrency: At t1, function-orange begins receiving requests. Since Lambda has pre-initialized 200 execution environment instances (i.e. provisioned concurrency), function-orange is ready for immediate invocation (i.e. no cold start latencies). At t2, function-orange uses up all its provisioned concurrency. function-orange can continue serving requests using reserved concurrency, but these requests may experience cold start latencies. At t3, function-orange reaches 400 concurrent requests. As a result, function-orange uses up all its reserved concurrency. Since function-orange cannot use unreserved concurrency, requests begin to throttle. At t4, function-orange starts to receive fewer requests, and no longer throttles. At t5, function-orange drops down to 200 concurrent requests, so all requests are again able to use provisioned concurrency (that is, no cold start latencies). Source: docs.aws.amazon.com
  • Provisioned concurrency: This allocates pre-initialized execution environments to your Function, ready for immediate response to incoming requests (see the configuration sketch below). However, be aware that configuring this feature incurs additional charges to your AWS account.
  • Reserved concurrency: This sets the maximum concurrent instances for your Function. Once configured, no other Function can use that concurrency. Importantly, no additional charges are associated with configuring reserved concurrency for a Function.
  • Lambda warmers: This is currently best achieved through the CloudWatch Events (now Amazon EventBridge) “ping” method and involves strategic practices to pre-warm Lambda Functions effectively. The key tips include not pinging more often than every 5 minutes, invoking the Lambda Function directly (without going through API Gateway), using a known dummy payload, and crafting handler logic that responds without executing the entire Function. To achieve concurrency, the same Function must be invoked multiple times with delayed executions, preventing the system from reusing the same container and ensuring optimal performance for concurrent Function instances.
  • Proactive initialization: For functions using unreserved (on-demand) concurrency, Lambda occasionally pre-initializes execution environments to reduce the number of cold start invocations. This does NOT mean you’ll never have a cold start again. Note that the proportion of true cold start initializations to proactive initializations varies depending on many factors, but it is observable in practice (for example, by comparing when an execution environment was initialized with when it served its first request).
  • Lambda SnapStart (Java only): With SnapStart, Lambda initializes your function when you publish a function version. Lambda takes a Firecracker microVM snapshot of the memory and disk state of the initialized execution environment, encrypts the snapshot, and caches it for low-latency access. When you invoke the function version for the first time, and as the invocations scale up, Lambda resumes new execution environments from the cached snapshot instead of initializing them from scratch, improving startup latency. Note that SnapStart supports Java 11 and later Java-managed runtimes.
Non-SnapStart function vs. SnapStart function: The time it takes to initialize the function, which is the predominant contributor to high startup latency, is replaced by a faster resume phase with SnapStart. Source: aws.amazon.com/blogs
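
For illustration, here is a minimal boto3 sketch that allocates the 200 pre-initialized environments from the provisioned concurrency scenario above. The function name and version qualifier are hypothetical placeholders; provisioned concurrency targets a published version or alias, never $LATEST.

import boto3

lambda_client = boto3.client("lambda")

# Pre-initialize 200 execution environments for a published version.
response = lambda_client.put_provisioned_concurrency_config(
    FunctionName="function-orange",      # hypothetical function name
    Qualifier="1",                       # published version number or alias
    ProvisionedConcurrentExecutions=200,
)
print(response["Status"])  # "ALLOCATING" until the environments are ready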

✅ Configure optimal memory

Memory size improves cold start time roughly linearly. Memory size also determines the amount of virtual CPU and other resources available to a function. Therefore, overprovisioning or underprovisioning memory can adversely affect both the performance and the cost of a Lambda Function. Analyze the actual memory requirements of your application code and assign only what’s required, plus a slight buffer (if you want to visualize and fine-tune the memory/power configuration of your Lambda functions, try a tool like aws-lambda-power-tuning).

The amount of memory also determines the amount of virtual CPU available to a function. Adding more memory proportionally increases the amount of CPU, increasing the overall computational power available. If a function is CPU-, network- or memory-bound, then changing the memory setting can dramatically improve its performance. Source: docs.aws.amazon.com, fourtheorem.com
Memory size improves cold start time linearly. The more memory allocated to your function, the smaller the cold start time — and the less standard deviation. Source: pluralsight.com
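
Once you have identified a suitable memory value (for example, with aws-lambda-power-tuning), applying it is a one-line configuration change. A minimal boto3 sketch, with a hypothetical function name and an illustrative 512 MB setting:

import boto3

lambda_client = boto3.client("lambda")

# CPU is allocated proportionally to memory, so this also changes
# the compute power available to the function.
lambda_client.update_function_configuration(
    FunctionName="my-function",  # hypothetical
    MemorySize=512,              # MB; use the value your tuning runs justify
)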

✅ Optimize Function package sizing

To enhance deployment speed and expedite function initialization, it’s essential to trim the size of your deployment packages. Removing redundant and unused code sections, along with unnecessary dependencies (e.g. remove dev/build/compile-time dependencies and keep only runtime dependencies in the final bundle), not only accelerates deployment but also contributes to faster function startup. Smaller package sizes reduce the download and unpacking time before invocation, particularly crucial for minimizing cold start delays (however, note that in some performance tests, it has been observed that when memory and other resources are configured to higher values, increasing the package size has less impact on overall performance).

For Node.js functions, consider minifying or uglifying your JavaScript code to shrink the package size (using a module bundler library like webpack). This practice significantly decreases the time needed to download the package. In certain instances, the package size could drop by 80–90%, offering substantial improvements in download efficiency.

For functions authored in Java or .NET Core, AWS’s recommendation is to refrain from including the entire AWS SDK library in your deployment package. Instead, selectively depend on the modules that contain the SDK components you need, such as the DynamoDB or Amazon S3 SDK modules and the Lambda core libraries. Also, improve Lambda’s unpacking speed for Java-authored packages by placing dependency .jar files in a separate /lib directory. This approach is more efficient than consolidating all your function’s code into a single jar with numerous .class files. Streamline dependencies by opting for simpler frameworks that load swiftly during execution environment startup. For instance, prioritize straightforward Java dependency injection (IoC) frameworks like Dagger or Guice over more complex options like the Spring Framework (if you prefer the Spring ecosystem, go for Spring Cloud Function rather than the Spring Boot web framework).

✅ Implement asynchronous invocations

Utilize asynchronous invocation for scalable processing, enabling the system to efficiently handle varying workloads without waiting for the completion of each function.

When you invoke a function asynchronously, you don’t wait for a response from the function code. You hand off the event to Lambda and Lambda handles the rest. You can configure how Lambda handles errors, and can send invocation records to a downstream resource such as SQS or EventBridge to chain together components of your application. For asynchronous invocation, Lambda places the event in a queue and returns a success response without additional information. A separate process reads events from the queue and sends them to your function. Source: docs.aws.amazon.com
# To invoke a function asynchronously, 
# set the invocation type parameter to Event.

aws lambda invoke \
--function-name my-function \
--invocation-type Event \
--cli-binary-format raw-in-base64-out \
--payload '{ "key": "value" }' response.json

✅ Configure concurrent executions

Configure the concurrency settings of your functions to control the number of simultaneous executions, especially for critical workflows. This helps prevent throttling due to resource exhaustion in scenarios where multiple invocations could otherwise overload the system.

For critical functions, configure the concurrency settings. Source: dev.to/kelvinskell
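
Reserving concurrency for a critical function is a single API call. A minimal boto3 sketch with a hypothetical function name and an illustrative limit:

import boto3

lambda_client = boto3.client("lambda")

# Guarantee this function up to 100 concurrent executions; that capacity
# is carved out of the account pool and no other function can use it.
lambda_client.put_function_concurrency(
    FunctionName="my-critical-function",  # hypothetical
    ReservedConcurrentExecutions=100,
)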

✅ Leverage batch processing

For scenarios involving multiple items, instead of triggering Lambda functions individually for each item, aggregate the items into batches (based on counts or task types) first and then pass each batch to the function for processing. This significantly reduces the overhead associated with individual function invocations. This approach is particularly beneficial when dealing with large datasets or repetitive tasks.

Image: jeremydaly.com
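
For example, with an SQS event source, each invocation already delivers a batch of records. A minimal Python handler sketch that processes such a batch and reports partial failures (process_item is a hypothetical business function, and per-item failure reporting assumes ReportBatchItemFailures is enabled on the event source mapping):

import json

def handler(event, context):
    failures = []
    # One invocation handles a whole batch of records, not a single item.
    for record in event["Records"]:
        try:
            process_item(json.loads(record["body"]))  # hypothetical logic
        except Exception:
            # Report only the failed message so the rest of the batch
            # is not redelivered.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}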

✅ Leverage Lambda layers

Leveraging Lambda Layers for shared code (such as common logic, libraries, and dependencies) promotes efficient code reuse across multiple functions. This helps reduce the size of individual function packages, speeds up cold starts, and streamlines maintenance (fewer dependencies, clear versioning and updates).

To utilize Lambda layers for improving performance, you create a layer and associate it with one or more Lambda functions. The layer is then automatically included in the execution environment of those functions. Source: aws.amazon.com
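
A minimal sketch of publishing a layer and attaching it to a function with boto3. The names and the zip file are hypothetical; for Python runtimes, the archive must place libraries under a top-level python/ directory.

import boto3

lambda_client = boto3.client("lambda")

with open("shared-libs.zip", "rb") as f:
    layer = lambda_client.publish_layer_version(
        LayerName="shared-libs",            # hypothetical layer name
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.12"],
    )

# Attach the published layer version to a function.
lambda_client.update_function_configuration(
    FunctionName="my-function",             # hypothetical
    Layers=[layer["LayerVersionArn"]],
)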

✅ Implement caching strategies

Implement caching mechanisms for frequently accessed data to reduce the need for repetitive computations and costly database queries. This can significantly enhance function performance, especially in scenarios where certain data remains relatively unchanged (e.g. system configs, reference data, enum values).

In this example, Lambda checks the key in its in-memory Redis Cluster and there could be two scenarios — a. Cache Miss: If the key is not present inside the Redis Cluster, then Lambda Function will query the same from the External Data Source. Once the Lambda Function has successfully queried the data, the Redis Cluster will be updated with user-defined TTL and returned to the API Gateway. b. Cache Hit: Redis Cluster will return the values to the Lambda Function, and then Lambda will pass it back to the API Gateway. Source: medium.com/@sarkarpranab66
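
Even without an external cache like the Redis cluster in the figure above, module-level state survives between warm invocations of the same execution environment and can serve as a small in-process cache. A minimal Python sketch with a TTL (fetch_from_source is a hypothetical slow lookup):

import time

_cache = {}        # note: each execution environment has its own copy
TTL_SECONDS = 300  # illustrative TTL for slowly changing reference data

def get_cached(key):
    entry = _cache.get(key)
    if entry and time.time() - entry["at"] < TTL_SECONDS:
        return entry["value"]           # cache hit
    value = fetch_from_source(key)      # cache miss: hypothetical lookup
    _cache[key] = {"value": value, "at": time.time()}
    return value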

✅ Choose an established runtime version

AWS already has its own internal caching mechanisms built into the Lambda platform; however, from time to time, these caching configurations are adjusted based on varying workloads and usage. In particular, newer runtimes may take some time to reach peak caching performance. Therefore, for critical workloads, it is always recommended to pick a stable and mature runtime over a recently announced shiny new one.

Newer runtimes can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. Source: aws.amazon.com/blogs

✅ Leverage stateless and ephemeral design

Designing functions to be stateless (they do not maintain a central state and hence do not share data between requests/sessions) and ephemeral (built to live for a very short time, 15 minutes at most) simplifies their execution and management. Stateless functions are easier to replicate and scale horizontally, allowing for better performance in scenarios with fluctuating workloads. The absence of a persistent state makes it easy to distribute tasks across multiple instances for streamlined processing without worrying about interdependencies. Also, the ephemeral nature ensures the robustness and reliability of the system by preventing the accumulation of issues over prolonged execution periods.

✅ Share global/static variables, singleton objects, HTTP and DB connections between subsequent invocations

Where it is necessary, make use of global/static variables and singleton objects, since it is more efficient to keep them alive and share them between subsequent invocations until the instance goes down than to re-initialize them for every request. Set up HTTP and database connections at the global level to reuse them in future invocations (for instance, in the Node.js + MongoDB/DocumentDB case, creating the MongoClient object at the global level and setting the callbackWaitsForEmptyEventLoop property on the AWS Lambda Context object to false allows a Lambda function to reuse the DB connection across invocations).
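
A Python analog of this pattern might look like the following minimal sketch (pymongo against a hypothetical DocumentDB connection string; Python has no callbackWaitsForEmptyEventLoop equivalent to worry about, so creating the client at module level suffices):

import os
import pymongo

# Created once during environment initialization and reused by every
# warm invocation; DOCDB_URI is a hypothetical environment variable.
client = pymongo.MongoClient(os.environ["DOCDB_URI"])
db = client["app"]

def handler(event, context):
    # Reuses the already-open connection instead of reconnecting.
    return db.users.find_one({"_id": event["userId"]}, {"_id": 0})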

✅ Set up auto-scaling

Configuring auto-scaling settings allows your functions to dynamically adjust the number of concurrent executions based on demand. This flexibility ensures optimal resource utilization and responsiveness to varying workloads.

Scaling Lambda functions with SQS: When a Lambda function subscribes to an SQS queue, Lambda polls the queue as it waits for messages to arrive. Lambda consumes messages in batches, starting at 5 concurrent batches with 5 functions at a time. If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages. This means that Lambda can scale up to 1,000 concurrent Lambda functions processing messages from the SQS queue. In this scenario, Lambda can freely auto-scale and process data independently due to its stateless nature. Source: aws.amazon.com/blogs
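
When subscribing a function to a queue, you can also cap how far this auto-scaling goes. A minimal boto3 sketch with hypothetical ARNs and names:

import boto3

lambda_client = boto3.client("lambda")

# Subscribe the function to an SQS queue and cap scale-out while polling.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:orders-queue",
    FunctionName="my-function",
    BatchSize=10,
    ScalingConfig={"MaximumConcurrency": 50},  # allowed range is 2-1000
)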

✅ Implement connection pooling

Implementing connection pooling helps in reusing database connections across multiple function invocations, reducing the overhead of establishing new connections each time. This is particularly important for functions with frequent interactions with databases.

AWS RDS Proxy acts as a proxy layer between your application and Amazon RDS instances. It efficiently manages database connections by pooling and sharing them across many clients. This reduces the overhead of creating new connections for each database request, resulting in improved application performance and reduced database resource consumption, and it protects the database from connection storms caused by highly concurrent Lambda functions. Source: medium.com/@MicDiogo
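
From the function’s point of view, using RDS Proxy is transparent: you connect to the proxy endpoint instead of the database endpoint and keep the connection global. A minimal sketch with pymysql and hypothetical environment variable names:

import os
import pymysql

# The proxy multiplexes many Lambda environments onto a small pool of
# real database connections; endpoint/credential names are hypothetical.
connection = pymysql.connect(
    host=os.environ["PROXY_ENDPOINT"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    database="app",
)

def handler(event, context):
    with connection.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM orders")
        (count,) = cursor.fetchone()
    return {"orders": count}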

✅ Configure timeouts

Setting appropriate timeouts ensures that your functions don’t consume unnecessary resources. It’s essential to strike a balance between allowing sufficient time for execution and avoiding prolonged resource occupation in case of failures. Timeouts force developers to implement other performance best practices like:

  • Decouple long-running tasks into async processes,
  • Minimize and optimize costly I/O operations,
  • Minimize external dependencies and waiting times,
  • Implement caching mechanisms, and
  • Leverage concurrency.
Lambda has a 15-minute time limit, meaning a Lambda function will automatically stop after running for that period. The default API Gateway timeout is 30 seconds, but you can adjust it to your needs. However, nobody wants to wait more than 30 seconds to get a response. Source: community.aws/posts
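
Setting the timeout itself is a one-line configuration change. A minimal boto3 sketch with a hypothetical function name and an illustrative 10-second limit:

import boto3

lambda_client = boto3.client("lambda")

# A tight timeout surfaces stuck I/O quickly instead of billing for
# idle waiting; 900 seconds (15 minutes) is the hard maximum.
lambda_client.update_function_configuration(
    FunctionName="my-function",  # hypothetical
    Timeout=10,                  # seconds
)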

✅ Optimize logging

Streamlining logging practices by focusing on essential information minimizes the volume of data transferred and stored. This reduces both the associated costs and the impact on function performance.

In a normal situation, logging won’t create any noticeable performance issues. However, every additional operation an application performs will add up to some overheads. For instance, for every Lambda log message you create on CloudWatch, it will add ~70 bytes of metadata ingested with it — the timestamp and request ID. A single short message of 50 characters plus metadata gives us 1 GB of data for 8M logs. If you have 9 messages like this in a function that is called 10 times a second, it produces 1 GB of logs daily. This is $15 per month — plus definitely some performance impact. Enable more log levels or add more log lines — the numbers will increase and the performance will gradually deteriorate. Source: docs.aws.amazon.com, betterdev.blog
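
A simple way to keep log volume in check is to gate verbose output behind a log level controlled through an environment variable. A minimal Python sketch (LOG_LEVEL is a hypothetical variable name):

import logging
import os

logger = logging.getLogger()
# Default to WARNING in production; switch to DEBUG per environment
# without redeploying code.
logger.setLevel(os.environ.get("LOG_LEVEL", "WARNING"))

def handler(event, context):
    logger.debug("full event: %s", event)   # emitted only when debugging
    logger.warning("unexpected condition")  # always emitted
    return {"ok": True}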

✅ Optimize invocation payload

Minimizing the size of invocation payloads is essential for reducing network latency through faster data transfers and enhancing the overall efficiency of your functions through faster processing. This is particularly important when dealing with frequent function invocations or scenarios where data transfer and processing times contribute significantly to overall execution time.

Source: docs.aws.amazon.com

✅ Optimize networking

Minimizing external dependencies and optimizing networking configurations contribute to reducing latency in function execution. This is particularly important for functions that rely on external services or data sources.

In terms of network design, note that configuring ENIs (Elastic Network Interfaces) takes a considerable amount of time and contributes to the delay in cold starts — therefore consider sticking to the default network environment unless you require a VPC resource with a private IP (note that AWS has been making substantial improvements in how ENIs connect to Lambda environments with customer VPCs, enhancing overall performance).

If a Lambda operates within a VPC and communicates with an AWS resource, avoid public DNS resolution, as it can be quite time-consuming. For instance, if your Lambda function interacts with an Amazon RDS DB instance in your VPC, consider launching the instance with the non-publicly accessible option to streamline the process.

✅ Apply regular updates

Staying updated with AWS Lambda releases and incorporating new features or optimizations is essential for continually enhancing performance. Regular updates ensure that your functions benefit from the latest improvements and stay aligned with best practices in the rapidly evolving serverless landscape.

✅ Conduct thorough testing

Conducting thorough performance testing with realistic workloads is crucial for identifying bottlenecks and optimizing accordingly. This ensures that your functions can handle the expected load efficiently.

✅ Leverage environment variables

Use environment variables for configurable parameters, facilitating easy scaling adjustments and operational improvements without modifying or redeploying code.
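
A minimal sketch of the pattern; the variable names and defaults are hypothetical:

import os

# Tunables read once at initialization; adjust them in the function
# configuration without modifying or redeploying code.
BATCH_SIZE = int(os.environ.get("BATCH_SIZE", "25"))
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.example.com")

def handler(event, context):
    return {"batch_size": BATCH_SIZE, "endpoint": API_BASE_URL}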

✅ Set up monitoring tools to measure performance

If you can’t measure performance, you can’t improve it. Simple as that. Monitoring tools like CloudWatch help you promptly detect and address performance issues of Lambda functions and other AWS services natively. Implement monitoring for invocation counts, memory usage, concurrency stats, and usage costs. You can also build graphs and dashboards on CloudWatch with these metrics. Additionally, set alarms to proactively respond to changes in utilization, performance, or error rates.

Lambda sends metric data to CloudWatch in 1-minute intervals. For more immediate insight into your Lambda function, you can create high-resolution custom metrics as well.
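
A minimal sketch of publishing such a high-resolution custom metric from inside a handler with boto3 (the namespace, metric name, and do_work helper are hypothetical):

import boto3

cloudwatch = boto3.client("cloudwatch")

def handler(event, context):
    items_processed = do_work(event)  # hypothetical business logic
    # StorageResolution=1 publishes a 1-second-resolution metric,
    # complementing Lambda's standard 1-minute metrics.
    cloudwatch.put_metric_data(
        Namespace="MyApp/Lambda",
        MetricData=[{
            "MetricName": "ItemsProcessed",
            "Value": items_processed,
            "Unit": "Count",
            "StorageResolution": 1,
        }],
    )
    return {"processed": items_processed}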

Read more about CloudWatch support for monitoring Lambda metrics:

> Monitoring functions on the Lambda console: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-access-metrics.html

> Working with Lambda function metrics: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html

> Using Lambda Insights in Amazon CloudWatch: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-insights.html

CloudWatch Lambda Insights is an extension of CloudWatch Logs Insights. It is built specifically for serverless applications running on Lambda. It collects CPU, memory, disk, and other infra-related resource usage and aggregates them to show data points. It also provides information for cold starts and Lambda instance-related issues. (Source: lumigo.io)

Conclusion

By incorporating these performance best practices and techniques, you can elevate the speed and efficiency of your Lambda applications to a whole new level. Over the past years, the AWS team has been super innovative and has released a plethora of new features to optimize resource utilization, reduce latency, and ensure a seamless user experience on the Lambda ecosystem. With continuous learning and a proactive approach to implementing AWS Lambda performance best practices, you can maximize the value of your AWS investment and achieve your business objectives.

Stay tuned for the next AWS tip. Until then, happy coding!
