AWS Lambda Storage

Joud W. Awad
9 min read · Nov 19, 2024

When working with AWS Lambda functions, you often need files that are not present in your Lambda execution environment. In such cases, you typically download them from an external service like Amazon S3 for processing.

Lambda is serverless, and its execution environments are ephemeral by design. This raises the question: what are the possible ways to access storage in Lambda?

In this blog post, we will review all the available methods to access storage in AWS Lambda, discuss when to use each of them, and provide a comparison of these methods at the end.

Ephemeral storage with Lambda /tmp

AWS Lambda provides ephemeral storage for functions in the /tmp directory. This storage is temporary and unique to each execution environment. You can control the amount of ephemeral storage allocated to your function using the Ephemeral storage setting, configurable between 512 MB and 10,240 MB, in 1-MB increments. All data stored in /tmp is encrypted at rest with a key managed by AWS.

How It Works

[Figure: /tmp architecture]

The /tmp architecture involves an execution environment, which is a Firecracker microVM provisioned on a Lambda worker (refer to this article for full details on Lambda architecture). Each execution environment handles a single request at a time. Once a request is finished, the same execution environment can handle another one. This lifecycle lets the execution environment reuse its previous runtime context (initialized libraries, data stored in /tmp).

However, each execution environment has its own dedicated /tmp directory. This means that execution environment 1 cannot access the /tmp of execution environment 2 and vice versa. A single execution environment can reuse its downloaded content (unless it has gone through the Shutdown phase or its Lambda worker lease has expired), but scaled-out instances cannot share the same files. If you need shared files, use EFS or S3.

Accessing data in Lambda ephemeral storage does not require a network request, making it faster and cheaper than accessing data in object storage like S3, where every read is a network call.
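The reuse behavior described above can be sketched as a small helper that downloads a file into /tmp only when the current execution environment does not have it yet. This is a minimal sketch, not a definitive implementation: `ensure_local_copy`, the bucket name `my-bucket`, and the key `models/model.bin` are hypothetical, and the handler assumes boto3 is available in the runtime.

```python
import os
import tempfile
from typing import Callable


def ensure_local_copy(local_path: str, fetch: Callable[[str], None]) -> bool:
    """Call fetch(destination) only if local_path is missing.

    Returns True when the file was fetched, False when a warm
    execution environment already had it.
    """
    if os.path.exists(local_path):
        return False
    directory = os.path.dirname(local_path) or "."
    os.makedirs(directory, exist_ok=True)
    # Fetch to a temporary name first so a failed download never
    # leaves a half-written file at local_path.
    fd, partial_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        fetch(partial_path)
        os.replace(partial_path, local_path)
    finally:
        if os.path.exists(partial_path):
            os.remove(partial_path)
    return True


def handler(event, context):
    # Hypothetical bucket and key, for illustration only.
    import boto3  # imported lazily so the helper stays usable without AWS

    s3 = boto3.client("s3")
    fetched = ensure_local_copy(
        "/tmp/model.bin",
        lambda dest: s3.download_file("my-bucket", "models/model.bin", dest),
    )
    return {"downloaded_this_invocation": fetched}
```

On a cold start the helper pays for one download; later invocations in the same execution environment read the file straight from /tmp.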

Common use cases for ephemeral storage

Here are several common use cases that benefit from increased ephemeral storage:

  • Extract-Transform-Load (ETL) Jobs: Increase ephemeral storage when your code performs intermediate computation or downloads other resources to complete processing. More temporary space enables more complex ETL jobs to run in Lambda functions.
  • Machine Learning (ML) Inference: Many inference tasks rely on large reference data files, including libraries and models. With more ephemeral storage, you can download larger models from Amazon S3 to /tmp and use them in your processing.
  • Data Processing: For workloads that download objects from Amazon S3 in response to S3 events, more /tmp space makes it possible to handle larger objects without using in-memory processing. Workloads that create PDFs or process media also benefit from more ephemeral storage.
  • Graphics Processing: Image processing is a common use case for Lambda-based applications. For workloads that process large TIFF files or satellite images, more ephemeral storage makes it easier to use libraries and perform the computation in Lambda.

Configuring ephemeral storage

AWS Lambda supports ephemeral storage from 512 MB up to 10 GB. The first 512 MB is included at no extra cost; anything you configure beyond that incurs additional charges, so set the value based on what your function actually needs.
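As a rough sketch of how this setting can be changed programmatically, the snippet below builds the arguments for boto3's `update_function_configuration` call and validates the size against the 512 MB to 10,240 MB range mentioned earlier. The helper names and the function name `my-function` are hypothetical; the console, CLI, or infrastructure-as-code tools work equally well.

```python
MIN_EPHEMERAL_MB = 512
MAX_EPHEMERAL_MB = 10240


def ephemeral_storage_update(function_name: str, size_mb: int) -> dict:
    """Build keyword arguments for update_function_configuration."""
    if not isinstance(size_mb, int) or not (
        MIN_EPHEMERAL_MB <= size_mb <= MAX_EPHEMERAL_MB
    ):
        raise ValueError(
            f"size_mb must be an integer between {MIN_EPHEMERAL_MB} "
            f"and {MAX_EPHEMERAL_MB}, got {size_mb!r}"
        )
    return {
        "FunctionName": function_name,
        "EphemeralStorage": {"Size": size_mb},
    }


def apply_update(kwargs: dict) -> dict:
    """Send the update to AWS (requires boto3 and valid credentials)."""
    import boto3

    return boto3.client("lambda").update_function_configuration(**kwargs)


# Example with a hypothetical function name:
# apply_update(ephemeral_storage_update("my-function", 2048))
```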

Keep in mind that due to how Lambda architecture works, AWS destroys Lambda workers approximately every 14 hours. This means that even if you have an instance warmed with files in /tmp, it can lose its content when this operation happens. For more information on the Lambda worker lifecycle, please refer to my other blog post.

Amazon S3

Amazon S3 is a highly scalable object storage service known for its impressive durability and availability. It is particularly well-suited for storing unstructured data such as images, media files, logs, and sensor data.

Understanding S3 Object Storage

Unlike traditional file systems, S3 uses a flat namespace. There are no real directories; you simulate folders by prefixing the key name with folderName/. S3 supports versioning, but you cannot append data to an existing object in a standard bucket: to change an object, you must upload a new copy (or a new version) of the whole object.
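To make the "no append" point concrete, here is a minimal sketch of what appending to an S3 object looks like in practice: read the whole object, add the new bytes, and write the whole object back. The helper names are hypothetical, `s3_client` is assumed to be a boto3 S3 client, and the object is assumed to exist already. The read-modify-write is not atomic, so concurrent writers can overwrite each other.

```python
def object_key(folder: str, name: str) -> str:
    """A 'folder' in S3 is only a prefix on the object key."""
    return f"{folder.strip('/')}/{name}"


def append_to_object(s3_client, bucket: str, key: str, extra: bytes) -> int:
    """Emulate an append by rewriting the whole object.

    Returns the size in bytes of the object that was uploaded.
    """
    current = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    updated = current + extra
    s3_client.put_object(Bucket=bucket, Key=key, Body=updated)
    return len(updated)
```

Every "append" therefore costs a full download plus a full upload, which is why growing files are usually a better fit for a real file system.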

S3 & Lambda Integration

We saw earlier that Lambda’s ephemeral storage is a good place for quick, on-demand access to data that is already present in the /tmp directory (and that fits within the 10 GB limit). However, to get the data there, we first need durable storage to hold it, and then load it into /tmp for faster processing. Data in /tmp also cannot be changed by other services or clients. For example, if your Lambda function relies on a file that changes frequently, keeping it only in /tmp does not make sense: external services cannot update it, and only the execution environment that owns that /tmp can.

[Figure: S3 & Lambda for frequently changed objects]

Thus, S3 is a good place to store data that changes frequently. You can use the AWS SDK’s S3 client to read it on each invocation. Even for data that changes infrequently, S3 can serve as the durable source: store the data there, then load it into your Lambda /tmp to save costs and network overhead and improve performance.

[Figure: S3 & Lambda for infrequently accessed objects]
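One way to combine the two approaches is to keep a copy in /tmp and re-download it only when the object in S3 has changed. The sketch below compares the object's ETag (returned by `head_object`) with the one saved next to the local copy. The helper name and the `.etag` sidecar file are assumptions made for illustration, and `s3_client` is assumed to be a boto3 S3 client.

```python
import os


def refresh_if_changed(s3_client, bucket: str, key: str, local_path: str) -> bool:
    """Download the object only when its ETag differs from the cached one.

    Returns True if a download happened, False if the local copy was reused.
    """
    etag_path = local_path + ".etag"
    remote_etag = s3_client.head_object(Bucket=bucket, Key=key)["ETag"]

    cached_etag = None
    if os.path.exists(local_path) and os.path.exists(etag_path):
        with open(etag_path, encoding="utf-8") as f:
            cached_etag = f.read()

    if cached_etag == remote_etag:
        return False

    s3_client.download_file(bucket, key, local_path)
    # Remember which version of the object the local copy corresponds to.
    with open(etag_path, "w", encoding="utf-8") as f:
        f.write(remote_etag)
    return True
```

This still costs one lightweight HEAD request per call, but it avoids transferring the full object when nothing has changed.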

Lambda Layers

A Lambda Layer is a .zip file archive containing supplementary code or data. Layers typically hold libraries, custom runtimes, or configuration files, allowing you to manage dependencies more effectively.

However, the data stored in a Lambda Layer is static: changing it means publishing a new layer version and updating the function to use it. You can attach up to 5 layers to a function. Each layer is limited to 50 MB when uploaded directly as a .zip file, and the function plus all of its layers cannot exceed 250 MB unzipped.

Thus, Lambda Layers are a good choice for custom configuration files, libraries, custom code, etc., but they are not suitable for storing data objects and files because of these size limits and the need to publish a new version for every change. For the same reasons, they are a poor fit for dynamically changing content.
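As a small illustration of the configuration-file use case: Lambda extracts layer contents under /opt, so a function can read a file shipped in a layer like any other local file. The helper name and the file path `config/settings.json` are hypothetical.

```python
import json
import os

LAYER_ROOT = "/opt"  # layer contents are extracted under this directory


def load_layer_config(relative_path: str, root: str = LAYER_ROOT) -> dict:
    """Read a JSON configuration file that was packaged in a layer."""
    with open(os.path.join(root, relative_path), encoding="utf-8") as f:
        return json.load(f)


# Inside a function with the layer attached (hypothetical path):
# settings = load_layer_config("config/settings.json")
```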

If you are interested in reading more about Lambda Layers, I have a full blog post dedicated to it with a project. Please refer to it here.

Elastic File System (EFS)

Amazon EFS is a fully managed, elastic, shared file system that integrates with other AWS services. It is a durable storage option that offers high availability. You can now mount EFS volumes in Lambda functions, making it simpler to share data across invocations. The file system grows and shrinks as you add or delete data, so you do not need to manage storage limits.

The biggest difference between /tmp and EFS is that EFS is durable, highly available, elastic in size, and shared across execution environments.

The Lambda service mounts EFS file systems when the execution environment is prepared. This happens in parallel with other initialization operations, so it typically does not impact cold start latency. If the execution environment is warm from previous invocations, the mount is already prepared. To use EFS, your Lambda function must be in the same VPC as the file system.

EFS enables new capabilities for serverless applications. The file system is a dynamic binding for Lambda functions, unlike layers. This makes it useful for deploying code libraries where you want to always use the latest version. You configure the mount path when integrating the file system with your function, and then include packages from this location. Additionally, you can use this to include packages that exceed the limits of layers.

Due to its speed and support of standard file operations, EFS is also useful for ingesting or writing large numbers of files durably. This can be helpful for zipping or unzipping large archives, for example. For appending to existing files, EFS is also a preferred option to using S3. Since EFS is a file system, you can append to existing files (unlike S3, where a new version of a whole object gets created).
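Because EFS behaves like a regular file system, an append is an ordinary file operation. The sketch below appends one JSON line per call to a file under the mount path; the helper name and the path `/mnt/data/events.log` are hypothetical. It does not coordinate concurrent writers, so treat it as a single-writer example.

```python
import json
import os


def append_event(log_path: str, record: dict) -> int:
    """Append one JSON line to log_path and return the file size in bytes."""
    directory = os.path.dirname(log_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    line = json.dumps(record, sort_keys=True) + "\n"
    # Mode "a" adds to the end of the file without rewriting existing content.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())
    return os.path.getsize(log_path)


# Inside a function with EFS mounted (hypothetical path):
# append_event("/mnt/data/events.log", {"status": "processed"})
```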

Lambda Configurations With EFS

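To access EFS, your Lambda function must be attached to a VPC. Its subnets need network access to an EFS mount target, so in practice you create a mount target in each Availability Zone the function’s subnets use and allow NFS traffic (port 2049) between the function’s security group and the mount target’s security group.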

To mount an EFS directory in a Lambda function, you create an “Access Point.” The access point is the entry point between the path mounted in your Lambda function (for example, /mnt/video) and your EFS directory, allowing your function to read and write data in it. A single file system can have multiple access points, each pointing to a different directory.
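A minimal sketch of wiring an access point to a function with boto3 is shown below. It builds the `FileSystemConfigs` argument for `update_function_configuration` and checks that the local mount path sits under /mnt/, which is my understanding of what the Lambda API expects. The helper names, the function name, and the access point ARN are hypothetical placeholders.

```python
import re

# Assumption: Lambda requires the local mount path to be a directory under /mnt/.
MOUNT_PATH_PATTERN = re.compile(r"^/mnt/[A-Za-z0-9_.-]+$")


def efs_mount_update(
    function_name: str, access_point_arn: str, local_mount_path: str
) -> dict:
    """Build keyword arguments that attach an EFS access point to a function."""
    if not MOUNT_PATH_PATTERN.match(local_mount_path):
        raise ValueError(
            f"local_mount_path must look like /mnt/<name>, got {local_mount_path!r}"
        )
    return {
        "FunctionName": function_name,
        "FileSystemConfigs": [
            {"Arn": access_point_arn, "LocalMountPath": local_mount_path}
        ],
    }


def apply_mount(kwargs: dict) -> dict:
    """Send the update to AWS (requires boto3, credentials, and a VPC-attached function)."""
    import boto3

    return boto3.client("lambda").update_function_configuration(**kwargs)
```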

Why would you use Elastic File System (EFS) with AWS Lambda?

Consider a scenario where you’re building a serverless video processing application. In this application, you might have to write some large files and share these large files between Lambda functions. Some Lambda functions will be writing files, and other Lambda functions will be reading the same files for further processing or to store metadata in a database.

You may think of writing to the /tmp folder, which offers up to 10 GB of storage. However, /tmp is ephemeral: you may lose your data between invocations, and data written by one execution environment is not visible to other execution environments or other functions.

In these cases, you can use EFS, which is a serverless, elastic file system that grows and shrinks as you add or remove files. EFS would also be a better fit if you want a durable file system that may be used by other services in your application ecosystem.
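The writer/reader scenario above could be sketched as follows: one function writes a file into a shared EFS directory, and another lists the files that are ready for processing. Writing under a temporary name and renaming at the end is one way to keep readers from picking up half-written files; how strictly that holds on your setup is worth verifying. The helper names, the `.partial-` prefix, and the directory `/mnt/video/ready` are all hypothetical.

```python
import os
import tempfile

PARTIAL_PREFIX = ".partial-"


def publish_file(shared_dir: str, name: str, data: bytes) -> str:
    """Writer side: store data in shared_dir and return the final path."""
    os.makedirs(shared_dir, exist_ok=True)
    fd, partial_path = tempfile.mkstemp(dir=shared_dir, prefix=PARTIAL_PREFIX)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    final_path = os.path.join(shared_dir, name)
    # Rename only after the content is fully written.
    os.replace(partial_path, final_path)
    return final_path


def list_published(shared_dir: str) -> list:
    """Reader side: return finished files, skipping in-progress ones."""
    return sorted(
        entry
        for entry in os.listdir(shared_dir)
        if not entry.startswith(PARTIAL_PREFIX)
    )


# Writer function:  publish_file("/mnt/video/ready", "clip-001.mp4", video_bytes)
# Reader function:  for name in list_published("/mnt/video/ready"): ...
```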

[Figure: EFS architecture with Lambda]

Full Comparison

After reviewing all the possible ways to store and interact with data in Lambda, let’s do a full comparison of the four available options.

[Figure: Comparison table]

Complementary Roles

In conclusion, choosing the right storage option for your AWS Lambda functions depends on your specific needs and project requirements. Each storage type has its strengths and ideal use cases:

  • Ephemeral Storage (/tmp): Best for temporary data that requires fast access and does not need to persist between invocations.
  • Amazon S3: Ideal for long-term storage of frequently changing data, with high durability and availability.
  • Lambda Layers: Suitable for static libraries and configuration files that do not change frequently.
  • Elastic File System (EFS): Perfect for shared data, large files, and dynamic content that requires full file operations and persistence.

You may need a fast way to access data that does not change frequently, or you may require very fast access within a concurrent environment. In some cases, combining multiple storage options can provide the best solution. For example, you can use S3 or EFS for durable storage and load data into /tmp for faster processing, reducing overall network requests and costs.

Always consider your project’s specific requirements and constraints when selecting a storage solution. By leveraging the strengths of each storage type, you can optimize performance, cost, and scalability for your serverless applications.

Conclusion

In this article, we explored the various storage options available for AWS Lambda functions, including Ephemeral Storage (/tmp), Amazon S3, Lambda Layers, and Elastic File System (EFS). Each storage type offers unique benefits and is suited for different use cases, from temporary fast-access storage to durable, scalable solutions for large and frequently changing data.

Choosing the right storage option depends on your specific project requirements, such as data persistence, access speed, and cost considerations. In some scenarios, combining multiple storage options can provide an optimal solution, balancing performance and cost-effectiveness.

By understanding the strengths and limitations of each storage type, you can make informed decisions to enhance the efficiency and scalability of your serverless applications. Always evaluate your needs and leverage the appropriate storage solutions to achieve the best results for your AWS Lambda functions.

Follow Me For More Content

If you made it this far and want more content like this, follow me on Medium and on LinkedIn.

Written by Joud W. Awad

Principal Software Engineer and Solutions Architect with 10+ years in backend, AWS Cloud, DevOps, and mobile apps.