Unlock New Potentials with Larger Ephemeral Storage in AWS Lambda

Published in Slalom Build · 6 min read · Feb 22, 2022

AWS Lambda is an on-demand compute service that powers many serverless applications. Lambda functions are ephemeral, with execution environments that only exist for a brief time when the function is invoked. Many compute operations need access to external data for a variety of purposes. This includes importing third-party libraries, accessing machine learning models, or exporting the output of the compute operation.

Lambda provides a comprehensive range of storage options to meet web application developers’ needs. These include other AWS services such as Amazon S3 and Amazon EFS or native options like temporary storage.

Temporary storage provides a file system for your code to use at “/tmp”. The same Lambda execution environment may be reused by multiple Lambda invocations to optimize performance. The “/tmp” area is preserved for the lifetime of the execution environment and provides a transient cache for data between invocations. Each new execution environment starts with an empty “/tmp”, so nothing stored there survives beyond the environment that wrote it. Consequently, this area is intended as ephemeral storage.
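To make the "transient cache" idea concrete, here is a minimal sketch of a helper that reuses a file left behind by an earlier invocation and only downloads it when the environment is cold. The class name TmpCache, the method getOrFetch and the Supplier-based source are hypothetical and not part of the sample project; they are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Supplier;

/** Hypothetical helper: reuses a file left in the temp directory by an earlier invocation. */
final class TmpCache {

    private TmpCache() {}

    /**
     * Returns the cached file if a previous invocation in the same execution
     * environment already stored it; otherwise streams it from the source.
     */
    static Path getOrFetch(Path cacheDir, String name, Supplier<InputStream> source) throws IOException {
        Files.createDirectories(cacheDir);
        Path target = cacheDir.resolve(name);
        if (Files.isRegularFile(target)) {
            return target; // warm environment: reuse the earlier download
        }
        // Cold environment: write to a temporary file first so that a failed
        // download never leaves a truncated file under the final name.
        Path partial = Files.createTempFile(cacheDir, name, ".part");
        try (InputStream in = source.get()) {
            Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
            Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(partial); // no-op once the move has succeeded
        }
        return target;
    }
}
```

In a handler, the call might look like TmpCache.getOrFetch(Path.of("/tmp"), "data-file", () -> s3.getObject(getObjectRequest)), where the supplier wraps whatever S3 call the function already makes. Because the copy is streamed, the object never has to fit in the function's heap.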

Until now, this space had a fixed size of 512 MB. This limited the use cases for which Lambda could be utilized, specifically data-intensive workloads that deal with large files, such as media processing, machine learning and financial analysis. With the introduction of the new “Ephemeral storage” parameter, one can now create Lambda functions with up to 10 GB of “/tmp” space. In this post we’ll demonstrate how to use it via the AWS Serverless Application Model (SAM).

Code and AWS SAM Template

Our deployment stack consists of a Lambda function written in Java (Corretto 11), an API allowing us to call the function and an S3 bucket. The function, whenever triggered, downloads a large file from the bucket and stores it in “/tmp” under a unique name. Subsequently it calculates and returns the size of that directory. This allows us to observe the currently occupied space and whether the storage has reached its limit with each execution.

private void getFileFromS3() throws IOException {
   final var s3 = S3Client.builder().build();
   final var getObjectRequest = GetObjectRequest.builder()
      .bucket(BUCKET_NAME)
      .key(FILE_NAME)
      .build();

   final var response = s3.getObject(getObjectRequest);
   final var fileName = new StringBuilder(DIR_PATH).append("/").append(Instant.now().getEpochSecond()).toString();

   try (var fos = new FileOutputStream(fileName)) {
      fos.write(response.readAllBytes());
   } catch (IOException ex) {
      throw ex;
   }
}
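The listing above shows only the download step. The size calculation that follows it is not reproduced in this post, so here is a minimal sketch of one way to do it with java.nio. The class and method names are hypothetical, and the repository's actual implementation may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper that reports how much space a directory currently occupies. */
final class DirectorySize {

    private DirectorySize() {}

    /** Sums the sizes, in bytes, of all regular files under the given directory. */
    static long sizeInBytes(Path dir) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(DirectorySize::sizeOf)
                        .sum();
        } catch (UncheckedIOException e) {
            throw e.getCause(); // surface the original IOException to the caller
        }
    }

    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : "/tmp");
        System.out.println(sizeInBytes(dir) + " bytes used under " + dir);
    }
}
```

Returning this number from every invocation is what lets the test later in the post show the storage filling up call by call.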

The following snippet highlights the parameter introduced with this feature. The EphemeralStorage property allows us to set the desired storage space; its Size value is expressed in MB and must be between 512 and 10240.

ephemeralStorageTesterFunction:
  Type: AWS::Lambda::Function
  Properties:
    --- REMOVED FOR BREVITY ----
    EphemeralStorage:
      Size: !Ref StorageSize

Here we take advantage of a template parameter to specify the storage size at deployment time. The default value of this parameter in our template is 512.

The code for this example is available on GitHub.

Build and Deploy

The steps to build and deploy are:

1. Build and package the function and its dependency JARs into their own zip files. The dependencies are packaged separately so they can be deployed as a Lambda Layer. Since we update the function more frequently than the dependencies, this reduces the size of the function package and consequently speeds up deployment.
2. Use the AWS CLI to upload the zip files to S3. This also transforms the template, replacing the local paths to the zip files with S3 URIs.
3. Use the AWS CLI to deploy the CloudFormation stack using the transformed template.

$ ./gradlew -q clean packageLibs && mv build/distributions/lambdaEphemeral.zip build/distributions/lambdaEphemeral-lib.zip && ./gradlew -q build
$ aws cloudformation package --template-file template.yml --s3-bucket ephemeral-tester-src --output-template-file out1.yml --region eu-west-1
$ aws cloudformation deploy --template-file out1.yml --stack-name ephemeral --capabilities CAPABILITY_NAMED_IAM --region eu-west-1
$ aws cloudformation package --template-file template.yml --s3-bucket ephemeral-tester-src2 --output-template-file out2.yml --region eu-south-1
$ aws cloudformation deploy --template-file out2.yml --stack-name ephemeral --capabilities CAPABILITY_NAMED_IAM --region eu-south-1 --parameter-overrides StorageSize=10240

Please bear in mind the following before deploying the code:

- The application is built with gradlew. This wrapper script downloads the Gradle version pinned in the project's wrapper configuration the first time it runs, so there is no need to install Gradle separately.
- Ensure the CLI is up to date. I’m using v2.2.35 to run the above commands.
- The code is packaged and deployed twice, into two different AWS regions. This ensures complete isolation of Lambda’s execution environments. Note that the second deployment overrides the StorageSize parameter’s value with 10240 MB (10 GB).
- Prior to these steps, you have to create the two additional S3 buckets referenced in these commands, one in each region. Don’t confuse these with the buckets that are created when deploying the stack and used to test the function. Always keep these buckets private. Instead of using public buckets, configure the CLI with credentials that are allowed to upload the zip files.
- There is a buildspec file in the repo if you choose to use AWS CodeBuild and AWS CodePipeline to build and deploy the function.

The documentation suggests the following configuration for deploying a function using a zip file:

Function:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: java11
    PackageType: Zip
    Code:
      S3Bucket: bucket-name
      S3Key: zip-file-in-the-bucket

However, we’d like the CLI’s package command to upload the zip file under a unique name so that no manual steps are needed. To achieve this, simply set the Code property to the zip file’s path on disk and the CLI will replace it with the uploaded object's S3 location.

ephemeralStorageTesterFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: java11
    PackageType: Zip
    Code: build/distributions/lambdaEphemeral.zip

-- becomes --

ephemeralStorageTesterFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: java11
    PackageType: Zip
    Code:
      S3Bucket: ephemeral-tester-src
      S3Key: ki0d1989d5928c677ca94f50ff4cffdd

The same approach works for the Layer.

dependencyLibraries:
  Type: AWS::Lambda::LayerVersion
  Properties:
    Content: build/distributions/lambdaEphemeral-libs.zip

-- becomes --

dependencyLibraries:
  Type: AWS::Lambda::LayerVersion
  Properties:
    Content:
      S3Bucket: ephemeral-tester-src
      S3Key: 21899ff7z610becgc9cc3f31cd1600cd

The end result should look like this:

Lambda Function’s General Configuration which now has a new “Ephemeral storage” field

Testing

We’ll test each deployment by calling the GET /ephemeral-storages API and analyzing the output. Before doing so, we’ll upload a large file (279 MB in our case) to each of the buckets created by the CloudFormation stack and name it data-file. This is the default file name in the stack, and it is passed to the function as an environment variable.

Parameters:
  FileName:
    Description: The name of the file that function downloads from S3
    Type: String
    Default: 'data-file'

--- REMOVED FOR BREVITY ----

ephemeralStorageTesterFunction:
  Type: AWS::Lambda::Function
  Properties:
    --- REMOVED FOR BREVITY ----
    Environment:
      Variables:
        S3_FILE_NAME: !Ref FileName
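On the function side, the value arrives as an ordinary environment variable. The sketch below shows one way the Java code could read it; only the variable name S3_FILE_NAME and the default data-file come from the template above, while the class FunctionConfig and its fallback behavior are assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical configuration holder for the tester function. */
final class FunctionConfig {

    static final String FILE_NAME_VARIABLE = "S3_FILE_NAME"; // name set in the template
    static final String DEFAULT_FILE_NAME = "data-file";     // mirrors the template default

    private FunctionConfig() {}

    /** Resolves the S3 object key from the given environment, falling back to the default. */
    static String fileName(Map<String, String> env) {
        String value = env.get(FILE_NAME_VARIABLE);
        return (value == null || value.isBlank()) ? DEFAULT_FILE_NAME : value;
    }

    public static void main(String[] args) {
        // Inside Lambda, System.getenv() contains the variables declared in the template.
        System.out.println(fileName(System.getenv()));
    }
}
```

Taking the environment as a Map parameter rather than calling System.getenv() directly keeps the lookup easy to unit test.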

To make this easier, we use a simple bash script that keeps calling the API until it returns an error (see tester-script.sh).

#!/bin/bash

COUNTER=0
while :
do
   RESPONSE_PAYLOAD=$(curl -s https://xxxxxx.execute-api.eu-south-1.amazonaws.com/development/ephemeral-storages | jq '.')

   echo $RESPONSE_PAYLOAD
   if [[ $RESPONSE_PAYLOAD =~ 'error' ]]; then
      echo $COUNTER
      break
   fi
   ((COUNTER+=1))
done

As can be seen below, the instance with default storage size runs out of space after being called once.

Lambda function with default ephemeral storage

By contrast, we managed to call the instance with the large storage 38 times before we ran out of space.

Lambda function with maximum ephemeral storage

Let’s not forget that our sample file is unusually large (279 MB). A typical real-world workload would use smaller files, so far more of them would fit. For instance, a Lambda function that processes images as large as 50 MB can now scale much better and operate more efficiently using ephemeral storage. Before this change, such a function had to work around the limit by employing other storage services in order to remain scalable. That would have increased the complexity of the code and introduced other concerns, such as latency and availability.
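The call counts above can be sanity-checked with simple integer division. The sketch below rests on two assumptions that are not stated in this post: the configured storage size is treated as mebibytes (1024 × 1024 bytes), and the file sizes as decimal megabytes (1,000,000 bytes). File system overhead is ignored, and the class TmpCapacity is hypothetical.

```java
/** Hypothetical estimator for how many equally sized files fit into the temp storage. */
final class TmpCapacity {

    static final long MIB = 1024L * 1024L; // assumption: the storage size is in MiB
    static final long MB = 1_000_000L;     // assumption: file sizes are in decimal MB

    private TmpCapacity() {}

    /** Returns how many whole files of the given size fit into the given capacity. */
    static long filesThatFit(long capacityBytes, long fileBytes) {
        if (capacityBytes < 0 || fileBytes <= 0) {
            throw new IllegalArgumentException("capacity must be >= 0 and file size > 0");
        }
        return capacityBytes / fileBytes; // integer division drops the partial file
    }

    public static void main(String[] args) {
        System.out.println(filesThatFit(512 * MIB, 279 * MB));    // default storage, sample file
        System.out.println(filesThatFit(10240 * MIB, 279 * MB));  // maximum storage, sample file
        System.out.println(filesThatFit(10240 * MIB, 50 * MB));   // maximum storage, 50 MB images
    }
}
```

Under these assumptions the program prints 1, 38 and 214, which is in line with the single successful call on the default configuration and the 38 calls on the 10 GB configuration. The 50 MB figure is only an estimate for the image-processing example.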

Clean up

Since everything is deployed through CloudFormation (through aws cloudformation deploy), cleanup is as effortless as deleting the stack. Remember to delete the S3 buckets you created manually, as well.

Summary

Ephemeral storage gives the code a local file system without the latency and availability concerns that come with an external storage service. However, its fixed and limited size made Lambda functions less suitable for use cases in which large files have to be stored temporarily. The increased ephemeral storage makes such use cases possible and simpler to implement.