How to extract a HUGE zip file in an Amazon S3 bucket by using AWS Lambda and Python

John Paul Hayes

The Problem

AWS Lambda provides only about 500MB of disk space per instance (512MB of /tmp storage by default). This limitation rules it out for pipelines that require you to process single large files.

If your pipeline processes lots of data and you need a way to handle large (>500MB) files (in this example, large zip files) from AWS S3, then you’ve probably not used AWS Lambda in your solution.

Your large files may be zip files that are < 500MB in size but total more than 500MB when extracted.

Either way, you’ve hit the limit of Lambda.
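
To see which of these two cases applies to a given archive, the zipfile module reports both the compressed and the uncompressed size of every entry. The following is a minimal sketch, not part of the original solution; the helper name zip_sizes and the demo archive are made up for illustration.

```python
import io
import zipfile


def zip_sizes(zip_file):
    """Return (compressed_total, uncompressed_total) in bytes for a zip archive."""
    with zipfile.ZipFile(zip_file) as z:
        infos = z.infolist()
        compressed = sum(info.compress_size for info in infos)
        uncompressed = sum(info.file_size for info in infos)
    return compressed, uncompressed


# Build a small, highly compressible archive in memory to illustrate the gap.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", compression=zipfile.ZIP_DEFLATED) as z:
    z.writestr("zeros.bin", b"\0" * 1_000_000)

compressed, uncompressed = zip_sizes(demo)
print(f"compressed: {compressed} bytes, uncompressed: {uncompressed} bytes")
```

An archive well under the disk limit can still expand past it, which is why the uncompressed total is the number to watch.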

Have no fear, there is a solution.


The Solution?

Do not write to disk, stream to and from S3

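Stream the zip file from the source bucket, then read its contents and write them back to another S3 bucket on the fly using Python.

This method does not use up disk space, so it is not limited by Lambda's disk size. Note that the zip file itself is read into memory, so the function needs enough memory to hold it.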

The basic steps are:

  1. Read the zip file from S3 using the Boto3 S3 resource Object into a BytesIO buffer object.
  2. Open the object using the zipfile module.
  3. Iterate over each file in the zip file using the namelist method.
  4. Write the file back to another bucket in S3 using the resource meta.client.upload_fileobj method.

The Code

Python 3.6 using Boto3

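import zipfile
from io import BytesIO

import boto3

zip_key = "path/to/archive.zip"     # placeholder: key of the zip file in the source bucket
bucket = "destination_bucket_here"  # placeholder: bucket that receives the extracted files

s3_resource = boto3.resource('s3')
zip_obj = s3_resource.Object(bucket_name="bucket_name_here", key=zip_key)
# Read the whole zip file into an in-memory buffer instead of writing it to disk
buffer = BytesIO(zip_obj.get()["Body"].read())

z = zipfile.ZipFile(buffer)
for filename in z.namelist():
    if filename.endswith('/'):
        continue  # skip directory entries, they have no content to upload
    # Stream each member straight from the archive to the destination bucket
    s3_resource.meta.client.upload_fileobj(
        z.open(filename),
        Bucket=bucket,
        Key=filename
    )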
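
If you want to try the extraction loop without touching AWS, one option is to pass the S3 client in as an argument and swap in a stand-in locally. This is only a sketch under that assumption: extract_zip and FakeS3Client are hypothetical names, and the fake implements nothing beyond an upload_fileobj method with the same Fileobj, Bucket, Key arguments used above.

```python
import io
import zipfile


def extract_zip(zip_bytes, client, dest_bucket):
    """Upload every file in a zip archive to dest_bucket; return the keys written."""
    keys = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        for info in z.infolist():
            if info.filename.endswith("/"):
                continue  # directory entry, nothing to upload
            # z.open() returns a file-like object that decompresses as it is read
            with z.open(info) as member:
                client.upload_fileobj(member, Bucket=dest_bucket, Key=info.filename)
            keys.append(info.filename)
    return keys


class FakeS3Client:
    """Hypothetical stand-in for the boto3 S3 client, for local testing only."""

    def __init__(self):
        self.objects = {}

    def upload_fileobj(self, fileobj, Bucket, Key):
        self.objects[(Bucket, Key)] = fileobj.read()


# Build a tiny archive in memory and "upload" its members to the fake client.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as z:
    z.writestr("data/a.txt", "hello")
    z.writestr("data/b.txt", "world")

fake = FakeS3Client()
written = extract_zip(archive.getvalue(), fake, "destination-bucket")
print(written)
```

In a real function you would pass boto3.client('s3') and the bytes read from the source object instead of the fake client and the demo archive.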

Gotchas

There are always gotchas.

AWS Lambda has a maximum execution time of 15 minutes, so can you process your HUGE files within that limit? You can only know by testing.
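
One way to avoid being cut off mid-upload is to check how much time is left before starting each file. The Lambda context object passed to a Python handler exposes get_remaining_time_in_millis(). The sketch below assumes you then hand the leftover files to a follow-up invocation, which is not shown; process_with_deadline, FakeContext and the 30 second safety margin are illustrative choices, not part of the original code.

```python
def process_with_deadline(items, handle, context, safety_margin_ms=30_000):
    """Handle items until the time budget runs low; return the unprocessed rest."""
    items = list(items)
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            return items[index:]  # stop early, leave these for another run
        handle(item)
    return []


class FakeContext:
    """Hypothetical stand-in for the Lambda context object, for local testing."""

    def __init__(self, remaining_ms, cost_per_call_ms):
        self.remaining_ms = remaining_ms
        self.cost_per_call_ms = cost_per_call_ms

    def get_remaining_time_in_millis(self):
        # Pretend each check happens after another chunk of work has been done.
        remaining = self.remaining_ms
        self.remaining_ms -= self.cost_per_call_ms
        return remaining


done = []
leftover = process_with_deadline(
    ["a.csv", "b.csv", "c.csv"], done.append, FakeContext(50_000, 15_000)
)
print(done, leftover)
```

Inside a handler, items would be the names from z.namelist() and handle would perform the upload for one name.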


I hope you’re enjoying this content. If so, please give me a clap and/or leave a comment!

You can connect with me on Twitter and GitHub.
