How to extract a HUGE zip file in an Amazon S3 bucket by using AWS Lambda and Python

John Paul Hayes
2 min read · Jul 10, 2019

The Problem

AWS Lambda provides only 512 MB of ephemeral disk space (/tmp) per execution environment. This limitation rules it out for pipelines that require you to process single large files.

If your pipeline involves processing lots of data and you need a way to handle large (>512 MB) files (in this example, large zip files) from AWS S3, then you’ve probably not used AWS Lambda in your solution.

Your large files may be zip files that are under 512 MB in size but that total more than 512 MB when extracted.

Either way, you’ve hit the limit of Lambda.

Have no fear, there is a solution.

The Solution?

Do not write to disk; stream to and from S3 instead.

Stream the zip file from the source bucket, read its contents on the fly in Python, and write them back to another S3 bucket.

This method does not use any disk space, so it is not constrained by Lambda’s ephemeral storage limit.

The basic steps (sketched in code after the list) are:

  1. Read the zip file from S3 using the Boto3 S3 resource Object into a BytesIO buffer object
  2. Open the object using the zipfile module.
  3. Iterate over each file in the zip file using the namelist method
  4. Write the file back to another bucket in S3 using the resource…
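Below is a minimal sketch of those steps, assuming hypothetical bucket names (source-bucket, destination-bucket), a placeholder key (big-archive.zip), and an extracted/ prefix in the destination bucket; adapt the names to your own setup.

```python
import io
import zipfile

import boto3

s3 = boto3.resource("s3")


def lambda_handler(event, context):
    # 1. Read the zip file from S3 into an in-memory BytesIO buffer.
    zip_obj = s3.Object(bucket_name="source-bucket", key="big-archive.zip")
    buffer = io.BytesIO(zip_obj.get()["Body"].read())

    # 2. Open the buffer with the zipfile module.
    with zipfile.ZipFile(buffer) as archive:
        # 3. Iterate over each entry returned by namelist().
        for filename in archive.namelist():
            # 4. Write each member back to another bucket, streaming its
            #    contents directly without touching /tmp.
            s3.meta.client.upload_fileobj(
                archive.open(filename),
                Bucket="destination-bucket",
                Key=f"extracted/{filename}",
            )
```

Note that the buffer lives in the Lambda function’s memory rather than on disk, so the memory you allocate to the function still needs to accommodate the compressed archive.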
