How to extract a HUGE zip file in an Amazon S3 bucket by using AWS Lambda and Python
The Problem
AWS Lambda provides only about 500 MB of disk space (in /tmp) per instance. This limitation rules it out for pipelines that require you to process single large files.
If your pipeline involves processing lots of data and you need a way of processing large (>500 MB) files (in this example, large zip files) from AWS S3, then you’ve probably not used AWS Lambda in your solution.
Your large files may be zip files that are under 500 MB in size but, when extracted, total more than 500 MB.
Either way, you’ve hit the limit of Lambda.
Have no fear, there is a solution.
The Solution?
Do not write to disk; stream to and from S3 instead.
Stream the zip file from the source bucket and, using Python, read and write its contents on the fly back to another S3 bucket.
This method never touches local disk, so it is not limited by Lambda’s storage.
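To illustrate the idea, here is a minimal sketch of streaming extraction using Python’s standard `zipfile` module: each archive member is read and handed off in fixed-size chunks, so the whole extracted payload is never held in memory or written to disk. The function and callback names (`stream_extract`, `write_member`) are hypothetical, and the demo uses an in-memory archive rather than S3.

```python
import io
import zipfile


def stream_extract(zip_fileobj, write_member, chunk_size=64 * 1024):
    """Extract every member of a zip archive in fixed-size chunks.

    zip_fileobj must be seekable (zipfile needs random access to locate
    the central directory). write_member(name, chunk) is called with
    each chunk of each file, so the caller can forward it anywhere,
    e.g. to an S3 multipart upload.
    """
    with zipfile.ZipFile(zip_fileobj) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as member:
                while True:
                    chunk = member.read(chunk_size)
                    if not chunk:
                        break
                    write_member(info.filename, chunk)


# Local demo with an in-memory archive (no S3 required):
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello world")

extracted = {}


def collect(name, chunk):
    # Accumulate chunks per member name, simulating a streaming sink.
    extracted[name] = extracted.get(name, b"") + chunk


stream_extract(buf, collect)
```

In a real Lambda you would replace the `BytesIO` with a seekable stream over the source S3 object (for example via the third-party `smart_open` library) and have the callback feed boto3’s `upload_fileobj` or a multipart upload to the destination bucket.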
The basic steps are: