Concurrent Multipart Uploading to AWS S3

Coatlique
Mar 12, 2023

I’ve run into an interesting case where I want to upload a large amount of data to S3 using multipart uploads. The data can be anywhere from 20–40 GB, and I really don’t want to store it locally on disk or hold the entirety of it in memory… This led me to wondering how multipart upload works and how I can use it to do what I want, quickly.

Instead of allowing AWS to multipart-upload a large file on its own, I want to thread off tasks that upload pieces of the final file manually, but I have to ensure those pieces are large enough that the multipart upload will not fail: every part except the last must be at least 5 MiB.
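Here is a minimal sketch of that idea, with function and variable names of my own choosing; it assumes the boto3 client s3 already exists and that chunks is an iterable of byte strings where every chunk except the last is at least 5 MiB:

import concurrent.futures

def upload_concurrently(s3, bucket, key, chunks):
    # Start the multipart upload and remember its id.
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

    def upload_one(part_number, data):
        # Part numbers start at 1; S3 needs each part's ETag to finish the upload.
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=upload_id, Body=data)
        return {"PartNumber": part_number, "ETag": resp["ETag"]}

    # boto3 clients are thread safe, so each piece can go up on its own thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(upload_one, i + 1, chunk) for i, chunk in enumerate(chunks)]
        parts = sorted((f.result() for f in futures), key=lambda p: p["PartNumber"])

    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})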

If we allowed it to, AWS would take a large file and multipart-upload it in the background without us even knowing (boto3’s managed transfer does exactly this), but that requires storing the whole large file on disk first, and AWS container land isn’t great at that.
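For contrast, the hands-off version looks roughly like this; the file name, bucket, and sizes are just placeholders. boto3 splits the file into parts for you, but it wants the whole thing sitting on disk:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Above the threshold boto3 switches to multipart automatically,
# but the source has to already exist as a file on disk.
config = TransferConfig(multipart_threshold=8 * 1024 * 1024,
                        multipart_chunksize=8 * 1024 * 1024,
                        max_concurrency=8)
s3.upload_file("big-file.bin", "my-bucket", "big-file.bin", Config=config)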

Simple stuff first: to test and run this locally, we can use the LocalStack Docker container to run all the AWS goodies on our own machine. https://docs.localstack.cloud/getting-started/installation/
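Once LocalStack is up (for example via docker run -p 4566:4566 localstack/localstack), the boto3 client just needs to be pointed at the local edge endpoint. The dummy credentials below are the usual LocalStack placeholders:

import boto3

# LocalStack listens on port 4566 by default and accepts any credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",
    aws_access_key_id="test",
    aws_secret_access_key="test",
    region_name="us-east-1",
)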

I want to time how long the file takes to be created and uploaded, and I want to be able to pass in arguments from the command line.
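A quick sketch of that scaffolding; the argument names are my own guesses at what the script needs:

import argparse
import time

parser = argparse.ArgumentParser(description="Concurrent multipart upload test")
parser.add_argument("--bucket", required=True, help="Bucket to upload into")
parser.add_argument("--size-gb", type=int, default=20, help="How much data to generate")
args = parser.parse_args()

start = time.perf_counter()
# ... generate the data and upload it here ...
print(f"Finished in {time.perf_counter() - start:.1f}s")

Then we need to ensure the bucket we will be using exists.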

def ensure_bucket_exists(bucket, s3, location):
    try:
        s3.head_bucket(Bucket=bucket)
    except s3.exceptions.ClientError:
        # Assumed completion of the truncated snippet: create the bucket if it isn't there.
        s3.create_bucket(Bucket=bucket, CreateBucketConfiguration={"LocationConstraint": location})
