Resumable file upload with S3
Files keep growing larger and larger. Unfortunately, this means that uploading them reliably becomes much harder as well, especially when using the tiny mobile tubes provided by greedy network operators. Most applications are not up to the challenge, which means that millions of cat images and videos are lost due to network errors each year!
This article shows how to implement resumable file uploads to S3 with a JavaScript client and a servlet back end.
Tus
Tus solves the problem of unreliable file uploads once and for all. It is a new open protocol for resumable uploads built on HTTP. It offers simple, cheap and reusable stacks for clients and servers. It supports any language, any platform, and any network.
JS Client
Tus provides client implementations for many platforms. One of them is tus-js-client: https://github.com/tus/tus-js-client
The tus client sends the file in chunks; all it requires is the file object.
Note that the variable file can be either a JavaScript File object in the browser or a file stream in the Node.js environment. The same code works in both environments.
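A minimal sketch of such a client is shown here. The helper name `startUpload`, the endpoint URL supplied by the caller, and the retry delays are assumptions for illustration. The library object is passed in as a parameter, so the same function can be used with `require("tus-js-client")` in Node.js or with the global `tus` object in the browser.

```javascript
// Minimal sketch of a resumable upload with tus-js-client.
// `tus` is the library object; `endpoint` is the URL of the tus server (assumed).
function startUpload(tus, file, endpoint) {
  const upload = new tus.Upload(file, {
    endpoint: endpoint,
    // Wait times in milliseconds between automatic retries after an error.
    retryDelays: [0, 1000, 3000, 5000],
    // Free-form metadata, sent to the server in the Upload-Metadata header.
    metadata: { filename: file.name || "unnamed" },
    onError: (error) => console.error("Upload failed:", error),
    onProgress: (bytesUploaded, bytesTotal) => {
      const percent = ((bytesUploaded / bytesTotal) * 100).toFixed(2);
      console.log(`${bytesUploaded} of ${bytesTotal} bytes (${percent}%)`);
    },
    onSuccess: () => console.log("Upload finished:", upload.url),
  });
  upload.start();
  return upload;
}
```

Calling `upload.abort()` on the returned object pauses the transfer, and calling `upload.start()` again continues it from the offset reported by the server.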
S3 as a Storage Back-End
Tus takes care of the front end and sends the file in chunks. A suitable back end is still needed to receive and store them.
With its Simple Storage Service (S3), Amazon Web Services has become one of the major providers of cloud storage for applications ranging from small side projects to enterprise systems.
S3 Multipart uploads
S3 multipart upload stores the file in chunks on the server side. It allows us to upload a single object as a set of parts. Each part is a contiguous portion of the object’s data. We can upload these parts independently and in any order. If transmission of any part fails, we can retransmit that part without affecting the others. After all parts of the object are uploaded, Amazon S3 assembles them and creates the object. Every part except the last one must be at least 5 MB in size, and a single upload can consist of at most 10,000 parts.
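To make the part layout concrete, the following sketch splits a file of a given size into part ranges that respect these limits. The function name `planParts` and the shape of the returned objects are assumptions for illustration and are not part of any S3 SDK.

```javascript
const MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for every part except the last
const MAX_PARTS = 10000;               // S3 maximum number of parts per upload

// Returns [{ partNumber, start, end }] where `end` is exclusive.
function planParts(totalSize, partSize = MIN_PART_SIZE) {
  if (partSize < MIN_PART_SIZE) {
    throw new RangeError("part size is below the S3 minimum");
  }
  if (Math.ceil(totalSize / partSize) > MAX_PARTS) {
    throw new RangeError("too many parts; choose a larger part size");
  }
  const parts = [];
  for (let start = 0, partNumber = 1; start < totalSize; start += partSize, partNumber++) {
    // The last part simply ends at the end of the file and may be shorter.
    parts.push({ partNumber, start, end: Math.min(start + partSize, totalSize) });
  }
  return parts;
}
```

For a 12 MB file and the default part size this yields two full 5 MB parts followed by a final 2 MB part.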
Implementation
You can see the servlet implementation here https://github.com/sponnusamy/tus_servlet
It involves three steps:
- Initiate file upload
- Upload the file in 5 MB chunks
- Complete file upload
There is an option to abort the upload as well.
Initiate file upload
When you send a request to initiate a multipart upload, Amazon S3 returns a response with an upload ID, which is a unique identifier for your multipart upload. This ID must be included in every subsequent request for that upload: uploading parts, listing parts, completing the upload, or aborting it.
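On the tus side, the upload is created with a POST request that carries the total size in the `Upload-Length` header and optional metadata in the `Upload-Metadata` header; the server answers with `201 Created` and the upload URL in the `Location` header. The sketch below shows how a back end might decode that metadata and build the response. It is written in JavaScript for Node.js to match the client example; the function names and the choice to put the S3 upload ID into the URL are assumptions, not taken from the servlet implementation.

```javascript
// Parses a tus Upload-Metadata header: comma-separated "key base64value" pairs.
function parseUploadMetadata(header) {
  const metadata = {};
  if (!header) return metadata;
  for (const pair of header.split(",")) {
    const [key, value = ""] = pair.trim().split(" ");
    if (key) metadata[key] = Buffer.from(value, "base64").toString("utf8");
  }
  return metadata;
}

// Builds the tus response to a successful creation request.
// `uploadId` is the ID returned by S3 when the multipart upload was initiated.
function creationResponse(baseUrl, uploadId) {
  return {
    status: 201,
    headers: {
      "Tus-Resumable": "1.0.0",
      // Assumption: the upload URL is the base URL plus the encoded upload ID.
      "Location": baseUrl + encodeURIComponent(uploadId),
    },
  };
}
```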
Upload/Resume
After you initiate a multipart upload, the back end can start reading the file content from the client and uploading it to S3. If the upload is interrupted, it can be resumed later with the upload ID. The tus client resumes an upload that was paused; otherwise it starts from the beginning. To support this, the back end must provide a HEAD handler for the upload URL that returns the current offset and the metadata in the response headers. The offset can be calculated from the parts that the S3 ListParts API reports for the upload ID.
The tus client then sends the file stream in a PATCH request, and the back end receives it and uploads it to S3 in chunks.
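The offset calculation amounts to summing the sizes of the parts S3 already holds. The following sketch assumes the `Parts` array of a ListParts response (objects with `PartNumber`, `Size` and `ETag`) has already been fetched; the function names are hypothetical.

```javascript
// `parts` mirrors the Parts array of an S3 ListParts response.
function uploadedOffset(parts) {
  return (parts || []).reduce((sum, part) => sum + part.Size, 0);
}

// Response to a tus HEAD request.
// `uploadLength` is the total size the client declared when creating the upload.
function headResponse(parts, uploadLength) {
  return {
    status: 200,
    headers: {
      "Tus-Resumable": "1.0.0",
      "Upload-Offset": String(uploadedOffset(parts)),
      "Upload-Length": String(uploadLength),
      // The tus protocol requires that HEAD responses are not cached.
      "Cache-Control": "no-store",
    },
  };
}
```

ListParts returns at most 1,000 parts per response, so a real handler has to follow the pagination markers before summing.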
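Before storing anything, the PATCH handler should check that the client and the server agree on the offset. The tus protocol prescribes `415 Unsupported Media Type` for a wrong content type and `409 Conflict` for an offset mismatch. The sketch below shows one way to express that check together with the next S3 part number; the function names and the lower-cased `headers` object are assumptions.

```javascript
// Validates a tus PATCH request against the parts S3 already holds.
// `headers` is assumed to be a plain object with lower-cased header names.
function validatePatch(headers, parts) {
  if (headers["content-type"] !== "application/offset+octet-stream") {
    return { ok: false, status: 415 }; // Unsupported Media Type
  }
  const currentOffset = parts.reduce((sum, part) => sum + part.Size, 0);
  const clientOffset = Number(headers["upload-offset"]);
  if (!Number.isInteger(clientOffset) || clientOffset !== currentOffset) {
    return { ok: false, status: 409 }; // Conflict: offsets do not match
  }
  // Continue numbering after the highest part number already stored.
  const nextPartNumber = Math.max(0, ...parts.map((part) => part.PartNumber)) + 1;
  return { ok: true, nextPartNumber };
}

// Response sent once the received bytes have been stored.
function patchResponse(newOffset) {
  return {
    status: 204,
    headers: { "Tus-Resumable": "1.0.0", "Upload-Offset": String(newOffset) },
  };
}
```

Because every part except the last must reach the 5 MB minimum, a back end built this way has to buffer incoming bytes until a full part is available, and only bytes that made it into a stored part count towards the offset after an interruption.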
Complete file upload
The upload must be completed explicitly to mark it as finished; until then, S3 does not create the object. When you complete a multipart upload, Amazon S3 creates an object by concatenating the parts in ascending order based on the part number.
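The completion request has to list every part with its part number and ETag, in ascending order. The following sketch, again with hypothetical function names, decides whether all bytes have arrived and prepares that list from a ListParts result.

```javascript
// True once the stored parts add up to the size declared by the client.
function isUploadComplete(parts, uploadLength) {
  const received = parts.reduce((sum, part) => sum + part.Size, 0);
  return received === uploadLength;
}

// Builds the part list for the completion request, sorted by part number.
function completionParts(parts) {
  return parts
    .map((part) => ({ PartNumber: part.PartNumber, ETag: part.ETag }))
    .sort((a, b) => a.PartNumber - b.PartNumber);
}
```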
Abort file upload
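You may abort the file upload while it is paused. Aborting tells S3 to discard the parts uploaded so far, which would otherwise keep occupying storage.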
The complete source code is available here https://github.com/sponnusamy/tus_servlet