A Complete Guide to S3 File Upload using pre-signed POST URLs
Owned and operated by Play Games24x7 Pvt. Ltd., My11Circle is an online fantasy game designed for fans who like to showcase their cricket knowledge and analytical skills. My11Circle is a game of “skill”, managed by a professional management team with several years of experience in the online games industry. To find out more, visit the Games24x7 website.
One of the key aspects that makes our game safe for everyone is KYC. The “Know Your Customer” flow is responsible for making sure a user’s activity is legitimate. We need to know that the user’s identity documents are genuine, that they fulfil our regulatory requirements, and that they belong to the right person.
Most of us have encountered projects requiring file uploads from client devices to backend systems, whether for processing or just for storage. For projects that rely on AWS for hosting, S3 is the most popular storage option. In this article, I have summarised the challenges we faced and the solutions that ultimately helped us upload objects directly to S3 in a secure way.
The Problem
I was working on a project related to KYC document uploads, where our system required each document to pass through multiple microservices responsible for image processing, OCR, blur and cut-off detection, verifying the validity of the document, and so on.
This was time consuming: processing each document one at a time took roughly 10–12 seconds, of which uploading to the above microservices via a secure, client-facing NodeJS proxy aggregator took approximately 8 seconds on a good 4G network! This was especially brutal for players in areas with poor network coverage, or for players whose document failed one of the image quality checks and who had to re-upload it, waiting through the entire process all over again.
Our Goals
- Significantly cut down upload times by supporting multiple file uploads and reducing the number of hops each file has to go through (Client > Proxy > S3 > multiple BE microservices).
- Improve security, since these files are KYC documents. Let’s discuss this a bit more in the next section, as it is crucial to our problem statement.
What do we need for secure uploads?
- File Type restriction — This ensures only a valid list of content types is accepted, ones that the image processing servers can handle: image/jpeg, image/png, or application/pdf only.
- File Size restriction — S3 caps a single upload request at 5GB by default, and there is no easy way to change this limit. That can be a problem: to keep storage costs down, you shouldn’t end up with a huge file in your bucket.
- File Name restriction — User-controlled filenames can be malformed and can lead to directory/path-traversal or XSS attacks.
- Checksum validation with MD5 — We need to ensure that a player uploads only the KYC document they initially chose to identify themselves with. This avoids scenarios where a generic upload link allows an arbitrary person, other than our intended player, to upload malicious files.
Proof of Concept
S3 supports object uploads via the AWS SDKs, the AWS REST APIs, or the AWS CLI. Since the goal was to reduce the number of hops from client to end systems, we chose S3 pre-signed REST APIs to upload files directly from the user’s device to the desired S3 bucket, which eliminates the need to upload to the proxy and then again to S3 or other microservices. However, there are some challenges with pre-signed URLs…
- There is no easy way to restrict the file size of an upload (the default limit is a maximum of 5GB using PUT or other upload methods).
- File type needs to be restricted at the bucket policy level rather than on a per-request basis.
- There is no way to cancel an upload in progress at present; only the client can cancel by aborting the POST/PUT network request.
- No file upload progress is available; it has to be measured on the client side by tracking the chunks of bytes that have been sent over the network.
- No malicious-file handling in S3 itself.
- No upload-source-based restrictions can be configured.
- An ideal expiry time has to be decided and configured for each generated pre-signed URL to avoid any mishandling of these URLs once exposed to clients, along with some form of refresh logic upon expiry.
- We needed to explore S3 upload-completion events so the backend can initiate flows without waiting for clients to notify our servers.
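For the last point, one option worth noting (a sketch of a standard AWS feature, not necessarily what we shipped) is S3 event notifications: the bucket can invoke a Lambda function, or publish to SQS/SNS, as soon as an object is created, so the backend learns about completed uploads without any client callback. The ARN and prefix below are placeholders:

```javascript
// Illustrative S3 event-notification setup: invoke a Lambda whenever an
// object is created under the KYC/ prefix. The ARN and prefix are placeholders.
const notificationConfig = {
  LambdaFunctionConfigurations: [
    {
      Id: "kyc-upload-complete",
      LambdaFunctionArn:
        "arn:aws:lambda:us-east-1:123456789012:function:processKycDocument",
      Events: ["s3:ObjectCreated:Post"], // fires on successful POST uploads
      Filter: {
        Key: { FilterRules: [{ Name: "prefix", Value: "KYC/" }] },
      },
    },
  ],
};
```

This object follows the shape of the NotificationConfiguration accepted by S3’s PutBucketNotificationConfiguration API; applying it requires an AWS call, so it is shown here only as a configuration sketch.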
When you create a pre-signed URL, you must provide your security credentials and then specify a bucket name, an object key, an HTTP method (PUT for uploading objects), and an expiration date and time. That is the default setup, but with the PUT method you lose some of the controls you get with POST. Why? Because of POST Policies.
A POST Policy is a sequence of rules (called conditions) that must be met when performing a POST request to an S3 bucket in order for the request to succeed. The policy is embedded in the signed request itself rather than configured on the bucket. One benefit of using a POST policy over the alternatives is that the list of conditions supports content-length-range, for example, which lets you set a file upload size limit.
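Under the hood, the SDK serialises these conditions into a JSON policy document, base64-encodes it, and signs it with your credentials. A minimal sketch of what that document looks like (the bucket name, key, and expiration below are illustrative, not from our production setup):

```javascript
// Sketch of the JSON policy document that is base64-encoded, signed, and
// sent along with the upload form. All values here are illustrative.
const policy = {
  expiration: "2024-01-01T12:05:00Z", // when the signed form stops working
  conditions: [
    { bucket: "kycDocuments" },
    { key: "user123/KYC/passport/3f8a.jpg" },
    ["content-length-range", 1024, 5242880], // 1KB to 5MB
    ["starts-with", "$Content-Type", "image/"],
  ],
};

// The browser form carries this string in its hidden "policy" field.
const encodedPolicy = Buffer.from(JSON.stringify(policy)).toString("base64");
```

The signature, computed over this encoded policy with AWS Signature Version 4, is what lets S3 enforce the conditions without any bucket-level change.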
Hence I decided to proceed with S3 pre-signed POST URLs, adding security via POST policies and bucket policies, for uploads directly from clients to the S3 bucket.
The Solution
- Player selects a file; the client generates an MD5 checksum and requests an upload link from the proxy.
- The proxy generates a pre-signed POST URL with a custom in-house POST Policy giving fine-grained control over file size, type, and name, with an MD5 hash check in place and a short link-expiry window. This will be discussed in detail in the next section.
- The client initiates the upload to this generated link, attaching the file.
- S3 verifies that all policy rules are satisfied, along with checksum validation, before storing the object in the bucket.
The MD5 checksum helps cross-check that, once the URL has been generated for a particular chosen file, the exact same file is uploaded. If a malicious user tries to upload a tampered file, S3 performs the hash check before storing it and rejects the upload due to a Content-MD5 mismatch.
The Code
const { S3Client } = require("@aws-sdk/client-s3");
const { createPresignedPost } = require("@aws-sdk/s3-presigned-post");
const { v4: uuidv4 } = require("uuid");

const client = new S3Client({ region: "us-east-1" });

const { md5HashOfTheFile, fileExtension } = req.body; // base64 MD5 digest and extension sent by the client
// userAttribute and docName are resolved server-side from the player's session
const Bucket = "kycDocuments";
const fileName = `${uuidv4()}${fileExtension}`; // sanitise the file name by replacing it with a random UUID
const Key = `${userAttribute}/KYC/${docName}/${fileName}`; // S3 path where the object will be stored
const Fields = { "Content-MD5": md5HashOfTheFile }; // additional fields expected as part of the file upload
const Conditions = [
  { bucket: Bucket },
  { key: Key },
  { "Content-MD5": md5HashOfTheFile },
  ["content-length-range", 1024, 5242880], // file size limit: 1KB-5MB
  ["starts-with", "$Content-Type", "image/"], // only content types such as image/jpeg, image/png, image/gif
];

const { url, fields } = await createPresignedPost(client, {
  Bucket,
  Key,
  Conditions,
  Fields,
  Expires: 300, // seconds before the pre-signed POST expires; 3600 by default
});
- bucket: the destination S3 bucket for incoming files.
- key: the key name, or S3 path, where the uploaded file will be stored.
Please note: since users choose the file, it is important to safeguard against S3 path-traversal attacks, as filenames could be malicious; hence I sanitised the filename by renaming it to a random UUID string.
- Content-MD5: set the Content-MD5 header as a POST Policy condition so that S3 can cross-verify the checksum after the file is uploaded.
- content-length-range: restricts the file to within a minimum and maximum size.
eg: [“content-length-range”, 1048576, 10485760] restricts the file to between 1MB and 10MB; otherwise the upload fails.
- Content-Type: restricts the file to a specific type.
eg: [“starts-with”, “$Content-Type”, “image/”] restricts incoming files to images only, with content types such as “image/jpeg, image/png, image/gif”.
- Expires: sets the expiry time for the pre-signed POST URL; as a precaution, configure this as low as possible based on your application design. In my project I could configure it as low as 10s, because once we POST the upload we no longer require the link to be active.
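For completeness, here is a hedged sketch of the client-side upload itself: every entry in the fields object returned by createPresignedPost is appended to a multipart form, and the file entry must come last. The file variable and error handling are illustrative:

```javascript
// Build the multipart form S3 expects: all signed fields first, file last.
function buildUploadForm(fields, file) {
  const form = new FormData();
  for (const [name, value] of Object.entries(fields)) {
    form.append(name, value);
  }
  form.append("Content-Type", file.type); // must satisfy the starts-with condition
  form.append("file", file); // S3 ignores anything appended after the file entry
  return form;
}

// Illustrative upload call; `url` and `fields` come from the proxy response.
async function uploadToS3(url, fields, file) {
  const res = await fetch(url, { method: "POST", body: buildUploadForm(fields, file) });
  if (res.status !== 204) {
    throw new Error(`Upload failed with HTTP ${res.status}`);
  }
}
```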
If even one of the conditions is not met, S3 rejects the request (typically with a 403 Access Denied for a policy condition failure, or a 400 Bad Request such as EntityTooLarge for a size violation) and the object will not be stored.
Only once all conditions are met and the correct file is uploaded via the pre-signed POST URL does S3 send a 204 No Content response, indicating the upload succeeded.
Conclusion
With this blog post, I hope I have given you an idea of what to keep in mind while designing a user-upload feature with pre-signed URLs. As you can see, depending on your threat model, the considerations can differ.
File uploads can be a very dangerous piece of functionality, and the risks involved are many. Even if you follow these recommendations, you don’t know whether the file a user uploads is malicious, and processing it could have unwanted results. That’s why it is advisable to process untrusted files in a restricted environment.
Finally, AWS provides a lot of documentation on S3 and how to secure it further; I suggest you read it if you’d like to know more about how to secure files in S3 buckets.
About the author
Amruth Skanda Murthy V, a Software Engineer @Playgames24x7 | Developer, Solver, Explorer
Amruth is from Bangalore, where he works as a Senior Software Engineer at Playgames24x7. He has 6+ years of experience with fullstack JavaScript/NodeJS, with a focus on frontend. In his spare time, Amruth enjoys trekking, badminton, movies, and music.
References
- https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/modules/_aws_sdk_s3_presigned_post.html#post-file-using-formdata-in-nodejs
- POST Policy — Amazon Simple Storage Service
- Example: Browser-Based Upload using HTTP POST (Using AWS Signature Version 4) — Amazon Simple Storage Service
- Common Request Headers — Amazon Simple Storage Service
- https://aws.amazon.com/premiumsupport/knowledge-center/data-integrity-s3/