Authenticated file uploads from the browser to non-public S3 buckets using AWS Lambda
Problem statement
A user can belong to a set of categories under which they can upload files. Each category has its own rules for which files are accepted: the type of file, its size, the playable length of any video/audio content, the MIME type, AV scanning, limits on the number of uploads allowed under a category within a time window, and allowing a team of people to upload under the same category (group work), to name a few. In essence, there are many rules governing who can upload what, and where.
The user experience should be that of a synchronous upload, not "come back later" or "we will notify you when done". The user should be told in real time what the issue is with any failed validation.
Solution under the monolith architecture

In this architecture, the backend is a monolith deployed as a task on AWS ECS with a scaling policy in place. The ECS cluster has an AWS EBS volume attached for storing a copy of each file so that it can be processed by a utility like FFmpeg to determine the exact length of the audio/video content (FFmpeg does not work with in-memory objects, and using AWS transcoding services just to get the length of a video/audio file can be expensive). All other validations can be performed using various open-source or custom utilities, and all of the validation rules are stored in the database.
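For instance, the monolith can shell out to ffprobe (part of the FFmpeg suite) against the file copy on the EBS volume to read the playable duration. A minimal sketch; the maximum-duration rule value and the JSON-parsing helper are hypothetical, not from the article:

```python
import json
import subprocess

def media_duration_seconds(path: str) -> float:
    """Ask ffprobe for container-level metadata as JSON and return the duration."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-of", "json", path],
        capture_output=True, check=True, text=True,
    ).stdout
    return parse_duration(out)

def parse_duration(ffprobe_json: str) -> float:
    """Extract the duration field from ffprobe's JSON output."""
    return float(json.loads(ffprobe_json)["format"]["duration"])

# Example validation rule: reject files longer than the category's limit
# (hypothetical value; the real limit would come from the rules in the DB).
MAX_DURATION_S = 600.0

def duration_ok(ffprobe_json: str) -> bool:
    return parse_duration(ffprobe_json) <= MAX_DURATION_S
```

Note that this is exactly the part that forces a file copy onto disk: ffprobe wants a path, not an in-memory stream.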

The issue with this approach is scalability. As traffic increases, both memory and CPU utilization go up, because the entire file content is streamed through the application before being written to the S3 bucket. Autoscaling kicks in, increasing compute and memory costs just to handle file uploads. You also pay for the EBS volume that stores the file copy and for the network traffic in and out of the EC2 instances, and response times suffer tremendously.
In short, this does not scale economically: there is a limit to the number of instances you can spawn on ECS before it becomes cost-prohibitive (EC2 instances are expensive!).
Lambda to the rescue!
This solution can be re-architected using Lambdas. Lambda offers extremely high scalability, but it has some limitations; if uploads involve extremely large files, Lambda may not be a viable option. One of the many possible designs is described below. It is also an example of moving towards a microservice architecture piecemeal.

Step 1: Initiate the upload.
Step 2: The front end passes the file information to the backend ECS task.
Step 3: The ECS API performs the basic validations.
Step 4: Create a pre-signed URL on the temp bucket with a timeout (expiry) set.
Step 5: Create a temp record in the DB for the initiated upload.
Step 6: Encrypt the user data, file information and temp record reference using AES-256 and send it to the browser along with the pre-signed URL.
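Step 6 could look like the sketch below, using AES-256 in GCM mode via the `cryptography` package. The mode, package choice and payload fields are assumptions; the article only specifies AES-256:

```python
import json
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_payload(key: bytes, payload: dict) -> dict:
    """Encrypt the user data / file info / temp record reference for the browser.

    The browser never sees the key; it just echoes the ciphertext back to S3
    as object metadata, where the Lambda (which shares the key) decrypts it.
    """
    nonce = os.urandom(12)  # standard GCM nonce size
    ciphertext = AESGCM(key).encrypt(nonce, json.dumps(payload).encode(), None)
    return {"nonce": nonce.hex(), "ciphertext": ciphertext.hex()}

def decrypt_payload(key: bytes, blob: dict) -> dict:
    """The Lambda-side inverse; raises if the payload was tampered with."""
    plaintext = AESGCM(key).decrypt(bytes.fromhex(blob["nonce"]),
                                    bytes.fromhex(blob["ciphertext"]), None)
    return json.loads(plaintext)
```

GCM is a reasonable choice here because it is authenticated: a browser that tampers with the metadata produces a payload the Lambda will refuse to decrypt.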
Step 7: The browser performs an S3 PUT of the file, with the payload attached as object metadata, to the temp bucket via the pre-signed URL, bypassing ECS entirely. Upon completion, the browser starts polling for the temp DB record through another API, at a set interval and with a timeout (Step 11).
Step 8: The S3 PUT triggers the Lambda. The Lambda decrypts the payload, runs the validations and, if they pass, moves the object to the main S3 bucket (Step 9) and updates the record as successful (Step 10). On failure, the Lambda updates the DB record with the reason the operation failed (Step 10).
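The Lambda side of Steps 8-10 can be sketched as below. The rule set, field names and bucket names are hypothetical stand-ins for what the article keeps in the database; the S3 calls use boto3, while the decision logic is plain Python:

```python
from typing import Optional

# Hypothetical per-category rules, standing in for the rules table in the DB.
RULES = {
    "video": {"max_bytes": 500 * 1024 * 1024, "mime_types": {"video/mp4"}},
    "audio": {"max_bytes": 50 * 1024 * 1024, "mime_types": {"audio/mpeg"}},
}

def validate(category: str, size_bytes: int, mime_type: str) -> Optional[str]:
    """Return None if the upload is acceptable, else a human-readable reason
    that gets written to the temp DB record for the polling browser to show."""
    rules = RULES.get(category)
    if rules is None:
        return f"unknown category: {category}"
    if size_bytes > rules["max_bytes"]:
        return f"file exceeds {rules['max_bytes']} bytes"
    if mime_type not in rules["mime_types"]:
        return f"mime type {mime_type} not allowed"
    return None

def lambda_handler(event, context):
    """Triggered by the S3 PUT on the temp bucket."""
    import boto3  # deferred so the validation logic above is testable offline
    s3 = boto3.client("s3")
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    head = s3.head_object(Bucket=bucket, Key=key)
    # The metadata carries the encrypted payload from Step 6; decryption omitted.
    meta = head["Metadata"]
    reason = validate(meta["category"], head["ContentLength"], head["ContentType"])
    if reason is None:
        s3.copy_object(Bucket="uploads-main", Key=key,    # Step 9
                       CopySource={"Bucket": bucket, "Key": key})
    # Step 10: update the temp DB record with success or `reason` (omitted).
    return {"ok": reason is None, "reason": reason}
```

Keeping `validate` as a pure function of the rules and the object's properties makes it straightforward to unit-test without any AWS dependencies.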
This is just one way to solve the scaling problem of a monolith. The solution could be improved further by using AWS Cognito or AWS STS, or by using a separate database to keep track of uploads and rules so that polling can be avoided.
The above solution gains the tremendous scaling ability of AWS Lambda and reduces the scaling costs for ECS. It does increase S3 costs by a fraction and Lambda costs by rather more :-), but the reduction in ECS spend more than offsets the increase. After all, how can one put a price tag on scalability?
