A Sufficiently Complex Cloud Application
Building a Video Stabilization Service Backed By AWS
One of the best ways to learn something is to build a sufficiently complex example and teach it to others. By sufficiently complex I mean more than a Hello World or a TodoMVC; those aren’t enough to get over the hype curve and glimpse a technology’s specific nuances.
In this post we’re going to cover nearly everything it takes to build a video stabilization service. Users will upload a video file, and ffmpeg will do the heavy lifting of stabilizing it.
```shell
# Calculate transform vectors
ffmpeg -i shaky-video.mp4 -vf vidstabdetect=stepsize=6:shakiness=5:accuracy=15:result=transform_vectors.trf -f null -

# Stabilize the video
ffmpeg -i shaky-video.mp4 -vf vidstabtransform=input=transform_vectors.trf:zoom=0:smoothing=30:crop=black,unsharp=5:5:0.8:3:3:0.4 -vcodec libx264 -preset slow -tune film -crf 18 -acodec copy stabilized-video.mp4
```
These two commands may seem too trivial, even of the Hello World variety, but once we wire together various AWS services and apply some cloud best practices they’ll quickly become sufficiently complex to teach valuable lessons.
Running the script manually in EC2
Let’s start simple and get the above script running in the cloud. Since this is a long-running process, potentially longer than Lambda’s 15-minute maximum timeout, it will run on an EC2 instance.
To keep things simple I’ve chosen an Amazon Linux 2 instance running on free tier hardware. After launching the instance into a publicly accessible subnet I ssh into it, download ffmpeg using curl, and paste the two ffmpeg commands into a bash script.
The commands are in place, but we need a shaky video to stabilize. We also need a way of making that video accessible to the EC2 instance. Of the storage options available S3 makes the most sense because we want users to be able to upload videos over HTTPS.
An important detail when creating the bucket is to ensure that it isn’t publicly accessible. There’s no need to allow open access to the S3 bucket, and for security and privacy reasons we don’t want that door open anyway.
With the S3 bucket created let’s upload a video to S3 for stabilization.
And now to get the video from S3 over to the EC2 instance. We should be able to accomplish this with the AWS CLI.
Right, this EC2 instance does not have permission to talk to S3. The correct way to resolve this is to create a new policy, assign that policy to a role, and assign that role to the EC2 instance.
Full S3 access is overkill here but once the concept is proven out we can go back and make the permissions more restrictive.
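When the time comes to tighten things up, the policy could be scoped to just the actions the worker needs on a single bucket. A sketch of what that least-privilege policy might look like; the bucket name is an assumption:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::video-stabilizer-uploads/*"
    }
  ]
}
```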
Finally we can run the script and upload the results back to S3 to download.
And there we have it. We’ve proved out a Hello World quality application, and now we should start working towards making user uploads functional, but there are some best practices to address first.
Decoupling EC2 from S3
Our EC2 instances need to know when to download a new video from S3. We could have EC2 poll S3 directly, but repeatedly polling S3 for new videos is neither efficient nor architecturally sound. There is, however, an AWS service specifically intended to decouple architectural components: SQS.
Conceptually what we want to accomplish is when a new video is uploaded to the S3 bucket a message is published to SQS. A script running in an infinite loop on the EC2 instance will perform long polling on the queue and when a message is detected then it will read the bucket/key from the message and download the video from S3.
S3 cannot put messages directly onto SQS queues so we’ll recruit the help of a Lambda function, which is triggered by an S3 upload, that will publish a message to the SQS queue.
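A minimal sketch of such a function, assuming a Python runtime and a hypothetical queue URL:

```python
# Hypothetical Lambda handler: on each S3 upload event, publish the
# bucket/key pair to SQS. The queue URL is an assumption.
import json

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/video-queue"  # assumption

def build_message(event):
    """Reduce an S3 event to the small JSON message the workers consume."""
    record = event["Records"][0]
    return json.dumps({
        "bucket": record["s3"]["bucket"]["name"],
        "key": record["s3"]["object"]["key"],
    })

def handler(event, context):
    import boto3  # imported lazily so build_message stays testable without AWS
    boto3.client("sqs").send_message(QueueUrl=QUEUE_URL,
                                     MessageBody=build_message(event))
```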
This loop runs continuously, long-polling SQS for up to 20 seconds per request. As soon as a video is uploaded the above Lambda function places a message on the queue, and this script will see that message, parse its contents, and download and stabilize the video.
Note the final command to delete the message off the queue. When a message is read off an SQS queue it temporarily becomes invisible so that other listeners don’t pick it up. If the message isn’t deleted before that visibility timeout expires, it reappears and the video gets re-processed; deleting the message after processing is complete avoids that.
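The same loop can be sketched in Python with boto3 instead of bash; the queue URL, local file paths, and the stabilize.sh helper (wrapping the two ffmpeg commands) are assumptions:

```python
import json
import subprocess

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/video-queue"  # assumption

def parse_body(body):
    """Pull the bucket/key pair out of a queue message body."""
    data = json.loads(body)
    return data["bucket"], data["key"]

def run_worker():
    import boto3  # imported lazily so parse_body stays testable without AWS
    sqs = boto3.client("sqs")
    s3 = boto3.client("s3")
    while True:
        # Long-poll for up to 20 seconds per receive call
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            bucket, key = parse_body(msg["Body"])
            s3.download_file(bucket, key, "shaky-video.mp4")
            subprocess.run(["./stabilize.sh"], check=True)  # the two ffmpeg commands
            s3.upload_file("stabilized-video.mp4", bucket, "stabilized/" + key)
            # Delete only after processing, so a crash lets the message reappear
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```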
AMIs, Launch Templates, and Auto Scaling Groups
The above EC2 instance was created by hand. That’s not so great if the instance crashes or something worse happens, like an entire availability zone loses power. It’d be extremely tedious to download ffmpeg again, copy in the bash script, and manually kick it off.
To avoid all of that we can create an image of the instance (an AMI) and use that image to spin up new instances later on.
With the image created I can now navigate to Images > AMIs and spin up new instances manually.
Still, I don’t want to start instances manually. This is where auto scaling and launch templates come into the picture. With EC2 Auto Scaling I can specify how many instances I want running and under what conditions instances are added or removed. For example, when more than a few messages are waiting in the SQS queue I want to spin up several more instances to process them in parallel. When the queue is empty I want to scale down to a single instance to save on compute time.
In order for auto scaling to create instances I need to specify a Launch Template, which is essentially a reference to an AMI plus some additional configuration such as an EC2 role, a startup script, and hardware. Let’s create a Launch Template first.
And don’t forget the startup script under Advanced details.
With this Launch Template created we can now create an auto scaling group that will launch new t2.micro instances automatically with ffmpeg and other dependencies preinstalled and will kick off the bash script on startup automatically as well. Sweet.
Let’s pause here a moment. The auto scaling group configured above keeps the number of running EC2 instances between 1 and 4. The number of instances increases incrementally based on a CloudWatch alarm that triggers when there are visible messages on the specified SQS queue.
Similarly, instances are scaled back down to a single instance when another CloudWatch alarm, signifying low CPU usage and therefore low video stabilization activity, triggers. Note that this second trigger is based on CPU usage and not queue messages. Because video stabilization is a long-running process, I don’t want instances to die if the queue is empty but videos are still being processed.
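A hedged boto3 sketch of the scale-out wiring; the group name, queue name, and threshold are assumptions, and the scale-in alarm on CPU follows the same pattern:

```python
def queue_depth_alarm(queue_name, threshold, alarm_actions):
    """Build CloudWatch alarm parameters for visible-message depth on a queue."""
    return {
        "AlarmName": f"{queue_name}-backlog",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": alarm_actions,
    }

def wire_scale_out(asg_name, queue_name):
    import boto3  # imported lazily so queue_depth_alarm stays testable without AWS
    policy = boto3.client("autoscaling").put_scaling_policy(
        AutoScalingGroupName=asg_name,
        PolicyName="scale-out-on-backlog",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,  # add one instance each time the alarm fires
    )
    boto3.client("cloudwatch").put_metric_alarm(
        **queue_depth_alarm(queue_name, 5, [policy["PolicyARN"]]))
```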
At this point, instance provisioning is completely automated. If I wanted to, I could upload 10 videos to S3 simultaneously. A Lambda function would automatically publish 10 messages onto the SQS queue, the auto scaling group would react to the corresponding CloudWatch alarm, and new instances would begin spinning up. As new instances come online, messages are read off the queue and videos are downloaded, stabilized, and uploaded back to S3. Finally, when things quiet down, the auto scaling group kills off instances until there is only one left.
Uploading videos from the web
The final part of this sufficiently complex cloud application is to allow anonymous users to upload small (less than 50MB) videos and to download the stabilized results through a website.
Doing this turns out to be a somewhat tricky task, as there are many security concerns (users should not be able to see other people’s videos) along with AWS nuances (API Gateway upload limits, S3 presigned URLs).
Let’s start with a conceptual look at the architecture.
In order to keep the S3 bucket private, the idea is to expose an API Gateway endpoint through which a user uploads a video file that ultimately ends up in the correct bucket. Unfortunately, API Gateway caps request payloads at 10MB, so this approach will not work for our videos.
After some research I learned about S3 presigned URLs, which grant temporary, restricted access to a bucket and allow uploading files directly to S3 without making the bucket public. With presigned URLs in the mix the architecture now looks like the following diagram for video uploads.
As depicted above, the sequence goes like this. When the user attempts to upload a video the web application will send a request to an API Gateway endpoint and pass relevant details to a Lambda function. The Lambda function asks S3 for a presigned URL which is then returned to the web app. A secondary request is then immediately made to upload the video directly to S3.
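A minimal sketch of such a Lambda, assuming a Python runtime, API Gateway’s proxy integration, and a hypothetical bucket name:

```python
import json

BUCKET = "video-stabilizer-uploads"  # assumption

def make_response(url):
    """Shape the API Gateway proxy response carrying the presigned URL."""
    return {"statusCode": 200, "body": json.dumps({"uploadUrl": url})}

def handler(event, context):
    import boto3  # imported lazily so make_response stays testable without AWS
    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": event["queryStringParameters"]["filename"]},
        ExpiresIn=300,  # the URL only needs to live long enough to start the upload
    )
    return make_response(url)
```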
The web code for this is fairly straight forward even if you’re not familiar with the language it is written in.
The first function performs an HTTP GET and the second an HTTP PUT. retrieveSignedUrl is invoked first, and once the URL is obtained it is immediately passed to an invocation of upload, which uploads the video.
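For illustration, the same two calls can be sketched in Python with the standard library; the API endpoint URL is an assumption:

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

API_ENDPOINT = "https://example.execute-api.us-east-1.amazonaws.com/prod/signed-url"  # assumption

def build_request_url(filename):
    """Build the GET URL that asks the backend for a presigned upload URL."""
    return f"{API_ENDPOINT}?filename={quote(filename)}"

def retrieve_signed_url(filename):
    # First request: fetch the presigned URL from our API
    with urlopen(build_request_url(filename)) as resp:
        return json.loads(resp.read())["uploadUrl"]

def upload(signed_url, video_bytes):
    # Second request: PUT the video bytes straight to S3
    req = Request(signed_url, data=video_bytes, method="PUT",
                  headers={"Content-Type": "video/mp4"})
    with urlopen(req) as resp:
        return resp.status
```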
Once video stabilization is complete the user should be able to download the video. It can take several minutes for ffmpeg to do its work so we’ll need to think about how to tell the user when their video is done. We could make the user wait with the website open and either through web sockets or long polling provide the user with a download URL.
I think a much simpler solution is to use SES. Let’s ask the user for an email address prior to video upload and when an EC2 instance finishes stabilizing a video it can ask S3 for a presigned download URL valid for an hour and email that URL to the user.
To accomplish this we’re going to get clever with tagging. We have to wait for the stabilized video to exist in S3 before requesting a download URL. That means the EC2 instance needs to request the presigned URL on demand and it also means we need to figure out how to get the user’s email address to the EC2 instance before invoking SES.
Fortunately, we can add tags to the S3 presigned URL and have S3 auto-tag the object for us.
By adding tags to the presigned URL S3 will automatically tag the uploaded video. The EC2 instance can ask S3 for the tag, obtain the user’s email address, and after stabilizing and uploading the video the instance will use SES to send an email to the user with the download link.
Finally, here is some of the supporting code to accomplish all of this.
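As a hedged boto3 sketch of the two halves: tagging the presigned upload URL, and the EC2 side that reads the tag back and sends the email. The bucket, key names, and sender address are assumptions:

```python
def email_from_tags(tag_set):
    """Pull the user's email address out of an S3 TagSet."""
    return next(t["Value"] for t in tag_set if t["Key"] == "email")

def presign_upload_with_tag(bucket, key, email):
    import boto3  # imported lazily so email_from_tags stays testable without AWS
    s3 = boto3.client("s3")
    # Tagging is signed into the URL; the browser must send the matching
    # x-amz-tagging header, and S3 then tags the object on upload.
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "Tagging": f"email={email}"},
        ExpiresIn=300,
    )

def email_download_link(bucket, key):
    import boto3
    s3 = boto3.client("s3")
    ses = boto3.client("ses")
    tags = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    # Download URL valid for one hour
    url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=3600)
    ses.send_email(
        Source="noreply@example.com",  # assumption: a verified SES sender
        Destination={"ToAddresses": [email_from_tags(tags)]},
        Message={
            "Subject": {"Data": "Your stabilized video is ready"},
            "Body": {"Text": {"Data": f"Download (valid for one hour): {url}"}},
        },
    )
```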
And that’s it! There is plenty more to accomplish, but for the sake of brevity I’ll leave things there. In particular there are plenty of security concerns to address, such as restricting IAM policies, adding referrer origin checks, and server-side validation and error handling.
I learned a lot by building a sufficiently complex application, more than I can fit in this blog post.
Something to keep in mind is that sufficient complexity is easy to overshoot. It’s easy to have a big idea that seems like it should be easy, only to watch that side project not survive its first weekend of work. If you want to learn more about the cloud, my advice is to start with something seemingly too trivial and begin working through the five pillars of the AWS Well-Architected Framework.
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
In this post we started with two ffmpeg commands and dove into at least three of those pillars, with plenty of work left before any of it is production worthy. Doing this let me learn lessons I wouldn’t come across in a trivial Hello World or in an overly ambitious, never-completed big idea.