At Cluster we deal with a lot of mobile photo uploads, so it’s critical that we get users’ media uploaded and viewable as quickly as possible.
Mobile uploads to Cluster typically happen over slow mobile networks, and these uploads can take quite a long time — they also fail often and need to be retried.
As a result, we have separated uploads from normal backend API requests — we don’t want these finicky upload requests contending for the same infrastructure we use to serve user data quickly.
Since launching in early 2013, we’ve been using S3 as an intermediary for uploads, and this approach has served us well. But when we added support for video in Cluster 2.0, we needed a better solution.
Original approach: Uploads to Amazon S3
The original Cluster upload architecture is pretty simple.
First, we created a private S3 bucket with an aggressive expiration policy. Our native API client apps each get a unique IAM role with write-only access to this bucket, which we use to upload media with a UUID filename using the Amazon S3 SDK.
The API server itself gets a separate IAM role, with read-write access to the uploads bucket as well as write access to the world-readable media serving bucket.
So a file upload from a client looks like this:
- Prepare the media (resize, compress, apply filters, etc.)
- Upload the media to the “uploads” bucket with a unique filename
- Send the uploaded filename to the Cluster API
Then, in the Cluster API itself:
- Do some light validation on the uploaded file — don’t download the entire thing, but validate that it exists and is probably an image
- Move the file to the public serving bucket using a PUT+COPY
- Add the final publicly readable filename to the database and begin serving it to clients
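The light validation step can be sketched in Go with the standard library's content sniffing. This is a minimal illustration, not Cluster's actual API code: it assumes the server has fetched only the first few hundred bytes of the uploaded object (for example with a ranged GET), and `looksLikeImage` and `probablyImage` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// sniffLen is the number of leading bytes http.DetectContentType considers.
const sniffLen = 512

// looksLikeImage reports whether the leading bytes of a file look like an
// image, along with the MIME type that was detected.
func looksLikeImage(head []byte) (bool, string) {
	mime := http.DetectContentType(head)
	return strings.HasPrefix(mime, "image/"), mime
}

// probablyImage reads at most sniffLen bytes from r and sniffs them, so the
// rest of the object never has to be downloaded.
func probablyImage(r io.Reader) (bool, string, error) {
	buf := make([]byte, sniffLen)
	n, err := io.ReadFull(r, buf)
	// Files shorter than sniffLen are fine: ReadFull reports them as
	// io.EOF or io.ErrUnexpectedEOF.
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return false, "", err
	}
	ok, mime := looksLikeImage(buf[:n])
	return ok, mime, nil
}

func main() {
	// Every PNG file starts with this fixed 8-byte signature.
	ok, mime, err := probablyImage(strings.NewReader("\x89PNG\r\n\x1a\n"))
	fmt.Println(ok, mime, err)

	ok, mime, err = probablyImage(strings.NewReader("just some text"))
	fmt.Println(ok, mime, err)
}
```

A check like this only confirms that the file starts like an image. It says nothing about whether the rest of the file is intact, which is the limitation described next.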
The main drawback of the S3 upload approach is that we can’t really inspect the uploaded file without actually downloading the entire thing to the API server. Obviously we don’t want to take the time to do that, and it would slow things down considerably if we did.
For browser-based uploads (or third-party API client uploads eventually), we need a different approach — we can’t be reasonably confident these are sending us valid content.
When you introduce larger media in a wider variety of formats, the approach is even worse. Video was not going to work this way.
Enter the validating upload server
Our solution to all this was to build a new endpoint for handling uploads that validates files as they are uploaded. It is completely independent of the normal API, so we don’t need to worry about it clogging things up over there. It also doesn’t need to use the same tools as the API, which allowed us to try something new.
Since this server is going to be handling many unreliable client connections, it should be built with tools that specialize in that sort of thing.
It would also be nice if we could streamline inspecting the media and uploading the media to its final destination, especially as the size of uploaded media grows and transfer time increases.
It’d be a shame to have to wait until 100% of the file uploaded before we could transform it or upload it elsewhere.
Streaming media validation with Go
Building a basic streaming upload server is alarmingly simple with Go’s standard library.
The net/http package’s Request type provides a MultipartReader method, which streams a multipart HTTP request’s POST body as it is being received. Given that, we can easily inspect a file as it is being uploaded, and write the result wherever we want (disk, S3, or both) on the fly.
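As a rough sketch of what that looks like, the function below copies the first file part of a multipart request to any io.Writer as the bytes arrive. The names `streamFirstFile` and `demoUpload`, the form field name, and the filename are assumptions made for illustration, and the demo builds its request in memory rather than over a real network connection.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// streamFirstFile copies the first file part of a multipart request to dst
// as it is read, without buffering the whole request body first.
func streamFirstFile(r *http.Request, dst io.Writer) (int64, error) {
	mr, err := r.MultipartReader() // fails if the request is not multipart
	if err != nil {
		return 0, err
	}
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			return 0, errors.New("no file part in request")
		}
		if err != nil {
			return 0, err
		}
		if part.FileName() == "" {
			continue // skip ordinary form fields
		}
		return io.Copy(dst, part)
	}
}

// demoUpload builds an in-memory multipart POST containing content, streams
// it through streamFirstFile, and returns what the server side received.
func demoUpload(content []byte) (int64, string, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	fw, err := mw.CreateFormFile("media", "clip.mp4")
	if err != nil {
		return 0, "", err
	}
	if _, err := fw.Write(content); err != nil {
		return 0, "", err
	}
	if err := mw.Close(); err != nil {
		return 0, "", err
	}
	req := httptest.NewRequest(http.MethodPost, "/upload", &body)
	req.Header.Set("Content-Type", mw.FormDataContentType())

	var dst bytes.Buffer
	n, err := streamFirstFile(req, &dst)
	return n, dst.String(), err
}

func main() {
	n, got, err := demoUpload([]byte("video bytes"))
	fmt.Println(n, got, err)
}
```

Because dst is just an io.Writer, the same function could write to a file, to an io.MultiWriter that fans out to several destinations, or to a pipe feeding another process.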
In our case, we want to validate the video or photo as soon as we possibly can so we don’t waste any time.
Conveniently, most of the important validation bits are at the beginning of the files anyway: You can usually glean most of what you need to know about a jumbo-sized JPEG from just a few kilobytes of EXIF data at the beginning of the file. (For streaming-optimized MP4 video, you can do the same thing.)
So, before the file is done uploading, we can pipe its first bytes directly into ImageMagick, ffprobe, or another tool to extract the metadata we need about the uploaded file.
To handle this, the HTTP request handler ends up looking something like this:
- Inspect the request to make sure it’s a valid multipart POST
- Start a goroutine with a MultipartReader, pulling the file out of the request body and into a temporary location (file, memory, etc.)
- When we have enough of the file, actually feed the uploaded bytes into a subprocess to read the metadata
- If the file is hopeless, or is in an unacceptable format, we can end the upload right away
- By the time the upload is done, we have all the validated data we need already — store this validated metadata and the file somewhere the API can read it.
The upload server then gives the client a unique token, which the client can pass to the backend API to actually associate the validated media with their account.
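The handler flow above can be sketched as follows. To keep the example runnable with only the standard library, it swaps in Go's image.DecodeConfig for the ImageMagick or ffprobe subprocess, handles still images only, and runs synchronously rather than in a separate goroutine. The size limit, the token format, the response body, and all function names are assumptions, not Cluster's implementation.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"image"
	_ "image/jpeg" // registers the JPEG decoder for image.DecodeConfig
	"image/png"    // registers the PNG decoder; also used by samplePNG
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

const maxDim = 8192 // hypothetical limit on width and height

// handleUpload validates an image from its leading bytes while the rest of
// the request body is still unread, then hands back an opaque token.
func handleUpload(w http.ResponseWriter, r *http.Request) {
	mr, err := r.MultipartReader()
	if err != nil {
		http.Error(w, "expected a multipart POST", http.StatusBadRequest)
		return
	}
	part, err := mr.NextPart()
	if err != nil {
		http.Error(w, "missing file part", http.StatusBadRequest)
		return
	}
	// DecodeConfig only reads the image header. The TeeReader keeps a copy
	// of whatever was consumed so the full file can be reassembled below.
	var head bytes.Buffer
	cfg, format, err := image.DecodeConfig(io.TeeReader(part, &head))
	if err != nil || cfg.Width > maxDim || cfg.Height > maxDim {
		// Hopeless file: reject now instead of reading the remaining bytes.
		http.Error(w, "unsupported media", http.StatusUnsupportedMediaType)
		return
	}
	// io.Discard stands in for the real destination (disk, S3, or both).
	size, err := io.Copy(io.Discard, io.MultiReader(&head, part))
	if err != nil {
		http.Error(w, "upload interrupted", http.StatusBadRequest)
		return
	}
	token := make([]byte, 16)
	if _, err := rand.Read(token); err != nil {
		http.Error(w, "could not create token", http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "token=%s format=%s size=%dx%d bytes=%d\n",
		hex.EncodeToString(token), format, cfg.Width, cfg.Height, size)
}

// samplePNG returns an encoded blank PNG of the given size for the demo.
func samplePNG(width, height int) []byte {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, width, height))); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// upload sends content to handleUpload as an in-memory multipart POST and
// returns the response status code and body.
func upload(content []byte) (int, string) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	fw, err := mw.CreateFormFile("media", "upload.bin")
	if err != nil {
		panic(err)
	}
	fw.Write(content)
	mw.Close()
	req := httptest.NewRequest(http.MethodPost, "/upload", &body)
	req.Header.Set("Content-Type", mw.FormDataContentType())
	rec := httptest.NewRecorder()
	handleUpload(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := upload(samplePNG(4, 3))
	fmt.Println(code, body)
	code, body = upload([]byte("definitely not an image"))
	fmt.Println(code, body)
}
```

In a real server, the metadata and stored file location would be saved under the token so the backend API can look them up later. The sketch only returns them in the response.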
One step further: Heavy lifting as the file uploads
In some circumstances, our API clients can produce videos in orientations other than “up”, or videos that are too large, or videos that are in some nonstandard format. Altering the videos in the client can be prohibitively expensive, so ideally this would just be solved once and for all on the server side.
As a result, we wanted to go a step further: We wanted to do a quick first pass at video encoding, to make sure uploaded videos are immediately playable across most devices in a valid format and orientation.
To do that, we modified the uploader to actually start an ffmpeg process reading from stdin, which we can pipe the video data into as it is being delivered to us.
The uploader process ends up looking something like this:
(The video still needs to be re-encoded and optimized into different formats later, but after this first pass the video is immediately available and playable on most devices.)
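The piping step can be sketched with os/exec: assigning a reader to a command's Stdin makes Go copy the bytes into the child process as they become available, so the child starts working long before the input reaches EOF. The function names below are hypothetical, and `cat` stands in for the encoder so the example runs on a Unix-like system without ffmpeg installed.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"os/exec"
	"strings"
)

// pipeThrough starts the named command, streams src into its stdin, and
// writes its stdout to dst. It returns once the command has exited.
func pipeThrough(ctx context.Context, src io.Reader, dst io.Writer, name string, args ...string) error {
	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Stdin = src // os/exec copies src to the child's stdin in a goroutine
	cmd.Stdout = dst
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("%s failed: %w: %s", name, err, strings.TrimSpace(stderr.String()))
	}
	return nil
}

// pipeString is a small convenience wrapper for the demo: it pipes a string
// through the command and returns the command's output as a string.
func pipeString(input, name string, args ...string) (string, error) {
	var out bytes.Buffer
	err := pipeThrough(context.Background(), strings.NewReader(input), &out, name, args...)
	return out.String(), err
}

func main() {
	// "cat" simply echoes its input; a real uploader would run an encoder.
	out, err := pipeString("upload bytes", "cat")
	fmt.Println(out, err)
}
```

In the uploader, src would be the multipart file part and the command would be ffmpeg reading from stdin (for example with `-i pipe:0`). The exact flags Cluster used are not given in the article. This also relies on the input being streaming-optimized, as noted earlier, so that the encoder can begin before the whole file has arrived.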
Quantifying the time savings
The reason we’re doing all this is to make media uploads faster for users. So what does this streaming encoder end up saving us?
To test this, I added an option to our uploader to perform a “serial” upload, which waits until the full file is uploaded before encoding the video. I then performed a batch of file uploads using curl, alternating between streaming and serial mode.
Here are the results:
3MB video — Upload to localhost, n=50
Mode        Average time   Median    Max/Min
Streaming   3.42s          3.41s     4.39/3.05
Serial      3.49s          3.44s     4.38/3.08
As you can see, in a low latency situation the difference between the two techniques is negligible, because the ultra-fast transfer time does not give the video encoder a head start. This changes when you introduce some network transit time.
3MB video — Upload to remote host over wifi, n=100
Mode        Average time   Median    Max/Min
Streaming   5.91s          5.58s     9.69/4.55
Serial      8.64s          8.12s     12.58/5.63
In the case of this small video upload (3MB), we shaved off 31% of the overall upload time by processing the video as it was delivered to us.
Turns out that as long as the encoder can keep up with the input stream, you can subtract the encode time from the overall upload time this way.
This might not work for 1080p high-definition video, but it works great for short mobile uploads, which is what we’re offering in Cluster.
12MB video — Upload to remote host over wifi, n=50
Mode        Average time   Median    Max/Min
Streaming   17.82s         16.553s   48.78/9.69
Serial      22.42s         21.937s   41.95/15.76
For larger videos, more time is spent transferring bits, but the savings are still substantial: around 20% in this case.
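For reference, the savings figures follow from the average times in the tables above: the time saved is the difference between serial and streaming, expressed as a share of the serial time. The helper name `percentSaved` is just for illustration.

```go
package main

import "fmt"

// percentSaved returns how much shorter the streaming time is than the
// serial time, as a percentage of the serial time.
func percentSaved(serial, streaming float64) float64 {
	return (serial - streaming) / serial * 100
}

func main() {
	// Average times in seconds, taken from the three tables above.
	fmt.Printf("3MB, localhost: %.1f%%\n", percentSaved(3.49, 3.42))
	fmt.Printf("3MB, wifi:      %.1f%%\n", percentSaved(8.64, 5.91))
	fmt.Printf("12MB, wifi:     %.1f%%\n", percentSaved(22.42, 17.82))
}
```

On the averages this works out to roughly 2% on localhost, a little over 31% for the 3MB upload over wifi, and a little over 20% for the 12MB upload.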
Our experience with Go so far has been great, and the uploader has handled user uploads swimmingly since our public launch last week. We will continue to post any technical insights we come up with as we continue to build out new features and infrastructure using Go.
Thanks for reading! Connect with me on Twitter @taylorhughes with any comments or thoughts.