Serverless, large file downloads to S3

Using AWS Step Functions and Lambda for Fanout

Lee Harding
circuitpeople
7 min read · Feb 2, 2018


I have a love for FaaS, and in particular for AWS Lambda, which has broken so much ground in this space. Many of the most valuable uses I’ve found for Lambda involve cost and performance as core requirements: if the service can be 10x faster or cheaper, it will provide disruptive benefits to the customer.

Fanout is a key mechanism for achieving that kind of cost-efficient performance with Lambda. Fanout is a category of patterns for spreading work among multiple Function invocations to get more done sooner. This is, of course, horizontal scaling (also known as “scaling out”) and works by using many resources to side-step limitations associated with a single resource. Specifically, this might mean getting more CPU cycles in less time, more bytes over the network in less time, more memory, etc.

An example I like to use here is moving a large file into S3, where there will be a limit on the bandwidth available to the Function *and* a limit on the time the function can run (5 minutes). I’ve done some experiments to demonstrate the effective size of file that can be moved through a Lambda in this way. The image below shows the result of a recent one where a Step Function state machine is used to measure the time to download increasingly large files.

Attempting to download a set of increasingly large files. Timings shown in the table below.

The bottom line here is that files larger than a few GB won’t reliably download in a single Lambda invocation. The effective bandwidth over this range of file sizes varied from roughly 400 to 700 megabits per second. Good, but not enough for moving some interesting things (e.g. the NSRL hash sets, videos, ML training sets, etc.).

Download    Size (MB)    Time (ms)
10MB        10           221
100MB       100          1,873
1GB         1,024        16,510
10GB        10,240       190,015
100GB       102,400      failed

Note that AWS will very likely improve these numbers; they have a great track record of continuously delivering on such things. Nonetheless, there will always be a limit, and that limit is small enough today to cause problems. Lambda executions can only run for 5 minutes (300,000ms), and the 10GB download above averaged roughly 19 seconds per GB, so extrapolating the data indicates that downloading anything above about 15GB will consistently fail. Extrapolating further, the Lambda execution time limit would need to be increased to over 30 minutes for a 100GB file to have a chance of downloading in a single execution.

At this point we could throw up our hands and go back to long-running transfers on EC2 or an ECS Container, but that would be silly. Fanout is the obvious answer, because:

  • We’re moving the file from a website that supports HTTP Range requests (i.e. we can request a specific sub-section of the file rather than the entire thing).
  • S3 supports Multi-part Uploads (i.e. we can upload different sections of the file into parts, and combine them once completed)
  • Lambda Function executions run as isolated environments with their own CPU and network capabilities

By using multiple executions we can download different ranges of the source file in parallel, each creating a “part” in S3, and then combine the parts once all ranges are complete. To demonstrate the idea, consider this simple prototype built with AWS Step Functions.

Simple prototype state machine for downloading files from 0–20GB in size.

If you aren’t familiar with Step Functions (you might want to be; it’s an excellent tool to have in your kit), the important thing to know here is that each node in the diagram is either a link to a Lambda function to be run (a.k.a. a Task state) or a flow-control node such as a Choice, Pass, or Parallel state. Choice states pass control to one of several subsequent nodes based on conditions evaluated against the output of the preceding node. Pass states apply simple transformations to the input before passing it to the next node (without having to do so in a Lambda).

The core of this state machine is the Parallel state (represented by the dashed border region), which provides concurrency by executing its child state machines (a.k.a. branches) asynchronously, waiting for all of them to complete, and then proceeding to the following node. The output of a Parallel state is an array containing the output of the last node in each child branch.

In the diagram above the left-most branch contains a single Task that downloads the first part of the file (the other two nodes are Pass states that exist only to format input or output). The other branches contain conditional logic based on the size of the file:

  • If the file is larger than the minimum needed for that part, download the appropriate 1/5th of the file. For example, the second branch will download and create a part only if the file is larger than 5MB, the third only if it is larger than 10MB, etc.
  • But if the file is smaller than 5MB (or 10MB, 15MB, etc. for the other branches), the download is skipped. A sketch of the per-branch arithmetic follows this list.
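Here it is as code. This is my reading of the branch logic, not the prototype’s actual implementation: each branch takes roughly a fifth of the file but never less than S3’s 5MB minimum part size, and skips itself when its starting offset falls beyond the end of the file.

// A sketch of the per-branch decision, assuming 5 branches and S3's 5MB minimum
// part size. Returns the inclusive byte range that branch `partNumber` (1-based)
// should download, or null if the file is too small for that branch to contribute.
static (long Start, long End)? RangeForBranch(long contentLength, int partNumber)
{
    const int BranchCount = 5;                  // the prototype uses 5 branches
    const long MinPartSize = 5L * 1024 * 1024;  // S3's minimum part size (5MB)

    // Each branch takes roughly 1/5th of the file, but never less than 5MB.
    var partSize = Math.Max(
        (contentLength + BranchCount - 1) / BranchCount,  // ceiling of length / 5
        MinPartSize);

    var start = (partNumber - 1) * partSize;
    if (start >= contentLength)
        return null;  // e.g. the second branch is skipped for files of 5MB or less

    var end = Math.Min(start + partSize, contentLength) - 1;  // end is inclusive
    return (start, end);
}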

As you can see, this idea can be scaled-out to allow the download of very large files and with broad concurrency. What are some of the details here?

The first step is to determine whether the source URL supports Range requests, and the normal way to do that would be an OPTIONS request. AWS S3 endpoints support Ranges, but because OPTIONS is used for CORS it doesn’t work for a simple query like ours (basically it requires a couple of extra headers). So we instead make the check using a HEAD request, which achieves the same result:

// Ask the source for its headers only; no body is transferred.
// (Note: AllKeys.Contains(...) requires a `using System.Linq;` directive.)
var request = WebRequest.CreateHttp(url);
request.Method = "HEAD";
using (var response = await request.GetResponseAsync())
{
    // The server advertises byte-range support via the Accept-Ranges header.
    var supports_ranges =
        response.Headers.AllKeys.Contains("Accept-Ranges")
        && response.Headers["Accept-Ranges"].Contains("bytes");
    // Content-Length of the full file, needed to plan the part ranges.
    var content_length =
        long.Parse(response.Headers[HttpResponseHeader.ContentLength]);
    ...
}

Another HTTP-related detail is how to request a subset of the content once we know supports_ranges is true. Not all servers will honor ranges: if they don’t, asking for a range may cause an error response (depending on the server software), or the Range header may simply be ignored and the entire content returned.

var start = ... // index of the first byte to be returned
var end = ...   // index of the last byte to be returned, _inclusive_
var request = WebRequest.CreateHttp(url);
request.Headers[HttpRequestHeader.Range] = $"bytes={start}-{end}";
using (var response = await request.GetResponseAsync())
{
    // The Content-Length of the response should match the range we asked for;
    // if it doesn't, the server ignored (or mangled) the Range header.
    var part_length =
        long.Parse(response.Headers[HttpResponseHeader.ContentLength]);
    if (part_length != (end - start + 1)) // remember, end is _inclusive_
        throw new Exception("Oops. Unexpected length of content.");
    ...
}

To create S3 upload parts from specific ranges we need to obey some rules for multi-part uploads. Primarily, only the last part can be smaller than 5MB. It’s also notable that we can have no more than 10,000 parts in total. This Step Functions-based prototype works well within those bounds.
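For reference, here’s a quick sanity check (a sketch of my own, not part of the prototype) that makes those two rules explicit for whatever part layout is chosen:

// Sanity-check a planned part layout against S3's multipart upload rules:
// no more than 10,000 parts, and every part except the last at least 5MB.
static void ValidatePartLayout(long contentLength, long partSize)
{
    const long MinPartSize = 5L * 1024 * 1024;  // 5MB
    const int MaxParts = 10_000;

    var partCount = (contentLength + partSize - 1) / partSize;  // ceiling division
    if (partCount > MaxParts)
        throw new ArgumentException(
            $"{partCount} parts would exceed S3's limit of {MaxParts}.");
    if (partCount > 1 && partSize < MinPartSize)
        throw new ArgumentException(
            $"Every part except the last must be at least {MinPartSize} bytes.");
}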

One more implementation detail. The payload passed to the function for downloading and creating each part must include the:

  • Source URL
  • Multi-part Upload ID (created by initiating the multipart upload before any parts are uploaded; see the sketch after this list)
  • Part Number
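That upload ID has to exist before the Parallel state runs, which means initiating the multipart upload up front and passing the ID into the state machine’s input. A minimal sketch using the AWS SDK for .NET (the bucket and key names here are placeholders of mine):

using Amazon.S3;
using Amazon.S3.Model;

// Initiate the multipart upload once, before any branch runs, and hand the
// resulting UploadId to the state machine as part of its input.
var s3 = new AmazonS3Client();
var initiated = await s3.InitiateMultipartUploadAsync(new InitiateMultipartUploadRequest
{
    BucketName = "my-destination-bucket",   // placeholder
    Key = "downloads/big-file.bin"          // placeholder
});
var uploadId = initiated.UploadId;  // required by every subsequent UploadPart call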

The part number and upload ID are required by S3’s UploadPart API. The part number is also used to determine the range of bytes to copy (remember, the end byte index is inclusive). With all parts created, the final step is to combine them by calling S3’s CompleteMultipartUpload API:

  • Multi-part Upload ID
  • List of Part Numbers and associated ETags returned by the S3 UploadPart API
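To make those two API calls concrete, here’s a minimal sketch using the AWS SDK for .NET (the helper names and parameters are mine, not the prototype’s): each branch streams its ranged HTTP response into UploadPart and reports back its part number and ETag, and a final step feeds the collected pairs to CompleteMultipartUpload.

using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// Inside a branch: upload one ranged download as a part. `response` is the
// WebResponse from the Range request shown earlier; everything else comes
// from the state machine input.
static async Task<PartETag> UploadRangeAsPartAsync(
    IAmazonS3 s3, WebResponse response, string bucket, string key,
    string uploadId, int partNumber, long partLength)
{
    var uploaded = await s3.UploadPartAsync(new UploadPartRequest
    {
        BucketName = bucket,
        Key = key,
        UploadId = uploadId,
        PartNumber = partNumber,
        InputStream = response.GetResponseStream(),  // stream the range to S3
        PartSize = partLength                        // bytes to read from the stream
    });
    return new PartETag(partNumber, uploaded.ETag);
}

// After the Parallel state: combine the parts into the final object.
static async Task CompleteAsync(
    IAmazonS3 s3, string bucket, string key, string uploadId,
    List<PartETag> partETags)
{
    await s3.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest
    {
        BucketName = bucket,
        Key = key,
        UploadId = uploadId,
        PartETags = partETags   // part numbers + ETags gathered from the branches
    });
}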

And that’s it.

Here’s what the timings looked like for downloading the same large files mentioned at the start of this article:

Download    Size (MB)    Time (ms)
10MB        10           883
100MB       100          1,299
1GB         1,024        3,713
10GB        10,240       36,267
100GB       102,400      see below...

Except for the smallest file, where the overhead of state machine transitions dominates, we’ve delivered a pretty nice speed-up. For the largest file (10GB) the speed-up is a near-linear 5x. That’s what I wanted to see in a prototype.

With only 5 branches, each limited to 5GB (the maximum size of a part), the maximum download is 25GB. To test the 100GB file I expanded the number of branches to 20 and found the download time to be 93,128ms (an effective download speed of ~1GB/s, or roughly 8Gbps). Each branch downloaded 5GB in the 100GB case vs. only 2GB in the 10GB case, and took roughly 2.5x as long to do it, so this again represents near-linear scaling, which is the best that can be hoped for with concurrency.

How far will this go? To support the full potential of S3 would require 10,000 branches. Perhaps that would work, but I think other things would start going sideways at that scale. Maybe I’ll find out by looking into dynamically generating the AWS Step Functions state machine (with retry and error handling, of course)…

This prototype has taken us from “it can’t do this” to “rocking the download world” with Lambda and a clear and obvious application of the Fanout concept. In a subsequent article I’ll look at a different fanout pattern and scaling out with recursive Lambda executions — mind the guardrails.

Want to go further with this? Improve robustness by making the part creation restart-able. S3 has an API to list incomplete multi-part uploads and the parts created so far. So when the state machine is restarted the parts that completed on the previous try can be no-op’d. Caution, though. Think about all the ways corruption of the file might happen and what kind of verification is needed to make sure a set of parts are safe to use to complete the upload.
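For reference, the APIs alluded to above are ListMultipartUploads and ListParts. Here’s a sketch (the helper name is mine) of using ListParts so a branch can no-op when its part already exists:

using System.Linq;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// On restart, check whether this part was already uploaded on a previous attempt.
// (ListParts is paginated, but with a few dozen parts one page is enough here.)
static async Task<bool> PartAlreadyUploadedAsync(
    IAmazonS3 s3, string bucket, string key, string uploadId, int partNumber)
{
    var listed = await s3.ListPartsAsync(new ListPartsRequest
    {
        BucketName = bucket,
        Key = key,
        UploadId = uploadId
    });
    return listed.Parts.Any(p => p.PartNumber == partNumber);
}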
