Handling URL-Encoded S3 File Names

I’ve been working on Greenhouse’s Onboarding application for about a year and a half now. One of the more recent features I developed was the ability to export tasks to a CSV. Given that an export could reach tens of thousands of tasks for larger customers, my first approach didn’t work: I couldn’t fetch the data server-side, send it all to the client, and build the CSV file using JavaScript.

The Feature Implementation

Instead, I implemented the feature such that when a user clicks the “Export” button, the request kicks off a background job (using Sidekiq Batches) and returns the Batch ID to the client. The client then polls the server every second with this ID to get a status update: a boolean completion status and a list of errors, if any.

Assuming there are no errors when the batch completes, the server uploads the file to S3 and then stores a temporary URL in Redis. Once the client learns that the file building and uploading is complete, it can use the Batch ID to retrieve the S3 URL from Redis. And with that URL, it’s just a matter of adding a download button to the DOM for the user to click. Easy.
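To make the flow concrete, here is a minimal sketch of the bookkeeping behind it. It is not the Onboarding code: the ExportStatusStore class and its method names are hypothetical, and a plain in-memory hash stands in for both the Sidekiq batch status and the Redis entry that holds the temporary URL.

```ruby
require "json"

# Hypothetical in-memory stand-in for the Sidekiq batch status and the Redis
# entry holding the temporary S3 URL. All names here are illustrative.
class ExportStatusStore
  def initialize
    @batches = {}
  end

  # Called when the export is kicked off; the ID is what the client polls with.
  def start(batch_id)
    @batches[batch_id] = { complete: false, errors: [], url: nil }
  end

  # Called by the background job once the file is uploaded (or has failed).
  def finish(batch_id, url: nil, errors: [])
    @batches.fetch(batch_id).merge!(complete: true, errors: errors, url: url)
  end

  # What the polling endpoint would return: completion flag plus any errors.
  def status_json(batch_id)
    batch = @batches.fetch(batch_id)
    { complete: batch[:complete], errors: batch[:errors] }.to_json
  end

  # Only hand out the URL once the batch completed without errors.
  def download_url(batch_id)
    batch = @batches.fetch(batch_id)
    batch[:url] if batch[:complete] && batch[:errors].empty?
  end
end
```

The client would keep polling the status until complete is true, and only then ask for the download URL.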

Well…

The Problem

That’s where I ran into the problem. When generating this CSV, I’d set the file name to something like Tasks Export (2017-11-05 06:10 PM).csv. When clicking the download button, though, I’d see a file named Tasks+Export+%282017-11-05+06%3A10+PM%29.csv. Not the most user-friendly experience…

I was naming the file properly, so why was this happening? It turns out that our wrapper around the AWS SDK automatically URL-encodes the key used for writing a file to S3. This is considered a best practice and is encouraged by AWS (see the “Characters That Might Require Special Handling” section). Some characters are valid in file names on most operating systems, but AWS notes that they may need to be URL-encoded in an object key.
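The encoding itself is easy to reproduce with CGI.escape from Ruby’s standard library, which is the call the code later in this post uses to build the key. The snippet below is only an illustration and assumes the original name was Tasks Export (2017-11-05 06:10 PM).csv, which is what the encoded name decodes to.

```ruby
require "cgi"

filename = "Tasks Export (2017-11-05 06:10 PM).csv"

# CGI.escape turns spaces into "+" and percent-encodes characters such as
# parentheses and colons, which is what ends up in the S3 key.
encoded = CGI.escape(filename)
puts encoded

# Decoding restores the original, human-friendly name.
puts CGI.unescape(encoded)
```

So the key is a faithful, reversible encoding of the name; the trouble is only that the browser uses the encoded form when it saves the file.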

My first thought was to use JavaScript to trigger the download and set the file name, but that seemed like too much work to get right across every browser we support. There had to be something more straightforward.

Then I decided to rely on the download attribute on anchor tags. Setting a value for that attribute lets you choose the file name used for the download, except when it doesn’t. There are a few exceptions to this rule, and the documentation for the attribute spells out the one I hit:

“This attribute only works for same-origin URLs.”

Given that the export files were stored in S3, regardless of what I put for the download value, it would be ignored. Internet safety for the win! One potential workaround I considered was to make a call to the server, allow the server to fetch the document from S3, and then respond to the client with it. But rather than adding even more server code, I found a one-line fix!

The Solution

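The documentation for the download attribute lists another exception, and this one turned out to be the fix:

“If the HTTP header Content-Disposition: gives a different filename than this attribute, the HTTP header takes priority over this attribute.”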

With Onboarding’s use of the AWS SDK, writing a file to S3 uses the #put_object method under the hood. Adding some more logging, here’s what I saw being passed in…

put_object(
  :bucket_name=>"gho-dev-bucket",
  :content_length=>243141,
  :data=>TempfileObject,
  :key=>"stash/random/Tasks+Export+%282017-11-05+06%3A10+PM%29.csv",
)

With no value for content-disposition in the header, the downloaded file’s name was taken from the value of the key argument. Luckily, #put_object accepts content_disposition as an argument! All I needed was to pass it the line below and, upon download, the file’s name would no longer be URL-encoded.

:content_disposition=>"attachment; filename=\"#{filename}\""
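One thing to watch with that line is a file name that itself contains a double quote or a backslash, since either would break the quoted value. The helper below is a hypothetical sketch of one way to guard against that (attachment_disposition is not part of the AWS SDK or of Onboarding); it backslash-escapes both characters before wrapping the name in quotes.

```ruby
# Hypothetical helper: builds a Content-Disposition value with the file name
# as a quoted string, escaping backslashes and double quotes first.
def attachment_disposition(filename)
  escaped = filename.gsub(/["\\]/) { |char| "\\#{char}" }
  %(attachment; filename="#{escaped}")
end

puts attachment_disposition("Tasks Export (2017-11-05 06:10 PM).csv")
```

For the export file names in this post, which contain neither character, the result is identical to the interpolated string above.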

The Code

Here’s what the relevant code looks like…

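def stash_file(file, filename, options={})
  # The S3 key is URL-encoded, but Content-Disposition carries the original
  # file name so the browser saves the download under a readable name.
  s3_options = options.merge({
    key: "stash/#{secure_random_key}/#{CGI.escape(filename)}",
    content_disposition: "attachment; filename=\"#{filename}\"",
  })
  s3_object.write(file, s3_options)
  # Generate the temporary URL and store it in Redis for the client to fetch.
  file_url = s3_object.url_for(:read, s3_options)
  $redis.set(redis_key, { url: file_url }.to_json)
end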

Accepting a Tempfile object, the file name, and options to be passed to #put_object, #stash_file will upload a file to S3 and store the S3 URL in Redis. S3Object#write calls #put_object under the hood.
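If you want to check the option building without touching S3 or Redis, it can be pulled out into a small function. This is a sketch rather than the Onboarding code: build_s3_options and its random_key parameter are hypothetical names, and SecureRandom.hex stands in for whatever secure_random_key does.

```ruby
require "cgi"
require "securerandom"

# Hypothetical, AWS-free version of the option building in #stash_file.
# random_key stands in for secure_random_key and can be fixed for testing.
def build_s3_options(filename, options = {}, random_key: SecureRandom.hex(8))
  options.merge(
    key: "stash/#{random_key}/#{CGI.escape(filename)}",
    content_disposition: "attachment; filename=\"#{filename}\""
  )
end

opts = build_s3_options(
  "Tasks Export (2017-11-05 06:10 PM).csv",
  { content_length: 243_141 },
  random_key: "random"
)
puts opts[:key]
puts opts[:content_disposition]
```

The key stays URL-encoded for S3, while the header keeps the readable name that the browser will use for the download.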

Hope this has been helpful!