File Download with Error Handling

Atlantbh
Atlantbh Engineering
11 min read · Oct 6, 2020

Designing an API that serves files for download is not easy. In a way, it is an exception to the standard way of designing APIs: most APIs today produce XML or JSON, whereas a download endpoint produces whatever format the downloaded file has.

Multiple methods are deployed to achieve the goal of making a browser accept a file and start the downloading process. We’ll discuss some of them, along with their advantages and disadvantages.

OBJECTIVES

Our main objective is to ensure a robust file downloading process. That means the detection of download failures as well.

Let’s assume the most demanding use case (not including on-the-fly encryption): on an event (e.g. a button click), start the download and let the browser take control of pausing, stopping, and progress tracking; if the download fails for some reason, show a dialog to inform the user while staying on the same page. This is not as easy as it sounds at first glance, as we will see in a moment.

Other objectives:

  • Preserve original file name and extension
  • Preserve content-type
  • Notify the user immediately that the download has started (do not wait until it is finished)
  • Make no difference between small files (a couple of kilobytes) and big ones (a few gigabytes)
  • Take advantage of file streaming (chunked response) if possible

PROBLEM

When designing standard APIs, there are best practices to follow, and various styles have emerged around them, e.g. SOAP, REST (or REST-like), GraphQL, and so on. In this blog, we’ll focus on a REST-like approach, but the same applies to the others, especially GraphQL.

There are two ways to signal to an API consumer that something went wrong:

  • HTTP status codes (e.g. 404 — NOT FOUND, 400 — BAD REQUEST)
  • Structured response indicating error

HTTP status codes are easy but limited. Take the 400 BAD REQUEST status code: it tells us that our input may be malformed or incorrect, but nothing else, e.g. which fields are incorrect.

A structured response may have a little more information, but it may be harder to use. For example, we may structure our JSON response as follows:

{
  "success": true|false,
  "data": {
    "some_data_if_success": 1
  },
  "error": {
    "error_info_if_failure": 0
  }
}

Usually, a combination of an HTTP status code and a structured response is used, even in case of an error.
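
As a rough illustration of combining the two signals, the sketch below pairs an HTTP status code with a structured error body. The helper name buildErrorResponse and the message and fields properties are hypothetical, chosen only for this example; the envelope shape follows the JSON above.

```javascript
// Hypothetical helper: pairs an HTTP status code with a structured body.
function buildErrorResponse(status, message, fields = {}) {
  return {
    status, // e.g. 400, to be sent as the HTTP status code
    body: {
      success: false,
      error: { message, fields }
    }
  };
}

// Example: a 400 that also says which field was wrong.
const response = buildErrorResponse(400, 'Validation failed', {
  email: 'must be a valid e-mail address'
});
console.log(response.status, JSON.stringify(response.body));
```

In an Express handler, such a value could be sent with res.status(response.status).json(response.body).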

But, for download there is an exception! Our goal is to make browsers handle downloading and all subsequent things (user interaction, pausing, resuming, progress tracking, download speed tracking and so on), which nowadays browsers are very good at. So we need to serve files directly, not via structured response. Also, we are serving different kinds of files, not JSON, but we’ll discuss this as well.

There are several methods for downloading files, and all of them are briefly described below.

DOWNLOAD FILES FROM BACKEND

To keep our blog as concise as possible, we’ll use Node.js with Express for our backend example. The server listens on port 8082, allows CORS for our example frontend app (on port 8000, see below), and exposes the Content-Disposition header so that we can extract the filename from it in some of our methods. Only one route is present: the one that downloads the file. The path parameter should_fail is used to simulate a download failure.

This code will be updated later to accommodate new download methods.

const express = require('express');
const cors = require('cors');

const app = express();

// Allow the frontend (port 8000) to call us and to read Content-Disposition.
app.use(cors({
  origin: 'http://localhost:8000',
  credentials: true,
  exposedHeaders: ['Content-Disposition']
}));

app.get('/download/:should_fail', function (req, res) {
  if (req.params.should_fail === 'true') {
    // Simulated failure; the status code is a number, not a string.
    res.status(404).send('Download failed!');
  } else {
    // res.download sends the file as an attachment (sets Content-Disposition).
    res.download(`${__dirname}/sample.pdf`);
  }
});

const server = app.listen(8082, function () {
  console.log(`Started server at port ${server.address().port}!`);
});

POSSIBLE SOLUTIONS

All methods are written in plain JavaScript (ES6) without any external libraries. For simplicity, the code is served using Python’s simple HTTP server, but other web servers could be used (e.g. Apache, nginx, …). Python’s server will start at port 8000.

We can start the server like this:

For Python 2:

python -m SimpleHTTPServer

For Python 3:

python -m http.server

Method 1: Wrapped download response

The first possible solution we’ll discuss is to include the downloaded file inside the structured response. The file content must be encoded in some form, e.g. Base64, because it may be binary data while a structured response (e.g. JSON) is text; raw bytes embedded in text could be misinterpreted, leading to a corrupted file.

An example may look like this (actual content is obviously trimmed down):

{
  "success": true,
  "data": {
    "file_name": "sample.pdf",
    "content_type": "application/pdf",
    "content": "JVBER ... NCg=="
  }
}

This way we can detect possible errors: the success field will be false and the error field will be present, indicating the nature of the error. However, there are many disadvantages with this approach:

  • The file must be encoded (e.g. via Base64) on the backend, which takes CPU time, and probably RAM as well if the encoding is not done in a streaming fashion
  • The file cannot be streamed
  • The user must wait for the request to finish before the download is shown
  • Browser tricks must be used to trigger the download (e.g. via a data URL, described in Method 2)
  • The file must be decoded in the user’s browser, which also takes CPU time and, this time, almost certainly RAM, because the whole response will be in memory at one point

Advantages:

  • Can detect errors
  • The user stays on the same page (no refreshing, no redirects)
  • May be acceptable for (very) small files e.g. user avatars or some icons.
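
To give a sense of the client-side work a wrapped response requires, here is a minimal sketch of a decoder. The function name unwrapDownload is hypothetical; it assumes the envelope fields from the example above (file_name, content_type, content) and a Base64-encoded content field.

```javascript
// Hypothetical client-side helper for the wrapped response shown above.
// Assumes envelope.data has file_name, content_type and Base64 content.
function unwrapDownload(envelope) {
  if (!envelope.success) {
    throw new Error('Download failed: ' + JSON.stringify(envelope.error));
  }
  const { file_name, content_type, content } = envelope.data;
  const binary = atob(content); // Base64 -> binary string, fully in memory
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return { fileName: file_name, contentType: content_type, bytes };
}
```

Note that the encoded string and the decoded bytes exist in memory at the same time, which is the RAM cost mentioned in the list of disadvantages. The resulting bytes could then be wrapped in a Blob and offered to the user with one of the browser tricks from Method 2.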

Method 2: AJAX/FETCH request

As already stated, the biggest problem is how to reliably detect download errors and do something about them. With this method we use AJAX to call the backend directly, fetch the file, read the content type or HTTP status code to determine whether the download succeeded, and then use some tricks to determine the filename and trigger the download.

To determine the filename, we can read the Content-Disposition header, which may look like this: content-disposition: attachment; filename="sample.pdf". The filename can be extracted via a regular expression (see this StackOverflow answer) or in some other way.
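
One possible shape for such an extraction is sketched below. This is not the code from the referenced answer; the fallback name 'download' and the decision to treat the extended filename*= form as UTF-8 are assumptions made for this example.

```javascript
// A minimal sketch of a Content-Disposition filename parser.
// Assumptions: 'download' as fallback name, filename*= values are UTF-8.
function extractFilename(disposition, fallback = 'download') {
  if (!disposition) return fallback;

  // Extended form, e.g. filename*=UTF-8''na%C3%AFve.pdf
  const extended = /filename\*\s*=\s*([^']*)'[^']*'([^;]+)/i.exec(disposition);
  if (extended) {
    try {
      return decodeURIComponent(extended[2].trim());
    } catch (e) {
      // Malformed percent-encoding: fall through to the plain form.
    }
  }

  // Plain form, quoted or unquoted, e.g. filename="sample.pdf"
  const plain = /filename\s*=\s*(?:"([^"]*)"|([^;]+))/i.exec(disposition);
  if (plain) {
    return (plain[1] !== undefined ? plain[1] : plain[2]).trim();
  }
  return fallback;
}
```

The extended form is checked first because, when both forms are present, it carries the non-ASCII version of the name.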

When a response is received, we check the status code, and if it is 404 (see the server implementation above) we show some kind of dialog. Otherwise, we prepare an anchor element (link) and set its href property. There are two ways of doing this:

  1. As a data URL: we construct the data URL as follows:
    data:<CONTENT_TYPE>;base64,<ACTUAL_DATA>, for example: data:application/pdf;base64,JVBER … NCg==
  2. As an object URL: we create a Blob (with the specified content type) and create an object URL from it, like: window.URL.createObjectURL(blob)

Option 2 is preferred because it is much better for big files (some browsers may stream it into their internal buffer), and can be revoked (removed from memory).
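
For comparison, the data URL from option 1 could be assembled roughly as follows. The helper name toDataUrl is hypothetical, and the sketch assumes the file content is already available as a byte array.

```javascript
// Builds the data URL from option 1. The whole Base64 string lives in
// memory, which is one reason option 2 scales better for big files.
function toDataUrl(contentType, bytes) {
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return `data:${contentType};base64,${btoa(binary)}`;
}

// Option 2, for comparison (browser only):
//   const url = URL.createObjectURL(new Blob([bytes], { type: contentType }));
//   ...use url as link.href, then call URL.revokeObjectURL(url) when done.
```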

Here is a code snippet, using the fetch API (shorter and more robust than plain old AJAX).

let downloadedUrl; // object URL of the previous download, if any

function download(shouldFail) {
  fetch(`http://localhost:8082/download/${shouldFail}`)
    .then(response => {
      if (!response.ok) { // covers the 404 our server sends on failure
        alert('Download failed!'); // show real dialog
        return;
      }
      let link = document.getElementById('ex-download-link');
      if (!link) { // element not created yet
        link = document.createElement('a');
        link.id = 'ex-download-link';
        document.body.appendChild(link);
      }
      if (downloadedUrl) { // release the previous object URL
        window.URL.revokeObjectURL(downloadedUrl);
        downloadedUrl = undefined;
      }
      return response.blob().then(blob => {
        const disposition = response.headers.get('Content-Disposition');
        // extractFilename is a helper (not shown here) that parses the header
        link.download = extractFilename(disposition);
        downloadedUrl = window.URL.createObjectURL(blob);
        link.href = downloadedUrl;
        link.click();
      });
    })
    .catch(() => alert('Download failed!')); // network or CORS error
}

The code is called as download(true) or download(false). We may add onclick handlers to two buttons to demonstrate this.

<button onclick="download(true)">Download (fail)</button>
<button onclick="download(false)">Download (success)</button>

Advantages:

  • Detects errors
  • The user stays on the same page (no refresh, no redirect)
  • May be acceptable for (very) small files e.g. user avatars or some icons.

Disadvantages:

  • Even if the file is streamed from the backend, it is fully loaded into memory in the browser
  • The user must wait for the request to finish before the download is shown
  • Browser tricks must be used to trigger the download (data URL or createObjectURL)

Method 3: Simple redirect

This is probably the simplest method here. It just redirects the app to the backend URL that serves the file. The navigation is performed by the browser, so all cookies are sent as well and the backend can authenticate the request.

<button onclick="download(true)">Download (fail)</button>
<button onclick="download(false)">Download (success)</button>

function download(shouldFail) {
  window.location.href = `http://localhost:8082/download/${shouldFail}`;
}

Advantages:

  • Simple
  • The file can be streamed if supported
  • The filename is preserved
  • No need to encode on frontend or backend
  • The user is notified immediately
  • When the download succeeds, there is neither a redirect nor a refresh

Disadvantages:

  • When a download fails, the user is actually redirected to the error page (Figure 1), which may not be acceptable. In fact, one of our mandatory goals from the beginning of the blog is that no redirects are allowed.

So, if there is no requirement to stay on the same page, this method is preferred. But in other cases, and especially in single-page applications, it is not acceptable.

Method 4: HEAD and GET

This solution shifts the weight to the backend. The idea is that the frontend app first sends an HTTP HEAD request (e.g. via fetch from method 2) to the backend to check whether the file is there and whether there are any errors with it. The backend may respond via a header and/or an HTTP status code. If the frontend app concludes that there are no errors, it may fall back to the classic window.location.href = workingUrl from method 3.

function download(shouldFail) {
  // named "url" so that it does not shadow the global URL constructor
  const url = `http://localhost:8082/download/${shouldFail}`;
  fetch(url, { method: 'HEAD' })
    .then(response => {
      if (response.status === 404) {
        alert('Download failed!'); // show real dialog
      } else {
        window.location.href = url; // from method 3, actual GET
      }
    });
}

Advantages:

  • The file streaming is supported if possible
  • The user stays on the same page (no refresh, no redirect)
  • Fairly simple
  • Detects errors if any
  • The user is notified immediately
  • The filename is preserved

Disadvantages:

  • Two requests required
  • Complex backend logic. For example, if retrieving a file means that it is created on the backend (e.g. real-time reports), the backend must create the file twice (for the HEAD and GET requests) or somehow store it between the requests, which is also a problem if the file is deleted after being served: how long should the file be kept, and what if the GET request never arrives?
  • There is no guarantee that the GET request will succeed. E.g. if the backend retrieves the file from Amazon S3, the file may exist at the time of the HEAD request but be gone by the time of the GET request, and we are back to our original problem. Also, imagine the complexity of a HEAD handler on the backend that has to check whether a file exists on another, external server (e.g. S3) without retrieving it!

Method 5: IFrame trick

The HTML element <iframe> can be used to download files, just like with method 2 but simpler. We create an iframe, “load” the file in it, and voilà! The download is triggered.

function download(shouldFail) {
  let iframe = document.getElementById("ex-download-iframe");
  if (!iframe) { //element not created yet
    iframe = document.createElement('iframe');
    iframe.id = 'ex-download-iframe';
    iframe.style.display = 'none';
    document.body.appendChild(iframe);
  }
  iframe.onload = e => {
    //what?
    //check content?
  };
  iframe.src = `http://localhost:8082/download/${shouldFail}`;
}

Advantages:

  • Simple
  • The user stays on the same page (no refresh, no redirect)
  • May support file streaming
  • The user is immediately notified of the download process

Disadvantages:

  • Impossible (or really hard) to detect download failure. Because the iframe is hidden, on download failure nothing happens from the user’s perspective.
  • We do have the onload event, but it is hard to retrieve the content (impossible on a different origin because of the same-origin policy), and even if we somehow retrieve it, what do we do with it? Check for specific strings? When? When the download is finished? These are the questions that steer us away from this method.

Method 6: Backend redirect

The backend can respond with a redirect (via the 302 FOUND status code) in case of a download failure. This differs from method 3 (window.location.href = url) in that a different URL may be specified, and if some advanced framework is used on the frontend, it may trigger a specific route that shows the desired dialog. For example, the backend may redirect to <frontend.url>/components/download-error. If that URL can be used to show the dialog only, without a redirect or refresh, our problem is solved!

In this approach we can use AngularJS or another framework with substate management to achieve the desired result. The backend would redirect to some route, and the app would follow it and show the dialog without redirecting. We tried this approach with AngularJS and it works, but unfortunately we couldn’t avoid the refresh. So if you have some ideas to make it work, please let us know.

Advantages: same as with method 3, but the real disadvantage is refresh.

SOLUTION

To achieve all of our goals from the beginning of the blog, we came up with the following solution, which builds on the hidden iframe from method 5. The idea is that the backend returns executable JavaScript code in case of a download error. The returned JavaScript does a very simple task: it posts a message to the parent that says “download error happened”.

const express = require('express');
const cors = require('cors');

const app = express();
app.use(cors({
  origin: 'http://localhost:8000',
  credentials: true,
  exposedHeaders: ['Content-Disposition']
}));

app.get('/download/:should_fail', function (req, res) {
  if (req.params.should_fail === 'true') {
    // On failure, return an HTML page whose script notifies the parent window.
    res.status(404)
      .header('Content-Type', 'text/html')
      .send(`
<!DOCTYPE html>
<html lang="en">
<head>
<script>
window.onload = () => parent
  .postMessage('download.error', 'http://localhost:8000');
</script>
</head>
<body>
</body>
</html>`
      );
  } else {
    res.download(`${__dirname}/sample.pdf`);
  }
});

const server = app.listen(8082, function () {
  console.log(`Started server at port ${server.address().port}!`);
});

The script is sent in the <head> tag. When the page finishes loading (that is, once it has been loaded into our iframe), a message is posted to the parent window.
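
If the parent origin or the message text should not be hard-coded, the error page could be produced by a small helper along these lines. This is only a sketch: buildErrorPage is a hypothetical name, and escaping the "<" character is a precaution assumed here for values that might contain a closing script tag.

```javascript
// Hypothetical helper that renders the error page for a given parent origin.
function buildErrorPage(parentOrigin, message = 'download.error') {
  // JSON.stringify yields a valid JavaScript string literal; escaping "<"
  // keeps a value such as "</script>" from closing the script tag early.
  const js = value => JSON.stringify(value).replace(/</g, '\\u003c');
  return `<!DOCTYPE html>
<html lang="en">
<head>
<script>
window.onload = () => parent.postMessage(${js(message)}, ${js(parentOrigin)});
</script>
</head>
<body></body>
</html>`;
}
```

In the route above, the failure branch could then send buildErrorPage('http://localhost:8000') instead of the inline template.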

Our modified frontend looks like this:

const SERVER = 'http://localhost:8082';

// listen to events (attachEvent/onmessage is the fallback for old IE)
const listenMethod = window.addEventListener ? 'addEventListener' : 'attachEvent';
const messageEvent = window.addEventListener ? 'message' : 'onmessage';
window[listenMethod](messageEvent, listenToEvents);

function download(shouldFail) {
  let iframe = document.getElementById('ex-download-iframe');
  if (!iframe) { // element not created yet
    iframe = document.createElement('iframe');
    iframe.id = 'ex-download-iframe';
    iframe.style.display = 'none';
    document.body.appendChild(iframe);
  }
  iframe.src = `${SERVER}/download/${shouldFail}`;
}

function listenToEvents(event) {
  const key = event.data ? 'data' : 'message';
  // the origin check is mandatory: any window can post messages to us
  if (event.origin === SERVER && event[key] === 'download.error') {
    alert('Download failed'); // show real dialog
  }
}

On the frontend, we register the listenToEvents function as a message listener. When the download is successful, the iframe will load the content and start the downloading process. In case of a failed download, the iframe will load the HTML we returned from the backend. That HTML contains JavaScript code that sends the message “download.error” to its parent. We specify the origin of the parent (the frontend app) as the target origin.

On the frontend, listenToEvents will be triggered when the iframe loads the JavaScript code that sends the message. We check the origin (mandatory! Malicious sites can send messages too), and if the message data is “download.error”, we show the dialog.
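
The check itself can be isolated into a small predicate, which makes it easy to reason about and to test. The sketch below assumes a modern browser where the payload is available as event.data; the name isDownloadError is hypothetical.

```javascript
// Hypothetical predicate: true only for the expected message
// coming from the expected origin.
function isDownloadError(event, allowedOrigin) {
  return event.origin === allowedOrigin && event.data === 'download.error';
}

// In the browser it could be wired up like this:
//   window.addEventListener('message', e => {
//     if (isDownloadError(e, 'http://localhost:8082')) { /* show dialog */ }
//   });
```

A stricter variant could additionally compare event.source with the iframe's contentWindow, so that only messages from the download iframe are accepted.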

To support older browsers (e.g. IE8), we need to check whether addEventListener is available, which is why the code above picks between addEventListener and attachEvent before registering the listener.

SUMMARY & CONCLUSION

The title “How to detect download errors” is justified by showing multiple ways of detecting errors. Every method has its advantages and disadvantages, and we listed most of them.

For one (rather specific) case “detect download error and show a dialog if it happens, staying on the same page without refreshing it”, the last method was the solution because it fulfills all the objectives from the beginning of the blog including file streaming.

For other use cases, method 3 is probably best: it is easy to implement, covers most cases and concerns, and can notify users about a failed download via a simple redirect to a “generic” page, e.g. “Download failed… Go Back (link/button)”.

Blog by Neira Pulo, Software Engineer at Atlantbh

Originally published at https://www.atlantbh.com on October 6, 2020.
