Multithreaded file uploading with JavaScript

Michael Pilov
8 min read · Jan 29, 2018


File uploading is a trivial and pretty common task in web development. There are plenty of libraries, articles, and guides about it, but most of them don't address the major file uploading problem: errors. I think each of us knows how it hurts when an upload stops at 90% because of a connection error and we have to upload the whole file from the beginning again.

Today I want to talk about how to handle connection problems during uploads and make your application faster and more user-friendly. Well, let's begin!

So how do we handle it?

When we use the standard uploading approach we can control almost nothing: we just have to wait until loading completes. If a connection error happens, all uploaded data is lost.

But… we can observe the upload progress, right? And we have the convenient Blob.slice() method. Let's cut our file at the last loaded byte, upload the remaining part, and stick the parts together on the server. And we really can do so!.. except for one problem: the progress status is not really accurate.

Well, the general idea is good, so maybe we can improve it? What if we ask the server to report the last byte it received? That really does solve the accuracy problem.

At this point we could already write the code, but the thing is, I won't talk about that approach :-) No, it is a good idea and it works fine, but I want to suggest another method. What if we cut the file into plenty of small parts and send each part separately? If we do so, we can send the parts in parallel and probably improve the upload speed along with resumability.

In this article I want to show you how to code multithreaded file uploading, and I will compare it with the classic uploading approach in the next one.

Preparing the file

Before we start, let's define chunkSize as the size of one file part. Then we have chunksQuantity = Math.ceil(file.size / chunkSize) parts. Let's also define chunksQueue as an array of part indexes and then reverse it.

const chunksQuantity = Math.ceil(file.size / chunkSize);
const chunksQueue = new Array(chunksQuantity).fill().map((_, index) => index).reverse();

I reversed it because I want to send the parts sequentially, taking the next part's id from the end of the queue.

Now we are ready to split the file:

function sendNext() {
    if (!chunksQueue.length) {
        console.log("All parts uploaded");
        return;
    }

    const chunkId = chunksQueue.pop();
    const begin = chunkId * chunkSize;
    const chunk = file.slice(begin, begin + chunkSize);

    upload(chunk, chunkId)
        .then(() => {
            sendNext();
        })
        .catch(() => {
            chunksQueue.push(chunkId);
        });
}

I hope you noticed that I put chunkId back into the queue if the chunk upload failed. This is important because we need to send every part successfully, and by doing this I can just keep repeating sendNext until chunksQueue becomes empty.
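The snippet above only puts the failed chunk back into the queue; something still has to call sendNext again. A minimal way to do that (my own sketch, not the article's code; the 500 ms delay is an arbitrary value) is to retry right from the catch handler:

upload(chunk, chunkId)
    .then(() => {
        sendNext();
    })
    .catch(() => {
        // Re-queue the failed part and try again after a short pause
        // (500 ms is an arbitrary choice, tune it to your needs)
        chunksQueue.push(chunkId);
        setTimeout(sendNext, 500);
    });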

Sending parts

Let's begin with sending one part; basically it is not much different from classic file uploading:

function upload(chunk, chunkId) {
    return new Promise((resolve, reject) => {
        const xhr = new XMLHttpRequest();

        xhr.open("post", "/upload");

        xhr.setRequestHeader("Content-Type", "application/octet-stream");
        xhr.setRequestHeader("X-Chunk-Id", chunkId);
        xhr.setRequestHeader("X-Content-Id", fileId);
        xhr.setRequestHeader("X-Chunks-Quantity", chunksQuantity);
        // Size and real name of the whole file, not just a chunk
        xhr.setRequestHeader("X-Content-Length", file.size);
        xhr.setRequestHeader("X-Content-Name", file.name);
        // Content-Length itself is set by the browser from the request body;
        // XHR refuses to set it manually

        xhr.onreadystatechange = () => {
            if (xhr.readyState === 4) {
                if (xhr.status === 200) {
                    resolve();
                } else {
                    // A non-200 answer (e.g. 400 for an incomplete chunk)
                    // counts as a failure, so the chunk will be re-queued
                    reject();
                }
            }
        };

        xhr.onerror = reject;

        xhr.send(chunk);
    });
}

To transfer some service information I use custom headers, i.e. headers that begin with X-. It is a handy way to tell your server something extra about an incoming request. In this case, for instance, I send chunkId as the X-Chunk-Id header.

You might also have noticed that I set Content-Type to application/octet-stream although you might expect multipart/form-data. Actually, you can use multipart, but it makes your server code a little more complicated, because you also have to parse the multipart body properly. There are plenty of libraries for this, of course, but you don't really need multipart here, because we send only file data and we separate the parts ourselves, so there is no need to transfer a boundary and so on.

I think the most mysterious thing in this function is fileId. It is just a unique id for the file (not for a chunk!). We have to send it because the server needs to know which file the incoming chunk belongs to. I hope you agree that it would be terrible if the server concatenated chunks of two different files!

But how do we define fileId? That is a pretty good question. If you have some unique client id, such as a session, you can just use a timestamp as the id and check both the session and the id on your server. If you don't, I recommend generating the id on the server side before the first chunk starts uploading. In that case you can also send file.name and file.size once, before the upload, and never send them again.
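If you go with the server-generated id, the handshake might look roughly like this (just a sketch: the /upload/init route and its response shape are my assumptions, not part of the code above):

function requestFileId(file) {
    // Hypothetical init request: the server registers the file and returns an id
    return fetch("/upload/init", {
        method: "POST",
        headers: {"Content-Type": "application/json"},
        body: JSON.stringify({name: file.name, size: file.size})
    })
        .then(response => response.json())
        .then(data => data.fileId);
}

After that, every chunk only needs to carry the returned fileId, and the X-Content-Name / X-Content-Length headers become unnecessary.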

Multithreading

Until now we have been sending chunks in one thread. So it’s time to improve it!

let activeConnections = 0;

function sendNext() {
    if (activeConnections >= threadsQuantity) {
        return;
    }

    if (!chunksQueue.length) {
        if (!activeConnections) {
            console.log("All parts uploaded");
        }
        return;
    }

    const chunkId = chunksQueue.pop();
    const begin = chunkId * chunkSize;
    const chunk = file.slice(begin, begin + chunkSize);

    activeConnections += 1;

    upload(chunk, chunkId)
        .then(() => {
            activeConnections -= 1;
            sendNext();
        })
        .catch((error) => {
            activeConnections -= 1;
            // Re-queue the failed part and keep the loop going,
            // otherwise the upload could stall after an error
            chunksQueue.push(chunkId);
            sendNext();
        });

    sendNext();
}

threadsQuantity here, as you might guess, is the number of chunks uploaded in parallel. I also have to track how many connections are running in order to limit the parallel uploads, and I can't say the file is uploaded until all connections have completed. But that isn't much different from the previous method, really.
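To tie everything together, here is the setup from the beginning of the article plus the kick-off call, gathered in one place (a sketch only; the 1 MB chunk size and 5 threads are illustrative values of my own choosing, and file is assumed to be in scope):

// Illustrative values, pick whatever fits your case
const chunkSize = 1024 * 1024; // 1 MB per part
const threadsQuantity = 5;     // parallel connections

const chunksQuantity = Math.ceil(file.size / chunkSize);
const chunksQueue = new Array(chunksQuantity).fill().map((_, index) => index).reverse();

// A single call is enough: sendNext() calls itself at the bottom
// until activeConnections reaches threadsQuantity
sendNext();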

Server

Before we start coding, let's think about which problems we have to solve while building the server part.

First of all, we need a method that receives a file and writes it somewhere. When I say a file here I mean a chunk, one small part of the big file, but there is really no difference for the server. Next, we have to concatenate all the chunks once every chunk has been received. And last but not least, we have to ignore a chunk and not include it in the whole file if something went wrong and we didn't receive the full chunk.

app.post("/upload", (request, response) => {
    const chunk = [];

    request.on("data", (part) => {
        chunk.push(part); // These are parts of a chunk, NOT chunks of the file
    }).on("end", () => {
        response.setHeader("Content-Type", "application/json");
        response.write(JSON.stringify({status: 200}));
        response.end();
    });
});

In the code above I use express to create the HTTP server, but I do this only to keep the code short! There is no need to use express specifically, and you can use whatever you want.
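For completeness, here is roughly the boilerplate the server snippets assume (the port number is my own choice):

const express = require("express");
const fs = require("fs"); // used later to write the finished file

const app = express();

// ...the /upload handler from this article goes here...

app.listen(3000, () => console.log("Listening on port 3000"));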

So, the first part is almost done: we received a chunk, but before we write it, let's think about where. Remember, chunks can be sent in any order, but we have to concatenate them consistently, in the exact order they occupy in the file. And it sounds like we can simply store the chunks in an array and use chunkId as the index.

const fileChunks = [];
/* Not changed code */
}).on("end", () => {
    const chunkId = request.headers["x-chunk-id"];
    fileChunks[chunkId] = Buffer.concat(chunk);

Let's also remember that the server can receive plenty of files at the same time, and it would be nice not to mix them up. This is finally where our file id comes in!

Final server code looks like this:

const fileStorage = {};

app.post("/upload", (request, response) => {
    const fileId = request.headers["x-content-id"];
    const chunkSize = Number(request.headers["content-length"]);
    const chunkId = request.headers["x-chunk-id"];
    const chunksQuantity = Number(request.headers["x-chunks-quantity"]);
    const fileName = request.headers["x-content-name"];
    const fileSize = Number(request.headers["x-content-length"]);
    const file = fileStorage[fileId] = fileStorage[fileId] || [];
    const chunk = [];

    request.on("data", (part) => {
        chunk.push(part);
    }).on("end", () => {
        const completeChunk = Buffer.concat(chunk);

        if (completeChunk.length !== chunkSize) {
            sendBadRequest(response);
            return;
        }

        file[chunkId] = completeChunk;

        // The file is complete once every expected chunk has arrived
        const fileCompleted = file.filter(chunk => chunk).length === chunksQuantity;

        if (fileCompleted) {
            const completeFile = Buffer.concat(file);

            if (completeFile.length !== fileSize) {
                sendBadRequest(response);
                return;
            }

            const fileStream = fs.createWriteStream(__dirname + "/files/" + fileName);

            fileStream.write(completeFile);
            fileStream.end();

            delete fileStorage[fileId];
        }

        response.setHeader("Content-Type", "application/json");
        response.write(JSON.stringify({status: 200}));
        response.end();
    });
});

function sendBadRequest(response) {
    // Answer with a real 400 status so the client treats the chunk as failed
    response.statusCode = 400;
    response.setHeader("Content-Type", "application/json");
    response.write(JSON.stringify({status: 400}));
    response.end();
}

If during some chunk's upload the server responds with a bad request status (400), we will send that chunk again and overwrite the old chunk in fileStorage.

Calculating progress

We are almost done here. The last part is calculating the progress of the uploading file.

Actually, we could count only the percentage of fully uploaded chunks. That is easy, and maybe a perfectly adequate method, but let's count the real progress for science (or at least for fun).
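For reference, the "easy" version could be as simple as this, using the chunksQuantity, chunksQueue and activeConnections variables we already have (a rough sketch, not part of the original code):

// Rough progress: count only the chunks that have fully finished uploading
const chunksDone = chunksQuantity - chunksQueue.length - activeConnections;
const roughPercent = Math.round(chunksDone / chunksQuantity * 100);
console.log("Uploaded (by chunk count):", roughPercent, "%");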

So, while counting the progress of fully uploaded chunks isn't a problem, counting the "in progress" part is a little more complicated. Just as the server ignores incomplete chunks, we have to ignore the loading progress of chunks that ended with an error. I did that by implementing a progress cache: the progress of all active connections is kept there.

const progressCache = {};
let uploadedSize = 0;

function onProgressCallback({loaded, total}) {
    const percent = Math.round(loaded / total * 100 * 100) / 100;
    console.log("Uploaded", percent);
}

function onProgress(chunkId, event) {
    if (event.type === "progress" || event.type === "error" || event.type === "abort") {
        progressCache[chunkId] = event.loaded;
    }

    if (event.type === "loadend") {
        uploadedSize += progressCache[chunkId] || 0;
        delete progressCache[chunkId];
    }

    const inProgress = Object.keys(progressCache).reduce((memo, id) => memo + progressCache[id], 0);
    const sentLength = Math.min(uploadedSize + inProgress, file.size);

    onProgressCallback({
        loaded: sentLength,
        total: file.size
    });
}

As you can see, I also check the type of the progress event. I do this because error and abort events write zero into the progress cache, so we won't count the cached progress of a failed chunk as upload progress. For those events you have to add a few listeners in the upload function:

function upload(chunk, chunkId) {
    return new Promise((resolve, reject) => {
        const xhr = new XMLHttpRequest();
        const progressListener = onProgress.bind(null, chunkId);

        xhr.upload.addEventListener("progress", progressListener);
        xhr.addEventListener("error", progressListener);
        xhr.addEventListener("abort", progressListener);
        xhr.addEventListener("loadend", progressListener);

        xhr.open("post", "/upload");
        /* Not changed code */

And that’s it

So, now you can put all the code above together and try it! Or you can just clone my test implementation from GitHub.


Before you push this to production, I have to warn you about a couple of things.

First, this code wasn't tested in production use. As you can see, I didn't care about old browser support (such as Internet Explorer). You can transpile the code to ES5, but I have tested it only in Chrome and Firefox, so be careful.

Second, remember that all incomplete files are stored in the server's memory. A better idea is probably to create an empty file for each new fileId and to append every chunk of the file to it. There is one tricky part with that approach: calculating the starting byte properly, but you can solve it with the chunk index and the common chunk size. And please don't forget that the last chunk can be smaller than the others.
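To illustrate the offset calculation, here is a sketch of that disk-based idea (my assumptions, not the article's code: the client additionally sends the common chunk size in a hypothetical X-Chunk-Size header, and an empty file is created for each fileId beforehand):

// Inside the "end" handler, instead of keeping chunks in memory:
const commonChunkSize = Number(request.headers["x-chunk-size"]); // hypothetical header
const position = Number(chunkId) * commonChunkSize;              // starting byte of this chunk
const filePath = __dirname + "/files/" + fileName;

// "r+" requires the file to exist; create it once per fileId,
// e.g. with fs.writeFileSync(filePath, Buffer.alloc(0))
fs.open(filePath, "r+", (err, fd) => {
    if (err) {
        sendBadRequest(response);
        return;
    }

    // The last chunk may be shorter than commonChunkSize;
    // writing completeChunk.length bytes handles that automatically
    fs.write(fd, completeChunk, 0, completeChunk.length, position, () => {
        fs.close(fd, () => {});
    });
});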

Third, you have to add more checks and error handling. Some of them (but not all) I added in the test repo.

And last, this concept works especially well with the request queue I was talking about last time. So, if you missed that article, I recommend checking it out.

Some last words

So now you can make your users' lives a little bit easier. And I hope the article was useful for you too. Thank you for reading!

