Big files and SHA-1 computation in the browser

szydan
Aug 20, 2016

The usual conversation with a customer. Let's call him Bob.

Bob: I would like you to implement client-side SHA-1 computation. It should be easy to do. Just use the new shiny HTML5 crypto API and you are done.

Me: Is this the only requirement?

Bob: Hmm, let me think. Well, it would be nice if you could do this drag-and-drop thing, so users do not have to click the ugly button but can just drag and drop a file instead.

Me: Anything else? What about the size of these files?

Bob: Well, anything from a few bytes to a few gigabytes. Ah, good that you asked. I'm thinking now that if the file is big, it would be nice to show a progress bar. Yeah, we definitely need a progress bar. That would be it. Let me know when you have a prototype ready.

Bob: Ah, one last thing. It has to be fast. We do not want to force our users to go and make themselves a cup of tea while waiting for the SHA. Yeah, this is definitely very important. Please make it fast.

Nice conversation, easy task, piece of cake. Let's start: I will use the shiny HTML5 crypto “thing” and be done with it in no time.

A bit later after testing the native crypto APIs…

Cool, the native functions are fast. Really fast. Very good, but…

Problem 1

The HTML5 crypto API does not support callbacks or any other way to report progress. OK, I'll just cheat and display the progress while reading the file.

Problem 2

Nope, this was a bad idea. The only parameters you can pass to the digest method are the algorithm and a buffer. Because the whole file has to be loaded into that buffer first, there is an extra bonus: forget about large files, as your browser will crash before you even get the chance to use the API.
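
To make the limitation concrete, here is a minimal sketch of the native call, assuming a runtime that exposes the standard `crypto.subtle.digest` as a global (modern browsers do). The helper name `sha1Hex` is hypothetical and only there for illustration.

```typescript
// Hypothetical helper: one-shot SHA-1 via the standard Web Crypto API.
// Assumes a global `crypto.subtle` is available in the runtime.
async function sha1Hex(data: ArrayBuffer): Promise<string> {
  // digest() takes only the algorithm name and the complete input buffer;
  // there is no incremental update step and no progress callback.
  const digest = await crypto.subtle.digest("SHA-1", data);
  return Array.from(new Uint8Array(digest))
    .map((byte) => byte.toString(16).padStart(2, "0"))
    .join("");
}
```

Using it on a file means reading the entire file into a single ArrayBuffer first, which is exactly the step that falls over for multi-gigabyte inputs.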

So no progress and no big files with the native API. What about libraries? Surely there must be tons of good libraries that support what I want.

A few libraries and hours later…

Problems 3, 4, 5

Yep, there are libraries, and some of them work, but not all of them report progress. Some of those that can report progress do not seem to support big files. Some that seem to support both big files and progress somehow report a wrong SHA. Others that support both are terribly slow.

Another few libraries and hours later…

OK, let's take the best one I can find and adapt it a bit. And the winner is: rusha.js. This library is crazy fast, but it lacks progress reporting and big-file support. After a few hours I had something that worked. Well, not quite worked, but I was able to calculate my first SHA-1. It took a few iterations to match the Rusha speed and another few to correctly handle some edge cases and report the progress.
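
The general idea behind hashing in chunks can be sketched as follows. This is not the chunksha or rusha.js source, just a plain, unoptimized incremental SHA-1 written from the standard algorithm; the names `IncrementalSha1`, `sha1InChunks` and `onProgress` are hypothetical. For simplicity the input here is a Uint8Array already in memory; in a browser each chunk would instead come from reading a slice of the dropped file.

```typescript
// Illustrative incremental SHA-1 (FIPS 180-4); not the chunksha or rusha.js source.
class IncrementalSha1 {
  private h = new Uint32Array([0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0]);
  private w = new Uint32Array(80); // message schedule, reused for every block
  private block = new Uint8Array(64); // buffers a partially filled 64-byte block
  private blockLen = 0;
  private totalBytes = 0;

  // Feed the next chunk; the hasher itself only keeps one 64-byte block.
  update(chunk: Uint8Array): void {
    this.totalBytes += chunk.length;
    let offset = 0;
    while (offset < chunk.length) {
      const take = Math.min(64 - this.blockLen, chunk.length - offset);
      this.block.set(chunk.subarray(offset, offset + take), this.blockLen);
      this.blockLen += take;
      offset += take;
      if (this.blockLen === 64) {
        this.compress();
        this.blockLen = 0;
      }
    }
  }

  // Append the SHA-1 padding and return the hex digest. Call once, at the end.
  digestHex(): string {
    // Message length in bits, split into two 32-bit halves (works past 4 GB).
    const bitsHi = Math.floor(this.totalBytes / 0x20000000);
    const bitsLo = (this.totalBytes % 0x20000000) * 8;
    const padLen = (this.blockLen < 56 ? 56 : 120) - this.blockLen;
    const tail = new Uint8Array(padLen + 8);
    tail[0] = 0x80;
    const view = new DataView(tail.buffer);
    view.setUint32(padLen, bitsHi);
    view.setUint32(padLen + 4, bitsLo);
    this.update(tail);
    let hex = "";
    for (let i = 0; i < 5; i++) hex += this.h[i].toString(16).padStart(8, "0");
    return hex;
  }

  // Process one full 64-byte block.
  private compress(): void {
    const w = this.w;
    const view = new DataView(this.block.buffer);
    for (let i = 0; i < 16; i++) w[i] = view.getUint32(i * 4); // big-endian words
    for (let i = 16; i < 80; i++) {
      const x = w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16];
      w[i] = (x << 1) | (x >>> 31); // rotate left by 1
    }
    let a = this.h[0], b = this.h[1], c = this.h[2], d = this.h[3], e = this.h[4];
    for (let i = 0; i < 80; i++) {
      let f: number;
      let k: number;
      if (i < 20) { f = (b & c) | (~b & d); k = 0x5a827999; }
      else if (i < 40) { f = b ^ c ^ d; k = 0x6ed9eba1; }
      else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8f1bbcdc; }
      else { f = b ^ c ^ d; k = 0xca62c1d6; }
      const t = (((a << 5) | (a >>> 27)) + f + e + k + w[i]) | 0;
      e = d; d = c; c = (b << 30) | (b >>> 2); b = a; a = t;
    }
    this.h[0] += a; this.h[1] += b; this.h[2] += c; this.h[3] += d; this.h[4] += e;
  }
}

// Hash `data` chunk by chunk, reporting the fraction done after each chunk.
function sha1InChunks(
  data: Uint8Array,
  chunkSize: number,
  onProgress: (fraction: number) => void
): string {
  const sha = new IncrementalSha1();
  for (let start = 0; start < data.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, data.length);
    sha.update(data.subarray(start, end));
    onProgress(end / data.length);
  }
  return sha.digestHex();
}
```

Because the hash state is updated one chunk at a time, progress falls out naturally: the caller knows how many bytes have been fed in after every `update`. A sketch like this will be far slower than rusha.js, which is heavily optimized; it only shows why chunking makes both progress reporting and bounded memory use possible.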

Chunksha

The result is called chunksha, and at the moment it can:

  • calculate SHA-1
  • report the progress
  • handle really big files

The picture below is a screenshot of the benchmark graph comparing versions of chunksha with the native browser and rusha.js implementations.

The green line is the current version of chunksha. It matches the speed of rusha.js and handles files well above 20 GB in size.
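
Handling files of that size depends on never holding the whole file in memory. One way to do that with standard browser APIs is to read the file slice by slice; the sketch below assumes a runtime with `Blob.slice` and `Blob.arrayBuffer` (a browser `File` is a `Blob`). The helper `forEachChunk` is hypothetical, and this is not necessarily how chunksha reads its input.

```typescript
// Hypothetical helper: walk a Blob (e.g. a dropped File) one slice at a time.
async function forEachChunk(
  blob: Blob,
  chunkSize: number,
  onChunk: (chunk: Uint8Array, bytesRead: number) => void
): Promise<void> {
  for (let start = 0; start < blob.size; start += chunkSize) {
    const end = Math.min(start + chunkSize, blob.size);
    // Only the bytes of this slice are read into memory here.
    const buffer = await blob.slice(start, end).arrayBuffer();
    onChunk(new Uint8Array(buffer), end);
  }
}
```

Each chunk can be handed to an incremental hasher, and `bytesRead / blob.size` gives the value for the progress bar.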

Wrap-up

It was a nice experience to refresh my memory of hash functions. It also reminded me of the old truth that things are almost never as easy as they first seem.

The native crypto.subtle.digest API could have been designed a bit better, so that it supports big files and progress reporting.

SHA-1 is really deprecated these days, so it would be nice to extend chunksha to support the SHA-2 family. The code is available on GitHub, and there is a demo page where you can just drag and drop a file (all computation is done client-side and your file is never transferred to any server). Any comments, PRs or bug reports are welcome.

I would like to thank the author of rusha.js for his implementation, which is very fast and well written.
