Non-blocking CSV export in the browser with web workers

Published in Sight Machine, Sep 22, 2018 (5 min read)

For our example application, let’s say that you want to view tabular data in an HTML table. Data is requested by the browser and returned from the server in a nice JSON format. Our user didn’t have to navigate to a special FTP site or download a separate application. The data appears right in the browser!

Now let’s say the data has been retrieved and is displayed in the browser, and the user wants a local copy of it. Why should data be confined to the chains of the tabular HTML browser world? Information should be free, man. Why bother with another round trip to the backend when you could simply reformat the data you’ve already requested and re-package it nicely? There are many libraries that export JavaScript data and strings to CSV files. One popular library is d3.

Assuming you have access to an array of formatted data, you can call the method exportData below:

// import d3 from 'd3';

/* Export a csv
 *
 * @param {Array} data - e.g. [{x: 1, y:1}]
 * @method exportData
 * @return {Undefined}
 */
const exportData = (data) => {
  console.log('Initiate blocking csv download');
  const res = d3.csvFormat(data);
  const blob = new Blob([res], { type: 'text/csv;charset=utf-8;' });
  saveFile(blob);
}

/* Take a blob and force browser to click a link and save it from a download path
 *
 * @param {Blob} blob
 * @method saveFile
 * @return {Undefined}
 */
const saveFile = (blob) => {
  const uniqTime = new Date().getTime();
  const filename = `my_file_${uniqTime}`;

  if (navigator.msSaveBlob) {
    // IE 10+
    console.info('Starting call for ' + 'ie download');
    const ieFilename = `${filename}.csv`;
    navigator.msSaveBlob(blob, ieFilename);
  } else {
    console.info(`Starting call for html5 download`);
    const link = document.createElement("a");
    if (link.download !== undefined) { // feature detection
        // Browsers that support HTML5 download attribute
        const url = URL.createObjectURL(blob);
        link.setAttribute("href", url);
        // include the extension so the saved file opens as a csv
        link.setAttribute("download", `${filename}.csv`);
        link.style.visibility = 'hidden';
        document.body.appendChild(link);
        link.click();
        document.body.removeChild(link);
    }
  }
}

The drawback? When you’re running d3.csvFormat, it blocks the browser!

One click and you’ve frozen your entire browser UI until the process finishes. For small files, it shouldn’t be too noticeable. For larger files, your user will be left confused, frightened, and irritated. This is MY browser and I need it NOW!

Our hapless user only wanted to export data but as a result, the browser has been immobilized until the .csv conversion process has finished.
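To see why the main thread stalls, it helps to look at what any CSV formatter has to do: visit every cell of every row in one synchronous pass. The sketch below is a simplified stand-in for d3.csvFormat, not d3’s actual implementation. The names toCsv and escapeCell are hypothetical, and the quoting rule follows the usual CSV convention of doubling embedded quotes.

```javascript
// Hypothetical stand-in for d3.csvFormat, for illustration only.
// Quote a cell if it contains a comma, a double quote, or a line break.
const escapeCell = (value) => {
  const text = value == null ? '' : String(value);
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
};

// Convert an array of objects to a CSV string.
// Assumes every row has the same keys as the first row.
const toCsv = (rows) => {
  if (rows.length === 0) return '';
  const columns = Object.keys(rows[0]);
  const lines = [columns.map(escapeCell).join(',')];
  // This loop runs to completion before the browser can handle any
  // click, scroll, or repaint. The work grows with rows x columns.
  for (const row of rows) {
    lines.push(columns.map((col) => escapeCell(row[col])).join(','));
  }
  return lines.join('\n');
};

console.log(toCsv([{ x: 1, y: 'a,b' }, { x: 2, y: 'say "hi"' }]));
```

Nothing in that loop yields control back to the browser, so a few hundred thousand rows can translate into a visibly frozen page.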

To get around the browser locking up for this potentially long running process, we can turn to web workers.

To refactor this code with workers, create a worker.js file that assigns an onmessage handler. Inside it, a switch statement picks the right callback for each type of message event.

// worker.js
if ('function' === typeof importScripts) {
  importScripts('https://cdnjs.cloudflare.com/ajax/libs/d3/4.8.0/d3.min.js');

  onmessage = function(e) {
    const data = e.data;
    const type = data.type;
    const arg = data.arg;
    // console.log('Message received from main script');
    switch (type) {
      case 'csvFormat':
        // convert the array of row objects into a csv string
        const res = d3.csvFormat(arg);
        // console.log('Posting message back to main script');
        postMessage({
          type: type,
          data: res,
        });
        break;
      case 'blobber':
        const blob = new Blob([arg], { type: 'text/csv;charset=utf-8;' });
        postMessage({
          type: type,
          data: blob,
        });
        break;
      default:
        console.error('invalid type passed in');
        break;
    }
  }
}

Once you’ve defined how the worker’s onmessage callback handles each type of message, call myWorker.postMessage from the main thread to send a message to the worker. When the worker finishes the matching function, it calls postMessage in turn to send the result back to the main thread, where myWorker.onmessage receives it. In this example, each message carries a type and its corresponding arg.

// index.js
const myWorker = new Worker('worker.js');

/* This method will use the data passed in and trigger an export
 * with the csv conversion process offloaded to a worker
 *
 * @param {Array} data - e.g. [{x: 1, y: 1}]
 * @method nonBlockingExport
 * @return {Undefined}
 */
const nonBlockingExport = (data) => {
  clickStart = new Date().getTime();
  getCSV(data);
}

/* This method will call the worker with a particular type that maps to a callback to format the csv
 *
 * @param {Array} data - e.g. [{x: 1, y: 1}]
 * @method getCSV
 * @return {Undefined}
 */
const getCSV = (data) => {
  console.log('Formatting csv...');
  workerMaker('csvFormat', data);
}

/* This method will call the worker with a particular type that maps to a callback to create a blob
 *
 * @param {String} csvFile - csv-formatted string returned by the worker
 * @method getBlob
 * @return {Undefined}
 */
const getBlob = (csvFile) => {
  console.log('creating blob...');
  workerMaker('blobber', csvFile);
}

const workerMaker = (type, arg) => {
  // check if a Worker has been defined before calling postMessage with specified arguments
  if (window.Worker) {
    myWorker.postMessage({type, arg});
  }
}

myWorker.onmessage = function(e) {
  console.log('Message received from worker');
  const response = e.data;
  const data = response.data;
  const type = response.type;
  if (type === 'csvFormat') {
    getBlob(data);
  } else if (type === 'blobber') {
    saveFile(data);
  } else {
    console.error('An Invalid type has been passed in');
  }
}

In the newly defined code to offload csv formatting into the worker thread, we’ve defined a few types: blobber and csvFormat. These “keys” specify different functions that are run based on that type so that our worker knows what to do with the arguments it is given.
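As more message types are added, the switch statement could be swapped for a lookup table keyed by type. This is an alternative sketch, not the code from the worker above. The names makeHandlers and handleMessage are hypothetical, and formatCsv is an assumed parameter standing in for d3.csvFormat so the snippet runs on its own. It also assumes an environment with a global Blob (browsers, or Node 18+).

```javascript
// Build a handler table. `formatCsv` is an assumed parameter that
// stands in for d3.csvFormat inside the real worker.
const makeHandlers = (formatCsv) => ({
  csvFormat: (arg) => formatCsv(arg),
  blobber: (arg) => new Blob([arg], { type: 'text/csv;charset=utf-8;' }),
});

// Look up the handler for a message and build the reply object.
// Returns null for unknown types instead of throwing.
const handleMessage = (handlers, message) => {
  const { type, arg } = message;
  if (!Object.prototype.hasOwnProperty.call(handlers, type)) {
    console.error(`invalid type passed in: ${type}`);
    return null;
  }
  return { type, data: handlers[type](arg) };
};

// Inside worker.js this could be wired up roughly as:
// const handlers = makeHandlers(d3.csvFormat);
// onmessage = (e) => {
//   const reply = handleMessage(handlers, e.data);
//   if (reply) postMessage(reply);
// };
```

Adding a new type then means adding one entry to the table, and the reply always echoes the type so the main thread knows which step just finished.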

With this set of functions, our application can now pass data to the worker (freeing up the main thread of this burden) and continue to run unhampered until the worker is done running its callback.
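If the callback chaining in index.js becomes hard to follow, one option is to wrap each round trip in a Promise. The sketch below is a variation under stated assumptions, not the original code: requestFromWorker is a hypothetical helper, it relies only on the worker exposing postMessage, addEventListener, and removeEventListener (which browser Worker objects do), and it assumes only one request of a given type is in flight at a time.

```javascript
// Hypothetical helper: send {type, arg} to the worker and resolve with
// the `data` field of the first reply that carries the same type.
const requestFromWorker = (worker, type, arg) =>
  new Promise((resolve) => {
    const onReply = (e) => {
      if (e.data.type !== type) return; // reply to some other request
      worker.removeEventListener('message', onReply);
      resolve(e.data.data);
    };
    worker.addEventListener('message', onReply);
    worker.postMessage({ type, arg });
  });

// Possible usage inside an async function, with myWorker and saveFile
// from this article:
// const csv = await requestFromWorker(myWorker, 'csvFormat', data);
// const blob = await requestFromWorker(myWorker, 'blobber', csv);
// saveFile(blob);
```

With this approach the myWorker.onmessage handler would no longer be needed, since each step of the export reads top to bottom in one place.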

However, there is one caveat: postMessage hands the worker a copy of the data (a structured clone), so any changes you make to the original array after posting it will not show up in the export. Make sure you have finished any other transformations on the data before you pass it to the worker.
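To make the hand-off concrete: postMessage sends a structured clone of its argument, so the worker operates on a snapshot taken at the moment of the call. The sketch below uses the global structuredClone function as a stand-in for that copy step. It assumes a recent browser or Node 17+, and the rows and snapshot variables are illustrative only.

```javascript
// structuredClone applies the same copying algorithm postMessage uses,
// so it can stand in for the hand-off to the worker.
const rows = [{ x: 1, y: 1 }];
const snapshot = structuredClone(rows); // what the worker would receive

// Changes made on the main thread after the hand-off...
rows[0].y = 99;
rows.push({ x: 2, y: 2 });

// ...are not visible in the copy the worker is formatting.
console.log(snapshot.length, snapshot[0].y); // 1 1
console.log(rows.length, rows[0].y); // 2 99
```

In other words, the exported file reflects the data as it was when the export was triggered, so later edits need a fresh export.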

Production use case

Here at Sight Machine, we are actually using this web worker based data export method after requesting specific datasets for our data visualization page. Oh and btw, we’re hiring.

Run it for yourself

https://github.com/oshikryu/side-projects/tree/master/csv-webworkers

Conclusion

For long running processes that may lock up the browser, use web workers!

Happy data exporting!

Written by Ryuta Oshikiri