Data Imports — Forget Upload, Use Copy and Paste

As a developer, whatever industry you’re in, you will work on data imports more than once in your career. It often goes something like this…

Megan (project manager)

We’ve got Joanna, new big and important customer with 5,000 records of existing data. We need to import them into our system.

Sandeep (developer)

Just use the import tool. We accept Excel and CSV. Here’s a template.

Megan

Can you help? Last time a customer used the import tool it created duplicates in the database and it took 2 days to clean.

Sandeep

Like I said before, it was the customer’s fault. Anyhow, can’t someone else do it? We are working on an important feature right now. Besides, working on the import is boring.

From experience, and talking to fellow colleagues, import tools are often painful, and frankly, boring to develop and work on. Parsing Excel in the back-end is not fun at all.

Fortunately, Joanna has a very powerful computer on her desk. In addition, she is using a very powerful and modern browser. Lots can be done in JavaScript on her computer before anything is ever sent to the server. Here are 3 strategies you can use to change how you develop data importers.

1. Leverage the Clipboard

What if Joanna could just copy a full worksheet of data from Excel, and paste it directly in your web page? Guess what, with the Clipboard API, this is not only possible — but really trivial to do.

When data copied from Excel onto the clipboard, it gets transformed into tab-separated-value text. With this code, you can capture the paste event on your web page and transform the data into an array of arrays.

function onPaste(e) {
if (!e.originalEvent.clipboardData ||
!e.originalEvent.clipboardData.items) return;
  var items = e.originalEvent.clipboardData.items;
var data;
for (var i = 0; i < items.length; i++) {
if (items[i].type == 'text/plain') {
data = items[i];
break;
}
}
if (!data) return;
  data.getAsString(function(text) {
text = text.replace(/\r/g, '').trim('\n');
var rowsOfText = text.split('\n');
var header = [];
var rows = [];
rowsOfText.forEach(function(rowAsText) {
// Remove wrapping double quotes
var row = rowAsText.split('\t').map(function(colAsText) {
return colAsText.trim().replace(/^"(.*)"$/, '$1');
});
// The first row containing data is assumed to be the header
if (header.length == 0) {
// Remove empty columns
while (row.length && !row[row.length-1].trim()) row.pop();
if (row.length == 0) return;
header = row;
} else {
rows.push(row.slice(0, header.length));
}
});
buildTable(header, rows);
});
}
$(document).on('paste', onPaste);

At this point, it is easy to convert the data into an HTML table. Building on the previous example, this function would create the HTML table content (assuming the table already exists in the DOM):

function buildTable(header, rows) {
var $thead = $('table>thead').empty();
var $tbody = $('table>tbody').empty();
  var $tr = $('<tr></tr>').appendTo($thead);
header.forEach(function(value) {
$tr.append('<th>' + value + '</th>');
});
  rows.forEach(function(row) {
var $tr = $('<tr></tr>').appendTo($tbody);
row.forEach(function(value) {
$tr.append('<td>' + value + '</td>');
});
});
}

See a live example and the full code here: https://martindrapeau.github.io/excel-clipboard

Copy/paste is a lot faster than uploading a file. Think about the time Joanna will save. In addition, an upload usually involves sending the file to the server and back. Why go through that hassle?

And if you’re worried about creating too many DOM nodes, you can paginate the data. Only show a subset of rows at a time. For example, have a look at CSVJSON DataClean, it offers a good example of copy/pasting data from Excel or Google Sheets into a paginated table. Give it a try with your own data sets.

CSVJSON Data Clean copy/paste of Google Sheets data in a paginated HTML table.

2. Parse and Validate Data in the Browser

In JavaScript you can parse and validate the data. You can then transform it into something your back-end better understands — into an array of JSON objects your API can accept for example.

If errors are found, you can show them to the user with instructions on how to fix them. You avoid the round trip to your server and the extra processing load. Use Joanna’s computer processing power instead.

Here’s an example of what we built at Amilia to help import team schedules. Notice how fast it is to identify and fix an error using copy paste. Not having to save and upload a file is a real time saver.

Validating imported data in the browser and providing feedback. Copy/paste makes it fast to correct the data and import again.

3. Leverage and Extend Your API

If you have existing APIs to fetch data, you can leverage them to match data. For example, if you are importing new accounts you can ask your API to see if they already exist and thus avoid importing duplicates. Tell the user right there and then in the browser there is a problem.

Taking the previous example, we are able to pull existing match events from the API and match them against what the user input and report the error there and then for the user to correct.

API calls are made to validate user imported data.

If you don’t have the required endpoints, this is an opportunity to create them. It is a much better investment to extend your API over writing a one-purpose back-end function to parse data. Save Sandeep the trouble of maintaining that and get him working on coding API endpoints instead. He’ll thank you and so will your integration partners.

At this point, you’re ready to send the data to the back-end to do the actual import (using another API call). At least now, both developer Sandeep and customer Joanna will have confidence the data is clean. This time around project manager Megan should have peace of mind.