Cleaning and Transforming Excel Data in Your Browser With Data Janitor

Martin Drapeau
6 min readJan 13, 2019

--

Illustration credit: Vecteezy

Importing data from Excel is a very common workflow for many applications, especially during on-boarding. For example at Amilia an administrator can import existing clients into their CRM. We provide an Excel template facilitating this process.

The tedious part is building this Excel file. Usually, the admin will have a list of their clients in another Excel file in a different format. This list usually comes from a CSV export from another CRM. The admin will copy, paste and transform the data manually in Excel. The process is long, painful and error prone. Worst of all, it is not reproducible. A subsequent import will be just as painful.

There is a better way. Using CSVJSON’s Data Janitor, any programmer can build a conversion recipe in minutes the admin can use and reuse.

Data Janitor is a point web tool that allows you to clean and transform your Excel data by following these 3 steps:

  1. Copy the data from Excel into Data Janitor
  2. Write and run a JavaScript function to clean and transform the data into another format
  3. Paste back the transformed data in Excel

A programmer will perform these 3 steps in just a few minutes. The admin can then reuse the recipe applying steps 1 and 3.

Data Janitor is a web tool. It runs in your browser so you don’t need to install anything. Its also a point tool meaning there is no log in required to use. You can save and share your session (recipe) for reuse by you or collaborators.

A Practical Example

Let me demonstrate with an import I had to do the other day. It consisted in importing hundreds of players across dozens of teams in 5 different leagues. The source file consists of an Excel file with 5 tabs.

Source Excel file with players to import

The Amilia Excel file for import has a different format. Columns are named and ordered differently, gender values are different, team names are different and players must be presented one per row in a single sheet.

Destination Excel file format

The Import Strategy

I decided to perform 5 different imports, one per tab or league. This would allow the admin to verify that every player was properly imported into the right team. The recipe could be adjusted each time. Now it was a question of building the recipe for the first tab.

Step 1 — Copy and Paste Data From Excel

Data Janitor allows you to simply copy your sheet data from Excel (Ctrl+c), and then pasting it directly in the web page (Ctrl+v). Copy and pasting is much faster than uploading a file.

Data Janitor will detect column heads and ignore empty rows.

Copying sheet data from Excel, and pasting it into Data Janitor.

Step 2 — Write the JavaScript Function

With over 20 million developers in the world, the majority of which have written JavaScript, writing a conversion function is very easy. Data Janitor tries to simplify the process by providing an inline editor with built-in error validation, reporting and even linting.

Data Janitor uses CodeMirror to prodive a built-in JavaScript editor.

Data Janitor asks you to write a process function that returns the transformed data. The process function is passed input, an array of row objects, and a list of columns. Each row object consists of a hash of column name to value. For example, the two first players to import:

[
{
"No": "69",
"Team": "FC LaTchass",
"First": "Felix",
"Last": "Loxy",
"Email": "felix.loxy@example.com",
"Captain": "C",
"Male/Female": "Male"
},
{
"No": "7",
"Team": "FC LaTchass",
"First": "Jean François",
"Last": "Piedmon",
"Email": "jean françois.piedmon@example.com",
"Captain": "",
"Male/Female": "Male"
}
]

The process function is in charge of mapping that array of objects into this:

[
{
"First Name": "Felix",
"Last Name": "Loxy",
"Gender": "M",
"Date of Birth": "2000-01-01",
"Join Date": "2018-01-01",
"email": "felix.loxy@example.com",
"Activity": "Équipes Automne-Hiver 2018-2019::Futsal::Mercredi profutsal::FC LaTchass"
},
{
"First Name": "Jean François",
"Last Name": "Piedmon",
"Gender": "M",
"Date of Birth": "2000-01-01",
"Join Date": "2018-01-01",
"email": "jeanfrançois.piedmon@example.com",
"Activity": "Équipes Automne-Hiver 2018-2019::Futsal::Mercredi profutsal::FC LaTchass"
}
]

Data Janitor will allow you to run interactively the process function and show you the first 2 rows of input and output. Everything is convenient and it makes it fast and fun to code.

In this particular case, mapping rules had to be written for 6 columns with different situations. For example, here’s how we handle the gender column mapping.

Interactive column mapping in Data Janitor

Validating an email is possible via the _.isEmail(email) helper function. It is even possible to throw an error to stop processing. Useful strategy to iteratively clean data. For example in this case we throw an exception when the email is invalid. In the exception we report the row on which the error is found. With the Input inspector, we can navigate to that row and see firsthand the issue.

Throwing an error to stop processing in Data Janitor.

Data Janitor includes helper libraries Moment.js (to parse and convert dates even across different locales), underscore.js (helpers to map data) and underscore.string (helpers to parse strings). Check out the Tips page for many examples on how to handle data cleaning and mapping.

What about security? Data Janitor runs the JavaScript function in a web worker. This serves as an isolated environment making code safe to run.

Step 3 — Paste Data Back into Excel

Going back to the Data tab, Data Janitor will show you the Output data in the form of a table. You can either download a CSV or copy the entire table to the clipboard. Then, in the Excel file, you can simply paste the tabular data.

Copying data from Data Janitor and pasting back into Excel.

At this point the clients are then ready to be imported into the CRM. Steps 1 and 3 can be repeated for multiple imports.

Saving Your Session for Collaboration

By default Data Janitor always saves your session to local storage. This means no data is ever sent to the cloud. Useful for private data. Yet you can leave and come back to where you have left off. You won’t have lost your work.

For collaboration Data Janitor allows you to save your recipe as a session for later. This uploads the session to the server and generates a unique URL you can share with others. For example sharing with an admin so they can reuse the recipe in subsequent imports.

Saving your Data Janitor session for collaboration.

You can try Data Janitor with this specific example by clicking here.

About CSVJSON

CSVJSON is a toolbox of online developer tools I built over the years. Useful for parsing and converting CSV, validating and beautifying JSON and now cleaning and transforming data.

Data Janitor is currently in BETA and continuously being improved. If you have suggestions, please go ahead and open a GitHub issue. I’m looking forward to receiving feedback.

--

--

Martin Drapeau

Founder of Activity Messenger, email and SMS for Sport & Leisure. Former CTO of Amilia. https://www.linkedin.com/in/martin-drapeau