Import Wizard: Now Anyone Can Turn Spreadsheets into Editable Pre-Labels & More🎉

Turn JSON Files and CSV Spreadsheets into Ready-to-Work-on Annotations and More, Saving Hundreds of Hours of Coding.

Published in Diffgram · 5 min read · May 31, 2021

Until now, the only way to get pre-labeled data into Diffgram was through a technical integration (API/SDK). This took a lot of time and was a big problem for many teams: many businesses and people who wanted to add pre-labels weren’t able to, or didn’t do it as often as they would have liked.

Now we have created an easy-to-use upload wizard that walks you through everything, one step at a time. It turns what was previously a very difficult technical process into an easier one, saving your team valuable engineering hours.

The code for this best-in-class data annotation tooling is open source on GitHub.

Example of selecting what the annotation “Type” header is called in your file. For example, maybe you call it “my_type”. The same applies to other fields; for example, maybe “run_id” is called “run_ref”. The system does the transformations for you, so there is no need to code it.
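For comparison, here is roughly what that kind of header renaming looks like when done by hand. This is a minimal sketch, not Diffgram's implementation: `HEADER_MAP`, `rename_headers`, and the sample CSV are hypothetical, and only the header names “my_type” and “run_ref” come from the example above.

```python
import csv
import io

# Hypothetical mapping from the headers used in your file to the field
# names the rest of your pipeline expects. Illustrative names only.
HEADER_MAP = {"my_type": "type", "run_ref": "run_id"}


def rename_headers(rows, header_map):
    """Return new row dicts with mapped headers renamed; other headers are kept."""
    return [
        {header_map.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# A made-up two-row spreadsheet standing in for your real file.
SAMPLE_CSV = """file_name,my_type,run_ref
cat.jpg,box,run_7
dog.jpg,polygon,run_7
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
renamed = rename_headers(rows, HEADER_MAP)
print(renamed[0])
```

The wizard lets you make the same choice by picking headers in the interface instead of maintaining a script like this.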

Common Use Cases for Pre-Labels

All of the following use cases (and more!) are made dramatically more accessible by this new upload wizard.

Pre-Labeling (Annotation): A time-saving approach in which an AI model runs before humans look at the data, so annotators do not have to look at a “raw” image and start from scratch. Used effectively, it can reduce labeling labor by as much as an order of magnitude.

How do I know the humans are correct? (Quality Assurance): In this approach, model run results are compared to human ground truth. When predictions from a normally accurate model are substantially different from the ground truth, they can be automatically flagged for human review as potential ground truth errors. Essentially, you use the model to debug the human, which improves quality and confidence and saves QA labor.

Model Run Debugging and Dataset Discovery (Data Science): Where are my runs failing? How do different statistical results actually look on the data? What do the failure cases look like visually? How do different models compare visually (or at least per sample)? This improves the ability to update the Label Schema (Ontology) and more.
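To make the quality assurance use case above concrete, here is a minimal sketch of flagging files where a model's box disagrees with the human-drawn box. It assumes one box per file, uses intersection over union (IoU) as the disagreement measure, and picks 0.5 as the threshold; all of these are illustrative assumptions, not how Diffgram computes it.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes get an empty intersection.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def flag_for_review(pairs, threshold=0.5):
    """Return file names whose prediction and ground truth overlap less than threshold."""
    return [name for name, predicted, truth in pairs if iou(predicted, truth) < threshold]


# Made-up data: (file name, model prediction, human ground truth).
pairs = [
    ("a.jpg", (10, 10, 50, 50), (12, 11, 52, 49)),      # close agreement
    ("b.jpg", (10, 10, 50, 50), (200, 200, 240, 240)),  # no overlap at all
]

flagged = flag_for_review(pairs)
print(flagged)
```

Only files where a usually reliable model and the human disagree strongly end up in the review queue, which is what keeps the QA effort small.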

Today I want to show you how it works! 😊

This wizard visually guides you through the process of adding both your raw media files and your annotations. No coding skills required — follow the prompts and you will be able to do it!

This is really useful if you already have annotations in JSON or CSV format and you just want to get started without writing any code or having to read the docs about the SDK.

First I’ll show a video explaining the usage. Then I’ll unpack it and walk through it step by step in this article.

Video Walkthrough of Annotation Upload Wizard. CSV Spreadsheets work similarly.

Note: This example is for one file, but the process is the same even if there were hundreds of files or videos in the batch.

Going Through The Wizard:

The steps of the wizard are pretty simple. I’m going to name them here so that you get an overview of what to expect, but you should be able to figure it out without reading anything at all.

1. Update Existing or Add New Data?

The first step asks whether you want to upload new data or update existing files. If you are updating existing files, make sure to have the Diffgram file IDs handy in your JSON or CSV.

2. Selecting a Dataset

Choose an existing dataset or create a new one for the upload.

3. Choose Your Data Source

Either your local machine or a cloud storage provider (S3, GCP or Azure).

4. Add Pre-Labeled Data?

If you add pre-labeled data, you’ll need to provide a JSON or CSV file with the annotation data.
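If your annotations come from a model rather than an existing file, one way to produce such a CSV is sketched below. The column names and the `predictions_to_csv` helper are hypothetical placeholders, not a format Diffgram requires; a later step in the wizard lets you map whatever names your file uses.

```python
import csv
import io

# Made-up model output: one dict per predicted box. Key names are illustrative.
predictions = [
    {"file_name": "a.jpg", "label_name": "car", "instance_type": "box",
     "x_min": 10, "y_min": 20, "x_max": 110, "y_max": 220},
    {"file_name": "b.jpg", "label_name": "person", "instance_type": "box",
     "x_min": 5, "y_min": 8, "x_max": 60, "y_max": 190},
]


def predictions_to_csv(predictions):
    """Serialize a non-empty list of prediction dicts to CSV text with a header row."""
    fieldnames = list(predictions[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(predictions)
    return buffer.getvalue()


csv_text = predictions_to_csv(predictions)
print(csv_text)
```

Writing to a file instead of a string only requires swapping the `io.StringIO` buffer for a file opened with `newline=""`.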

5. Add Your Files

Upload as many as you want. We can handle videos of a dozen GB and thousands of images!

6. Map Your Pre-Labeled Data

Diffgram will ask you to select the keys that correspond to the file’s metadata in your JSON or CSV. Make sure your file includes the file name, label name, instance type and, for video, the frame number/sequence number.
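Before uploading, it can help to check that every record actually carries these fields. The following is a minimal sketch under stated assumptions: the key names (`file_name`, `label_name`, `instance_type`, `frame_number`) and the `missing_keys` helper are hypothetical, so substitute whatever your file really calls them.

```python
# Hypothetical key names; replace with the ones used in your own file.
REQUIRED_KEYS = ("file_name", "label_name", "instance_type")
VIDEO_KEY = "frame_number"


def missing_keys(record, is_video=False):
    """List the required keys that are absent or empty in one annotation record."""
    required = REQUIRED_KEYS + ((VIDEO_KEY,) if is_video else ())
    # Only None and "" count as missing, so a frame number of 0 stays valid.
    return [key for key in required if record.get(key) in (None, "")]


image_record = {"file_name": "a.jpg", "label_name": "car", "instance_type": "box"}
video_record = {"file_name": "clip.mp4", "label_name": "car", "instance_type": "box"}

print(missing_keys(image_record))
print(missing_keys(video_record, is_video=True))
```

Running a check like this over the whole file before starting the wizard surfaces incomplete rows early.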

7. Map Your Spatial Data

Then Diffgram will detect all the instance types in your JSON file and ask you to map the spatial values to the keys in your JSON file or the columns in your CSV file.
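Conceptually, this step amounts to finding which instance types appear in the file and then, per type, saying which key holds each coordinate. The sketch below illustrates that idea only. `SPATIAL_MAP`, the key names, and both helper functions are hypothetical and do not reflect Diffgram's internal format.

```python
import json

# Hypothetical mapping: for each instance type, target coordinate -> key in your file.
SPATIAL_MAP = {
    "box": {"x_min": "left", "y_min": "top", "x_max": "right", "y_max": "bottom"},
    "point": {"x": "px", "y": "py"},
}


def detect_instance_types(records, type_key="instance_type"):
    """Return the sorted set of instance types present in the records."""
    return sorted({record[type_key] for record in records})


def extract_spatial(record, spatial_map, type_key="instance_type"):
    """Pull the coordinates for one record using the mapping for its instance type."""
    mapping = spatial_map[record[type_key]]
    return {target: record[source] for target, source in mapping.items()}


# Made-up annotation file with two instance types.
SAMPLE_JSON = """[
  {"file_name": "a.jpg", "instance_type": "box",
   "left": 10, "top": 20, "right": 110, "bottom": 220},
  {"file_name": "a.jpg", "instance_type": "point", "px": 5, "py": 7}
]"""

records = json.loads(SAMPLE_JSON)
types_found = detect_instance_types(records)
spatial = [extract_spatial(record, SPATIAL_MAP) for record in records]
print(types_found)
print(spatial)
```

In the wizard you make these same selections through dropdowns, one instance type at a time, instead of writing the mapping yourself.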