Announcing LightTag: The Easy Way to Annotate Text

Tal Perry
6 min read · May 14, 2018


Doing large-scale text annotation is hard. Often, companies can't outsource the work, whether because of regulatory constraints on their data or the expertise required to annotate it. Data science teams end up running annotation projects in-house, but the infrastructure and software to run and manage an annotation project just isn't there.

Until now.

Don’t want to read the whole thing? Watch the video instead

Today we're proud to announce the general availability of LightTag. LightTag is built to address the pains of a modern-day annotation project, with a host of features that modern projects require:

A Great UX

The truth is that annotation is not a glorious job, but that doesn't mean annotators should have anything less than a glorious interface in which to annotate.
LightTag's annotation interface is both beautiful and optimized to maximize annotator productivity while minimizing the subtle biases a UX can introduce.
LightTag's interface allows annotation with a mouse or keyboard. Keyboard shortcuts abound to make annotators' work fast and easy.

Annotate Any Language

The fact is that the text most worthy of annotation is often non-standard. Be it English financial jargon, Chinese legalese, Arabic tweets or Hebrew medical records, the text that interests us today does not consistently conform to any model of language.

Easily annotate RTL languages and capture subwords as well as phrases

LightTag makes no assumptions about your data and particularly doesn’t restrict you to a tokenization scheme or vocabulary. Your annotators are free to annotate words, partial words or phrases as necessary.

Annotate With a Team

The data requirements of today’s models mean that one person can’t label a dataset by themselves. LightTag provides everything you need to work with a team out of the box. No more rolling your own authentication. Every annotation is attributed to the annotator who made it so you always know who did what and when.

Easily see who created which annotation when

Manage Your Team's Work With Ease

Managing a team of annotators used to mean assigning work to each annotator with a list of filenames or document IDs, and then keeping tabs on which work was done and which needed to be reassigned.

But it's 2018, and there is no reason for you to waste your time on that, nor to waste your annotators' time as they click through ten dialog boxes on their way to that last piece of work they need to do.

Easily define the work that needs to be done. LightTag will derive and dispatch tasks automatically.

LightTag allows you to define an annotation task and specify how many annotators you’d like to have annotating each example. LightTag will assign work to annotators as they arrive, making sure that you reach your annotation goals in the most efficient way possible.
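
For the curious, here is a toy sketch of what that kind of dispatch policy could look like. It is an illustration of the idea only, not LightTag's actual implementation; the function and names are made up for the example:

```python
from collections import defaultdict

def next_example(examples, completed, annotator, target_annotators=3):
    """Pick the next example for an annotator: anything they haven't
    seen yet that still needs more annotators to hit the target."""
    for example_id in examples:
        seen_by = completed[example_id]
        if annotator not in seen_by and len(seen_by) < target_annotators:
            return example_id
    return None  # nothing left for this annotator to do

# Toy run: three examples, target of two annotators each.
examples = ["ex1", "ex2", "ex3"]
completed = defaultdict(set)
for annotator in ["alice", "bob", "carol", "alice"]:
    ex = next_example(examples, completed, annotator, target_annotators=2)
    if ex is not None:
        completed[ex].add(annotator)
print(dict(completed))  # {'ex1': {'alice', 'bob'}, 'ex2': {'carol', 'alice'}} (set order may vary)
```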

Manage the quality of your data

Having a high-quality golden source dataset is paramount to evaluating your progress, and having a high-quality training set is the crucial step towards building a model. But data doesn’t come with golden labels and no one is going to tell you if your annotators are doing a good job or not.

Annotator agreement heatmap. We can easily spot Camus as a problematic annotator

LightTag will help you find out fast. By assigning work to the right person at the right time LightTag ensures you have the data you need to measure inter-annotator agreement, the only metric you can use to understand the quality of your annotations.

LightTag's management dashboard shows the performance of each annotator.

With LightTag you can discover poorly specified tasks or poor performers at the start of your annotation project instead of two weeks after putting your model into production.

By visualizing which tags are commonly confused we can easily spot problems in our schema specifications before bad data reaches production
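
If you want a feel for the math behind the agreement heatmap above, a crude version of pairwise agreement over span annotations can be computed like this. This is a rough sketch of the idea only; LightTag's own agreement metrics are more involved:

```python
from itertools import combinations

def pairwise_agreement(annotations):
    """annotations maps annotator -> set of (start, end, tag) spans for one example.
    Returns the average Jaccard overlap across every pair of annotators."""
    scores = []
    for a, b in combinations(annotations, 2):
        union = annotations[a] | annotations[b]
        if union:
            scores.append(len(annotations[a] & annotations[b]) / len(union))
    return sum(scores) / len(scores) if scores else 1.0

example = {
    "alice": {(0, 5, "ORG"), (10, 14, "DATE")},
    "bob":   {(0, 5, "ORG"), (10, 14, "DATE")},
    "camus": {(0, 5, "PERSON")},  # disagrees with everyone, like in the heatmap above
}
print(round(pairwise_agreement(example), 2))  # 0.33
```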

Understand your annotation project

Do you know how many annotations your team produces a day and what the variability is? Can you estimate when your current annotation project is set to end? Do you know which of your annotators tend to agree with others and which are below average? Do you know what the average agreement for annotations on your dataset is?

With LightTag you can answer each of those questions at the click of a button. Our analytics dashboard gives you an overview of the high-level metrics you need to know about your annotation projects, as well as detailed drill-downs into how each individual annotator or label is performing.
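
The arithmetic behind the throughput questions is simple once per-annotation timestamps are in one place; here is a back-of-the-envelope version, where the counts and the 10,000-annotation goal are made up for illustration:

```python
from datetime import date, timedelta

# Hypothetical per-day annotation counts pulled from a project's timestamps.
daily_counts = {date(2018, 5, 7): 420, date(2018, 5, 8): 510, date(2018, 5, 9): 465}

done = sum(daily_counts.values())
per_day = done / len(daily_counts)
remaining = 10_000 - done                 # assume a 10,000-annotation goal
days_left = remaining / per_day
eta = max(daily_counts) + timedelta(days=round(days_left))
print(f"{per_day:.0f} annotations/day, about {days_left:.0f} days to go (ETA {eta})")
```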

DevOps-Free With Our Hosted Solution

Ensuring your team stays productive and efficient means maintaining an operational infrastructure. But why should you put your data science and labeling budget into configuring network ports, debugging server response time or implementing high availability?

With LightTag’s hosted solution you’ll never think of DevOps in your annotation project. You’ll have your own domain (you.lighttag.io) to work from, high availability through 3x server replication, an isolated database and daily backups with 30-day retention. And the best part is you’ll never configure a thing, launch a server or index a database table.

And Total Control With Our On-Premise Solution

But many of our customers deal with data that is too sensitive or regulated to be put in the cloud. LightTag comes in an on-premise version as well, one that fits into your Kubernetes, OpenShift, or Docker Swarm cluster. Our concierge onboarding service will ensure you are up and annotating fast with minimum fuss.

Bootstrap your projects with suggestions

LightTag continuously learns from your team's labels and uses a number of machine learning models to provide suggestions, drastically increasing your team's efficiency.

But you know your domain better than anyone, and you can leverage that knowledge with LightTag. You can generate suggestions for your annotators from your own dictionaries, regexes, or models and upload them to LightTag via our REST API.

LightTag will track every acceptance and rejection of a suggestion, and you can register multiple models to quickly evaluate and compare their performance while enlarging your training sets.
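
As a sketch of what that could look like, suppose you have a regex-based suggester for currency amounts. The endpoint path, payload shape, and auth header below are placeholders rather than LightTag's documented API, so check the real API docs before using them:

```python
import re
import requests

text = "Pay 500 USD to ACME Corp by May 21, 2018."

# Turn a simple regex "model" for currency amounts into suggestion spans.
suggestions = [
    {"start": m.start(), "end": m.end(), "tag": "AMOUNT", "text": m.group()}
    for m in re.finditer(r"\d+ (?:USD|EUR|GBP)", text)
]

# Placeholder endpoint and auth header; the real LightTag API may differ.
resp = requests.post(
    "https://you.lighttag.io/api/suggestions/",
    json={"model": "currency-regex-v1", "suggestions": suggestions},
    headers={"Authorization": "Token <your-api-token>"},
)
resp.raise_for_status()
```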

Keep Your Metadata With Your Data

We know your data doesn't start and end with the text you're annotating; your data has context and metadata that should stay together with the text. LightTag lets you upload arbitrary data, present it to your annotators, and keep it side by side with your annotations. No linking to other files, no joins, and no data loss. Just all of your data and all of your annotations in one place.
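
Concretely, that means an uploaded example can carry whatever fields you need next to the text. The field names below are illustrative, not a prescribed schema:

```python
# An illustrative example payload: the text plus whatever metadata travels with it.
example = {
    "content": "Patient reports mild chest pain since Tuesday.",
    "metadata": {
        "record_id": "A-1042",        # made-up identifiers for illustration
        "department": "cardiology",
        "visit_date": "2018-05-02",
    },
}
```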

Consume your data easily

Forget about complex XML with annotations that need to be joined back onto raw text. LightTag gives you all of your data, annotations, text, and metadata in one easy-to-consume JSON. With LightTag you can easily take your annotations into your downstream algorithms in TensorFlow, PyTorch, scikit-learn, or wherever else you process your annotations.
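
As a rough sketch of what working with such an export can look like (the JSON layout below is illustrative, not LightTag's exact export schema), turning annotations into training pairs is a few lines of Python:

```python
import json

# Illustrative export layout: text, metadata, and annotations side by side.
export = json.loads("""
[
  {"content": "ACME Corp raised 500 USD.",
   "metadata": {"source": "newswire"},
   "annotations": [
     {"start": 0, "end": 9, "tag": "ORG", "annotator": "alice"},
     {"start": 17, "end": 24, "tag": "AMOUNT", "annotator": "bob"}
   ]}
]
""")

# Flatten into (text, spans) pairs ready for a downstream tagger.
training_pairs = [
    (doc["content"], [(a["start"], a["end"], a["tag"]) for a in doc["annotations"]])
    for doc in export
]
print(training_pairs[0])
```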

Ready to Start?

Visit us at www.lighttag.io to find out more.


Tal Perry

Founder of LightTag.io, a platform to annotate text for NLP. Google Developer Expert in ML. Former NLP @ Citi, CTO @ Superfly.