I think the article lacks structure. In the third paragraph you promise: "I would like to argue that nowadays specializing in a certain domain allows you to easily cross over to other domains."

You seem to go ahead and illustrate that with a series of anecdotes, but you don't tie any of the anecdotes to your point explicitly and it is not clear whether each of them illustrates a subtly different point.

Section headings would be very helpful for the reader.

A summary section after the Example(s) would help tie the anecdotes back to your point.

Within the anecdotes, most of them illustrate the…

A checklist of things that can go wrong and how to fix them

It’s better to anticipate and fix errors before they reach production

Labeling Data for NLP, like flying a plane, is something that looks easy at first glance but can go subtly wrong in strange and wonderful ways. Knowing what can go wrong and why are good first steps to detecting and fixing the errors.

Labeling Data for NLP is something that looks easy at first glance but can go subtly wrong in strange and wonderful ways. Knowing what can go wrong and why are good first steps to detecting and fixing the errors.

At LightTag, we make text annotation tools and work closely with our customers to understand what is happening in their annotation efforts and how our product can help them get labeled data faster and with higher quality. …

This year I founded a company, LightTag. In 2018 LightTag’s business performance was much better than I expected but much worse than it could have been. Being the CEO, it’s been my job to reflect on why and figure out what to improve for 2019.

Me, Often

LightTag is a bootstrapped company, and the things that went right or wrong are very much the things that I did right or wrong. This examination is about my actions, what led to them and how they’ve evolved.

What is LightTag

These lessons are independent of what LightTag is and does, but if you’re curious… LightTag provides tools…

This Blog Comes With Code

If you don’t care for what I have to say (and why should you, really?) then the code is here

When Geeks Go On Vacation

A few years ago I wrote a post about deep learning the stock market. It got a lot of traction, but I think that was more due to it being well written and clickbaity than to it being very insightful.

Since then, I’ve founded a company in the NLP space, moved countries and had a child. I’ve been happily consumed with building a business and changing diapers so finance has taken a back seat. However, we went on a long vacation and…

Doing large-scale text annotation is hard. Often, companies can’t outsource the work due to either regulatory constraints on their data or the expertise required to annotate it. Data science teams end up running annotation projects in-house, but the infrastructure and software to run and manage an annotation project just isn’t there.

Until now.

Don’t want to read the whole thing? Watch the video instead

Today we’re proud to announce LightTag in general availability. LightTag is built to address the pains of a modern-day annotation project with a host of the features that modern projects require:

A Great UX

The truth is…

Labeled data has become paramount to the success of many business ventures and research projects. But obtaining labeled data remains a costly exercise. Active Learning is a technique that promises to make obtaining labeled data more efficient and has recently been hyped by a number of companies.

At LightTag we provide our customers with a platform to execute and manage large scale annotation projects. Our basic interface looks like this:

Using LightTag to label hardware attributes from Reddit’s /r/hardwareswap

Increasing our customers’ labeling efficiency is literally the reason we go to work. To that end, our system learns from customers as they label and provides suggestions.

Deep learning has made NLP easier by providing us with algorithms that can operate on arbitrary sequences. While the algorithms are crystal clear and many implementations are widely available, getting your data into them is often opaque, tedious and frustrating. Often, it’s the part of the job that makes me feel like this:

Getting Text Into a Deep learning framework

This post will discuss consuming text in TensorFlow with the Dataset API, which makes things almost easy. To illustrate the ideas in this post, I’ve uploaded a repo with an implementation of the end-to-end process described here. It contains a model that reads a verse…

Andrej Karpathy, director of AI at Tesla, recently wrote a blog post “Software 2.0” where he said:

It turns out that a large portion of real-world problems have the property that it is significantly easier to collect the data (…) than to explicitly write the program. A large portion of programmers of tomorrow do not maintain complex software repositories, write intricate programs, or analyze their running times. They collect, clean, manipulate, label, analyze and visualize data that feeds neural networks.

I wholeheartedly agree and want to elaborate on how “deep learning” has brought this shift about, and what it…


  • RNNs work great for text but convolutions can do it faster
  • Any part of a sentence can influence the semantics of a word. For that reason we want our network to see the entire input at once
  • Getting that big a receptive field can make gradients vanish and our networks fail
  • We can solve the vanishing gradient problem with DenseNets or Dilated Convolutions
  • Sometimes we need to generate text. We can use “deconvolutions” to generate arbitrarily long outputs.
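
As a rough illustration of the receptive-field point above, here is a minimal sketch that assumes stride-1, one-dimensional convolutions with a fixed kernel size. The function name and the layer configurations are made up for this example and are not taken from the post. It compares how many input tokens a single output position can see after four ordinary layers and after four dilated layers.

```python
def receptive_field(kernel_size, dilations):
    """Number of input tokens seen by one output position.

    Assumes a stack of stride-1 1D convolutions, one layer per entry
    in `dilations`, all sharing the same kernel size.
    """
    field = 1
    for dilation in dilations:
        # Each layer widens the view by (kernel_size - 1) * dilation tokens.
        field += (kernel_size - 1) * dilation
    return field


# Four ordinary layers (dilation 1): the view grows linearly with depth.
plain = receptive_field(3, [1, 1, 1, 1])

# Four layers with doubling dilation: the view grows roughly exponentially.
dilated = receptive_field(3, [1, 2, 4, 8])

print(plain, dilated)  # 9 31
```

With the same depth and kernel size, doubling the dilation at each layer covers 31 tokens instead of 9, which is why dilation can reach a whole sentence without a very deep stack.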


Over the last three years, the field of NLP has gone through a huge revolution thanks to deep learning. The leader of this…

Update 25.1.17 — Took me a while but here is an ipython notebook with a rough implementation

In the past few months I’ve been fascinated with “Deep Learning”, especially its applications to language and text. I’ve spent the bulk of my career in financial technologies, mostly in algorithmic trading and alternative data services. You can see where this is going.

I wrote this to get my ideas straight in my head. While I’ve become a “Deep Learning” enthusiast, I don’t have too many opportunities to brain dump an idea in most of its messy glory. I think that a decent…

Tal Perry

Founder of LightTag.io, platform to annotate text for NLP. Google developer expert in ML. Former NLP@Citi CTO@Superfly
