A Datasaur has hatched!

Ivan Lee
May 15, 2019

In a world rapidly moving towards AI and data-driven solutions, we find ourselves consuming exponentially increasing amounts of data. In order to properly utilize that data, we rely on humans to manually label raw information to feed into an AI algorithm; once provided with human-labeled examples, the algorithm can learn to perform the task on its own. For example, a bank may want to identify entities in a document to extract who signed an agreement and on what date. A media company may want to evaluate key attributes in a news article to identify whether it is valid or fake news.

Behind every AI algorithm are thousands of human-labeled training examples. Organizing and labeling such data today is tedious, time-consuming and expensive. We started Datasaur to create the most intelligent, efficient and productive tools for all your data labeling needs.

The Problem

Businesses today spend far too much time and money labeling their data. Many reinvent the wheel by building their own in-house tools or resort to basic spreadsheets, settling for an inadequate and error-prone interface. Such ad hoc solutions waste time. “Garbage in, garbage out” holds particularly true for AI: mistakes in the training data can produce bizarrely incorrect results that then take even more time to debug.

The other alternative is to work with full-service data labeling partners. Such companies charge premium prices for white-glove service, promising to work with you to receive your data and return it labeled. Unfortunately, cutthroat pricing incentivizes workers to rush and focus on quantity over quality.

More importantly, consumers and regulators alike are starting to take notice. Companies are receiving flak for outsourcing the labeling of users' private posts to strangers. We believe large swaths of data labeling will necessarily be brought in-house to trained employees; such employees will need tools to label efficiently.

The Solution

Datasaur stands ready to provide those tools. Our human-centric products enable labelers to work more efficiently, improve the quality of your work and uphold data privacy and security.

Powerful intelligence under the hood helps to provide quality assurance. AI-driven models proactively suggest labels, saving you significant time and money. Labels that do not align with previous tagging behavior or are contextually out of place are highlighted for verification. Project managers can easily set up work to be labeled multiple times in order to guarantee accuracy. Finally, we hold ourselves accountable to the highest data privacy and security standards to ensure your company’s data stays safe.
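To make the "labeled multiple times" idea concrete, here is a minimal sketch of how several labeling passes over the same item could be consolidated by majority vote, with low-agreement items flagged for verification. This is an illustration under stated assumptions, not a description of Datasaur's implementation: the `consolidate` function, the `min_agreement` threshold, and the sample document IDs and labels are all hypothetical.

```python
from collections import Counter


def consolidate(labels_per_item, min_agreement=0.6):
    """Pick a consensus label per item and flag weak agreement.

    labels_per_item maps an item ID to the non-empty list of labels
    that different labelers assigned to it (hypothetical input shape).
    """
    results = {}
    for item_id, labels in labels_per_item.items():
        # Most frequent label wins; ties go to the label seen first.
        label, votes = Counter(labels).most_common(1)[0]
        agreement = votes / len(labels)
        results[item_id] = {
            "label": label,
            "agreement": agreement,
            # Items below the threshold are routed to a human reviewer.
            "needs_review": agreement < min_agreement,
        }
    return results


# Three labelers tag the same three text spans (made-up sample data).
passes = {
    "doc-1": ["PERSON", "PERSON", "PERSON"],
    "doc-2": ["DATE", "ORG", "DATE"],
    "doc-3": ["ORG", "PERSON", "DATE"],
}
consensus = consolidate(passes)
```

In this sketch, unanimous and two-out-of-three items are accepted, while the item on which all three labelers disagree is marked for review. A real system would likely weight labelers by track record rather than counting votes equally.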

We believe there is a whole suite of tools we can build to support your AI team, so you don’t have to. We are starting with text, focusing on Natural Language Processing (NLP), and we intend to expand into audio and other adjacent fields in the future. As the need for labeled data grows, our mission is to democratize and simplify the process, allowing you to focus on your AI solution.

Datasaur Status and an Ask for Help

We have been building Datasaur for the past few months and are launching in closed beta in June 2019. We have raised a seed round and built a stellar team equipped to advance our mission.

If you or someone you know is working on text-based AI products, we’d love to invite you to partner with us and take a step toward improving your data labeling process.

Please welcome Datasaur.ai.


