AI’s New Workforce: The Data-Labelling Industry Spreads Globally

Hundreds of thousands employed in lower-income countries such as India and the Philippines

The Financial Times
Aug 8, 2019 · 5 min read

Leila Janah, Samasource Founder and CEO, speaks at the Fortune + Time Global Forum 2016. Photo: Elisabetta Villa/Getty Images for TIME

By Madhumita Murgia

On the fringes of the Indian city of Kolkata, in the dusty, crowded neighbourhood of Metiabruz, 460 young women are working at the vanguard of artificial intelligence.

The women, mostly from the local Muslim community, are helping to train computer vision algorithms used in autonomous vehicles and augmented reality systems, for the likes of Amazon, Microsoft, eBay and TripAdvisor.

The all-female centre is one of eight Indian offices operated by iMerit, an India- and US-based data annotation company, whose 2,200 local employees label the oceans of data generated by industries as diverse as manufacturing, medical imaging, autonomous driving, retail, insurance and agriculture.

The operation is part of a growing data-labelling industry that employs hundreds of thousands of workers in lower-income countries including Kenya, India and the Philippines.

Companies such as Figure Eight and Mighty AI, and more traditional IT companies such as Accenture and Wipro, form part of a so-called “AI supply chain” that creates algorithms able to interpret material including driving footage, search results and photos for the largest US and European multinationals, including Facebook, Volkswagen and Google.

Today, companies are embracing artificial intelligence as a way to automate decision-making and help drive new business opportunities. The challenge is that the algorithms that underpin the technology are as naive as newborns. They need to be fed millions of labelled examples to teach them to “see”.

For a self-driving car algorithm to be taught the meaning of road signs, or to tell the difference between a child and a fox, hours of footage have to be watched and objects tagged, frame by frame. An hour of video takes eight hours to annotate. In fact, a McKinsey report from 2018 listed data labelling as the biggest obstacle to AI adoption in industry.

According to a January 2019 report by analyst firm Cognilytica, the market for third-party data labelling solutions…
