The Quirky Ways AI Researchers Gather Data to Feed Their Algorithms

Here are four of the most creative data collection methods used by experts at the leading annual conference on natural-language processing

MIT Technology Review

Photo: Franck V./Unsplash

By Karen Hao

Data is the oil that fuels AI development, and it gives us many of the advances we take for granted: YouTube captions, Spotify music recommendations, those creepy ads that follow you around the Internet.

But when it comes to collecting useful data, AI experts often have to get creative. Take natural-language processing (NLP), a subfield of AI that focuses on teaching computers how to parse human language. At the annual Conference on Empirical Methods in NLP, experts presented a broad range of research that drew on information gathered in some ingenious ways. We’ve summarized four of our favorite projects below.

Spanglish

Among the papers on multilingual NLP this year, Microsoft presented one that focused on processing “code-mixed language” — text or speech that switches fluidly between two languages. Considering that more than half of the world’s population is multilingual, this understudied area is important.
