Data Subjects and Manure Entrepreneurs

When it comes to how your data is being used to drive the technologies of the future, you have a seat at the table. It’s just been empty.

Did you know that you played a vital role in the digitization of the entire New York Times archive, the development of Google Maps, and the creation of Amazon’s recommendation engine? That’s right, you!

Listen to the third episode of the Consequential podcast, Data Subjects and Manure Entrepreneurs.

Whether you know it or not, you’ve been part of the expansion of artificial intelligence in society today. When you choose what to watch on Netflix or YouTube, you’re informing their recommendation engines. When you interact with Alexa or Siri, you help train their voice-recognition software.

If you’ve ever had to confirm your humanity online, then you’re familiar with CAPTCHA.

Remember these?

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.” Originally, a CAPTCHA would pop up when you were logging into a website, and you’d have to identify letters and numbers that had been warped or obscured in some way. But beyond a modicum of added security, it didn’t really accomplish anything useful, and that started to bother one of CAPTCHA’s early developers, Luis von Ahn.

Before von Ahn became the co-founder and CEO of the language-learning platform Duolingo, he was a Ph.D. candidate at Carnegie Mellon University, where he developed some of the first CAPTCHAs with his advisor, Manuel Blum. In 2007, von Ahn and a team of computer scientists at Carnegie Mellon established reCAPTCHA, a CAPTCHA-like system that didn’t just spit out a bunch of random letters and numbers — it borrowed text from otherwise hard-to-decipher books, many of which were out of print or used complex fonts. So now, instead of just proving you weren’t a robot, you were also helping to digitize Pride and Prejudice and The Adventures of Sherlock Holmes.

A sample reCAPTCHA.

The reCAPTCHA system helps to correct over 10 million words each day, allowing people to freely access books and articles online that they may never have had access to before. It’s also responsible for digitizing the entire New York Times archive, from 1851 to the present day. Bravo! You did that!

Now, reCAPTCHA looks a little different. Instead of words, you’ll see pictures, and you’ll be asked which pictures have storefronts in them. Or dogs. Or stop signs. What’s that about? Well, think about it. What kind of computer needs to be able to differentiate a stop sign from a yield sign? That’s right, when you complete a reCAPTCHA, you are part of the future of self-driving cars.

When it comes to making books freely available, it’s easy to see this as a work of altruism for the common good. That’s what Luis von Ahn envisioned: a collective effort on the part of humanity to share knowledge and literature across the world wide web. Wikipedia is another example of crowdsourced labor and collective intelligence, but one in which the labor is voluntary.

In the case of reCAPTCHA, one could make the argument that you were a nonconsenting, unpaid laborer in the process. In fact, an unsuccessful 2015 class-action lawsuit did just that. But this isn’t just a financial issue or a labor issue. Your data is an incredibly valuable and ultimately essential resource, and it’s driving more than just autonomous vehicles. In our last post, we discussed how pervasive algorithms have become, from recommending the things we buy and watch to supporting health and hiring decisions. But it’s important to remember that these algorithms didn’t just appear out of nowhere. The algorithms that we use every day could not exist without the data that we passively offer up anytime we click on an advertisement, order a t-shirt, or binge that new show everyone’s talking about.

You may feel like you don’t have a seat at the table in all this. But here’s the thing: You have a seat, it’s just been empty.

If these algorithms need our data to function, that means we’re an absolutely necessary part of this process. And that might entitle us to some kind of authority over how our data is being used. To define our rights to our data, we first need to define what sort of authority we hold over it.

In our latest episode of the Block Center’s Consequential podcast, we talk to Carnegie Mellon University business ethics professor Tae Wan Kim about the ethics of data capitalism and two questions: Who really owns data, and what rights do we have to our own information?

Our typical understanding of data subjects is that they are consumers — we offer data to Facebook, and Facebook offers a service in exchange.

“The bottom line is informed consent,” said Kim. “But the problem is, informed consent assumes that the data is mine and then I transfer the exclusive right to use that data to another company. But it’s not that clear of an issue.”

Professor Kim is interested in a different framework for encouraging data subjects to take a proactive role in this decision-making: data subjects as investors.

In our newest episode, “Data Subjects and Manure Entrepreneurs,” we’ll speak at length with Professor Kim, discuss what privacy means in an age of surveillance, and learn what our online data has in common with horse droppings. We’ll also hear more from University of Pennsylvania professor Kartik Hosanagar — who joined us last week to discuss the pervasiveness of algorithms — to talk about the need for greater digital literacy in the education system.

Data Subjects and Manure Entrepreneurs is available now on Apple Podcasts, Google Podcasts, Spotify, Stitcher or wherever you listen to podcasts!


CMU’s Block Center for Technology and Society
Consequential Podcast

The Block Center for Technology and Society at Carnegie Mellon University investigates the economic, organizational, and public policy impacts of technology.