Data as Labor — Or How We Must Understand Data to Unleash New Productivity Gains of AI

Nikolaus Lipusch
vencortex®
Jun 12, 2019 · 10 min read
Photo by studiostoks on Shutterstock

Quite recently I came across an article by Imanol Arrieta Ibarra, Leonard Goff, Diego Jiménez Hernández, Jaron Lanier, and E. Glen Weyl that deals with how data is treated by current AI companies and why some of their business models are flawed.

Introducing Two Conflicting Data/AI Paradigms

To highlight the problems with the current business models of AI companies, the article distinguishes between two main paradigms, namely Data as Capital (DaC) and Data as Labor (DaL). Right now, the majority of companies seem to follow a Data as Capital paradigm. But what does this mean?

Data as Capital considers data a natural exhaust from users who consume AI services. This data is collected by AI companies, who use it to come up with new services and products. While it can be argued that users are rewarded for their data with free services (although this is not entirely true, since users pay in the form of targeted ads), the biggest payoffs of the collected data accrue to AI companies, which use this data to automate existing services or provide new ones. Another characteristic of the DaC paradigm is that it conceives of AI as a technology that is going to disrupt industries mainly through automation. One consequence of this is that workers in these industries will be deprived of their source of income. Another consequence is that it leaves humans to find meaning outside the digital economy, as it does not consider data work to be labor that users should be paid for.

Data as Labor, on the other hand, does not consider data a mere byproduct of users that is collected and used by AI services. Rather, data is considered something that is not only created by users but that should also be owned by them. Hence, one major distinction of the DaL paradigm is that the proceeds of data should primarily go to individual users in order to facilitate higher quality and quantity of data. Also, DaL naturally conceives of AI not as a technology that is there to replace humans, but rather as a technology that enhances the productivity of existing laborers and creates a new class of “data jobs”. One logical consequence of this paradigm is that it does not make data work precarious but transforms this type of work in a way that allows data workers to retain their dignity. Overall, DaL is a paradigm that is necessary to create new, more sensible business models that will ultimately yield a functioning data labor market in which data workers are not reliant on a universal basic income because all their jobs have been taken over by machines.

To sum up:

DaC

- treats data as a natural exhaust collected and used by firms

- channels payoffs from data to AI companies

- sees AI as a technology that will replace workers

- forces data workers to find meaning and dignity outside their data work

DaL

- treats data as a user possession

- channels payments to users to increase the quality and quantity of data

- sees ML & AI as a technology to create new jobs as well as productivity gains

- provides data workers with meaning and dignity on the job

Why Should We Treat Data As Labor?

A lot of the problems that we encounter in this age of digitalization can be attributed to the fact that companies treat data as capital. One of the biggest concerns with regard to the DaC paradigm is that AI will gradually replace human workers. In recent years, digital companies such as Google and Facebook have managed to create tremendous value with only a small fraction of the workforce employed by traditional businesses. This trend is likely to progress further as these companies treat data as a free good for which workers are not remunerated.

Another problem concerns the development and provision of AI solutions themselves. The first generation of AI solutions failed to achieve its goals mainly due to a lack of engineers, who at that time were needed to hard-code the rules of these solutions. With advancements in machine learning (especially the development of self-learning algorithms) this reality has changed. The main bottleneck of today’s AI solutions is the data that these new algorithms need to be fed in order to derive rules and models on their own. Consequently, one of the main factors limiting the productivity gains of today’s AI solutions is the availability of human workers who provide the data needed to train machine learning algorithms.
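
To make this contrast concrete, here is a toy sketch in Python (purely illustrative, with made-up data and a hypothetical spam example): the first function encodes a rule that an engineer had to hard-code, while the second derives a comparable rule from human-labeled examples. It is exactly the availability of such labeled examples, i.e., human data work, that becomes the bottleneck.

```python
# Hard-coded rule: an engineer has to know and encode the threshold herself.
def is_spam_hardcoded(num_links: int) -> bool:
    return num_links > 5            # the "5" was chosen by a human expert

# Learned rule: the threshold is derived from human-labeled examples instead.
def learn_threshold(examples: list[tuple[int, bool]]) -> float:
    """Pick the threshold that best separates labeled spam from non-spam."""
    candidates = sorted({num_links for num_links, _ in examples})
    best_threshold, best_correct = 0.0, -1
    for t in candidates:
        correct = sum((num_links > t) == label for num_links, label in examples)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold

# Human-labeled training data (num_links, is_spam) -- the scarce, valuable input.
labeled = [(1, False), (2, False), (3, False), (8, True), (12, True), (9, True)]
print(learn_threshold(labeled))     # 3 -> learned rule: num_links > 3 means spam
```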

A further problem concerns the type of data that is used by most of today’s AI companies. Since the DaC paradigm advocates a free data model, most of the data used by AI companies is consumption-oriented data. As already mentioned above, consumption-oriented data is generated by people who use a certain AI service and provide their data for free. While such a model is attractive from a cost perspective, it often leads AI companies to neglect productivity-related data. Productivity-related data is data that occurs mostly within firms and is created by people who expect to be paid for it. Since this data has a cost attached, the companies in which it occurs usually do not surrender it for free. One consequence is that AI companies neglect certain domains simply because they are not willing to pay a price for that data.

Closely related to the aforementioned problem is the question of how we adequately reward people for productivity-related data. While platforms such as Amazon Mechanical Turk and Figure Eight were among the first to pay people for labeling and providing data, one problem that remains is that data work is still not remunerated adequately. One result is that such platforms attract not only low-wage workers but also low-quality work. However, if we want AI to progress further, AI companies must be prepared to price and remunerate data work adequately. Paying a fair price is the only way of attracting more skilled workers who possess the expertise and knowledge necessary to solve more complex and pressing problems.

In the age of artificial intelligence, data must be treated as a form of labor. One of the main reasons for this is that AI is highly dependent on the collective intelligence of human workers. Most AI algorithms need to be fed human-generated examples to know what the right answers to certain problems are. Hence, if we want AI to tackle more challenging problems, integrating human experts and creating adequate incentives for these experts is indispensable.

What Do Companies Need to Consider If They Want to Adopt a Data as Labor Paradigm?

While the above-cited article largely covers how the adoption of a DaL paradigm may change the potential of AI as well as the socioeconomic reality of data workers, it leaves out what companies need to consider if they want to employ such a paradigm.

#1 Change the Narrative of Data Work

First of all, it is important to communicate to people how valuable their data contributions are (be it the provision of data, the labeling of data, or the training of models). In doing so, it is important to make clear that the ultimate goal is not to replace them but that their work is actually needed to make AI work and evolve over time.

One way to achieve this is to transparently communicate to data workers which data you need from them, why you collect it, and what you intend to do with the collected data. There are plenty of cases in which companies are not transparent about their data strategy. The result of such a lack of transparency is, in the best case, low-quality data. In the worst case, such practices can lead to widespread backlash that results in companies losing their customers and ultimately their credibility (see Facebook’s Cambridge Analytica scandal). Hence, the single most important thing companies need to consider if they want to employ a DaL paradigm is to be transparent about their data project and to change the narrative from “we need data to develop AI that automates processes” to “we need people to power AI and make ML algorithms work”. The only way AI can progress further (i.e., solve more complex problems) is if companies are willing to change the narrative of data work so as to get more people to provide better-quality data.

#2 Design a Meaningful Data Work Process

The second point AI companies need to consider when they want to employ a DaL paradigm is to create data work processes that people perceive as meaningful. For data work to be perceived as meaningful, it has to fulfill three conditions: the work must be engaging, people must be in control of the work outcome, and people need to be continuously involved.

Hence, in order for data work to be perceived as meaningful, AI companies have to ensure that it is engaging. This means that people need to enjoy the task and that the task should not become dull over time. While this is probably one of the biggest challenges for AI companies, some have found ways to resolve this problem. The most popular example is Foldit, a citizen-science project that manages to involve people in the generation of data on new protein structures. To engage people in this process, Foldit relies on a gaming mechanism that works like a puzzle: people can try out different protein structures, and the most efficient structures receive the highest scores.
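
To give a rough idea of what such a puzzle-like mechanism involves, here is a minimal Python sketch with entirely hypothetical names and scoring logic (Foldit’s actual scoring is far more sophisticated): players submit candidate solutions, each submission gets a score, and the best contributions rise to the top of a leaderboard.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    player: str
    candidate: dict          # hypothetical representation of a folded structure
    score: float = 0.0

def energy_score(candidate: dict) -> float:
    """Hypothetical scoring: lower 'energy' means a more efficient structure,
    so we convert it into a score where higher is better."""
    return 1000.0 - candidate.get("energy", 1000.0)

class PuzzleLeaderboard:
    def __init__(self):
        self.submissions: list[Submission] = []

    def submit(self, player: str, candidate: dict) -> float:
        """Score a player's candidate and record it."""
        sub = Submission(player, candidate, energy_score(candidate))
        self.submissions.append(sub)
        return sub.score

    def top(self, n: int = 3) -> list[tuple[str, float]]:
        """Return the n best submissions, i.e., the most 'efficient' structures."""
        ranked = sorted(self.submissions, key=lambda s: s.score, reverse=True)
        return [(s.player, s.score) for s in ranked[:n]]

# Example usage
board = PuzzleLeaderboard()
board.submit("alice", {"energy": 412.0})
board.submit("bob", {"energy": 389.5})
print(board.top())   # bob ranks first because his structure has lower energy
```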

Another way to get workers to perceive data work as meaningful is to make sure that they are in control of the outcome and feel responsible for their work. One way of doing this is to provide workers with regular feedback on their work (e.g., feedback on their data labeling accuracy) based on which they can improve. Furthermore, the process should also be designed in a way that motivates workers to continuously improve the outcome of their work (one possible way to achieve this is through an adequate incentive structure, see #3). Another way to make workers feel responsible for their data work is to engage them in projects that they feel a strong obligation towards. For example, the best way to involve medics in data work is to let them contribute to data projects that aim to solve a medical problem or challenge. Hence, if you provide people with the opportunity to solve an interesting and challenging problem (usually a problem with high stakes) in their domain of expertise, they will be naturally motivated to solve it (this of course presupposes that you don’t give them the feeling that they will be replaced by the solution).
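
Coming back to the feedback idea at the start of this point, here is a minimal sketch of how labeling-accuracy feedback could be computed (all names are hypothetical, and it assumes a small set of gold-standard items whose true labels are known):

```python
def labeling_accuracy(worker_labels: dict, gold_labels: dict) -> float:
    """Share of gold-standard items the worker labeled correctly."""
    checked = [item for item in gold_labels if item in worker_labels]
    if not checked:
        return 0.0
    correct = sum(worker_labels[item] == gold_labels[item] for item in checked)
    return correct / len(checked)

def feedback_message(accuracy: float) -> str:
    """Translate an accuracy score into actionable feedback for the worker."""
    if accuracy >= 0.95:
        return f"Excellent: {accuracy:.0%} of your reviewed labels were correct."
    if accuracy >= 0.80:
        return f"Good: {accuracy:.0%} correct. Please review the guidelines for edge cases."
    return f"Only {accuracy:.0%} correct. A short retraining task has been added to your queue."

# Example usage
gold = {"img_1": "cat", "img_2": "dog", "img_3": "cat"}
worker = {"img_1": "cat", "img_2": "cat", "img_3": "cat"}
print(feedback_message(labeling_accuracy(worker, gold)))  # 67% -> retraining message
```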

The last point AI companies must consider for data work to be perceived as meaningful is to make sure that people are involved continuously in the data work process. The main rationale behind this is that people develop strong feelings towards things they do on a continuous basis (ever heard of “you become what you do”?). Hence, for people to identify with and attribute meaning to a certain kind of work, the work has to become habitual. Continuous engagement of workers is also important because knowledge changes over time; what’s right today might not be right tomorrow. As a result, AI needs to be constantly fed new knowledge and trained continuously.

#3 Create Adequate Incentives

The third point AI companies need to consider when they want to employ a DaL paradigm is to come up with adequate incentive strategies. This means that, in order to obtain good-quality data, AI companies must be prepared to pay for it. A popular example is Amazon Mechanical Turk (AMT), which pays its workers for searching, extracting, and labeling data. While AMT employs a very basic incentive model (essentially a payment model that factors in time and cost), there is still a lot of room for more flexible models. Future incentive models should also consider the inclusion of non-material rewards. One example is social rewards that allow users to project a certain self-image towards a community or a network of other experts. Social rewards can include status attainments, such as virtual badges and group status, or social capital, such as a reputation as an expert. Another example is to reward users through personal utility. One way to implement this is to allow users to “quantify themselves” (e.g., by letting them track their progress). Another way might be to give them the possibility to train a personalized AI model for the purpose of creating a digital twin or a personalized recommendation engine.
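
To show how such a blended incentive model might look, here is a minimal sketch that combines a time-based base payment, a quality bonus, and non-material rewards in the form of badges. All rates, thresholds, and badge names are hypothetical; a real platform would calibrate them carefully.

```python
from dataclasses import dataclass

@dataclass
class WorkSession:
    minutes_worked: float
    items_completed: int
    accuracy: float          # e.g., measured against gold-standard items

# Hypothetical parameters.
BASE_RATE_PER_MINUTE = 0.25      # material reward: pay for time
QUALITY_BONUS_PER_ITEM = 0.05    # material reward: pay extra for high accuracy
QUALITY_THRESHOLD = 0.9

def payout(session: WorkSession) -> float:
    """Material reward: base pay for time plus a bonus for high-quality items."""
    pay = session.minutes_worked * BASE_RATE_PER_MINUTE
    if session.accuracy >= QUALITY_THRESHOLD:
        pay += session.items_completed * QUALITY_BONUS_PER_ITEM
    return round(pay, 2)

def badges(total_items: int, lifetime_accuracy: float) -> list[str]:
    """Non-material rewards: status signals a worker can show to the community."""
    earned = []
    if total_items >= 1000:
        earned.append("Veteran Contributor")
    if lifetime_accuracy >= 0.95:
        earned.append("Trusted Expert")
    return earned

# Example usage
session = WorkSession(minutes_worked=60, items_completed=120, accuracy=0.93)
print(payout(session))                                   # 21.0
print(badges(total_items=1500, lifetime_accuracy=0.96))  # both badges earned
```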

Why Should AI Companies Bother?

Finally, one might ask why companies such as Google, which are already highly successful in the market, should change their business model. The reason is very simple. For AI to live up to its full potential, it is important to get more (and more diverse) as well as better data. This is especially important if we want AI to solve more important and complex problems. One example is the domain of healthcare, where there is very little room for mistakes and the knowledge of experts is highly valuable. Since the time of experts is constrained and costly, we need to remunerate them adequately if we want them to get involved in data work. While this higher cost of integrating experts might squeeze some of the profits of AI companies, they should not overlook that this loss is likely to be offset by the possibility of addressing new problem domains and entering new markets. Hence, if done right, DaL bears the potential not only to improve AI but also to distribute the wealth created by AI more evenly.

We at vencortex have understood that data workers are the key to leveraging the power of data. As a result, we employ a special technology called hybrid intelligence. Hybrid intelligence denotes the ability to accomplish complex goals by combining human and artificial intelligence to collectively achieve superior results and continuously improve by learning from each other. One prerequisite for hybrid intelligence to work is to create an environment in which users are motivated to share their expertise and knowledge and are rewarded accordingly. An important precondition for achieving this goal is to conceive of Data as Labor.
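
As a generic illustration of how such a human-in-the-loop setup can be organized (this is a sketch with hypothetical names, not vencortex’s actual implementation): the model handles confident predictions on its own, routes uncertain cases to human experts, and collects the expert answers so it can be retrained and improve over time.

```python
from typing import Callable, Tuple

CONFIDENCE_THRESHOLD = 0.8   # hypothetical cutoff below which experts are consulted

def hybrid_predict(
    item: str,
    model_predict: Callable[[str], Tuple[str, float]],  # returns (label, confidence)
    ask_expert: Callable[[str], str],                    # returns expert label
    training_buffer: list,
) -> str:
    """Combine machine and human intelligence on a single item."""
    label, confidence = model_predict(item)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label                       # the model is confident enough to answer alone
    expert_label = ask_expert(item)        # uncertain case: a paid expert provides the answer
    training_buffer.append((item, expert_label))  # stored to retrain and improve the model
    return expert_label

# Example usage with stub functions standing in for a real model and expert
def toy_model(item: str) -> Tuple[str, float]:
    return ("benign", 0.55)                # low confidence -> will be escalated

def toy_expert(item: str) -> str:
    return "needs_review"

buffer: list = []
print(hybrid_predict("case_42", toy_model, toy_expert, buffer))  # 'needs_review'
print(buffer)                             # [('case_42', 'needs_review')] ready for retraining
```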

Nikolaus Lipusch
vencortex®

Austrian-born researcher with an interest in AI, Decentralized Ecosystems, and Token Economies; Lead Token Engineer @vencortex