How Not to Know Ourselves

Platform data do not provide a direct window into human behavior. Rather, they are direct records of how we behave under platforms’ influence.

Angela Xiao Wu
Data & Society: Points
5 min read · Jul 28, 2020


By Angela Xiao Wu, assistant professor at New York University

This blog post comes out of a paper by Angela Xiao Wu and Harsh Taneja that offers a new take on social sciences’ ongoing embrace of platform log data by questioning their measurement conditions. The distinct nature of platform datafication is foregrounded in comparison with the longer tradition of third-party audience measurement.

Illustration by Yichi Liu

Surfing a wave of societal awe and excitement about “Big Data,” platforms formed a habit of releasing “data science” insights on what we search, like, express, purchase, obsess over, attempt to hide, and prefer to forget. These colorful graphics and juicy taglines — most notably from OKCupid and PornHub, whose data lay claim to the quirks and desires of our intimate lives — are always popular novelties to behold, ponder, and reference. If knowing ourselves through platform data is a practice of our age, it is certainly not confined to platforms themselves. Aspiring data scientists, curious programmers, vigilant data journalists, analysts of civic organizations and political campaigns, and (last but not least) academic social scientists such as myself make up the growing field that is figuring out who we are, what we do, and how we sway in the swathes of platform data.

Such data can be impressive due to their unprecedented granularity and volume, as well as the fact that they are seemingly “unobtrusive” recordings of our activities when no one is watching. These apparent strengths of data for social research are outweighed by a problem in what we call the “measurement conditions”: platform data are platforms’ records of their own behavioral experimentation. Trying to know ourselves through platform data tends to yield partial and contorted accounts of human behavior that conceal platform interventions. Moreover, though increasingly produced by non-corporate actors, such knowledge accounts and narratives tend to be amenable to platform money-making and image-building.

Trying to know ourselves through platform data tends to yield partial and contorted accounts of human behavior that conceal platform interventions.

To be clear, for years many have contested the ascendance of platform data as a staple in quantitative social sciences alongside conventional data collection methods, such as surveys and experiments. These contestations focus on issues about the data’s representativeness, privacy concerns, and precarious access at the mercy of platform companies. The “measurement conditions” problem, however, is entirely different. In our newly published paper, Harsh Taneja and I call for attention to the circumstances under which these data come about: what purpose does the measurement initially serve? As historians have told us, measurement — or converting parts of the social world into quantities according to some enduring instrument — is not an end in itself, but a means for managing events and coordinating actions. Measurement is thus a product of the social and institutional context (i.e., “measurement conditions”) in which it is called upon and carried out.

A closer look at the measurement conditions of platforms allows us to rethink the nature of platform log data: they are essentially “administrative data” that platforms generate to realize their own organizational goals, which go little beyond enlarging advertising income, harvesting intermediary fees, and attracting venture capital. These companies track user engagement with their platforms to evaluate and showcase “product performance.” Such data analytics are integral to the iterative process whereby platforms tinker with their digital architectures in attempts to shape usage in ways that maximize profits.

In other words, platform log data are not “unobtrusive” recordings of human behavior out in the wild. Rather, their measurement conditions determine that they are accounts of putative user activity — “putative” in a sense that platforms are often incentivized to keep bots and other fake accounts around, because, from their standpoint, it’s always a numbers game with investors, marketers, and the actual, oft-insecure users. With calculated neglect comes calibrated nudges: platform user activity, in the first place, is induced, coaxed, and experimented on by the platform environment. From multilayered graphical organization to complex algorithmic recommendation, it is from all these platform arrangements that user activity arises. Conversely, it is to make decisions about these arrangements that platform companies measure usage.

Thus, it is difficult to tell to what extent the patterns emerging from platform data are about “us,” rather than testimonies to the effects of platform nudges.

Of course, when bulk platform log data become available for inquisitive parties to crunch, platforms keep the other part of the iterative process — shifting platform arrangements aimed to nudge usage — in the dark. Thus, it is difficult to tell to what extent the patterns emerging from platform data are about “us,” rather than testimonies to the effects of platform nudges. When we are experimental subjects oblivious to platforms’ treatment of us, taking our induced behaviors as “natural” means regarding these platforms as benign, transparent vehicles for our inherent intentions, and thus obscuring their prevailing power.

Consider peeking into our innate preferences (by race, geography, and daily rhythms!) based on “patterns” that emerge from PornHub’s log data, when the site’s visual design, temporal pacing, and content curation are all about eliciting and extending the user’s state of pleasure and pleasure seeking; or using Twitter data to study the insurgent online protests during Occupy Wall Street when, due to unknown algorithmic workings, the very term failed to trend; or using Uber’s rides data to study commuting habits when Uber wields its driving force with strategies such as surge pricing in the name of (predicted but unverifiable) high demand; or using YouTube, or more fantastically Netflix data, to discern media preferences when these platforms’ entire business rests on herding sequences of viewing. (Each of these platform strategies has been creatively uncovered by critical scholars.)

…platforms’ intervention in human behavior is at once the center of platform business models and the secret that platforms strive to hide.

When we wind up finding human nature in platform data, we take administrative records from insulated digital experiments as expressions of humanity in our society. The data envelop a platform-shaped hole that may elude the scrutiny of the most sophisticated computational techniques. Such a data analytic pitfall, increasingly common in data science showcases, journalistic reporting, and academic research, effectively obscures platforms’ intervention in human behavior. And platforms’ intervention in human behavior is at once the center of platform business models and the secret that platforms strive to hide.

What are the human actions and predispositions that initially spark our curiosity? What is the kind of self-knowledge that we would cherish as a foundation for enriching our sociality, our civil and public institutions, and our democratic process? Readily resorting to platform data analytics for such knowledge risks taking platform environments as our entire world. Instead, when dealing with platform data we should aspire to “put the platforms in perspective,” foregrounding rather than obscuring their interventions in how we behave.

In this collective effort, non-corporate critical actors may find useful some of the strategies discussed in our paper.

Angela Xiao Wu is an assistant professor in Media, Culture and Communication at New York University researching information technology, knowledge production, and political cultures.
