“Since you are tongue-tied and so loath to speak, 
 In dumb significants proclaim your thoughts.” —Shakespeare, Henry VI, Part 1
“I feel so bad for the millennials. God, they just had their universe handed to them in hashtags.” —Ottessa Moshfegh

The primitive level of user feedback encouraged by online services is a feature, not a bug. It is vastly easier for a computer to make sense out of a “like” or a “⭐⭐⭐⭐⭐” than to parse meaning out of raw text. Yelp’s user reviews are a necessary part of their appeal to restaurant-goers, but Yelp could not exist without the star ratings, which allow for convenient sorting, filtering, and historical analysis over time (for instance, to track whether a restaurant is getting worse). This leads to what I’ll term…

The First Law of Internet Data

In any computational context, explicitly structured data floats to the top.

“Explicitly structured” data is any data that brings with it categories, quantification, and/or rankings. This data is self-contained, not requiring any greater context to be put to use. Data that exists in a structured and quantifiable context — be it the DSM, credit records, Dungeons & Dragons, financial transactions, Amazon product categories, or Facebook profiles — will become more useful and more important to algorithms, and to the people and companies using those algorithms, than unstructured data like text in human language, images, and video.
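To make the point concrete, here is a minimal sketch of what a star rating makes trivial. The restaurants, years, and ratings are invented for illustration, and the function names are my own; none of this is Yelp's data or API. The free text of each review is carried along but never read. Sorting, filtering, and trend tracking all run on the one structured field.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reviews: (restaurant, year, stars, free text). All invented.
REVIEWS = [
    ("Luigi's", 2015, 5, "Best carbonara in town."),
    ("Luigi's", 2016, 4, "Still good, slower service."),
    ("Luigi's", 2017, 2, "Not what it used to be."),
    ("Taco Hut", 2016, 3, "Fine for a quick lunch."),
    ("Taco Hut", 2017, 4, "New chef, big improvement."),
]


def average_stars_by_year(reviews, restaurant):
    """Average star rating per year for one restaurant, in year order."""
    by_year = defaultdict(list)
    for name, year, stars, _text in reviews:  # the text is ignored entirely
        if name == restaurant:
            by_year[year].append(stars)
    return {year: mean(stars) for year, stars in sorted(by_year.items())}


def is_declining(yearly_averages):
    """True if every year's average is lower than the previous year's."""
    values = list(yearly_averages.values())
    return len(values) >= 2 and all(
        later < earlier for earlier, later in zip(values, values[1:])
    )


def rank_restaurants(reviews):
    """Restaurant names sorted from highest to lowest overall average."""
    stars_by_name = defaultdict(list)
    for name, _year, stars, _text in reviews:
        stars_by_name[name].append(stars)
    return sorted(stars_by_name, key=lambda n: mean(stars_by_name[n]), reverse=True)


if __name__ == "__main__":
    trend = average_stars_by_year(REVIEWS, "Luigi's")
    print(trend, "declining:", is_declining(trend))
    print(rank_restaurants(REVIEWS))
```

Doing the same with only the text column would require a model of what “not what it used to be” means, which is exactly the work the stars let the computer skip.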

This law was obscured in the early days of the internet because there was so little explicitly quantified data. What explicitly quantified metadata did exist, like the link graph that Google exploited so lucratively, underscored that algorithms gravitate toward explicitly quantified data. In other words, the early days of the internet were an aberration: the unrepresentative beginning of a process of explicit quantification that has since taken off with the advent of social media platforms like Facebook, Snapchat, Instagram, and Twitter, which are all part of the new norm. The new norm also includes Amazon, eBay, and other companies dealing in explicitly quantified data.

Web 2.0 was not about social per se. Rather, it was about the classification of the social, and more generally the classification of life. Google had vacuumed up all that could be vacuumed out of the unstructured data. The maturation of the web demanded more explicitly organized content that could more easily be analyzed by computers. And the best way to do this at scale was to employ users to create that data.

Explicitly quantified data requires that data be labeled and classified before it can be sorted and ordered. The project of archives like the Library of Congress isn’t sorting the books per se; it’s developing the overarching classification that determines what order the books should be in. No classification, no sorting. Even machine learning fares worse when “unsupervised” — that is, when it is not provided with a preexisting classificatory framework.

The Second Law of Internet Data

For any data set, the classification is more important than what’s being classified.

The conclusions and impact of data analyses more often flow from the classifications under which the data has been gathered than from the data itself. When Facebook groups people together in some category like “beer drinkers” or “fashion enthusiasts,” there isn’t some essential trait to what unifies the people in that group. Like Google’s secret recipe, Facebook’s classification has no actual secret to it. It is just an amalgam of all the individual factors that, when summed, happened to trip the category detector. Whatever it was that caused Facebook to decide I had an African-American “ethnic affinity” (was it my Sun Ra records?), it’s not anything that would clearly cause a human to decide that I have such an affinity.

What’s important, instead, is that such a category exists, because it dictates how I will be treated in the future. The name of the category (whether “African American,” “ethnic minority,” “African descent,” or “black”) is more important than the criteria for the category. Facebook’s learned criteria for these categories would significantly overlap, yet the ultimate classification possesses a distinctly different meaning in each case. But the distinction between criteria is obscured. We never see the criteria, and very frequently these criteria are arbitrary or flat-out wrong. The choice of classification is more important than how the classification is performed.

Here, Facebook and other computational classifiers exacerbate the existing problems of provisional taxonomies. The categories of the DSM dictated more about how a patient population was seen than the underlying characteristics of each individual, because it was the category tallies that made it into the data syntheses. One’s picture of the economy depends more on how unemployment is defined (whether it includes people who’ve stopped looking for a job, part-time workers, temporary workers, etc.) than it does on the raw experiences and opinions of citizens. And your opinion of your own health depends more on whether your weight, diet, and lifestyle are classified into “healthy” or “unhealthy” buckets than it does on the raw statistics themselves. Even the name of a category — “fat” versus “overweight” versus “obese” — carries with it associations that condition how the classification is interpreted.
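The unemployment case can be sketched in a few lines. Everything here is an assumption made for illustration: the status labels, the head counts, and both definitions are invented and deliberately crude, and they do not correspond to any statistical agency's actual measures. The raw records are identical in both runs. Only the classification changes.

```python
# Hypothetical survey of 100 people; statuses and counts are invented.
PEOPLE = (
    ["employed_full_time"] * 60
    + ["employed_part_time"] * 15
    + ["looking_for_work"] * 5
    + ["stopped_looking"] * 10
    + ["retired"] * 10
)


def unemployment_rate(people, unemployed, labor_force):
    """Share of the labor force counted as unemployed under one definition."""
    in_force = [status for status in people if status in labor_force]
    out_of_work = [status for status in in_force if status in unemployed]
    return len(out_of_work) / len(in_force)


# Narrow definition: only active job seekers count as unemployed.
NARROW = {
    "unemployed": {"looking_for_work"},
    "labor_force": {"employed_full_time", "employed_part_time", "looking_for_work"},
}

# Broad definition: also counts people who stopped looking and part-time workers.
BROAD = {
    "unemployed": {"looking_for_work", "stopped_looking", "employed_part_time"},
    "labor_force": {
        "employed_full_time",
        "employed_part_time",
        "looking_for_work",
        "stopped_looking",
    },
}

if __name__ == "__main__":
    print("narrow:", unemployment_rate(PEOPLE, **NARROW))
    print("broad: ", unemployment_rate(PEOPLE, **BROAD))
```

With these made-up numbers the narrow definition reports roughly 6 percent unemployment and the broad one roughly 33 percent, from the same hundred people.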

Some classifications are far more successful and popular than others. The dominant rule of thumb is…

The Third Law of Internet Data

Simpler classifications will tend to defeat more elaborate classifications.

The simplicity of feedback mechanisms (likes, star ratings, etc.) is intentional. Internet services can deal with complicated ontologies when they need to, but business and technical inertia privilege simpler ones. Facebook waited 10 years to add reactions beyond “like” and long resisted the calls for a “dislike” button, forcing its users to like death announcements and political scandals. Facebook preferred a simple bimodal interested/uninterested metric. When Facebook finally decided to appease its users, it added five sentiments to the original like: love, haha, wow, sad, and angry. It is no coincidence that the two negative sentiments are at the end: “sad” and “angry” are more ambiguous than the others. If I express a positive reaction to something, I’m definitely interested in it. If I’m made sad or angry by something, I may still be interested in it, or perhaps I want to avoid it. Those reactions are less useful to Facebook.

Facebook’s six reactions are similar to emoji, in that they allow users to express emotion nonverbally, but they are more useful to Facebook because they comprise a simpler classification than the thousands of emoji. BuzzFeed employs a similar, slightly hipper scheme for the reactions it permits users to post to articles. BuzzFeed’s scheme is tailor-made for market research: content can be surprising, adorable, shocking, funny, etc.

Bloomberg’s Sarah Frier explained how Facebook formulated its new reactions:

Facebook researchers started the project by compiling the most frequent responses people had to posts: “haha,” “LOL,” and “omg so funny” all went in the laughter category, for instance…Then they boiled those categories into six common responses, which Facebook calls Reactions: angry, sad, wow, haha, yay, and love…Yay was ultimately rejected because “it was not universally understood,” says a Facebook spokesperson.

These primitive sentiments, ironically, enable more sophisticated analyses than a more complicated schema would allow — an important reason why simpler classifications tend to defeat more elaborate classifications. Written comments on an article don’t give Facebook a lot to go on; it’s too difficult to derive sentiment from the ambiguities of written text unless the text is as simple as “LOL” or “great.” But a sixfold classification has multiple advantages. Facebook, BuzzFeed, and their kin seek universal and unambiguous sentiments. There is little to no variation in reaction choices across different countries, languages, and cultures.

The sentiments also make it easy to compare posts quantitatively. Users themselves sort articles into “funny,” “happy,” “sad,” “heartwarming,” and “infuriating.” From looking at textual responses, it would be difficult to gauge that “Canada stalls on trade pact” and “pop singer walks off stage” have anything in common, but if they both infuriate users enough to click the “angry” icon, Facebook can detect a commonality. Those classifications permit Facebook to match users’ sentiments with similarly classified articles or try to cheer them up if they’re sad or angry. If reactions to an article are split, Facebook can build subcategories like “funny-heartwarming” and “heartwarming-surprising.” It can track which users react more with anger or laughter and then predict what kinds of content they’ll tend to respond to in the future. Facebook can isolate particularly grumpy people and reduce their exposure to other users so they don’t drag down the Facebook population. And Facebook trains algorithms to make guesses about articles that don’t yet have reactions. Most significantly, even though these particular six reactions are not a default and universal set, Facebook’s choices will reinforce them as a default set, making them more universal through a feedback loop. The more we classify our reactions by that set of six, the more we’ll be conditioned to gauge our emotions in those terms.
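A rough sketch shows how little machinery this kind of comparison needs. It is not Facebook's method, which is not public. The tallies are invented, and the `label` and `group_by_label` functions and the `split_ratio` threshold are hypothetical names and values chosen for illustration. Each article is labeled with its dominant reaction, or with a hyphenated pair when the runner-up is close behind, and articles sharing a label are grouped together without reading a word of them.

```python
from collections import Counter, defaultdict

# Hypothetical reaction tallies per article; all numbers are invented.
REACTIONS = {
    "Canada stalls on trade pact": Counter(angry=120, sad=30, like=10),
    "Pop singer walks off stage": Counter(angry=90, haha=40, wow=20),
    "Puppy rescued from storm drain": Counter(love=200, wow=35, like=60),
    "Grandfather learns to skateboard": Counter(haha=80, love=75, like=20),
}


def label(tally, split_ratio=0.8):
    """Dominant reaction, or 'first-second' when reactions are split.

    Assumes the tally holds at least two distinct reactions. The 0.8
    threshold is an arbitrary choice for this sketch.
    """
    (first, first_count), (second, second_count) = tally.most_common(2)
    if second_count >= split_ratio * first_count:
        return f"{first}-{second}"
    return first


def group_by_label(reactions):
    """Map each label to the list of article titles that received it."""
    groups = defaultdict(list)
    for title, tally in reactions.items():
        groups[label(tally)].append(title)
    return dict(groups)


if __name__ == "__main__":
    for name, titles in group_by_label(REACTIONS).items():
        print(name, "->", titles)
```

The trade story and the pop singer story land in the same “angry” bucket despite having no words in common, while the evenly split skateboarding story gets a compound label of its own.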

The default six smooth out the variations that were observed when Facebook was conducting tests with a far larger set of emotions, all designed by Disney-Pixar’s Matt Jones. The full list included everything from admiration and affirmation to anger, rage, and terror. A simple classification won out. It is both easier to use and more universal — at the expense of cultural and personal variation. Also, to hear researcher Dacher Keltner tell it to Radiolab’s Andrew Zolli, at the expense of happiness:

Countries that expressed the most “happiness” were not actually the happiest in real life. Instead, it was the countries that used the widest array of stickers that did better on various measures of societal health, well-being — even longevity. “It’s not about being the happiest,” Keltner told me, “it’s about being the most emotionally diverse.”

If the restricted set of six reactions has the effect of narrowing emotional diversity, social media and advertising companies view this tradeoff as the necessary cost of gathering better data on users. The restricted emotional language employed by Facebook is a language a computer can understand and manipulate at scale. The simplified language of a core set of emotional reactions bridges the computational-human gap — more successfully than the overcomplicated ad hoc classifications of the DSM did. Instead, these reaction sets are reminiscent of the simpler folk taxonomies of Myers-Briggs, OCEAN, and HEXACO, which also break down complex phenomena into a handful of axes. Facebook’s reactions even approximately map to the Big Five:

Like: Agreeableness
Love: Extroversion
Wow: Openness
Sad: Neuroticism
Angry: Conscientiousness

The odd one out is “haha,” because, as always, laughter eludes easy classification despite being the most universal and nonnegotiable of expressions. Yet for the remaining five, there is an inevitable flattening of cultural differences. Despite Facebook’s empirical research to generalize its six, it’s unlikely that the company is capturing the same sentiments across cultures — rather, it found sentiments that were recognizable by multiple cultures. If the data miners and user profilers get their way, soon enough we will all be loving, wowing, sadding, and angrying in lockstep.

The language of Reactions is a primitive vocabulary of emotion, vastly simpler than our human languages. It is far better suited to computers and computational analysis. When I introduced graphical emoticons into the Messenger client in 1999, I didn’t foresee any of this. Around 2015, I began noticing a change on my Facebook wall. There was less discussion happening. People I knew were far more content to respond to posts with monosyllables like “yeah” or “ugh,” or with simple emoji or Facebook’s six reactions. I caught myself dumbly contributing like this, to my dismay.

I went back and checked posts from 2009 and 2010. I had written in complete sentences, arguments, with multiple paragraphs. The shift was obvious and drastic. Diversity, nuance, and ambiguity had declined. If passions were fervent, and I disagreed with the chorus of “yeahs” or “ughs,” the crowd was far more likely to pounce on me; the same went for any other dissenters. What had happened? These were my friends. But they no longer seemed like the same people. We had been standardized. We were all speaking a different language now. It was the language of Facebook — of computers.


From the book Bitwise: A Life in Code by David Auerbach. Copyright © 2018 by David Auerbach. Published by Pantheon, an imprint of the Knopf Doubleday Publishing Group, a division of Penguin Random House, LLC.