Tracking Disinformation by Reading Metadata

Image via Thomas Hawk

by Data & Society Affiliate Amelia Acker

Last week, Twitter announced that it would begin deleting millions of fake accounts that were suspended earlier this summer for violating the platform’s terms of service. The company has framed the problem as an accuracy issue related to confidence in “follower counts” and says most Twitter users will see a drop in their followers as part of the update. The New York Times reported that the sweep of locked-down accounts could result in up to a 6% drop in total follower counts across the platform. Earlier this year, a New York Times investigation revealed the marketplace of “fake followers” that can be bought in batches and used to boost the credibility of celebrities’ and social media influencers’ accounts. Such fraudulent accounts tend to behave in predictable ways, such as tweeting massive numbers of replies, mentions, or links.

While deleting millions of fake followers is an admirable effort by Twitter to clean up the platform, it’s still not clear whether this new focus on artificial follower counts will address the slew of other kinds of automated accounts that spread disinformation and propaganda. Earlier this month, New Knowledge, a data science firm in Austin, TX, posted some of its findings on Russian-connected Twitter bots and fake Facebook accounts used to manipulate public discourse during the 2016 US election. The bots and accounts in question remain active on social media platforms.

Screenshot from @Joe_America1776 Twitter profile in July 2018. His avatar image features a reference to the Q Anon conspiracy theory.

Above, we see @Joe_America1776, or Joey Brooklyn, which joined Twitter in July 2015. Since then, the account has posted over 580,000 times and liked more than 21,500 tweets. At this rate, “Joey” has tweeted more than 500 times every day for three years. Tweets during this period have been full of repetitive memes and videos, many of them forwarding conspiracies like Q Anon or Pizzagate. And yes, Joey Brooklyn is a bot.
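The posting rate can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (joined in July 2015, more than 580,000 posts by July 2018). The exact day of the month is an assumption, since only the month is given, and the variable names are illustrative.

```python
from datetime import date

# Figures quoted in the text. The day of the month is an assumption:
# only "July 2015" and "July 2018" are stated.
joined = date(2015, 7, 1)
observed = date(2018, 7, 1)
total_posts = 580_000  # "over 580,000", so this is a lower bound

days_active = (observed - joined).days
posts_per_day = total_posts / days_active
minutes_between_posts = (24 * 60) / posts_per_day

print(f"days active: {days_active}")
print(f"posts per day: {posts_per_day:.0f}")
print(f"minutes between posts, around the clock: {minutes_between_posts:.1f}")
```

Sustained around the clock for three years, that pace leaves less than three minutes between posts, which is the kind of activity signal that is hard to reconcile with a single human user.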

New Knowledge found that Joey Brooklyn’s tweets have been picked up across platforms. They have been re-shared and amplified on Facebook by fake accounts with stolen usernames and profile pictures from real human users. Bots like the Joey Brooklyn account, also called “sockpuppets,” are getting more and more sophisticated. Many bots now have a form of conversational AI; “Joey” even talked back to New Knowledge’s post to claim he wasn’t a bot. As I write, the Joey Brooklyn account is still active and tweeting, so it remains to be seen whether this specific account will be flagged as part of Twitter’s efforts to encourage healthy conversation on the platform.

This screenshot displays how bots like “Joey Brooklyn” use conversational AI to respond to tweets that label them as automated saboteurs.

Over the past few years, this fake account and thousands of others have been detected by social media researchers and tech journalists closely examining metadata such as the rapidity of account activity, follower/audience counts, post timestamps, media content, user bios, and location data. When taken together in context, these metadata features can provide activity frequency signals that are clearly spammy, spotty, formulaic, or suspicious. Accounts like @Joe_America1776 can be judged as hoaxes by carefully reading metadata categories from user activity that are too busy, that are empty, that are filled with stolen personal information from elsewhere on platforms, or that have been gamed and exploited.
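To make this kind of reading concrete, here is a minimal sketch of how two of these signals ("too busy" and "empty") might be checked against an account's public metadata. Everything in it is an illustrative assumption rather than a method used by the researchers mentioned here: the `AccountMetadata` fields, the `suspicion_signals` function, and especially the threshold of 100 posts per day are hypothetical, and the bio text for the example account is a placeholder because it is not recorded in this post.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AccountMetadata:
    """Publicly visible metadata for one account (hypothetical schema)."""
    handle: str
    joined: date
    observed: date
    post_count: int
    like_count: int
    bio: str


def suspicion_signals(acct: AccountMetadata,
                      max_posts_per_day: float = 100.0) -> list[str]:
    """Return human-readable flags; the threshold is an assumed cutoff."""
    signals = []
    days = max((acct.observed - acct.joined).days, 1)  # avoid dividing by zero
    rate = acct.post_count / days
    if rate > max_posts_per_day:
        signals.append(f"too busy: {rate:.0f} posts/day")
    if not acct.bio.strip():
        signals.append("empty bio")
    return signals


# Counts are the ones quoted in this post; dates assume the first of the month.
joey = AccountMetadata(
    handle="Joe_America1776",
    joined=date(2015, 7, 1),
    observed=date(2018, 7, 1),
    post_count=580_000,
    like_count=21_500,
    bio="(placeholder: bio text not recorded here)",
)
print(suspicion_signals(joey))
```

A single flag proves nothing on its own. The point of reading metadata in context is that several weak signals, taken together, can make an account's activity look formulaic or implausible for a human user.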

I argue that these methods, which leverage platform features, spread disinformation by mimicking human behavior, and create swaths of networked digital traces, are a kind of data craft. “Craft” as defined by Glenn Adamson is work that is supplemental, material, and skillful (2010: 4). Data craft comprises practices that create, rely on, or even play with the datafication of the human lifeworld by engaging with new computational, algorithmic mechanisms of organization and classification. In the case of sockpuppet accounts like Joey Brooklyn, the account owner plays with a platform’s recommendation algorithms and trending topics by crafting new digital traces that are collected and read as platform activity data. The presence of this false activity data has had a number of unpredictable consequences for social media users and society, from election tampering to shifting political discourse around social debates like immigration reform or racial politics. One way of more fully understanding the data craftwork of disinformation on social media platforms is by reading the metadata just as closely as the algorithms do.

Metadata is used to name, organize, collect, and access all the other data we create while using platforms. Communication infrastructures like the internet or cellular networks are metadata-dependent systems: they can’t work without it. In addition to the metadata that platforms display to users externally in interfaces, like view counts or engagement metrics, there are swaths of metadata hidden and leveraged internally by platforms, repurposed for new and existing products, and sold or given away to third parties.

For example, metadata includes the demographic “buckets” that users are sorted into for advertising technology and personalization tools. Users may be given the option to connect metadata from one platform to another to enrich their experience by, for example, connecting a Spotify or Instagram account to a Tinder profile. As a result of using and engaging with these communication infrastructures, users are having their platform metadata constantly created, combined, and repurposed. While some of this activity is automated and anonymized, it is more often deeply specific and unique to individuals’ everyday activities from shopping to keeping in touch to consuming entertainment and the news. These activity traces of metadata are incredibly personal and powerful when collected and viewed in aggregate. They are what Edward Snowden called “an activity dossier” of our digital lives.

Increasingly, we’re seeing deceptive accounts like @Joe_America1776 craft data to look like real user engagement with conversation online. However, the contextual metadata surrounding sockpuppet accounts reveals intentions, slippages, and noise, which in turn reveal the truth of automated manipulation. The techniques used by these bots (gaming legacy categories with unstructured metadata, jamming the platform with noise, exploiting filters and advertising technology) are all examples of what Gabriella Coleman calls the “interplay between craft and craftiness” of hacking (2016: 164). That is, deep technical skill is combined with the clever manipulation of social and technical systems. For example, when we see enormous numbers of tweets, faves, or RTs, or when we see switches in the URL from account names and location, we can identify fakes, but we can also see a deliberate crafty intention to “think outside the box” and test the platform as a socio-technical system. Some of these hoaxes and fakes are rather crafty in their ability to circumvent the boundaries of platforms and their terms of service agreements.

When we read metadata that’s been exploited or gamed in social media platforms as data craft, we can decode the signals and noise found in automated disinformation campaigns. Data craftwork not only gives us insight into the emerging techniques of manipulators; it is also a way of understanding the power structures of platforms themselves, a means of apprehending the currents and flows of personalization algorithms that underwrite the classification mechanisms that now structure our digital lives.

But before we can understand how metadata categories are harnessed and hacked, it’s necessary to have a fuller picture of what platform metadata is, how it is encoded and decoded, and how it is created and collected for use by a range of actors — from technologists and providers, to individual users, to governments, to media manipulators. Currently, there is a range of known manipulation tactics for gaming engagement data. Social media professionals are known to inflate engagement by increasing likes, views, follower counts, and comments for profit. Other efforts at deception involve the amateur manipulation of metadata, which includes the coordinated participation of networked harassment, troll brigades, and sockpuppets.

How are disinformation campaigns bolstered by the metadata features of platforms? How can we track disinformation by reading these metadata for context, craftiness, and intent? Reading metadata that has been gamed, exploited, or changed promises a new method for tracing the spread of misinformation and disinformation campaigns in social media platforms. Moreover, reading metadata can improve citizens’ data literacy by providing a way to judge authenticity and detect deceptive messaging. In the coming months, I’ll be diving further into how researchers, journalists, activists, and users can all read metadata categories that are being gamed by trolls, saboteurs, and disinformation architects.

The critique and historicization of social media platforms must also include the ways in which different groups name and order our data, especially disinformation campaigns that involve gaming metadata. Over the next few months, members of the Media Manipulation Initiative at Data & Society and I will be focusing on reading platform metadata to track disinformation campaigns across social media as a new digital method for researching digital culture and data craft. We will test the claim that misinformation and disinformation campaigns frequently leverage a variety of platform metadata in order to spread content, gain traction, influence public discourse, and sow discord. From data voids that leverage search engine optimization terms, to source hacking with document forgery collages and keyword squatting, researchers here at Data & Society have already identified data craftwork where metadata categories are emptied, stripped of context, or exploited to amplify reactionary media coverage or move disinformation across platforms. By investigating a number of disinformation case studies where platform metadata is being gamed, stripped, exploited, or glitched, we’ll add to a growing toolbox of digital methods to identify disinformation and manipulation campaigns and data craftwork driven by bots, trolls, sockpuppets, and the ramping up of hyperpartisan news coverage in platforms.


REFERENCES

Adamson, G. (2007). Thinking through craft. Bloomsbury.

Coleman, G. (2016). Hacker. In B. Peters (Ed.), Digital keywords: A vocabulary of information society and culture. Princeton University Press.

van Dijck, J. (2014). Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology. Surveillance & Society, 12(2), 197–208.