This image is from an interactive data visualization based on the study results (see below for more information).

Trolls on Twitter: How Mainstream and Local News Outlets Were Used to Drive a Polarized News Agenda

Fake Accounts, Real News?

Jonathan Albright
Berkman Klein Center Collection
Feb 15, 2018


Last December, I volunteered to help Ben Popken at NBC News with a large Twitter data analysis project, an effort that led to David Allen’s excellent thematic analysis in the resulting NBC News story. The larger underlying data set has since been shared publicly on NBC.com.

With my smaller but no less interesting set of 36,500 “troll tweets,” I wanted to look further into the external, or outbound, links to understand more about:

  1. The news sources that these troll (fake) accounts linked to and (re)tweeted in the six months leading up to the U.S. presidential election;
  2. When and how these news-related links were being shared, and the types of news stories, linking strategies, and agendas that were pushed; and
  3. How the wide range of trackers, referral codes, and URL “wrappers” found in this data set (I expanded thousands of links) might be involved.

There were some interesting discoveries along the way. One such finding, after resolving more than 2,500 Bitly links, involved still publicly viewable link-sharing statistics for Trump campaign donation and press release links:

Left: http://bitly.com/info/2dMMBsa; right: http://bitly.com/2dvfUPV

In the end, I found more than 16,000 outbound “t.co” URLs in the full set of 36,500 tweets. Several thousand links pointed to YouTube (many led to videos that have since been removed). After a data-cleaning effort, I was able to isolate more than 11,000 outbound links that pointed toward news organizations, regional news outlets, blogs and WordPress “micro-sites,” and other news-related media.

Once I unpacked the thousands of short URLs from services such as Bitly, fb.me, ow.ly, and trib.al, I went through the entire data set and catalogued (coded) every news property, blog, or news outlet, or, where the source was unclear or difficult to identify, the page title. I’ll admit this was a huge project, so there might be a typo here or there, or a few rogue links.
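Expanding short links like these can be scripted. The Python sketch below is an illustration under stated assumptions, not the tooling used for this study: it treats the shortener hosts named above as an allowlist (so links through unfamiliar services are left alone rather than visited) and follows HTTP redirect chains with the standard library. Some shorteners answer with an HTML or JavaScript redirect instead of an HTTP 3xx, which this sketch would not follow. The names `is_short_link`, `expand`, `expand_all`, and the User-Agent string are hypothetical.

```python
import urllib.request
from urllib.parse import urlparse

# Shortener hosts named in this post; an illustrative allowlist, not exhaustive.
SHORTENER_HOSTS = {"t.co", "bit.ly", "bitly.com", "fb.me", "ow.ly", "trib.al"}


def is_short_link(url: str) -> bool:
    """Return True if the URL's host is one of the allowlisted shorteners."""
    host = (urlparse(url).hostname or "").lower()
    return host in SHORTENER_HOSTS


def expand(url: str, timeout: float = 10.0) -> str:
    """Follow HTTP redirects and return the final URL.

    urllib follows multi-hop chains (e.g. t.co -> bit.ly -> outlet) on its own.
    Requires network access; no retries or rate limiting are attempted.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "link-expander-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()


def expand_all(urls):
    """Expand only allowlisted short links; pass every other URL through unchanged."""
    return [expand(u) if is_short_link(u) else u for u in urls]
```

Keeping the allowlist explicit mirrors the caution described later in this post about not resolving links through some services.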

I shared the results of this effort as a .csv file on data.world.

Troll News Linking Results

The chart below gives the top-line breakdown of where the 11,000-plus external links in my set of 36.5k troll tweets from 2016 pointed, including the expanded short URLs and redirects. It shows the news outlets that the troll accounts (through tweeting, retweeting, and tweet-quoting) tended to re-broadcast from the middle of 2016 through Election Day:

Top 25 most-linked news sources across 11.5k troll tweets (using thousands of expanded short links)
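A top-N chart like this comes down to a frequency count once each expanded link is mapped to an outlet. The sketch below is a simplification of the hand coding described above, not a reproduction of it: it strips a few common host prefixes and folds domains through a small, hypothetical alias table (`OUTLET_ALIASES`), falling back to the bare domain for anything unlisted.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical alias table; the hand-coded data set is far more detailed.
OUTLET_ALIASES = {
    "breitbart.com": "Breitbart",
    "washingtonpost.com": "The Washington Post",
    "thehill.com": "The Hill",
}


def outlet_of(url: str) -> str:
    """Map an expanded URL to an outlet label, defaulting to the bare domain."""
    host = (urlparse(url).hostname or "").lower()
    for prefix in ("www.", "m.", "amp."):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    return OUTLET_ALIASES.get(host, host)


def top_outlets(expanded_urls, n=25):
    """Return the n most-linked outlets as (label, count) pairs, busiest first."""
    return Counter(outlet_of(url) for url in expanded_urls).most_common(n)
```

A domain table cannot cover every case that manual coding handles, such as WordPress micro-sites or pages identifiable only by title.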

Looking at this breakdown, one result from this sample of tens of thousands of tweets is that the most-shared news outlets, drawn from 11.5k links across 388 troll accounts in the six months leading up to the election, aren’t your typical hyper-partisan “fake news.”

Sure, Breitbart ranks first, but it’s followed by a long list of what many would argue are credible, if not mainstream, news organizations, as well as a surprising number of local and regional news outlets.

Another result from this analysis concerns “regional” troll accounts, i.e., fake accounts with a city or region name in the handle (e.g., HoustonTopNews, DailySanFran, OnlineCleveland), which showed a pattern of systematically re-broadcasting local news outlets’ stories.
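The “regional” naming pattern is easy to screen for. Here is a rough sketch, assuming a hypothetical token list drawn from the cities named in this post. It lowercases a handle, drops non-letters, and reports the first city token the handle contains. Substring matching will also flag genuine local accounts, so this only surfaces candidates for manual review. It says nothing about whether an account is fake.

```python
import re

# Hypothetical token list built from the cities and regions mentioned in this post.
CITY_TOKENS = [
    "houston", "sanfran", "cleveland", "chicago",
    "stlouis", "kansascity", "batonrouge", "neworleans",
]


def regional_token(handle: str):
    """Return the first city token found in a Twitter handle, or None."""
    letters = re.sub(r"[^a-z]", "", handle.lower())  # "StLouis_Online" -> "stlouisonline"
    for token in CITY_TOKENS:
        if token in letters:
            return token
    return None
```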

The linking pattern is also consistent: a large number of story links are Bitly-wrapped, and links to local outlets often originate through RSS or Google Feedproxy. To some degree, the accounts are co-opting local outlets’ content streams in an attempt to establish themselves and connect with local audiences.

Example of the observed linking pattern for a “regional” troll account
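To describe patterns like this one, each expanded link can be summarized by the wrappers it passed through. The following minimal sketch assumes each redirect chain has already been recorded as a list of URLs (the recording step is not shown). `wrapper_path` labels the intermediate hops by host. `from_feed` flags destination URLs that carry the `utm_source=feedburner` / `utm_medium=feed` campaign tags commonly appended to FeedBurner-managed feed links. The host table and function names are hypothetical, and the tag check is a heuristic rather than proof of RSS origin.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical host -> label table covering the wrapper services named in this post.
WRAPPER_LABELS = {
    "t.co": "t.co",
    "bit.ly": "bitly",
    "bitly.com": "bitly",
    "fb.me": "fb.me",
    "ow.ly": "ow.ly",
    "trib.al": "trib.al",
    "feedproxy.google.com": "feedproxy",
}


def wrapper_path(chain):
    """Label every hop except the last (the destination) in a redirect chain."""
    labels = []
    for url in chain[:-1]:
        host = (urlparse(url).hostname or "").lower()
        labels.append(WRAPPER_LABELS.get(host, "other:" + host))
    return labels


def from_feed(url):
    """Heuristic: True if the URL carries FeedBurner-style campaign tags."""
    query = parse_qs(urlparse(url).query)
    return query.get("utm_source") == ["feedburner"] or query.get("utm_medium") == ["feed"]
```

Tallying `tuple(wrapper_path(chain))` with a `collections.Counter` across all links would show how common each route is, for example t.co to Bitly to Feedproxy.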

So now we have fake regional troll accounts consistently tweeting news that is not only real, but local. This pushes back against some of the more established narratives around hyper-partisan media and “fake news.” Most of these troll accounts are situated in cities with a history of systemic social and racial issues.

I’ve argued in the past that we can’t move towards solutions too quickly, and I feel this study is a showcase example. Only once we have the data to reconstruct the past (election) can we start preparing for the future. If the data needed to discover these kinds of insights remains locked in tech companies’ platforms and app walled gardens, or trapped in legislative committees, then observations such as how local and regional media carried some of the burden (and success) of IRA troll accounts’ efforts to target local communities might never be made. Or, perhaps worse, they might be made too late.

The best way to present this large set of data is through a visualization on Tableau. The results from this visual analysis are also interactive, so you can scroll down, hover over, and click on ALL data points:

Static image of a tab in my data viz on Tableau showing news linking patterns over time by outlet/organization
Static image of a tab in my data viz on Tableau showing overall trends in tweeting from 388 troll accounts

I’ll attempt to summarize the main themes in this write-up, but the broader linking patterns of the troll accounts show three initial things:

  1. Trolls are using real news, and in particular local news, to drive reactionary news coverage, set the daily news agenda, and target local journalists and community influencers to follow certain stories.
  2. At certain points, such as the week following Hillary Clinton’s illness at the 9/11 memorial in 2016, all the categories of trolls (accounts focused on local news, BLM accounts, and far-right hyper-partisan accounts) come together to push out the full gamut of polarizing news coverage. Specifically, around 9/15/16 to 9/18/16 (see the volume peaks in the Tableau image above), many of the story links being broadcast originated from Breitbart, The Daily Caller, and The Gateway Pundit. Yet a significant portion also came from outlets such as The Hill and The Washington Post, as well as the whole range of regional “local” media.
  3. There is a clear pattern of setting up accounts to amplify “real” local news coverage in American cities struggling with racial, class, and structural social divides. The linking activity shows a concerted effort in cities like Chicago, Houston, St. Louis, Kansas City, Baton Rouge, and New Orleans. In most cases, the tweets in the data set were categorized as “original tweets,” but these tweets often look like retweets. Some links also appeared to come directly out of local news organizations’ own RSS feeds. Bitly, regular RSS feeds, (Google) Feedproxy, and Trendolizer were prominent in the local linkage patterns.
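Finding moments like the September convergence amounts to counting links per day for each account category and checking which days rank high across all of them. This minimal sketch assumes each link is already stored as a `(timestamp, category, outlet)` tuple. The timestamp format, the category labels, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def daily_counts(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Count links per day for each account category.

    rows: iterable of (timestamp, category, outlet) tuples; the timestamp
    format and category labels are assumptions about how the data is stored.
    """
    per_category = defaultdict(Counter)
    for timestamp, category, _outlet in rows:
        day = datetime.strptime(timestamp, fmt).date()
        per_category[category][day] += 1
    return per_category


def shared_peak_days(per_category, top=3):
    """Days that rank among the `top` busiest days in every category."""
    busiest = [{day for day, _ in counts.most_common(top)} for counts in per_category.values()]
    return set.intersection(*busiest) if busiest else set()
```

Requiring a day to appear in every category's top list is a strict criterion. Loosening it, for example to "most categories," would be a reasonable alternative.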

The media coverage IRA troll accounts received was touched upon in a November Recode piece that described how many of these fake Twitter profiles received earned media coverage (i.e., free PR) in McClatchy’s regional outlets as well as The Washington Post. In essence, the local media results might represent the other half of the equation, suggesting that part of the surprising outcome in Recode’s story might have resulted from these fake accounts repeatedly pushing certain news outlets’ stories.

Top 10 Linked Sources for the Fake Regional/Local Troll Accounts

The explicitly BLM-themed and Black Matters accounts (see chart below) show a similar linking pattern: their news stories tend to concentrate on the troll websites Black To Live and BlackmattersUS (a site that used Facebook Custom Audiences code during the 2016 election), a group of New York City-market media, trending sites like Fusion and Raw Story, and The Washington Post.

Top 10 Linked Sources for identified fake Black Matters/BLM troll accounts

On the whole, there was a somewhat cryptic pattern in outgoing links. The data contained a range of URL shorteners (some might say “shady”) and link-feeding services such as Feedproxy. Whether these come from the news outlets themselves or were added by the troll accounts cannot be fully known from this data. I will say that there were a couple of shortening services I chose not to resolve links through, for security reasons.
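Tracking and referral parameters can at least be separated from the article URL itself. The following minimal sketch assumes a hypothetical list of tracking keys: any `utm_*` parameter, plus `ref` and `cmpid`. It returns a cleaned URL suitable for deduplication, along with the keys it removed. As noted above, the removed keys alone cannot show whether an outlet or a troll account added them.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

# Hypothetical tracking parameters; adjust to what actually appears in the data.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"ref", "cmpid"}


def strip_tracking(url: str):
    """Return (clean_url, removed_keys) with tracking query parameters dropped."""
    parts = urlparse(url)
    kept, removed = [], []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        lowered = key.lower()
        if lowered.startswith(TRACKING_PREFIXES) or lowered in TRACKING_KEYS:
            removed.append(key)
        else:
            kept.append((key, value))
    clean = urlunparse(parts._replace(query=urlencode(kept)))
    return clean, removed
```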

Hundreds of these t.co links to news organizations first travel through Facebook via the fb.me shortener. And although I excluded them from my organization-focused news linking analysis, thousands of links also went to YouTube. (I did find that many of the YouTube links lead to now-deleted videos or to content that has been removed due to copyright claims.)

This is the kind of data that compels us to rethink how we understand Twitter and what I feel are more influential platforms for reaching regular people, including Facebook, Instagram, Google, and Tumblr, and to understand ad tech tracking and RSS feed harvesting as part of the greater propaganda ecosystem. Thanks to Ben Popken at NBC for sharing the larger data set. I hope this emboldens the cause for platforms to take responsibility and for legislators to push to get regular researchers (unfunded and not hand-picked) the data we need to better understand the past, so we can prepare for the future.

Professor/researcher. Award-nominated data journalist. Media, data, & tech frmly #columbiajournalism #towcenter #berkmanklein #elonuniversity