Building a Background Check for Sources on Social Media

Jon Keegan
Tow Center
Dec 8, 2017
Participants from The Tow Center’s “Hacking Social Media Sources” event on November 18th got a first look at our pre-release API and database.

In the next few weeks, you may start to see some new visual cues popping up in your news feeds and search results, identifying trustworthy news sources. These new, pro-transparency widgets are an effort by The Trust Project, an international consortium of publishers and platforms, to increase transparency around (and trust in) journalism. Essentially, the project aims to address one of the most stubborn problems in this confusing moment of journalistic distrust, fake news and opportunistic platform exploits: Where did this information come from?

After several years of planning, The Trust Project identified several core indicators of trust around which to build standards. These include some fundamental questions such as “Who reported this?,” providing an opportunity to highlight the author’s expertise and history of work. Other indicators seek to shine light on the origin of a publisher’s financial backing, methodologies adopted and ethical standards, including corrections policies. In addition to these boilerplate questions, there are also indicators for detailing an outlet’s effort to include diverse voices and source news locally.

The first universal, visual “trust indicators” have already started appearing on Facebook, and Google News will follow shortly. They will also begin appearing in the sites and stories of the extensive list of participating outlets over the next few months.

This innovative effort builds on the work of First Draft News (founded by former Tow Center research director Claire Wardle), which recently became part of the Shorenstein Center at the Harvard Kennedy School. First Draft was behind CrossCheck, an ambitious online verification partnership of 37 newsrooms in the United Kingdom and France, focused on actively and collaboratively fact-checking news of the French presidential election in 2017. In addition to this project, First Draft also publishes a number of verification guides and tip sheets.

Last week, the Tow Center held an event called “Hacking Social Media Sources” at Columbia Journalism School (hosted by my colleague Priyanjana Bengani and me), at which we showcased our early efforts at building more tools to help address these problems. We invited students from journalism, computer science and the Data Science Institute to attend our first hackathon, hoping to introduce our efforts to a diverse group of individuals, get their feedback and get them involved in this project in its nascent stages.

Before joining the Tow Center, I worked on The Wall Street Journal’s “Blue Feed, Red Feed,” which explored partisan filter bubbles on Facebook. While working on the project, I was struck by an important side effect of the filter bubble phenomenon: not only did you never see the stories from the “other side,” you had never even heard of the publishers who were posting those stories. An entire ecosystem exists of influential sites with large audiences that aren’t on your radar. I set out to build a tool to more easily follow Facebook’s own advice and “investigate the source” of stories you may be skeptical about.

My plan involved a simple website where you could paste a link and see a “background check” composed of all of the public signals available about a source. These signals include: key metadata about their audience on social media platforms like Twitter, Facebook and YouTube; WHOIS domain registration records; unique identifiers used by advertising and analytics tags; publishing platform metadata; RSS feeds; and screenshots of the desktop and mobile versions of the source website.
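To give a sense of what a few of these signals look like in practice, here is a minimal sketch in Python. It is not our production code: the function name `extract_signals`, the class `SignalParser` and the ID patterns are assumptions made for illustration. It scans a page's HTML for classic Google Analytics and AdSense identifiers, RSS or Atom feed links, and the "generator" tag that many publishing platforms leave behind.

```python
import re
from html.parser import HTMLParser

# Assumed patterns: classic Google Analytics IDs (UA-1234567-1)
# and AdSense publisher IDs (pub- followed by 16 digits).
GA_ID = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")
ADSENSE_ID = re.compile(r"\bpub-\d{16}\b")
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class SignalParser(HTMLParser):
    """Collects feed links and the publishing-platform 'generator' tag."""

    def __init__(self):
        super().__init__()
        self.feeds = []
        self.generator = None

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <link disabled>), so normalize them.
        a = {k: (v or "") for k, v in attrs}
        if tag == "link" and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            self.feeds.append(a["href"])
        elif tag == "meta" and a.get("name", "").lower() == "generator":
            self.generator = a.get("content") or None


def extract_signals(html):
    """Return a dict of public signals found in one page of HTML (hypothetical helper)."""
    parser = SignalParser()
    parser.feed(html)
    return {
        # The regexes run over the raw HTML so IDs inside inline scripts are found too.
        "analytics_ids": sorted(set(GA_ID.findall(html))),
        "adsense_ids": sorted(set(ADSENSE_ID.findall(html))),
        "feeds": parser.feeds,
        "generator": parser.generator,
    }
```

Calling `extract_signals(page_html)` on the HTML of a publisher's home page returns a small dictionary of signals. Shared analytics or advertising IDs are interesting because the same ID appearing on two sites suggests that the same operator may be behind both.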

A screenshot of our (pre-release) editor’s dashboard for pulling public sources from news publishers’ websites.

Much of the technology needed for this site to work (an API to interface with the data, databases, scripts to harvest data from the different platforms, etc.) overlapped with the research Priyanjana was already conducting to create a tool that lets people see who’s reading which local news in each state. Together, we saw the chance to bootstrap much of the data research being done at Tow to build a flexible platform that we will be able to control and expand upon in the future. The API will be made open and accessible to everyone, once it’s more stable.

During our hackathon, we shared our vision for our projects, as well as our code and tools, with the participants. By the end of the day, we had new design mockups, a module to interface with Pinterest’s API, and code to fetch and decode a site’s SSL certificate, which itself can be a strong signal of a legitimate organization behind a website.
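The certificate idea can be sketched with nothing more than Python's standard library. This is an illustration under assumptions, not the code written at the hackathon: `fetch_certificate` and `summarize_certificate` are hypothetical names, and the summary fields are just one reasonable selection. The organization name is only present in certificates where the issuer verified the organization, and it is typically absent from domain-validated certificates, which is why it can serve as a signal.

```python
import socket
import ssl


def fetch_certificate(hostname, port=443, timeout=5.0):
    """Open a TLS connection and return the validated peer certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


def _flatten(name):
    # getpeercert() encodes a name as a tuple of RDNs,
    # each of which is a tuple of (key, value) pairs.
    return {key: value for rdn in name for key, value in rdn}


def summarize_certificate(cert):
    """Pick out the fields most useful for a 'background check' (hypothetical helper)."""
    subject = _flatten(cert.get("subject", ()))
    issuer = _flatten(cert.get("issuer", ()))
    return {
        "organization": subject.get("organizationName"),  # None if not in the certificate
        "common_name": subject.get("commonName"),
        "issuer": issuer.get("organizationName"),
        "valid_from": cert.get("notBefore"),
        "valid_until": cert.get("notAfter"),
        "dns_names": [v for k, v in cert.get("subjectAltName", ()) if k == "DNS"],
    }
```

A report for a single site would then be `summarize_certificate(fetch_certificate("example.com"))`, where the hostname is whatever domain the pasted link points to.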

The next steps include rolling out the public-facing website to let users run these “source reports,” and continuing to add to the signals we analyze, including terms of service and privacy policy texts, and author entity extraction.

We hope to release the first public parts of this project early in 2018. To stay up to date on the project, follow us on Twitter.


Jon Keegan
Tow Center

Investigative Data Reporter at The Markup. Previously: The Tow Center for Digital Journalism; The Wall Street Journal.