TRI data collection and filtering process

A still from our video explaining how to get involved in TRI — see www.tref.ie/get-involved

By Liz Carolan

A few weeks ago we wrote about the challenges we faced in trying to build a database of social media political ads for the referendum on the 8th Amendment. The database is now up and running: www.tref.ie/database

We said Plan A was collaboration with the platforms. Unfortunately, they are not in a position to do this. So we are using Plan B: crowdsourcing.

We have a partnership with Who Targets Me, a UK-based group of volunteers. They have built a tool that crowdsources the ads shown to Facebook users who install it as a Chrome or Firefox plugin. Their privacy policy is available here.

Who Targets Me compiles the data gathered from individual users of the plugin into a single database. It includes metadata on every ad, political and commercial alike (see here for a note on the difference): a link to the ad, the page that placed it, the date it was placed, and the number of likes and comments, among other fields.

As we are only concerned with political ads, we filter out the commercial ones.

This filtering is the tricky bit. Our approach is as follows:

  • We download the data as a single CSV file containing all ads with a country ID of “IE”
  • We filter it using a set of keywords for the referendum, which you can see below
  • We look through the filtered list and remove any ads that we are certain are not political in nature. We leave in content that is borderline — this includes promoted posts by news outlets that relate to the 8th (see why).
  • We add these to a publicly viewable Google Sheet, which feeds a display page on our website: www.tref.ie/database
  • This Google Sheet also feeds a viewer on our website, where the original ads are pulled from Facebook based on the URL in the database: www.tref.ie/viewer
  • The “interest” column shows only why the particular users who installed the plugin were shown each ad. Until the platforms provide full transparency on targeting, this gives only a limited snapshot. See our note here.
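
As an illustration only, the country and keyword filtering steps could look something like the Python sketch below. The column names (`country`, `advertiser`, `ad_text`) and the sample rows are assumptions made for the example; the actual export may be laid out differently. The output is a list of candidates that still needs the manual review described above.

```python
import csv
import io

# Keywords from the list in this post, lower-cased for case-insensitive matching.
KEYWORDS = ["8th", "vote yes", "vote no", "repeal", "abortion",
            "unborn", "foetus", "pro-choice", "pro-life"]


def filter_ads(csv_file, country="IE", keywords=KEYWORDS):
    """Return rows for the given country whose text contains any keyword."""
    matches = []
    for row in csv.DictReader(csv_file):
        # Hypothetical column name: keep only ads tagged with the country ID.
        if row.get("country") != country:
            continue
        # Hypothetical column name: simple substring match on the ad text.
        text = (row.get("ad_text") or "").lower()
        if any(keyword in text for keyword in keywords):
            matches.append(row)
    return matches


# Made-up sample data standing in for the downloaded CSV file.
SAMPLE = """country,advertiser,ad_text
IE,Example Campaign,Vote Yes in the referendum
IE,Example Shop,Summer sale now on
GB,Example Group,Repeal the tax
"""

candidates = filter_ads(io.StringIO(SAMPLE))
for row in candidates:
    print(row["advertiser"], "|", row["ad_text"])
```

With a real download, `io.StringIO(SAMPLE)` would be replaced by an opened file, for example `open("ads.csv", newline="", encoding="utf-8")`, where the file name is again only a placeholder.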

For this first iteration, the database is based on downloads by just 40 people. This yielded a little over 1,000 rows of data, from which we identified about 20 ads, three of which are sponsored posts by news outlets. The database is here.

We filtered this data using the process described above, but we were also able to go through it manually to make sure we didn’t miss anything. This took several days. As the number of users grows, this will be automated further.

For now, here is the list of keywords we have used. Please tell us if we are missing any, either in a comment below or on Twitter or Facebook, where we are @Transparentref:

  • 8th (this yields a lot of results! We will need to refine this to “8th ref” and some variants)
  • Vote yes
  • Vote no
  • Repeal
  • Abortion
  • Unborn
  • Foetus
  • Pro-choice
  • Pro-life
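
One possible way to narrow the “8th” keyword, sketched here in Python, is to match whole words only and to require “8th” to be followed by a referendum-related word. The variants chosen (`ref`, `referendum`, `amendment`) and the sample strings are assumptions for illustration, not a settled list.

```python
import re

# Plain keywords from the list above, matched as whole words.
PLAIN = ["vote yes", "vote no", "repeal", "abortion", "unborn",
         "foetus", "pro-choice", "pro-life"]

# Assumed variants for narrowing "8th": "8th ref", "8thref",
# "8th referendum", "8th amendment".
EIGHTH = r"8th\s*(?:ref(?:erendum)?|amendment)"

ALTERNATIVES = [EIGHTH] + [re.escape(keyword) for keyword in PLAIN]

# (?<!\w) and (?!\w) stop a keyword matching inside a longer word.
PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(ALTERNATIVES) + r")(?!\w)",
    re.IGNORECASE,
)


def is_candidate(text):
    """Return True if the text contains any of the refined keywords."""
    return PATTERN.search(text) is not None


# Made-up sample strings.
for sample in ["Join the #8thRef debate", "Our 8th birthday sale", "Repeal the 8th"]:
    print(sample, "->", is_candidate(sample))
```

Whole-word matching is a trade-off: it drops unrelated hits such as “8th birthday”, but it also skips inflected forms such as “repealed”, so any variants that matter would have to be added to the list explicitly.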

--

Transparent Referendum Initiative

TRI aims to enable an open and honest #8thRef debate, through transparency and scrutiny of targeted, paid political ads on social media.