Testing Facebook algorithm in an electoral campaign (methodology)

Tracking Exposed
Mar 2, 2018 · 3 min read

We want to observe social media from the point of view of many users. But so far few people know about us, so few install the browser extension, and so we can't become known by showing our results: an impasse.

After the first call for contributors, last year in the Netherlands, we also observed that comparing real users is not optimal: they are so diverse that you can't really make an honest comparison.

And by the way, the goal is not to get millions of users, just to show the algorithm's impact!

To tell such a story, using Facebook users under our control seems to be an optimal solution:

  • No risk of privacy leaks: we follow only public sources, and the profiles are not real users.
  • Fewer variables to take into account. When we compared real users, the Facebook informative experience depended on too many variables.

With the profiles under our control, we can manage precisely what they follow, how they are polarized, when they access Facebook, how much they see, and the friends they have (zero). Please welcome the six profiles (avatars? personas? bots?) you'll see in these blog posts:

Image for post

We picked six pages for each of the five political orientations, and all of the profiles follow all 30 pages. Just to be sure it is clear, here is a picture:

Image for post

The volunteers of our team keep the profiles well polarized by also liking some of the posts published by the pages associated with their polarization.

This trains the algorithm, and the posts presented to our puppet users diverge quite rapidly.

The bots use an autoscroller, a simple tool that, at the same hours every day, refreshes the newsfeed and makes the computer scroll down, thus collecting the posts. Every access is called a timeline, and we scheduled 13 timelines per bot per day, beginning on 10 January 2018:

Image for post
Note: days 10 to 31 are in January. A number other than 13 means that on that day the bot didn't work well or we did something manually. We have gained expertise in this, but we have not yet found a reliable way to orchestrate web scrolling. An empty entry means the computer was not working at all: 0 timelines. Any suggestions for a more reliable way?
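The check described in the note can be sketched in a few lines of Python. This is only an illustration: the names `EXPECTED_TIMELINES` and `flag_days` and the sample counts are made up for the example, and they are not taken from our dataset or tooling.

```python
from datetime import date

# Target stated in this post: 13 timelines per bot per day.
EXPECTED_TIMELINES = 13


def flag_days(counts, expected=EXPECTED_TIMELINES):
    """Label each day 'ok', 'missing' (no timeline at all) or 'irregular'."""
    report = {}
    for day, n in counts.items():
        if n is None or n == 0:
            report[day] = "missing"
        elif n == expected:
            report[day] = "ok"
        else:
            # Fewer or more than expected: a faulty run or a manual access.
            report[day] = "irregular"
    return report


# Hypothetical counts for one bot (not real data).
sample = {
    date(2018, 1, 10): 13,
    date(2018, 1, 11): 9,
    date(2018, 1, 12): None,
    date(2018, 1, 13): 15,
}

for day, status in flag_days(sample).items():
    print(day.isoformat(), status)
```

Days flagged as missing or irregular are the ones to treat with care when comparing bots.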

Impressions are the elements composing this dataset. Because 13 timelines do not always mean the autoscroller worked perfectly, it is worth looking in more detail:

Image for post
Antonietta didn't perform well between 27 January and 7 February: a technical fault we should keep in mind, to avoid introducing additional bias.

On average, we got 47 impressions per timeline.
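An average like this can be computed by grouping impressions by the timeline they belong to. The sketch below assumes each impression is a dictionary carrying a `timelineId` and an `impressionOrder` field; those field names and the sample records are assumptions for illustration, so check the actual dataset before reusing them.

```python
from collections import Counter


def impressions_per_timeline(impressions):
    """Count how many impressions belong to each timeline."""
    return Counter(imp["timelineId"] for imp in impressions)


def average_impressions(impressions):
    """Mean number of impressions per timeline (0.0 if there is no data)."""
    counts = impressions_per_timeline(impressions)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)


# Hypothetical records (not real data): two timelines, four impressions.
sample = [
    {"timelineId": "t1", "impressionOrder": 1},
    {"timelineId": "t1", "impressionOrder": 2},
    {"timelineId": "t1", "impressionOrder": 3},
    {"timelineId": "t2", "impressionOrder": 1},
]

print(average_impressions(sample))  # 2.0
```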

Every impression in a timeline is recorded with its impression order, that is, the order in which the posts appeared on the profile's timeline:

Image for post

The permaLink makes it possible to trace back the post that appeared. Above we see the collection made on 5 February at 20:05. The auto-scroller moves the page 800 pixels down every 5 seconds, and the browser extension collects an average of 20 impressions per minute.
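To make these numbers concrete, here is a back-of-the-envelope sketch in Python. The four constants come from this post; everything derived from them assumes the rates stay constant during a session, which real runs will not strictly do. The function `scroll_plan` is a hypothetical helper for illustration, not the code of our autoscroller.

```python
SCROLL_STEP_PX = 800       # pixels scrolled at each step (from the post)
SCROLL_INTERVAL_S = 5      # seconds between steps (from the post)
IMPRESSIONS_PER_MIN = 20   # average collection rate (from the post)
AVG_IMPRESSIONS = 47       # average impressions per timeline (from the post)


def scroll_plan(duration_s, step_px=SCROLL_STEP_PX, interval_s=SCROLL_INTERVAL_S):
    """Return (seconds_elapsed, scroll_offset_px) for every scroll step."""
    steps = duration_s // interval_s
    return [(i * interval_s, i * step_px) for i in range(1, steps + 1)]


# Derived figures, valid only under the constant-rate assumption.
steps_per_minute = 60 // SCROLL_INTERVAL_S
px_per_minute = steps_per_minute * SCROLL_STEP_PX
px_per_impression = px_per_minute / IMPRESSIONS_PER_MIN
seconds_per_timeline = AVG_IMPRESSIONS * 60 / IMPRESSIONS_PER_MIN

print(scroll_plan(15))         # [(5, 800), (10, 1600), (15, 2400)]
print(steps_per_minute)        # 12
print(px_per_minute)           # 9600
print(px_per_impression)       # 480.0
print(seconds_per_timeline)    # 141.0
```

Under these assumptions, an average timeline of 47 impressions corresponds to a bit more than two minutes of scrolling.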

This is open data: you can download the dataset. Sadly, it is poorly documented right now. If any research group wishes to collaborate, we would welcome it: the project needs to get funding and expand its partnerships.
