Testing Facebook algorithm in an electoral campaign (methodology)

We want to observe social media from many users' points of view, but few people know about us so far; therefore few install the browser extension, and without results to show we cannot become known: an impasse.

After our first call for contributors last year in the Netherlands, we also observed that comparing real users is not optimal: they are so diverse that you cannot really make an honest comparison.

Besides, the goal is not to get millions of users, just to show the algorithm's impact!

To tell that story, using Facebook profiles under our control seems an optimal solution:

  • No risk of privacy leaks: we follow only public sources, and the profiles are not real users.
  • Fewer variables to account for: when we compared real users, the Facebook information experience depended on too many variables.

With us in control of the profiles, we can manage precisely what they follow, how they are polarized, when they access Facebook, how much they see, and the friends they have (zero). Give a welcome to the six profiles (avatars? personas? bots?) you'll see in these blog posts:

We picked six pages for each of the five political orientations, for a total of 30 pages followed by all of the profiles. Just to be sure it is clear, here is a picture:
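The follow/like setup can be sketched as below. The orientation labels and page names are invented for illustration; only the structure (five orientations, six pages each, all 30 followed by every profile) comes from the text above.

```python
# Five orientations, six pages each: labels and slugs are hypothetical.
ORIENTATIONS = ["far-left", "left", "center", "right", "far-right"]

# Six (made-up) page slugs per orientation: 5 * 6 = 30 pages in total.
pages = {o: [f"{o}-page-{i}" for i in range(1, 7)] for o in ORIENTATIONS}

# Every profile follows the same 30 pages...
all_pages = [p for group in pages.values() for p in group]

# ...but each profile "likes" only posts from the pages of its own orientation.
def liked_pages(profile_orientation):
    return pages[profile_orientation]
```

Keeping the followed set identical across profiles means the only systematic difference between them is which posts they like.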

The volunteers on our team keep the profiles well polarized, also liking some of the posts published by the pages associated with their polarization.

This trains the algorithm, and the posts presented to our puppet users diverge quite rapidly.

The bots use an autoscroller, a simple tool that, at the same hours every day, refreshes the newsfeed and scrolls the page down, thus collecting the posts. Every access is called a timeline, and we scheduled 13 timelines per bot per day, beginning on the 10th of January 2018:
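A minimal sketch of such a schedule is below. The text only states 13 timelines per bot per day starting on 10 January 2018; the start hour and the even spacing are assumptions for illustration.

```python
from datetime import datetime, timedelta

def daily_schedule(day, n_timelines=13):
    """Generate n_timelines access times for one bot on one day.

    First access at 07:00 and even spacing over 16 hours are assumed,
    not documented in the project's write-up.
    """
    start = datetime(day.year, day.month, day.day, 7, 0)
    step = timedelta(minutes=(16 * 60) // (n_timelines - 1))  # 80 minutes
    return [start + i * step for i in range(n_timelines)]

first_day = datetime(2018, 1, 10)  # the campaign's stated start date
schedule = daily_schedule(first_day)
```

With this spacing, the last access of the day lands at 23:00, so all 13 timelines fit within a single day.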

Note: columns 10 to 31 are days of January. A number different from 13 means that on that day the bot did not work well or we intervened manually. We have improved our expertise in this, but we have not yet found a reliable way to orchestrate web scrolling: an empty entry means the computer was not working at all, i.e. 0 timelines. Any suggestion for a more reliable approach?
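Flagging the problematic days is a one-liner once the per-day counts are tabulated; the counts below are invented for illustration.

```python
# Expected number of timelines per bot per day, from the schedule above.
EXPECTED = 13

# Hypothetical per-day timeline counts for one bot (0 = machine was down).
counts = {"2018-01-10": 13, "2018-01-11": 9, "2018-01-12": 0, "2018-01-13": 13}

# Days where the autoscroller did not behave as planned.
problem_days = [day for day, n in counts.items() if n != EXPECTED]
```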

Impressions are the elements composing this dataset, because 13 timelines do not always mean the autoscroller worked perfectly. Looking in more detail:

Antonietta did not perform well between the 27th of January and the 7th of February, a technical fault we should keep in mind so we do not mistake it for additional bias.

On average, we collected 47 impressions per timeline:
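The per-timeline average is a plain mean over the impression counts; the sample values below are invented to make the arithmetic concrete.

```python
# Hypothetical impression counts for five timelines of one bot.
impressions_per_timeline = [52, 44, 47, 41, 51]

# Mean impressions per timeline, as reported in the text (~47).
average = sum(impressions_per_timeline) / len(impressions_per_timeline)
```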

And every impression in a timeline records its impression order, i.e. the position in which the post appeared on the profile's newsfeed:
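Sorting by that order reconstructs the newsfeed as the bot saw it. The field names (`impressionOrder`, `permaLink`) are assumptions about the dataset schema, based on the fields mentioned in this post.

```python
# Three hypothetical impressions from one timeline, collected out of order.
impressions = [
    {"impressionOrder": 2, "permaLink": "/posts/b"},
    {"impressionOrder": 1, "permaLink": "/posts/a"},
    {"impressionOrder": 3, "permaLink": "/posts/c"},
]

# Replay the newsfeed in the order the posts appeared to the profile.
feed = sorted(impressions, key=lambda i: i["impressionOrder"])
```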

The permaLink allows us to trace back the post that appeared. Above we see the collection made on the 5th of February at 20:05; considering that the autoscroller moves the viewport 800 pixels down every 5 seconds, the browser extension collects an average of 20 impressions per minute.
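The two stated rates can be cross-checked: 800 pixels every 5 seconds and ~20 impressions per minute together imply an average post height, a derived figure rather than something measured in the post.

```python
# Stated autoscroller parameters.
PIXELS_PER_STEP = 800
SECONDS_PER_STEP = 5
IMPRESSIONS_PER_MINUTE = 20  # observed average

# 800 px every 5 s -> 9600 px scrolled per minute.
pixels_per_minute = PIXELS_PER_STEP * (60 / SECONDS_PER_STEP)

# At 20 impressions/minute, each post occupies ~480 px of feed on average.
implied_post_height = pixels_per_minute / IMPRESSIONS_PER_MINUTE
```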

This is open data: you can download the dataset. Sadly, it is poorly documented right now, but if any research group wishes to collaborate, the project needs to get funding and expand its partnerships.