Preventing the next Cambridge Analytica(s)

The use and abuse of the data of 50 million Facebook users by Cambridge Analytica has sparked astonishment in mainstream debate, but it comes as no surprise to those who have been studying the phenomenon for years.

Our project addresses exactly this problem: recognize the algorithm, hinder it, and then set ourselves free from its control, escaping where possible from the metadata industry.

The wide-scale profiling of the electorate gives new, greater powers to those who own this data; resorting to profiling during electoral campaigns is just one of its many uses. The behavior of a person, and more broadly of a nation, is shaped not only by the weeks of an electoral campaign but by everyday life. We also inform ourselves passively all year round, through the people around us, our quality of life, and our perception of the public sphere.

Indeed, this latest scandal is useful above all to turn the spotlight on the state of the digital information market. On the privacy and identity implications of the scandal, we have nothing to add to what has already been written.

Handing over the power.

Two groups have given Facebook the power it now has: the users, who have built a network, and those who publish fresh content.

Before social media, there had never been such a detailed representation of the relationships among users worldwide (the so-called "social graph"). Now, with more than 10 years of history, Facebook is able to record, in real time, what a part of humanity is doing, with whom, and where.

On the other hand, content publishers saw Facebook as an unprecedented opportunity to reach those users. But at a certain point the sheer noise perceived by users became stressful, with generic news updates mixed into discussions among friends, each following different rhythms and passions. This allowed an algorithm to creep in and decide what was or was not important for us, with the result that we end up seeing only a small percentage of the content addressed to us, while other content appears several times. The algorithm, as we already know, produces an informational and emotional distortion of reality.

Some long-term solutions are now being discussed: the Federal Trade Commission could break up the monopoly on personal data, and free, decentralized architectures such as the social network Mastodon are being built. But these solutions remain distant, politically and technologically complex.

We do not have a solution, but we do have a basic and necessary approach

It may be that the algorithm shows you only a portion of what you should see on your timeline. That is what happened to our experimental users: during the Italian election campaign they all followed the same 30 sources, yet each saw different stories on their timelines.

It may be that Cambridge Analytica is indeed able to show you psychologically irresistible messages.

It may be that you are profiled according to your income and the neighborhood in which you live, rather than according to algorithms of unspeakable complexity.

In all these cases, the injustice is felt because, in a world of informational abundance, we are kept in the dark.

We cannot know what the algorithm has prevented us from seeing, nor whether the advertising we see is part of a micro-targeting campaign, because Facebook's informational experience is ephemeral.

“Refresh the page and everything disappears”

The moment you refresh the page, the content in front of you may never appear again. What appears next is decided by Facebook, whereas a citizen, to engage collectively with his peers, should be able to compare his timeline with that of another person. We should be able to audit Facebook timelines in this way, since Facebook has become, in all respects, a personal editor. fbtrex allows you to make a copy of the data (public posts and advertising) that you see on your timeline and compare it with people you trust. This does not solve the political problems of ignorance and propaganda, nor does it automatically create critical sense, but at least people can confront each other and audit the information they have received.
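As an illustration of the comparison idea (a minimal sketch, not fbtrex's actual code or data schema), once each user has a copy of the post identifiers that appeared on their timeline, measuring how much two "information diets" overlap is straightforward:

```javascript
// Hypothetical sketch: given the lists of post IDs two users saw on
// their timelines, compute the shared posts and a Jaccard overlap
// score. Names and data shapes are illustrative assumptions.
function timelineOverlap(postsA, postsB) {
  const setA = new Set(postsA);
  const setB = new Set(postsB);
  const shared = [...setA].filter((id) => setB.has(id));
  const union = new Set([...setA, ...setB]);
  return { shared, jaccard: shared.length / union.size };
}

// Two users following the same sources may still see different stories:
const alice = ["post-1", "post-2", "post-3", "post-4"];
const bob = ["post-3", "post-4", "post-5", "post-6"];

const result = timelineOverlap(alice, bob);
console.log(result.shared);  // ["post-3", "post-4"]
console.log(result.jaccard); // 2 shared out of 6 distinct posts ≈ 0.33
```

A low overlap between two people following identical sources is exactly the kind of divergence our Italian election experiment surfaced.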

That there is still no clarity on the real influence of Russian advertising or of Cambridge Analytica over the US elections is not acceptable. How often, how, and where did this targeted advertising appear? How effective was it?

We cannot go back in time and record what appeared in 2016, but since July of that same year we have been working on fbtrex, a project aimed exactly at this.

With fbtrex:

  • Your posts are collected for your own use;
  • A system, in the public interest, will allow us to analyze patterns rather than personal profiles (we are currently working with some research teams on this experiment);
  • Users, if they wish, will be able to compare each other's information diets (we have received a grant from #keepiton for this; it is a work in progress).

Surprise! As always, when it works, it is hampered

In the last year, other organizations have also been collecting Facebook advertising. The US Congress' attention to the issue was high, and for the first time Facebook had to open up, disclosing the advertising attributed to Russian organizations. The social network's damage-control operation was to make political advertising slightly more transparent.

In our view this is just a fig leaf, but we will not analyze it here. Let us just say that a few days ago Google took our extension down from the Chrome Web Store, which now shows a 404 error. We are in contact with our legal support group to get it back up; in the meantime, you can use the extension with a browser without a conflict of interest, Firefox (speaking of which, see the Mozilla petition demanding a privacy reform from Facebook).

Also, for a few months Facebook has been using a trivial but annoying obfuscation system to defeat the pattern-matching mechanisms that allow third-party analysis of its advertising. We have already documented the change:

The consequences are clear: the keyword previously used for pattern matching, "Sponsorizzata" (Italian for "Sponsored"), now contains extra "S" characters that are not normally shown to users: SpoSnsoSrizSzataS.
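To see why this breaks naive detection, and one way around it, here is a minimal sketch (the function names are illustrative, not fbtrex's actual implementation). A plain substring check fails on the obfuscated label, while a subsequence check still matches, because the injected characters only interleave with the original ones:

```javascript
// The label Facebook attaches to paid posts, in Italian.
const LABEL = "Sponsorizzata";

// Naive check: breaks once noise characters are injected into the DOM text.
function naiveIsSponsored(text) {
  return text.includes(LABEL);
}

// Subsequence check: succeeds as long as the label's characters still
// appear in order, whatever noise is interleaved between them.
function isSponsored(text) {
  let i = 0;
  for (const ch of text) {
    if (ch === LABEL[i]) i++;
    if (i === LABEL.length) return true;
  }
  return false;
}

const obfuscated = "SpoSnsoSrizSzataS"; // as seen in the page source
console.log(naiveIsSponsored(obfuscated)); // false
console.log(isSponsored(obfuscated));      // true
```

This also shows why such obfuscation is only "trivial but annoying": it costs the scraper one small change, and each new trick from the platform restarts the same cat-and-mouse game.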


  1. We are currently in takedown. Thanks to OpenRightsGroup for talking about it.
  2. The datasets retrieved during the Italian election monitoring are public and documented in our GitHub repository, along with the analysis methodology.
  3. Tools like this can be used, in the short term, by research groups and, in the long term, by any user who wants to compare their own personalized experience with that of their peers. We are working to make the project more accessible and reliable; please have a look at a few initial analyses and feel free to get in touch via mail, at support at tracking dot exposed ;)

cheers from the tracking-exposed team.