The risks of recommendation engines, automated newsfeeds, information ranking and the commercialisation of personal information

Bram van Es · Published in The thinkerers · Jan 25, 2017 · 104 min read

are you inside, or outside the bubble? source

A little story

I will try to raise your awareness of how the use of ranking algorithms in search engines and recommendation engines for products and services, and the associated trade in personal information, impacts our society and our rights, both directly and indirectly.

The magic that was once part of starting up a browser in your local library and unlocking a seemingly infinite labyrinth of knowledge and information that invited you to explore and learn has disappeared. There is no longer a labyrinth that we have to explore, but a helper that can fetch the information we want, when we want it, always, everywhere. This is great; however, this helper does not just wait for our input. It gives suggestions, and whenever we start to talk to the helper he will try to finish our sentences, based on his own associations between you and what you have said thus far. When we search for information we receive nudges, hints and suggestions; we are being directed from the moment we open our browser by a plethora of cues. Not only that, we have to use this helper because he is the only one who knows how to sort and retrieve the data.

The helper is like a man at the door of the library. You are not allowed to go inside the library; instead you are to tell the helper what you think you want, the reason being that this library is supposedly too large for you to walk through. The helper is like a lightning-fast marathon runner who can fetch all the information you think you need, but he still needs more information, because you have only given him some keywords and the library, being as large as it is, will probably contain a lot of irrelevant information matching just those keywords. He asks for your information through some questionnaires you have to fill out, but this is not enough. He will remember what you have asked him before, and he will secretly mine for more data: he will read your emails and your messages and analyse your location history. In time this helper knows everything about you: your friends, your interests, your religion, gender, sex, your personal issues, and he is also interested in your medical history. Whenever you are not using his services, he sells your information to other helpers and to publishing platforms who like to know which types of information are most likely to be received favorably, and by whom. You have grown used to using this helper; in fact, you can hardly do without him.

All libraries are like this now and you have learned to trust all of the helpers.

You, however, have no idea who these helpers really are, what they know about you and with whom they have shared their information. All you know is that you are dependent on these helpers. In the meantime, you have been lied to, you have been spied on, and your private information has been sold without your knowledge to numerous other helpers whom you also don't know.

The helpers are happy with our need for information: the more we use their services the better, because it exposes us to their sponsors. The more time we spend watching their sponsored suggestions, the better. The more often we activate our phone to see sponsored suggestions, the better. This is compounded by the value of our information; who we are and what we search for has itself become valuable, so the helper wants more of that as well. The helper is smart: he knows everything there is to know about neuro-linguistic programming and he knows everything about you. He comes up with tricks to make you want more information, from push messages to suggestions by friends to specific phrasing that arouses your curiosity. We slowly develop an urge to have access to data all the time, not because we need to but because we have become addicted. From being independent because of it, to being dependent on it. Like all bad habits, it has grown on us, and I think it is time to go cold turkey and cut ties with these all-knowing helpers.

This metaphor is based on the personalisation paradigm, in which every individual has his or her own helper. This paradigm is now deeply embedded in online marketing (microtargeting) and in online information retrieval. I believe that it is deeply flawed. On a philosophical level I believe that we need to wander into the unknown to become truly wise and open-minded; on a societal level I believe that the central processing and storage of intimate personal information weakens our democracy; on a human-rights level the incessant requirement to share personal information weakens the exercise of our rights to privacy and seclusion and muffles our freedom of speech. On a technical level I think that we are now capable of building helpers that do not require any personal information. I believe that the situation we are in now is one of technical debt and of immoral capitalist legacy.

With this article I would like to point out some problems that urgently require some duct tape before the internet sustains irreparable damage and is irrevocably used against us. The structure is as follows: I will explain the origin of personalised recommendations and their hunger for personal information, the definitions and concepts that are important to understand this topic, the fundamental problems that result from personalised recommendations (plus what is required to maintain them), and finally some suggestions to turn things around.

Why do online companies care so much?

Online companies are obviously in the business of making money, by selling products or services. Before these companies sell anything they need to display an offer of some kind, which may or may not initiate a transaction with a customer. Online businesses often have little time to do so, because a site visit is typically very short and for every online retailer there is a plethora of alternatives. Hence, it is crucial to make the most out of that brief encounter, not least because each encounter costs money, whether through advertisement or through an actual site visit. Basically, to maximize revenue, online businesses aim to maximize the following metrics:

  • conversion rate,
  • revenue per visit,
  • return rate, and
  • click-through-rate in combination with cost-per-click.

In other words, regardless of the goals/motivations of the website visitor, the algorithm wants to maximise

  • the probability that you click on (the most expensive) ads,
  • the probability that you purchase (the most expensive) products,
  • the probability that you visit the website again, and
  • the time that you spend on the website.

which may conflict with your goals. In fact, it may conflict with the interests of society as a whole. I hope this conflict will be clear by the end of this article.
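To make the incentive concrete, here is a minimal, purely illustrative sketch of revenue-driven ranking. The item names and probabilities are made up; the point is only that a feed sorted by expected revenue (click probability × purchase probability × price) need not coincide with a feed sorted by relevance to you.

```python
# Minimal sketch: ranking candidate items purely by expected revenue.
# All item names and numbers are hypothetical.

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    p_click: float      # estimated probability that *you* click (personalised)
    p_purchase: float   # estimated probability of a purchase after the click
    price: float        # price of the product, or cost-per-click for an ad

def expected_revenue(c: Candidate) -> float:
    # Revenue-centric score: relevance to the visitor plays no explicit role.
    return c.p_click * c.p_purchase * c.price

candidates = [
    Candidate("budget option you searched for", p_click=0.30, p_purchase=0.10, price=10.0),
    Candidate("premium option with high margin", p_click=0.08, p_purchase=0.05, price=400.0),
]

# The feed is sorted by what maximises the platform's revenue,
# not by what best answers the visitor's question.
for c in sorted(candidates, key=expected_revenue, reverse=True):
    print(f"{c.name}: expected revenue = {expected_revenue(c):.2f}")
```

Real systems estimate these probabilities per user from behavioral data, which is precisely why personal information is so valuable to them.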

So the main purpose of the rather euphemistic term personalised communication is not to improve your life but to maximize the effectiveness of advertisements, or to increase your exposure to advertisements of a commercial, ideological or political nature. The natural result is that an increasing amount of the information you are exposed to is primed to activate consumerism and designed to be easily accepted by your subconscious. Combined with personalised recommendation engines that maximize the probability of user engagement, this leads to a new type of information addiction and literally reduces our ability to focus.

Furthermore the technologies and methodologies used to facilitate and control personalised-communication activities can be easily used by malicious actors to undermine our autonomy by limiting and controlling the information that we are exposed to, to suppress the freedom of speech by automated censoring, and to degrade the pluralism and heterogeneity of democratic societies by the creation of filter bubbles or the viral spread of political propaganda, without us knowing.

The purpose of this text is not to dismiss the use of recommendation engines, as it is clear that guidance and assistance are needed to make practical use of online data possible. To truly empower netizens to independently explore the available online data and draw sensible conclusions from it, we need to provide them with power tools to find, interpret and analyse this data. However, it is also clear that the means and technologies that enable these personalised recommendations are easily abused by malicious actors, who are unfortunately abundantly present. The question is then: can we systematically prevent such abuse without disrupting the usefulness of the internet?

Specifically, we may wonder whether features like automatic content feeds and autoplayed content in general are beneficial for users, and whether they fundamentally undermine the autonomy of citizens who are increasingly exposed to online, targeted information.

Freedom of speech on the decline

Ironically (or rather, cynically), the freedom of online speech has been on the decline ever since social media began to have significant societal impact. Arguably the mid-2010s saw the height of freedom of speech, with the relative immunity of social media platforms as its legal facilitator. The rise of social media and the ease with which dissonant societal views are spread and amplified catalyzed, if not facilitated, the Arab Spring. The recognition that citizens could be politically mobilized relatively easily by unregulated online communication has resulted in governmental interventions, from nation-wide blocks of social media (Turkey) to a 'nationalised', firewalled internet (China). Less noticeably and more gradually, increased exposure to dissonance, combined with a fear of persecution and the liability of large media platforms, has led to online hate-speech watchdogs being set up under the guise of ‘safe spaces’. These safe spaces are partially legitimised by the rise of fake news and political propaganda, which threaten to destabilise the public discourse. It seems that the potential of open platforms to connect the world and increase mutual intercultural understanding is not unlocked automatically.

Opinion: However, the knee-jerk responses from governmental organisations are too late, only contribute to the mistrust in established institutions, and are counter-productive from a more fundamental perspective: limiting free speech will not undo bigotry and hate. The biases that reinforce each other in online echo chambers and filter bubbles can only be countered by removing the mechanisms that lead to echo chambers and filter bubbles in the first place.

A disturbing example is the BBC, which has assumed the position of judge, jury and executioner when it comes to free speech on its platform. I quote from its cookie policy (2017):

If you post or send offensive, inappropriate or objectionable content anywhere on or to BBC websites or otherwise engage in any disruptive behaviour on any BBC service, the BBC may use your personal information to stop such behaviour.

16 Where the BBC reasonably believes that you are or may be in breach of any applicable laws (e.g. because content you have posted may be defamatory), the BBC may use your personal information to inform relevant third parties such as your employer, school email/internet provider or law enforcement agencies about the content and your behaviour.

The Swedish state has gone as far as to collect a list of online hate disseminators, featuring politicians, academics and journalists. Google has imposed implicit algorithmic censorship by restricting the display of advertisements next to 'controversial' content, clearly frustrating the freedom of information and the freedom of speech.

In Austria a man was fined for 'liking' a slanderous comment on Facebook, the judge stated that

the defendant had failed to prove that the comments he had liked on Facebook were true.

In Germany there is the so-called Netzwerkdurchsetzungsgesetz, a very recent law that obliges social media platforms to remove hate speech and fake news within a day, under threat of millions of euros in fines. The effect is obvious: for practical reasons the social media platforms are forced to use a broad brush, significantly curtailing the freedom of speech in an extra-judicial manner. The European Commission sees words as weapons, literally, but even if they are weapons then surely the legitimacy of their usage should be evaluated by a court of law to prevent self-censorship.

Now we have arrived at a point where the free and unhindered access to, and spread of, information is threatened from yet another angle: online media platforms and content providers that are primarily driven by economic motives have been operating in a legal and moral vacuum for the past decades. In this period a world-wide infrastructure has been constructed to store, enrich, process and trade personal data. Furthermore, it has been demonstrated numerous times that this infrastructure is vulnerable to attacks by governmental and private groups. The commercial actors that are responsible for the generation of this data have been shown to side with governmental actors sooner or later. In essence, in the last two decades the private sector created a global, interconnected surveillance infrastructure that can, in practice, be utilised by governmental organisations at will. The internet slowly changed from the great emancipator to a powerful new tool for authoritarian regimes to centrally monitor, control and manipulate the people.

In the meantime governmental organisations do little to curb the process of increasing commercial control over personal information; in fact they would rather facilitate it, because it represents an economic opportunity and, again, it allows them to expand their span of control by limiting free speech extra-judicially. This has led not just to a de facto outsourcing of censorship to private corporations but even to collaboration between semi-governmental and non-governmental parties to actively remove subversive information sources.

There is another, more pervasive factor: the algorithms used to generate information feeds are as yet primarily focused on exploiting the weaknesses of the human mind, from cognitive consonance to confirmation bias. Most importantly, the information-sharing platforms are now owned by relatively few, incredibly large entities that have turned information sharing into an economic activity: from gathering, selling and reselling personal information to disseminating news as the precursor for displaying advertisements, which undermines the role of the media as a societal and political watchdog. What is more, the data that is collected is enriched with automatically inferred statistical models for the prediction of personal behavior and preferences. It is my view that these inferred models should be categorically included in privacy legislation.

I also see another scenario: the liberation of the vast amounts of data floating online, which, combined with open-source data analysis tools and pay-per-use (pro-rata) computational resources, serves the discovery of truth and knowledge, the protection of democracy and the enforcement and protection of fundamental human rights. This new type of information retrieval may not only be a means to empower all levels of society but can also form the breeding ground for real intellectual and entrepreneurial collaboration between people around the world. All of that with respect for your right to a personal life. I think it is time to get started on fulfilling the promise that the internet once held, but first I need to convince you that we are now on the wrong path.

Important ideas you should be familiar with..

I think that laying out the vocabulary of this topic is a demonstration in and of itself. For each term I will explain its relevance and give examples.

Privacy

the fundamental human right of any natural person to have and to protect a personal life

There seems to be great confusion about the importance of privacy. A common response to privacy-related issues is

"I have nothing to hide.." — Nobody

When I write Nobody I mean that, logically, this is a person who either has no personal life, or does not consider his or her personal life worthy of protection. In the former case this person is either enslaved by a master that does not allow anyone to be distinct, or voluntarily enslaved to conformity. In the latter case, voluntary renouncement implies that this person sees him or herself as a nobody, which is perhaps as sad as it is tragic.

A common error is that people look only at their own situation when dismissing the importance of privacy, a situation in which breaking social and sexual taboos merely provokes vocal expressions of discontent and not, for instance, physical harm, repudiation or even persecution. In the free West we have little notion of such repercussions and are very rarely made aware that such a reality exists. However, even within the relatively safe borders of Western societies there is the common risk of 'losing face', the degradation of your social reputation, or even the risk of losing your job as a repercussion of a free expression. Hence, even in a country such as the Netherlands it is crucial that you are aware of, and in control of, the manner in which your expressions are shared. Privacy is not just about protecting your personal life, it is also about protecting your ideas and how they are disseminated.

Another important reason for the protection of your personal information, and of information regarding your personal behavior, is possible abuse by malicious actors, or misuse by incompetent actors, which will be elaborated on in this article.

When I write the term malicious actor I am well aware that it immediately reduces the number of readers by 90%, but I feel that we should at least hypothesize the emergence of evildoers who may do bad things where perhaps the reader would do good things. Unfortunately we live in a world where such misanthropic evildoers are present in all layers of society, so, for the sake of argument:

hope for the best, assume the worst.

Privacy is not only crucial for the free exercise of the freedom of speech but also for the free dissemination of information. It is, one could say, crucial for the exercise of one’s self and the exercise of the collective intelligence in a democracy. Furthermore, even if it does not apply to you, or you do not feel the urgency to defend this right, then please be aware that this right most certainly applies to and is most certainly urgently needed for: whistle-blowers, political commentators, journalistic sources, counselors, comedians, cartoonists, medical doctors and criminal witnesses, to name just a few.


Privacy is instrumental for the exercise of the freedom of speech, which in turn is instrumental for the effectiveness of civil control and empowerment in a liberal democracy and necessary for the democratic legitimacy of governmental control.

Without privacy there would also be no seclusion, no retreat from piercing eyes, judgmental opinions and the weight of expectations from others. This seclusion, the right to not speak in public and to retract in a private sphere where ideas and thoughts are secretly honed, gives room to the development of unique, potentially revolutionary, ideas.

Speech is silver, silence is golden

In the end, privacy and the right to seclusion are crucial for the exercise of your other rights. The mere awareness that data is being gathered on your person, including your expressions of opinion, will lead to behavioral changes like increased conformity and risk aversion; a chilling effect will take place on dissonant opinions. More specifically, combined with the fear of social isolation, non-anonymity will lead to a spiral of silence in which we increasingly conform to an increasingly narrow band of thought.

A direct example is China: besides operating the social-credit platform Sesame, the Chinese government is planning to install hundreds of millions of public cameras on top of the hundreds of millions of cameras that are already in place. These cameras are part of a nationwide monitoring system that uses AI to identify and track individual citizens. An immediate consequence of this technology is that citizens can no longer protest anonymously; in fact, citizens cannot even organize gatherings anonymously without being flagged by the AI that orchestrates the huge camera system. This is deceptively easy, as the Chinese government only has to associate political and ideological inclinations with each individual, which automatically leads to the ability to detect any clustering of ideologically similar individuals.

An important reason for privacy is job security: due to our increasing online presence, employers can easily, and increasingly automatically, scan social media platforms, your social connectivity, your consumer profile, perhaps your credit score and any activity that created negative media coverage in the past.

So, privacy matters, a lot, regardless of whether you individually have ‘nothing to hide’.

Keep this in mind, because the information that you would want to treat and have treated with discretion is very valuable for effective personalisation engines.

How to influence people

I will briefly discuss the psychological weaknesses that are innate to all human beings. These weaknesses would be harmless were they not actively exploited, by politicians, marketeers and ideological evangelists.

Cognitive consonance: the established theory that being exposed to information you recognise or opinions you agree with is accompanied by a positive sentiment, and an actual physiological response to that effect, and vice versa for opinions you do not agree with. This is the basic mechanism that leads to most of the cognitive biases.

Cognitive bias: a systematic pattern of deviation from rationality in judgment, leading to perceptual distortion, inaccurate judgment, illogical interpretation

There are several types of cognitive bias, especially relevant for online information consumption are

  • confirmation bias: the tendency to selectively look for evidence that supports your point of view, ignoring alternative explanations and opposing evidence. This is driven by cognitive consonance.
  • salience: the tendency to focus on the most distinct, most salient feature in an image or a text to form an opinion. This is driven by the inherent difference in energy requirements between forming an instinctive and a rational opinion. Whereas the former is produced almost immediately and without much effort, because we simply attach associations based on preconceived notions, the latter requires some level of contemplation, perhaps even introspection, and in the worst case even an alteration of our prior knowledge. We are instinctively drawn to the most salient features of any type of information.
  • conservatism bias: the tendency to give more weight to prior evidence than to new evidence. I.e. the first evidence presented to you will have a stronger effect on your opinion, all other things being equal.
  • anchoring bias: the extreme form of conservatism bias, where people tend to rely heavily on the first evidence that is presented. This is related to priming, whereby an initial impression influences the interpretation of the following impressions.
  • bandwagon effect: the observation that popularity/normalcy/acceptance has a self-reinforcing effect, whereby the probability of adoption increases with actual adoption. In part this is due to a network effect, where the probability of individual exposure increases with actual exposure; in part it is due to the tendency of people to conform to governing opinions without considering evidence.
  • clustering illusion: the tendency to underestimate the amount of variability and overestimate the amount of clustering, i.e. false pattern recognition. This can lead to the Texas sharpshooter fallacy, whereby similarities are stressed and differences ignored.
  • selective perception: the tendency to easily forget, or not even perceive, information that is discomforting or contradicts prior beliefs. This is strongly related to confirmation bias, being its opposite in nature.
  • mere exposure effect: the tendency for mere familiarity with information to be associated with a higher likelihood of preference. This is perhaps the most important effect enabling propaganda of any kind. It is strongly related to cognitive consonance, as exposure leads to a confirmation of prior belief when you are exposed to the information again.
  • availability heuristic: akin to the mere exposure effect and the recency bias, the tendency to overestimate the likelihood of events if they are easier to remember.

A more complete list can be found here.

The takeaway message here is that we have cognitive blind spots that can be exploited by presenting information in a certain way and in a certain order (spatially or temporally). We cannot be aware of this continuously, simply because the ground state of our brain is focused not on rational processing but on instinctive processing; see for instance the work by Kahneman et al. If interested, also read this article on how to deal with biases.

Negativity bias

I already mentioned the term salience, and that not all information is treated equally by our brains. We have several cognitive biases that are actually sensible from the perspective of self-protection and survival. One of them is the negativity bias. We overestimate negative features compared to positive features, leading to negativity dominance. Also, we are more inclined to take note of negative information, whether through news items or daily discussions. In particular this pertains to the aspect of risk, as this is more closely related to our survival instinct than, say, a losing sports team or a forest fire on the other side of the world.

A rational explanation for our higher regard for negative features is the notion that, in daily discourse, the mentioning of negative features (critique) is more likely to be correct than the mentioning of positive features, for the same reason that a complaint is more likely to be factual/truthful than a compliment.

We tend to

  • notice negative information more easily
  • remember it longer, and
  • exaggerate it compared to positive information

and this is comparable to the conservatism bias.

What this means in practice, and in the context of this article, is that negative headlines, or negativity in general, can be used as a hook for recommended commercial information or political propaganda, i.e. the use of fear to attract attention and increase acceptance. The negativity bias is one of the factors explaining the effectiveness of online trolls (besides salience and confirmation bias) in steering public discourse.

Priming: the idea that exposure to one stimulus will influence the response to another, following, stimulus

A common application would be to display an ideological or commercial advertisement immediately after evoking the desired sentiment, say anger, pleasure or sadness. Indeed, the order in which I lay out my points will influence your perception of this topic. For instance, I could start with a heart-wrenching story that demonstrates an abuse of personal information and then relate back to that example throughout the text; conversely, if I wanted to persuade you of the benefits, the initial story would be exuberantly positive, explaining how e.g. the illiterate are empowered by personalised search engines.

Framing effect: the idea that the presentation of ideas changes the perception of the ideas themselves.

This is an example of applied cognitive bias and relates to psychological priming. Whereas priming is sequential, framing is simultaneous.

You can see framing as an artificial context which is attached to the ideas for the sole purpose of manipulating your perspective and with that, the probability that you dislike/like, accept or refute, the proposition. On a more instinctive level the artificial context triggers associations which we attach to the framed proposition.

Herd effect: the idea that we tend to believe the governing opinion or the majority vote.

The effect of collective intelligence or wisdom of the crowd improves the likelihood that a group decision is the most correct decision. However, this can only occur if a diverse group of people give independent estimates for a logical problem. The sense of collective intelligence might explain why, as individuals, we put so much trust in majority votes.

In reality, the individuals in a crowd are not independent: they are clustered in groups, and often the issues that filter through the crowd are not of a logical nature at all, but rather of a complex societal or even ideological nature. Hence, the herd effect is the false belief in the universal validity of this collective intelligence.

This false belief is easily exploited by marketeers and politicians by implying normality and commonality. This ties in closely with groupthink, but it is not confined to specific groups. Basically, by implying normality, as in 'the majority concurs with a certain stance', a general groupthink is activated and the likelihood of acceptance is increased.
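The statistical core of the herd-effect argument can be shown in a few lines: averaging helps only when errors are independent. Below is a toy simulation with made-up numbers, where a shared bias stands in for an echo chamber.

```python
# Minimal simulation of why "wisdom of the crowd" needs independent estimates.
# All numbers are illustrative only.
import random

random.seed(0)
truth = 100.0
n = 1000

# Independent guesses: individual errors are large but cancel out on average.
independent = [truth + random.gauss(0, 30) for _ in range(n)]

# Clustered guesses: everyone shares the same bias (e.g. an echo chamber),
# so averaging more people does not remove the error.
shared_bias = random.gauss(25, 5)
clustered = [truth + shared_bias + random.gauss(0, 30) for _ in range(n)]

print("independent crowd average:", sum(independent) / n)  # close to 100
print("clustered crowd average:  ", sum(clustered) / n)    # off by roughly the shared bias
```

A bigger crowd does not fix a shared bias; only independence (diversity) does.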

Tribes: online communities of people that share common interests and/or ideas

Offline and online, people flock towards like-minded peers, for fraternisation and self-identity. You might call tribes the online equivalent of real-life clubs.

Of course, identifying to which tribe you belong can be incredibly powerful for governments, retailers or insurance companies, for targeted advertising, and profiling.

Facebook friend tribes, source

Information addiction and the ‘attention engineers’

All of the above terms come together in Tristan Harris’ idea that mobile phone apps are designed to create information addiction.

The basic mechanisms that he describes, and that, according to him, are being applied by large social media platforms can be summarised as

  • “control the menu”: show a limited curated list of options to bound the effective decision space of the user
  • “Put a Slot Machine In a Billion Pockets”: using the concept of intermittent variable rewards, smartphone notifications become cognitively very similar to, say, gambling.
  • “Fear of Missing Something Important” (FOMSI): akin to the fear of missing out, a basic mechanism to greedily accept new information sources if there is a chance it will provide useful information in the future
  • “Social Approval”: By implying that commercially catered information directed at me is the result of social processes I feel pressured to participate.
  • “Social Reciprocity”: By implying that another person did me a favor by tagging me, recommending me or anything of that nature I may feel inclined to repay in kind.
  • “Bottomless bowls, Infinite Feeds, and Autoplay”: by creating a limitless supply of content that is automatically fed to the user, more information is consumed than when the feed is limited and videos do not autoplay. The mechanism is simple: before you are able to consciously decide to switch off the content, more content is fed to you. It should be noted that the effect of “bottomless bowls” was based on research that is now contested. Nevertheless, it seems obvious that autoplay promotes binge-watching and that infinite feeds promote longer scrolling, for the simple reason that continuous A/B testing is the norm and the optimisation target is time-on-site (see the sketch after this list). It is more likely that researchers are behind the curve in terms of persuasion techniques than that social media companies are ignorantly applying techniques that do not work.
  • "Aversion to loss": Through gamification of social interactions users build up a feeling of aversion to 'quit' the game and lose a so-called streak.

and several more that you can read here. All of these mechanisms combined lead to the creation of addictive habits. It has been argued that our addiction to social media is merely the result of primordial tendencies with regard to human interaction, that it is merely an expression of healthy group behavior. This however merely helps to explain our tendency to become addicted.
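To illustrate the A/B-testing loop mentioned above, here is a toy sketch in which two hypothetical feed variants are compared on a single metric, time-on-site. The session data is fabricated; the point is that whichever variant keeps people on the site longer wins, with no term in the comparison for whether that time was well spent.

```python
# Minimal sketch of an A/B test that optimises a single engagement metric,
# time-on-site. All session durations below are fabricated.
import random
from statistics import mean

random.seed(1)

def simulate_sessions(mean_minutes, n=5000):
    # Exponentially distributed session lengths are a simplifying assumption.
    return [random.expovariate(1.0 / mean_minutes) for _ in range(n)]

control = simulate_sessions(mean_minutes=6.0)   # paginated feed, no autoplay
variant = simulate_sessions(mean_minutes=7.5)   # infinite scroll + autoplay

print(f"control mean time-on-site: {mean(control):.2f} min")
print(f"variant mean time-on-site: {mean(variant):.2f} min")

# If the variant wins on this one metric it ships; whether the extra minutes
# served the user is not part of the comparison.
if mean(variant) > mean(control):
    print("ship the variant")
```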

Consumer-internet businesses are about exploiting psychology. And that is one where you want to fail fast because..people are not predictable and so we want to psychologically figure out how we manipulate you as fast as possible and then give you back that dopamine hit. — Chamath Palihapitiya, former VP at Facebook

Tristan Harris can, hyperbolically, be described as someone who escaped the dark side of product development and is now evangelising a counter-movement. In his wake, other former employees of Facebook, Google and Twitter are giving similar signals and are warning of a smartphone dystopia. Even the creator of Facebook’s ‘like’ button has grown disenchanted with his creation, as has the former mentor of Mark Zuckerberg, who has teamed up with Tristan Harris to curb the negative effects of social media platforms.

The same mechanisms that are responsible for creating information addiction will persuade you to stay inside your filter bubble, to gobble up the information that is increasingly tailor-made to trigger a pleasurable response. As said, cognitive consonance works at the physiological level and forms the basis for the bulk of our cognitive biases, and marketeers and political campaign consultants know very well how to use this for their personal gain.

A testimony to the ongoing normalisation of such practices is the overt discussion and promotion of so-called neuromarketing; being broader than just the use of biases to improve conversion it also deals with consumer behavior after the purchase and aims to predict what products customers actually want.

“if the advertising is now purposely designed to bypass those rational defenses … protecting advertising speech in the marketplace has to be questioned.” — Jeff Chester

The smartphone in particular has specific characteristics that allow for very effective habit creation, most notably the ability to display push notifications in various ways and, as said, the ability to create variable (unexpected) rewards.

“Why do we pay so much less attention to those things than we do to drugs and alcohol when they work on the same brain impulses?” — Cal Newport

The creation of self-reinforcing habits and its relation to the creation of filter bubbles (and virtual safe spaces) is one aspect. Another aspect is age: the average age at which a person receives their first smartphone is now 10.3 years. This is crucially important for the long-term effects.

The younger the mind these techniques are applied to, the more profoundly they will affect the brain, because younger brains have more plasticity (are more easily changed) and because neural schemas created early on are more fundamental and more difficult to change later. Basically, what we are teaching the children of today is that it is normal to base their self-image on online alter egos and to have strictly online social relationships, where the information they receive is handed to them by algorithms designed to feed cognitive consonance. So the first generation that grew up in the presence of the internet and smartphones, generation iGen, is psychologically in a weaker state than the millennials who precede them, due to unrealistic self-expectations, a ‘social’ network of hundreds of perfectly sculpted alter egos and an information network that supplies them with a constant flow of irrelevant but enticing information that feeds into their habits. The latter is directly related to the so-called safe spaces that social media platforms aim to create. In these safe spaces the individual is less likely to be confronted with strongly dissenting views or content that they deem shocking or disturbing, and, as said, they are confronted with sculpted images of their peers. This likely leads to stress, anxiety, loneliness and depression. A trend that is likely related is the increase in narcissism and in the occurrence of obsessive-compulsive behavior.

Rates of teen depression and suicide have skyrocketed since 2011. It’s not an exaggeration to describe iGen as being on the brink of the worst mental-health crisis in decades. Much of this deterioration can be traced to their phones.

source

When I asked Eken about other common sources of worry among highly anxious kids, she didn’t hesitate: social media. Anxious teenagers from all backgrounds are relentlessly comparing themselves with their peers, she said, and the results are almost uniformly distressing.

source

An indirect result is that the current generation will have a completely distorted view of the freedom of speech, which, according to their experience, is limited to views and ideas that are in line with their own ideology and are part of their safe space.

Rosenthal effect

The Rosenthal effect is basically the creation and self-reinforcement of stereotypes by repeated exposure to an association between an individual/group and a negative/positive qualification. The positive variant is called the Pygmalion effect and can be used to create a self-reinforcing positive feedback loop, e.g. for special-needs children, people who are rehabilitating, etc. The negative variant is called the Golem effect, in which repeated exposure to negative news regarding a specific person or group of people leads to an actual demeaning attitude towards that person/group, most likely leading to a confirmation-bias-fueled self-fulfilling prophecy.

I.e. the systematic distribution of negative information regarding groups with specific characteristics can lead to a self-fulfilling prophecy. This, of course, was always possible but by using personally targeted digital advertisements on a global scale the instigation of say societal polarisation can be done in a relatively obfuscated manner as compared to traditional media.

Doubt injection

Quite recently I have seen one particular article from the Guardian being recommended to me, repeatedly, through a scientific news aggregation app. This was particularly interesting because it was relevant to a societal discussion on the restriction of a particular substance, glyphosate: a substance which is suspected to cause genetic and hormonal disorders in bees, is definitely carcinogenic for rodents, and is very likely to have long-term effects on human public health at high enough concentrations. This substance is the most used herbicide in the world, and has been for decades. On top of that, the original creator and producer of this substance, Monsanto, developed genetically modified seeds that are resistant to this specific herbicide. If this substance were classified as a known carcinogen, the business model of Monsanto would essentially evaporate. I.e. Monsanto has a lot to gain by simply creating confusion on the topic in the public mind and casting doubt on the claims of its critics, for instance by singling out favorable research. Basically, what happens when scientific papers are pushed through targeted advertisement is that the discussion is taken away from the realm of scientific discourse and enters the realm of (the more polarised) political discourse, where conclusions are overly simplified. In this particular example the UN’s World Health Organisation had concluded that glyphosate is likely not carcinogenic for humans exposed through the diet, which obviously does not coincide with the blanket headline “glyphosate unlikely to pose risk”.

One well-targeted study to cast doubt whenever necessary

Later I noticed a similar thing on my Facebook timeline: another sponsored article, from “Facts of Science”.

Still trying to nudge me..

The effectiveness of this injection of counter-arguments stems from a combination of biases:

  • the anchoring bias: if this is the first information one ingests about this subject it will likely stick as the most correct information
  • the mere-exposure effect: basically by massively distributing one particular counter-argument it will nudge a large number of people
  • the recency bias: the latest information is retrieved most easily.

The consequence of scientific articles being pushed into the public domain using commercial platforms is obvious. No longer will the societal impact of a scientific result be weighed by just its scientific merits, as judged by scientific peers. Rather, it becomes dependent on the amount of investment put into the public dissemination of the results.

This is an example of fake news, as it represents a deliberate attempt to stifle a discussion using selective facts, creating or maintaining controversy that prevents a political consensus (say, to ban a specific chemical agent). Specifically, large corporations like Bayer-Monsanto have the reach and influence to apply what is called ghost writing, where a text largely written by the corporation's resident scientists is (co-)signed by supposedly independent researchers.

More generally, the persistent use of only superficially valid arguments based on selective cherry-picked facts to defuse/mute critique on the status quo is best denoted as the neoliberal optimism industry as it primarily serves to undermine non-profit institutions.

“the infectious spread of pernicious relativism disguised as legitimate skepticism” — Matthew D’Ancona

This article describes three ways to perform doubt injection:

  1. selective sharing: cherry picking favorable results
  2. selective defamation: undermine non-favorable results
  3. biased production: ghost writing

The primary motivation for doubt injection is the conservation of power, the preservation of a business model or simply, sustaining a source of income.

  • conservation of power: e.g. discrediting political opponents
  • business model: e.g. defending the belief that a business model is not disproportionately detrimental

Examples: Railroad companies funding climate skeptic think tanks, tobacco companies funding biased research, pesticide producers churning out propaganda.

Dunning-Kruger effect, when the less informed think they know best..

There is a very interesting phenomenon called the Dunning-Kruger effect. This idea postulates that people who have very little knowledge about a topic will be extremely confident about what little knowledge they have. This confidence then sharply decreases as more knowledge is obtained and awareness increases of the intricacies and vastness of the subject at hand. The more you know, the more you realise how much you do not know.

Those who know more are less confident about their knowledge, image source

The implication is that if you are first in spreading information regarding any topic, those that are ignorant will be confident about this knowledge. As the receiver is ignorant of the topic at hand initially and may have no reason to doubt the source, the information is more likely to be accepted as true.

"The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge "— Stephen Hawking

There is another implication. Given biases such as the anchoring bias and the conservatism bias, follow-up articles that are factually correct may not be accepted, because they go against the initially received information. To change one’s mind on the factual correctness of news is then a matter of overcoming cognitive dissonance. Cognitive dissonance puts factually correct news at a disadvantage if it is preceded by factually incorrect news. These so-called belief echoes persist over time, regardless of (even immediate) corrections.

So, being first matters.

Advertisement slang

The 'Mad Men' have their own terminology, which is very enlightening. Let me elaborate on some of it.

Clickbait: snippets of distracting information that activate curiosity and lead to paid content

Clickbait is a pejorative term describing web content that is aimed at generating online advertising revenue, especially at the expense of quality or accuracy, relying on sensationalist headlines or eye-catching thumbnail pictures to attract click-throughs and to encourage forwarding of the material over online social networks. Clickbait headlines typically aim to exploit the “curiosity gap”, providing just enough information to make readers curious, but not enough to satisfy their curiosity without clicking through to the linked content.[1][2][3]

From a historical perspective, the techniques employed by clickbait authors can be considered derivative of yellow journalism, which presents little or no legitimate well-researched news and instead uses eye-catching headlines that include exaggerations of news events, scandal-mongering, or sensationalism.[4][5] Wiki

On BuzzFeed, Gizmodo, theatlantic.com, bbc.com, or almost any free source of online information, you will likely see pictorial article recommendations laid out in blocks or strips, with enticing images and a headline that sparks your curiosity. The curiosity cliffhanger is referred to as the ‘curiosity gap’. These are often completely unrelated to the actual content, and often neither the headline nor the image has any relation to the underlying content. The blocks/strips of advertisements mixed with actual articles will be preceded by a statement like 'recommended by…' or a variation thereof.

Typical phrases contain the following snippets: top lists, surprising facts, largest/smallest/fastest/prettiest/deadliest, trending now, just in, something shocking, something disturbing, you have to see this, you will not believe what happens next, etc.

Facebook announced in August 2014 that it would tackle the clickbait issue algorithmically, and again in August 2016. This, of course, is a form of algorithmic censorship. In fact, it will categorically censor any article that is written in a style similar to actual clickbait articles.
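To see why style-based filtering tends to over-block, consider this deliberately naive sketch of a keyword heuristic (not Facebook's actual method, which is far more sophisticated; the phrase list and headlines are made up): any headline that shares the surface style of clickbait gets demoted, regardless of the quality of the underlying article.

```python
# Minimal sketch of style-based filtering and its false positives.
# The marker phrases and example headlines are made up for illustration.

CLICKBAIT_MARKERS = [
    "you won't believe", "what happens next", "shocking", "top 10", "trending now",
]

def looks_like_clickbait(headline):
    h = headline.lower()
    return any(marker in h for marker in CLICKBAIT_MARKERS)

headlines = [
    "You won't believe what this celebrity did next",      # actual clickbait
    "Top 10 shocking findings in the new climate report",  # legitimate journalism, same style
]

for h in headlines:
    action = "demote" if looks_like_clickbait(h) else "keep"
    print(f"{action}: {h}")
```

Both headlines are demoted, even though only one of them is content-free; judging style rather than substance is what makes this a blunt, extra-judicial instrument.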

Native advertisement: a type of advertisement that is embedded in the content

Described by the ad providers as a non-disruptive means to engage the customer, it basically is advertisement blended into the content.

While many brands counted on traditional display ads in the past, they’ve come to realize that native ads garner much higher CTAS. In fact, reading a native ad headline yield 308x more time of consumer attention than processing an image and banner. — Outbrain

Taboola, Outbrain, Gravity, Revcontent, Newsmax and also Google AdSense are business-centric services that aim to maximize the number of impressions and the conversion rates of articles. This particular means of advertisement relies more and more on clickbait-type announcements. I refer to these articles as advertisement for the very simple reason that informative articles are presented in the form of advertisements and do not necessarily enrich the content, or are not even related to it. They are promoted in the same way as advertisements for products or services, and their placement is based on maximization of click-through rate, not maximization of relevance. This means, for instance, that clickbait advertisement is shown relatively often, as this type of advertisement is more effective.

Of the 1 billion user milestone, Adam Singolda, Taboola CEO, told Real-Time Daily: “We believe there is a ‘winner takes it all’ market when it comes data. It’s either you know the person behind the screen, or you don’t. Knowing if someone is a video fan, or if someone tends to subscribe to things, are binary questions that enable publishers to drive true personalization on their sites.”

Singolda explained that while Facebook has amassed a huge amount of data, Taboola’s goal is to draw from its own “trove of information about how people consume content across the Web to empower publisher partners to leverage personalization technology and free, anonymous, actionable user data to build audience, engagement and revenue.” — Adam Singolda, Taboola CEO (source)

Companies like Outbrain and Gravity do more than just display ads; they also provide recommendations of the on-site content, i.e. part of the information feed is outsourced, because (good) data scientists are expensive. It is easy to see how this can create an information asymmetry: if this information feed is outsourced to relatively few companies, they can nudge a large portion of the online populace towards specific concepts and ideas, across otherwise unrelated websites.

To feed the algorithmic suggestions of these ad providers, huge amounts of personal data are continuously being stored and processed; what is not stored can be bought from data brokers. As the click-through rate, and with it the revenue, increases with more accurate recommendations and targeting, personal information itself has monetary value, since it is the fuel for the recommendation system. This added value is the single most important reason that such data is stored, processed and traded on international markets, representing over $200 billion in 2016.

Native advertisements, or advertisements that are not distinct from the content they accompany, were frowned upon not that long ago. Google AdSense required publishers to clearly indicate that their advertisements were indeed advertisements and not neutral/unbiased content. This went as far as coloring schemes that had to be distinctly different from the regular search results. There were occasional checks, and repercussions if you did not comply. Now the two largest search engines, Google and Bing, apply obfuscated advertisements on their own pages, appearing inline as both the first and the last results.

Ad providers and publishers have been tempted by the higher cost-per-click and conversion rates of native advertisement to violate the basic rule that advertisements and content should be strictly separated in order to avoid confusion and deception of the visitor.

An explanation for this move towards content-based marketing may be the shift in focus of Google's ranking algorithm from keyword-based to content-based in August 2013. This new algorithm demoted websites with little original content and opened a market for automatically generated content: using Markov models, by simply randomly concatenating the content of existing websites, or even by machine-translating content from a foreign language. This fake content would facilitate the display of advertisements on so-called parking pages. Another shift that took place in 2013 was a range of algorithmic changes to the Facebook newsfeed that rewarded good click-through rates with higher rankings. Combine this with the rise of Upworthy.com, which was ridiculed for its clickbait headlines and then copied by competitors because such headlines were actually very effective at increasing click-through rates, and you basically have ground zero for fake news and clickbait.
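For a sense of how cheap such filler content is to produce, here is a minimal word-level Markov-chain generator; the seed text is placeholder prose, and the output is superficially fluent but meaningless, which is all a parking page needs.

```python
# Minimal sketch of a Markov-chain text generator of the kind used to churn
# out filler content for parking pages. The seed text is placeholder prose.
import random
from collections import defaultdict

seed_text = (
    "personalised recommendations maximise engagement and engagement maximises "
    "revenue and revenue funds more personalised recommendations"
)

# Build a first-order word-level Markov model: word -> possible next words.
words = seed_text.split()
model = defaultdict(list)
for current, nxt in zip(words, words[1:]):
    model[current].append(nxt)

random.seed(42)
word = random.choice(words)
output = [word]
for _ in range(20):
    followers = model.get(word)
    if not followers:
        break
    word = random.choice(followers)
    output.append(word)

print(" ".join(output))  # superficially fluent, semantically empty text
```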

Edit: fast forward to 2023, convincing, organic, fake content can now be generated automatically with large language models.

Another explanation is that consumers seem to accept native advertisement as long as the ads are somewhat relevant. This user acceptance is, however, influenced by the same psychological mechanisms as those used for marketing: gradual exposure will simply increase the user acceptance of personal data sharing over time.

Sponsored content: advertisement guised as a neutral descriptive article or an editorial

Also called an advertorial. This is very closely related to native advertising, but more focused on providing information than on persuasion; basically it is an infomercial. This is quite common for free mass media, including the paper kind: you might find such a sponsored article in the Metro newspaper, for instance. Although this type of advertisement goes further in terms of intrusion, it is (1) much less common than native advertisement, and (2) more distinct from the surrounding content, so the user is more aware that he is looking at an advertisement.

I already mentioned ad platforms such as Taboola and Outbrain and the way they integrate recommendations with the content. As part of their algorithmic recommendations they add advertorials to the selection. In fact, these advertorials can even dominate the selection, which is only obvious if you scan for the term ‘sponsored content’. Again, the ad providers/platforms have gradually moved away from the principle of clearly distinguishing advertisement from content, for the obvious reason of increased revenues.

Remember, this is a profit-driven business. If showing three pictures of half-naked people, a monkey on a jetski and an advertorial for penis enlargement is demonstrated to maximize ad revenues, the platform will happily display exactly that, even if the content is e.g. about a church renovation. The reason, again, is that these ad providers are business-centric and their algorithms maximize simple revenue-based metrics.

Subliminal messaging: displaying a message such that it is not consciously perceived by the receivers

Subliminal messages are by definition non-transparent, as the user is unaware that an advertisement is being presented. Even though the message is only unconsciously perceived, the perception triggers an instinctive response.

Subliminal advertising is forbidden in some western countries, e.g. the United Kingdom, and for a very good reason: by exposing citizens to subliminal messages their perception of reality is subconsciously and involuntarily altered. Hence it is a fundamental violation of the right to self-determination and a violation of the right to a personal life.

In essence, most of the advertisements I have discussed are forms of subliminal messaging, even if the ad itself is visible, because the neural processes that determine your response are physiological in nature and occur subconsciously. For example, simply being confronted with a brand in combination with a positive sentiment will increase the likelihood that you will have a positive sentiment when confronted with that brand later on, and even the exposure itself, regardless of the association, will increase that likelihood. The reason that common advertisements of, say, the billboard type are accepted is that it is clear to anyone viewing the advertisement that it is meant to convey some kind of promotional information with the intent to persuade you. Even if there are subconscious processes that steer your preferences, at the very least you are aware that this might occur.

With native advertisements, ad providers are dodging the proverbial legal bullet by mentioning the fact that an advertisement is presented, while it is clear that the way in which they present this notification is too inconspicuous to create awareness with the user. An example is shown below: in the top-right part of the advertisement reel you see an indication that these links refer to advertisements. Besides the small size of the indicator, it is well known that in graphical interfaces the top-left attracts the most attention and the top-right probably the least; also, the competing phrase From The Web is bold-faced and set in a larger font size. This ties in closely with the aforementioned information addiction, a habit which develops inconspicuously.

Ad reel on theatlantic.com

Even more inconspicuous is the conflation of on-site articles with actual paid links under the umbrella term “promoted links”, as is done in the following example on theguardian.com:

Ad reel on theguardian.com, with no explicit mentioning of the commercial nature of the suggestions

This poses a moral hazard, because the uncontrolled response following this unconscious ad exposure constitutes a reduction of our autonomy. Considered per individual instance, this reduction of autonomy is benign, but when the user is repeatedly exposed to these inconspicuous advertisements it can be used to alter behavior, on a large scale. Realise that these ad platforms, Revcontent, Gravity, Taboola and Outbrain, can reach billions of users. It is easy to imagine the ad platforms being swamped with clickbait-type fake-news headlines, where a state actor funds the initial clicks so that the ad networks push the headlines through their networks.

Location-based advertisement: targeted advertisement in the context of your current location

Using technology that triangulates your position from the signal strength of your Bluetooth or WiFi radio, your location in shopping malls, streets or stores is tracked, stored and sold. This basically means that your phone’s MAC address is logged and processed. The resulting behavioral data is sold to data brokers, marketeers and shop owners, for advertising and for shop layout/inventory optimization. In fact, this happens in real time, so that just-in-time marketing can take place. Location-based advertisements are a form of targeted advertisements.
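As a rough illustration of how such tracking works, here is a sketch that converts a received signal strength (RSSI) into an approximate distance using the standard log-distance path-loss model; the MAC address, RSSI readings and constants below are made up. With several sensors and trilateration, the same idea yields a position rather than just a distance.

```python
# Minimal sketch of how in-store tracking can estimate proximity from the
# signal strength (RSSI) of a phone's WiFi/Bluetooth packets, keyed by MAC
# address. The MAC address, RSSI values and path-loss constants are made up.

def estimate_distance_m(rssi_dbm,
                        tx_power_dbm=-59.0,      # assumed RSSI at 1 metre
                        path_loss_exponent=2.0): # ~2 in free space, higher indoors
    # Log-distance path-loss model: RSSI = tx_power - 10 * n * log10(d)
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))

# One sensor logging passing phones (hypothetical sniffed packets).
observations = [
    {"mac": "aa:bb:cc:dd:ee:ff", "rssi": -62},
    {"mac": "aa:bb:cc:dd:ee:ff", "rssi": -75},
]

for obs in observations:
    d = estimate_distance_m(obs["rssi"])
    print(f"{obs['mac']} is roughly {d:.1f} m from the sensor")
```

Because the MAC address is a stable identifier, a chain of such sensors turns these per-packet distance estimates into a movement profile tied to one device.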

Targeted advertisement and behavioral targeting:

Basically, this is the commercial or ideological application of the psychological phenomena discussed earlier (confirmation bias, framing, priming and cognitive consonance) to more effectively push or nudge a user towards certain behavior. An increasingly common practice is just-in-time marketing: marketing that is pinpointed to your very specific context at a very specific moment in time. This type of marketing relies on up-to-date information such as

  • your consumer characteristics
  • your recent consumption behavior
  • your location, direction of travel

Just-in-time marketing relies primarily on data that may be considered highly personal depending on the context. However, there is so much data available on your preferences that, combined with the available demographic data (such as your age and occupation) and your contextual data (such as your location), something like psychographic hacking becomes a possibility. The idea is that given enough of the right information, it can be inferred what type of information can be offered to nudge you in a particular direction. So behavioral targeting combined with just-in-time marketing is not just an invasive means of tracking, it is also an effective tool to manipulate you; a sketch of the kind of decision rule this enables follows the demo below.

V-Count demo, a combination of just-in-time marketing and demographic data
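
To illustrate the mechanism rather than any particular product, here is a hedged sketch of such a decision rule; the segments, triggers and offers are entirely made up:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Profile:
    segment: str                 # inferred consumer segment (hypothetical)
    recent_purchases: List[str]  # recent consumption behavior
    location: str                # current geofence, e.g. from the location tracking above
    heading_towards: str         # inferred direction of travel

def pick_offer(p: Profile) -> Optional[str]:
    """Hypothetical just-in-time rule: context plus history decide the nudge."""
    if p.location == "mall_food_court" and "coffee" in p.recent_purchases:
        return "push: loyalty coupon for the coffee chain 50 metres ahead"
    if p.heading_towards == "electronics_store" and p.segment == "price_sensitive":
        return "push: limited-time discount on headphones"
    return None  # no trigger matched, stay silent

print(pick_offer(Profile("price_sensitive", ["coffee"], "mall_food_court", "exit")))
```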

The function creep is obvious; I leave the details to the reader.

Control paradox

Offering users control over their privacy settings gives a false sense of trust and leads to complacency when it comes to actually protecting your personal privacy. The paradox is that when you are offered no control you will be more vigilant about your privacy, but when you are offered control, that in itself is likely to lead to acceptance of less privacy, even when it is based on an opt-out scheme. Facebook, Google and the other social media platforms are well aware of this. It is also related to the idea of creating so-called online safe spaces, where you can trust the content as it is presented to you. The fallacy of a trusted relationship with inanimate, non-sentient corporations is the result not just of evil-doing manipulators, but of a long-developing relationship between consumers and consumption suppliers who have relied on increasingly advanced persuasion techniques. The digital information addiction discussed in this text is perhaps the most extreme result of these persuasion techniques.

Malvertising: the injection of malware through advertisements

From the Wikipedia-description:

“In 2012, it was estimated nearly 10 billion ad impressions were compromised by malvertising.”

How does this work? Either through malicious scripts that are activated once the ad is loaded, or by using the media interpreter as the transporter of a script that is hidden in the medium. The advertisements you see are digital media formats, and each format has an interpreter that translates the binary content into the rich media displayed on your screen. Rich media formats such as .swf (i.e. Flash) are notorious for this reason, as the interpreter can also execute the malicious scripts. A media format such as .GIF can be used to transport malware onto your system, and exploits in .JPEG and .TIFF have been used in the past to execute scripts.

The ad-platforms and the ad-publishers are partially to blame for displaying malicious advertisements, since they are responsible for vetting (or rather for not vetting) the ad-providers.

Leakware: a type of ransomware that hijacks your computer or software

Why is this in the list of definitions? Because this form of ransomware uses your personal information as leverage, and because the infrastructure that is built to gather your personal information is not, and cannot be, fully secure. Examples of large-scale publicized hacks are

  • Yahoo, login information for 1 billion accounts
  • LinkedIn, passwords of 117 million users
  • JPMorgan, account information of 100 million customers
  • AdultFriendFinder, login information for 340 million accounts
  • Cloudflare, leaking encryption keys for large sites such as Uber, OK Cupid, Fitbit and about 3,000 other sites, for several months

and so much more, captured in some awesome graphics on the following website:

Another use for personal information is demonstrated by the ransomware Spora that changes the ransom depending on whether they think you are a businessman or not.

Spyware: software and hardware that is designed to covertly gather personal information

We should note that adware, web beacons and tracking cookies are actually types of spyware as hardly any web beacon or tracking cookie is explicitly approved.

The following documentary by Al Jazeera illustrates why the average consumer should be wary of sharing their real identity online, or indeed any unprotected sensitive information. In a nutshell: not only is your personal information worth money, there are dedicated developers of spyware who steal your private communication and sell it to the highest bidder.

Leakware, malware and spyware are all exemplifications of an inevitable function creep pushed forward by the monetisation of personal information. This monetisation, and the infrastructure that facilitates this monetisation, starts with the advertisement industry and more particularly the combination of data brokering, ad publishing and real time bidding.

Cookie wall: the concept of a cookie-acceptance requirement to access a website or a service

Direct examples abound, at least if you visit the websites of companies located in the EU. On the one hand the cookie wall is an annoying result of legislation aimed at protecting privacy; on the other hand it demonstrates that such legislation is not enough to protect privacy, because the choice to either accept cookies or have no site access at all is not really a choice.

A good indirect example is the often obligatory access to private information demanded by cell phone apps: from your agenda and your contacts to your messages, and even access to your camera and microphone. In the case of Facebook this goes two ways: firstly you need to accept full access to your private information, and secondly you need the app to get access to the web-based interface on different machines.

Google leaves little choice…

The European Commission wants to go one step further. Instead of requiring individual websites to implement opt-in cookies, there will be a central browser-based cookie switch. If the user decides to reject cookies in the browser, he will simply be denied access to websites that claim to require cookies. This means that to access individual sites that claim to need cookies, one has to temporarily turn on cookies for all sites. Not surprisingly, the European Commission announced the regulation as a boost to the data economy. The new ePrivacy regulation may still be amended or rejected following suggestions from the European Parliament; however, the European Commission is not legally bound by the advice of the EP.

The ePrivacy proposal is not just about harmonisation and improved enforcement, it is also about making it easier to use/trade personal information:

New business opportunities: once consent is given for communications data — content and/or metadata — to be processed, traditional telecoms operators will have more opportunities to provide additional services and to develop their businesses. For example, they could produce heat maps indicating the presence of individuals; these could help public authorities and transport companies when developing new infrastructure projects.

Simpler rules on cookies: the cookie provision, which has resulted in an overload of consent requests for internet users, will be streamlined. The new rule will be more user-friendly as browser settings will provide for an easy way to accept or refuse tracking cookies and other identifiers. The proposal also clarifies that no consent is needed for non-privacy intrusive cookies improving internet experience (e.g. to remember shopping cart history) or cookies used by a website to count the number of visitors. — source

A proper regulation starts with the distinction between first- and third-party cookies, where the latter are not necessary for technical reasons and may result in personal data being stored, processed and sold by said third parties. Already, first-party cookies, necessary for a normal user experience, can be placed without permission; if the proposal of the EC goes through unaltered, it would de facto do the same for third-party cookies.

In May 2018 two European laws will have come into effect: the GDPR and the above-mentioned ePrivacy regulation. Whereas the ePrivacy regulation is de facto a victory for tele-marketeers, the GDPR at least puts severe restrictions on the storage and transmission of personal information, and requires accountability and transparency in the processing of personal information. However, recital 47 explicitly states that direct marketing forms a legitimate interest to process personal data:

The processing of personal data for direct marketing purposes may be regarded as carried out for a legitimate interest. — source

It is exactly this purpose, the use of personal information to drive targeted advertisements and to generate leads, that has created the backbone of the personal information market. Also, by making explicit that basically anything can be stored as long as certain requirements are met, the threshold to actually ask for permission is lowered; i.e. the net effect of the GDPR is that more information will be stored and traded. The only difference between the old and the new situation is that massively storing personal information is now perfectly legal.

Lobbyism in practice? source

At the same time, the commercialisation of online information has led to legal constraints that will hinder the freedom to receive and share information. The most extreme examples are the so-called link tax and upload filter in the EU: literally a tax on references to commercial websites, and an a priori filter on media files uploaded to websites to prevent copyright infringement, completely disregarding the possibility of satire, a use specifically exempted in national copyright laws. The link tax, the upload filter, the PSD2 directive (to allow any payment service to share your payment history with other payment services), the GDPR, the ePrivacy regulation and a formalisation of the requirement for media platforms to remove ‘terrorist content’ within an hour of a removal request from “national competent authorities” should, in light of the earlier EU code of conduct against hate speech and Germany’s hate-speech-removal law, raise red flags.

These EU measures should be seen in the context of the goal to create a single digital market. I.e. the EU strives for homogeneity and clarity, predictability and accountability, and finally, transparency and control.

This is the ultimate exemplification of the commercialisation of information: an international governmental institution that creates a body of legislation to streamline the process of trading information.

Cookie syncing and Super cookies: Cookies that are persistent through the synchronisation of individual cookies

How does this work? Take for instance theatlantic.com, and suppose you are reading an article. While you are reading, there is periodic communication with the following addresses (observed February 3, 2017):

  • ping.chartbeat.net: 1x1 .img file in the response, and request-query parameters such as the source, the address, identifiers, the genre and the author.
  • googleads.g.doubleclick.net: cookie identifiers.
  • edge.simplereach.com: keep-alive response, and request-query parameters such as the title, the author, the genre and of course identifiers.
  • quantserve.com: real-time bidding for ads, content information and identifiers.
  • krxd.net: a pixel.gif in the response, and request-query parameters with my browser, my operating system, my country and province/region, article information.
  • adnxs.com: my IP address in the response as X-Proxy-Origin, user identifiers and cookies.
  • scorecardresearch.com: information about the articles and identifiers.
  • nexac.com, openx.net, jivox.com, c3tag…and many more, believe it or not.

You might refer to these communicators as beacons, all of which are 3rd-party cookies meant to receive and broadcast your user activity and to facilitate the publishing and auctioning of advertisement space. The basic operation is that the host page (e.g. theatlantic.com) performs an HTTP request that contains consumer/meta/content information, and then receives either an empty response or a small image, with some identifiers in the URL (probably to establish a trail of breadcrumbs).
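
Roughly what one of these beacon calls looks like is sketched below; the host and parameter names are invented, not the actual endpoints of the companies listed above, but the shape of the request (a tiny GET with identifiers and page metadata in the query string) is typical:

```python
import uuid
from urllib.parse import urlencode

# Illustrative third-party beacon call; all names are made up.
params = {
    "uid": str(uuid.uuid4()),                       # per-browser identifier from a cookie
    "url": "https://www.example.com/some-article",  # what you are reading
    "title": "Some article",
    "section": "politics",
    "ref": "https://www.example.com/front-page",    # where you came from
}
beacon_url = "https://tracker.example.net/pixel.gif?" + urlencode(params)
print(beacon_url)
# In a real page this request fires automatically and is answered with an
# empty body or a 1x1 image; here we only construct the URL, since
# tracker.example.net is a placeholder domain.
```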

Connecting to 11 websites through the browser led to connections with 73 third-party sites. The tool used here is Redmorph (for Chrome); Lightbeam is a similar tool for Firefox.

In the more extreme case, special HTTP headers are injected into your HTTP requests. This can happen through (malicious) software that re-routes your traffic through a proxy (a man-in-the-middle attack), or directly at the internet service provider. The latter was basically the exploitation, by an advertisement company, of a permanent header called X-UIDH that was injected by the ISP Verizon, who themselves used this header to track customer behavior. Another option, albeit illegal, is to sniff out your username from your browser’s password manager:

Let me write this out a bit more clearly: whenever you visit a website that contains such beacons, there is periodic communication with multiple third-party servers that register what you are watching or reading, and that potentially synchronise the cookie with other cookies based on your IP address, the email address you use to log in, or identifiers that persist over multiple websites and multiple sessions because you use, for example, a generic 3rd-party login script (like the login scripts from Facebook and Google). This implies that, in principle, all of your online activity on websites that contain such beacons can be centrally monitored, and it does not matter whether you delete your cookies, since persistent identifiers such as IP addresses are used.

This data regarding your online activity is actively processed by data brokers and marketing companies to build a personal profile that is either sold to 3rd parties (like telco-providers) or used directly for recommendation services.

A basic tip: frustrate any man-in-the-middle (MITM) type of injection by only allowing traffic to and from HTTPS sites:

Stateless tracking ("Fingerprinting"):

Supercookies can be described as stateful trackers, requiring back-and-forth communication to establish an identity. There is also stateless tracking, which establishes a "fingerprint" of a user's device and software. Where supercookies can at least be detected, stateless trackers are passive. In this case the fingerprint is a unique combination of characteristics that define your browsing context, including your browser details (from installed apps to screen resolution and extra font types), your IP address and your operating system.
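
A minimal sketch of the idea: passively observable characteristics are canonicalised and hashed into a stable identifier, with no cookie or other state stored on the device. The attribute list below is illustrative, not exhaustive:

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Hash a canonicalised set of device/browser characteristics."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Illustrative attributes that a script or server can observe without consent.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ...",
    "screen": "1920x1080x24",
    "timezone": "Europe/Amsterdam",
    "language": "nl-NL",
    "fonts": ["Arial", "DejaVu Sans", "Liberation Serif"],
    "ip_prefix": "192.0.2.0/24",
}
# The same combination of characteristics yields the same identifier on every
# visit, so deleting cookies changes nothing.
print("fingerprint:", fingerprint(visitor))
```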

According to Mayer & Mitchell, 41st Parameter/AdTruth, BlueCava and Iovation use fingerprinting to track users. This technology has found another use among malware distributors, who can use fingerprinting to detect so-called honeypots, the machines used by malware investigators.

We have reached the point where it is practically impossible to avoid being tracked. Why? Because all these individual breadcrumbs left by third parties on different websites send information to only a few central Real Time Bidding (RTB) platforms, where it is used to estimate the best prices for ad clicks and views, and to data brokers, where the information is centrally resold.

These RTB platforms and data brokers, and the subsequent hoarding of personal information, are a direct result of an economic push for personalised advertisements.

What are the issues with automated newsfeeds?

Filter bubble: the concept of being shielded from opinions and information that do not resonate with your own personal beliefs, see this Ted talk by Eli Pariser.

The filter bubble is supported by anecdotal evidence and increasingly by empirical data, which suggests that through social media we receive information from a decreasing variety of sources and that cross-cutting information is even suppressed. Other research shows that this can generate echo chambers, wherein like-minded individuals reinforce their preconceived ideas. This Pew survey seems to suggest that people on Facebook are exposed to a diverse set of opinions. However, the fallacy here is that the mere concept of diversity is dependent on the context and the individual: if individuals state that they have received diverse opinions, they are judging diversity from their own frame of reference. A similar fallacy is at work in the paper by Boxell et al., which moreover considers a period up to 2012, when the use of recommendation engines for news articles was not at all common. Most importantly, Boxell et al. compare completely different age groups, and it is well known that political inclination changes with age and in fact becomes more polarised with increasing age.

Pew

Furthermore, we need an objective diversity measure from a single, fixed frame of reference, for the simple reason that only then can we study its evolution over time and compare results quantitatively with other research. A recent publication denies the existence of a filter bubble merely because, compared to non-users, social media users were more likely to be exposed to news from both sides of the political spectrum. This completely ignores the evolution of those figures for the different ideological subgroups. The filter bubble hypothesis and its underlying mechanism suggest that the ideological distribution becomes more polarised, i.e. flatter in the middle and higher towards the edges; the evolution of the following distribution can therefore make or break the hypothesis. The figure below shows that more polarised information is shared more often, which is to be expected given the importance of salience for user response.

Alignment of opinions among information sharers on Facebook, source
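
One way to make 'flatter in the middle and higher towards the edges' measurable is to track the share of shared items whose alignment score falls in the ideological tails, period after period. The sketch below does exactly that on invented alignment scores, using a -2 to +2 scale like the figure above:

```python
def tail_share(alignments, threshold=1.5):
    """Fraction of shared items whose ideological alignment lies in the tails."""
    return sum(1 for a in alignments if abs(a) >= threshold) / len(alignments)

# Hypothetical alignment scores of items shared in two periods
# (-2 = strongly liberal, +2 = strongly conservative).
period_one = [-1.8, -0.4, 0.1, 0.9, 1.7, -0.2, 0.3, -1.6, 1.5, 0.0]
period_two = [-1.9, -1.7, -0.1, 1.8, 1.9, -1.6, 1.7, -1.8, 1.6, 0.2]

for label, sample in [("period one", period_one), ("period two", period_two)]:
    print(label, "tail share:", round(tail_share(sample), 2))
# A rising tail share over time is the kind of evolution the filter bubble
# hypothesis predicts; a single snapshot can neither confirm nor refute it.
```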

Another important point that is often missed in the filter bubble discussion is that the filter bubble is the result of the personal preferences of the individual. If the individual has a broad range of interests, his or her filter bubble will obviously be larger. This means that when you define groups based on how intensively they use social media, you should make sure that you are not making a pre-selection in terms of broadness of interest.

Another Pew report contradicts the assertion of an overall diverse exposure and shows that the media landscape for conservatives and liberals is qualitatively different. Another fallacy, or weakness, of these and other reports is the static nature of their results. When studying the presence of a filter bubble one should not look at the present state of exposure and its diversity, but rather at the evolution of those metrics. Most importantly, the filter bubble does not constitute an information sphere with an impermeable border; it constitutes a semi-permeable border that is more likely to let through information that increases cognitive consonance.

A rather crucial aspect that is often overlooked is the fact that we may already be inside a filter bubble. So to identify this effect, if it exists, in the general population, we need a reference group that is unaffected by the possible sources of filter bubble creation.

One obvious hypothesis that has not been rigorously tested in filter bubble research is the effect of age as a proxy for the relative weight of social media versus traditional news sources. For instance, one might expect that with an increasing relative contribution of social media with respect to traditional news media, the information consumer becomes more susceptible to fake news and alternative facts; e.g. 44% of millennials versus 6% of the 55+ age group believe in the flat earth theory.

If one is exposed to a narrow band of opinions that is consonant with one's own, the individual's own measure of diversity will likely be affected negatively. I.e. even if someone thinks he or she is receiving a diverse set of news items, it may actually be very (and increasingly) biased in the eyes of an outsider. Hence the use of surveys, still the method of choice for social scientists, cannot by itself provide evidence for or against filter bubbles.

Two distinct communities based on political retweets, the left/right leaning prediction is 87% accurate, Conover et al.

A side effect of these filter bubbles is that citizens can be easily identified as belonging to any particular political group, say a group of dissidents, by for instance the government, insurance companies or potential employers.

Going from being surrounded by opinions that resonate with your own to being surrounded by dissenting opinions was hard to bear for the groups of left- and right-wing voters who swapped newsfeeds on Facebook in an experiment by the Guardian. It seems reasonable to suggest that this resentment when exposed to dissenting opinions becomes stronger the longer one stays inside the filter bubble, i.e. the longer your viewpoints go unopposed, the more you resist dissenting opinions.

“For too many of us, it’s become safer to retreat into our own bubbles, whether in our neighborhoods or college campuses or places of worship or our social media feeds, surrounded by people who look like us and share the same political outlook and never challenge our assumptions.

The rise of naked partisanship, increasing economic and regional stratification, the splintering of our media into a channel for every taste — all this makes this great sorting seem natural, even inevitable. And increasingly, we become so secure in our bubbles that we accept only information, whether true or not, that fits our opinions, instead of basing our opinions on the evidence that’s out there.” — Barack Obama

Further empirical research is needed to determine the prevalence and severity of the filter bubble, but the fundamental mechanism to create it is already given by the interplay of cognitive biases and the modus operandi of revenue-driven recommendation engines. The question is: how is the mechanism that enables a filter bubble counteracted by neutral news platforms, open discussion and the day-to-day interaction with non-like-minded people (say, at work)? For filter bubble research it is necessary to (a minimal sketch of the first few requirements follows the list):

  • define cross-topic diversity metrics
  • apply a fixed frame of reference to measure the polarity of opinions
  • track the polarities over time
  • segment the measuring/survey groups by type of news consumption, and extremity of the personal ideological inclinations
  • distinguish between active information retrieval and passive information retrieval
  • focus on cognitively sensitive groups, i.e. young adults, adolescents, unemployed
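
As a minimal sketch of the first few requirements, the snippet below computes a fixed-frame diversity metric (Shannon entropy over source-ideology labels) per user segment and per period, on entirely made-up exposure logs. A value that falls over time for a segment is the kind of signal the filter bubble hypothesis predicts:

```python
import math
from collections import Counter, defaultdict

def exposure_entropy(labels):
    """Shannon entropy of the ideological labels a user was exposed to (fixed frame)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical exposure log: (period, segment, ideological label of the source).
log = [
    ("2016", "heavy_social", "left"), ("2016", "heavy_social", "right"),
    ("2016", "heavy_social", "centre"), ("2017", "heavy_social", "left"),
    ("2017", "heavy_social", "left"), ("2017", "heavy_social", "centre"),
    ("2016", "traditional", "left"), ("2016", "traditional", "right"),
    ("2017", "traditional", "centre"), ("2017", "traditional", "right"),
]

grouped = defaultdict(list)
for period, segment, label in log:
    grouped[(period, segment)].append(label)

for (period, segment), labels in sorted(grouped.items()):
    print(period, segment, "exposure diversity:", round(exposure_entropy(labels), 2))
```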

Also, and this is very important, researchers in this field should realise that the application of automatic recommendation engines to online news articles is fairly recent. Scalable algorithms for online (as in live) news recommendation only appeared in the literature around 2007 and were likely adopted by industry a few years later.

The use of automatic recommendation engines can create a positive feedback loop because it is linked directly to cognitive consonance. The proposition ‘the internet will lead to more polarisation’ implies that effective personalised communication is by definition the dominant information source and this has not been the case yet.

What will happen, for instance, if working at a distance becomes more prevalent, or indeed if unemployment soars due to massive automation, and more and more time is spent online? What happens if the traditional mass media, with human editors, also start to apply recommendation engines? What happens if the algorithms behind the recommendations become 100% accurate? I.e. it is absolutely crucial to, at the very least, hypothesize limit cases. What happens if social media platforms gain increasing dominance over our information intake, increasing the control span of one central news supplier who also happens to know our personal affiliations?

At the same time we should keep in mind that this filter bubble is probably more or less permeable depending on the level of cognition and consciousness that is associated with the information. It will be much harder to create an effective political filter bubble than to create a filter bubble regarding books, video games or movies.

François Chollet lists several techniques (or exploits as he calls them) to purposely create political filter bubbles:

  • identity reinforcement
  • negative social reinforcement
  • positive social reinforcement
  • sampling bias
  • argument personalisation

and to combine all of these to effectively manipulate individuals you would need information centrality, information control, personal information, knowledge of social connectivity and advanced AI.

I.e. the filter bubble does not exist yet, except for specific apolitical niches, but the foundation is being laid as we speak.

You only need to convince 10%

As said, one of the misconceptions surrounding the mysterious filter bubble and the echo chamber is that it revolves around the behavior of the mean, moderate citizen. Observing moderate citizens to shed light on the filter bubble is an ineffective approach, as it will produce the least pronounced effects (due to significant exposure to diverse opinions) and, most importantly, it diverts attention from the more pertinent rise of extremism in fringe groups that are susceptible to radicalization through filter bubbles. A malicious actor might only need to manipulate the 10% most susceptible citizens into believing falsehoods for the falsehoods to take hold among the other 90%:

Combined with the notion that political extremism is associated with a greater likelihood of adopting conspiracy theories about the opposing groups, and given the high likelihood that this is a bidirectional, self-reinforcing effect, it is easy to see how a malicious actor can steer ideo-political narratives on a large scale.

Information polarisation: the idea that gradually, a person is exclusively exposed to a specific world view through a reinforcing feedback mechanism

microscopic: if you combine the echo chamber with an algorithm that suggests new articles merely based on the likelihood of clicking, as estimated from the articles that have been clicked previously, you get an increasingly narrow information slit.

macroscopic: due to the bandwagon effect, the importance of salience and the narrowing information slit of individuals, the information slit will also become smaller on a macroscopic level. This is self-reinforced by the effect of ranking on click-through rates and the dependency of ranking on popularity; a toy simulation of the microscopic mechanism follows below.
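
The toy simulation assumes, deliberately strongly, that the recommender only ever boosts topics that were clicked before; the topics, weights and click model are all invented:

```python
import math
import random

random.seed(1)
TOPICS = ["politics", "sport", "science", "culture", "economy"]
weights = {t: 1.0 for t in TOPICS}          # the recommender's belief about the user
true_interest = {t: 1.0 for t in TOPICS}    # the user is genuinely interested in everything

def topic_entropy(w):
    total = sum(w.values())
    return -sum((v / total) * math.log2(v / total) for v in w.values())

for step in range(301):
    # The recommender shows the topic it currently weighs highest (with a little noise).
    shown = max(TOPICS, key=lambda t: weights[t] * random.uniform(0.9, 1.1))
    # The user clicks in proportion to genuine interest; a click reinforces the weight.
    if random.random() < true_interest[shown] / sum(true_interest.values()):
        weights[shown] *= 1.1
    if step % 100 == 0:
        print(f"step {step:3d}  entropy of recommended topics: {topic_entropy(weights):.2f}")
# The entropy falls even though the user's interests never changed: the slit narrows.
```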

Convergent thinking: the idea that the combination of the effects of wisdom of the crowd, echo chambers, confirmation bias and filter bubbles has a diminishing effect on the diversity of opinions and the effectiveness of pluralism.

As a proxy the convergence of taste is probably easier to demonstrate.

.doi=10.1.1.399.6701, Lui et al.

The above picture displays a network of books sold, connected to the books that are suggested alongside them: the squares indicate neutral books, the diamonds conservative books and the circles liberal books, while the two colors indicate the two communities identified by a clustering algorithm. Clearly the liberal and conservative groups are each inside a ‘bubble’.

Function creep: the idea that a functionality enabled for a benign purpose can unwittingly shift towards serving a harmful one.

A good example of this is Gmail. Google has an infrastructure in place to monitor your emails, determine the relevant products, allow for real-time bidding and place ads. Without any stretch of the imagination this infrastructure can be used by the government to monitor emails for security purposes, and it makes the argument against such monitoring much weaker: as the email user has already agreed to the use of personal information for commercial advertisements, why not use it to protect national security? In fact it has been used, by commercial 3rd parties who could mine through all your email content.

One might argue that the interests of the big information-driven technology companies align with the interests of the intelligence agencies. Shoshana Zuboff introduced the term military-informational complex, to describe this alignment of commercial and governmental interests and the drive towards 'perfect control'.

We know from the documents released by Edward Snowden that this in fact has happened, and is most likely happening at this moment.

Another example is Facebook, which recently built a censorship tool to be able to enter the Chinese market. Who is to say that this exact same tool will not be used by other governments, or that Facebook will not use it for its own commercial benefit, for instance to safeguard access to other markets?

The biggest potential function creep of all is the creation of an infrastructure, a set of methodologies and algorithms to monitor, categorize, evaluate, judge and manipulate citizens

Case in point: the large-scale, systematic use of software exploits to hack into communication devices. Most recently the Vault 7 leak from WikiLeaks exposed not only the CIA actively collecting and applying such exploits, but also a market exchange of sorts for personal information involving other intelligence agencies, such as GCHQ and the NSA, and cyber arms contractors. As more and more personal information is shared between more and more devices, it becomes more attractive to maliciously use (or even create) exploits, while at the same time the number of exploits increases due to the larger number of devices.

Case in point: the combination of behavioral targeting and machine learning applied to the unholy task of nudging citizens to vote for a particular candidate, using data that was illegally mined through Facebook apps. Whether this had an actual effect is difficult to measure, but keep in mind that particularly in the United States elections there is only a margin of a few percent between the candidates, with a horse race in each state, and as we discussed, we are also influenced subconsciously. So the claim that, for instance, fake news has had no influence on the elections, based on a survey about the recollection of received fake news, is at best scientifically dubious and at worst naive, since it implies that people are aware of the external influences on their state of mind. The public debate following the apparent abuse of personal data mostly ignored the most shocking fact: that it was a collateral effect of Facebook’s business model, which is the exploitation of personal data to sell targeted ads.

Data analysis: you are doing it wrong

Case in point: Sesame Credit, sold as a credit scoring system that promotes transparency and honesty, is really the precursor of a social credit system. Is this inconceivable in the West?

Technological progress poses a threat to privacy by enabling an extent of surveillance that in earlier times would have been prohibitively expensive.

— U.S. v. Garcia, 2007

No, it is not. In a fragmented form it is probably already in place.

Other examples are Google DeepMind’s automatic lip-reader and the public face recogniser FindFace used on VKontakte, a technique which in one form or another has been implemented on Facebook: it is easy to imagine the surveillance possibilities this gives to malicious actors. Besides the possibility of disclosing your location, your identity or the identity of your friends using facial recognition, machine learning can also be used to infer your sexuality, mental health or criminal tendencies. One of the focus points of this article is the use of microtargeting to steer political propaganda. With no stretch of the imagination, the information used to target you can be used to attach your face to a political affiliation.

Let me spell it out for you: this technology enables anyone with minimal technical knowledge to scan through publicly available images, recognize your face automatically, attach metadata to it and even infer features regarding your sexuality. Not clear enough? Suppose I take a frontal picture of a random person on the street. Suppose I am the most perverted sadist on the planet and I want to abuse lonely women: I see a woman, take a picture, and get a feed of her personal information, including her address.

Fast-forward to 2020: the above scenario is very real and no longer hyperbolic:

and available to the masses via:

More clarity required? OK, what about Google getting into the insurance business? Yes, that also means health insurance.

The use of detailed and intimate personal information to determine insurance premiums will undermine the solidarity principle. Remember that this personal information was originally shared to cater for personalised advertisements; in fact, this personalisation was presented to you as an argument to agree with the terms and conditions.

Imagine (edit: 01–01–2020) that we can just buy your up-to-date location information from some 3rd party:

The Evercookie is a JavaScript-based cookie built by Samy Kamkar that was used (or at least investigated for use) by the United States’ National Security Agency for tracking internet behavior on the Tor network.

Leaked presentation slide from the NSA

It is not hard to imagine that, in the mean time, a technology like stateless tracking has ended up on one of those slides.

A more benign example of function creep is that of a shift from dynamic pricing to personalised pricing, and indeed price discrimination.

Perhaps the most explicit and cynical example of function creep is Palantir, which is built on technology developed in Silicon Valley for entirely different purposes. Palantir works for the United States government; perhaps they worked on the disposition matrix, which was used to help determine kill targets.

In Germany, the sentiment in government circles is shifting towards a broader censorship apparatus (delegated to private parties) under the guise of curbing hate speech, followed by more invasive control of communication devices:

De Maizière also wants the security services to have the ability to spy on any device connected to the internet. Tech companies would have to give the state “back door” access to private tablets and computers, and even to smart TVs and digital kitchen systems. — source

This hate-speech censorship may at first seem agreeable, depending on your point of view, but keep in mind that these measures and the apparatus put in place to enable them will stay in place, even when the (genuine) hate speech has died down. Given that the government is tasked with protecting the public order, and given that any form of strong public dissent may lead to a disruption of said order, it is very convenient to have a governmental means to end such disruption: a means that does not require the use of the security forces, but simply the removal of opinions and the muffling of public debate. Never underestimate the power of convenience.

Machine learning model: an inferred simplified description of reality that allows for the approximation of classifications based on observational data.

Take for example Facebook's automatic face recognition software, which is able to detect your face and your friends' faces. This model is not re-trained from scratch every time you go online; it is persistent between training periods. The underlying information is stored in bulk and processed in bulk. The same holds for Google, which keeps track of your emails, your search queries and your browsing history: this data is stored and analysed periodically.

In fact, this same principle holds for all trained models. In other words, even if your personal data is not stored explicitly, your model is. That is, the model from which it can be inferred who you are, and perhaps what your sexuality or political preferences are, based on minimal online information such as your 'like' behavior, is stored and available for application.
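
A minimal sketch of this point, using scikit-learn and entirely synthetic 'like' data: once the model is trained and serialised, the raw likes can be thrown away while the inference capability persists.

```python
import pickle
import random

from sklearn.linear_model import LogisticRegression  # third-party: scikit-learn

random.seed(0)
PAGES = ["page_a", "page_b", "page_c", "page_d", "page_e"]

def synth_user(label):
    """Synthetic 'like' vector; the label stands in for a sensitive attribute."""
    bias = 0.8 if label else 0.2
    return [1 if random.random() < (bias if i < 3 else 1 - bias) else 0
            for i in range(len(PAGES))]

labels = [0, 1] * 200
X = [synth_user(label) for label in labels]

model = LogisticRegression().fit(X, labels)

# The raw likes could now be deleted; the trained model is what persists...
blob = pickle.dumps(model)

# ...and it can later be applied to anyone with only a handful of likes.
restored = pickle.loads(blob)
new_user = [1, 1, 0, 0, 0]
print("inferred sensitive attribute:", restored.predict([new_user])[0])
```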

Privacy legislation should involve not just raw behaviorial and personal information, but also the models from which personal information can be inferred.

Modelling of individuals leads to their dehumanisation

The most elusive, and possibly the most important, aspect of the algorithmification of our societies is the fact that it reduces our experience of the ‘other’ to an interaction with abstracted representations, as opposed to direct interaction with the real person: representations based on a limited set of features, with biases and other model limitations. This abstraction of human relationships under-represents the truth, creates room to interpret the ‘data’ as one pleases, and with that opens the possibility to project personal (or societal) prejudice. I would argue that this abstraction underlies the polarisation that has been creeping into online communities, and indeed society at large, over the last decade.

Echo chamber/circle jerk: the notion that one is more likely to communicate with people that are likeminded

The reason is incredibly simple but robust: by avoiding non-like-minded people you avoid questioning yourself. This ties in closely with cognitive dissonance: it requires energy at the physiological level to restructure your thought patterns, whereas you feel a positive sensation when your ideas are confirmed (consonance).

This research states that selective exposure to information (which will naturally be the case in a filter bubble) can also generate echo chambers. I would like to reiterate that classical media (like newspapers) do not self-reinforce consonance-based biases, whereas online interactive media clearly do.

Groupthink: the idea that within a group, the desire for conformity and harmony leads to dysfunctional decision-making, self-censorship and intolerance to dissidence

This ties in with the chilling effect, where the fear of expulsion from the group leads to self-censorship.

Tribes, filter bubbles, echo chambers: all of these are communicative ecosystems built around a group that is related to your own identity. To some extent you will already be engaged in groupthink processes. Most of these will be benign, from friend groups to fan pages, and will not be detrimental to the diversity of opinions you receive and accept. But it is easy to see how online forums or chat groups predicated on a particular and specific ideological stance can quickly turn into a breeding ground for extremism. A good example is the online spread of Islamic fundamentalism or nationalist extremism in closed webspheres.

Chat bots and the automated sock puppets: smart agents that interact with humans/respond to events

I distinguish two types at present:

  1. Customer facing chat bots that facilitate large scale, low cost, customer engagement and feedback and complaints handling.
  2. Chat bots that respond to news events and public statements to have a maximum reach and impact of ideological propaganda.

The second variant can be used commercially, and it is easy to see how: whenever there is a large news event that underlines the need for your product, you start a mini-campaign on social media in relation to this news event. The second variant can also be used politically or ideologically: analogous to the commercial application, an ideological or political actor can activate bots whenever the ideology or politics can be positively associated with current events. Whereas the commercial application merely carries a small risk of diluting news dissemination, the political/ideological application can effectively be used as automated propaganda.

This was demonstrated by chat bots on Twitter during the last US presidential elections: a significant percentage of the tweets sent in relation to Trump and Clinton originated from chat bots.

Onur Varol: “In this visualization of the spread of the #SB277 hashtag about a California vaccination law, dots are Twitter accounts posting using that hashtag, and lines between them show retweeting of hashtagged posts. Larger dots are accounts that are retweeted more. Red dots are likely bots; blue ones are likely humans.” businessinsider.com

The power of these chat bots is that they can be deployed at scale and can be online continuously. At this time (2016/2017) there are about 50 million Twitter bots. Beyond the realm of American politics, it seems that preceding the Brexit referendum more than thirteen thousand Twitter bots were distributing pro-Brexit propaganda.
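
Detecting such accounts is an active research area. As a purely illustrative heuristic, far cruder than actual research tools, a bot score could combine a handful of account features like this (the features and thresholds are invented):

```python
from dataclasses import dataclass

@dataclass
class Account:
    tweets_per_day: float
    followers: int
    following: int
    account_age_days: int
    default_profile_image: bool

def bot_score(a: Account) -> float:
    """Crude 0..1 heuristic: high volume, young account, skewed follow ratio."""
    score = 0.0
    if a.tweets_per_day > 50:
        score += 0.35
    if a.account_age_days < 90:
        score += 0.25
    if a.following > 0 and a.followers / a.following < 0.1:
        score += 0.2
    if a.default_profile_image:
        score += 0.2
    return min(score, 1.0)

suspect = Account(tweets_per_day=180, followers=12, following=900,
                  account_age_days=20, default_profile_image=True)
print("bot likelihood:", bot_score(suspect))
```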

The use of ‘fast’ media such as Twitter and Facebook avoids discussions that would expose the true artificial nature of the actor and makes viral growth more likely. Making matters worse, people often do not read past the headline before sharing the information with friends, and are influenced by it themselves. This means that injecting fake news or propaganda through multiple chat bots is relatively easy.

There will be another variant of the chat bot, the automated sock puppet: chat bots with a fake human identity that actively engage in online discussions to further a political or ideological agenda. This will be a natural evolution of applied artificial intelligence, as AI researchers strive to pass more and more advanced Turing tests. I expect that, to prevent this, absolute transparency of online users will be propagated by state actors, e.g. by strongly coupling a personal identity directly to a unique online identifier.

An example of how this can be employed is the recent stance on net neutrality by the FCC. Millions of people sent comments to the FCC either to support or to denounce the position. It turned out that a large part of the denouncing comments were actually created by bots, submitted through hijacked email accounts.

Self-enforced truth, or rank-based bias: the idea that through link/recommendation based ranking information becomes authoritative in a self-enforced manner

What does this mean? The more an information source is displayed in search engine results and automated feeds, the more likely it is to be shared, which increases the likelihood of it being displayed to others, and the more likely it is to be accepted as fact due to sheer commonality. This is basically an effectuation of the bandwagon effect, a type of cognitive bias.

News feed item ranking effect on CTR, source

In a digital era where content consumption increasingly originates from search engine results and automatic suggestions, there is a point in time for each new website or app when relatively few people have used it and yet it needs to be found or suggested to others. This means that either the content providers and search engines have the responsibility to pre-select these possible winners, or the product owners have to invest money in promoting their products. The more dependent companies or individuals are on these rankings and automated suggestions, the more they are willing (and even required) to spend on advertising and the more they rely on special persuasion techniques to attract new customers. This development also makes news media more and more dependent on large network hubs such as Facebook and Google to disseminate information. This dependency, and the complete lack of algorithmic transparency, undermines the viability of dedicated news platforms and centralises information dissemination to non-democratic actors.

A good example is Facebook changing its ranking algorithm so that promoted posts are favored over non-promoted posts, without any consultation with the stakeholders.

I.e. the mere power to generate initial selections (seeds, you might call them) of top sites or top apps can determine what is and what is not a successful service or product. This power, which is basically a form of control over the supply of information, creates a demand for behavioral targeting and presents an economic tollgate for newcomers.

The same holds to a lesser degree for ideas. If the distribution of information is primarily dependent on search engine rankings and automated feeds, idealists will have to employ the psychological persuasion and addiction tools discussed earlier and will perhaps even have to pay the information gatekeepers to attract a reasonably-sized audience.

It will no longer be sufficient to rely on the power of the idea itself if the viral spread of information requires the cooperation of large commercial entities. In fact, this flies in the face of a free, open and neutral internet.

One obvious result is the creation of online monopolies, even without the network effect (which at least explains the dominance of the current social media platforms). The mechanism has already been explained, and can be stated simply as

Popularity feeds popularity until it achieves exclusivity

which means that the window of opportunity to become a major player in online social-media markets is limited to the period of infancy of those markets.

To counter this, the weight of popularity and cost-per-click in ranking should be lowered, in order to enable a larger pool of initial seeds to grow virally. I would even propose to completely ban financial incentives as weights for ranking news and otherwise informative articles. Alternatively, I would suggest that search-focused platforms apply techniques like serendipitous rank-shuffling, combined with some form of multivariate testing, to avoid killing the long tail.
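
What serendipitous rank-shuffling could look like, as a hedged sketch: with a small probability, a random long-tail item is promoted into the visible top slots so that popularity is not the only seed. The probabilities and positions are arbitrary:

```python
import random

def shuffle_rank(ranked_items, explore_prob=0.15, tail_start=20, rng=random):
    """Occasionally promote a random long-tail item into the visible top slots."""
    results = list(ranked_items)
    if len(results) > tail_start and rng.random() < explore_prob:
        tail_pick = rng.randrange(tail_start, len(results))
        promoted = results.pop(tail_pick)
        results.insert(rng.randrange(0, 5), promoted)  # surface it near the top
    return results

catalogue = [f"item_{i:03d}" for i in range(100)]  # item_000 is the most popular
random.seed(1)
print(shuffle_rank(catalogue)[:5])
# Combined with multivariate testing of the promoted slots, this gives
# long-tail content a measurable chance to prove itself.
```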

Algorithmic discrimination: the idea that algorithmic decision-making processes can be discriminatory and/or can lead to self-reinforcing feedback loops

The obvious form of algorithmic discrimination is caused by the reproduction of discriminatory bias that is present in the data used to train the algorithm. There are relatively easy fixes for this problem. To start with, the sensitive features (ethnicity, gender, religion, etc.) can be distributed uniformly over the classification; this means that the sensitive features themselves have zero predictive power but can still be used to distinguish clusters. Another fix is to equalize the true positive or true negative rate across the sensitive features with a threshold classifier. Another interesting direction is multi-objective optimisation, where the target prediction is optimised in parallel for each minority representation, with a penalty term for differences between the predictions for (otherwise) similar samples.

This does require policymakers to accept that, on aggregate, the performance of these algorithms will decrease: the amount of usable data effectively decreases due to the forced uniform distribution, the original distribution over the sensitive features probably had predictive power which is now lost, and the optimization procedure is no longer fully focussed on accuracy. However, if the assumed feedback effect of such algorithmic discrimination (ethnic profiling, for instance) holds true, in time the algorithmic performance will start to increase.
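
As a concrete illustration of the threshold-classifier fix mentioned above, the sketch below picks a separate decision threshold per sensitive group so that the true positive rate reaches the same target in each group; the model scores and labels are synthetic:

```python
def true_positive_rate(scores, labels, threshold):
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    return sum(1 for s in positives if s >= threshold) / len(positives)

def equalised_thresholds(data, target_tpr=0.8):
    """Per group, keep the strictest threshold that still reaches the target TPR."""
    candidates = [i / 100 for i in range(100, -1, -1)]  # descending 1.00 .. 0.00
    thresholds = {}
    for group, (scores, labels) in data.items():
        chosen = 0.0
        for t in candidates:
            if true_positive_rate(scores, labels, t) >= target_tpr:
                chosen = t
                break
        thresholds[group] = chosen
    return thresholds

# Synthetic scores and true labels for two groups; the model is systematically
# less confident about group_b, a common real-world pattern.
data = {
    "group_a": ([0.9, 0.8, 0.75, 0.6, 0.3, 0.2], [1, 1, 1, 1, 0, 0]),
    "group_b": ([0.7, 0.6, 0.55, 0.4, 0.3, 0.1], [1, 1, 1, 1, 0, 0]),
}
print(equalised_thresholds(data))  # a lower threshold for group_b evens out the TPR
```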

Another form of discrimination is price discrimination. The use of algorithms to optimize the probability of a sale or a click can also be used to offer user-specific prices that increase the average revenue per user. Not only can this lead to people on low incomes being offered cheaper products, which in itself can be ethnically or racially discriminatory, but people on higher incomes can also be offered similar products at higher prices, which is not only a violation of basic consumer rights but can likewise be ethnically or racially discriminatory. The taxi company Uber has blatantly stated that it applies price differentiation based on A/B testing to maximize the price per ride based on the specific routing of a ride: i.e. travelling to wealthier neighborhoods will cost you more money. Another well-known example is the travel site Orbitz adjusting its prices upwards for Apple users. Not only are these textbook examples of price discrimination, with implications for the rights of individuals, they also have societal implications as they strengthen existing socio-economic inequalities.

Other examples are: paying more because you have less access to competitors (retail price discrimination), paying more because you pose a higher risk for non-payment/devaluation based on your demographic characteristics (credit & risk based discrimination), paying a higher premium because you have a different risk profile (undermining risk pooling/solidarity) or being excluded from job opportunities because of your age.

This is closely related to function creep where the facilitation of one purpose (recommendation of products) leads to the facilitation of another purpose (price differentiation).

Growing information appetite: The idea that to maintain profitability and competitiveness in a data driven economy, continuously more data must be gathered, and with more features from which more information must be extracted

The 'data driven economy' relies on training data to generate models and insights that are sold to, for instance, governments, commercial departments and ad publishers.

To maintain the profitability of this new commodity, more and more detailed personal data is required to extract more (accurate) personal information. As in any market maturation, there is either an increased emphasis on economies of scale and/or more product diversification, where in this case your personal data is the product.

So to stay competitive as a data/insights provider, not only must more data be gathered, but also more features. In other words, the inevitable consequence of a data-driven economy that treats data itself as a commodity is a financial push towards lower privacy standards, either through lobby groups or by propagating low privacy standards in a normative sense: by offering premium discounts on your health insurance if you start wearing a health tracker, by having you fill out a questionnaire for your general practitioner on the site of the health insurance company, by kindly asking you to hand over your payment history for a mortgage loan, by storing your personal health information in the cloud or by enabling location sharing among friends.

A more direct demonstration is personal-data mogul Facebook buying ever more, and more detailed, information about its users to feed into the ad recommendation engines. Why? To facilitate an increase in revenue, as expected by the shareholders.

Information asymmetry: You know very little about the people that know a lot about you and you do not know what they know about you.

Because you do not know what personal information is (and has been) gathered by whom, you cannot defend yourself against a possible misuse of that information, and what is more, you cannot demand the removal of that information. This clearly violates the right to be forgotten.

Given that large corporations have access to and control over this information, in combination with a government that can in principle force its handover, we have in principle a society-wide imbalance of power. It is easy to see why a government would push the so-called data economy: it is basically outsourcing and enabling mass surveillance with little to no democratic scrutiny.

There is another, often overlooked aspect: the deterioration of the online negotiating position of consumers. Simply knowing who a particular consumer is enables an estimate of their purchasing power and individual demand. This undermines one of the fundamental principles of the free market and leads to the aforementioned price discrimination.

Chilling effect: The idea that due to a fear of persecution and social repudiation people refrain from exercising their freedom of speech.

From available online data it can be inferred that you are you; even without explicitly mentioning your identity, you will have left breadcrumbs that lead back to your personal identity. If this is not commonplace now, then, given the drive towards ever more data gathering, it will become so in the future.
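
A minimal sketch of the breadcrumb problem: a 'pseudonymous' activity record can be linked to a named record using nothing but quasi-identifiers (the classic ZIP code, birth date and gender combination). All records below are fabricated:

```python
# 'Anonymous' activity records released or leaked by one service...
activity = [
    {"pseudonym": "u_482", "zip": "1011AB", "birth": "1985-03-12", "gender": "f",
     "searched_for": "union organising tips"},
    {"pseudonym": "u_913", "zip": "3512KL", "birth": "1990-07-01", "gender": "m",
     "searched_for": "cat pictures"},
]

# ...and a public or purchased register that carries real names.
register = [
    {"name": "A. Janssen", "zip": "1011AB", "birth": "1985-03-12", "gender": "f"},
    {"name": "B. de Vries", "zip": "3512KL", "birth": "1992-11-23", "gender": "m"},
]

def link(records, reference, keys=("zip", "birth", "gender")):
    """Join two datasets on quasi-identifiers; no explicit identity needed."""
    index = {tuple(r[k] for k in keys): r["name"] for r in reference}
    for rec in records:
        name = index.get(tuple(rec[k] for k in keys))
        if name:
            yield name, rec["searched_for"]

for name, query in link(activity, register):
    print(f"{name} searched for: {query}")
```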

This realisation leads to an inevitable chilling effect whereby online information dissemination becomes increasingly benign in terms of government critique, level of controversy and predictability, and therefore becomes easier to control. Anonymity and privacy are key to the exercise of the freedom of speech.

We allow humour, satire or social commentary related to these topics, and we believe that when people use their authentic identity, they are more responsible when they share this kind of commentary. Facebook community standards

Being more responsible can easily be translated into being more constrained. That in itself is not a bad thing insofar as it refers to being more nuanced in argumentation and more restrained in personal attacks. The problem is that the standards for these self-constraints are vague and ambiguous at best, are determined in a non-democratic manner, and are enforced without transparency and without the possibility of rebuttal.

Which in light of an unknown degree of dismissal or even repudiation by your peers will lead to self-censorship.

What can this result in?

Unfiltered broadcasting and generation of fake news

For instance, a prime minister allegedly violating a dead animal: this unconfirmed tabloid story went viral on social media without any editorial filtering. Misinformation often comes in the form of click-bait articles aimed at triggering our curiosity. Even though reading the content might betray its falseness, people often do not read much beyond the headline. This dissemination can happen on all non-edited social media platforms, and indirectly on search engines as well. During the United States elections fake news was mostly in support of the conservative candidate, but there is no reason to believe that this is specifically related to any political ideology, and its usefulness has already been demonstrated for the progressive left. The psychological mechanisms that facilitate the success of fake news are a human characteristic, not some conservative or liberal tendency. Once the effectiveness of fake news in changing the perception of voters is acknowledged and recognised, it becomes a tool to either obtain or maintain power.

An indirect effect is that, as ‘fake news’ becomes a household term, critical journalism can easily be dismissed as such, especially since a large part of the populace obtains its news through less established news outlets. As the credibility of the different news platforms is difficult to verify, a general mistrust of mass media ensues. From that perspective, the eagerness of governments and large media corporations to create fact-checking platforms should be regarded with the utmost suspicion, as it basically forms a stepping stone to a large-scale consolidation and control of news sources. I already mentioned an example in the introduction: Germany has recently put a law in place that aims to curb not just hate speech but also fake news by obliging social media platforms to remove such content under threat of high fines. Given the complexity of defining hate speech and fake news, it is inevitable that this will not always happen with the utmost prudence. In fact, even though Facebook has several thousand employees dedicated to filtering out such content, it is impossible to thoroughly check the billions of messages that are posted every day.

I suspect that, by selectively mistrusting news sources that show dissonant information, people will tend to flock towards news sources that confirm their opinions, regardless of their actual credibility. I.e. information polarisation becomes more severe and the effectiveness of pluralism decreases.

A concrete, direct example of the risk that fake news poses is the fact that a Pakistani minister responded in earnest to a fake statement attributed to the Israeli government regarding nuclear aggression. And of course there is Pizzagate, where an armed man responded to fake news connecting a pedophile ring, Hillary Clinton and a pizza restaurant (sounding like the meme it is).

In 2017 there will be several important national elections that can shape the future of the European Union, namely the general elections in the Netherlands, Germany and France, all founding nations of the EU. Already, fake news is being directed at Angela Merkel, and it is expected that the Dutch elections will be targeted as well, as will, in fact, all countries allied to the United States. If you wonder about the truthfulness of these latter references in light of the earlier discussion, you have already underscored how important it is to treat this problem seriously.

There is another threat on the horizon: the fake-news toolbox is being expanded with technology that can mimic your voice, your face, or even you entirely. This has already been made accessible to the general public through software called FakeApp, which lets anyone create their own fake videos.

Examples are Face2Face (real-time facial reenactment) and Adobe's VoCo (voice editing).

This comes on top of the ongoing development of machine learning techniques to generate news articles.

The normalisation of censorship

Disturbing as the possibilities for massive manipulation are, the counter-effect is perhaps more alarming:

  • Automatic determination of truthfulness based on meta data and semantic characteristics: what about those false positives?
  • Truthfulness based on connectivity and so-called domain-authority: again, what about those false positives?
  • Blacklisting and whitelisting of untrusted/trusted websites: dissenting opinions are not limited to approved platforms, are they?
  • Notice-and-takedown procedures, legal requirement to take down information once it is flagged as false (e.g. by the government): will the information platform have enough incentive, enough means and enough time to thoroughly check these claims?

all of which are forms of algorithmic censorship; the same holds for the automatic filtering of abusive content. This cannot be solved by adding more technology alone, certainly not if we stick to the dramatically failed paradigm that information bites should always aim to please us and that we should consume more and more data. There is more at stake here than the revenue of an ad company or the average level of pleasure someone receives. There are deeper, more profound human values that cannot be encapsulated in the probability that you make a purchase or click on a banner.

Enforcing the use of encrypted data streams with HTTPS and DNSSEC to prevent the hijacking of information flows mitigates only a subset of the possible abuse cases. The same applies to blockchain technology as a means to ensure information authenticity. Why only a subset? Because actors with enough means can manipulate the blockchain, can copy and recreate the original information and send it with other HTTPS certificates, and if need be can create fake websites that contain the manipulated media with its own blockchain entry and with full DNSSEC and HTTPS support. One possible solution is to monitor news, including rich media, check for items that are highly similar, detect the most salient/distinguishable differences, and then, based on media prevalence weighted by source credibility, identify the version that is most likely correct.
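To make that last suggestion concrete, here is a minimal sketch of such a cross-source comparison, assuming TF-IDF similarity from scikit-learn and a hypothetical, hand-assigned credibility score per outlet; it illustrates the idea, it is not a production fact-checker.

```python
# Minimal sketch: compare near-duplicate news items and pick the version
# backed by the most (credibility-weighted) similar sources. The articles
# and credibility scores are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = {
    "outlet_a": "Minister resigns after budget vote fails in parliament.",
    "outlet_b": "Minister resigns following failed budget vote in parliament.",
    "outlet_c": "Minister flees country after secret budget scandal.",
}
credibility = {"outlet_a": 0.9, "outlet_b": 0.8, "outlet_c": 0.2}

names = list(articles)
tfidf = TfidfVectorizer().fit_transform([articles[n] for n in names])
sim = cosine_similarity(tfidf)

# Score each version by the credibility-weighted support it receives
# from similar items published elsewhere (similarity threshold 0.3).
scores = {}
for i, name in enumerate(names):
    support = sum(credibility[names[j]] * sim[i, j]
                  for j in range(len(names)) if j != i and sim[i, j] > 0.3)
    scores[name] = support

most_likely = max(scores, key=scores.get)
print(scores, "->", most_likely)
```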

An a-posteriori true-or-not check can be done using collective intelligence, by simply monitoring whether readers think the information is true or not. However, this risks naturally filtering out information that is strongly dissenting, since dissenting opinions do not resonate with the general population; and using fact checkers may achieve exactly the opposite, where the increased credibility is exploited to disseminate false information.

Although the above cartoon oversimplifies a dynamic reality, it does provide some inspiration for metrics that can be used to evaluate news content. For instance, it is possible to estimate the so-called polarity and the degree of subjectivity based on the text alone, and much more can be done if this is related to other articles on the same topic. Whatever the solution might be, the new media platforms have the obligation to minimize unnecessary algorithmic censorship and maximize user relevance.
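As an illustration of such text-only metrics (not a tool used by any platform mentioned here), the TextBlob library exposes a simple polarity and subjectivity estimate; the example headlines below are invented.

```python
# Minimal sketch: estimate polarity (-1..1) and subjectivity (0..1)
# for a piece of text. Requires `pip install textblob`.
from textblob import TextBlob

headlines = [
    "Government announces new budget for public transport.",
    "SHOCKING: corrupt elites are destroying everything you love!",
]
for text in headlines:
    sentiment = TextBlob(text).sentiment
    print(f"{sentiment.polarity:+.2f} polarity, "
          f"{sentiment.subjectivity:.2f} subjectivity | {text}")
```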

The worst outcome is not that citizens are misinformed by automated political propaganda, but that they mistrust all media and are not informed at all.

If everybody always lies to you the consequence is not that you believe the lies, but rather that nobody believes anything any longer. — Hannah Arendt

This scenario, in which citizens are not informed at all by independent news sources, is preceded by a phase in which citizens are primarily informed by directed propaganda that feeds on fear and prejudice, propaganda that goes uncontested since the recipient resides in a (self-chosen) filter bubble. Arguably, in this phase of post-truth, fascism can rise quickly.

Commercialisation of information dissemination

During the 2016 United States elections BuzzFeed revealed that over 100 pro-Trump sites were created and run from Macedonia. Why? Facebook automatically displays ads relevant to the information shown on your timeline. Supposedly the expected click-through rate, cost-per-click and expected number of impressions of pro-Trump advertisements were high enough to warrant setting up more than a hundred websites that disseminated false pro-Trump information on Facebook. This feeds on cognitive consonance, which increases the click-through rate, and on the filter bubble created by Facebook’s personalised news feed.

News itself has become a commodity because it functions as a carrier/facilitator of sponsored messages. The effectiveness of such sponsored messages increases roughly with:

  • how close the news content is to the content of the sponsored messages
  • how much incentive the news content provides to consume products/services.

In other words, the money earned with advertisements depends on the news content they accompany. This undermines the role of news media as a neutral monitor of the government, societal issues and international affairs.

In extremis this can lead to the situation that advertisements are not served with the news but the other way around.

To underline my point, the commonly used comment platform Disqus introduced sponsored comments in April 2014. Basically, such a sponsored comment is placed on top of a thread near a relevant article. This is the transparent variant of a more dubious practice: paid comments, i.e. a sponsored message disguised as an opinion (remember the importance of transparency?). This ties in with the sponsored content and native advertisement that I discussed earlier.

Another example that underlines this point is the site theodysseyonline.com, which has thousands of students writing clickbait articles with sponsored content under the guise of journalism. This 'clickbait' factory rewards writers based on the monthly views of their articles.

Similarly, with Facebook as the perpetrator: Facebook consciously decreased the importance of news from news outlets in favor of posts from Facebook friends, to increase readership. To make matters worse, it seems people have a hard time distinguishing between real news and fake news, and may even like fake news more than real news. The latter is easily explained by the fact that fake news is engineered for maximum effect, being more akin to targeted advertisement (and clickbait) than to actual news items.

What is more, the majority of adult Americans get their news from social media, and it is no stretch of the imagination to assume this holds for all countries with a similar market penetration of social media actors.

Spammergate exposed River City Media as a professional spam distributor, but also disclosed that about 1.4 billion records of email addresses and personal details had been obtained, either through hidden online forms or on online black markets.

Through offers such as credit checks, education opportunities, and sweepstakes, this spam operation has gathered and conglomerated a database of 1.4 billion peoples’ email accounts, full names, IP addresses, and often physical address. There is evidence that similar organizations have contributed to this collection. An active market exists for trafficking in these types of lists for illegitimate purposes. — source

A side-effect of the increased prevalence of ad-based and ad-driven information distribution is the increased influence of marketing practices over free speech, from at least two directions: (1) the requirement to invest a substantial amount of money to successfully disseminate your opinion, and (2) the content limitations imposed by online ad publishers, e.g. Google not displaying advertisements next to 'controversial' content.

A more direct example from the political sphere is Facebook directly participating in online political campaigns by providing dedicated (paid) support to political candidates, for instance Duterte in the Philippines or the AfD in Germany. Hence, the commercialisation of information dissemination extends to propaganda of any kind.

Commercialisation of personal data

The logical requirement for targeted advertisements is the incorporation of personal information in the ad platform. This personal data, obtained through cookies and online profiles, is part of a roughly $200 billion online advertisement market. As for the black market: in 2016 the market for cybercrime was $450 billion in size, which involves the exchange of malvertising software as well as the exchange of personal information.

So, your information is worth money. The information you unwittingly give is not only stored and processed but also resold to other parties, either in raw or in aggregated form. What is your personal data used for?

To facilitate this data brokerage there are over 4000 data brokers worldwide; Equifax, Towerdata, Acxiom, Experian and Epsilon are some of the largest, collectively storing information on billions of individuals with hundreds of data points per individual. What kind of data is stored, processed and sold?

Remember what I wrote about function creep and inference? The use of consumer data is not limited to retailers, what you buy, and what you search for online can statistically be indicative of healthy or unhealthy behavior. I.e. by inference the use of this data creeps from retailers to health insurance companies. You are most likely not even aware of the information that is extracted from the data you generate.


In Europe (the EU to be exact) the PSD2 (short for Payment Services Directive) will come into effect in the near future. This directive aims to harmonize payment systems across the EU, horizontally and vertically. It basically means that:

Through AISPs (Account Information Service Providers), third parties will be able to extract a customer’s account information, including transaction history and balances.

Yes, third parties, e.g. Facebook, Google, Amazon, Paypal, etc. can access your transaction history, if they get the right license, and no they do not need to be banking institutions.

Connecting your relationship dots..and the privacy effect of personalised advertisements

You must be wondering how this affects you, the average Joe, who has nothing to hide, and nothing to fear because technology is your friend (right?). Especially those handy relevant advertisements that are served up every time you go online. Well, I have got news for you, they don’t just know who YOU are, they know who you are connected to, and how you are connected.

The data brokers know, not just who they are serving ads to, but also who your housemates are, and by inference if that housemate has a special relationship with you. The only required ingredient is a shared IP address, or a shared account of any type.

This goes beyond establishing a relation between you and your direct partner. There is no technical reason to assume that, given all the traffic coming from a particular IP address, data brokers are unable to tell you apart from your partner and your children.

The 'requirement' of algorithmic censorship, long live community standards

As Facebook, or any other international social network, wants to serve automated content at scale, providing rich media for teenagers in Bangladesh and the elderly in Canada alike, it must use a broad brush when it comes to content constraints. It is becoming increasingly clear that social media platforms are not free-speech platforms.

There is the matter of nudity, which triggers alarms whenever it is detected in images, even when its societal, historical or cultural context allows for a non-sexual interpretation: examples are the censorship of a 'naked' statue of Neptune (Facebook), the censorship of an iconic Vietnam war photo (Facebook) and various examples on Instagram. What causes these false positives? Two words: the 'safe space' that social media platforms want to create for their users. For Instagram:

“We want Instagram to continue to be an authentic and safe place for inspiration and expression… Respect everyone on Instagram, don’t spam people or post nudity.”

and similarly for Twitter

and Facebook

We want people to feel safe when using Facebook. For that reason, we’ve developed a set of Community Standards,…

There is of course a tension between, on the one hand, creating a safe space at the individual level and, on the other hand, creating a platform that allows everyone to interact, directly or indirectly, with everyone else. An individual safe space for all requires continuous censorship and biased news selection, hence:

The safe space is a filter bubble.

Instagram literally wants every Instagram user to respect every other Instagram user, but based on what norms and what level of sensitivity? A safe space for whom? Generation ‘snowflake’, which has been conditioned to think it has a right not to be criticized? The religious fundamentalists who despise any criticism as blasphemy and use freedom of speech as a vehicle to abolish it? The alt-right and red-left that cannot bear the sight of each other's points of view? Are we truly too blind to see that there is no such thing as a ‘safe space’ if the truth can no longer be discussed without fear of being judged or labelled?

It is in my view naive to think that such a vague constraint will lead to anything but a chilling effect where the only true safe space is characterised by the deafening silence of opinions that are never heard because people are too afraid of being ostracised, publicly shamed or worse.

What the social media actors likely want to achieve with the ‘safe space’ objective is a maximisation of time-on-site and perhaps a lowering of your guard. Quite simply, if you truly see the main page of a social media site as your ‘safe space’ you are more willing to trust the content, and with that the article and ad suggestions. I.e. social media platforms likely instill a form of fake intimacy to increase the likelihood of engagement with promoted material and on top of that, you are more likely to share (more personal) information.

The safe space that the large social media actors should be creating is a space in which people feel confident they can have an open discussion on any topic without the restriction of prejudice, bigotry or condemnation. This inevitably requires moderators, and moderators are expensive. It is much cheaper to work with a notice-and-takedown principle, similar to search engines when individuals want to be ‘forgotten’. That requires cheap labor all over the world, covering all time zones, reacting as soon as possible to flags raised by offended individuals or legal entities who feel their rights have been violated. In other words, it implies hasty decisions by non-experts who probably reside in a culture and legal system completely different from those of the person who expressed the opinion and the person who was offended. In 2017 Facebook wants to have 7500 employees working on reviewing offensive content; Google has 10000 employees doing the same for its video content.

On the other hand, this also implies that extremist and racist views are not dealt with as long as no flag is raised, so extremist groupthink can develop freely. At least racist views within an online community on a social media platform can be handled without censorship: it is easy to imagine an algorithm detecting a concentration of like-minded racist individuals on, say, Facebook. If Facebook can choose between criminalising this community or nudging it in a different direction, then the latter option is obviously preferable as long as the community has not been flagged for hatespeech. Simply disbanding such communities will drive them into closed forums where moderation is no longer possible.

So, to enforce such vague standards, human processing has to take place, triggered by the complaint of any user. Thus, the so-called ‘community standards’ of social media platforms basically mean that the lowest tolerance to dissidence, critique and vice becomes the norm. Obviously this global human-processing approach is not ideal, and it is expensive; enter the next step, machine-based filtering. The Google spin-off Jigsaw is developing Conversation AI, which is

..designed to use machine learning to automatically spot the language of abuse and harassment.. — source

That is: a dedicated, automated tool to recognize a specific tone of voice and intent. Recall that function creep is unavoidable and in this case quite obvious; if one can recognize ‘abusive language’, then surely the recognition of certain ideological tendencies is a next step. Beyond the development of AI to recognize certain types of language, the designated use of this technology is large-scale deployment on social media platforms: ideal for any malicious actor that wants to perform an ideological segmentation analysis of the population. Cynically enough, Google employs the crowd to help create this technology, with over 5000 participating teams.
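For readers unfamiliar with how such machine-based filtering typically works, here is a toy sketch using TF-IDF features and a linear classifier; it is emphatically not Jigsaw's Conversation AI, and the training examples are invented. The point of the sketch is the function creep: swap the labels and the very same pipeline segments ideology instead of abuse.

```python
# Toy sketch of machine-based text filtering: TF-IDF features plus a
# linear classifier. The training examples and labels are invented;
# a production system would use far more data and a larger model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "You are a worthless idiot and everyone hates you",
    "I completely disagree with this policy proposal",
    "Go back to where you came from, you don't belong here",
    "Interesting article, thanks for sharing the source",
]
labels = [1, 0, 1, 0]  # 1 = 'abusive', 0 = 'acceptable' (hypothetical labels)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# The exact same pipeline could be re-trained with ideological labels,
# which is the function creep risk discussed above.
print(model.predict_proba(["People like you should be silenced"])[0][1])
```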

Twitter, although it places fewer restrictions on the actual content, brought online censorship to a new level by purging alt-right accounts after the election of Donald Trump, supposedly as part of a crackdown on hatespeech.

In the case of Facebook, the community standards are quite liberal in that they promote discussion and offer tools to avoid distasteful or offensive content. The problem is that they provide no definition of hate speech, and there is no public record of prior hate speech cases on Facebook (no jurisprudence, so to speak), so from the perspective of the user censorship based on community standards is arbitrary, and the tools to avoid distasteful or offensive content will only strengthen the filter bubble.

In a recent response to the censorship and fake news controversy Zuckerberg wrote the following:

The guiding principles are that the Community Standards should reflect the cultural norms of our community, that each person should see as little objectionable content as possible, and each person should be able to share what they want while being told they cannot share something as little as possible. The approach is to combine creating a large-scale democratic process to determine standards with AI to help enforce them.

The idea is to give everyone in the community options for how they would like to set the content policy for themselves. Where is your line on nudity? On violence? On graphic content? On profanity? What you decide will be your personal settings. We will periodically ask you these questions to increase participation and so you don’t need to dig around to find them. For those who don’t make a decision, the default will be whatever the majority of people in your region selected, like a referendum. Of course you will always be free to update your personal settings anytime. — M. Zuckerberg

This is an improvement, with two caveats: (1) the personal determination of what is and what is not acceptable will strengthen the filter bubble; (2) the information selection will take place automatically, so you don't know what you did not see. Let's give Zuckerberg some time to live up to these words. In the meantime he should start to realise that he can no longer claim that Facebook is just a technology company that happens to moderate the occasional racist video. Once you start to moderate, you are telling your users that you are responsible for the content, and then your 'censoring guidelines' will pile up quickly.

My advice to Mark Zuckerberg: take lessons from Wikimedia, Reddit and StackOverflow with regard to community-built content, where moderators are not employees but site members. Appoint moderators per group/page and assign them responsibilities. Let users take ownership of their 'mini-platforms'.

I wrote about the companies that provide information-sharing platforms and their ambition for 'safe spaces', but there is more. As I said in the introduction, laws are being drafted that force the platforms to indiscriminately, arbitrarily and extra-judicially decide on the acceptability of free speech. In Germany such a law has been passed; in particular, it requires social platforms to remove hate speech within 24 hours after receiving a report, with fines of up to 50 million euros.

A natural tendency to favor propaganda and hatespeech

Hatespeech tends to be spread virally by its supporters, viral within a closed community, but most search engines and real-time-bidding platforms will not be aware of that. It registers as a spike in interest, a trending topic, a hot page, call it what you will; without the search/ranking algorithm dissecting the page for its hatefulness, evaluating the context and applying algorithmic censorship, the promotion of popular hatespeech is inevitable. The same holds for the propagation of any ideological niche, or for example the alternative news sites that propagate conspiracy theories not just about political and societal events but about common perceptions of reality, e.g. flat earth, climate change, etc.

Fake news is a business model and is part of the more generic fake-information economy. I discern the following niches:

  • conspiracy theories: geopolitical, mystical, extraterrestrial
  • alternative science: from intelligent design to flat earth
  • alternative actualities: from political news to societal news
  • doubt injection: from (outlying) favorable reports regarding pathogenic chemicals to rumours about a company merger
  • celebrity innuendo (i.e. gossip): perhaps the most well-known niche and still largely an analog venture.

Compare this with traditional 'soft' media selection, where salient features and topics are favored over more nuanced issues. There are two major reasons why propaganda spreads more easily on social media: first, it resonates strongly within communities that support the message; second, the content is extremely salient, as it takes a specific controversial stance on current societal issues that easily elicits a response, which in turn results in a higher click-through rate and time-on-site. In other words, there is a natural tendency for these algorithms to favor polarising advertisements over more neutral ones, because they create more engagement.

Google might argue that it needs more information about the people who spread the content, or that it can put websites on blacklists, but then we arrive at an earlier point: this will lead to false positives and is basically another form of algorithmic censorship, while it was the algorithmic nature of distribution that enabled artificial virality in the first place. There is also a problem with the description I gave: the salience and higher click-through rate hold for controversial statements in general, so controversial statements that are not per se inflammatory or hateful, but societally relevant and perhaps even necessary from the perspective of effective free speech, will be flagged as well. Stifling hatespeech is very close to stifling democratically necessary dissidence. One can even argue that dissidence is initially perceived as hatespeech in the eyes of conformists who have internalised their concept of normal. We should therefore be very careful in loosely applying the term hatespeech: it is not equal to sharing opinions that completely contradict the belief system of someone else or even of society at large.

The common belief is that viral marketing is caused by a natural cascade of increasing reach: humans advertising to their peers, inspiring the peers of their peers to do the same, and so on. Is that still the case if users are actively steered in the direction of what their peers have watched or liked? What I have discussed so far is not just the mechanism that enables filter bubbles and echo chambers, it is also the mechanism and indeed the infrastructure by which ‘viral’ campaigns can be jump-started at will. The same holds e.g. for controversial topics and political scandals.

Thanks for the suggestions..

Steering of the public opinion by relatively few actors

There is a thing called the Search Engine Manipulation Effect (SEME). It is the result of an amalgamation of the effects discussed here and can be summarized as the ability to significantly influence the voting behavior of undecided voters by changing the ranking algorithm of search engines.

The significant effect of search engine rankings is just one example. Facebook has experimented with the relative number of positive and negative messages in its newsfeed, and the positive/negative ratio turned out to have a significant effect on the average sentiment of the posts users subsequently uploaded. More recently, it was demonstrated that minor changes in the presentation of information regarding voting led to significant changes in the number of votes.

Another example assumes a more malicious actor in the form of a government that seeks to manipulate public opinion in a so-called spinternet. In the spinternet a large media actor, or a state, wittingly spreads false information or false opinions; the method is deceptively simple:

  • mechanical turks are hired to write propaganda on blogs and forums
  • fake news stories are created and peddled to renowned news sources for further distribution

The large media actor, or simply the actor that is powerful enough to steer media actors, can now control public opinion under the guise of social media activity. A simple example is the control that Facebook exercises over its newsfeed: young journalists who were employed as contractors for the sole purpose of curating the newsfeed reported that conservative news was suppressed during the election period. Google's efforts to curb fake news have led to (non-transparent) search engine changes that push fringe news sources and opinion platforms out of view of the average search engine user.

Your children are being exploited too

A direct result of the democratisation of mass media is that everyone can be exposed to everything; add automated ranking based on popularity and keyword matches, combined with the monetisation of views and subscriber counts, and you get Elsagate.

The freedom of speech and the right to privacy are primarily of importance for the effective participation in a democratic society as a free, independent and informed citizen and for the development of a unique personal identity. This however assumes a participating, adult citizen or an adolescent. Our children have the same rights but we as adults have the obligation to guide them through their initial rites of passage and I would argue that censorship is less of an issue if the citizen is not fully participating in the democratic process yet. We should, in short, not lose track of the reasons to filter out information or to have some sort of transparent gatekeeper.

But..we need a personalised filter, right?

Surely, without any kind of personalised filter we would be lost in the huge forest of information that can be found online. There is too much for us to handle, right?

Wrong!

We do not need a filter; we need information that is indexed properly! As for the ads being displayed: it is simply not in the consumer's interest to see unsolicited advertisements at all.

Suppose I am searching for a particular type of product, to buy from a local store. In the personalisation paradigm you simply type the product name and it will serve you results based on your location and your preferences.

Or pick whatever example you like, in general it will be something like

{identifier of subject}

where in the personalisation paradigm, for your convenience, the following attributes are inferred from your personal data (among other data sources) that has been collected by the search engine provider, either from their own data set or from data purchased from third parties:

{description of subject} think of location, price, etc.

{subject type level 1}…{subject type level N} think of genres, topics, etc.

{information retrieval purpose} think of information, consumption, etc.

So, the price for not having to specify these labels is that the search platform needs to have (processed) your personal information. Whether that is worth it depends on the added cost of having to specify this extra information. For me personally:

I do not want a search engine to feed my cognitive consonance, I want it to facilitate my curiosity.

Whatever you search for can most likely be depicted as a tree, and you can walk through a tree very quickly, especially if you know what to look for. This requires not so much personalisation as a very basic understanding of your search goal, an interactive search tree and advanced topic analysis of all indexed webpages. Combining this with an ‘intelligent’, on-demand query assistant leads to a new hybrid search engine paradigm: (1) on-demand personalised semantic search querying in combination with (2) an interactive search tree based on topics and relations.
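A minimal sketch of what such an interactive, non-personalised search tree could look like, with a hypothetical topic tree for a product search:

```python
# Minimal sketch of an interactive, non-personalised search tree:
# the user drills down through topic nodes instead of the engine
# guessing intent from a personal profile. The tree is hypothetical.
topic_tree = {
    "bicycle": {
        "buy": {"new": ["local-shop-1", "webshop-a"],
                "second-hand": ["marketplace-x"]},
        "repair": {"diy-guides": ["wiki-page-1"],
                   "repair-shops": ["local-shop-2"]},
    }
}

def drill_down(tree, path):
    """Follow a user-chosen path of topics and return either the next
    set of choices or the final list of results."""
    node = tree
    for step in path:
        node = node[step]
    return list(node) if isinstance(node, dict) else node

print(drill_down(topic_tree, ["bicycle"]))            # ['buy', 'repair']
print(drill_down(topic_tree, ["bicycle", "repair"]))  # ['diy-guides', 'repair-shops']
```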

In a collaborative effort to schematize indexed information and make it easily searchable, schema.org was established by Google, Yahoo, Microsoft and Yandex; such schematised indexation would certainly help train this imaginary system. It goes too far to discuss the details of a new tree-based interactive search engine here, but I have no problem envisioning an engine without any personalisation, do you? There is already scientific effort being put into developing a personal search engine, see for instance De Vries et al.

So, to be clear, Google's and Facebook's hunger for personal information primarily serves their business model, which is selling advertisement space, and only secondarily the quality of their search results.

Size matters?

Facebook, Google and Amazon are mentioned several times in this text; does that mean size matters? Yes, but only when it comes to individual exposure. Facebook, Google and Amazon are only three actors in a large playing field of information-based companies that strive for maximum readership, click-through rates, cost-per-click and return rate. The technologies they employ are, however, broadly used in e-commerce. These smaller companies are building in-house tooling for recommendation, personalisation and ranking. This is possible due to an influx of data analysts, the accessibility of high-level machine learning libraries and the scalability/affordability of computational capacity. This in-house tooling is likely proprietary, i.e. closed-source and thus non-transparent. The reason is simple: personalisation technology has become business-sensitive information.

It is therefore much harder to perform such analyses over a broad range of e-commerce companies, simply because there is not enough data; at the same time, the combined effect of these smaller online companies might well be similar to that of Facebook, Google and Amazon.

Change, now. Hard lines that should not be crossed

What rights need to be protected? What is the minimum level of protection? What should be the penalty? In other words, what are the moral constraints at play here?

My information is mine

Personalised communication should be based on data that I control. Data that external parties can access only if I agree, when I agree and how I agree. According to Pew the vast majority of netizens agree:

93% of adults say that being in control of who can get information about them is important; 74% feel this is “very important,” while 19% say it is “somewhat important.”

90% say that controlling what information is collected about them is important — 65% think it is “very important” and 25% say it is “somewhat important.”

source

Suggestion: the current client-side cookies, and any personal information now stored on site back-ends, should be replaced by or upgraded to locally stored and fetched encrypted super cookies that can only be accessed while visiting the website, using a temporary public key. Online central data lockers only perpetuate the idea that your personal information is a marketable good and should be dismissed.
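A rough sketch of that super-cookie idea, simplified to symmetric encryption with the cryptography package (a real design would use the temporary public-key exchange suggested above); the stored profile is a hypothetical example.

```python
# Rough sketch of a locally stored, encrypted 'super cookie': the profile
# never leaves the device unencrypted, and the site only receives the
# fields the user releases for that visit. Simplified to symmetric
# encryption (Fernet). Requires `pip install cryptography`.
import json
from cryptography.fernet import Fernet

# Key generated and kept on the user's device only.
device_key = Fernet.generate_key()
vault = Fernet(device_key)

profile = {"interests": ["cycling", "privacy"], "locale": "nl"}
stored_blob = vault.encrypt(json.dumps(profile).encode())  # what sits on disk

def grant_session_access(blob, fields):
    """Decrypt locally and release only the requested fields for this visit."""
    data = json.loads(vault.decrypt(blob))
    return {k: data[k] for k in fields if k in data}

print(grant_session_access(stored_blob, ["locale"]))  # {'locale': 'nl'}
```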

But what do I know, I am just throwing an idea out there, feel free to post your ideas in the comment section.

I should know what they know and what they think they know

To be able to make a conscious informed decision with regard to the sharing of personal information through a particular service one needs to know

  • with whom this information is shared and whether it is paid for
  • what other information is available regarding my activities
  • what aggregate conclusions they have regarding my person

Ideally, one is also informed about the nature of the intended analyses, perhaps presented as e.g.

  • personal predictive
  • personal historical
  • aggregate predictive
  • aggregate historical

I should know when my personal information is being used

Think emoticons, flags, alerts, anything intrusive enough to make you realise that someone is watching you. This will undoubtedly cause an initial chilling effect, but it will do something else also. It will create a demand for technology that revolves around the protection, anonymisation and control of personal information.

Any transaction of service or product for personal information should be explicit

What..? Pretty much any free service you sign up for has the exploitation of your personal information at its core, either for data reselling or for targeted advertising. That being said, your personal information should be explicitly mentioned as part of the transaction, as opposed to being a footnote in the terms and conditions of a free service.

Terms and conditions that make sense

Starting with a concise intelligible explanation of the most important aspects concerning my individual rights.

I should have a choice

A choice between a service and no service is not really a choice. A cookie wall is an example: it is literally a digital wall standing between a customer and an online service. A deterioration is in sight in the European Union where, according to this law proposal, prior consent will be required for any kind of tracking while websites are allowed to block users that run ad blockers; the agreement to accept online tracking thus becomes a carte blanche acceptance for all websites. What about information retrieval: should we not have a choice in which bubble we reside? The start-up Refni hopes to tackle part of that question by allowing users to choose their bubbles for information discovery. The somewhat older start-up News360 offers something similar for news delivery.

In a more general sense: do we really have a consumer choice between free services that exploit, resell and distribute our personal information for advertisement revenue or non-free services that require a direct financial compensation but respect our privacy?

No reselling of information without explicit consent

The agreement regarding the information exchange should be between the service provider and the customer. Every other party that indirectly obtains this information should be explicitly mentioned in the privacy statement and the initial agreement. In other words: no laissez-faire data reselling.

Regardless of consent, shared personal information should be relevant for the service at hand. From the viewpoint of proportionality and subsidiarity, the information should be required for a legitimate aim and there should be no other less intrusive way to fulfill that aim.

Public announcement of fake news

According to Van der Linden et al., fake news can be countered by 'inoculating' readers with an awareness of (a) the fact that fake news (in their case regarding climate change) is being circulated, and (b) what the actual scientific consensus is. That is, users are primed with the 'truth'.

This is relatively easy to generalise: monitor actual user responses to news, enable fake-news flags and user-based news ratings, monitor the CTR/share rate and cross-reference the information with trusted news sources. If there is a strong indication of fake news, perform a manual verification and then send out an announcement to all users.
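A sketch of what that monitoring step could look like; the signals, weights and thresholds below are invented placeholders, and the final verdict remains with a human reviewer.

```python
# Sketch of the monitoring pipeline described above: combine user flags,
# an anomalous click-through rate and a lack of corroboration by trusted
# sources into a suspicion score, then queue for manual verification.
# All thresholds and weights are invented placeholders.
from dataclasses import dataclass

@dataclass
class ArticleStats:
    user_flags: int          # 'fake news' flags raised by readers
    ctr: float               # click-through rate of the item
    baseline_ctr: float      # typical CTR for comparable items
    trusted_matches: int     # trusted sources reporting the same story

def suspicion_score(s: ArticleStats) -> float:
    flag_signal = min(s.user_flags / 50.0, 1.0)
    ctr_signal = max((s.ctr - s.baseline_ctr) / max(s.baseline_ctr, 1e-6), 0.0)
    corroboration = 1.0 if s.trusted_matches == 0 else 0.0
    return 0.4 * flag_signal + 0.3 * min(ctr_signal, 1.0) + 0.3 * corroboration

def needs_manual_review(s: ArticleStats, threshold: float = 0.5) -> bool:
    return suspicion_score(s) >= threshold

print(needs_manual_review(ArticleStats(user_flags=80, ctr=0.12,
                                       baseline_ctr=0.03, trusted_matches=0)))
```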

No persistent storage of personal information without explicit consent

Without explicit consent the service provider may not store your information on persistent storage media (HDD, SSD, etc.). By default, personal data may only be stored in volatile storage media (RAM, CPU cache, etc.). This means that by default, due to power outages and memory corruption, your personal data will be lost, even without a legal limit on storage duration. Over time this leads to a natural decay of online personal information.

News on social media should be identifiable and easily amendable

Suppose fake news has been spread that negatively portrays a political candidate. Suppose you are able to centrally remove, replace or edit those ads/articles as long as they are still being displayed. Suppose the social media actors have the legal obligation to remove this information after a notification. Then, upon identifying an article as fake or partially fake, it can be centrally edited. This comes with the large caveat that news can then be edited a posteriori, potentially also by malicious actors.

To mitigate this we need

  • unique identifiers per article that are checked at the client side: a hash for instance, or with the aid of blockchains (see the sketch after this list),
  • transparent editing: any a-posteriori edits should be visible to the reader.
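A minimal sketch of the client-side hash check from the first bullet, leaving the blockchain anchoring out; the article and registry are hypothetical.

```python
# Minimal sketch of the client-side integrity check from the list above:
# the publisher registers a content hash per article version; the client
# recomputes it and flags any silent (non-transparent) edit.
import hashlib

def article_hash(body: str) -> str:
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

published = "Candidate X proposes new housing plan."
registry = {"article-123": article_hash(published)}   # published by the outlet

def verify(article_id: str, body_as_displayed: str) -> bool:
    return registry.get(article_id) == article_hash(body_as_displayed)

print(verify("article-123", published))                         # True
print(verify("article-123", "Candidate X caught in scandal."))  # False -> edited
```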

Of course this is a burden for the social media actors, but it also forces them to crack down on false news content and to improve the quality of news redaction.

Stay away from political news..

The negative effect of recommendation engines on pluralism is best demonstrated by the earlier-mentioned Facebook/Trump example. In either case it should be clear that John Stuart Mill was not referring to ideas as an economic good when he spoke of the marketplace of ideas.

Groupthink detection

The creation of filter bubbles is the precursor of groupthink, which may result in extremist views resonating and being amplified by like-minded individuals. Such an escalation can only occur because there is little to no counter-narrative. If such groupthink is detected, the sub-forum should be exposed to dissimilar views, and interaction with these dissimilar opinions should be facilitated.

I understand that technically this is perhaps not yet feasible, but where possible it would avoid the use of censorship, say by banning such extremist groups, which only solidifies their extremist stance.
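One conceivable way to operationalise such groupthink detection is to measure how similar the posts within a sub-forum are to each other; the posts and the threshold below are invented, and a real system would need far more signals.

```python
# Sketch of one way to operationalise groupthink detection: if the posts
# in a sub-forum are, on average, very similar to each other, the group
# may lack a counter-narrative and could be exposed to dissimilar views.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "They are ruining this country and must be stopped",
    "Everyone who disagrees with us is a traitor",
    "We must stop them before they ruin the country",
]

def homogeneity(texts) -> float:
    """Mean pairwise cosine similarity of the posts (0 = diverse, 1 = uniform)."""
    sim = cosine_similarity(TfidfVectorizer().fit_transform(texts))
    n = len(texts)
    return (sim.sum() - n) / (n * (n - 1))  # exclude the diagonal

score = homogeneity(posts)
print(f"homogeneity: {score:.2f}")
if score > 0.4:  # hypothetical threshold
    print("possible groupthink: surface dissimilar, well-argued views here")
```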

Algorithmic transparency

By now I hope it is sufficiently demonstrated that algorithms directly affect our personal lives and even the inner workings of our democracy.

As algorithms are directly or indirectly responsible for these decisions, they should themselves become the subject of scrutiny, which can only happen if they are transparent. Anecdotes of dramatic AI decisions are piling up, ranging from automatically evaluating (and firing) teachers to selecting contestants for a beauty pageant and dismissing patients from hospitals.

Well known is the effect of algorithmic changes to Google's ranking engine; these changes are a vital but highly unpredictable and non-transparent traffic factor for online SMEs. A significantly lower position in search engine listings can cause a major loss of revenue. One of those changes involved the activation of a supporting algorithm called RankBrain, which basically uses personal information and anonymised search results to 'guess' what the user is searching for. This machine-learning-driven algorithm tends to suggest search results that are more likely to be clicked on, and we already saw that fake news has a higher click-through rate than real news, so yes, this algorithm actually favored fake news articles.

The effect of a Facebook.com algorithmic change on Guardian's reach, source.

Another example of the impact that an algorithmic change can have is shown above. In a matter of days a facebook update dramatically reduced online readership for the WSJ, the Washington Post, the Guardian and Mashable.

So, algorithmic changes should not come as a surprise and the algorithms should not be black boxes.

Throwing tech at it is not enough!

Automatic truth-detectors: at the price of freedom of information and freedom of speech? Automatic truth detectors are bound to look at the normalcy of expressions, their rates of adoption and how people respond to them. This invariably leads to suppression of dissident speech in favor of the status quo. The reason an online automatic truth detector cannot exist is that facts can often only be verified offline. This requires 'boots on the ground' in the form of investigative journalism. We cannot 'machine learn' our way out of this conundrum.

Diversity-increasing recommendation engines? Who controls the dials, and what counts as diversity? Again, algorithmic transparency is key. But, indeed, this would be a necessary step if we are to continue integrating recommendation engines into our lives.
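As one concrete reading of a 'diversity-increasing' recommender, the classic maximal marginal relevance (MMR) re-ranking trades relevance off against similarity to what has already been selected; the lambda parameter is exactly the 'dial' asked about above. Items, scores and the similarity function are invented.

```python
# Sketch of a diversity-aware re-ranker using maximal marginal relevance
# (MMR): each step picks the item that is relevant *and* dissimilar to
# what was already selected. `lambda_` is the 'dial' the text asks about.
def mmr(candidates, relevance, similarity, k=3, lambda_=0.7):
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(item):
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lambda_ * relevance[item] - (1 - lambda_) * max_sim
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

items = ["article_politics_a", "article_politics_b", "article_science", "article_sports"]
relevance = {"article_politics_a": 0.9, "article_politics_b": 0.85,
             "article_science": 0.6, "article_sports": 0.5}
same_topic = lambda a, b: 1.0 if a.split("_")[1] == b.split("_")[1] else 0.1

print(mmr(items, relevance, same_topic))
# e.g. ['article_politics_a', 'article_science', 'article_sports'] for lambda_=0.7
```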

We should keep in mind that the unhindered application of technology facilitated the above issues. The transition away from human-selected centrally produced news articles to machine-selected unfiltered news has overall resulted in a much higher accessibility of information, a much lower threshold to share information and yes a much higher likelihood of being exposed to an abuse of these possibilities.

To counter this abuse we need humans again, for accountability and for a verifiable version of the truth, so cancelling out humans in favor of the demi-God called machine learning is not the best approach, Facebook. What is the best approach? My guess:

A combination of digital and analog journalism with a transparant measure for credibility attached to journalists, news sources and news articles

For now, fact checkers will do. With regard to filtering rich media content for violence, pornography and the like, machine learning should perhaps be emphasized more, for instance to perform automatic blurring and so spare human moderators from PTSD.

Another soft approach is empowering younger generations through education on the various forms of information dissemination they will encounter: how to dissect news, how to verify propositions, how to order and filter information and, most importantly, how to apply this knowledge in active public debate.

As for automated censorship, there is no reason that social media platforms should be any less diligent when it comes to the removal of online speech than the removal of, say, a website by a hosting company. Furthermore, the affected individual should have the right to a rebuttal, the decision should be transparent and publicly accessible, and the algorithm should ideally be open source and at the very least available for scrutiny.

A coding code of conduct?

Given that programmers and machine learning engineers are instrumental to the infrastructure laid out above, empowering them to say a firm no to their employers will certainly help to create an ethically sound IT core. It is beginning to sink in with the Silicon Valley crowd that there is a thing called ethics. The evangelisation of this strange concept, in which businesses actually have a responsibility towards society, is pushed by a few people, like Adam Alter, Joe Edelman, and Tristan Harris with his organisation Time Well Spent.

Of course, the control that human programmers have will diminish with the increasing penetration of AI in coding and machine learning development.

Hence, we have to act quickly.

In the meantime I note the following key descriptors for algorithmic decision-making:

  • accountability: any automatic decision-making process should lead to actors that can be held accountable.
  • responsibility: this accountable (legal) person should be expected to have discrete control over this process.
  • agency: these processes should involve a human-in-the-loop approach whenever reasonable.
  • transparency: automatic decision-making processes should be transparent so that they can be audited.
  • contestable: decisions made by automated decision processes should be contestable by those affected.
  • unbiased with regard to features that are not self-created: personal characteristics that are not within the control of the affected person should not influence the outcome of the process. This involves not only skin color, sexuality and gender but also e.g. any form of trauma.
  • fair with regard to features that are self-created: all relevant features relating to behavioral patterns should be used by reasonable measure.

We should monitor the monitors

The open source software OpenWPM was used by Princeton researchers to inventory the use of different types of tracking cookies. Such a tool can be used not only to check which types of cookies are used and whether they comply with regulations, but also to infer what personal information is being logged. This type of research is crucial. Their research demonstrated, for instance, the widespread use of stateless trackers, and at the same time demonstrated the ability to detect these trackers using advanced data analysis techniques. The integration of such technology into privacy tools is crucial for protecting your personal data and enforcing privacy legislation such as the European GDPR.
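A much-simplified sketch of what crawlers like OpenWPM automate at scale: classifying the requests fired by a page as first- or third-party and matching them against a known-tracker list. The request log and tracker list are invented, and real measurements also need eTLD+1 handling and fingerprinting heuristics.

```python
# Simplified sketch: classify the requests made while loading a page as
# first-party or third-party, and match them against a known-tracker list.
from urllib.parse import urlparse

KNOWN_TRACKERS = {"tracker.example", "ads.example"}  # hypothetical list

def classify_requests(page_url, request_urls):
    page_host = urlparse(page_url).hostname
    report = []
    for url in request_urls:
        host = urlparse(url).hostname or ""
        third_party = not host.endswith(page_host)
        is_tracker = any(host == t or host.endswith("." + t) for t in KNOWN_TRACKERS)
        report.append({"host": host, "third_party": third_party, "tracker": is_tracker})
    return report

log = ["https://news.example/style.css",
       "https://tracker.example/pixel.gif?uid=123",
       "https://cdn.ads.example/banner.js"]
for row in classify_requests("https://news.example/article", log):
    print(row)
```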

Strict enforcement of http-header protocols

Public HTTP traffic should not be allowed to carry arbitrary, unregulated HTTP headers, and this should be hardcoded in the HTTP(S) protocols.

To start with, experimental X-headers in HTTP requests should be banned from live applications, specifically where they can be exploited for tracking.
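A sketch of what enforcement could look like at the application edge: a small WSGI-style filter that drops non-whitelisted X- headers before a request reaches the application; the whitelist is a hypothetical example.

```python
# Sketch of enforcement at the edge: a WSGI middleware that strips
# experimental/unregistered X- headers from incoming requests before
# they reach the application, removing one common tracking channel.
ALLOWED_X_HEADERS = {"HTTP_X_REQUESTED_WITH"}  # keep only what you explicitly need

class StripXHeaders:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        for key in [k for k in environ if k.startswith("HTTP_X_")]:
            if key not in ALLOWED_X_HEADERS:
                del environ[key]          # drop e.g. a UID header injected upstream
        return self.app(environ, start_response)

# Usage: wrap any WSGI app, e.g. `application = StripXHeaders(application)`.
```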

An alternative..opportunities!

The technology should serve us, but how? What if the algorithms ensured that the behavior they stimulate is in line with our goals?

So, maximisation of..

  • time well spent
  • information that is relevant for pressing societal and environmental issues
  • information that motivates and inspires, i.e. that positively reinforces you
  • information that is dissonant enough with our opinions to enrich our knowledge and widen our scope, but not so dissonant that it merely reinforces our internal biases and prejudices
  • consumer behavior in favor of the so-called long tail products, an undelivered promise of recommendation engines
  • ?

Is a personalised communication service aimed at improving your life and protecting your privacy not worth a few investment dollars?

How?

  • Hybrid, director-steered machine-human recommendation systems
  • À la carte recommendations, i.e. targeted advertisements as a b2c-product at the sole discretion of the user
  • Diversity maximizing recommendations, both on an individual and on an aggregate level
  • User-initiated content enrichment
  • AI augmented/assisted content selection
  • Modular machine learning models
  • Non-organic-traffic detector to identify fake news
  • Continuous improvement of bot-detectors, e.g. through more advanced Turing tests
  • User awareness regarding news diversity and their personal filter bubble
  • Diversity/position awareness tools to assist human media editors
  • Newsfeed as a service: dedicated newsfeed providers that collaborate with bonafide news agencies and journalists that are responsible for ensuring the quality of the articles

Furthermore, let the age of privacy begin!

  • full, precise control over how your personal information is shared, with whom and why.
  • services to cloak your sensitive information on demand
  • services to make you aware of the information that you expose to the public
  • privacy as an integral part of software/app design
  • ethics as an obligatory part of machine learning/data science curricula

And let fake news be a thing of the past (the real past..):

  • Transparent integrity/authenticity/duplicity checks for online news items and inherent source/breadcrumb information
  • Open access to distributed, persistent storage of archived news articles

So..

The way that automated information feeds and search rankings are used is damaging to the pluralism of our democratic societies, undermines our right to privacy, takes away people's control of the online information flow and lays the IT foundation for any government or large corporate actor to monitor and control the population. Furthermore, the increased reliance on algorithmic decision-making (from pricing to information ranking) has created mechanisms that lead to arbitrary censorship on a massive scale, as well as an overall decrease in information accessibility and an increase in automated discrimination.

Mitigation lies in the development of

  • more holistic, user-centric algorithms that not only optimize conversion but also take well-being and time-well-spent into account; hence recommendations need to move away from the current business-centric approach
  • transparent ranking algorithms that emphasize relevance over popularity, with discrete user control over the type of ranking
  • transparent, on-demand personalised recommendations that enrich the user experience
  • online personal data protection technology to replace the use of cookies and the reselling of personal data to third parties, and to give netizens control over, and insight into, their personal data sharing
  • recommendation engines that maximize diversity and conversion simultaneously on the mid to long-term by promoting the long tail
  • non-discriminatory algorithms
  • awareness/education among software developers with regard to privacy-by-design and diversity-by-design principles, technically and ethically
  • legal/moral/technical frameworks that allow a concurrent worldwide development of the above items, for instance absolute metrics for diversity and machine learning algorithms to extract higher level meta-data from content
  • ..and finally, legislation to keep the media companies (yes you too Facebook) in check. For instance, legislation that enforces a clear distinction between actual content and advertisements and that curbs the use of click-bait tactics.

I distinguish three main threads: transparency, user control and diversity.

Another matter is the formal responsibility of social media platforms. Not so long ago a company such as Facebook would be considered an information intermediary, relaying information published by third parties, which allowed a fairly hands-off approach. This no longer holds: the large social media platforms are actively censoring, redacting and controlling content, and in doing so they are in fact media companies that have to comply with the regulations, standards and responsibilities that come with that title. This automatically means that Facebook becomes liable for fake news, defamation and hate speech more easily, especially since the recent Delfi AS v. Estonia ECHR ruling, which basically makes media platforms liable for news comments. Ironically, this will expedite and even necessitate even more control of the information transmitted through Facebook. The alternative, that Facebook lets go of control, is unlikely due to prior commitments that determine its cash flow. This opens up space for a competitor that is similar to Facebook but allows anonymous and uncensored debate.

If a large-scale social media platform like Facebook wants to remain open and laissez-faire, it should move away from the ‘safe space’ paradigm and hold users directly accountable for their online conduct. If not, liability will force it to redact and filter user content, which in turn confirms its liability. This would not be a problem were it not that the financial risk of liability for a worldwide media platform with 2 billion content creators is unsustainable.

A recently announced move from Facebook that seems in line with this suggestion is the introduction of a paywall for news content: most likely resulting in two separate news streams, one for which Facebook takes full responsibility and one unfiltered user-generated feed for which the responsibility lies with the users.
