Whose Reality Is It?

A reflection upon algorithms and data mining

Anyone living in a modern society will acknowledge this fact: big data has become ubiquitous. From socializing to entertainment, from academic research to business development, there are always databases running behind the scenes, and algorithms operating on them to generate so-called “optimal” results. People who embrace this data-driven world usually hold the following mental model: individuals are biased and sentimental, and they cannot weigh countless considerations when making decisions. Computers are quite the opposite. They have no emotions, and they can instantly process enormous amounts of data to give the most accurate result. So what could be wrong with a world full of rationality and rightfulness? Well, is this mindset really true?

In her book Weapons of Math Destruction, Cathy O’Neil writes:

Mathematics was not only deeply entangled in the world’s problems but also fueling many of them.

When people think about computational algorithms, they often overlook one thing: results are not produced simply by collecting vast amounts of data. The data must be manipulated according to mathematical rules, and those rules are designed and implemented by fallible, prejudiced human beings. When the rules have flaws and biases, the results can no longer be said to be “optimal.” The concerning part is that the imperfections of an algorithm can rarely be detected quickly; major issues do not surface until much later. This is, again, because people believe data-driven results are trustworthy; they simply do not question the validity of the system.

In her book, O’Neil gives many examples of this issue. In the story of a highly regarded teacher who was fired over a low score on a teacher-assessment tool, the assumed correlation between students’ grades and a teacher’s performance does not always hold, yet the computer takes no unique cases into account. And when it produces faulty predictions, it does not recognize or learn from its mistakes either, because the algorithm has no mechanism for that at all. Worse, the system feeds these results back in as new data and continues the process, making even more mistakes and deviating even further from reality.

So far we have talked about how a flawed algorithm-driven system can create an unreal reality, and how people can be mistreated as a result. Still, from the creators’ point of view, at least their intentions were good. What happens when the creators themselves want to impose a constructed reality on their users, trying to alter users’ perceptions and behaviors out of misaligned interests or even deceitful intentions?

My recent experience with Google Calendar can start this discussion. I booked a hotel for a vacation several months ago, and one day I suddenly received a notification from Google Calendar reminding me to go to the hotel. I was shocked, because I had never given Google permission to do this. The system had automatically extracted the information from my email and marked it on my calendar. This means there is a database beyond my reach that stores my locations, activities, contacts, and more. Who knows what other information is in the system! It was the first time I felt insecure using Google products; my privacy had been seriously invaded.

From then on I started to realize that data collectors and algorithm creators simply make too many assumptions. They assume that people all like to share information with each other, to see only the things they are interested in, and to connect to every single device within their proximity. Online ads assume you love video games even if you bought one only once, and flashing banners constantly lure you with polished pictures and good deals. Eli Pariser told us in his TED Talk that Facebook concealed all his conservative friends’ posts from his feed just because he clicked more often on his liberal friends’ links, which may not even have been political. In the article “We Kill People Based on Metadata,” David Cole reports that even the National Security Agency insists metadata collection is fine and its threat to privacy negligible. But that is yet another huge assumption the rule makers have made. The NSA states that its system stores everyone’s calling records but not the content of calls. So if you call your mom, who calls a delivery person, who calls a drug dealer who is the top suspect in a terrorist investigation, the NSA may conclude you have a relation to the suspect and put you on a list for further investigation. How bizarre is that?!
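The chain of calls above is essentially a graph search over phone records, and a minimal sketch of the idea is just a breadth-first search with a hop limit. Everything here is a hypothetical illustration of the logic, not the NSA’s actual system: the names, the call graph, and the three-hop cutoff are all invented for the example.

```python
from collections import deque

def hops_to_suspect(call_graph, person, suspect, max_hops=3):
    """Breadth-first search over call records: how many hops separate
    `person` from `suspect`? Returns None if more than max_hops away."""
    frontier = deque([(person, 0)])
    seen = {person}
    while frontier:
        node, depth = frontier.popleft()
        if node == suspect:
            return depth
        if depth == max_hops:
            continue  # stop expanding past the hop limit
        for contact in call_graph.get(node, []):
            if contact not in seen:
                seen.add(contact)
                frontier.append((contact, depth + 1))
    return None

# Hypothetical call records mirroring the chain in the text:
# you -> mom -> delivery person -> dealer (the suspect)
calls = {
    "you": ["mom"],
    "mom": ["delivery person"],
    "delivery person": ["dealer"],
}
```

With records like these, `hops_to_suspect(calls, "you", "dealer")` returns 3: you are within the hop radius purely by association, which is exactly the worry Cole raises, since a single suspect can sweep in thousands of uninvolved callers.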

Nowadays, companies and institutions increasingly use data and algorithms to define who each individual is, and they try to shape each person’s life based on a digital model of him or her. As people are unconsciously exposed to that model, viewing the world and themselves through the lens of this problematic system, they gradually turn into people far from who they originally envisioned themselves to be. As Eli Pariser said in his talk, “there is this epic struggle going on between our future aspirational selves and our more impulsive present selves.”

As data-driven systems play an increasingly important role in our daily lives, we really need to contemplate their flawed nature. On the one hand, there are creators overlaying a constructed reality on people to serve their own interests, and they give the machine full permission to do so. On the other hand, every system defines its own reality with unprecedented efficiency and at unprecedented scale, and uses that reality to justify the results it produces. How many people will be misled or mistreated by these two flawed layers? How many social problems will follow?

The most urgent task now is to ask system creators to have a sense of ethics and civic responsibility, and to understand clearly what kind of distorted reality they and their systems are delivering. Conflict, diversity, and emotion should not be swept away by computer algorithms, and everyone who uses these systems should be aware of the flawed nature of data computing. Only by constantly reflecting on the question “whose reality is it?” can we find the right attitude toward data and algorithms, and use them more consciously to shape a better world.