Implications of data mining and algorithms

The rapid ascent of data mining technology, and companies’ realization of the power of gathering data, have left us with many questions and doubts. As more social and commercial activities take place online, corporations gain more opportunities to collect and exploit personal data. Our mere data fragments, without our realizing it, become useful ingredients that they then cook into a big, delicious plate of information, in a way that suits their tastes, not ours.

Front page of CBS News showing an advertisement for ‘Google AdWords’ after I searched for and visited the Google AdWords site
Front page of YouTube showing my viewing history and recommendations that never attract my attention or interest

The data mining process is so smooth that we do not notice it at all. It seems well structured so as not to disturb our browsing, but that also means they do not let us simply look at it. Looking back, I cannot tell exactly when I started to realize that every click I made had been collected, recorded, and analyzed by someone I have never known and never will know in my entire life. I remember being frightened when I noticed, on a Google page, an advertisement for a clothing brand whose site I had visited just a few minutes earlier. From that day on, the advertisement popped up on every page I visited. A very similar process has been happening on YouTube as well. It tracks my online movements and stores data about the videos I have watched. Afterwards, when I open the YouTube front page, it shows, without asking me, lists of videos that I have clicked on even once, or videos that I might want to watch.

At first, I took it to be a sort of user-centered system that shows exactly the products and services I am interested in. Then I started to have serious reservations about its purpose. What are they doing, and what have they done, with these customized advertisements and content? I have no recollection of allowing them to track my movements or to display my internet history so openly. In “Weapons of Math Destruction”, Cathy O’Neil writes:

Like gods, these mathematical models were opaque, their workings invisible to all but the highest priests in their domain: mathematicians and computer scientists.

Why do they make the process opaque, and so make themselves the only rulers of the domain of data mining? Is it simply our lot to be embarrassed when our private history is unexpectedly exposed to the people next to us? As one of countless online users, how far should I allow them to collect and exploit my data?

I am the type of user who sees the power of data mining but feels no direct effects, since it has no bearing on my livelihood. All I do is try not to leave many traces on the internet. But what about others whose lives are deeply entangled with data mining? Cathy O’Neil introduces a teacher whose livelihood was largely affected not only by inconsiderate data analysts but also by an algorithm, which is not even a human being. Sarah Wysocki, a schoolteacher with outstanding ability and strong reviews from her principal and her students’ parents, got caught in a major flaw of a scoring system. Her story revealed a huge weakness in the system, but the statisticians chose to rely on the numbers anyway. It is miserable that we are somehow evaluated and ruled by a numerical, algorithmic system. We can find no sense of humanity in it at all, and even the highest priests who manage and manipulate the system are losing their humanity, relying on it too much and believing it to be flawless.

At the same time, those highest priests’ arguments about their purposes sound reasonable in terms of safety. The ethical question becomes harder when it touches our safety concerns, including vandalism and terrorism. David Cole states, “NSA (the National Security Agency) and the Foreign Intelligence Surveillance Court have a USA Patriot Act provision that authorized collection of business records relevant to a counterterrorism investigation”. Their purpose in this case seems so reasonable that we cannot refute it. And we are unwilling to examine the surveillance process, because some of us are not even aware of it, and because the rest of us find the system convoluted enough that we simply hand control over to them, without knowing how the data mining actually works.

The video Generation Like, on Frontline with Douglas Rushkoff, reveals parts of the otherwise invisible process of data mining, especially on Facebook. Basically, the system connects things and delivers recommendations based on your ‘likes’. The thing is, only a minority has access to the system and controls all the secret algorithms, while saying things like “Understanding how to quantify that value is huge” and “It’s all obvious and transparent”. Is this behind-the-scenes process really obvious and transparent? Then why do so many online users ask where those recommendations, advertisements, and ‘people you might know’ suggestions on Facebook come from, only to realize that the process is beyond their knowledge and give up trying to understand it?

The many problems of data mining and algorithms arrive at one final question: it is all about transparency. This invisible process, embedded in our everyday lives, dominates our choices and options beyond our awareness, and it is intentionally designed and structured by the high priests.


  • Cathy O’Neil, “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy,” Crown, 2016
  • David Cole, “We Kill People Based on Metadata,” New York Review of Books, May 10, 2014
  • “Generation Like,” Frontline, with Douglas Rushkoff, PBS