“We don’t know what we don’t know.”

While Donald Rumsfeld’s statement may not have been made in reference to Big Data and the algorithmic curation of information on the internet, it is certainly an apt one now. The amount of data being collected on you at this very moment is staggering: everything from where you are, to the browser you’re using, to the type of content you enjoy. These bits of data, innocuous on their own, are fed into algorithms that may later decide whether you are worthy of a mortgage, deserving of a job, or even how long your prison sentence should be. And while the collection of this data is in and of itself ethically questionable, I am far more concerned with the decisions being made based on data I didn’t know was collected, using logic I will never understand.

Most of us are familiar with our online data being used to target us with ads. You visit a website, put something in your shopping cart, and weeks later you’re still seeing ads for that very item on every site you visit. While this might be frustrating, it doesn’t feel particularly dangerous. But what happens when the information collected about you ends up in the hands of people you never considered might be interested in it? Alice Marwick asks us to imagine a scenario in which your health insurance company knows what you’re eating from data collected through your nutrition app and can track your exercise directly from your wearable fitness tracker. What sort of selective discrimination might this allow?

Some might argue that making decisions this way is sensible. After all, the health insurance company needs to manage risk. But there are larger issues at stake. What does it mean for a company to “know” you based on your data? What is lost in this algorithmic shorthand? David Cole quotes former NSA General Counsel Stewart Baker as saying, “metadata absolutely tells you everything about somebody’s life. If you have enough metadata, you don’t really need content.” And Baker isn’t even talking about the content of our communications, just the metadata skimmed off the surface!

Peeling back another layer raises even more questions about who designs these algorithms, and why. Algorithms are often assumed to be impartial: they allow for blind, data-driven evaluation of complex factors, promising decisions that might otherwise be muddied by human bias. Unfortunately, algorithms are always “based on choices made by fallible human beings…many of those models encoded [with] human prejudice, misunderstanding, and bias” (Cathy O’Neil). Cloaked in code, algorithms allow companies, governments, and organizations to target and control specific groups of people. All of this is based on incomplete data, processed by computer models that, “despite their reputation for impartiality, reflect goals and ideology” (O’Neil).
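To make this concrete, consider a minimal sketch of the kind of scoring model O’Neil describes. Everything here is invented for illustration (the features, the weights, the zip codes, the cutoff); the point is that every “objective” number is a human choice, and a feature like zip code can quietly stand in for race or class even though the code never mentions either.

```python
# A hypothetical, deliberately simplified loan-scoring model.
# None of these fields, weights, or values come from a real system.

applicant = {
    "income": 42_000,
    "zip_code": "10027",
    "years_employed": 3,
}

# The designer chose to use zip code at all. Zip code can act as a
# proxy for race or class, so a model that never "sees" a protected
# attribute can still discriminate through it.
HIGH_RISK_ZIPS = {"10027", "60621"}  # invented values for illustration

def loan_score(a: dict) -> float:
    score = 0.0
    score += min(a["income"] / 100_000, 1.0) * 50     # income capped at $100k: a choice
    score += min(a["years_employed"] / 10, 1.0) * 30  # "stability" weighted at 30 points: a choice
    score += 0 if a["zip_code"] in HIGH_RISK_ZIPS else 20  # geography penalty: a choice
    return score

APPROVAL_CUTOFF = 60  # the threshold itself encodes someone's risk tolerance

score = loan_score(applicant)
print(score, "approved" if score >= APPROVAL_CUTOFF else "denied")
```

Nothing in this function looks prejudiced, which is exactly the problem: the prejudice lives in the choices, not in the arithmetic.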

Aside from the discrimination that human-made algorithms allow for, I am further troubled by the curated reality they present. Take Eli Pariser’s comparison of two Google searches: two friends search for the same term, “Egypt,” and one is shown news of the protests while the other is shown travel information. The problems that arise when algorithms curate the world this tightly become immediately clear. Because Google is a privately held company, the data that fuels this curation, and the ways in which it is computed, will always be a mystery. More significantly, the content Google has filtered from my view, without ever consulting me, vanishes into thin air. I am simply forced to accept the version of the world being pushed to me as if it were the truth. Fighting this “reality” is frustrating at best, impossible at worst, as these algorithmic black boxes “do not listen. Nor do they bend. They’re deaf not only to charm, threats, and cajoling but also to logic…that’s part of their fearsome power” (O’Neil).
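The mechanism behind Pariser’s example is easy to reproduce in miniature. The sketch below is a toy, hypothetical filter, not Google’s actual system (which, as noted above, is a mystery). It shows only the principle: results scored below an arbitrary cutoff are dropped silently, and nothing tells the user they ever existed.

```python
# A toy, hypothetical personalization filter. All profiles, results,
# and numbers are invented for illustration.

user_profile = {"politics": 0.9, "travel": 0.1}  # inferred interest weights

results = [
    ("Protests erupt in Egypt", {"politics": 1.0, "travel": 0.0}),
    ("Cheap flights to Cairo",  {"politics": 0.0, "travel": 1.0}),
]

def relevance(profile: dict, topics: dict) -> float:
    # Dot product of the user's inferred interests with the item's topics.
    return sum(profile[t] * w for t, w in topics.items())

CUTOFF = 0.5  # another opaque, human-chosen number

for title, topics in results:
    if relevance(user_profile, topics) >= CUTOFF:
        print(title)

# Only the politics story prints. The travel result "vanishes into thin
# air," and nothing tells the user that anything was filtered at all.
```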

“A model, after all, is nothing more than an abstract representation of some process…the model takes what we know and uses it to predict responses in various situations…[it] tell[s] us what to expect, and [it] guide[s] our decisions.” -Cathy O’Neil

What does all of this mean for me, a human living in a data-driven world? If my model of reality comes largely from what I can find on the internet, what does it mean that my reality has been filtered for me? What does that same filtering mean for a company trying to make decisions about its target audience, a government trying to make decisions about refugees, or judges trying to make decisions about prison sentencing? What happens when our model of the world is built on incomplete information, and so is everyone else’s?