Digital Dragnets: How Algorithms Judge Us

Sharon Mitchell
Published in Version 1
Feb 1, 2023 · 16 min read
[Image: a fishing net lying on a sandy beach]

Introduction

Algorithms increasingly regulate people and deepen inequality, all the while posing as neutral mathematical tools. Algorithmic authority without accountability threatens democracy. Algorithms are silently shaping and controlling our destinies, and it is a myth that, because they are implemented in unemotional machines, they do not perpetuate bias or automate the status quo. As opinions embedded in code, they codify the past and distort the truth. At the root of this problem is the modellers’ choice of objective, their measure of “success”.

Society is currently measured in terms of the quantifiable: profit, staffing efficiency, or loan-default rate, for example. Society is suffering from a quantification addiction, discarding any human insight in Big Data that can’t be expressed numerically.

Mathematical models should be society’s tools, not its masters. Data scientists need to become intermediaries for the ethical discussions that must happen in wider society, enabling us to integrate these qualitative human insights with Big Data to yield “Thick Data”. This will help us rescue the context lost to our blind faith in Big Data alone, resulting in an Augmented Intelligence far greater than Artificial Intelligence.

Quantification bias is the unconscious tendency to value the measurable over the immeasurable. Big Data Analytics, the application of statistical analysis to Big Data, raises some important ethical issues. Secretive “black box” algorithms have authority without accountability. Judgements made by software sort and separate us into winners and losers in this new data economy. Despite the admirable intention of making things fairer, in reality this codifies and repeats past discriminatory practices. Badly designed and unaccountable, algorithmic systems built on Machine Learning can sometimes wreak havoc, silently automating the status quo.

Data: The Exhaust of the Information Age

[Image: smoke stacks belching smoke]

“While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty.”

— Sherlock Holmes, The Sign of the Four

Ambient, large-scale surveillance is a by-product of humans’ interaction with computing. Whenever you carry your cell phone with you, you enter into an implicit agreement with your carrier: in exchange for being able to make and receive calls, you allow the carrier to know your location at all times. This isn’t explicitly specified in your service contract; it’s simply inherent in how cellular service works. Carriers track where you live and work, where you spend your time, and with whom, since providers know about all the other phones that connect to the same cell towers as yours. This accumulated metadata gives a very accurate picture of how you spend your time; it doesn’t rely on human memory and it doesn’t lie.

“In 2012, researchers were able to use this data to predict where people would be 24 hours later, to within 20 meters.” [13]

Location data is so valuable it is routinely sold to data brokers, who resell it to anyone willing to purchase it. Companies exist whose sole business activity is profiling, and profiting from, people. But it’s not just cell phones that report on us; the issue is far larger. The computers we interact with constantly produce personal data about us; it’s a by-product of everything they do: all our financial transaction data, and everything we read, watch, and listen to online. Every time we use the Internet we leave behind a digital footprint detailing how much time we spend online, where we access it from, what we buy, who we communicate with, which ads we click on, and which resources we visit or search for. We exchange the convenience of free services, from companies such as Google, for surveillance.

Google knows us well enough to finish our sentences. Many people, when they look for answers on Google, unwittingly disclose their “pain point”, and in the short term, at least, pain can be a person’s greatest motivator. Today’s Internet provides advertisers with the greatest laboratory ever for consumer research and sales-lead generation. The Internet of Things promises new sources of data exhaust: inter-networked and, in some cases, wearable devices recording our every move and location, allowing for even greater profiling and monitoring, influencing and modifying our very behaviour; Orwellian Thought Policing with the added twist that we voluntarily pay for our own surveillance.

Increasingly we communicate via computers, using Facebook, Twitter, Instagram, Snapchat, WhatsApp; data is a by-product of this high-tech socialisation. It has become almost impossible to take an active part in society whilst also avoiding having personal data collected electronically. It’s almost as if strong personal privacy is an impediment to efficient day-to-day living.

Take, for instance, the example of Dr Janet Vertesi, assistant professor of Sociology at Princeton University. Dr Vertesi decided to keep her pregnancy a secret from online marketers. She took exceptional privacy measures: avoiding all social media, making any online purchases of baby-related items via Tor, and strictly paying cash for any in-store purchases. Even though this behaviour is all perfectly legal, she ultimately concluded that this experimental opt-out of having her personal data become part of the Big Data pool was:

“…costly and time-consuming and made her look, in her own words, like a bad citizen.” [9]

Machine Learning

This collection, storage, and analysis of data is known as Big Data Analytics. New closed-source proprietary algorithms are transforming all this data into information — after all, Big Data only becomes useful when meaningful information can be mined from it.

Today’s technology affords the capability for dangerous mass surveillance, automatically enabling discrimination based on almost any criterion, including religion, race, and political belief. Without any real ability to opt out, it is being used to control what we see and what we can do, offering citizens no recourse, and without any checks or balances.

We underestimate how easy it has become to identify us using data that we consider anonymous. The falling price of data storage has made it feasible to store all the data we produce indefinitely, and the potential for abuse of indefinitely stored, incidentally collected data is huge. Nowadays, using Data Mining and Machine Learning, it becomes possible not only to detect patterns in data, but to use those patterns to make predictions.

In order to acquire this knowledge from large datasets, supervised or unsupervised Machine Learning techniques are employed. Supervised techniques use labelled training data: correct examples are provided, and an Artificial Intelligence (AI) develops an algorithm for classifying new examples. Unsupervised learning (clustering) uses unlabelled input with no target; these algorithms are designed to explore data and discover the hidden patterns within it.
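
To make the distinction concrete, here is a minimal sketch (my own illustration, using scikit-learn and synthetic data, neither of which comes from this article): a supervised classifier learns from labelled examples, while an unsupervised clusterer is handed the same points with the labels withheld.

```python
# A minimal sketch contrasting supervised and unsupervised learning.
# Assumes scikit-learn; the data is synthetic, purely for illustration.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic dataset: 500 points drawn from 2 groups.
X, y = make_blobs(n_samples=500, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised: learn from labelled examples, then classify new ones.
clf = LogisticRegression().fit(X_train, y_train)
print("supervised accuracy:", clf.score(X_test, y_test))

# Unsupervised: no labels, no target; discover structure in the data.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
print("cluster assignments:", km.predict(X_test)[:10])
```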

Considerable effort goes into detecting and preventing credit card fraud with Machine Learning clustering methods. Credit card companies gather vast amounts of data and use it to profile their customers. When a transaction occurs, a cluster identification is computed for that transaction; if it differs from the existing cluster identification for that customer, the transaction is treated as anomalous, which usually results in a call from the company to the card holder. These models are based on past data, and they can only predict a future that behaves exactly like the past.
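
That idea can be sketched in a few lines. What follows is a toy of my own (a distance-based variant of the cluster-identification check described above, not any card issuer’s actual system, which is proprietary): fit k-means to a customer’s transaction history, then flag a new transaction that sits far outside anything the clusters have seen.

```python
# Toy cluster-based fraud flagging; illustrative only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic history for one customer: (amount, hour of day).
history = np.column_stack([rng.normal(40, 10, 200), rng.normal(18, 2, 200)])

# Profile the customer's past behaviour as clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(history)

def looks_anomalous(transaction, k=3.0):
    """Flag a transaction whose distance to its nearest cluster centre
    is far beyond what this customer's history supports."""
    centre = km.cluster_centers_[km.predict(transaction.reshape(1, -1))[0]]
    past = np.linalg.norm(history - centre, axis=1)
    return np.linalg.norm(transaction - centre) > past.mean() + k * past.std()

print(looks_anomalous(np.array([45.0, 19.0])))   # in-pattern: False
print(looks_anomalous(np.array([900.0, 4.0])))   # out-of-pattern: True
```

Note that the caveat in the text applies to the toy as well: the model is fitted purely on past transactions, so it can only recognise a future that behaves like that past.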

Consider Amazon’s business model, predicated on acquiring customers’ preference data to enable algorithms to predict what else a particular customer might like, and on ensuring they’re given every opportunity to purchase it. A customer may freely click the “Buy Now” button without that decision reflecting what really matters to them in the long term. Decisions such as this, which at first glance appear free, are not truly autonomous. Autonomous purchasing decisions are “owned” by the customer: the customer is fully committed to them and endorses them as reflecting their deepest values.

“we value autonomy of decision, [. . . ] [it] is part of what it is to be a fully mature person.” [10]

We appear to be all-too-willing participants in this trading away of our privacy and data for the sake of convenience. Facebook’s business model is to maximise our convenience and anticipate our needs, all the while collecting our data. When a business profits from private information it intentionally objectifies its users. Privacy policies inform us how our information will be used: a theatrical nod to our status as autonomous human beings.

Machine Learning is not the sole problem. An algorithm can process data and conclude with a probability that a certain person might be a terrorist, or a bad hire, or a risky borrower, distil that probability into a score, and in doing so turn that person’s life upside down. Opaque and proprietary, these unquestioned and unaccountable algorithms operate at scale, establishing norms with something close to the force of law, defining and delimiting our lives, and turning a mere nuisance into a tsunami force. If a bank models you as a high-risk borrower, that is how the world will treat you: a deadbeat who can no longer rent an apartment, get a job, or buy a car.

“There’s a few things you can automate [. . . ] but most of it is to augment people. Nothing should make the decision for you, it should make you a better decision-maker because you’re getting these new inputs.”

— Roger Magoulas in [8]

The Data and Society Research Institute [1] is concerned with technology’s implications for the future, and with how these algorithms and secret models can disproportionately impact the very same groups that have experienced past discrimination. One of its Fellows, Gideon Lichfield, talks of this “dark side” of Big Data: the perpetuated injustices, cascading disadvantages, and distinct lack of “Algorithmic Accountability”:

“because people are slightly less apt [. . . ] to get credit, [. . . ] jobs, that will all feed into the data about them, which will feed into their abilities to do other things. There’s a risk of it all becoming a self-perpetuating cycle.” [8]

[Image: a map with red lines delineating “acceptable” residential areas]

Corporate surveillance is used, fundamentally, to discriminate: categorising people, and marketing goods and services on the basis of those categories, is discriminatory. The practice known as “redlining” has been in common use since the 1960s. Banks would deny mortgages in minority neighbourhoods, a literal red line being drawn on maps to clearly delineate these zones. Alternatively, mortgages would only be issued to minorities looking to finance home purchases in predominantly minority neighbourhoods. Generally, redlining is the name given to the practice of denying, or charging more for, a good or service by using neighbourhood as a proxy for race. This is of course illegal, but it has the potential to be much more pervasive and discriminatory on the Internet. This practice of “web-lining” is made possible because corporations collect such vast data about us all and use it to compile detailed profiles. The US White House Big Data report of 2014 arrived at the following conclusion:

“big data analytics have the potential to eclipse long standing civil rights protections” [13]

This concept of Big Data becoming the means by which conclusions are justified leaves little ground for subjective response. Numbers are hard to argue against — data is seen to be better than human intuition or judgement:

“data seen to be powerful whilst human agency [. . . ] seen as potentially unreliable, inefficient, [. . . ] limited in the depth of [. . . ] analytic gaze and impartiality.” [6]

Yet algorithmic judgement remains just as open to broad social stigma and prejudice:

“automated systems claim to rate all individuals [. . .] the same way, thus averting discrimination . . . But software engineers construct the datasets mined by scoring systems; they define the parameters of data-mining analyses; [. . .] Human biases and values are embedded into each [. . .] step of development.” [12]

Black Box Algorithmic Systems

[Image: a black box converting a given input to an output]

To build a black box algorithmic system requires two things:

  • Data (what happened in the past)
  • A definition of success (the objective sought)

Algorithms aren’t objective; they’re opinions embedded in code, and they codify the past. Unlike other engineering projects, their failures are invisible: a failed bridge collapses, and a failed airplane falls from the sky, but a black box algorithmic system is closed and proprietary, and can silently wreak havoc with people’s lives, repeating past mistakes and automating the status quo.
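
A tiny numerical sketch (entirely my own invention, on synthetic data) of those two ingredients at work: the same historical data produces a different model, and different fates for the people it judges, depending solely on which definition of success the modeller optimises.

```python
# How the chosen "definition of success" shapes a model built on past data.
# Synthetic toy; all numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.uniform(300, 850, 10_000)                # historical credit scores
repaid = rng.random(10_000) < (scores - 250) / 700    # past repayment outcomes

def profit(t):
    """Toy economics: +1 unit per repaid loan, -3 per default."""
    approved = scores >= t
    return np.sum(approved & repaid) - 3 * np.sum(approved & ~repaid)

thresholds = np.arange(300, 850)

# Success = profit alone: a harsh cut-off that rejects most applicants.
best = thresholds[np.argmax([profit(t) for t in thresholds])]
print("profit-only cut-off:", best, "approves", (scores >= best).mean())

# Redefine success (profit, subject to approving at least 80% of applicants)
# and the model, and everyone's fate, changes with it.
ok = np.array([(scores >= t).mean() >= 0.8 for t in thresholds])
best2 = thresholds[ok][np.argmax([profit(t) for t in thresholds[ok]])]
print("constrained cut-off:", best2, "approves", (scores >= best2).mean())
```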

In 2014, the New York Times ran a story about Jannette Navarro [2], who was trying to work her way through college as a Starbucks barista whilst caring for her four-year-old child. Her ever-changing work schedule made her life impossible; regular day-care was beyond reach, and she ended up putting school on hold. The only thing she could schedule was work. Her story is typical: thanks to automated scheduling software, food service and retail workers find out about scheduling changes with less than a week’s notice, often just a day or two. Within weeks of publication, the major corporations mentioned were shamed into announcing they would adjust their practices by adding constraints to their optimisation models. Yet a year later, in a New York Times follow-up, many had failed to meet their own pledge of learning to live with less optimisation. Minimal staffing is baked into a culture of business models built to feed their own bottom line, and this is reflected in the objectives of their operational software. Scheduling software appears to be one of the worst offending mathematical models, creating poisonous feedback loops: haphazard work scheduling makes school impossible for many workers, which in turn dampens their employment prospects and traps them in the over-supplied pool of low-wage workers. It’s almost as if the software was designed to punish low-wage workers and keep them down.
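
To make “adding constraints to their optimisation models” concrete, here is a deliberately tiny staffing model of my own (using scipy’s linear-programming solver and invented demand numbers; real scheduling software is proprietary and vastly more complex). The point is only that worker-friendly guarantees enter the model as constraints, raising the measured “cost” while buying a stability the profit objective alone never sees.

```python
# Toy staffing optimisation: minimise the wage bill against forecast demand,
# then add a worker-friendly constraint. Synthetic numbers throughout.
import numpy as np
from scipy.optimize import linprog

demand = np.array([2, 3, 7, 4, 8, 3])     # forecast staff needed per time block
cost = np.full(len(demand), 15.0)         # wage cost per staffed block

# Pure efficiency: staff each block at exactly the forecast demand.
# (linprog minimises cost @ x subject to A_ub @ x <= b_ub.)
eye = np.eye(len(demand))
lean = linprog(cost, A_ub=-eye, b_ub=-demand, bounds=[(0, None)] * len(demand))
print("efficiency-only staffing:", lean.x.round())

# "Adding constraints": guarantee a floor of 4 staff per block, trading a
# higher wage bill for the predictable hours the objective would never buy.
fair = linprog(cost, A_ub=-eye, b_ub=-demand, bounds=[(4, None)] * len(demand))
print("with a stability floor: ", fair.x.round())
print("cost of the guarantee:  ", fair.fun - lean.fun)
```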

The banking industry is frantically raking through our personal data exhaust in its attempts to boost profits. However, unlike most consumers of our data exhaust, the banking industry is subject to government regulation and disclosure requirements, meaning customer profiling carries reputational and legal risk. American Express learned this the hard way in 2009. Looking to reduce the risk on its own balance sheet, Amex cut the credit limit of some customers, and because of regulation the credit card giant had to send those customers a letter explaining why: customers who shopped at certain establishments were more likely to fall behind on payments. It was “purely statistics”, the company wrote; there was a clear correlation between shopping patterns and default rates. It was left to unhappy Amex customers to guess which establishment had poisoned their credit and placed them in the bucket of potential financial deadbeats. Careering into a nasty recession with less credit, they would then see their lowered credit scores drive up their borrowing costs. It’s probably safe to say that many cardholders frequented stores associated with poor repayments because they weren’t swimming in money in the first place; an algorithm took notice and made them poorer.

The root of the problem is the modellers’ choice of objective. We need to change our definitions of “success”, currently measured in numerical terms: profit, efficiency, or default rate. Starbucks’ staffing efficiency is quantifiable; the cost to wider society of Ms Navarro dropping out of school is not. Humans need a sense that there is some equal ground upon which we all compete; it is this belief that legitimises unequal outcomes. But what would be the Big Data analogue of the credit crisis of 2008? A growing dystopia with rising inequality, algorithms ensuring those deemed losers remain that way, while a minority rakes in outrageous fortunes, gains ever more control over the data economy, and convinces itself that it deserves it?

As individuals move through their daily lives, leaving a trail of data exhaust, and as that data is systematically gathered, analysed, and shared, it is being (ab)used to control their access to goods and services in the capitalist economy. The power to use data points as proxies in order to filter opportunities, eligibility, prices, and even exposure is great power indeed:

“. . . authority is increasingly expressed algorithmically.” [3]

The Internet you see is increasingly tailored to what your profile indicates. Political activist Eli Pariser calls this the “filter bubble”: an Internet optimised to your preferences. Yet it is incredibly harmful never to have to encounter things you don’t agree with. Panoptic sorting is a far cry from the early dot-com days when “nobody knows you’re a dog.” [4]

As Day [7] observed:

“no longer is surveillance of the individual enough, but now he or she is co-located within predictive matrixes of actions and objects through linked associations with other subjects, objects, and events in databases and their indexes.”

Data mining is embedded in the commercial and public sectors, a well-established cultural presence with influence beyond what we might expect. Big Data Analytics involves more than just targeted advertising: it makes decision-making possible in a whole range of social realms, such as education, employment, health care, and policing. There is real potential to usher in new, opaque forms of social sorting, and in so doing reform the social order, the very concept of Big Data becoming the means by which certain viewpoints are promoted and conclusions are justified. Proxies do appear to work much of the time in this statistical universe: wealthy people do buy BMWs and cruises, and poor people often do need a payday loan. If efficiency and profits surge, investors are inclined to double down on the systems that place thousands into the correct buckets. Big Data triumphs; losers in this unregulated universe have no recourse and no way of setting the system straight. They are collateral damage. The guiding principle of this new economy is “the more data the better”, yet for the sake of fairness, some of this data should remain uncrunched.

“. . . with petabytes of behavioural data at their fingertips and virtually no oversight, opportunities for the creation of new business models are vast.” [11]

Conclusion

“In 1890, Samuel Warren and Louis Brandeis published an article in the Harvard Law Review arguing for what they dubbed ‘the right to privacy’.” [10]

This is considered one of the most cited legal articles in US history, and it was precipitated by the then-recent invention of the Kodak camera, which was being used to photograph celebrities of the day in unflattering situations. Warren and Brandeis worried that unfettered use of new technology had a negative effect on an individual’s inherent right to control access to private information. It seems technology has always outstripped any sense of how to use it ethically.

Technological changes occur so fast that we don’t evaluate their effects or weigh their consequences. In many ways, our society is struggling with a new industrial revolution. We need to renegotiate the trade we’re making with our data, and be proactive in our dealings with new technologies. In thinking about what values we want our technological infrastructure to embody, we must address the question of what data we should be saving and for what duration. The problem is with the algorithmic objective. Changing that objective, explicitly embedding better values into the algorithms and shifting them from targeting people to targeting assistance, can make these digital dragnets a force for good. We cannot rely on the free market to right these wrongs: the very nature of this targeting can destroy lives in a death spiral of statistical modelling whilst keeping society’s winners from ever seeing it. Big Data and Machine Learning codify the past; only humans possess the moral imagination to invent the future. We need models that follow an ethical lead, putting fairness first, a concept that resides solely in the human mind and resists any attempt at numerical quantification.

The free market failed to curb its excesses during the first industrial revolution of the late 19th and early 20th centuries. Government had to intervene and establish new protocols, inspections, and standards. This levelled the playing field for all participants; admittedly it raised the cost of doing business, but it benefited society as a whole in the long term. Mathematical models should be society’s tools, not its masters. The success of today’s models is measured in terms of something quantifiable: profit, efficiency, or default rate. But when people visit a search engine for information on government welfare schemes, they are often targeted by predatory advertisers pushing high-interest loans and superfluous services they can ill afford. The wheels of commerce may be turning, but can this truly be considered success when it is clearly a drain on the economic system as a whole, as people with ever-larger deficits become even more dependent on public assistance?

Towards the end of algorithmic accountability, researchers at Princeton have launched the Web Transparency and Accountability Project [5], in which software robots masquerade as online users of all stripes. By studying the treatment these “characters” receive, biases can be detected in systems ranging from search engines to job sites. Similar academic endeavours are under way at Carnegie Mellon and MIT. This is a vital first step in opening these engines of the new data economy up to scrutiny.

“In the past, the things that men could do were very limited . . . But with every increase in knowledge, there has been an increase in what men could achieve. In our scientific world, and presumably still more in the more scientific world of the not distant future, bad men can do more harm, and good men can do more good, than had seemed possible to our ancestors even in their wildest dreams.”

— Bertrand Russell

A blind faith in Big Data alone leads to quantification bias: the unconscious tendency to value the measurable over the immeasurable. Having more data is not, by itself, helping us make better decisions. Global limits on data collection and use, in a world where everything is mediated by algorithms, are a natural and necessary step. Throwing away the insights that can’t be expressed numerically is silver-bullet thinking, a belief that some simple solution exists. Something is missing from Big Data. Integrating qualitative human insights with Big Data, to give us Thick Data and an Augmented Intelligence, would help us recover the context currently lost to our addiction to simple quantification of our collective data exhaust. Relying on Big Data alone increases the chances we’ll miss something, whilst giving us the illusion that we already know everything!

About the Author

Sharon Mitchell is an AWS DevOps Engineer at Version 1, currently working in the AWS Public Cloud space. She holds both a BSc and an MSc in Computer Science, and explores Tech and AI from esoteric angles.

References

[1] https://datasociety.net/

[2] https://www.nytimes.com/times-insider/2014/08/22/times-article-changes-a-policy-fast/

[3] http://www.shirky.com/weblog/2009/11/a-speculative-post-on-the-idea-of-algorithmic-authority/

[4] https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you're_a_dog

[5] https://webtap.princeton.edu/

[6] David Beer. 2016. Metric Power. Palgrave Macmillan.

[7] Ronald E. Day. 2014. Indexing it all: The subject in the age of documentation, information, and data. MIT Press.

[8] Timandra Harkness. 2017. Big Data: Does Size Matter? Bloomsbury Sigma.

[9] Dawn E. Holmes. 2017. Big Data: A Very Short Introduction. Oxford University Press.

[10] Michael Patrick Lynch. 2017. The Internet Of Us: Knowing More and Understanding Less in the Age of Big Data. Liveright Publishing Corporation.

[11] Cathy O’Neil. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Penguin.

[12] Frank Pasquale. 2016. The Black Box Society: The Secret Algorithms That Control Money and Information. Harvard University Press.

[13] Bruce Schneier. 2015. Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World. W. W. Norton & Company.
