Week 2 — Subjectivity, Bias, Location & Accuracy

Violet Whitney
Data Mining the City
Sep 19, 2017

Subjectivity

Across many fields, analytics have been applied spatially. A couple of examples:
Regional science is a field of the social sciences concerned with analytical approaches to problems that are specifically urban, rural, or regional.
Spatial economics deals with what is where, and why.
But there are major critiques of these methods.

In his landmark text “Explanation in Geography,” David Harvey critiques these emerging fields for being devoid of theory and ethnographic perspective, and for being co-opted for political purposes. The book could also have been called “The Role of Theory in Scientific Explanation,” as Harvey recognizes that these scientific fields are deeply interwoven with theoretical issues. Another critique of spatial analysis is that it looks at problems from a God’s Eye View and is overly reductive of individual experience.

In history, Subaltern Studies examines the history of the masses: what happens at the base levels of society rather than among the elite. It fundamentally questions who speaks for whom when histories about the masses are written by elite historians rather than by the masses themselves. When a map is drawn, do the masses speak for themselves, or does an elite planner, economist, or government official speak for them? Spatial analytics borrows objective and subjective tactics, but is critiqued for a God’s Eye View perspective that can lack first-person understanding, because it tries to reduce multiple occurrences into a model, trend, or behavior pattern.

There’s further criticism in the obsession with data. You may have heard the term “quantrapreneurs,” which mocks companies built on data. The quantified self, also known as lifelogging, is a movement to incorporate technology into data acquisition on aspects of a person’s daily life in terms of inputs (food consumed, quality of surrounding air), states (mood, arousal, blood oxygen levels), and performance, whether mental or physical. In short, the quantified self is self-knowledge through self-tracking with technology.

In a quote from a skeptic:

“Quantified self” practitioners as a group are not necessarily curious about human values or an understanding of what makes us human. They’re more interested in anything that can be measured and given a number. They believe the maxim that only the things that are measured can be improved. But I see a lot of measuring, but not much improvement….Quantifying the number of times we eat, sleep, or tweet doesn’t somehow reveal something more truthful about ourselves over just experiencing it. Are we actually learning something more fundamental about ourselves? Why do we think there’s something more true in the numbers than how I feel?”

It seems somewhat impossible to describe everything… And even when we describe and categorize the world, the boundaries can be somewhat arbitrary. How do you distinguish a cell in the small intestine from one in the descending colon? The cell doesn’t know that it’s part of that organ or even the digestive system. Humans only delineate it that way. To this degree, boundaries and categories can be somewhat arbitrary in nature.

Can “reality” be described? For Nietzsche and nihilists there is no reason to describe it, because there is no objective order or structure in the world except for what we give it.

“Every belief, every considering something true, is necessarily false because there is simply no true world.” The perspective is that humans search for and attribute meaning in a meaningless world. The phenomenon of humans perceiving correlations that don’t exist, such as the faces we imagine in trees or clouds, or the circles and lines on the screen, is called apophenia. In data science it is called illusory correlation: the phenomenon of perceiving a relationship between variables (typically people, events, or behaviors) even when no such relationship exists.
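A quick sketch of illusory correlation, using NumPy and entirely made-up random data: with enough variables and only a few observations each, some pairs will look strongly related purely by chance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical exercise: 20 completely unrelated random "signals",
# each observed only 10 times (think: 10 days of small-sample city data).
signals = rng.normal(size=(20, 10))

# Check every pair of signals for an apparently "strong" correlation.
strong_pairs = []
for i in range(len(signals)):
    for j in range(i + 1, len(signals)):
        r = np.corrcoef(signals[i], signals[j])[0, 1]
        if abs(r) > 0.7:
            strong_pairs.append((i, j, round(r, 2)))

# No real relationship exists, yet with this many pairs and so few
# samples, a handful will typically look correlated.
print(strong_pairs)
```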

So what is the distinction between “reality” and our simulation of it in our models, maps, and data? If we had all of the data points about the feel, color, location, and size of everything in the natural world and modeled it, what would be the difference between reality and our simulation? Simulation theory (as popularized by Neuromancer and The Matrix) is the hypothesis that reality could be simulated — that we couldn’t tell the difference anyway if we were in a simulation.

You may have heard of the Borges map, which describes a 1:1 scale map, an idea that has been used again and again by other authors and artists.

In 1893, Lewis Carroll, author of Alice in Wonderland, imagined a fictional map that had “the scale of a mile to the mile.”

In the passage:

“And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!”

“Have you used it much?” I enquired.

“It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”

Today the 1:1 map and the blend between reality and simulation are closer than ever through the Internet and the Internet of Things.

A reading from the artist Hito Steyerl

The Whorf hypothesis holds that reality is embedded in a culture’s language and that language shapes thought and cultural norms. Some languages create the capacity to discuss concepts that don’t exist or can’t be comprehended in other languages. The fact that we describe colors categorically, or that we think of counting sequentially, shapes how we understand those things. But many things are less clearly classified and often fall along gradients. You can imagine a language not made up of categorical words but of songs, rhythm, pitch, and tone. Instead of saying bluish green, would we hum somewhere between the pitch of blue and the pitch of green? Could volume indicate intensity of saturation?

This would fundamentally shift our understanding of what is.

Words and language create a shared understanding of what something is; however, a word is inherently reductive. So we reduce at the level of the word, but we also reduce when we model a concept.

In data science we use models, trend lines and patterns to understand what happened and to predict what will happen.

Overfitting

When we have a set of data points from individual events, we attempt to understand it by creating a generalized model. In machine learning this is called fitting. When we underfit a model, it is too general to be useful. Our ideal spot is in the middle: specific enough to be useful but general enough to apply to other data samples. Overfitting hugs the data set too closely, so when the model is applied to another dataset it isn’t relevant. Because big data uses huge data sets, it’s useful to build the model on a sample of that data to represent all of it. However, if a model is overfit to the sample data, it won’t apply to the rest of your data.
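A minimal sketch of fitting, underfitting, and overfitting, assuming scikit-learn and a made-up noisy sine curve as the stand-in dataset: the degree-1 polynomial is too general, the degree-15 polynomial hugs the training sample, and the mid-range degree generalizes best to the held-out points.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Made-up "events": a noisy sine curve stands in for real observations.
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=60)

# Hold out half the data to see how each model handles points it never saw.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 4, 15):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d}  "
          f"train error {mean_squared_error(y_train, model.predict(X_train)):.3f}  "
          f"test error {mean_squared_error(y_test, model.predict(X_test)):.3f}")
```

The overfit model typically shows the lowest training error and the highest test error, which is the hug-the-sample-too-closely failure described above.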

Overfitting is relevant in examples from your life as well: I looked for wall hooks on Amazon once, and now its recommendation engine thinks I love wall hooks, because it is overfitting to the sample data it was given. You can also think of examples of overfitting and underfitting in architecture. Modernist architecture was intended to universalize design around a standard man, but we often find that we are misfits in the models built for us. A modernist chair might be too large for us, or the seat on a plane too small for a tall person. In Goldilocks and the Three Bears, Goldilocks finds that the chairs (let’s call them models) don’t all fit her.

Location and Accuracy

So there are all these various words and mechanisms to describe and track objects and their behavior. We can describe an object with a unique identifier number (in 3D modeling programs this is called a GUID), and we can describe its color, length, height, and width, but each of these descriptors changes the way we understand that object.
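A toy sketch, using Python’s built-in uuid and dataclasses modules, of how an object might be recorded as a unique identifier plus a handful of descriptors; the attribute names and values are illustrative, not from any particular modeling program.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class TrackedObject:
    # A GUID identifies the object without describing it.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Each descriptor captures only one narrow aspect of the thing itself.
    color: str = "unknown"
    length_m: float = 0.0
    height_m: float = 0.0
    width_m: float = 0.0
    location: tuple = (0.0, 0.0)  # (latitude, longitude)

chair = TrackedObject(color="red", length_m=0.5, height_m=0.9, width_m=0.5,
                      location=(40.8075, -73.9626))
print(chair.guid, chair.color, chair.location)
```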

But it’s not just the word we record about an object that changes its definition; its definition is also shaped by the tool or method we use to describe it. You can think of our senses as tools for measuring the natural environment. You can see, smell, hear, taste, and touch, each of which describes a different aspect of a thing. Likewise, we can use tools with sensors to understand and record events and to sense objects. Each tool or sensor has limited agency in what it can record.

A camera, for example, is limited by its resolution, its distance from what it is recording, the range of color it can capture, whether its view is obscured, and so on. All of these factors will affect the outcome of what is recorded. A camera placed at a different height, or with greater resolution, would capture very different results. An infrared camera would capture a different part of the spectrum. If I used a lidar sensor to detect the distance of the object, I would have no information about the object’s color and would only know its position relative to the lidar receiver.

Take these examples of how location is tracked to understand how the tool impacts the data that is recorded.

GPS — GPS is made up of 29 satellites orbiting the Earth.

GPS works by “trilateration”: the receiver’s position is determined from distance measurements to several satellites at once. Three satellites narrow the receiver down to a location on the Earth’s surface, and a fourth confirms that location and corrects the receiver’s clock. The system consists of the satellites, control and monitor stations, and receivers. The receiver takes timing information from the satellites and uses trilateration (often loosely called triangulation) to determine a user’s position; a rough sketch of the math follows below.

Agency of GPS: it’s accurate to about 10 meters, which isn’t very useful for indoor positioning. It works best with a direct line of sight, so it performs much worse inside brick buildings or under foliage.
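The rough sketch of trilateration mentioned above, flattened to 2D and written with NumPy; the anchor coordinates and measured ranges are made up for illustration. Each range constrains the receiver to a circle around one anchor, and a least-squares solve finds the point most consistent with all of them.

```python
import numpy as np

def trilaterate(anchors, distances):
    """Least-squares 2D trilateration: estimate a position from known
    anchor coordinates and measured distances to each anchor."""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    x1, y1 = anchors[0]
    # Linearize by subtracting the first range equation from the others.
    A = 2 * (anchors[0] - anchors[1:])          # rows: [2(x1-xi), 2(y1-yi)]
    b = (d[1:] ** 2 - d[0] ** 2
         - np.sum(anchors[1:] ** 2, axis=1)
         + (x1 ** 2 + y1 ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Hypothetical "satellites" (flattened to 2D) and the ranges measured to them.
anchors = [(0, 0), (100, 0), (0, 100), (100, 100)]
true_pos = np.array([30.0, 60.0])
distances = [np.linalg.norm(true_pos - np.array(a)) for a in anchors]
print(trilaterate(anchors, distances))   # ~[30. 60.]
```

Real GPS solves the same kind of system in three dimensions, with the receiver’s clock error as a fourth unknown, which is why four satellites are needed in practice.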

Wifi pinging — With this method, a signal is sent through a wifi hotspot to a user’s device, such as a smartphone, smart watch, or computer, to learn which device addresses are within range. The strength of a signal, or its mere presence within a wifi network, can indicate where people are located. While wifi is usually limited to individual wifi networks, larger connected networks can be used to track the movement of a device through the city, using timestamps of when a unique user’s address shows up at various places.
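A toy sketch, assuming pandas and a made-up log of which access point saw which device address at what time, of how timestamps alone can reconstruct a device’s coarse path through the city.

```python
import pandas as pd

# Hypothetical sightings log: (device address, access point, timestamp).
sightings = pd.DataFrame([
    ("aa:bb:cc:11:22:33", "AP_116th_Broadway", "2017-09-19 08:05"),
    ("aa:bb:cc:11:22:33", "AP_110th_Broadway", "2017-09-19 08:21"),
    ("aa:bb:cc:11:22:33", "AP_96th_Broadway",  "2017-09-19 08:40"),
    ("dd:ee:ff:44:55:66", "AP_116th_Broadway", "2017-09-19 08:07"),
], columns=["device", "access_point", "seen_at"])
sightings["seen_at"] = pd.to_datetime(sightings["seen_at"])

# Order each device's sightings in time to get a coarse trajectory.
paths = (sightings.sort_values("seen_at")
         .groupby("device")["access_point"]
         .apply(list))
print(paths)
```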

Beacons — These are small devices that work similarly to wifi pinging, but a major difference is that a beacon can track who a person is (their unique profile). It can track while not connected to the web (over Bluetooth), and its accuracy is much better, roughly ±1 m. This is also sometimes referred to as geo-fencing.
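Beacon proximity is often estimated with a log-distance path-loss model; a hedged sketch in plain Python, where the calibration constants are typical placeholder values rather than anything from a specific beacon.

```python
def estimate_distance_m(rssi, measured_power=-59, path_loss_exponent=2.0):
    """Rough distance estimate from a Bluetooth beacon's signal strength,
    using the log-distance path-loss model.
    rssi: received signal strength in dBm.
    measured_power: expected RSSI at 1 meter (a beacon calibration value).
    path_loss_exponent: ~2 in open space, higher indoors with obstructions.
    """
    return 10 ** ((measured_power - rssi) / (10 * path_loss_exponent))

for rssi in (-59, -70, -80):
    print(rssi, "dBm ->", round(estimate_distance_m(rssi), 1), "m")
```

Walls and bodies raise the path-loss exponent, so the ±1 m figure really only holds close to the beacon.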

Computer vision — CV is a method for analyzing imagery, often through pattern and object recognition. Computer vision can be used to identify a particular person using facial recognition; it can recognize various objects like cars, trees, or people, or recognize gestures or even someone’s mood from his or her expression. These methods are often applied to video surveillance, but can also be applied to stock images or video footage. While CV is the most accurate at tracking where someone goes, down to the inch, it has other major limitations. It’s not always accurate in deciding what is a person: it can track two people as one, or mistake something non-human for a human. It also requires translating a 2D video into a plan view, which is difficult to do accurately.
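A minimal sketch of person detection, assuming OpenCV’s built-in HOG pedestrian detector and a hypothetical still frame named street_frame.jpg; real surveillance pipelines layer tracking and camera calibration on top of something like this, which is where the mistaken and merged detections described above creep in.

```python
import cv2

# Hypothetical input: any still frame pulled from a surveillance video.
frame = cv2.imread("street_frame.jpg")
if frame is None:
    raise SystemExit("provide a sample frame named street_frame.jpg")

# OpenCV's built-in HOG descriptor with its pretrained pedestrian detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Each box is an (x, y, w, h) rectangle where the detector thinks a person is.
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    print(f"possible person at ({x}, {y}), size {w}x{h}")
```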

So data is changed by how it is recorded with a tool, but also by how it is translated when it is communicated and visualized. Maps cannot be created without map projections, and all map projections necessarily distort the surface in some fashion. Depending on the purpose of the map, some distortions are acceptable and others are not; therefore, different map projections exist to preserve some properties of the sphere-like body at the expense of others.
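One concrete example of projection distortion, sketched in plain Python: the Mercator projection stretches lengths by the secant of the latitude, so areas near the poles are inflated by the square of that factor (the city latitudes below are approximate).

```python
import math

def mercator_scale(latitude_deg):
    """Linear exaggeration of the Mercator projection at a given latitude
    (1.0 at the equator; lengths are stretched by sec(latitude))."""
    return 1 / math.cos(math.radians(latitude_deg))

for place, lat in [("Quito", 0.0), ("New York", 40.7), ("Oslo", 59.9), ("Svalbard", 78.2)]:
    s = mercator_scale(lat)
    print(f"{place:10s} lat {lat:5.1f}  length x{s:.2f}  area x{s ** 2:.2f}")
```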

When data is lost or changed by its means of recording, this is similar to history. Think about how history is recorded. After a series of events such as a presidential election, someone must record that history based on their own observations and interpretations. How might someone’s record of history differ if they record through an aerial camera, through a camera on the ground, or if they hear of an event remotely over the radio? But even if all observation and understanding is subjective, is it worthwhile to start somewhere? History helps us understand change and how the society we live in came to be: the past causes the present, and so the future.

Models can be problematic, but they can also help us make better decisions.

John Snow, Cholera

Bias

Bias has several definitions but is usually negative. We typically use it to mean systematic favoritism of a group. In data science, bias is a deviation from expectation in the data; more fundamentally, it refers to an error in the data. But the error is often subtle or goes unnoticed. How do algorithms get prejudiced?
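A small sketch of how bias can creep in through sampling alone, using made-up numbers: if a “citywide” survey only reaches people near transit hubs, its estimate deviates systematically from the true population value no matter how many responses are collected.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: commute times (minutes) for two groups.
near_transit = rng.normal(25, 5, 20_000)       # 20% of residents
far_from_transit = rng.normal(50, 10, 80_000)  # 80% of residents
population = np.concatenate([near_transit, far_from_transit])

# A survey taken at subway stations oversamples the near-transit group.
survey = np.concatenate([rng.choice(near_transit, 900),
                         rng.choice(far_from_transit, 100)])

print("true mean commute:     ", round(population.mean(), 1))
print("biased survey estimate:", round(survey.mean(), 1))
```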

Stephanie Dinkins is an artist focused on artificial intelligence as it intersects race, gender, aging and our future histories.

The Project: Case Studies

Analytical

Forensic 3D Reconstruction
Forensic, Eyal Weizman
Drawing Circles
Symbolizing Drawings
Exploring Quick Draw

Predictive

Google Quick Draw
Image-to-Image Demo
Prison Map, Josh Begley
Josh Begley, New York Times Front Page:

Narrative

Race and Power in America, New York Times
Josh Begley, Condolences
Year of Death

Narrative/Analytical, can be convincing and biased:

Exploratory

Exploratory visuals are intended to explore how data might be related but may not have an end goal of arguing anything in particular. For example, the project On Broadway visualizes Instagram images along Broadway: what colors are used in the images and how many images are posted at each location throughout the day, shown side by side with the street views of those locations.

On Broadway

Google Quick Draw has the most awesome/open dataset. Please, someone, find a way to use it. Could we re-create images of the city from people’s memories? From their drawings?

Faces of Humanity
Infinite Draw
More Google Draw
Composite Images
