Populations

Violet Whitney
Data Mining the City
Nov 1, 2017

Showing Size

Sometimes simply showing the size or relative size of something can be extremely powerful. The website Wait But Why is especially good at this, with simple graphics that show our world back to us quantitatively.

Let’s make an evenly spaced grid that can dynamically adjust to the size of the population we have. For the sake of simplicity, let’s say we have 400 people that we’d like to visualize as circles in a grid similar to what is shown in the above diagrams.

1 Based on the script below, can you make the size of the circles adjust to the population size, i.e. bigger circles when the population is smaller, and smaller circles when the population is bigger? If you finish early, visualize a specific number or statistic and make the circles interactive.

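size(500, 500)
populationSize = 400
# cells per side of the square grid; int() because round() returns a float in Python 2
numberCells = int(round(sqrt(populationSize)))
# width of one grid cell in pixels
gridSize = width / numberCells

for row in xrange(numberCells):
    for column in xrange(numberCells):
        x = gridSize * row
        y = gridSize * column
        ellipse(x, y, 10, 10)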
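One way to approach the exercise is to work out the layout in plain Python first and draw it afterwards. The sketch below is only an illustration, not the course solution: `grid_layout` and its `fill` parameter are names invented here, and it assumes a square canvas.

```python
import math

def grid_layout(population_size, canvas_width, fill=0.8):
    """Return (cell_size, diameter, centers) for a square grid of circles.

    `fill` is the assumed fraction of each cell that the circle occupies.
    """
    # Round up so the grid always has room for the whole population.
    cells_per_side = int(math.ceil(math.sqrt(population_size)))
    cell_size = float(canvas_width) / cells_per_side
    diameter = cell_size * fill

    centers = []
    for index in range(population_size):
        row, column = divmod(index, cells_per_side)
        # Offset by half a cell so circles sit inside the canvas edges.
        centers.append((cell_size * (column + 0.5), cell_size * (row + 0.5)))
    return cell_size, diameter, centers

cell_size, diameter, centers = grid_layout(400, 500)
small_cell, small_diameter, small_centers = grid_layout(100, 500)
```

With 400 people each cell is 25 pixels wide, and with 100 people it grows to 50 pixels, so the circles get bigger as the population shrinks. In a Processing sketch, each center could then be drawn with ellipse(x, y, diameter, diameter).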

But what if we wanted to do something less rigid? How about visualizing trees in a forest, or people in a town square?

size(500, 500)
populationSize = 10000
scaleDotSize = 3.6  # dot diameter in pixels
for i in xrange(populationSize):
    # pick a random position anywhere on the canvas
    x = random(0, width)
    y = random(0, height)
    ellipse(x, y, scaleDotSize, scaleDotSize)

2 Use some of the code above to compare two or more populations. If you finish early, can you replace the circles with an image to represent each population?

Inferring with Sample Data

Bayes’ Theorem

We rarely have enough sample points to build high-resolution models that give us exact answers to our questions. Instead we usually experience life through a small sample of data that we have to infer from. We arrive at a bus stop, see that a few people are standing around, and try to infer from how long they have been waiting whether the bus will come soon enough to be worth waiting for.

Mathematical formulas like Bayes’ Theorem can be used to describe the probability of an event based on prior evidence related to that event. This is commonly used in statistics, since we often can’t know everything about an entire population.
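To make this concrete, Bayes’ Theorem says P(H | E) = P(E | H) × P(H) / P(E), where H is the thing we want to know and E is the evidence we observed. Here is a minimal sketch for the bus stop example. The function name, its parameters, and all of the probabilities are invented for illustration; they are not measured data.

```python
def posterior(prior, likelihood, false_alarm):
    """Return P(H | E) using Bayes' Theorem.

    prior       -- P(H): belief in H before seeing any evidence
    likelihood  -- P(E | H): chance of the evidence when H is true
    false_alarm -- P(E | not H): chance of the evidence when H is false
    """
    # Total probability of seeing the evidence at all, P(E).
    evidence = likelihood * prior + false_alarm * (1.0 - prior)
    return likelihood * prior / evidence

# H = "the bus arrives within five minutes", E = "a crowd is already waiting".
# All three numbers below are made-up assumptions.
p_bus_soon = posterior(prior=0.3, likelihood=0.8, false_alarm=0.2)
```

With these assumed numbers, seeing the crowd raises our belief that the bus is coming soon from 0.3 to roughly 0.63.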

Understanding the general distribution of a model can also help us predict what future events might happen. In a normal distribution, also known as the bell curve or Gaussian, the data is most heavily concentrated around a specific number. For example, people generally die around age 78 in the United States, and the numbers fall off on either side of this. Grades might also follow a normal distribution, with most students getting B’s or C’s and some achieving higher or lower grades. In a uniform distribution, every outcome is equally likely. For example, every time I roll a fair die, it has the same likelihood of landing on the number 6 as on any other face.

Left: normal distribution, Right: uniform distribution (rolling dice)
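The difference between the two shapes is easy to see by drawing samples with Python’s standard random module. This is a rough sketch: the mean of 78 comes from the example above, but the spread of 10 years is an assumption made up for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Normal: values cluster around the mean (78) with an assumed spread of 10.
ages = [rng.gauss(78, 10) for _ in range(10000)]
# Uniform: each die face from 1 to 6 is equally likely.
rolls = [rng.randint(1, 6) for _ in range(6000)]

mean_age = sum(ages) / len(ages)
# Share of samples within one spread (68 to 88) of the mean.
share_near_mean = sum(1 for age in ages if 68 <= age <= 88) / float(len(ages))
# How many times each face came up.
roll_counts = [rolls.count(face) for face in range(1, 7)]
```

Most of the normal samples land close to the mean, while the die faces each come up about equally often.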

Randomness

Why do we use random? Randomness helps us solve problems where there is an infinite or very large number of possibilities to sample from in order to understand a whole model. If we have a population of 1,000,000 people and we want a general idea of the age at which they choose to buy homes, we might randomly survey 100 of them to efficiently sample and understand the whole. Likewise, an urban designer might randomly sample street-level views from a digital model to gain an understanding of the quality of the environment they are planning.

How certain is certain enough? How broadly must we cover an area before we generally understand a model?
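One rough way to explore these questions is to sample a made-up population and compare the sample to the whole. In the sketch below the population is invented (ages drawn around 34 with a spread of 6) and scaled down to 100,000 people so it runs quickly; none of these numbers come from real housing data.

```python
import math
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of home-buying ages (made-up mean and spread).
population = [rng.gauss(34, 6) for _ in range(100000)]
# Survey 100 people chosen at random, without repeats.
sample = rng.sample(population, 100)

true_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

# Sample standard deviation, then the standard error of the mean.
variance = sum((age - sample_mean) ** 2 for age in sample) / (len(sample) - 1)
standard_error = math.sqrt(variance) / math.sqrt(len(sample))
```

The standard error estimates how far the sample mean is likely to sit from the true mean. It shrinks with the square root of the sample size, so surveying four times as many people only roughly halves the uncertainty.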

Randomness is also used in evolution and genetic algorithms. Random variations in a genome produce variety, which can make a species with a unique mutation better adapted to surviving in changing environments.
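As a very small illustration of the mutation step only (not a full genetic algorithm), each "gene" in a list of numbers can be nudged at random with some probability. The `mutate` function, the `rate` and `spread` parameters, and the genome values are all hypothetical.

```python
import random

def mutate(genome, rate, rng, spread=0.1):
    """Return a copy of genome where each gene changes with probability `rate`."""
    return [
        gene + rng.gauss(0, spread) if rng.random() < rate else gene
        for gene in genome
    ]

rng = random.Random(1)
parent = [0.5, 0.5, 0.5, 0.5]
# Ten children, each a slightly different random variation of the parent.
children = [mutate(parent, 0.25, rng) for _ in range(10)]
```

A genetic algorithm would then score each child and keep the best ones as parents for the next generation.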

We can also use random to quickly simulate behavior within a range using random(min, max):

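def setup():
    size(500, 500)
    # print one random float between 50 and 100 to the console
    print(random(50, 100))

def draw():
    # draw() repeats every frame, so a new dot appears at a random spot each time
    x = random(0, width)
    y = random(0, height)

    noStroke()
    ellipse(x, y, 10, 10)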
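The same idea extends beyond position. Here is a plain-Python sketch that picks a random size and color as well; `random_dot` and its size limits are invented names and values, and it uses Python’s standard random module rather than Processing’s random().

```python
import random

def random_dot(rng, width, height, min_size=5, max_size=30):
    """Return a random position, diameter, and RGB color for one dot."""
    return {
        "x": rng.uniform(0, width),
        "y": rng.uniform(0, height),
        "diameter": rng.uniform(min_size, max_size),
        "color": (rng.randint(0, 255), rng.randint(0, 255), rng.randint(0, 255)),
    }

rng = random.Random(5)
dots = [random_dot(rng, 500, 500) for _ in range(50)]
```

Inside a Processing sketch, the equivalent would be something like calling fill(random(255), random(255), random(255)) before each ellipse.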

3 Use random to change the color and the size of your ellipse. If you finish early, use random to create even more variety and intricacy in your visualization.

Using data to make your population

So far we’ve just used Python to manually describe what our population looks like, but what if we want to describe it using existing data?

4 The file below includes data from the 2011 real-estate index. The sketch creates a random population and also imports values from the real-estate index, but the two are not yet linked. Can you visualize the data by the number of stories? If you finish early, import your own data source or sort the population by multiple features.

Download the Processing file and data here.

Number of stories from left to right
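Linking data to a population mostly means reading one row per member and mapping a column to a visual property. The sketch below shows the idea in plain Python. The rows and the column names `address` and `stories` are made up for illustration and are not the columns of the real 2011 file.

```python
import csv

# Made-up sample rows standing in for the real data file.
SAMPLE_CSV = """address,stories
12 Example St,2
48 Sample Ave,5
7 Placeholder Rd,1
300 Demo Blvd,12
"""

def load_buildings(text):
    """Parse CSV text into a list of dicts with an integer story count."""
    reader = csv.DictReader(text.strip().splitlines())
    return [{"address": row["address"], "stories": int(row["stories"])}
            for row in reader]

# Sort so the number of stories increases from left to right.
buildings = sorted(load_buildings(SAMPLE_CSV), key=lambda b: b["stories"])
max_stories = max(b["stories"] for b in buildings)

for index, building in enumerate(buildings):
    # Spread buildings evenly across an assumed 500-pixel-wide canvas.
    building["x"] = (index + 0.5) * 500.0 / len(buildings)
    # Taller buildings get bigger circles, from just over 10 up to 50 pixels.
    building["diameter"] = 10 + 40.0 * building["stories"] / max_stories
```

Each building now carries an x position and a diameter derived from its data, ready to be drawn as a circle.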

Making classes in a population

When we want to describe behaviors of various types of objects in a population, this is an indication that we may want to make a class.
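For example, a class can bundle each member’s data (where it is) with its behavior (how it moves). This is a minimal sketch with invented names and speeds, using plain Python so it can run outside Processing.

```python
import random

class Person(object):
    """One member of the population, with its own position and speed."""

    def __init__(self, x, y, speed):
        self.x = x
        self.y = y
        self.speed = speed

    def step(self, rng, width, height):
        # Take one random step, then clamp the position to the canvas.
        self.x = min(max(self.x + rng.uniform(-self.speed, self.speed), 0), width)
        self.y = min(max(self.y + rng.uniform(-self.speed, self.speed), 0), height)

rng = random.Random(3)
# Two types in one population: slow walkers and fast runners (made-up speeds).
walkers = [Person(250, 250, speed=1) for _ in range(20)]
runners = [Person(250, 250, speed=4) for _ in range(10)]
population = walkers + runners

# Advance the whole population by 100 time steps.
for _ in range(100):
    for person in population:
        person.step(rng, 500, 500)
```

Because the behavior lives in the class, adding a new type of person only means creating objects with different settings, or a subclass with its own step method.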

Some friends and I have been particularly interested in cooperative investment and real-estate tools that would help visualize how to pool money together for investment.

Using online real-estate listings like this one, how might we start to use our population models to describe what is happening?

5 Can you use the Processing sketch below to update the investment model with every keyPressed (like the image in the top right)? Can you add another text line that shows the mortgage amount and updates with every keyPressed?
Download the following link.
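As a starting point, the investment model itself can be a small class that is separate from the drawing code. Everything in this sketch is hypothetical: the class name, the listing price, and the contribution amounts are invented, and the mortgage is simplified to the listing price minus the pooled money, ignoring interest, fees, and closing costs.

```python
class InvestmentPool(object):
    """Tracks money pooled by several investors toward one listing."""

    def __init__(self, listing_price):
        self.listing_price = listing_price
        self.contributions = []

    def add_investor(self, amount):
        self.contributions.append(amount)

    def pooled(self):
        return sum(self.contributions)

    def mortgage(self):
        # Simplified: whatever the pool does not cover must be borrowed.
        return max(self.listing_price - self.pooled(), 0)

pool = InvestmentPool(listing_price=500000)  # made-up price
for _ in range(3):
    pool.add_investor(25000)  # made-up contribution per investor
```

In a Processing sketch, add_investor could be called from keyPressed(), with pooled() and mortgage() drawn as text lines on every frame.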

Violet Whitney
Data Mining the City

Researching Spatial & Embodied Computing @Columbia University, U Penn and U Mich