Visualizing Inequality with Deep Learning

Andrés Cádiz Vidal
Jan 7, 2020


We used deep neural networks to analyze 100,000 images of Santiago de Chile and built a visualization with the results.

Co-written with Tomás Ramirez

The original version of this post (in Spanish) is available here.

Since the week of October 19th, Chile has been immersed in a so-called “social outburst”: millions of people throughout the country have protested to demand structural reform of the way Chilean society works.

The objective of this post isn’t to delve into the discussion about what is happening (there are many other sources for that), but rather to offer a small contribution to the understanding of the causes behind this conflict, using state-of-the-art technology.

More than one million people protested on October 25th. Source

In the research we are carrying out at PUC’s Artificial Intelligence Lab (IALAB PUC) for the Millennium Institute Foundational Research on Data’s Explainable Artificial Intelligence project, together with UC Engineering’s Transportation and Logistics Department, we have been studying the modelling of urban perception (we will explain what this is a little further below) with deep learning.

This gives us the ability to analyze information from thousands of images far more efficiently than a human could, allowing for insights that would be impossible to obtain manually. We made use of this tool to analyze the perception of ~120,000 images of Santiago.

Dataset.

The urban perception problem consists of automatically rating the visual impression a landscape gives with respect to a certain attribute (e.g. safety). Place Pulse (Dubey et al., 2016) is a crowdsourced dataset with approximately 1.2 million user responses, each consisting of a pair of Google Street View images, an attribute, and a user-generated label indicating in which of the two pictures the user perceives the attribute more intensely.
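A single response can be thought of as a record like the following sketch (field names are illustrative, not the dataset’s actual schema):

```python
# One crowd-sourced comparison: which image looks safer?
# Field names are illustrative, not Place Pulse's actual schema.
response = {
    "image_left": "streetview_abc123.jpg",
    "image_right": "streetview_def456.jpg",
    "attribute": "safety",
    "winner": "left",  # or "right", or "tie"
}

def to_label(r: dict) -> int:
    """Map a response to the ranking label y used for training."""
    return {"left": 1, "right": -1, "tie": 0}[r["winner"]]
```

Framed this way, each response is a labeled pairwise comparison, which is exactly what a ranking loss consumes.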

The Place Pulse platform for collecting the data. Source

Thanks to Place Pulse, the perception problem can be treated as a pair-wise ranking problem, and we can use techniques from that domain to train a deep network that learns to rank images according to attribute perception.

Architecture & training.

Since this is the first stage of our research, we used an architecture similar to the one presented by Dubey et al., but we ditched the classifier section of the network: we are only interested in ranking, and according to our experiments training was faster and more stable that way.

The resulting architecture is very simple: an ImageNet-pretrained conv net (we experimented with AlexNet, VGG, DenseNet and ResNet) followed by 2 fully connected layers with a final scalar output. For training we feed the two images of each pair through the network and compare their scalar outputs with the loss described below.

The key part of the training is the loss function, which is taken from the pair-wise ranking approach. We start with a traditional margin ranking loss:

L_rank = max(0, −y · (f(x₁) − f(x₂)) + m)

Where x₁ and x₂ are the image inputs, y is the label (−1 or 1), f represents the model output and m is the margin constant. It is important to note that this function does not provide the intended result when the label represents a tie (y = 0), and this happens fairly often in the dataset, so to take advantage of this extra data we add a second loss term that forces tied images to be ranked similarly:

L_tie = max(0, |f(x₁) − f(x₂)| − m)

We add these two components to obtain the final loss.
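The per-pair loss can be sketched in plain Python. Here `f1` and `f2` stand for the already-computed model outputs f(x₁) and f(x₂); the margin-based form of the tie term is our assumption, one standard way to penalize tied images whose scores differ by more than the margin:

```python
def pair_loss(f1: float, f2: float, y: int, m: float = 0.1) -> float:
    """Combined ranking loss for one image pair.

    y = 1 or -1: standard margin ranking loss.
    y = 0 (tie): penalize score differences larger than the margin
    (an assumed form, consistent with "ranked similarly").
    """
    d = f1 - f2
    rank = max(0.0, -y * d + m) if y != 0 else 0.0
    tie = max(0.0, abs(d) - m) if y == 0 else 0.0
    return rank + tie
```

For a correctly ranked pair with a comfortable score gap the loss is zero; a mis-ranked pair is penalized proportionally to how wrong the gap is.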

We train one model for each of the six attributes in Place Pulse, using SGD and augmenting the data with random flips and crops. After training we use the models to analyze the images of Santiago; it’s important to note that these images weren’t used for training.
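The flip-and-crop augmentation can be sketched with NumPy (image layout and crop size here are illustrative assumptions, not our actual training configuration):

```python
import numpy as np

def augment(img: np.ndarray, crop: int, rng: np.random.Generator) -> np.ndarray:
    """Random horizontal flip followed by a random square crop.

    Assumes an (H, W, C) image array; `crop` must not exceed H or W.
    """
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # horizontal flip
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    return img[top:top + crop, left:left + crop]
```

Both transforms preserve the perceptual content of a street scene, which is what makes them safe augmentations for this task.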

Visualizing and analyzing the results.

In order to evaluate the results, we built a visualization of them on a map of Santiago (check it out here!). If you know the city, looking at the map for just a little while with any of the 6 attributes is enough to see qualitatively that the results make a lot of sense.

Visualization for the Wealth attribute and a sample of 5,000. Green is very wealthy.

If you couldn’t see the results on the interactive map you can see a general overview in the image below.

Visualization for the 6 attributes with a sample of 50,000. From left to right: wealthy, depressing, safety, lively, boring and beautiful.

Those who are familiar with Santiago will be able to observe a correlation between the areas with higher income per capita — towards the northeast of the city — and the perception of qualitative attributes of the city which are considered positive.

Santiago behaves similarly to many Latin American cities, where an identifiable high-income sector extends in a cone shape from the historical city center towards the city’s exterior. The neural networks show that Santiago’s segregated development has not only shifted goods, services and socio-economic indicators, but has also taken with it the beauty, perceived safety, joy and liveliness of spaces in the city.

Distribution of mean income per household in Santiago. Correlation with the image analysis is evident. Source: Encuesta Origen Destino 2012

Digging deeper into the results’ explainability

Explaining why a neural network model responds the way it does is still an unsolved research problem, and a highly relevant one in the field. As part of this project, it is of great value to understand what makes an image look, for example, “more depressing”.

To complement the model’s predictive capacity, econometric techniques were used to make the neural network’s decision process as interpretable as possible. To this end, two other networks were used to extract human-understandable features from the images: the TensorFlow object detection tool and the SegNet semantic segmentation model.
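One simple way to turn a segmentation map into human-understandable features is to compute per-class pixel fractions; a minimal sketch follows (the class ids and names are illustrative, not SegNet’s actual label map):

```python
import numpy as np

# Hypothetical class ids for illustration; SegNet's real label map differs.
CLASSES = {0: "sky", 1: "building", 2: "vegetation", 3: "road"}

def class_fractions(seg_map: np.ndarray) -> dict:
    """Fraction of image pixels assigned to each semantic class.

    `seg_map` is an (H, W) array of integer class ids.
    """
    return {name: float(np.mean(seg_map == cid)) for cid, name in CLASSES.items()}
```

Each image then becomes a small vector of interpretable quantities (how much sky, how much vegetation, …) that an econometric model can work with.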

Examples of semantic segmentation. Source

Then an econometric model was estimated with these features for each criterion, yielding information about the effect of, for example, the presence of trees on the perception of safety.
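A minimal version of such an estimation is an ordinary least squares fit of perception scores on the extracted features. The sketch below uses synthetic data; the actual study relies on the econometric specification of Rossetti et al. (2019):

```python
import numpy as np

def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS coefficients, with an intercept prepended as a column of ones."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

# Synthetic example: a perception score that rises with vegetation fraction.
rng = np.random.default_rng(42)
veg = rng.random(200)                               # vegetation fraction per image
score = 2.0 + 1.5 * veg + rng.normal(0, 0.01, 200)  # "perceived beauty"
beta = ols(veg.reshape(-1, 1), score)               # beta ≈ [intercept, slope]
```

The fitted slope estimates how much the perception score changes per unit of the feature, which is exactly the kind of comparison the table below makes across variables.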

Source: Rossetti, T., Lobel, H., Rocco, V., & Hurtubia, R. (2019)

Each column of the table indicates how the corresponding parameter affects perception (statistical significance in parentheses). For example, in the second row, the first value indicates that the buildings variable has a negative effect on beauty, quantified at -0.0983. These results have no tangible units of measure, so no absolute meaning can be attached to a single value; what’s interesting is that they let us compare variables against each other. From this study we can highlight some interesting findings: for example, the presence of cyclists and pedestrians has on average a positive effect on the perception of safety and liveliness. The estimates even let us compare the magnitude of each variable’s effect, showing for instance that the presence of vegetation and cyclists is the most relevant for determining a place’s beauty.

What we are up to now

Our research group has reproduced Dubey et al.’s experiment in order to include new variables that will let us associate perception with a user’s characteristics. Our project is called Wekun, a word from the Mapudungún language that means “outside”. In it we ask which place seems better for walking and which seems better for living, and we keep the questions about safety, beauty and wealth. Additionally, we incorporate a new section that asks for socio-demographic information about the respondent. With this we have managed to detect differences in perception between men and women, and between pedestrians and cyclists in contrast to drivers.

In the figure, the places with the greatest decrease in perceived walkability according to women are shown in red; the city center and the main axes of vehicular mobility stand out.

Differences in perception of walkability from women in comparison to men’s. In red the zones perceived as less “walkable”.

This research is part of the work carried out by PUC Engineering’s Transportation and Logistics Department and PUC’s Artificial Intelligence Lab (IALAB PUC), which is part of the Millennium Institute Foundational Research on Data.

We would like to thank everyone who has historically worked on this research, particularly Tomás Rossetti, Hans Lobel, Víctor Rocco and Ricardo Hurtubia, authors of the research that laid the foundations that made this post possible.


Andrés Cádiz Vidal

Software Engineer at platan.us. AI researcher and adjunct professor at IALAB-PUC. Making technology from Chile for the world.