Urban Data Visualization: We will know as much as you collect and show
New technologies and cultures offer a variety of ways to understand cities, and many of our activities are recorded through digital devices. In addition to data such as credit card transactions and phone locations, platforms such as Google silently record our activity. Moreover, in the scientific community, research papers on new analytical methods and data processing are published every day, and data released by governments provide a variety of contexts for our activities. The idea of understanding cities through data emerged from treating the digital traces that we leave behind as a way to examine cities in a digital environment.
Urban data visualization plays an important role in interpreting and communicating urban data. In particular, it helps to put the analyzed data in the correct context and provides feedback during the analysis phase. Thus, urban data visualization has a major impact on the data scientists, policymakers, and citizens who seek to understand and change the city. However, it takes great effort to process the data to a level that we can understand. Simple, well-documented datasets of the kind provided by for-profit companies such as Google are not readily available for cities, and much urban data does not even exist yet. Even when the data are available, they are not easy to obtain and, in some cases, new analytic methods must be developed in order to collect the correct data. It is therefore important that a data visualization designer considers how to actively intervene in the process of collecting and analyzing urban data.
The MIT Senseable City Lab has conducted research into the ways in which data can reveal cities and the patterns of the citizens living in them. In particular, as a way of discovering cities’ possibilities, the lab has explored ways of transforming urban data into a visible form; this is known as urban data visualization. I will introduce two recent projects carried out by the MIT Senseable City Lab, Treepedia and Underworlds, as examples of how we use data visualization to address urban data that is hidden deep within a city or has not yet been made available in a usable form. I will also briefly mention precautions that a data visualization designer must take during the production phase.
Treepedia: Analyzing infrastructure data — Google Street View
Trees mitigate high temperatures, perform services such as storm water retention, provide a natural respite from traffic and noise, and have even been associated with lower mortality rates in urban areas. The roots of trees stabilize the ground and help prevent floods in the event of heavy rains and storms. Although some cities recognize the importance of trees and develop active vegetation policies, others do not.
The Treepedia project used Google Street View to compare the vegetation levels of cities around the world. Google Street View is a street image dataset that Google provides for most urban areas. It has been used by many urban researchers since 2010 because it captures various parts of a city at street level. Using these data, the researchers of the lab devised a computer vision algorithm that automatically quantifies the greenery of a city, thereby creating a unique metric called the Green View Index (GVI), which can be used to compare the vegetation of numerous global cities. This created the basis for a comparative analysis of all cities that have Google Street View coverage.
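To make the idea of a greenery metric concrete, here is a minimal sketch of how a GVI-like value could be computed from street images. It is not the lab's actual computer vision algorithm, which is not described here. The pixel rule in `is_green`, the `margin` parameter, and all function names are assumptions chosen for illustration: a pixel counts as vegetation when its green channel clearly dominates red and blue, and the index is the average share of such pixels across the images taken at one location.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Pixel]]      # rows of pixels


def is_green(pixel: Pixel, margin: int = 10) -> bool:
    """Assumed rule: green must exceed red and blue by at least `margin`."""
    r, g, b = pixel
    return g > r + margin and g > b + margin


def green_fraction(image: Image) -> float:
    """Share of pixels in one image classified as green (0.0 to 1.0)."""
    total = sum(len(row) for row in image)
    if total == 0:
        return 0.0
    green = sum(1 for row in image for pixel in row if is_green(pixel))
    return green / total


def green_view_index(images: List[Image]) -> float:
    """Average green share over all images at a location, in percent."""
    if not images:
        return 0.0
    return 100.0 * sum(green_fraction(img) for img in images) / len(images)


if __name__ == "__main__":
    leaf, road = (40, 160, 50), (120, 120, 120)
    north = [[leaf, road], [road, road]]   # 1 of 4 pixels is green
    south = [[leaf, leaf], [leaf, road]]   # 3 of 4 pixels are green
    print(green_view_index([north, south]))
```

A real pipeline would replace the color threshold with a more robust vegetation classifier, but the aggregation step, averaging per-image green shares into one value per sampling point, would look similar.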
In order to visualize the GVI, I colored the GVI points in red and green according to their values and mapped them at the location of each Google Street View image. At the city level, the points can be interpolated to visualize the image of the entire city. A comparison of the 26 cities that have been analyzed to date can be viewed on the following website: http://senseable.mit.edu/treepedia. In particular, it is important to note that citizens can first place the data in context through the city in which they live, and then compare its GVI with that of other cities from their own perspective. This not only engages everyone, from the citizens who view the visualization directly to policymakers, but also clarifies what action is required once areas lacking green space have been identified on the map. Indeed, after Treepedia was published, daily newspapers in several cities reacted immediately: The Straits Times celebrated the fact that Singapore has the highest GVI (29.3%), and Le Monde was alarmed that Paris has the lowest (8.3%).
Walking through the streets of Singapore, one notices many large, dense trees, whereas in the middle of downtown Paris such greenery seems almost absent. However, only data visualization can capture these perceptions and compare them at a global level. This entire process, from crawling data from Google Street View to processing, analyzing, and visualizing the results, can be called urban data visualization.
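One simple way to turn a GVI value into a map color is a linear ramp from red to green. The sketch below assumes a continuous ramp and an arbitrary upper bound of 40% for the scale; the function name `gvi_to_hex` and both bounds are hypothetical and are not taken from the Treepedia implementation.

```python
def gvi_to_hex(gvi: float, low: float = 0.0, high: float = 40.0) -> str:
    """Map a GVI percentage to a hex color between red (low) and green (high).

    `low` and `high` are assumed scale bounds; values outside are clamped.
    """
    t = (gvi - low) / (high - low)
    t = max(0.0, min(1.0, t))          # clamp to the 0-1 range
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"


if __name__ == "__main__":
    for value in (8.3, 29.3):          # the Paris and Singapore figures quoted above
        print(value, gvi_to_hex(value))
```

With such a ramp, streets with little greenery appear red and leafy streets appear green, so gaps in a city's canopy stand out at a glance.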
Underworlds: Exploring new urban data
The sewer system is one of a city’s most important infrastructures. In reality, however, the world underneath the toilet or sink is of little interest to most people. Nevertheless, the important point to note is that when we wash our hands or do our laundry, we also flush traces of our life patterns down into the sewer. Unfortunately, it is difficult to develop an information system that records and analyzes sewers, because the complicated structure of a sewer network makes it hard to sample.
The Underworlds project (http://underworlds.mit.edu) began with the assumption that if we can read the biological data in sewers, we will be able to observe individual microbiomes at the city level. Thus, the goal of the project is to create a real-time urban health monitoring platform. To make this possible, six MIT laboratories began collaborating, including the MIT Senseable City Lab and the MIT Alm Lab.
Sewer researchers have traditionally visited a sewage treatment plant to collect water samples. However, this is problematic: a sewage treatment plant is not only distant from residential areas, but samples taken there also cannot reveal the human life patterns that we expect to find in the sewage system, because the microbiome has died or become deformed by the time it is sampled. In fact, the treatment plant at the end of Boston’s wastewater system is located on Deer Island, approximately 30 kilometers from the downtown of the city. Therefore, visiting this plant and sampling wastewater means collecting distorted data about Boston. In contrast, we used network analysis of the sewer system to choose sampling points that capture a diverse population profile, and sampled wastewater at those points in order to understand the city.
This approach requires far more complex processes than projects such as Treepedia, which process, analyze, and visualize existing data. Since no digitized data exist, we must first create a system that can obtain them. This means that we need robots that can sample wastewater in a stable and homogeneous manner, and biotechnology systems that can rapidly digitize the results. In addition, from a data visualization designer’s perspective, the outcome should not only be a visualization of the results, but also a product that helps researchers make decisions on an ongoing basis, and thus influences analysis and future research.
Therefore, together with Noriko Endo, a PhD candidate in Civil Engineering, we visualized the algorithm that analyzes the sewer network of Cambridge in order to select sampling points. The aim was to simulate which inhabited areas a robot’s sample could represent. In the US in particular, where there is racial diversity and income disparity, it was important to cover diverse areas in order to represent a city accurately. The visualization therefore serves as an interactive tool with which researchers can select each sampling point and observe the resulting changes in the catchment area and the connected population profile.
Data visualization includes all the processes of collecting and analyzing these data, not just presenting the results beautifully at the design phase. Designers should work closely with data scientists in the fields of analytics and data collection, and they should sometimes engage directly with hard technology. As these processes continue, the power of data visualization grows, and the character of the results that it produces also changes.
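The relationship between a sampling point and its catchment can be sketched as a graph traversal. The example below is only an illustration under simplifying assumptions, not the algorithm used in the project: the sewer network is modeled as a tree in which every manhole drains into exactly one downstream manhole, and each manhole has a known local population. The node names, the population figures, and the functions `upstream_nodes` and `catchment_population` are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Set


def upstream_nodes(downstream_of: Dict[str, str], sample_point: str) -> Set[str]:
    """Return the sample point plus every node whose wastewater flows through it."""
    # Invert the "drains into" mapping so we can walk against the flow.
    upstream = defaultdict(list)
    for node, down in downstream_of.items():
        upstream[down].append(node)

    seen = {sample_point}
    queue = deque([sample_point])
    while queue:
        current = queue.popleft()
        for parent in upstream[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def catchment_population(
    downstream_of: Dict[str, str], population: Dict[str, int], sample_point: str
) -> int:
    """Total population living in the catchment of a sampling point."""
    return sum(population.get(n, 0) for n in upstream_nodes(downstream_of, sample_point))


if __name__ == "__main__":
    # Hypothetical toy network: A and B drain into C; C and D drain into E.
    network = {"A": "C", "B": "C", "C": "E", "D": "E", "E": "outfall"}
    residents = {"A": 100, "B": 200, "C": 50, "D": 300, "E": 10}
    for point in ("C", "E"):
        print(point, sorted(upstream_nodes(network, point)),
              catchment_population(network, residents, point))
```

Moving the sampling point from C to E enlarges the catchment from three manholes to five, which is the kind of change an interactive tool would let researchers observe as they select different points.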
Crossing the line of disciplines to fight bias
Data visualization is basically a “represented entity”. Omission and distortion arise even from the projection system of a map, which we generally accept as truth. There is no way to avoid this completely. In particular, if some data are missing from the urban environment, or if the collected data represent only “ordinary views”, the perspectives and traces of the omitted people will be missing. Urban visualizations that rely on data from “ordinary views” may therefore be incomplete or incorrectly represented entities.
Kate Crawford, a scholar with a keen eye for big data discourse, has examined this inherent bias closely. In an attempt to analyze the floods that hit Australia’s Queensland region in 2010, she collected Twitter data, which gave clues regarding how victims found shelter and food. However, when she examined the data to identify where these tweets were written, she noticed that most of them were posted in Brisbane (the capital of Queensland), which was not actually flooded. In other words, the study had collected data from the wrong individuals. If a data visualization designer draws conclusions from data without considering such issues, the results may be meaningless because they lack context.
Because data visualization is ultimately completed by the designer, and because most data visualization appears “neutral”, designers should take particular care in this respect. Catherine D’Ignazio, who has consistently discussed how to guard against bias in data visualization, stated that “data visualizations wield a tremendous amount of rhetorical power. Even when we rationally know that data visualizations do not represent ‘the whole world’, we forget that fact and accept charts as facts because they are generalized, scientific and seem to present an expert, neutral point of view” (D’Ignazio, 2016).
From the perspective of a designer who visualizes cities, simply assessing what type of data are collected, and what their characteristics are in the analysis phase, will change data visualization in numerous ways. Metadata, such as whether data are hidden, are owned by a particular stratum, or are blocked by a company or government, provide context surrounding the data and thereby help make the design of a data visualization more persuasive. Such data visualization will enable a deeper understanding of, and action in, the cities in which we live.
Boyd, D., Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15(5), 662–679. http://doi.org/10.1080/1369118X.2012.678878
Claudel, M., Nagel, T., Ratti, C. From Origins to Destinations: The Past, Present and Future of Visualizing Flow Maps. Built Environment, 42(3).
D’Ignazio, C. (2016). What would feminist data visualizations look like? Retrieved from https://civic.mit.edu/feminist-data-visualization
Nabian, N., Offenhuber, D., Vanky, A., Ratti, C. (2013). Data dimension: accessing urban data and making it accessible. Proceedings of the Institution of Civil Engineers.
Tan, A. (2017, February 23). Not a concrete jungle: Singapore beats 16 cities in green urban areas. The Straits Times, pp. 10–11. Singapore. Retrieved from http://www.straitstimes.com/singapore/environment/not-a-concrete-jungle-singapore-beats-16-cities-in-green-urban-areas
Vanky, A. (2016). Make Data Make Sense: The Importance of Visualization in Data Analytics. IQT Quarterly, 7(4), Spring 2016.