Extending visualization skills with geospatial data

Sami Yamouni
Published in DataLab Log, Mar 13, 2020

Let’s imagine a scenario in which you are a data scientist concluding an important project. You must now present a summary of your work to a client and a C-level executive, both business-oriented people with little technical background. In a short time, with a team of very talented data scientists and data engineers, you prepare a polished PowerPoint presentation and a demonstration built with state-of-the-art interactive visualization packages in a Python notebook. The meeting ends. No questions. The client did not appear very excited. Good machine learning metrics and a complex Python dashboard did not have the desired effect.

What went wrong?

You did not present your results in a way that suited your audience. Beyond a subject that may have been too technical, the form of the presentation is essential to catching stakeholders’ attention. And even if your team had no data visualization specialist able to present your results, the project deserved more care in this respect.

This kind of situation is unfortunately common in machine learning projects. In my opinion, data scientists could take the first steps toward extending their skill set to create more sophisticated data visualizations. This post is an attempt to provide tools for building web-based interactive visualizations with JavaScript, HTML, and Python. In particular, I use open sanitation data for all Brazilian municipalities to introduce map visualizations. The code is available in this GitHub repository.

Data description and preparation

This data visualization uses a database from IBGE (“Brazilian Institute of Geography and Statistics”) [1] containing 2017 sanitation information for each Brazilian municipality. From this file, I chose to consider only two features:

  • Sanitation policy existence. Three statuses are reported in the database: “Yes”, “No” and “In preparation”.
  • Occurrence of diseases associated with basic sanitation in the last 12 months. Fourteen diseases are listed (Diarrhea, Leptospirosis, Worms, Cholera, Diphtheria, Dengue, Zika, Chikungunya, Typhus, Malaria, Hepatitis, Yellow_fever, Dermatitis, Respiratory_system_disease) and, for each of them, four statuses are reported: “Yes”, “No”, “Not informed” and “No data”.

To display the sanitation data on a map, I also downloaded a database of polygons for all Brazilian municipalities from [2]. Both datasets were then combined into a single GeoJSON file by adding the sanitation features to the GeoJSON properties of each municipality; this step is done in the data preparation notebook of the repository.
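
The notebook itself is not reproduced in this post. As a rough sketch of the merge step, assuming the polygon features carry the municipality code in a property called `id` and the sanitation table has a matching `code` column (both names are assumptions, not taken from the repository), the join can be done with the Python standard library alone:

```python
import copy
import csv
import io
import json


def merge_sanitation(geojson, rows, feature_key="id", row_key="code"):
    """Return a copy of `geojson` in which every feature whose `feature_key`
    property matches a row's `row_key` column receives that row's other
    columns as extra properties. Both key names are hypothetical defaults."""
    by_code = {str(row[row_key]): row for row in rows}
    merged = copy.deepcopy(geojson)  # leave the input collection untouched
    for feature in merged["features"]:
        row = by_code.get(str(feature["properties"].get(feature_key)))
        if row is None:
            continue  # no sanitation record for this municipality
        for column, value in row.items():
            if column != row_key:
                feature["properties"][column] = value
    return merged


# Made-up example data: one square polygon and one sanitation record.
polygons = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"id": "1234567", "name": "Example municipality"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[-54, -15], [-53, -15], [-53, -14],
                             [-54, -14], [-54, -15]]],
        },
    }],
}
sanitation_csv = "code,Sanitation_policy,Dengue\n1234567,Yes,No\n"
rows = list(csv.DictReader(io.StringIO(sanitation_csv)))

merged = merge_sanitation(polygons, rows)
print(json.dumps(merged["features"][0]["properties"]))
# {"id": "1234567", "name": "Example municipality", "Sanitation_policy": "Yes", "Dengue": "No"}
```

Writing `merged` to disk with `json.dump` then yields a single file that the server can deliver unchanged, with geometry and sanitation features side by side.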

Dashboard Architecture

To build this dashboard, I drew inspiration from the tutorial in [3], whose author built an interactive data visualization platform for the Kaggle “TalkingData Mobile User Demographics” challenge. To do so, the author used several JavaScript libraries: DC.js and D3.js for chart definition and Leaflet.js for map visualization. The use of Leaflet.js is what led me to this project, as I was interested in using an open source library for map representation.

In the current project, I adapted the architecture of [3] to have the following structure:

  • input: source sanitation and municipality GeoJSON files
  • notebooks: Python notebook for data preparation
  • static: the JavaScript file graph.js, where all charts are created, as well as the JavaScript and CSS libraries
  • templates: the resulting index.html
  • app.py: routine that creates the server, renders the HTML page and serves the data. When you run python app.py in your terminal, the data visualization runs on port 5001 of your localhost (http://localhost:5001). The GeoJSON data file containing all the useful features is available on the route “/geojson”.
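
The post does not show app.py. As a minimal sketch of its role, using only Python's standard library (`http.server`) rather than whatever web framework the repository actually uses, a server that delivers the page on “/” and the merged data on “/geojson” could look like this. The file paths are hypothetical, and index.html is served as a static file here rather than rendered as a template:

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

# Hypothetical paths loosely following the folder layout described above.
INDEX_HTML = Path("templates/index.html")
GEOJSON_FILE = Path("input/sanitation_municipalities.geojson")


def load_geojson():
    """Read the merged GeoJSON file, or return an empty collection if absent."""
    if GEOJSON_FILE.is_file():
        return json.loads(GEOJSON_FILE.read_text(encoding="utf-8"))
    return {"type": "FeatureCollection", "features": []}


class DashboardHandler(BaseHTTPRequestHandler):
    """Serves the dashboard page on "/" and the data on "/geojson"."""

    def do_GET(self):
        if self.path == "/geojson":
            body = self.server.geojson_bytes
            content_type = "application/json"
        elif self.path == "/" and INDEX_HTML.is_file():
            body = INDEX_HTML.read_bytes()
            content_type = "text/html; charset=utf-8"
        else:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=5001, geojson=None):
    """Build a server on localhost; the GeoJSON is serialised once, up front."""
    server = ThreadingHTTPServer(("127.0.0.1", port), DashboardHandler)
    data = geojson if geojson is not None else load_geojson()
    server.geojson_bytes = json.dumps(data).encode("utf-8")
    return server


if __name__ == "__main__":
    make_server().serve_forever()  # then open http://localhost:5001/
```

Serialising the GeoJSON once at start-up avoids re-reading a large file on every request; a web framework such as Flask would make routing and template rendering more convenient than this bare handler.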

Map and charts

Before starting to code, the first thing to think about is the design of your data visualization; many examples are available online. The figures below show an early sketch of the dashboard and the final result. The objective was to explore different types of charts: an interactive map, a spider chart and a bar chart.

Drawing of the data visualization
Final data visualization

To build the front end, I adapted a pre-built template from keen.io, as suggested by [3]. This framework provides several easy-to-use dashboard layouts. The HTML page starts with the CSS library declarations, continues with the page divisions containing the map and the different charts, and ends with the JavaScript source files. An important tip to remember: the order of the JavaScript files matters! A script can only use the libraries loaded before it, so, for instance, d3.js and crossfilter.js must come before dc.js, and graph.js must come after all the libraries it uses.

The following sections give further details on each chart.

  • Interactive map:

As mentioned earlier, I adopted the Leaflet library for interactive maps. It is a JavaScript library with many plugins to customize your visualizations. A wide variety of base maps is available (see [4]), from which I chose an OpenStreetMap layer. The map is centered on Brazil, at the latitude/longitude coordinates [-14.5, -54], with a zoom level of 4 so that the entire country is visible.

Interactive map using choropleth

After exploring many examples, I chose a choropleth map to represent the geographical data [5]. A choropleth map displays geographical subdivisions (polygons in the present study) coloured according to a numerical variable. In the current visualization, the colour code represents the total number of diseases observed in each municipality. When hovering over the map, municipalities are highlighted and the information box at the top right is updated dynamically, showing the municipality name and its total number of diseases. Clicking on a municipality zooms in on it, and clicking the home button at the top left returns to the original view. This home button is the only new element I added to the reference choropleth example.
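
The post does not say whether the per-municipality total is computed in the notebook or in graph.js. A minimal Python sketch of the counting and colouring logic is given below, assuming the disease properties are named exactly as in the list of fourteen diseases and that only “Yes” counts as an observed disease. The thresholds and colours are hypothetical, modelled on the descending-threshold getColor() function of the Leaflet choropleth tutorial [5], where the equivalent JavaScript function is called from the GeoJSON layer's style option:

```python
DISEASES = [
    "Diarrhea", "Leptospirosis", "Worms", "Cholera", "Diphtheria",
    "Dengue", "Zika", "Chikungunya", "Typhus", "Malaria",
    "Hepatitis", "Yellow_fever", "Dermatitis", "Respiratory_system_disease",
]

# Hypothetical colour classes, checked from the highest threshold down.
COLOR_CLASSES = [
    (8, "#800026"),
    (6, "#BD0026"),
    (4, "#E31A1C"),
    (2, "#FD8D3C"),
    (1, "#FEB24C"),
]
NO_DISEASE_COLOR = "#FFEDA0"


def total_diseases(properties):
    """Number of diseases reported as "Yes" for one municipality.

    "No", "Not informed" and "No data" all count as zero here, which is an
    assumption about how missing information should be treated."""
    return sum(1 for disease in DISEASES if properties.get(disease) == "Yes")


def color_for(count):
    """Map a disease count (0 to 14) to a fill colour."""
    for threshold, color in COLOR_CLASSES:
        if count >= threshold:
            return color
    return NO_DISEASE_COLOR


example = {"name": "Example", "Diarrhea": "Yes", "Dengue": "Yes",
           "Zika": "No", "Worms": "Not informed"}
count = total_diseases(example)
print(count, color_for(count))  # 2 #FD8D3C
```

Storing the count as an extra property of each feature during data preparation would let the front end colour the polygons and fill the information box without recounting on every hover.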

  • Spider Chart:

Spider (or radar) charts are two-dimensional charts that display multivariate data on several axes starting from the same point. I used the work of [6] and [7], who developed a RadarChart.js function with a legend. Here, I chose to display, for the whole country, the percentage of municipalities reporting each disease, split by sanitation policy status. In the chart, the ratios for municipalities with and without a sanitation policy are shown in yellow and red, respectively, and municipalities where the policy is in preparation are shown with green dots.
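
The preparation of the radar data is not shown in the post. As a hedged sketch, the function below returns, for each policy status, the fraction of municipalities reporting each disease as “Yes”. It assumes a `Sanitation_policy` property and disease properties named as in the data description, and it counts “Not informed” and “No data” in the denominator, a choice the original may not make. Each point uses the `{axis, value}` shape that we assume the radar chart examples in [6] expect:

```python
from collections import defaultdict

POLICY_STATUSES = ("Yes", "No", "In preparation")


def radar_series(features, diseases, policy_key="Sanitation_policy"):
    """For each policy status, return a list of {"axis", "value"} points where
    value is the fraction (0 to 1) of municipalities with that status that
    reported the disease as "Yes"."""
    totals = defaultdict(int)  # municipalities per status
    positives = defaultdict(lambda: defaultdict(int))  # status -> disease -> count
    for feature in features:
        props = feature["properties"]
        status = props.get(policy_key)
        if status not in POLICY_STATUSES:
            continue  # ignore municipalities without a known policy status
        totals[status] += 1
        for disease in diseases:
            if props.get(disease) == "Yes":
                positives[status][disease] += 1

    series = {}
    for status in POLICY_STATUSES:
        n = totals[status]
        series[status] = [
            {"axis": disease, "value": positives[status][disease] / n if n else 0.0}
            for disease in diseases
        ]
    return series


features = [
    {"properties": {"Sanitation_policy": "Yes", "Diarrhea": "Yes", "Dengue": "No"}},
    {"properties": {"Sanitation_policy": "Yes", "Diarrhea": "No", "Dengue": "No"}},
    {"properties": {"Sanitation_policy": "No", "Diarrhea": "Yes", "Dengue": "Yes"}},
]
series = radar_series(features, ["Diarrhea", "Dengue"])
print(series["Yes"])  # [{'axis': 'Diarrhea', 'value': 0.5}, {'axis': 'Dengue', 'value': 0.0}]
```

Passing `list(series.values())` to `json.dumps` produces an array of arrays, one inner array per coloured shape of the chart.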

The highest values appear for Diarrhea, Worms and the group of mosquito-borne diseases such as Dengue, Zika and Chikungunya. For all of them, the share of municipalities reporting the disease is lower where a sanitation policy exists, which suggests a positive impact of the policy. Unfortunately, I have no explanation for the higher disease rates in municipalities where the policy is in preparation.

  • Bar chart:
Sanitation policy existence chart

This is the last chart of the data visualization. This row chart uses the DC.js JavaScript charting library, which relies on crossfilter.js and d3.js to build interactive data visualizations. It shows the data broken down by sanitation policy status in Brazil. Its inputs are ‘SanitationDim’ and ‘SanitationGroup’, together with the parameters that set the figure size, the ordering of the categories and the colors.
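
graph.js is not reproduced here, but the names ‘SanitationDim’ and ‘SanitationGroup’ suggest a crossfilter dimension keyed on the policy status and a group counting records per status. To make the chart's input concrete, the Python sketch below computes what such a count group would hold, in the `{key, value}` shape that a crossfilter group's all() method returns in ascending key order. The `Sanitation_policy` property name is an assumption, and this is not the author's code:

```python
from collections import Counter


def count_by_policy(features, policy_key="Sanitation_policy"):
    """Count municipalities per sanitation policy status.

    Returns [{"key": status, "value": count}, ...] sorted by key; missing
    statuses are grouped under the hypothetical label "Unknown"."""
    counts = Counter(
        feature["properties"].get(policy_key) or "Unknown" for feature in features
    )
    return [{"key": key, "value": value} for key, value in sorted(counts.items())]


features = [
    {"properties": {"Sanitation_policy": "Yes"}},
    {"properties": {"Sanitation_policy": "No"}},
    {"properties": {"Sanitation_policy": "Yes"}},
    {"properties": {"Sanitation_policy": "In preparation"}},
]
print(count_by_policy(features))
# [{'key': 'In preparation', 'value': 1}, {'key': 'No', 'value': 1}, {'key': 'Yes', 'value': 2}]
```

In the row chart itself, the ordering parameter mentioned above can override this alphabetical key order, for example to list the bars by count.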

Conclusion

In this post, I showed how to build an interactive data visualization with geospatial data by combining JavaScript libraries such as DC.js and Leaflet.js. The current data visualization was developed using data from 5,570 Brazilian municipalities. If you plan to study problems involving more data, I suggest adapting the GeoJSON layer and creating another route for serving the data. This dashboard can be adapted to other contexts with different geospatial data, and its structure and charts are easy to redefine by modifying the HTML file and the graph.js file.
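
One possible reading of this suggestion, sketched here as an assumption rather than as the author's plan: the DC.js charts only need the properties, so a second route (hypothetical) could serve them without the polygon geometries, which typically account for most of a GeoJSON file's size. The helper below builds such a lightweight payload from a made-up collection:

```python
import json
import math


def properties_only(geojson):
    """Keep only each feature's properties, dropping the geometries, so that
    charts which never draw polygons can fetch a much smaller payload."""
    return [feature["properties"] for feature in geojson["features"]]


# Made-up example: one municipality drawn as a 500-vertex circle.
ring = [[-54 + 0.1 * math.cos(2 * math.pi * k / 500),
         -15 + 0.1 * math.sin(2 * math.pi * k / 500)] for k in range(500)]
ring.append(ring[0])  # GeoJSON rings must be closed
collection = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "Example", "Sanitation_policy": "No"},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }],
}
full_size = len(json.dumps(collection))
light_size = len(json.dumps(properties_only(collection)))
print(full_size, light_size)  # compare the two payload sizes in characters
```

The map would still need the geometries, but they could then be simplified or split by region independently of the chart data.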

This project required some effort to learn new languages such as HTML and JavaScript. Still, I believe this is not an insurmountable obstacle for any data scientist.

Please feel free to ask if you have any questions!

Thanks to João Rafael Macedo, Gabriela Tunes, Ricardo Santos and Beatriz Albiero.

References

[1] https://www.ibge.gov.br/estatisticas/downloads-estatisticas.html

[2] https://github.com/tbrugz/geodata-br

[3] https://github.com/adilmoujahid/kaggle-talkingdata-visualization

[4] https://leaflet-extras.github.io/leaflet-providers/preview/

[5] https://leafletjs.com/examples/choropleth/

[6] http://bl.ocks.org/nbremer/21746a9668ffdf6d8242

[7] https://gist.github.com/micahstubbs/a772306d6fd49874ec92#file-thumbnail-png
