Imagine that you are a data scientist concluding an important project. You must now present a summary of your work to a client and a C-level executive, both business-oriented people with little technical background. In a short time, with your team of very talented data scientists and data engineers, you prepare a nice PowerPoint presentation and a demonstration using state-of-the-art interactive visualization packages in a Python notebook. The meeting ends. No questions. The client did not appear very excited. Good machine learning metrics and a complex Python dashboard did not have the desired effect.
What went wrong?
You did not present your results properly to your audience. Beyond a subject that may be too technical, the form of the presentation is essential to catch stakeholders’ attention. Even if no one on your team was a data visualization specialist, the project deserved more care in this respect.
Data description and preparation
This data visualization uses a database from IBGE (“Brazilian Institute of Geography and Statistics”), containing sanitation information for each Brazilian municipality in 2017. From this file, I chose to consider only two features:
- Sanitation policy existence. Three statuses are reported in the database: “Yes”, “No” and “In preparation”.
- Disease occurrence associated with basic sanitation in the last 12 months. Fourteen diseases are listed (Diarrhea, Leptospirosis, Worms, Cholera, Diphtheria, Dengue, Zika, Chikungunya, Typhus, Malaria, Hepatitis, Yellow_fever, Dermatitis, Respiratory_system_disease) and for each of them, four statuses are reported: “Yes”, “No”, “Not informed” and “No data”.
In order to display sanitation data on a map, I also downloaded a polygon database of all the Brazilian municipalities from . Both datasets were then combined in the same geojson file, adding the sanitation features to the geojson properties of each municipality, as shown in the code below.
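The merge step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the notebook code: the sample records, the join key "code" and the field name "policy" are hypothetical, and the real files may use different names.

```python
import copy

# Hypothetical sample rows standing in for the IBGE sanitation table.
# The key "code" and the field "policy" are assumptions for illustration.
sanitation_rows = [
    {"code": "001", "policy": "Yes", "Dengue": "Yes", "Zika": "No"},
    {"code": "002", "policy": "In preparation", "Dengue": "Yes", "Zika": "Yes"},
]

# Hypothetical GeoJSON with one polygon feature per municipality.
square = [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
municipalities = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"code": code, "name": name},
         "geometry": {"type": "Polygon", "coordinates": square}}
        for code, name in [("001", "A"), ("002", "B"), ("003", "C")]
    ],
}


def merge_sanitation(geojson, rows, key="code"):
    """Return a copy of geojson with each row's fields added to the
    properties of the feature that has the same key value."""
    by_key = {row[key]: row for row in rows}
    merged = copy.deepcopy(geojson)  # leave the input untouched
    for feature in merged["features"]:
        props = feature["properties"]
        row = by_key.get(props.get(key))
        if row is not None:  # municipalities without data are kept as-is
            props.update({k: v for k, v in row.items() if k != key})
    return merged


merged = merge_sanitation(municipalities, sanitation_rows)
print(merged["features"][0]["properties"])
```

Keeping everything in the feature properties means the front end only has to load one file to draw the map and build the charts.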
In the current project, I adapted the architecture of  to have the following structure:
- input: source sanitation and municipality geojson files
- notebooks: Python notebook for data preparation
- templates: resulting index.html
- app.py: routine for server creation, rendering HTML pages and serving data. When you run python app.py in your terminal, the data visualization is served on port 5001 of your localhost. The geojson data file containing all useful features is accessible on the route “/geojson”.
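To make the role of app.py more concrete, here is a minimal sketch using only Python's standard library. It is not the project's app.py, which is not reproduced here and may well rely on a web framework. INDEX_HTML and GEOJSON are hypothetical stand-ins for templates/index.html and the merged data file, and route and serve are illustrative names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-ins: a real app would read templates/index.html
# and the merged geojson file produced during data preparation.
INDEX_HTML = "<!DOCTYPE html><html><body><div id='map'></div></body></html>"
GEOJSON = {"type": "FeatureCollection", "features": []}


def route(path):
    """Map a request path to (status, content type, body bytes)."""
    if path == "/":
        return 200, "text/html; charset=utf-8", INDEX_HTML.encode("utf-8")
    if path == "/geojson":
        return 200, "application/json", json.dumps(GEOJSON).encode("utf-8")
    return 404, "text/plain; charset=utf-8", b"Not found"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Delegate to route() so the routing logic stays easy to test.
        status, content_type, body = route(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=5001):
    """Serve the page and the data on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve() starts the server on port 5001. The page is then available at the root URL and the data at “/geojson”, which is where the front-end code would fetch it.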
Map and charts
Before starting to code, the first thing to think about is the design of your data visualization. There are many examples available online. The figures below display a preliminary sketch of the dashboard and the final dashboard. The objective was to explore different types of charts, such as an interactive map, a spider chart and a bar chart.
In the following, further details regarding each chart are given with code samples.
- Interactive map:
After exploring many examples, I chose the choropleth map to represent the geographical data. A choropleth map displays geographical subdivisions, polygons in the present study, coloured according to a numerical variable. In the current visualization, the colour code represents the total number of observed diseases per municipality. When you hover over the map, municipalities are highlighted, and the legend box at the top right updates dynamically, showing the municipality name and the corresponding total number of diseases. When you click on the map, it zooms in on the municipality of interest. To come back to the original view, you can click on the home button at the top left. This is the only new element I have added to the reference choropleth model.
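The variable behind the colour code can be sketched as follows: count the diseases reported as “Yes” for a municipality, then map that total to a colour class. The class boundaries and the palette below are hypothetical choices for illustration, not the ones used in the dashboard.

```python
from bisect import bisect_right

DISEASES = [
    "Diarrhea", "Leptospirosis", "Worms", "Cholera", "Diphtheria",
    "Dengue", "Zika", "Chikungunya", "Typhus", "Malaria", "Hepatitis",
    "Yellow_fever", "Dermatitis", "Respiratory_system_disease",
]

# Hypothetical class boundaries and sequential palette (light to dark):
# classes are 0, 1-2, 3-4, 5-7 and 8 or more diseases.
THRESHOLDS = [1, 3, 5, 8]
PALETTE = ["#ffffcc", "#fed976", "#fd8d3c", "#e31a1c", "#800026"]


def total_diseases(properties):
    """Count the diseases reported as "Yes" in a feature's properties."""
    return sum(1 for disease in DISEASES if properties.get(disease) == "Yes")


def colour_for(total):
    """Return the palette colour of the class that contains total."""
    return PALETTE[bisect_right(THRESHOLDS, total)]


example = {"Dengue": "Yes", "Zika": "Yes", "Diarrhea": "Yes",
           "Malaria": "No", "Cholera": "Not informed"}
total = total_diseases(example)
print(total, colour_for(total))
```

Only “Yes” is counted here. “Not informed” and “No data” are treated like “No”, which is an assumption worth checking against how the map is meant to be read.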
- Spider Chart:
Spider or radar charts are two-dimensional charts that display multivariate data along different axes starting from the same point. I used the work of  and  who developed the RadarChart.js function with a legend. Here, I chose to display, for the whole country, the percentage of municipalities where each disease was observed, grouped by sanitation policy status. In the figure below, the percentages for municipalities with and without a sanitation policy are displayed in yellow and red, respectively. Municipalities where the policy is in preparation are represented by green dots.
The highest values appear for Diarrhea, Worms and the group of diseases transmitted by mosquitoes, such as Dengue, Zika and Chikungunya. For all of them, municipalities with a sanitation policy report the disease less often than those without one. Unfortunately, I do not have an explanation for the increase in disease rates when the sanitation policy is in preparation.
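The percentages plotted on the spider chart can be computed with a simple group-by. The sketch below assumes each municipality is a dictionary with a hypothetical "policy" field, and it divides by all municipalities in a group, including those answering “Not informed” or “No data”. Both points are assumptions rather than a description of the original computation.

```python
from collections import defaultdict


def disease_rates(records, diseases):
    """Percentage of municipalities reporting each disease,
    grouped by sanitation policy status."""
    totals = defaultdict(int)                        # municipalities per status
    counts = defaultdict(lambda: defaultdict(int))   # "Yes" answers per status
    for record in records:
        status = record["policy"]
        totals[status] += 1
        for disease in diseases:
            if record.get(disease) == "Yes":
                counts[status][disease] += 1
    return {
        status: {d: 100.0 * counts[status][d] / totals[status] for d in diseases}
        for status in totals
    }


# Hypothetical toy data: four municipalities with a policy, two without.
records = [
    {"policy": "Yes", "Dengue": "Yes", "Zika": "No"},
    {"policy": "Yes", "Dengue": "No", "Zika": "No"},
    {"policy": "Yes", "Dengue": "No", "Zika": "Not informed"},
    {"policy": "Yes", "Dengue": "No", "Zika": "No"},
    {"policy": "No", "Dengue": "Yes", "Zika": "Yes"},
    {"policy": "No", "Dengue": "Yes", "Zika": "No"},
]
rates = disease_rates(records, ["Dengue", "Zika"])
print(rates)
```

Each policy status then becomes one polygon on the chart, with one axis per disease.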
- Bar chart:
Please feel free to ask if you have any questions!
Thanks to João Rafael Macedo, Gabriela Tunes, Ricardo Santos and Beatriz Albiero.