Visualize ISTAT data with Microsoft Power BI
Visualize Data with Power BI
The ISTAT (Italian National Institute of Statistics) is the main producer of official statistics in Italy including the census of population, economic censuses and a number of social and environmental surveys and analyses
As always let’s apply CRISP-DM Process (Cross Industry Process for Data Mining) to tackle the problem:
- Business Understanding
- Data Understanding
- Prepare Data
- Data Modeling
- Evaluate the Results
- Deploy
Business Understanding
The goal is to get some insights about the regional distribution of population in Italy and how it is changed by year
Data Understanding
ISTAT let you select from different data topics: in this case we are looking for Population and Households — Population — Resident population on 1st January — All municipalities
We are not interested in dividing population by age or marital status so we have to select the filtering to all on this fields
Let’s consider the last 5 years
The data from ISTAT can be downloaded in CSV or Excel format
Prepare Data
Export the data in both Excel and CSV format because in CSV file the relationship of State — Geographic Area — Macro Region — Region — Province — Comune is lost so we have to rebuild it
Through some work with Python is possible to rebuild the hierarchy State — Geographic Area — Macro Region — Region — Province — Comune exploiting the indentation present in the ‘Territorio’ column of the Excel export file
Italy is composed by:
- 3 Geographic Area: Nord, Centro and Mezzogiorno
- 4 Macro Region: Nord-ovest, Nord-est, Centro, Sud and Isole
- 20 Region
- 110 Province (number reduced through the years)
- 8100 Comune (number reduced through the years)
In the CSV file we have 40700 records and the columns are:
- ITTER107: territory identifier
- Territorio: territory name
- TIPO_DATO15: KPI identifier
- Tipo di indicatore demografico: KPI name
- SEXISTAT1: sex identifier
- Sesso: sex name
- ETA1: age identifier
- Età: age name
- STATCIV2: marital state identifier
- Stato civile: marital state name
- TIME: year identifier
- Seleziona periodo: year name
- Value: integer value
- Flag Codes: flag for temporary data
- Flags: flag explanation
The data preparation has been done with a Jupyter Notebook and it consists in the rebuilding of the relationship between the different levels of the location dimension
Data Modeling
After importing the data I have normalized the tables
Location dimension:
- State
- Geographic Area
- Macro Region
- Region
- Province
- Comune
Time dimension:
- Year
To use the drill up/down functionality I have build a Location table with all the names of the location dimension and same for the Time table
Measures created:
- Total Population: sum of the population value
- LY Total Population: Last year total population
- Total Population Var: Total Population - LY Total Population
Evaluate the Results
All visualization in the report are enabled to drill up/down on the location dimension
Italy has -0.53% people in 2019 compared to 2015 and the effect is more relevant on south and the islands
Most of the people live in the Nord-ovest macro region which is composed by: Lombardia, Piemonte, Liguria and Valle d’Aosta
Lombardia is the only region with an increase of people in recent years
Using the Q&A capability of Power BI is possible to ask questions in more or less natural language
Mantova is the city were I was born so of course I had to try the Q&A functionality on it
Deploy
Using Power BI is possible to publish the report on the online Power Bi service
Outro
I hope the story was interesting and thank you for taking the time to read it. The code for this project can be found in this github repository and on my Blogspot you can find the same post in Italian. Let me know if you have any question and if you like the content that I create feel free to buy me a coffee.