Visualize ISTAT data with Microsoft Power BI

Visualize Data with Power BI

Simone Rigoni
Analytics Vidhya
4 min readMar 20, 2020

--

https://www.open.online/wp-content/uploads/2019/04/istat-litalia-e-fuori-dalla-recessione-nel-primo-trimestre-pil-a-02.png

The ISTAT (Italian National Institute of Statistics) is the main producer of official statistics in Italy including the census of population, economic censuses and a number of social and environmental surveys and analyses

As always let’s apply CRISP-DM Process (Cross Industry Process for Data Mining) to tackle the problem:

  1. Business Understanding
  2. Data Understanding
  3. Prepare Data
  4. Data Modeling
  5. Evaluate the Results
  6. Deploy

Business Understanding

The goal is to get some insights about the regional distribution of population in Italy and how it is changed by year

Data Understanding

ISTAT let you select from different data topics: in this case we are looking for Population and Households — Population — Resident population on 1st January — All municipalities

http://dati.istat.it/?lang=en

We are not interested in dividing population by age or marital status so we have to select the filtering to all on this fields

Let’s consider the last 5 years

The data from ISTAT can be downloaded in CSV or Excel format

Prepare Data

Export the data in both Excel and CSV format because in CSV file the relationship of State — Geographic Area — Macro Region — Region — Province — Comune is lost so we have to rebuild it

Excel data

Through some work with Python is possible to rebuild the hierarchy StateGeographic AreaMacro RegionRegionProvinceComune exploiting the indentation present in the ‘Territorio’ column of the Excel export file

https://upload.wikimedia.org/wikipedia/commons/thumb/a/ae/Regioni_of_Italy_with_official_names.png/350px-Regioni_of_Italy_with_official_names.png

Italy is composed by:

  • 3 Geographic Area: Nord, Centro and Mezzogiorno
  • 4 Macro Region: Nord-ovest, Nord-est, Centro, Sud and Isole
  • 20 Region
  • 110 Province (number reduced through the years)
  • 8100 Comune (number reduced through the years)
CSV data

In the CSV file we have 40700 records and the columns are:

- ITTER107: territory identifier
- Territorio: territory name
- TIPO_DATO15: KPI identifier
- Tipo di indicatore demografico: KPI name
- SEXISTAT1: sex identifier
- Sesso: sex name
- ETA1: age identifier
- Età: age name
- STATCIV2: marital state identifier
- Stato civile: marital state name
- TIME: year identifier
- Seleziona periodo: year name
- Value: integer value
- Flag Codes: flag for temporary data
- Flags: flag explanation

The data preparation has been done with a Jupyter Notebook and it consists in the rebuilding of the relationship between the different levels of the location dimension

Flowchart made using draw.io
Location relationship rebuild

Data Modeling

After importing the data I have normalized the tables

Location dimension:
- State
- Geographic Area
- Macro Region
- Region
- Province
- Comune

Time dimension:
- Year

To use the drill up/down functionality I have build a Location table with all the names of the location dimension and same for the Time table

Data Model

Measures created:
- Total Population: sum of the population value
- LY Total Population: Last year total population
- Total Population Var: Total Population - LY Total Population

Evaluate the Results

All visualization in the report are enabled to drill up/down on the location dimension

Total Population by Macro Region and Year

Italy has -0.53% people in 2019 compared to 2015 and the effect is more relevant on south and the islands

Population distribution by Location

Most of the people live in the Nord-ovest macro region which is composed by: Lombardia, Piemonte, Liguria and Valle d’Aosta

Total Population Var by Region and Year

Lombardia is the only region with an increase of people in recent years

Using the Q&A capability of Power BI is possible to ask questions in more or less natural language

Q&A question

Mantova is the city were I was born so of course I had to try the Q&A functionality on it

Deploy

Using Power BI is possible to publish the report on the online Power Bi service

Outro

I hope the story was interesting and thank you for taking the time to read it. The code for this project can be found in this github repository and on my Blogspot you can find the same post in Italian. Let me know if you have any question and if you like the content that I create feel free to buy me a coffee.

--

--