Air pollution: looking after my hometown

I grew up in a seaside city in the hot sun of the Mediterranean, a natural beauty called Taranto, in Puglia, Italy.
During my school days the beach and rock music offered an escape from some harsh realities. In recent decades the citizens of Taranto have been under constant pressure from a struggling local economy and indecisive politics, which have confronted people with illogical and impossible choices, such as the infamous “health vs jobs” debate. That debate is exacerbated by the proximity of the industrial hub: bigger than the city centre and located just a few miles away, it feeds the region with one hand while polluting it with the other.

Strong wind causing dust storm near the industrial area https://www.facebook.com/588204948220930/photos/a.588244811550277/632018663839558

There’s no need to add personal opinions or points of view on the matter: the pollutants discharged by the factories in the area, most notably the ILVA steel plant, recently acquired by ArcelorMittal, have been linked to a dreadful cancer epidemic, officially reported by the Italian Department of Health.
Data speaks for itself.

Fast-forward some 20 years to last Xmas: my Spotify playlist randomly picked “Good Times Bad Times” by Led Zeppelin, the first song on their eponymous debut album, which goes:

In the days of my youth I was told what it means to be a man
And now I’ve reached that age, I’ve tried to do all those things the best I can

Believe it or not, that song triggered good vibes, and I ended up finding some time to keep on trying to do all those things the best I can: using my knowledge, my skills and, guess what, the app I have been building for years, to support a noble public cause.
(Seriously, I should have done this already… but building skills and tools takes time… you know.)

Monitoring the air quality

How can I exploit data, analytics and visualisation in this case? 
Surely I need some data to start with, and public data, thank you. So I found out that Arpa Puglia, the organisation that monitors and researches environmental pollution in the region, releases daily validated air quality data, measured by several control units. Bingo.
The easiest thing I could do at this point (and did) was to create, set up and host a public monitoring dashboard that refreshes every day or week as new data is provided: a sort of sentinel that everyone can play with to visually explore the data and extract insights. Hopefully it is super easy to use, so the data is accessible to anyone: pure data democratization.

I am hosting the first version of the dashboard on a public demo server at https://bit.ly/TarantAir .
Feel free to embed it anywhere on your website, like I did at http://tarantair.rf.gd/ . You just need to use the following HTML snippet:

<iframe width="100%" height="100%" src="https://omniscope.me/internal/Pollution/TarantAir.iox/r/Report/" allowfullscreen></iframe>

or just embed it on your blog or service that supports Embed.ly, as I’m doing here below just by pasting the link to the dashboard.

Have fun and drop me a comment with your feedback. But now, hey, carry on reading below :)


The dashboard is (currently) made of 5 tabs. You can switch between them using the top-left dropdown menu next to the “Report” text.

1. A summary tab, showing at a glance the average concentration of every pollutant measured by all control units over the selected date range, ordered by most polluted area, with a map and filters to help you drill down and explore the data.

2. A pollutant tab, that allows you to analyse the trend of a selected pollutant, like its concentration in different areas over time.

3. A control unit tab, that allows you to analyse the trends of all pollutants measured by a control unit over time.

4. An observations box plots tab, showing the min, max, median, lower quartile and upper quartile of pollutant concentration for a selected control unit.

5. A raw data tab, to show the raw data collected from Arpa Puglia and used by this project.
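
For anyone who wants to reproduce the numbers behind the summary and box plot tabs outside the dashboard, here is a minimal Python sketch. It is not how the dashboard computes them: the station names and readings are made up for illustration, the function names are my own, and the quartile method (Python's "inclusive" interpolation) is an assumption that may differ from what the dashboard uses.

```python
from statistics import mean, median, quantiles

# Hypothetical daily readings for one pollutant (µg/m³).
# These are illustrative values, NOT real Arpa Puglia measurements.
readings = {
    "Station A": [12.0, 15.0, 11.0, 30.0, 14.0],
    "Station B": [22.0, 25.0, 21.0, 40.0, 24.0],
}

def rank_by_average(data):
    """Return (station, average) pairs, most polluted first (summary tab idea)."""
    averages = ((station, mean(values)) for station, values in data.items())
    return sorted(averages, key=lambda pair: pair[1], reverse=True)

def box_plot_summary(values):
    """Return the five numbers a box plot draws for one control unit."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
    }

ranking = rank_by_average(readings)
summary = box_plot_summary(readings["Station A"])
```

Here `ranking` lists the stations from highest to lowest average, and `summary` holds the min, quartiles, median and max for one station.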


Am I missing something? Where’s the data?

As I was setting up and exploring the data I suddenly remembered taking a picture last time I was in Taranto, near the industrial area.
Trust me, #nofilter

A sort of hellish Mad Max scene, where orange clouds were glowing against the dark sky, a sweet “welcome home”.
The picture was taken at 10.30pm on the 2nd of November, so I was super curious to see evidence in the measurements! 
Let’s check the concentration of polycyclic aromatic hydrocarbons (PAH, or IPA in Italian) measured by the Cokeria and Tamburi control units.

But…hold on… where’s the data?
It’s missing. Yes, there are no measurements from the 2nd to the 4th of November for either Cokeria or Tamburi.
I am not implying that the data was withheld on purpose to hide pollution peaks. We all just missed an opportunity here. An opportunity to clarify, for instance, whether those orange clouds looming over us are nothing to be scared of. Right?
I have contacted the Arpa Puglia scientific department and am looking forward to their reply.
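
Gaps like this are easy to miss by eye, so here is a small Python sketch of how one could list the days with no measurement for a control unit. The function name and the sample dates are hypothetical: the dates only mimic the gap described above, and the year 2018 is my assumption for illustration, not something read from the Arpa Puglia files.

```python
from datetime import date, timedelta

def missing_days(observed, start, end):
    """Return every date in [start, end] that has no measurement."""
    seen = set(observed)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in seen]

# Hypothetical list of days for which one control unit reported a value.
observed = [date(2018, 11, 1), date(2018, 11, 5), date(2018, 11, 6)]

gaps = missing_days(observed, date(2018, 11, 1), date(2018, 11, 6))
for day in gaps:
    print("No measurement on", day.isoformat())
```

Running the same check per control unit and per pollutant would show whether a gap affects one sensor or the whole network.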

The point is, we all need information, to be transparent, and to clear the air (pardon the pun…)

If you have any comments, ideas on how to improve this dashboard or to add more data science behind this, please come forward and drop a comment. 
For science!
For example, we could correlate measurements with weather forecasts, webcam images and respiratory illness incidence.
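
To make the correlation idea a bit more concrete, here is a rough Python sketch of a Pearson correlation between two daily series. Everything in it is an assumption for illustration: the `pearson` helper is my own, and the wind speed and pollutant values are invented, not real weather or Arpa Puglia data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical daily values: wind speed (km/h) and a pollutant (µg/m³).
wind_speed = [5.0, 12.0, 20.0, 8.0, 30.0]
pollutant = [18.0, 22.0, 35.0, 19.0, 50.0]

r = pearson(wind_speed, pollutant)
print(f"Correlation between wind speed and pollutant: {r:.2f}")
```

A value close to 1 or -1 would hint at a relationship worth investigating, but correlation alone would not prove that wind causes the peaks.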

Happy 2019 then. I hope everyone spends some of their precious time looking after our planet. I am looking forward to… being there, soon.


24/01/2019 EDIT
I have added 2 more tabs to compare pollutant measurements and their distributions per month / year using box plots and bars.
The dataset now includes daily observations going back to January 2016.
https://bit.ly/TarantAirCompareYears

09/02/2019 EDIT (Italian version)
I have created an Italian version to make the interactive report accessible to the citizens of Taranto and its province, or to anyone who has difficulty with English.
Here is the link: https://bit.ly/QualitaAriaTaranto