Visualization of Air Pollution (Using Folium)

Nitin Vashisth
Published in Analytics Vidhya
4 min read · Apr 19, 2020

Photo by Anne Nygård on Unsplash

Air pollution is a major problem in this era of globalisation. Many countries are pushing hard to grow their economies without a sustainable approach. The effect can be seen in Seoul, South Korea. The Seoul Metropolitan Government has collected measurements of several air pollutants: NO2 (nitrogen dioxide), SO2 (sulphur dioxide), CO (carbon monoxide), and PM2.5 and PM10 (particulate matter).

The aim of this article is to give some exposure to different visualisation tools and to draw inferences from the resulting plots. Let’s start by exploring the dataset that the Seoul Metropolitan Government published on Kaggle as a public dataset. To download the dataset, click here.

As usual, we start by reading the CSV file with pandas.

read_csv from pandas
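
The original screenshot is not available here, so the following is only a minimal sketch of the loading step. The column names and the three inline rows are assumptions made for illustration; with the real file you would pass its path to `load_measurements` (a hypothetical helper) instead of the inline sample.

```python
import io

import pandas as pd

# Tiny inline stand-in for the Kaggle CSV.
# Column names and values are assumptions for illustration only.
SAMPLE_CSV = """Measurement date,Station code,Latitude,Longitude,SO2,NO2,O3,CO,PM10,PM2.5
2017-01-01 00:00,101,37.572,127.005,0.004,0.059,0.002,1.2,73,57
2017-01-01 01:00,101,37.572,127.005,0.004,0.058,0.002,1.2,71,59
2017-01-01 02:00,101,37.572,127.005,-1,0.056,0.002,1.2,70,59
"""


def load_measurements(source):
    """Read the measurements and parse the timestamp column as datetime."""
    return pd.read_csv(source, parse_dates=["Measurement date"])


df = load_measurements(io.StringIO(SAMPLE_CSV))
print(df.shape)
print(df.dtypes)
```

Parsing the timestamp while reading makes the later date-based steps simpler, because the column can be used directly with the `.dt` accessor.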

During the exploration we found that a few values are -1, which could be the result of faulty apparatus recording those pollutant readings. We therefore decided to impute those values with the mean of each column. This can be done easily with the “SimpleImputer” class from scikit-learn.

Right: “-1 value check”, Left: “Use of SimpleImputer to replace -1 with the mean value”
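
As a rough sketch of the two steps in the captions above (counting the -1 readings, then replacing them with the column mean), something like the following should work. The small data frame and the choice of columns are made up for illustration.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up readings; -1 marks a faulty measurement.
df = pd.DataFrame({
    "SO2": [0.004, -1.0, 0.006],
    "PM2.5": [57.0, 59.0, -1.0],
})
pollutant_cols = ["SO2", "PM2.5"]

# Step 1: count how many -1 sentinels each column contains.
faulty_counts = (df[pollutant_cols] == -1).sum()
print(faulty_counts)

# Step 2: treat -1 as "missing" and replace it with the mean of the
# remaining values in the same column.
imputer = SimpleImputer(missing_values=-1, strategy="mean")
df[pollutant_cols] = imputer.fit_transform(df[pollutant_cols])
print(df)
```

Telling `SimpleImputer` that -1 is the missing-value marker keeps the -1 readings out of the mean, which would otherwise be pulled down by the faulty values.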

We plotted line graphs for the different pollutants, with the date on the x-axis and the concentration of each pollutant (NO2, SO2, etc.) on the y-axis. Seaborn is a great library for plotting such graphs.

Lineplot for gases over the timespan

We also plotted the correlation matrix to check which gases depend on each other. We observed that SO2, NO2 and O3 are highly correlated. This suggests that the concentrations of these gases tend to rise and fall together.
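
A correlation matrix of this kind can be computed directly with pandas. The sketch below uses invented numbers, so its output says nothing about the real dataset; it only shows the mechanics.

```python
import pandas as pd

# Invented readings, only to show how the matrix is computed.
df = pd.DataFrame({
    "SO2": [0.004, 0.005, 0.003, 0.006, 0.004],
    "NO2": [0.059, 0.061, 0.050, 0.066, 0.055],
    "O3": [0.002, 0.010, 0.021, 0.008, 0.015],
    "CO": [1.2, 1.1, 0.6, 1.3, 0.9],
})

# Pairwise Pearson correlation between the pollutant columns.
corr = df[["SO2", "NO2", "O3", "CO"]].corr()
print(corr.round(2))
```

To get a coloured plot rather than a table, the matrix can be passed to Seaborn, for example `sns.heatmap(corr, annot=True)`.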

Photo by Tatiana Rodriguez on Unsplash

In time series analysis of data that includes latitude and longitude, it is always good to view the data on a map. This can be done with a great library called “Folium”. It is highly intuitive to use and it is open source. To start with, the dataset needs some pre-processing.

We have to create a few columns for hour, day, week, month and year, which will help us plot the latitude and longitude on the map for a chosen time period.

Columns for hours, days, weeks, months and years
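
Assuming the timestamp column has already been parsed as datetime, these columns could be derived roughly as follows. The column names are assumptions, and the three timestamps are made up.

```python
import pandas as pd

# Made-up timestamps standing in for the measurement times.
df = pd.DataFrame({
    "Measurement date": pd.to_datetime(
        ["2017-01-01 00:00", "2018-06-15 13:00", "2019-12-31 23:00"]
    )
})

ts = df["Measurement date"]
df["hour"] = ts.dt.hour
df["day"] = ts.dt.day
df["week"] = ts.dt.isocalendar().week.astype(int)  # ISO week number
df["month"] = ts.dt.month
df["year"] = ts.dt.year
print(df)
```

Note that ISO week numbers can belong to the neighbouring year near the start and end of a calendar year, so week and year should be combined with care.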

Next we create a function with some default parameters:

  1. default_location: the latitude and longitude on which the map is centred when it is displayed.
  2. control_scale: whether a scale bar is shown on the map.
  3. zoom_start: the initial zoom level when the map loads.

generateBaseMap() function with some default parameters

Now we need to create a list containing the latitude, longitude, PM2.5 value (any other pollutant would work here) and year. We then pass all this information to the generateBaseMap function, which plots the latitude and longitude on the map. Below is the code for reference:

List of latitude and longitude with corresponding PM2.5 values, year-wise
Plot of PM2.5 for year 2017, 2018 and 2019

Here we can infer that pollution has increased continuously since 2017. The shift in colour towards red also shows that PM2.5 values are approaching the “very bad” severity level.

We also plotted the data month-wise for 2017 to see how pollution changes over the course of a year.

Map plot for year 2017, month wise

Again, it follows the same pattern, with a continuous increase each month.

Conclusion

The correlation matrix showed a strong relationship between SO2, NO2 and O3. It would be interesting to explore the relationships among these gases further. According to further reading (source), gases released from factories are converted into ozone (O3) in the presence of sunlight. The line plot above shows that O3 is consistently higher in summer than in winter, which is consistent with sunlight driving the increase in O3.

Hope you liked the blog. Please do clap, share and comment! Stay tuned for my next blog.
