Time Series Analysis of Air Pollution

Kexin Zhai
Spring 2019 — Information Expositions
Apr 30, 2019

Air pollution is a persistent concern because air quality affects many aspects of daily life. As society develops, air quality will get worse if no action is taken; with effective laws or policies, and the technology to support them, it can improve. In this analysis, I used a dataset called US Air Pollution Data From 2000 to 2015, from data.world, to explore the trend of air pollution in major cities. The dataset contains more data for big cities than for small places, so I focused on one major city with a long-term air pollution problem: Los Angeles.

The dataset provides an Air Quality Index (AQI) for four gases: NO2, O3, SO2, and CO. After cleaning the data and filling in missing values, I made basic visualizations to explore the trend of each gas over time. An AQI of 0 to 50 indicates good air quality, and 51 to 100 indicates moderate air quality. In the graphics below, the NO2 AQI was at a moderate level from 2000 to 2003, while the AQI of the other three gases remained below 50. In the ozone graphic, the AQI has been increasing since 2012, which I attribute to dry conditions in Los Angeles: hot, dry air favors ozone formation and can trigger unusual air quality alerts.

Ozone Air Quality Index from 2000 to 2015
Carbon Monoxide Air Quality Index from 2000 to 2015
Nitrogen dioxide Air Quality Index from 2000 to 2015
Sulfur dioxide Air Quality Index from 2000 to 2015
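The cleaning step and the AQI bands described above can be sketched in a few lines. This is only an illustration, not the code behind the charts: the post does not say how missing values were filled, so the forward fill here is an assumption, and the sample readings and the function names `forward_fill` and `aqi_category` are hypothetical.

```python
def forward_fill(values):
    """Replace each missing reading (None) with the last known reading.

    Forward fill is an assumed strategy; the post does not name one.
    A missing value at the very start of the list stays None.
    """
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def aqi_category(aqi):
    """Map an AQI value to the two bands named in the post."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    if aqi <= 50:
        return "good"
    if aqi <= 100:
        return "moderate"
    return "above moderate"  # bands beyond 100 are not discussed in the post


# Hypothetical daily NO2 AQI readings with two gaps.
readings = [38, None, 51, 72, None, 104]
clean = forward_fill(readings)
labels = [aqi_category(value) for value in clean]
print(list(zip(clean, labels)))
```

Other fill strategies, such as interpolating between neighboring days, would slot into the same place as `forward_fill`.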

Next, I decomposed the Los Angeles NO2 AQI time series into its seasonal, trend, and residual components.
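To make the decomposition step concrete, here is a minimal sketch of a classical additive decomposition, where the series is treated as trend + seasonal + residual. It is written with the standard library only and is not necessarily the method or tool used for this analysis. The monthly period of 12, the synthetic series, and the function name `decompose_additive` are all assumptions for illustration.

```python
from statistics import mean


def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    The trend is a centered moving average, so the first and last
    period // 2 points have no trend or residual (None).
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half : i + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points (2 x period average).
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
            trend[i] = total / period
        else:
            trend[i] = mean(window)

    # Seasonal effect: average detrended value at each position in the cycle.
    detrended = [s - t if t is not None else None for s, t in zip(series, trend)]
    cycle = []
    for k in range(period):
        cycle.append(mean(d for d in detrended[k::period] if d is not None))
    offset = mean(cycle)  # center the effects so they sum to zero
    cycle = [c - offset for c in cycle]
    seasonal = [cycle[i % period] for i in range(n)]

    residual = [
        s - t - c if t is not None else None
        for s, t, c in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic monthly NO2 AQI: a slow decline plus a repeating yearly pattern.
pattern = [5, 4, 2, 0, -2, -4, -5, -4, -2, 0, 2, 4]
series = [40 - 0.1 * i + pattern[i % 12] for i in range(48)]
trend, seasonal, residual = decompose_additive(series, period=12)
print(round(trend[6], 2), round(seasonal[0], 2))
```

Because the synthetic series has no noise, the residuals come out at essentially zero; on real AQI data they would show whatever the trend and seasonal components fail to explain.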

Then I proposed statistical models using linear regression.
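As a rough illustration of the linear regression idea, the sketch below fits a straight line to yearly averages by ordinary least squares and extrapolates it forward. The yearly values are made up, the helper `fit_line` is hypothetical, and the actual models in this analysis may have used different features or tooling.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical yearly mean NO2 AQI, indexed by years since 2000.
years_since_2000 = [0, 1, 2, 3, 4, 5]
yearly_no2_aqi = [62, 58, 55, 53, 49, 46]

slope_no2, intercept_no2 = fit_line(years_since_2000, yearly_no2_aqi)

# Extrapolate the fitted line to a later year (here 2016, i.e. x = 16).
forecast_2016 = slope_no2 * 16 + intercept_no2
print(round(slope_no2, 2), round(forecast_2016, 1))
```

A straight line ignores the seasonal swings in the data, which is one reason a forecast built this way can look too smooth.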

In the graphics below, the green and red lines are predictions of the future NO2 AQI. The red line stays within the good air quality range, while the green line varies around the upper limit of that range. These graphics might not be a good example of predicting the future air quality index.

Next step: near the end of the research, I noticed that this dataset contains two different sets of 2015 Los Angeles AQI data. Fixing this problem will require extra time.
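One possible way to surface this kind of problem is to count how often each city and date pair appears and list the pairs that occur more than once. The rows, the field names `city`, `date`, and `no2_aqi`, and the helper `duplicated_keys` below are hypothetical; the real dataset's columns may differ.

```python
from collections import Counter


def duplicated_keys(rows, key_fields=("city", "date")):
    """Return the sorted key tuples that appear in more than one row."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical rows: 2015-01-01 appears twice with conflicting values.
rows = [
    {"city": "Los Angeles", "date": "2014-12-31", "no2_aqi": 41},
    {"city": "Los Angeles", "date": "2015-01-01", "no2_aqi": 35},
    {"city": "Los Angeles", "date": "2015-01-01", "no2_aqi": 52},
    {"city": "Los Angeles", "date": "2015-01-02", "no2_aqi": 38},
]

conflicts = duplicated_keys(rows)
print(conflicts)
```

Once the conflicting dates are known, the remaining decision is which copy to keep, or whether to average them, and that depends on why the dataset contains both.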
