Week 6 — MAKE WORLD GREEN AGAIN!
This week we worked on finalizing the code and tried to extend our dataset to pollutants other than NO2, so that our design could predict the AQI of the day from the estimated peak values. We were only able to add CO, because the raw data for SO2 wasn’t suitable for our project.
WHAT IS AQI?
AQI (air quality index) is a number that measures how polluted the air is. It is directly related to the concentrations of unwanted molecules in the air. In short, it gives us an idea of air quality: a higher AQI means more polluted air.
OVERVIEW OF THE APPROACH
In summary, we approached the problem in two different ways.
1- Classification Problem
1.1 — Peak Value Level Estimation: Neural Network, Decision Tree (Week 5)
1.2 — AQI Estimation: AQI is a continuous value, so treating it as a set of discrete levels is not a sensible fit for this problem, and classification is not an appropriate way to estimate it.
2- Regression Problem
2.1 — Peak Value Estimation (First Phase): Linear Regression, Polynomial Regression (Week 4)
2.2 — AQI Estimation: Polynomial Regression, SVM (This Week)
The model used for AQI estimation takes pollutant densities (the peak values of NO2 and CO in our project) as input. The regression model outputs the predicted AQI of the day, and the classification model outputs the predicted AQI level of the day. Each level is 15 AQI units wide, which gives 7 levels over the range of our dataset: the minimum AQI falls in level 1 and the maximum AQI falls in level 7.
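The mapping from an AQI value to its level can be sketched in a few lines. This is only an illustration: the function name `aqi_level`, the choice to count levels from the dataset minimum, and the minimum value of 20 are our assumptions for the example, not values taken from the project data.

```python
LEVEL_WIDTH = 15  # width of each AQI level, as described above


def aqi_level(aqi, min_aqi, width=LEVEL_WIDTH):
    """Map an AQI value to a 1-based level, counting from the dataset minimum.

    Assumption: level 1 starts exactly at the smallest AQI in the dataset.
    """
    if aqi < min_aqi:
        raise ValueError("aqi is below the dataset minimum")
    return int((aqi - min_aqi) // width) + 1


# Hypothetical numbers, chosen so that the range spans 7 levels.
min_aqi = 20
for aqi in (20, 34, 35, 110):
    print(aqi, "-> level", aqi_level(aqi, min_aqi))
```

With these made-up numbers, 20 and 34 land in level 1, 35 starts level 2, and 110 lands in level 7.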
First, we tested this part with a dataset consisting of the peak values of NO2, the peak values of CO and the AQI of each day. For predicting AQI (the continuous case), the SVM model trained on this dataset gives an accuracy of 0.96, and the polynomial regression model gives nearly 1.0 with a mean-squared error of 0.18. For predicting the AQI level (using 7 levels), the SVM model gives an accuracy of 1.0. From these results, we concluded that if we succeed at estimating the peak values, estimating the AQI of the day will not be a problem.
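To make two of these metrics concrete, here is a minimal sketch of mean-squared error for the continuous case and accuracy for the level case. The function names and all the numbers are hypothetical and only show how the metrics are computed; they are not our results.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted AQI."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(levels_true, levels_pred):
    """Fraction of days whose predicted AQI level matches the true level."""
    hits = sum(1 for t, p in zip(levels_true, levels_pred) if t == p)
    return hits / len(levels_true)


# Made-up values for three days (continuous AQI) and four days (levels).
aqi_true = [50.0, 62.0, 71.0]
aqi_pred = [50.5, 61.5, 71.0]
print("MSE:", mean_squared_error(aqi_true, aqi_pred))

levels_true = [3, 4, 4, 5]
levels_pred = [3, 4, 5, 5]
print("accuracy:", accuracy(levels_true, levels_pred))
```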
Then we fed these models with the estimations from the first phase. The results weren’t as high as before. As seen from the plot above, the best accuracy gets close to the worst case of the first phase. The source of the mispredictions here is the error in the input data: since the input of this part is the output of the first phase, errors in the first phase directly mislead the estimations.
To make the model robust to this kind of error, we trained it on a training set consisting of the estimated peak values instead of the true peak values. The progress made by the SVM model with this idea is shown in the table below. The polynomial model didn’t improve with this idea.
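The idea of training on estimated peaks can be illustrated with a toy example. Note the assumptions: this sketch uses a plain one-feature least-squares line instead of our SVM or polynomial models, the data is synthetic, and the first-phase error is modeled as a purely systematic bias (`0.8 * p + 3.0`), which is the easiest case to correct. It also evaluates on the training data for brevity.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mse(a, b, xs, ys):
    """Mean-squared error of the line y = a*x + b on the given points."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: AQI is an exact linear function of the true peak value.
true_peaks = [10.0, 20.0, 30.0, 40.0, 50.0]
aqi = [2.0 * p + 1.0 for p in true_peaks]

# Hypothetical first-phase output: the true peaks with a systematic error.
estimated_peaks = [0.8 * p + 3.0 for p in true_peaks]

# Second-phase model trained on true peaks vs. trained on estimated peaks.
model_true = fit_line(true_peaks, aqi)
model_est = fit_line(estimated_peaks, aqi)

# Both models are fed the estimated peaks, as in the full pipeline.
mse_true = mse(*model_true, estimated_peaks, aqi)
mse_est = mse(*model_est, estimated_peaks, aqi)
print("trained on true peaks:     ", mse_true)
print("trained on estimated peaks:", mse_est)
```

In this toy setup the model trained on estimated peaks absorbs the bias of the first phase, so its error on the pipeline input is far lower than that of the model trained on true peaks. Real first-phase errors also contain noise that no retraining can remove, so the gain there is smaller.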