AIN 311 Machine Learning Blog 5: Analysis of Algorithms for Each Season

Published in AIN311 Fall 2023 Projects · 4 min read · Jan 1, 2024

Welcome to our Medium blog. In this post, we revisit the machine learning algorithms discussed in our previous blogs and analyze them separately for each season. We will interpret each algorithm season by season and compare the results on both an annual and a seasonal basis. So, let’s get started.
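The season-by-season evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the exact pipeline behind our graphs: the `month` column, the month-to-season mapping, the placeholder feature names `feat_a` and `feat_b`, the default hyperparameters, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Assumed mapping: meteorological seasons of the northern hemisphere.
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def make_models():
    """Return fresh instances of the five regressors covered in this blog."""
    return {
        "SVR": SVR(),
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=0),
        "Random Forest": RandomForestRegressor(n_estimators=20, random_state=0),
        "KNN": KNeighborsRegressor(),
    }


def evaluate_by_season(df, feature_cols, target_col="CO"):
    """Fit every model on each season's rows and collect R2 and RMSE."""
    rows = []
    seasons = df["month"].map(SEASON_OF_MONTH)
    for season, part in df.groupby(seasons):
        X_train, X_test, y_train, y_test = train_test_split(
            part[feature_cols], part[target_col],
            test_size=0.25, random_state=0)
        for name, model in make_models().items():
            pred = model.fit(X_train, y_train).predict(X_test)
            rows.append({
                "season": season,
                "model": name,
                "r2": r2_score(y_test, pred),
                "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
            })
    return pd.DataFrame(rows)


# Synthetic stand-in data (hypothetical columns, not the project's dataset).
rng = np.random.default_rng(0)
n = 480
demo = pd.DataFrame({
    "month": rng.integers(1, 13, size=n),
    "feat_a": rng.normal(size=n),
    "feat_b": rng.normal(size=n),
})
demo["CO"] = 2.0 * demo["feat_a"] - demo["feat_b"] + rng.normal(scale=0.3, size=n)

results = evaluate_by_season(demo, ["feat_a", "feat_b"])
print(results.pivot(index="model", columns="season", values="r2").round(3))
```

Calling `evaluate_by_season` once with the GT columns and once with the PT08 columns would give the two sets of per-season scores that the graphs below compare.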

SVR For GT & PT08 Features:

Linear Regression For GT & PT08 Features:

Decision Tree Regression For GT & PT08 Features:

Random Forest Regression For GT & PT08 Features:

KNN For GT & PT08 Features:

Across all the graphs, we consistently achieve the highest accuracy in spring and the lowest in winter. There may be a physical explanation for this, but as data scientists we may not have direct access to it. We therefore wanted to examine the pattern from a data science perspective and see whether we could draw a conclusion. To do so, we examined, season by season, the correlations between CO and the other variables, along with distribution metrics. Here are the results:
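A seasonal correlation check of this kind can be sketched with pandas as below. Treat it as an illustration under assumptions: the `month`, `sensor_a`, and `sensor_b` columns, the season mapping, and the synthetic data are hypothetical, and `describe()` stands in for the distribution metrics.

```python
import numpy as np
import pandas as pd

# Assumed mapping: meteorological seasons of the northern hemisphere.
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def seasonal_correlations(df, target_col="CO"):
    """Pearson correlation of each feature with the target, one row per season."""
    seasons = df["month"].map(SEASON_OF_MONTH)
    features = df.drop(columns=["month"])
    out = {}
    for season, part in features.groupby(seasons):
        # Correlate every column with the target, then drop the target itself.
        out[season] = part.corr()[target_col].drop(target_col)
    return pd.DataFrame(out).T


def seasonal_distribution(df, target_col="CO"):
    """Summary statistics (count, mean, std, quartiles) of the target per season."""
    seasons = df["month"].map(SEASON_OF_MONTH)
    return df.groupby(seasons)[target_col].describe()


# Synthetic stand-in data (hypothetical columns, not the project's dataset).
rng = np.random.default_rng(1)
n = 400
demo = pd.DataFrame({"month": rng.integers(1, 13, size=n),
                     "CO": rng.normal(loc=2.0, scale=1.0, size=n)})
demo["sensor_a"] = 100.0 * demo["CO"] + rng.normal(scale=30.0, size=n)  # related
demo["sensor_b"] = rng.normal(size=n)                                   # unrelated

corr = seasonal_correlations(demo)
dist = seasonal_distribution(demo)
print(corr.round(3))
print(dist.round(3))
```

A season whose row shows noticeably smaller absolute correlations than the others is a candidate explanation for weaker model scores in that season.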

Correlations of Features With CO and Distribution Metrics Given Respectively

The distribution metrics do not give a clear result. The correlations, however, show that in winter the features are less correlated with CO than in the other seasons. This could be one reason our models perform worse in winter, although it is certainly not the only factor. A better theory could be proposed by considering the relationships between gas particles, environmental factors, temperature effects, and so on. However, as we are not experts in this field, our interpretations may be incorrect. So let’s finish the blog by examining the annual results of the models.

As the tables above show, the GT features predict the CO amount much better than the PT08 features.

Result

In our analysis of five machine learning models applied to GT and PT08 data for predicting CO levels, a consistent trend emerges: the GT data is more effective across all models. We attribute this to GT’s direct reference to an elemental entity, which ensures precise measurements, so using GT data for predictive modeling yields higher accuracy. The seasonal analysis reveals a distinct pattern: spring forecasts show higher R2 scores and lower error values, while winter forecasts show lower R2 scores and higher error values. These seasonal variations may arise from the correlation between CO and other atmospheric constituents, which changes with the seasons. Further investigation is needed to pinpoint the precise cause, but we speculate that the divergence reflects differing air pollution levels in winter and summer. These results show how seasonal dynamics affect the prediction of CO levels.
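For reference, the R2 score compared above follows the standard definition of the coefficient of determination. The symbols are our notation for this note: $y_i$ is the measured CO value, $\hat{y}_i$ the model's prediction, and $\bar{y}$ the mean of the measured values in the evaluated season.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

A score of 1 means perfect predictions, while a score of 0 means the model does no better than always predicting $\bar{y}$. This is why a lower R2 in winter goes hand in hand with higher error values.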

As we wrap up this series of five blogs documenting our weekly progress on this project, we reflect on the journey that unfolded week by week. From the initial concept to the fine-tuning of machine learning models, each blog has been a chapter in our shared exploration of the data, the challenges we faced, and the insights we gained. As we say farewell to this project in our final installment, we acknowledge the dedication and passion that carried us forward. This series stands not only as documentation of our technical work but also as a testament to the power of collaboration and the pursuit of knowledge in machine learning. We look forward to future projects and to continuing to develop our data science skills. Be safe and sound, and enjoy life. See you in the next projects!

Written by İzzet Ahmet & İlbey Gülmez
