P05 — Week 6 — EDA and Machine Learning models

Bengü Barış Balkan
AIN311 Fall 2023 Projects
2 min read · Dec 28, 2023

This week, we focused on machine learning algorithms and their evaluation.

As we mentioned last week, our data was ready to use in machine learning models. However, we didn’t have much insight into the data yet, so this week we started with exploratory data analysis.

First, we wanted to see if there were any outliers, so we used boxplots to detect them.

Figure 1: CO2 boxplot

As seen in Figure 1, we found two outlier samples in the laboratory (left) and real-world (right) measurements. We dropped those samples from the data.
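As a minimal sketch, the boxplot check and the pruning step could look like the snippet below; the column names (co2_lab, co2_real), the file name, and the 1.5 × IQR rule are assumptions for illustration, not our exact code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical column/file names; adjust to the actual dataset.
df = pd.read_csv("emissions.csv")
co2_cols = ["co2_lab", "co2_real"]

# Side-by-side boxplots of laboratory and real-world CO2 measurements.
df[co2_cols].plot(kind="box", subplots=True, layout=(1, 2), figsize=(8, 4))
plt.tight_layout()
plt.show()

# Flag outliers with the standard 1.5 * IQR rule used by boxplot whiskers.
def iqr_outliers(series: pd.Series) -> pd.Series:
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)

mask = iqr_outliers(df["co2_lab"]) | iqr_outliers(df["co2_real"])
df = df[~mask].reset_index(drop=True)  # drop the flagged samples
```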

Figure 2: Pruned CO2 distributions

After pruning the outliers, we moved on to scaling our data. At first we used a Min-Max scaler, but since most of our features span a wide range (Fig. 3), we switched to a Standard scaler.

Figure 3: Data Distributions
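The switch could be sketched with scikit-learn as below, continuing from the snippet above; the target column name is an assumption, and in a stricter setup the scaler would be fit on the training split only.

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Assumed target column; everything else is treated as a (numeric) feature.
feature_cols = [c for c in df.columns if c != "co2_real"]

# First attempt: Min-Max scaling to [0, 1].
# X_scaled = MinMaxScaler().fit_transform(df[feature_cols])

# Standardization (zero mean, unit variance) handled the wide feature
# ranges better, so we switched to StandardScaler.
X_scaled = StandardScaler().fit_transform(df[feature_cols])
```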

Then, we checked for highly correlated features (Fig. 4), since they could affect the models’ performances.

Figure 4: Correlation matrix

In our data, the steering axle width and the other axle width features had an almost perfect correlation. Keeping both would just feed the models near-duplicate information, so we dropped the other axle width feature to avoid any unwanted bias.
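The correlation check can be sketched like this; the 0.95 threshold and the exact column names are assumptions, not values taken from our report.

```python
import numpy as np

# Absolute correlation matrix of the features (Fig. 4).
corr = df[feature_cols].corr().abs()

# Keep only the upper triangle so each feature pair is considered once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Columns that are almost perfectly correlated with another feature.
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
print(to_drop)  # e.g. the second axle-width column

df = df.drop(columns=to_drop)
```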

Applying Models

To test our data, we used several models (a minimal setup sketch follows the list). To be exact, we used:
1- Linear regression
2- KNN regressor
3- Multivariate regression
4- Ridge regression
5- Lasso regression
6- Support vector regression
7- Decision tree regressor
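A minimal setup for these models with scikit-learn could look like the sketch below; we fold plain linear and multivariate regression into a single LinearRegression on all features, and all hyperparameters are library defaults rather than our tuned values.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Standardized features and assumed target column, continuing from the
# steps above (all remaining columns are assumed to be numeric).
X = StandardScaler().fit_transform(df.drop(columns=["co2_real"]))
y = df["co2_real"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear / multivariate regression": LinearRegression(),
    "KNN regressor": KNeighborsRegressor(n_neighbors=5),
    "Ridge regression": Ridge(alpha=1.0),
    "Lasso regression": Lasso(alpha=0.1),
    "Support vector regression": SVR(),
    "Decision tree regressor": DecisionTreeRegressor(random_state=42),
}

for model in models.values():
    model.fit(X_train, y_train)
```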

Results

Among all the models, the KNN regressor gave the best result, with a root mean squared error (RMSE) of 34.12.

Figure 5: Models’ RMSE scores
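Continuing from the sketch above, the RMSE comparison in Figure 5 could be produced roughly like this, evaluating each fitted model on the held-out test split.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# RMSE = sqrt(mean((y_true - y_pred)^2)) on the test split.
scores = {}
for name, model in models.items():
    y_pred = model.predict(X_test)
    scores[name] = np.sqrt(mean_squared_error(y_test, y_pred))

# Print models from best (lowest RMSE) to worst.
for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {rmse:.2f}")
```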

Next week, we’ll be focusing on polishing our models and our project’s final report.
