P05 — Week 6 — EDA and Machine Learning models
This week, we focused on machine learning algorithms and their evaluation.
As we mentioned last week, our data was ready to use in machine learning models. However, we didn’t have much insight into it yet, so this week we started with exploratory data analysis.
First, we wanted to check whether there were any outliers, so we used a boxplot to detect them.
As seen in Figure 1, we found two outlier samples in the laboratory (left) and real-world (right) measurements. We dropped those samples from the data.
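The boxplot-based pruning above can be sketched with the standard IQR whisker rule (anything below Q1 − 1.5·IQR or above Q3 + 1.5·IQR is flagged). This is a minimal sketch with made-up numbers, not our actual data:

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Boolean mask marking samples outside the boxplot whiskers
    (below Q1 - k*IQR or above Q3 + k*IQR)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical measurements with two obvious outliers.
measurements = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 55.0, -20.0])
mask = iqr_outlier_mask(measurements)
cleaned = measurements[~mask]  # the two extreme samples are dropped
```

This mirrors what a boxplot shows visually: the whiskers are the fences, and the points plotted beyond them are the samples we removed.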
After pruning the outliers, we moved on to normalizing our data. We first used a Min-Max scaler, but since most of our features span a wide range (Fig. 3), we switched to the Standard scaler.
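Assuming scikit-learn (which provides both scalers we tried), the switch looks like this; the feature matrix here is hypothetical:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix with very different ranges per column.
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 5000.0]])

# StandardScaler centers each feature to zero mean and scales to unit variance,
# which is less sensitive to wide-ranging features than Min-Max scaling.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

After this transform every column has mean 0 and standard deviation 1, so no single wide-range feature dominates distance-based models such as KNN.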
Then, we checked for highly correlated features (Fig. 4), since they could hurt the models’ performance.
In our data, the steering axle width and the other axle width features were almost perfectly correlated, so we dropped the other axle width feature to avoid any unwanted bias.
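A common way to automate this pruning is to compute the absolute correlation matrix and drop one column from each pair above a threshold. The data and column names below are hypothetical stand-ins for our features, and the 0.95 threshold is an assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "other_axle_width" nearly duplicates "steering_axle_width".
rng = np.random.default_rng(0)
steering = rng.uniform(1.8, 2.6, size=100)
df = pd.DataFrame({
    "steering_axle_width": steering,
    "other_axle_width": steering + rng.normal(0, 0.01, size=100),
    "gross_weight": rng.uniform(5, 40, size=100),
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair is considered once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_reduced = df.drop(columns=to_drop)
```

Only one feature of a near-duplicate pair survives, which is exactly the manual decision we made for the axle-width columns.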
Applying Models
To test our data, we used several models. To be exact, we used:
1- Linear regression
2- KNN regressor
3- Multivariate regression
4- Ridge regression
5- Lasso regression
6- Support vector regression
7- Decision tree regressor
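A comparison like ours can be sketched as below, assuming scikit-learn. The dataset is synthetic, the hyperparameters are defaults rather than our tuned values, and multivariate regression is folded into `LinearRegression` fit on multiple features:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the project's data (the real features differ).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Linear": LinearRegression(),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "SVR": SVR(),
    "DecisionTree": DecisionTreeRegressor(random_state=42),
}

# Scale inside a pipeline so the test split never leaks into the scaler fit,
# then score every model with root mean squared error.
rmse = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    preds = pipe.predict(X_test)
    rmse[name] = mean_squared_error(y_test, preds) ** 0.5
```

Putting the scaler in the pipeline is the detail worth keeping: fitting the scaler on the full data before splitting would leak test statistics into training.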
Results
Among all the models, the KNN regressor gave the best result, with a root mean squared error (RMSE) of 34.12.
Next week, we’ll be focusing on polishing our models and our project’s final report.