Unveiling the Green Revolution: Predicting CO2 Emission and Fuel Economy of Vehicles for a Sustainable Future

Akalazu Clinton
4 min read · Jun 7, 2023

This is a continuation from my previous post on fuel economy prediction.

Feature Selection for CO2 Tailpipe Prediction

Our aim was to use a car's properties (features) to predict its CO2 emission. We selected the numerical variables with the strongest correlation to the target, along with the categorical variables.

Features: cylinders, displ, drive, fuelType, make, model, year, trany, VClass, co2TailpipeGpm.

Data documentation can be found here.

Exploring Numerical Features

From the charts above, we can see that displ (engine displacement in liters) has a strong positive correlation with co2TailpipeGpm.

cylinders also shows a strong correlation with co2TailpipeGpm.
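As a rough sketch of how such correlations can be checked, the snippet below computes the Pearson correlation of each numerical feature with co2TailpipeGpm. The data here is synthetic and generated only for illustration; it is an assumption, not the dataset used in this post, so the exact numbers will differ from the charts.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption for illustration, not the real dataset).
rng = np.random.default_rng(0)
n = 200
displ = rng.uniform(1.0, 6.0, n)
cylinders = np.clip(np.round(displ * 1.5 + rng.normal(0, 0.5, n)), 3, 12)
co2 = 150 + 60 * displ + rng.normal(0, 20, n)

df = pd.DataFrame(
    {"displ": displ, "cylinders": cylinders, "co2TailpipeGpm": co2}
)

# Pearson correlation of every numerical feature with the target.
corr_with_target = (
    df.corr()["co2TailpipeGpm"]
    .drop("co2TailpipeGpm")
    .sort_values(ascending=False)
)
print(corr_with_target)
```

Features whose correlation is close to +1 or -1 are the natural candidates to keep as numerical predictors.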

Exploring Categorical Data

We explored the categorical variables by computing the average co2TailpipeGpm for each category.
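This kind of aggregation can be sketched with a pandas groupby. The tiny table below is made up for illustration (the makes "A", "B", and "C" are hypothetical); only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has many more rows and makes.
df = pd.DataFrame(
    {
        "make": ["A", "A", "B", "B", "C"],
        "co2TailpipeGpm": [400.0, 500.0, 300.0, 320.0, 0.0],
    }
)

# Average tailpipe CO2 per make, sorted from highest to lowest.
avg_by_make = (
    df.groupby("make")["co2TailpipeGpm"].mean().sort_values(ascending=False)
)

highest = avg_by_make.head(2)  # makes with the highest average emission
lowest = avg_by_make.tail(2)   # makes with the lowest average emission
print(highest)
print(lowest)
```

The same pattern applies to any other categorical column, such as fuelType or VClass, by changing the groupby key.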

Emission by Make

The two bar plots show the car makes with the highest and the lowest average co2Tailpipe emission.

co2Tailpipe Emission by Vehicle

The two bar plots show the vehicles with the highest and the lowest co2Tailpipe emission, respectively.

co2Tailpipe Emission by Fuel Type

co2Tailpipe Emission by FuelType

Data Modelling

We used 70% of the data for training and 30% for validation. We also stratified the split by fuel type (fuelType) so that the distribution of fuel types is preserved in both the training and validation data.
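A minimal sketch of a stratified 70/30 split with scikit-learn's train_test_split is shown below. The toy data and the random_state value are assumptions made for illustration; only the split ratio and the stratification by fuel type come from the description above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data with an imbalanced fuel type column (illustrative values only).
df = pd.DataFrame(
    {
        "displ": [1.0 + 0.1 * i for i in range(100)],
        "fuelType": ["Regular"] * 70 + ["Premium"] * 20 + ["Diesel"] * 10,
        "co2TailpipeGpm": [200.0 + 5 * i for i in range(100)],
    }
)

X = df.drop(columns="co2TailpipeGpm")
y = df["co2TailpipeGpm"]

# stratify keeps the fuel type proportions the same in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=df["fuelType"], random_state=42
)

print(X_train["fuelType"].value_counts(normalize=True))
print(X_val["fuelType"].value_counts(normalize=True))
```

Without stratification, a rare fuel type could end up almost entirely in one split, which would make the validation score less representative.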

We trained five algorithms on the data: Random Forest, K-Nearest Neighbor, Decision Tree, Artificial Neural Network, and ExtraTree Regressor.

We used the R² metric to evaluate the models and grid search with cross-validation to fine-tune the hyperparameters of each model.
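The tuning step can be sketched with scikit-learn's GridSearchCV and scoring="r2". The example assumes scikit-learn's ExtraTreesRegressor as the model; the synthetic data, the parameter grid, and the number of folds are all assumptions for illustration and are not the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 6.0, size=(150, 2))
y = 150 + 60 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 10, 150)

# A deliberately small, hypothetical grid to keep the search fast.
param_grid = {"n_estimators": [20, 50], "max_depth": [None, 5]}

search = GridSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_grid,
    scoring="r2",  # evaluate each candidate by cross-validated R²
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The same search can be repeated for each of the other algorithms by swapping the estimator and its parameter grid.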

Decision Tree

Regression plot for Decision Tree

K-Nearest Neighbor

Regression plot for KNN

Random Forest

Regression Plot for Random Forest

ExtraTree Regressor

Regression plot for ExtraTree Regressor

Artificial Neural Network

Regression plot for ANN

The ExtraTree Regressor, with an R² score of 100% on the training data and 99% on the test data, was the best-performing algorithm for our model.

Feature Importance

Feature Importance

This barplot shows the features and their importance to the model.
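For tree ensembles in scikit-learn, such importances are available through the fitted model's feature_importances_ attribute. The sketch below assumes ExtraTreesRegressor and uses synthetic data in which only displ drives the target, so the ranking it prints is illustrative and says nothing about the real model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic data: the target depends on displ only (an assumption).
rng = np.random.default_rng(0)
X = pd.DataFrame(
    {
        "displ": rng.uniform(1.0, 6.0, 200),
        "cylinders": rng.integers(3, 13, 200),
        "year": rng.integers(1984, 2024, 200),
    }
)
y = 150 + 60 * X["displ"] + rng.normal(0, 10, 200)

model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across all features.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Calling importances.plot.barh() would produce a bar plot of the same kind as the one shown above.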

Conclusion

Through this project, we (Osuagwu Oluchi and I) explored CO2 emissions and fuel economy in vehicles. Our analysis identified the vehicles with the highest fuel economy and the highest CO2 emissions, shedding light on the factors that contribute to these extremes. By understanding these factors, we can work toward developing more sustainable transportation solutions that promote fuel efficiency, reduce CO2 emissions, and pave the way for a greener future.

As we continue to explore and harness the power of machine learning, we have the potential to revolutionize the automotive industry and drive positive change in our efforts to combat climate change. By prioritizing fuel efficiency and reducing CO2 emissions, we can create a more sustainable and environmentally friendly transportation landscape for generations to come.


Akalazu Clinton

MS Informatics | Data Science and Analytics | Machine Learning