Unveiling the Green Revolution: Predicting CO2 Emission and Fuel Economy of Vehicles for a Sustainable Future

Akalazu Clinton
Jun 5, 2023

In our ever-evolving world, the need for sustainable transportation has become increasingly crucial. As concerns about climate change and environmental impact grow, it is essential to understand and identify vehicles that not only offer better fuel economy but also emit lower levels of CO2.

In this post, Osuagwu Oluchi and I delve into a machine learning project that aims to predict the CO2 emissions and fuel economy of vehicles. By combining data analysis with machine learning techniques, we set out to uncover insights that can guide us toward a more sustainable future.

Through data analysis, we aim to answer three key questions. Which vehicles have the highest fuel efficiency, letting drivers go the extra mile while reducing their carbon footprint? Which vehicles emit the least CO2, contributing to cleaner air and a healthier environment? Which features determine high CO2 tailpipe emissions and fuel economy in vehicles?

By examining the intricacies of CO2 emissions and fuel economy, we strive to empower individuals and communities to make informed choices and contribute to a greener and more sustainable planet.

Data source: https://www.fueleconomy.gov/feg/ws/index.shtml#fuelType1

Code: https://github.com/Kalazclint/Co2-Emission-pROJECT

Data Cleaning and Processing

The data consists of two tables, vehicle.csv and emission.csv, which we joined by concatenation. Columns with more than 20% missing values were dropped.
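A minimal sketch of these two cleaning steps follows. The post does not say along which axis the tables were concatenated. This sketch assumes row i of each table describes the same vehicle, so a column-wise pd.concat is meaningful. If the tables are instead keyed by an id column, a pd.merge on that key would be needed. The helper names combine_tables and drop_sparse_columns are hypothetical.

```python
import pandas as pd


def combine_tables(vehicles: pd.DataFrame, emissions: pd.DataFrame) -> pd.DataFrame:
    """Concatenate the two tables side by side (axis=1).

    Assumption (hypothetical): row i of `vehicles` and row i of `emissions`
    describe the same vehicle. For keyed tables, use pd.merge instead.
    """
    return pd.concat(
        [vehicles.reset_index(drop=True), emissions.reset_index(drop=True)], axis=1
    )


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.20) -> pd.DataFrame:
    """Drop every column whose share of missing values is above `max_missing`."""
    missing_share = df.isna().mean()  # fraction of NaN values in each column
    return df.loc[:, missing_share <= max_missing]
```

With the files from the post, this would read `drop_sparse_columns(combine_tables(pd.read_csv("vehicle.csv"), pd.read_csv("emission.csv")))`. The same threshold also removes a column such as fuelType2, which is more than 93% empty.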

Feature Selection

This project predicts both fuel economy and CO2 emissions. According to the dataset documentation (https://www.fueleconomy.gov/feg/ws/index.shtml#fuelType1), each vehicle can have two fuel types, fuelType1 and fuelType2. We dropped fuelType2 because over 93% of its values are missing, and we worked with the features related to fuel type 1. As an input variable we used fuelType, which combines fuel types 1 and 2. There are also two candidate CO2 targets, co2 and co2TailpipeGpm. The documentation describes them the same way, but their values differ, so we examined each one individually and kept only one.

A look at the two CO2 columns (co2 and co2TailpipeGpm)

The co2 column contains values of -1.

A vehicle's CO2 emissions measure the amount of carbon dioxide it releases into the atmosphere by burning fossil fuel, so technically they cannot be negative.

However, a vehicle could have “negative emissions” in the sense that it uses carbon capture and storage (CCS) technology to lower the quantity of CO2 in the atmosphere.

However, no vehicle currently on the market uses CCS to achieve negative emissions, so the -1 entries cannot be real measurements.

Therefore, we dropped the co2 column and used co2TailpipeGpm as the CO2 emission target.
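The check and the choice of target can be sketched as follows. The sketch counts negative entries in each CO2 column, then drops co2 and keeps co2TailpipeGpm as the target. The column names come from the dataset. The helper names count_negative and use_tailpipe_target are hypothetical, and the sketch is not the project's own code.

```python
import pandas as pd


def count_negative(df: pd.DataFrame, columns: list) -> dict:
    """Number of rows below zero in each column (real CO2 emissions cannot be negative)."""
    return {col: int((df[col] < 0).sum()) for col in columns}


def use_tailpipe_target(df: pd.DataFrame):
    """Drop `co2` and return (features, target) with co2TailpipeGpm as the target."""
    features = df.drop(columns=["co2", "co2TailpipeGpm"])
    target = df["co2TailpipeGpm"]
    return features, target
```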

For Fuel Economy

The feScore feature is the target variable. It is a fuel economy score assigned to each vehicle, ranging from 1 to 10: a vehicle rated 10 is the most economical and a vehicle rated 1 the least. Since the goal was to predict fuel economy mainly from vehicle properties, the selected features are mostly vehicle properties.

Features for Fuel Economy prediction

Dealing with Outliers

Boxplots of the numerical columns
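The post does not state how outliers were treated after plotting the boxplots. One common rule that matches what a boxplot draws is the 1.5 × IQR rule. It flags points beyond the whiskers, below Q1 − 1.5·IQR or above Q3 + 1.5·IQR, which is also matplotlib's default whisker length. The sketch below is an assumed illustration of that rule, not the authors' method. The helper names iqr_bounds and flag_outliers are hypothetical.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple:
    """Lower and upper whisker limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(df: pd.DataFrame, columns: list, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where any listed column lies outside its whiskers."""
    mask = pd.Series(False, index=df.index)
    for col in columns:
        low, high = iqr_bounds(df[col], k)
        mask |= (df[col] < low) | (df[col] > high)
    return mask
```

Rows could then be removed with `df[~flag_outliers(df, cols)]`, or capped at the bounds instead, depending on whether extreme vehicles are genuine or data errors.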

Exploring Numerical Variables

From here we can observe:

  1. Vehicles with fewer engine cylinders have higher feScores (better fuel economy), while vehicles with more cylinders have lower scores. In other words, the more cylinders a car has, the lower its fuel economy tends to be.
  2. Smaller engine displacement (displ) goes with higher fuel economy (feScore), and vice versa.
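These two observations can also be checked numerically, without a plot. The sketch below averages feScore per cylinder count and computes a Spearman rank correlation between each feature and the target. The Spearman value is taken as the Pearson correlation of average ranks, and a negative value means the score falls as the feature rises. The column names follow the dataset. The helper names mean_score_by and spearman_with_target are hypothetical.

```python
import pandas as pd


def mean_score_by(df: pd.DataFrame, column: str, target: str = "feScore") -> pd.Series:
    """Average target for each distinct value of `column`, ordered by that value."""
    return df.groupby(column)[target].mean().sort_index()


def spearman_with_target(df: pd.DataFrame, columns: list, target: str = "feScore") -> pd.Series:
    """Spearman rank correlation of each column with the target.

    Computed as the Pearson correlation of average ranks, so ties are handled.
    """
    ranks = df[columns + [target]].rank()
    return ranks.corr()[target].drop(target)
```

A rank correlation suits this case because cylinders and feScore are both ordinal-like integers, and the relationship only needs to be monotonic, not linear.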

Let’s make further plots to verify our findings:

feScore vs Cylinders

The boxplot confirms that fuel economy decreases as the number of cylinders increases.

feScore vs Displ

This plot likewise shows that engine displacement varies inversely with fuel economy.

From the plots, small station wagons, compact cars, and midsize cars have the highest average fuel economy, while vans, special purpose vehicles, and SUVs have the lowest.

Among fuel types, CNG vehicles have the highest average fuel economy, while Midgrade vehicles have the lowest.
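These rankings come down to a group-by mean over a categorical column, such as the vehicle class (VClass) or fuelType. The sketch below returns the best and worst groups. It adds a hypothetical min_count filter so that a class represented by only a handful of vehicles does not top the list by chance. The helper name best_and_worst and the min_count parameter are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd


def best_and_worst(df: pd.DataFrame, group_col: str, target: str = "feScore",
                   n: int = 3, min_count: int = 1):
    """Return the n groups with the highest and the n with the lowest mean target.

    Groups with fewer than `min_count` rows are ignored (hypothetical safeguard).
    """
    stats = df.groupby(group_col)[target].agg(["mean", "count"])
    means = stats.loc[stats["count"] >= min_count, "mean"].sort_values(ascending=False)
    return means.head(n), means.tail(n)
```

Called as `best_and_worst(df, "VClass")` or `best_and_worst(df, "fuelType")`, it would produce the kind of comparison summarized above.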

Fuel Economy and Vehicle Make

The bar plot shows the average fuel economy of each car make. Smart has the highest fuel economy, while Audi has the lowest.

Fuel Economy by Vehicle

The bar plot above shows the makes and models with an average fuel economy score of 10, the highest possible score.

The bar plot above shows the vehicles with an average fuel economy score of 1, the lowest possible score.
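Selecting the vehicles behind these two plots is another group-by. The sketch averages feScore per (make, model) pair and keeps the pairs whose average equals a given score exactly. The make and model column names follow the dataset. The helper name models_with_mean_score is hypothetical.

```python
import pandas as pd


def models_with_mean_score(df: pd.DataFrame, score: float, target: str = "feScore") -> list:
    """(make, model) pairs whose average target equals `score` exactly."""
    means = df.groupby(["make", "model"])[target].mean()
    return means[means == score].index.tolist()
```

Because feScore is an integer from 1 to 10, a mean of exactly 10 (or 1) means every listed variant of that model received that score.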

Model Development

Categorical variables were encoded with LabelEncoder, and numerical variables were scaled with StandardScaler.
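The sketch below shows both preprocessing steps. Label encoding is applied to the whole frame, so codes stay consistent across splits. The scaler is fitted on training rows only, so test statistics do not leak into training. This ordering is an assumption rather than something the post states. The helper names label_encode and scale_numeric are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler


def label_encode(df: pd.DataFrame, categorical: list):
    """Replace each categorical column with integer codes; keep the fitted encoders."""
    df = df.copy()
    encoders = {}
    for col in categorical:
        enc = LabelEncoder()
        df[col] = enc.fit_transform(df[col].astype(str))  # classes are sorted alphabetically
        encoders[col] = enc
    return df, encoders


def scale_numeric(train: pd.DataFrame, test: pd.DataFrame, numerical: list):
    """Standardize numerical columns using mean and std from the training rows only."""
    scaler = StandardScaler().fit(train[numerical])
    train, test = train.copy(), test.copy()
    train_scaled = scaler.transform(train[numerical])
    test_scaled = scaler.transform(test[numerical])
    for i, col in enumerate(numerical):
        train[col] = train_scaled[:, i]
        test[col] = test_scaled[:, i]
    return train, test, scaler
```

scikit-learn documents LabelEncoder for target labels. For input features, OrdinalEncoder does the same job on several columns at once. For tree-based models, either gives integer codes that the trees can split on.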

After setting aside a test set, we split the remaining data into training and validation sets, stratifying by fuelType so that every split keeps the same mix of fuel types and no fuel type is under-represented during evaluation.
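A stratified three-way split can be sketched with two calls to scikit-learn's train_test_split, each passing stratify. The 20% test and 20% validation fractions and the random seed are hypothetical choices. The post does not give the proportions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def stratified_three_way_split(df: pd.DataFrame, strat_col: str = "fuelType",
                               test_size: float = 0.2, val_size: float = 0.2,
                               seed: int = 42):
    """Split into train/validation/test, preserving the mix of `strat_col` in each part.

    test_size, val_size and seed are illustrative defaults, not the project's values.
    """
    train_full, test = train_test_split(
        df, test_size=test_size, stratify=df[strat_col], random_state=seed
    )
    # Second split carves the validation set out of the remaining training data.
    train, val = train_test_split(
        train_full, test_size=val_size, stratify=train_full[strat_col], random_state=seed
    )
    return train, val, test
```

Stratification needs at least two rows per class. A fuel type that appears only once makes train_test_split raise a ValueError, so very rare fuel types may need to be merged or dropped first.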

Four machine learning algorithms were used to model our data (Decision Tree, Random Forest, K-Nearest Neighbors, and Extra Trees regressors), with the R² score as our evaluation metric. We tuned the hyperparameters using grid-search cross-validation. Below are snippets of our code.

Decision Tree

Random Forest

K-Nearest Neighbor

ExtraTreesRegressor
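The original snippets for the four regressors named above were images. As a stand-in, here is a sketch of the same workflow: GridSearchCV with scoring="r2" for each model, then R² on the validation set via each estimator's score method. The parameter grids are hypothetical and much smaller than a real search would use. The helper names tune_all and r2_on_holdout are illustrative.

```python
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Hypothetical grids: the post does not list the values that were searched.
CANDIDATES = {
    "Decision Tree": (DecisionTreeRegressor(random_state=0), {"max_depth": [None, 5, 10]}),
    "Random Forest": (RandomForestRegressor(random_state=0),
                      {"n_estimators": [50, 100], "max_depth": [None, 10]}),
    "K-Nearest Neighbor": (KNeighborsRegressor(), {"n_neighbors": [3, 5, 7]}),
    "ExtraTreesRegressor": (ExtraTreesRegressor(random_state=0), {"n_estimators": [50, 100]}),
}


def tune_all(X_train, y_train, cv: int = 5) -> dict:
    """Grid-search each candidate with R² as the cross-validation score."""
    results = {}
    for name, (estimator, grid) in CANDIDATES.items():
        search = GridSearchCV(estimator, grid, scoring="r2", cv=cv)
        search.fit(X_train, y_train)
        results[name] = (search.best_params_, search.best_score_, search.best_estimator_)
    return results


def r2_on_holdout(results: dict, X_val, y_val) -> dict:
    """A regressor's .score() returns R² on the given data."""
    return {name: est.score(X_val, y_val) for name, (_, _, est) in results.items()}
```

K-Nearest Neighbors is the only one of the four models that depends on feature scale, which is one reason to standardize the numerical columns before this step.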

Feature Importance

The bar plot above shows the most important predictors of fuel economy. The top predictors are annual fuel cost, engine displacement, number of cylinders, and EPA vehicle size class. Our exploratory analysis explains how these features relate to fuel economy.
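A chart like this is typically built from the impurity-based feature_importances_ attribute of a fitted tree ensemble. The sketch below turns that attribute into a ranked pandas Series ready for plotting. The post does not say which fitted model the chart came from, and the helper name ranked_importances is hypothetical.

```python
import pandas as pd


def ranked_importances(model, feature_names) -> pd.Series:
    """Impurity-based importances of a fitted tree model, highest first (they sum to 1)."""
    return pd.Series(model.feature_importances_, index=list(feature_names)).sort_values(
        ascending=False
    )
```

Impurity-based importances can favor features with many distinct values. sklearn.inspection.permutation_importance, computed on the validation set, is a useful cross-check.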

Continue with our insights from modeling CO2 emissions here.
