Car Price Prediction Using Machine Learning

Nivitus
Aug 11, 2020

Hello everyone, my name is Nivitus. Welcome to the Car Price Prediction tutorial, another machine learning blog on Medium. I hope you all like it. I don't want to waste your time, so let's get ready to continue our journey.

Today we are going to work on a dataset that contains information about car names, year, selling price, present (actual) price and other aspects such as fuel type.

When we work on this sort of data, we need to see which columns are important for us and which are not. Our main aim today is to build a model that gives a good prediction of a car's selling price based on the other variables. We are going to use Linear Regression and some other ML algorithms on this dataset and see whether they give us good results.

Table of Contents:

· Overview

· Motivation

· Understand the Problem Statement

· About the Dataset

· About the Algorithms Used

· Data Collection

· Data Preprocessing

· Exploratory Data Analysis(EDA)

· Feature Engineering

· Data Cleaning

· Feature Observation

· Feature Selection

· Model Building

· Model Performance

· Prediction and Final Score

· Project Deployment

· Output

Overview

In this blog we are going to implement a model for predicting car prices using regression techniques, based on some of the features in the dataset. We will cover everything else in the upcoming parts.

Motivation

The motivation behind this blog is that I have always loved learning about car models and their types. One day I saw an article on Google about the top 10 most powerful and expensive cars in the world. That gave me an idea: why shouldn't I do a project that predicts car prices based on features like fuel type and transmission type? That's why I decided to write this blog.

Understand the Problem Statement

Don't be confused by this project's problem statement; it's actually very simple. We are going to predict the selling price of a car from its present price and other features, such as the age of the car. We'll look at all of these in the upcoming sections.

About the Dataset

I got this dataset from Kaggle. Below I describe the features it contains. The goal of this project is to build a regression model that can accurately estimate the selling price of a car given its features.

Data Overview

1. Car_Name: Name of the car

2. Year: Year in which the car was bought

3. Selling_Price: Price at which the car was sold

4. Present_Price: Current price of the car

5. Kms_Driven: Number of kilometres the car has been driven

6. Fuel_Type: Type of fuel the car uses

7. Seller_Type: Type of seller

8. Transmission: Type of transmission in the car

9. Owner: Number of previous owners of the car

About the Algorithms Used

The main aim of this project is to predict car prices from the features using regression techniques and algorithms.

1) Random Forest Regressor

Machine Learning Packages Used in this Project

Data Collection

I got the dataset from Kaggle. It consists of several features such as the car name, selling price, present price, fuel type and so on. Let's see how to read the dataset into a Jupyter Notebook. You can download the dataset from Kaggle in CSV format.

Code for reading the data from the CSV file into a Jupyter Notebook:

# Import libraries

import numpy as np

import pandas as pd

# Import the dataset

df = pd.read_csv("train.csv")

df.head()

Dataset

Data Preprocessing

In this car price dataset we don't need to clean the data, because it is already clean when downloaded from Kaggle. For your satisfaction, I will show the number of null (missing) values in the dataset. We also need to understand the shape of the dataset.

# Shape of the Dataset

print("Shape of the Dataset", df.shape)

Shape of the Dataset (301, 9)

# Checking the Null or Missing Values

df.isnull().sum()

There are no null values.

Exploratory Data Analysis

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.

Data Description

Note: Only the numerical features appear here; the remaining features are categorical. We'll handle them in an upcoming section.

Data Information
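The "Data Description" and "Data Information" outputs above are presumably produced with `df.describe()` and `df.info()`. The sketch below makes the note about numerical versus categorical features concrete. It assumes the column names from the Data Overview list. `split_feature_types` is a hypothetical helper, not part of the original notebook, and the toy rows are made-up values for illustration only.

```python
import pandas as pd


def split_feature_types(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (numerical_columns, categorical_columns) for a DataFrame."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    # Everything that is not numeric (e.g. text columns) is treated as categorical here
    categorical = [col for col in df.columns if col not in numerical]
    return numerical, categorical


# Made-up rows with the same columns as the car dataset, for illustration only
toy = pd.DataFrame({
    "Car_Name": ["car_a", "car_b"],
    "Year": [2014, 2017],
    "Selling_Price": [3.0, 6.5],
    "Present_Price": [5.0, 8.0],
    "Kms_Driven": [27000, 15000],
    "Fuel_Type": ["Petrol", "Diesel"],
    "Seller_Type": ["Dealer", "Individual"],
    "Transmission": ["Manual", "Automatic"],
    "Owner": [0, 1],
})

numerical, categorical = split_feature_types(toy)
print(numerical)       # the columns that df.describe() summarises by default
print(categorical)
print(toy.describe())  # count, mean, std, min, quartiles, max per numeric column
toy.info()             # dtypes and non-null counts for every column
```

On the real data you would call `split_feature_types(df)` right after loading the CSV.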

Feature Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

Note: Here we are going to extract a new feature from the existing features to help predict the output; this is where your domain knowledge comes in. We will also handle the categorical features in the car price dataset.
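The code that builds `final_df` from `df` is not shown in the text of this post. Here is a minimal sketch, assuming three things. First, `Car_Name` is dropped. Second, `Year` is replaced by the age of the car, using 2020 as the current year, as in the post's example. Third, `Selling_Price` stays as the first column, which the later `y = final_df.iloc[:,0]` relies on. `Car_Age` and `build_final_df` are hypothetical names chosen for this sketch.

```python
import pandas as pd

# Assumption: "current year" is 2020, matching the example 2020 - 2014 = 6
CURRENT_YEAR = 2020


def build_final_df(df: pd.DataFrame, current_year: int = CURRENT_YEAR) -> pd.DataFrame:
    """Drop Car_Name and replace Year with the age of the car (hypothetical "Car_Age")."""
    final_df = df.drop(columns=["Car_Name"])
    final_df["Car_Age"] = current_year - final_df["Year"]
    # Year is redundant once Car_Age exists; Selling_Price remains the first column
    return final_df.drop(columns=["Year"])


# Made-up rows for illustration only; on the real data: final_df = build_final_df(df)
toy = pd.DataFrame({
    "Car_Name": ["car_a", "car_b"],
    "Year": [2014, 2017],
    "Selling_Price": [3.0, 6.5],
    "Present_Price": [5.0, 8.0],
    "Kms_Driven": [27000, 15000],
    "Fuel_Type": ["Petrol", "Diesel"],
    "Seller_Type": ["Dealer", "Individual"],
    "Transmission": ["Manual", "Automatic"],
    "Owner": [0, 1],
})
final_df = build_final_df(toy)
print(final_df.columns.tolist())
print(final_df["Car_Age"].tolist())
```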

Data Cleaning

# Handling the categorical values (one-hot encoding)

final_df = pd.get_dummies(final_df,drop_first=True)
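To see what `drop_first=True` does, here is a small sketch on made-up categorical values (illustration only). Each category becomes its own 0/1 column, and the first category of each column, in sorted order, is dropped. A row of all zeros therefore means that dropped category. This avoids a redundant column that could always be reconstructed from the others.

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

# Categories are sorted, so "CNG" and "Automatic" are the dropped baselines;
# astype(int) just shows 0/1 instead of False/True
encoded = pd.get_dummies(toy, drop_first=True).astype(int)
print(encoded.columns.tolist())
print(encoded)
```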

Note: After cleaning, our final dataset looks like this:

final_df.head()

After the Feature Extraction

Note: One of the features is the age of the car, which is derived as follows:

Age of Car = Current Year - Bought Year

For example,

2020 - 2014 = 6

Correlation of Each Feature in the Dataset
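Besides the heatmaps, a ranked list of each feature's correlation with the target can be easier to read. Here is a minimal sketch, assuming the target column is `Selling_Price`. `rank_correlations` is a hypothetical helper, and the toy numbers are made up purely to show the ranking.

```python
import pandas as pd


def rank_correlations(df: pd.DataFrame, target: str = "Selling_Price") -> pd.Series:
    """Pearson correlation of every other numeric column with `target`, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Order by absolute value so strong negative correlations rank as high as positive ones
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Made-up numbers for illustration only; on the real data: rank_correlations(final_df)
toy = pd.DataFrame({
    "Selling_Price": [1.0, 2.0, 3.0, 4.0],
    "Feature_A": [2.0, 4.0, 6.0, 8.0],   # rises with the price
    "Feature_B": [4.0, 3.0, 2.0, 1.0],   # falls as the price rises
    "Feature_C": [1.0, 3.0, 2.0, 4.0],   # only partly related
})
ranked = rank_correlations(toy)
print(ranked)
```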

Feature Observation

# Plotting the heatmap of correlation between features

import matplotlib.pyplot as plt

import seaborn as sns

plt.figure(figsize=(10,10))

sns.heatmap(final_df.corr(), cbar=False, square=True, fmt='.2%', annot=True, cmap='Greens')

Heatmap for Cars

sns.pairplot(final_df)

Pairplot (with this many features, the individual panels are hard to read)

# Get the correlations of each numerical feature in the original dataset
# (numeric_only=True skips text columns such as Car_Name, which recent pandas versions refuse to correlate)

corrmat = df.corr(numeric_only=True)

top_corr_features = corrmat.index

plt.figure(figsize=(13,13))

# Plot the heatmap

g = sns.heatmap(df[top_corr_features].corr(), annot=True, cmap="RdYlGn")

Another heatmap, this time only for the selected (numerical) features

sns.set_style('whitegrid')

sns.countplot(x='Fuel_Type', data=df)

Bar Chart for Fuel Types of the Car

sns.set_style('whitegrid')

sns.countplot(x='Seller_Type', data=df)

Dealer or Individual

sns.set_style('whitegrid')

sns.countplot(x='Transmission', data=df)

Transmission Type of the Car

# sns.distplot is deprecated in recent seaborn versions; histplot draws the same histogram

sns.histplot(df['Kms_Driven'].dropna(), kde=False, color='darkred', bins=40)

Range of Kilometres Driven

sns.histplot(df['Present_Price'].dropna(), kde=False, color='darkblue', bins=40)

Distribution of the Present Price

Feature Selection

Feature Selection is the process where you automatically or manually select those features which contribute most to your prediction variable or output in which you are interested in. Having irrelevant features in your data can decrease the accuracy of the models and make your model learn based on irrelevant features.

# Let's find out which features are most important for this dataset

X = final_df.iloc[:,1:]

y = final_df.iloc[:,0]

from sklearn.ensemble import ExtraTreesRegressor

model = ExtraTreesRegressor()

model.fit(X,y)

print(model.feature_importances_)

[0.38268003 0.04197867 0.00119488 0.07688097 0.22028522 0.01110189 0.12807213 0.13780622]

# Important features for Car Price Prediction Dataset

feat_importances = pd.Series(model.feature_importances_, index=X.columns)

feat_importances.nlargest(5).plot(kind='barh')

plt.show()

Most Important Features for Building the Model

Model Fitting

Random Forest Regressor

Note: Here we used GridSearchCV to tune the hyperparameters for better predictions.

It will take some time to run.
Prediction Points
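The fitting code itself is not reproduced in the text of this post, although later steps use `y_test` and `predictions`. Here is a minimal sketch of a Random Forest tuned with GridSearchCV. It assumes an 80/20 train/test split and mean-squared-error scoring. The parameter grid is a hypothetical, deliberately small choice because the post does not list the values it searched. Synthetic data from `make_regression` stands in for `X` and `y` so the block runs on its own.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split


def fit_random_forest(X, y, random_state=0):
    """Split the data, tune a RandomForestRegressor with GridSearchCV and
    return the fitted search, the test targets and the test predictions."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    # Hypothetical grid; the post does not say which values were searched
    param_grid = {
        "n_estimators": [50, 100],
        "max_depth": [None, 10],
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=random_state),
        param_grid,
        scoring="neg_mean_squared_error",
        cv=5,
    )
    search.fit(X_train, y_train)  # 4 candidates x 5 folds = 20 fits, then a refit of the best
    predictions = search.predict(X_test)  # uses the best estimator refit on all training data
    return search, y_test, predictions


# Stand-in data with the same shape as X (301 rows, 8 features); in the post,
# X and y come from final_df as in the Feature Selection step
X_demo, y_demo = make_regression(n_samples=301, n_features=8, noise=10.0, random_state=0)
search, y_test, predictions = fit_random_forest(X_demo, y_demo)
print(search.best_params_)
```

With the real data you would call `fit_random_forest(X, y)`. A larger grid, or `RandomizedSearchCV` over wider ranges, trades longer run time for a more thorough search.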

Model Performance

sns.histplot(y_test - predictions, kde=True)

Distribution of the residuals (actual minus predicted selling price)

plt.scatter(y_test,predictions)

Predicted Points

Prediction and Final Score

Final Prediction and Output
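The final scores themselves are not reproduced in the text. A common way to summarise a regression model is with MAE, MSE, RMSE and R². Here is a minimal sketch using scikit-learn's metrics. `regression_scores` is a hypothetical helper, and the three-value example is made up only to show the call.

```python
import numpy as np
from sklearn import metrics


def regression_scores(y_true, y_pred):
    """Common regression metrics for comparing predicted with actual prices."""
    mse = metrics.mean_squared_error(y_true, y_pred)
    return {
        "MAE": metrics.mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # in the same units as the selling price
        "R2": metrics.r2_score(y_true, y_pred),
    }


# Made-up example; in the post this would be regression_scores(y_test, predictions)
scores = regression_scores([3.0, 5.0, 7.5], [2.5, 5.0, 8.5])
print(scores)
```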

Project Deployment

I have already deployed this project on Heroku, a cloud platform. You can check out my project demo here. If you don't know about the Heroku platform, just click here.
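A deployed app receives one car at a time, for example from a web form, and must turn it into the same columns the model was trained on. Here is a minimal pandas sketch of that encoding step. It assumes the training columns listed in `FEATURE_COLUMNS`, where `Car_Age` is a hypothetical name for the age-of-car feature described above, and uses 2020 as the current year. In practice you would pass the training `X.columns` instead. `build_feature_row` is a hypothetical helper, not the author's deployment code.

```python
import pandas as pd

# Assumed training column order (pass X.columns from training in practice)
FEATURE_COLUMNS = [
    "Present_Price", "Kms_Driven", "Owner", "Car_Age",
    "Fuel_Type_Diesel", "Fuel_Type_Petrol",
    "Seller_Type_Individual", "Transmission_Manual",
]


def build_feature_row(raw: dict, feature_columns=FEATURE_COLUMNS,
                      current_year: int = 2020) -> pd.DataFrame:
    """Turn one car's raw inputs into a single-row frame matching the training columns."""
    row = pd.DataFrame([raw])
    row["Car_Age"] = current_year - row["Year"]
    row = row.drop(columns=["Year"])
    # No drop_first here: a single row has only one category per column, so
    # drop_first would remove it. reindex keeps exactly the training columns,
    # fills dummies absent from this row with 0 and discards baseline
    # categories (e.g. Fuel_Type_CNG) that the model never saw as columns.
    encoded = pd.get_dummies(row)
    return encoded.reindex(columns=feature_columns, fill_value=0).astype(float)


# Illustrative inputs; a fitted model would then call model.predict(features)
features = build_feature_row({
    "Year": 2014, "Present_Price": 5.59, "Kms_Driven": 27000, "Owner": 0,
    "Fuel_Type": "Petrol", "Seller_Type": "Dealer", "Transmission": "Manual",
})
print(features)
```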

Output & Conclusion

Thank you so much for reading my blog !!!

From the exploratory data analysis, we could generate insights from the data about how each feature relates to the target. The evaluation of the three models also shows that the Random Forest Regressor performed well.

I hope you all liked this blog. If you want to discuss anything in it, just contact me.

Name: Nivitus

Mobile Number: 9994268967

Email: nivitusfdo007@gmail.com

You can also reach me on:

LinkedIn
Github
