Predicting Price of Used Phones: Data Driven Approach

sanatan kafle
Feb 17, 2024


We live in a world driven by technology, and electronic gadgets have become part of our daily lives. It is nearly impossible to imagine a world without smartphones or tablets. New devices pop up in the market every single day and lure us into buying them, but old phones are hard to let go of. That is why we tried to understand whether selling an old phone and buying a new one is a good deal or not.

Methodology for the project
  1. Data Collection:
  • Data is the most important part of this project.
  • We scraped data from various online secondary markets; it was extremely unstructured and had a lot of missing values.
  • Most of the phones sold on the secondary market were Apple products, so the data is not as diverse as we would have liked.
  • We used Scrapy to scrape the data (a minimal spider sketch follows this list).
  • One major problem with the data is that the prices were not absolute values: the numbers reflected what sellers were asking for, not what the phones might actually have been worth. So the price is more of a market-expected price than an actual price.
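
As a rough illustration, below is a minimal Scrapy spider sketch for this kind of listing page. The site URL, CSS selectors, and field names are hypothetical; every marketplace we scraped needed its own selectors.

```
import scrapy


class PhoneListingSpider(scrapy.Spider):
    """Minimal sketch of a spider for a used-phone listing site (selectors are hypothetical)."""
    name = "phone_listings"
    start_urls = ["https://example-secondary-market.com/phones"]  # placeholder URL

    def parse(self, response):
        # Each listing card becomes one row of raw, unstructured data.
        for card in response.css("div.listing-card"):
            yield {
                "brand": card.css("span.brand::text").get(),
                "model": card.css("span.model::text").get(),
                "price": card.css("span.price::text").get(),            # e.g. "Rs. 25,000" (string)
                "front_camera": card.css("span.front-cam::text").get(),  # e.g. "5MP" (string)
            }

        # Follow pagination links, if any.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```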

2. Feature Engineering:

  • Null value elimination:

Some null values were handled by taking the average of the column and substituting it into the missing rows. This was done for Display Size in part of the data: we checked the brand and, depending on it, filled in the value.

Some null values in Earphone, Warranty, and Charger were handled by simply replacing them with 0.
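
A minimal pandas sketch of this step, assuming a DataFrame df with columns named Brand, Display Size, Earphone, Warranty, and Charger (the column names are illustrative):

```
import pandas as pd

df = pd.read_csv("phones_raw.csv")  # hypothetical file name

# Fill missing Display Size values with the average display size of the same brand.
df["Display Size"] = df.groupby("Brand")["Display Size"].transform(
    lambda s: s.fillna(s.mean())
)

# Accessories: a missing value simply means "not included", so replace with 0.
df[["Earphone", "Warranty", "Charger"]] = df[["Earphone", "Warranty", "Charger"]].fillna(0)
```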

  • Encoding:

A Label Encoder was used to convert Brand to a brand integer and Model to a model integer so that processing could be done. For example, Apple was converted to 6, Oppo to 0, and so on.
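
A sketch using scikit-learn's LabelEncoder. The exact integer assigned to each brand depends on the data, so the printed mapping is only a way to inspect it:

```
from sklearn.preprocessing import LabelEncoder

brand_encoder = LabelEncoder()
model_encoder = LabelEncoder()

# Replace the text columns with integer codes that the models can consume.
df["Brand"] = brand_encoder.fit_transform(df["Brand"])
df["Model"] = model_encoder.fit_transform(df["Model"])

# Inspect the brand-to-integer mapping produced for this particular dataset.
print(dict(zip(brand_encoder.classes_, range(len(brand_encoder.classes_)))))
```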

  • String to Integer Conversion:

Since we cannot process string data directly, we had to convert it to numbers. For example, the front camera was stored as "5MP", which had to be changed to an integer. To do this, we replaced "MP" with an empty string and cast the column to a numeric type. Similar processing was done for Battery and Back Camera. For price and other values that had to be numeric, we simply used Excel; for example, the price column was stored as text and was converted to numeric data in Excel.
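
A pandas sketch of the same cleanup, continuing with the df from the earlier sketches (the column names and the "mAh" suffix are assumptions):

```
# "5MP" -> 5, "13MP" -> 13, etc. (assumes nulls were already handled above)
df["Front Camera"] = df["Front Camera"].str.replace("MP", "", regex=False).astype(int)
df["Back Camera"] = df["Back Camera"].str.replace("MP", "", regex=False).astype(int)

# "4000mAh" -> 4000 (assuming battery values carry an "mAh" suffix).
df["Battery"] = df["Battery"].str.replace("mAh", "", regex=False).astype(int)

# Price was converted in Excel, but pandas can do the same thing:
df["Price"] = pd.to_numeric(df["Price"], errors="coerce")
```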

  • Dropping Unnecessary feature:

The diagram above is an example of a graph drawn between the various features and their importance in our model. We analyzed this graph rigorously, along with others such as a heatmap and a pair plot, to determine which features were important and which were not. Based on this analysis, we decided to drop the following features (a code sketch of this step follows the list):

  • Front camera
  • Back camera
  • Resolution
  • Display
  • OS
  • Model
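
Below is a sketch of how such a feature-importance ranking can be produced and the columns dropped, using a tree ensemble's feature_importances_; the details of our own plot may differ, and the column names are assumed.

```
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

# Assumes all remaining columns are numeric by this point.
X = df.drop(columns=["Price"])
y = df["Price"]

# Fit a quick tree ensemble just to rank the features.
rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)
plt.barh(X.columns, rf.feature_importances_)
plt.xlabel("Importance")
plt.show()

# Features that contributed little were dropped.
df = df.drop(columns=["Front Camera", "Back Camera", "Resolution", "Display", "OS", "Model"])
```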

3. Model Building:

a. Linear regression:

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. In linear regression, the goal is to find the best-fit line that can explain the relationship between the variables.

y = b + w1x1 + w2x2 + … + wnxn

y — Selling Price [Dependent Variable]

x — Various features [Independent Variable]

b, w1, w2, … wn — parameters to be estimated for the best fit.

When we implemented the Linear Regression model on our data, we found a MAPE of 23%, which was higher than we wanted.
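
A minimal scikit-learn sketch of this baseline, assuming the cleaned DataFrame from the earlier steps:

```
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

X = df.drop(columns=["Price"])
y = df["Price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr = LinearRegression().fit(X_train, y_train)
mape = mean_absolute_percentage_error(y_test, lr.predict(X_test))
print(f"MAPE: {mape:.2%}")  # roughly 23% in our experiments
```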

b. Gradient Boosting Algorithm:

Gradient boosting is an ensemble method that combines multiple weak models to create a strong model that can make accurate predictions on new data.

When we implemented the Gradient Boosting algorithm on our data, we varied the following parameters; the results are listed below, and a code sketch of the best configuration follows the list:

n_estimators = number of weak learners (trees)

learning_rate = step size used when updating the model

max_depth = maximum depth of each decision tree

subsample = fraction of the training data used by each tree

n_iter_no_change = number of iterations with no improvement before training stops

  • n_estimators=5000, learning_rate=0.01, max_depth=3, subsample=0.3, n_iter_no_change=20: MAPE 14%
  • n_estimators=1000 with the other parameters unchanged: MAPE 13%
  • n_estimators=2000, learning_rate=0.01, max_depth=5, subsample=0.4, n_iter_no_change=50: MAPE 12.24%
  • n_estimators=2000, learning_rate=0.03, max_depth=5, subsample=0.4, n_iter_no_change=20: MAPE 8.9%
  • n_estimators=1000, learning_rate=0.01, max_depth=5, subsample=0.4, n_iter_no_change=20: MAPE 9.01%
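
For reference, here is a sketch of the best-performing configuration above (MAPE of about 8.9%) using scikit-learn's GradientBoostingRegressor; the train/test split follows the earlier linear regression sketch, and the random_state is an added assumption.

```
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_percentage_error

gbr = GradientBoostingRegressor(
    n_estimators=2000,     # number of weak learners (trees)
    learning_rate=0.03,    # step size for each boosting update
    max_depth=5,           # depth of each individual tree
    subsample=0.4,         # fraction of training data used per tree
    n_iter_no_change=20,   # early stopping after 20 rounds without improvement
    random_state=42,       # assumption: not specified in the original runs
)
gbr.fit(X_train, y_train)
print(mean_absolute_percentage_error(y_test, gbr.predict(X_test)))  # about 0.089 in our runs
```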

c. Random Forest Regression:

Random forest regression is a machine learning algorithm that is used for regression problems. It is an extension of the decision tree algorithm that uses a collection of decision trees to make predictions.

When we implemented the Random Forest Regression algorithm on our data, we varied the following parameters; the results are listed below, and a code sketch of the best configuration follows the list:

n_estimators = number of decision trees in the forest

n_jobs = number of parallel jobs to run during training and prediction

max_depth = maximum depth of each decision tree

random_state = setting this to a fixed value makes the model use the same feature subsets and the same bootstrap samples every time it is run

  • n_estimators=100, max_depth=10, n_jobs=4, random_state=42: MAPE 11.06%
  • n_estimators=60, max_depth=10, n_jobs=6, random_state=1: MAPE 10.86%
  • n_estimators=100, max_depth=30, n_jobs=1, random_state=41: MAPE 10.70%
  • n_estimators=150, max_depth=30, n_jobs=1, random_state=41: MAPE 10.51%
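
Likewise, a sketch of the best random forest configuration above (MAPE of about 10.51%), reusing the same train/test split:

```
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error

rf = RandomForestRegressor(
    n_estimators=150,   # number of trees in the forest
    max_depth=30,       # maximum depth of each tree
    n_jobs=1,           # parallel jobs for fitting and prediction
    random_state=41,    # fixed seed so bootstrapping is reproducible
)
rf.fit(X_train, y_train)
print(mean_absolute_percentage_error(y_test, rf.predict(X_test)))  # about 0.105 in our runs
```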

4. Summary of the project:

  • Initially, we collected data from various websites as mentioned above.
  • The data was cleaned using Excel and the pandas module.
  • Feature engineering was done in order to:
  1. Process null values:
  • Average calculation and substitution
  • Manual input of some missing data
  • Removal of irrelevant or high-variance columns
  2. Encode brands and models
  3. Convert strings to int and float for processing
  4. Drop unneeded columns
  5. Select only the required features by looking at feature importance
  • Model Building:
  1. Implement models.
  2. Evaluate models.
  3. If satisfactory, deploy the model.
  4. Otherwise, go back to step 1.
  • The model we generated was saved as a pickle file.
  • Then we started working on the UI part.
  • The Django project and app were created.
  • Basic templates were created.
  • CSS was added to the project.
  • The backend was implemented to take in values from the user.
  • The pickle file was used to load the model (a Django view sketch follows this list).
  • The loaded model was used to predict the value.
  • The prediction was returned to the backend.
  • It was then converted to a dictionary for use in the frontend.
  • The output was displayed to the user.
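
As a rough sketch of how the saved model is served from Django, the view below loads the pickle file and returns a prediction; the view name, template, and form field names are hypothetical, and the real project linked below differs in its details.

```
import pickle

from django.shortcuts import render

# Load the trained model once, when the module is imported.
with open("model.pkl", "rb") as f:   # hypothetical path to the pickle file
    MODEL = pickle.load(f)


def valuate(request):
    """Take phone features from a POSTed form, predict the price, and show it."""
    context = {}
    if request.method == "POST":
        # Field names here are illustrative; they must match the training features.
        features = [[
            float(request.POST["brand"]),
            float(request.POST["ram"]),
            float(request.POST["storage"]),
            float(request.POST["battery"]),
        ]]
        context["predicted_price"] = round(float(MODEL.predict(features)[0]), 2)
    return render(request, "valuate.html", context)
```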

I deployed it on PythonAnywhere with a minimal frontend just for people to try out.

Website — http://skafle239.pythonanywhere.com/valuate/

Source Code — https://github.com/sanatankafle12/minor_project
