Finding Similar Older Phone Models

Pranav Vijay
INST414: Data Science Techniques
5 min read · Mar 24, 2024

Question: What phone in the market is most comparable to the mobile phone I own currently?

The stakeholder asking this question is a person who does not have enough money to buy a new phone and wants to buy an older model phone, similar to the one they own now. The newer phones will have more features, and the prices may be steeper. They want to study the phones in the market that are comparable to the phone they have now and can use the information to buy the replacement phone.

The decision the stakeholder will make after finding the answer to this question is buying a phone that is very comparable to the one they have now.

The data that can answer this question is a dataset of multiple phone models and the many features of a phone. The fields the dataset would contain are the model name, brand name, and other important features such as the screen size, length of battery life, price, memory size, and number of cameras. This data would be relevant to my question since I can use these features to calculate the similarity between phones and find the most similar phones for a specific model.

I collected a subset of this data from Kaggle, a website that hosts many datasets available for download. The dataset I downloaded is called “Mobile Phones Data” and was uploaded by Artem Pozdniakov. I wrote my analysis in a Jupyter notebook running a Python kernel. I downloaded the dataset’s CSV file from Kaggle and read it into a pandas dataframe using the read_csv() function. I imported the euclidean function from the scipy.spatial.distance module to perform the Euclidean distance calculations.
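The loading step can be sketched roughly as follows. This is a minimal sketch rather than the notebook's actual code: the inline CSV text stands in for the downloaded file, and the column names and values are made up for illustration.

```python
import io

import pandas as pd
from scipy.spatial.distance import euclidean

# Hypothetical stand-in for the downloaded CSV file. With the real file,
# the path to it would be passed to pd.read_csv() instead.
csv_text = """model_name,brand_name,best_price,screen_size,memory_size,battery_size
Phone A,BrandX,100.0,5.0,16.0,2000.0
Phone B,BrandY,300.0,6.0,64.0,4000.0
"""

# read_csv() accepts a file path or a file-like object.
phones = pd.read_csv(io.StringIO(csv_text))
print(phones.shape)

# euclidean() takes two 1-D vectors and returns the straight-line
# distance between them (a 3-4-5 triangle here).
print(euclidean([0, 0], [3, 4]))
```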

I measure similarity between phones by how close their data points are to one another, based on the best price, screen size, memory size, and battery size of each model. The similarity metric I am using is Euclidean distance, where a smaller distance means two phones are more similar. Before calculating the distances, I applied L1 normalization to the rows of my dataframe, which scales each row so that the absolute values of its features sum to 1.
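To make the normalization and distance steps concrete, here is a small sketch using NumPy and SciPy. The helper name l1_normalize_rows and the feature values are assumptions for illustration; the notebook may normalize the rows in a different way.

```python
import numpy as np
from scipy.spatial.distance import euclidean


def l1_normalize_rows(matrix):
    """Scale each row so that the absolute values in it sum to 1."""
    matrix = np.asarray(matrix, dtype=float)
    row_sums = np.abs(matrix).sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # leave all-zero rows unchanged
    return matrix / row_sums


# Made-up rows: best price, screen size, memory size, battery size.
features = np.array([
    [100.0, 5.0, 16.0, 2000.0],
    [300.0, 6.0, 64.0, 4000.0],
])

normalized = l1_normalize_rows(features)

# Distance between the two normalized phones; smaller means more similar.
distance = euclidean(normalized[0], normalized[1])
print(normalized.sum(axis=1))
print(distance)
```

Because each row is divided by its own total, the distance compares the proportions of the features within each phone rather than their raw values.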

The first phone I calculated the Euclidean distance for was the Astro A173 Black/Orange model. Here are the 10 most similar phones:

  1. 2E E280 Dual Sim Black
  2. ERGO B181 DUAL SIM BLACK
  3. Astro A144 Black/Red
  4. Astro A225 Black
  5. ERGO B241 Black
  6. Blackview BV1000 Black
  7. Viaan V241 (Black)
  8. ERGO F186 Solace DS Silver
  9. 2E E180 2019 Red (680576170057)
  10. Sigma mobile X-TREME IO93

The second phone I calculated the Euclidean distance for was the Google Pixel 2 64GB Just Black model. Here are the 10 most similar phones:

  1. Samsung Galaxy S8+ SM-G955U 64GB Black
  2. LG V50 ThinQ 5G 6/128GB Single Sim Black
  3. HUAWEI Nova 4 8/128GB Black
  4. Motorola One Zoom 4/128GB Purple
  5. Motorola One Zoom 4/128GB Grey
  6. Apple iPhone 6s Plus 64GB Space Gray (MKU62)
  7. Samsung Galaxy Note10 Lite SM-N770F Dual 6/128GB Black (SM-N770FZKD)
  8. Samsung Galaxy A71 2020 SM-A715F 8/128GB Blue
  9. OPPO Reno 4 Lite 8/128GB Magic Blue
  10. Google Pixel 128GB (Quite Black)

The third phone I calculated the Euclidean distance for was the Samsung Galaxy A30s 4/64GB Green model. Here are the 10 most similar phones:

  1. Cubot Kingkong mini 3/32GB Yellow
  2. Cubot Kingkong mini 3/32GB Red
  3. Meizu 15 Plus 6/64GB Gray
  4. Meizu 15 Plus 6/64GB Black
  5. Blackview BV9800 6/128Gb Black
  6. Honor 10 lite 3/64GB Blue
  7. Samsung Galaxy A20e SM-A202F 3/32GB Black SM-A202FZKD
  8. Xiaomi Redmi Note 9T 4/64GB Daybreak Purple
  9. Samsung Galaxy A6+ 3/32GB Lavender
  10. Motorola Moto X (2nd. Gen) (Black) 16GB

For the Astro A173 Black/Orange model, the most similar phone is the 2E E280 Dual Sim Black model. For the Google Pixel 2 64GB Just Black model, the most similar phone is the Samsung Galaxy S8+ SM-G955U 64GB Black. For the Samsung Galaxy A30s 4/64GB Green model, the most similar phone is the Cubot Kingkong mini 3/32GB Yellow model. Using Euclidean distance, I was able to see which models had the closest values for best price, screen size, memory size, and battery size. The model with the smallest distance to the chosen phone is the most similar one, which answers the question.
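The ranking step described above could look something like the sketch below. The function most_similar, the phone names, and the two-column feature values are hypothetical, and the features are assumed to be normalized already.

```python
import pandas as pd
from scipy.spatial.distance import euclidean


def most_similar(features, names, target_name, top_n=10):
    """Return the target row followed by the top_n closest rows."""
    # Feature vector of the phone we are comparing against.
    target_vector = features.loc[names == target_name].iloc[0]
    # Distance from every phone to the target (the target itself gets 0).
    distances = features.apply(
        lambda row: euclidean(row, target_vector), axis=1
    )
    result = pd.DataFrame({"model_name": names, "distance": distances})
    # top_n + 1 rows, because the first row is the target phone itself.
    return result.sort_values("distance", kind="stable").head(top_n + 1)


# Made-up, already-normalized features for four hypothetical phones.
names = pd.Series(["Phone A", "Phone B", "Phone C", "Phone D"])
features = pd.DataFrame({
    "feature_1": [0.10, 0.12, 0.50, 0.30],
    "feature_2": [0.90, 0.88, 0.50, 0.70],
})

top = most_similar(features, names, "Phone A", top_n=2)
print(top)
```

Sorting in ascending order puts the smallest distances first, so the row right after the target is the most similar phone.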

Here is a table of the top 10 phones similar to the Astro A173 Black/Orange model sorted by Euclidean distance (11 rows are displayed since the first row is the Astro A173 Black/Orange model):

Here is another table of the top 10 phones similar to the Google Pixel 2 64GB Just Black model sorted by Euclidean distance (like before, 11 rows are displayed since the first row is the Google Pixel 2 64GB Just Black model):

Here is one more table of the top 10 phones similar to the Samsung Galaxy A30s 4/64GB Green model sorted by Euclidean distance (similar to before, 11 rows are displayed since the first row is the Samsung Galaxy A30s 4/64GB Green model):

I cleaned my data by first checking for missing values. A review of the dataset showed that some rows were missing data, so I used the dropna() function to drop the rows that contain null values. I also checked the dataset for duplicates and found some. To remove them, I used the drop_duplicates() function with the subset parameter set to the “model_name” column, which drops duplicate rows that share the same model name. I also removed the columns that weren’t needed in my analysis: “os”, “release_date”, “sellers_amount”, “popularity”, “lowest_price”, and “highest_price”, along with an unnamed column holding the row ID numbers. Finally, I created a second dataframe as a copy of the one I used to read the CSV file and removed the “model_name” and “brand_name” columns from it, since these columns aren’t needed when calculating the Euclidean distance.
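As a rough illustration of these cleaning steps, the sketch below applies them to a tiny made-up dataframe. The values are invented, and the names of the four feature columns and of the unnamed ID column ("Unnamed: 0", the name pandas gives a header-less first column) are assumptions about the CSV.

```python
import numpy as np
import pandas as pd

# Made-up rows: one duplicate model name and one missing price.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "brand_name": ["BrandX", "BrandX", "BrandY", "BrandZ"],
    "model_name": ["Phone A", "Phone A", "Phone B", "Phone C"],
    "os": ["Android"] * 4,
    "popularity": [1, 2, 3, 4],
    "best_price": [100.0, 100.0, np.nan, 250.0],
    "lowest_price": [90.0, 90.0, 280.0, 240.0],
    "highest_price": [110.0, 110.0, 320.0, 260.0],
    "sellers_amount": [5, 5, 8, 3],
    "screen_size": [5.0, 5.0, 6.0, 6.2],
    "memory_size": [16.0, 16.0, 64.0, 32.0],
    "battery_size": [2000.0, 2000.0, 4000.0, 3500.0],
    "release_date": ["1-2019", "1-2019", "5-2020", "3-2020"],
})

# Drop rows with any null value, then rows repeating a model name.
cleaned = raw.dropna().drop_duplicates(subset="model_name")

# Remove the columns that are not used in the analysis.
unused = ["Unnamed: 0", "os", "release_date", "sellers_amount",
          "popularity", "lowest_price", "highest_price"]
phones = cleaned.drop(columns=unused)

# Separate numeric-only frame for the distance calculations.
features = phones.drop(columns=["model_name", "brand_name"])
print(features)
```

Keeping the names in one dataframe and the numeric features in another makes it possible to compute distances on numbers only and still look up the model name for each row by its index.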

One limitation of my analysis is that this dataset only contains phone models released until 2020. The stakeholder could be looking for a phone released in 2021 or 2022. In my subset, there are many features missing, like the number of cameras and battery life, which may be important to people looking for a new phone to buy. This analysis may be biased since I am only focusing on the best price, screen size, memory size, and battery size of phones when comparing phones. People may have other priorities when looking for similar phones, like the operating system used and the brand name.

Here is a link to my GitHub repository, which contains the Jupyter notebook I used to calculate the Euclidean distances and create the 3 tables of the top 10 similar phones. The repository also contains the CSV file of the original dataset that I analyzed.

Link: https://github.com/pvijay2024/module3
