Machine Learning Unveils the Factors Driving Headphone Prices

SHAP values for brand, colour, and more

Dmytro Iakubovskyi
Published in Data And Beyond
3 min read · Jun 5, 2023

Photo by Jair Medina Nossa on Unsplash

In this article, I use a publicly available dataset containing the specifications of 300+ headphones from selected brands, taken from the Flipkart.com website. The dataset is available on Kaggle, and full details of the analysis can be found in this public Kaggle notebook.

Step 1 — data preprocessing

Here, data preprocessing consists of the following steps:

  • dropping duplicate records;
  • selecting the labels (headphone prices in INR) and log10-transforming them (x -> np.log10(x), so that 1 INR maps to 0.0, 10 INR to 1.0, 100 INR to 2.0, etc.);
  • encoding the categorical variables (brand, colour, form factor, and connectivity type), grouping rare values so that each column has no more than 50 distinct categories and each category has at least 10 records;
  • finally, dropping unused columns.
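
The steps above can be sketched in pandas roughly as follows. This is a minimal illustration on toy data, not the code from the Kaggle notebook: the column names (brand, colour, price, url), the helper names group_rare_categories and preprocess, and the "Other" label are all assumptions made for the example. The toy call uses min_count=2 so that the effect is visible on six rows; the thresholds described above correspond to the defaults (50 categories, 10 records).

```python
import numpy as np
import pandas as pd

# Toy data; the column names and values are hypothetical, not the real dataset.
raw = pd.DataFrame({
    "brand": ["boAt", "boAt", "boAt", "JBL", "JBL", "Sony"],
    "colour": ["Black", "Black", "Blue", "Black", "Red", "Black"],
    "price": [1000, 1000, 10, 100, 2500, 16300],
    "url": ["a", "a", "b", "c", "d", "e"],
})


def group_rare_categories(column, max_categories=50, min_count=10, other="Other"):
    """Keep at most `max_categories` values with >= `min_count` records; relabel the rest."""
    counts = column.value_counts()  # sorted from most to least frequent
    keep = counts[counts >= min_count].index[:max_categories]
    return column.where(column.isin(keep), other)


def preprocess(df, categorical, label="price", unused=(), **rare_kwargs):
    # 1. drop duplicate records
    df = df.drop_duplicates().reset_index(drop=True)
    # 2. log10-transform the label (price in INR)
    df = df.assign(log10_price=np.log10(df[label]))
    # 3. group rare categories in each categorical column
    for name in categorical:
        df[name] = group_rare_categories(df[name], **rare_kwargs)
    # 4. drop the raw label and any unused columns
    return df.drop(columns=[label, *unused])


clean = preprocess(raw, categorical=["brand", "colour"], unused=["url"], min_count=2)
print(clean)
```

On the toy frame, the duplicated first record is removed, the single Sony record is relabelled as "Other", and the prices 1000, 10, and 100 INR become 3.0, 1.0, and 2.0.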

The result is a cleaned dataset of 318 headphone types with prices between 255 INR (about 3.1 USD) and 16,300 INR (about 200 USD).
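
To get a feel for what this price range looks like after the log10 transform, here is a small worked example using only the standard library. The helper names to_log10 and from_log10 are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def to_log10(price_inr):
    """Forward transform applied to the label."""
    return math.log10(price_inr)


def from_log10(value):
    """Inverse transform: recover a price in INR from a log10 value."""
    return 10 ** value


low, high = to_log10(255), to_log10(16_300)
print(round(low, 2), round(high, 2))  # 2.41 4.21
print(round(from_log10(low)))         # 255
```

So the label spans roughly 2.41 to 4.21 in log10 space, which is a little under two orders of magnitude in price.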

Step 2 — setting up a machine learning model to predict the log-transformed headphone price

Dmytro Iakubovskyi
Top writer in AI, Movies | Senior data scientist | Editor in Data And Beyond | https://www.linkedin.com/in/dima806/