FACTORS AFFECTING THE PRICES OF DIAMONDS

Sbonelo Ndhlazi
Sep 25, 2022

Diamonds are minerals composed of carbon. They are the hardest naturally occurring substance known. Because of their extreme hardness, diamonds have diverse industrial applications: they are used as abrasives in drilling, cutting, grinding and polishing.
But most of us, when we think about diamonds, imagine shiny jewelry.

We all know that diamonds are expensive, but do we really know why? I took the liberty of analysing diamond prices and features to find out. I did an exploratory data analysis on a dataset sourced from Kaggle. The dataset contained nearly 54 000 round cut diamonds and their corresponding prices for the year 2022. These diamonds varied in price because of their underlying properties.

The prices of diamonds are influenced by four variables, referred to as the 4Cs:

  • Carat — weight of the diamond (1 carat is equal to 200 milligrams)
  • Cut — How well a diamond’s facets reflect light
  • Clarity — Refers to the absence of natural inclusions and blemishes
  • Color — Which actually refers to the lack of color

When analysing the relationship between carat and price, a correlation of 0.92 was found. This tells us that carat and price are strongly positively correlated: as the carat weight of a diamond increases, so does its price.
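For readers who want to see how such a coefficient is computed, here is a minimal sketch in plain Python, assuming the figure quoted is a Pearson correlation. The function name `pearson_r` and the toy values are made up for illustration and are not taken from the Kaggle dataset, so the printed number will not be 0.92.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variance numerators)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up toy values for illustration only; NOT taken from the Kaggle dataset.
toy_carat = [0.3, 0.5, 0.7, 1.0, 1.5, 2.0]
toy_price = [500, 1400, 2300, 5000, 9500, 15000]

r = pearson_r(toy_carat, toy_price)
print(round(r, 2))
```

A value close to 1 means that the two variables rise together almost along a straight line. A value close to 0 means there is no linear relationship.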

How carat and cut affect the price?

Cut is graded as follows, from best to worst:

  • Ideal
  • Premium
  • Very good
  • Good
  • Fair

The graph below shows how carat and cut affect the price of diamonds. The higher the carat and the closer the cut is to Ideal, the more expensive the diamond becomes. There are a few outliers: Fair cut diamonds with high prices because of their high carat values.

How carat and clarity affect the price?

Clarity is classified as follows (as seen under 10x magnification):

  • FL — Flawless (no inclusions and no blemishes)
  • IF — Internally Flawless (no inclusions)
  • VVS1 and VVS2 — Very, Very Slightly Included (inclusions so slight they are difficult to see)
  • VS1 and VS2 — Very Slightly Included (observed with effort)
  • SI1 and SI2 — Slightly Included (noticeable)
  • I1, I2 and I3 — Included (obvious, may affect transparency)

The graph below shows how carat and clarity affect the price of diamonds. Diamonds with high clarity have high average prices, as expected. Another noticeable pattern is that diamonds with low clarity (I1) tend to have high carat values, hence their high average prices.
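One way to check this pattern is to average both carat and price within each clarity grade. The sketch below does that in plain Python. The helper `mean_by_group`, the column names and the toy rows are assumptions for illustration; they are not the actual dataset or the code behind the graph.

```python
from collections import defaultdict

def mean_by_group(rows, group_key, value_keys):
    """Return {group: {value_key: mean}} for a list of dict rows."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        group = row[group_key]
        counts[group] += 1
        for key in value_keys:
            sums[group][key] += row[key]
    return {
        group: {key: sums[group][key] / counts[group] for key in value_keys}
        for group in counts
    }

# Made-up toy rows for illustration only; NOT taken from the Kaggle dataset.
toy_rows = [
    {"clarity": "I1", "carat": 1.5, "price": 4000},
    {"clarity": "I1", "carat": 2.0, "price": 6000},
    {"clarity": "VS1", "carat": 0.5, "price": 1500},
    {"clarity": "VS1", "carat": 0.7, "price": 2500},
]

summary = mean_by_group(toy_rows, "clarity", ["carat", "price"])
for grade, means in summary.items():
    print(grade, means)
```

Looking at the two averages side by side shows whether a grade's high average price comes with a high average carat, which is the explanation suggested above for the I1 diamonds.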

How carat and color affect the price?

GIA Color Scale:

  • DEF — Colorless (very rare);
  • GHIJ — Near colorless (tints of yellow and brown);
  • KLM — Faint;
  • NOPQR — Very light;
  • STUVWXYZ — Light

The graph below shows how carat and color affect the price of diamonds. The dataset only had colorless and near colorless diamonds. The more colorless the diamond, the higher the average price.

Using a Decision Tree Regressor

Using machine learning, the relationship between the 4Cs and the price can be examined further.

Decision Tree https://www.kaggle.com/code/sbonelondhlazi/diamonds-dataset-analysis-2-machine-learning/notebook
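As a rough illustration of what a regression tree does at each node, the sketch below searches a single feature for the threshold that minimises the summed squared error when each side is predicted by its own mean. It is a simplified, one-feature version of the idea, not the code from the notebook. The names `best_split` and `sum_sq_err` and the toy values are hypothetical.

```python
def sum_sq_err(values):
    """Summed squared error of predicting every value by the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Find the threshold on one feature that minimises the total squared
    error of the two resulting groups (a single regression-tree split)."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = sum_sq_err(left) + sum_sq_err(right)
        if sse < best_sse:
            # Candidate threshold: midpoint between neighbouring values
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse = sse
    return best_threshold, best_sse

# Made-up toy values for illustration only; NOT taken from the Kaggle dataset.
toy_carat = [0.3, 0.5, 0.99, 1.0, 1.2, 1.5]
toy_price = [400, 900, 1500, 5000, 6000, 7000]

threshold, sse = best_split(toy_carat, toy_price)
print(threshold, sse)
```

With these toy values the chosen threshold lies midway between the neighbouring carat values 0.99 and 1.0, because candidate thresholds are taken as midpoints. A full tree repeats the same search over every feature and then again inside each resulting group.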

Interpretation of the tree

The most important feature is the carat.

  • The 1st, 2nd and even 3rd splits are on carat
  • The average price of diamonds in the dataset is $3930
  • Diamonds under 0.995 carat cost $1626 on average and those above 0.995 carat cost $8161
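These three averages also tell us roughly how the diamonds are divided by the first split. If a share p of the diamonds goes to the lighter branch, the overall mean must equal p times the lighter mean plus (1 - p) times the heavier mean. The sketch below solves that for p. It assumes the three figures are means over the same sample and ignores rounding; the function name is hypothetical.

```python
def share_in_left_branch(parent_mean, left_mean, right_mean):
    """Fraction p of samples in the left branch, solved from
    parent_mean = p * left_mean + (1 - p) * right_mean."""
    if left_mean == right_mean:
        raise ValueError("branch means are equal; the share is undetermined")
    return (right_mean - parent_mean) / (right_mean - left_mean)

# Averages quoted above (in dollars), rounded as reported.
p = share_in_left_branch(parent_mean=3930, left_mean=1626, right_mean=8161)
print(f"{p:.1%} of diamonds fall under 0.995 carat")
```

Under these assumptions, a little under two thirds of the diamonds are lighter than 0.995 carat, and the remaining third pulls the overall average up.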

Clarity and cut appear on the 3rd and 4th splits, respectively.

The part of the tree showing how clarity affects the price:
  • Diamonds graded SI2 (where SI2 <= 0.5 is False) cost less on average ($4637) than diamonds not graded SI2 (where SI2 <= 0.5 is True), which cost $6669.
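A test of the form "grade <= 0.5" suggests that clarity was turned into 0/1 indicator columns (one-hot encoding) before the tree was fitted, so that 0.5 simply separates 0 from 1. That is an assumption about the preprocessing, not something quoted from the notebook. The sketch below uses the grade SI2 as the example; `one_hot` and the sample grades are hypothetical.

```python
def one_hot(values, categories):
    """Turn a list of category labels into rows of 0/1 indicator columns."""
    return [{c: int(v == c) for c in categories} for v in values]

# Clarity grades for a few hypothetical diamonds (illustrative only).
grades = ["SI2", "VS1", "SI2", "IF"]
encoded = one_hot(grades, categories=["IF", "VS1", "SI2"])

for grade, row in zip(grades, encoded):
    # The tree's test on the indicator column: 0 <= 0.5 is True, 1 <= 0.5 is False
    passes_test = row["SI2"] <= 0.5
    print(grade, row, "-> True branch" if passes_test else "-> False branch")
```

Read this way, the True branch holds every diamond that is not SI2, and the False branch holds only the SI2 diamonds.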

The same analysis can be done for the cut.

Color seems to be the least important factor in determining the price of diamonds.

Conclusion

Four factors affect the prices of diamonds: carat, clarity, cut and color. Of these, carat seems to have the greatest influence on price.

  • Flawless diamonds have high prices, as expected, and in general they are smaller (low carat value). They also tend to have Ideal to Premium cuts.
  • Premium cut diamonds have high prices and Fair cut diamonds generally have lower prices.
  • Colorless diamonds are rare, which makes them more expensive.

For exploratory analysis using R click here.

For Machine Learning using Python click here.

Sbonelo Ndhlazi

I use data to tell stories. I perform data analysis using Excel, SQL, Python and R, as well as data visualisation tools, namely Tableau and Power BI.