Curse Of Dimensionality: The curse that all ML Engineers need to deal with

Anuj Shrivastav · Published in Analytics Vidhya · 3 min read · Feb 28, 2020

The Curse of Dimensionality refers to the problems we run into when working with high-dimensional data.

Let’s suppose we are solving a classification problem over a dataset with 3 binary features f1, f2 and f3. The maximum number of distinct points we can get is 2³ = 8.

What if we have 10 binary features f1, f2, …, f10?

The maximum number of distinct points we can get is 2¹⁰ = 1024.

Can you guess the problem?

As the number of features increases, the number of data points required for good classification also increases, and it grows exponentially.
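To make that growth concrete, here is a tiny Python sketch (not from the article, just an illustration) that counts how many distinct points d binary features can produce, which is also the scale of data you would need to cover the space at a fixed density:

```python
# Number of distinct points d binary features can represent.
# Covering the feature space at even a fixed density requires
# a number of training points that grows at the same exponential rate.
for d in [3, 10, 20, 30]:
    print(f"{d} binary features -> {2 ** d:,} possible distinct points")
```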

Now imagine what happens when we have real-valued features.

[Figure: 2-D data with a sparsely populated, highlighted region. Source: my PC]

In the above figure, due to lack of points in the highlighted region, we can’t say with certainty what class a new point will belong to if it falls in that region.
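A rough way to see this sparsity for yourself is the following NumPy sketch (the 1,000 points and 10 bins per axis are arbitrary choices for illustration): it checks what fraction of equally sized grid cells in the unit hypercube actually contain at least one point as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, bins_per_axis = 1_000, 10  # arbitrary choices for illustration

for d in [1, 2, 3, 5]:
    X = rng.random((n_points, d))                    # points in the unit hypercube
    cells = np.floor(X * bins_per_axis).astype(int)  # grid cell index along each axis
    occupied = len({tuple(c) for c in cells})        # distinct non-empty cells
    total = bins_per_axis ** d
    print(f"d={d}: {occupied}/{total} cells occupied ({occupied / total:.4%})")
```

With the same 1,000 points, almost every cell is occupied in 1-D, while in 5-D the vast majority of cells are empty, which is exactly the kind of empty region shown in the figure.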

There is also a phenomenon known as the Hughes phenomenon, which states that “given a fixed number of data points, the performance of a regressor or classifier first increases and then decreases as the number of dimensions of the data increases.”

[Figure: Hughes phenomenon, classifier performance vs. number of dimensions. Source: Google Images]
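You can get a rough feel for this curve with a hedged simulation of my own (not the plot above): synthetic data from scikit-learn’s make_classification, a k-NN classifier, a fixed budget of 200 samples, and at most 10 informative dimensions so that the extra dimensions are pure noise. On typical runs the accuracy tends to rise while the added dimensions carry signal and fall once they are mostly noise; the exact numbers depend on the random seed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

n_samples = 200  # fixed number of data points, as in the Hughes setup

for d in [2, 5, 10, 20, 50, 100, 200]:
    X, y = make_classification(
        n_samples=n_samples,
        n_features=d,
        n_informative=min(d, 10),  # signal saturates; extra dimensions are noise
        n_redundant=0,
        random_state=0,
    )
    acc = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()
    print(f"d={d:4d}  mean CV accuracy = {acc:.3f}")
```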

The second problem the Curse of Dimensionality raises concerns distance measures, especially Euclidean distance.

Let’s first define some terms

dist_min(xi) = Euclidean distance from xi to its closest point xj

dist_max(xi) = Euclidean distance from xi to its farthest point xk

Suppose we take n random points and a reference point Q.

For a small number of dimensions d, dist_max(Q) and dist_min(Q) differ noticeably, i.e. the ratio (dist_max(Q) − dist_min(Q)) / dist_min(Q) is well above 0.

But as the number of dimensions increases:

lim (d → ∞) (dist_max(Q) − dist_min(Q)) / dist_min(Q) → 0

[Source: Wikipedia]

What does it say?

This means that when we have a high number of dimensions, dist_max(d) ≈ dist_min(d), i.e. all points end up at roughly the same distance from each other.

In High Dimensions, Euclidean Distance loses its significance!!!

And this is why models like K-Nearest Neighbours, which rely on distances, fail in high dimensions.
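You can check this concentration of distances empirically with a few lines of NumPy (a minimal sketch with 1,000 uniformly random points and a random reference point Q; the exact ratios will vary from run to run):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000  # number of random points

for d in [2, 10, 100, 1_000, 10_000]:
    X = rng.random((n, d))   # n random points in the unit hypercube
    Q = rng.random(d)        # reference point Q
    dist = np.linalg.norm(X - Q, axis=1)
    ratio = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:6d}  (dist_max - dist_min) / dist_min = {ratio:.3f}")
```

As d grows, the ratio shrinks towards 0, which is exactly the limit stated above.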

Another problem associated with the Curse of Dimensionality is overfitting.

Well, I won’t go into much detail about this, as it can be understood just by looking at the following image.

[Figure: underfitting vs. a good fit vs. overfitting in polynomial regression. Source: Andrew Ng’s Machine Learning course on Coursera]

In the rightmost plot, there are more features (x, x², x³ and x⁴), which has led to overfitting of the data.
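A quick way to reproduce that effect (a hedged sketch of my own using NumPy’s polyfit on a tiny synthetic dataset, not the plot above) is to compare a degree-1 fit with a degree-4 fit, i.e. features x vs. x, x², x³, x⁴, on a handful of noisy points:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 5)                     # only 5 training points
y = 2 * x + rng.normal(scale=0.2, size=5)    # noisy, roughly linear target

for degree in [1, 4]:                        # x   vs   x, x², x³, x⁴
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {train_mse:.5f}")
```

The degree-4 polynomial drives the training error to nearly zero by threading through every point, which is exactly the overfitting shown in the rightmost plot; its error on new points would be far worse.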

Now one obvious question might be :

We would get a better class-separating hyperplane if we projected our data points into a higher dimension, so why should we care about the Curse of Dimensionality? And doesn’t that contradict the Hughes phenomenon?

Well, the simplest answer would be that if the newly created features (the extra dimensions) are uncorrelated with the rest of the features and add some real value to our task, there’s no harm in increasing the number of features.
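As a rough sanity check before keeping a new feature, you can look at how correlated it is with the ones you already have. The snippet below is a hypothetical helper of my own (max_abs_correlation, with X as the existing feature matrix and new_feature as the candidate column), not something from the article:

```python
import numpy as np

def max_abs_correlation(X, new_feature):
    """Largest absolute Pearson correlation between the candidate
    feature and any existing column of X."""
    corrs = [abs(np.corrcoef(X[:, j], new_feature)[0, 1])
             for j in range(X.shape[1])]
    return max(corrs)

# Example: a feature that is just a rescaled copy of column 0 adds no new information.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
print(max_abs_correlation(X, 3 * X[:, 0] + 1))   # ~1.0 -> redundant
print(max_abs_correlation(X, rng.random(100)))   # ~0.0 -> not redundant
```

Note that low correlation alone doesn’t make a feature worth keeping; it only rules out redundancy. The feature still has to carry real signal for the task.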

That’s all for now ….
