What is the Curse of Dimensionality?
The curse of dimensionality refers to various problems that arise when working with high-dimensional data. In this article we will discuss these problems and how they affect machine learning algorithms, and then suggest a few ways to deal with them.
Living in a High-Dimensional Space
We are so used to live in three dimensions, that our intuition fails when we try to imagine things in high-dimensional spaces. Many things behave very differently in high-dimensional spaces.
For example, if we pick a random point in a unit square (a square whose side length is 1), its probability of being located less than 0.01 from the border of the square is less than 2%. This is because the area of the unit square is 1, whereas the area of the inner square with side length 0.99 is 0.99 × 0.99 = 0.9801. Therefore, the probability that the point falls inside the area between the inner square and the unit square is 1–0.9801 = 0.0199.
On the other hand, if we pick a random point in a unit hypercube with 1,000 dimensions, its probability of being located less than 0.01 from the border of the hypercube is greater than 99.99%!
The reason is that the volume of the unit hypercube is 1¹⁰⁰⁰ = 1, while the volume of the inner hypercube with side length of 0.99 is 0.99¹⁰⁰⁰ = 0.00004317. Therefore, the probability that the point falls inside the volume between the inner hypercube and the unit hypercube is 1–0.00004317 = 0.99995683.