Irreverent Demystifiers

The Curse of Dimensionality… minus the curse of jargon

In a nutshell, it’s all about loneliness

Cassie Kozyrkov
Towards Data Science
7 min read · Sep 11, 2020

The curse of dimensionality! What on earth is that? Besides being a prime example of shock-and-awe names in machine learning jargon (which often sound far fancier than they are), it’s a reference to the effect that adding more features has on your dataset. In a nutshell, the curse of dimensionality is all about loneliness.

Before I explain myself, let’s get some basic jargon out of the way. What’s a feature? It’s the machine learning word for what other disciplines might call a predictor / (independent) variable / attribute / signal: information about each datapoint, in other words. Here’s a jargon intro if none of those words felt familiar.

If you prefer to listen to the explanation before looking at pretty pictures, here’s the 2-minute video.

When a machine learning algorithm is sensitive to the curse of dimensionality, it means the algorithm works best when your datapoints are surrounded in space by their friends. The fewer friends they have around them in space, the worse things get. Let’s take a look.

Data social distancing is easy: just add a dimension. But for some algorithms, you may find that this is a curse…

One dimension

Imagine you’re sitting in a large classroom, surrounded by your buddies.

You’re a datapoint, naturally. Let’s put you in one dimension by making the room dark and shining a bright light from the back of the room at you. Your shadow is projected onto a line on the front wall. On that line, it’s not lonely at all. You and your crew are sardines in a can, all lumped together. It’s cozy in one dimension! Perhaps a little too cozy.

Two dimensions

To give you room to breathe, let’s add a dimension. We’re in 2D and the plane is the floor of the room. In this space, you and your friends are more spread out. Personal space is a thing again.

Note: If you prefer to follow along in an imaginary spreadsheet, think of adding/removing a dimension as inserting/deleting a column of numbers.

Three dimensions

Let’s add a third dimension by randomly sending each of you to one of the floors of the 5-floor building you were in.

All of a sudden, you’re not so densely surrounded by friends anymore. It’s lonely around you. If you enjoyed the feeling of having a student in nearly every seat, chances are you’re now mournfully staring at quite a few empty chairs. You’re beginning to get misty-eyed, but at least one of your buddies is probably still near you…

Four dimensions

Not for long! Let’s add another dimension. Time.

The students are spread among 60-minute sections of this class (on various floors) at various times — let’s limit ourselves to 9 sessions because lecturers need sleep and, um, lives. So, if you were lucky enough to still have a companion for emotional support before, I’m fairly confident you’re socially distanced now. If you can’t be effective when you’re lonely, boom! We have our problem. The curse of dimensionality has struck!

MOAR dimensions

As we add dimensions, you get lonely very, very quickly. If we want to make sure that every student is just as surrounded by friends as they were in 2D, we’re going to need more students. Lots of them.

The most important idea here is that we have to recruit more friends exponentially, not linearly, to keep your blues at bay.

If we add two dimensions, we can’t simply compensate with two more students… or even two more classrooms’ worth of students. If we started with 50 students in the room originally and we added 5 floors and 9 classes, we need 5 × 9 = 45 times as many students to keep one another as much company as the original 50 did. So we need 45 × 50 = 2,250 students to avoid loneliness. That’s a whole lot more than one extra student per dimension! Data requirements go up quickly.

When you add dimensions, minimum data requirements can grow rapidly.

We need to recruit many, many more students (datapoints) every time we go up a dimension. If data are expensive for you, this curse is really no joke!
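If you’d rather see that arithmetic spelled out than squint at it in prose, here’s a tiny back-of-the-envelope sketch in Python. The numbers are just the classroom example from above, nothing official:

# Back-of-the-envelope version of the classroom example above.
students_2d = 50             # friends keeping you company in the 2D room
cells_per_new_dim = [5, 9]   # 3rd dimension: 5 floors; 4th dimension: 9 class sessions

needed = students_2d
for cells in cells_per_new_dim:
    needed *= cells          # every new dimension multiplies the requirement

print(needed)                # 50 * 5 * 9 = 2250 students to stay as cozy as in 2D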

Dimensional divas

Not all machine learning algorithms get so emotional when confronted with a bit of me-time. Methods like k-NN are complete divas, of course. That’s hardly a surprise for a method whose name is short for k-Nearest Neighbors: it works by computing things from each datapoint’s nearest neighbors, so it’s rather important that the datapoints actually have neighbors nearby.
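If you’d like to watch the loneliness set in numerically, here’s a quick numpy sketch (purely illustrative: uniform random points in a unit hypercube, with made-up sizes). Hold the number of datapoints fixed and the average distance to each point’s nearest neighbor keeps creeping upward as you add dimensions, which is exactly the kind of thing that makes a nearest-neighbors method sulk:

# Illustrative only: fixed number of random points, increasing dimensions.
# The mean nearest-neighbor distance grows as the datapoints get lonelier.
import numpy as np

rng = np.random.default_rng(0)
n_points = 300

for d in (1, 2, 3, 4, 10, 50):
    X = rng.random((n_points, d))                  # n_points datapoints in d dimensions
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                # a point is not its own neighbor
    print(d, round(dists.min(axis=1).mean(), 3))   # mean distance to the nearest neighbor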

Other methods are a lot more robust when it comes to dimensions. If you’ve taken a class on linear regression, for example, you’ll know that once you have a respectable number of datapoints, gaining or dropping a dimension isn’t going to make anything implode catastrophically. There’s still a price — it’s just more affordable.*

*Which doesn’t mean it is resilient to all abuse! If you’ve never known the chaos that including a single outlier or adding one near-duplicate feature can unleash on the least squares approach (the Napoleon of crime, Multicollinearity, strikes again!), then consider yourself warned. No method is perfect for every situation. And, yes, that includes neural networks.

What should you do about it?

What are you going to do about the curse of dimensionality in practice? If you’re a machine learning researcher, you’d better know if your algorithm has this problem… but I’m sure you already do. You’re probably not reading this article, so we’ll just talk about you behind your back, shall we? But yeah, you might like to think about whether it’s possible to design the algorithm you’re inventing to be less sensitive to dimension. Many of your customers like their matrices on the full-figured side**, especially if things are getting textual.

**Conventionally, we arrange data in a matrix so that the rows are examples and the columns are features. In that case, a tall and skinny matrix has lots of examples spread over few dimensions.

If you’re an applied data science enthusiast, you’ll do what you always do — get a benchmark of the algorithm’s performance using just one or a few promising features before attempting to throw the kitchen sink at it. (I’ll explain why you need that habit in another post; if you want a clue in the meantime, look up the term overfitting.)
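Here’s roughly what that habit looks like in code, as a minimal scikit-learn sketch on made-up toy data where only a handful of columns carry any signal (every number and column choice here is illustrative, not a recipe):

# Toy benchmark: score a few promising features first, then the kitchen sink.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Fake data: 60 columns, only the first 3 informative (shuffle=False keeps them first).
X, y = make_classification(n_samples=200, n_features=60, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

few  = cross_val_score(KNeighborsClassifier(), X[:, :3], y, cv=5).mean()
sink = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()
print(f"3 features: {few:.3f}   all 60 features: {sink:.3f}")
# If the kitchen sink scores worse, that's the curse (and its cousin, overfitting) talking.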

Some methods only work well on tall, skinny datasets, so you might need to put your dataset on a diet if you’re feeling cursed.

If your method works decently on a limited number of features and then blows a raspberry at you when you increase the dimensions, that’s your cue to either stick to a few features you handpick (or even stepwise-select if you’re getting crafty) or first make a few superfeatures out of your original kitchen sink by running some cute feature engineering techniques (you could try anything from old school things like principal component analysis (PCA) — still relevant today, eigenvectors never go out of fashion — to more modern things like autoencoders and other neural network funtimes). You don’t really need to know the term curse of dimensionality to get your work done because your process — start small and build up the complexity — should take care of it for you, but if it was bothering you… now you can shrug off the worry.
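And if you go the superfeature route, the whole pipeline can stay pleasantly short. A hedged sketch, with the same kind of made-up data as before and PCA standing in for whichever feature engineering trick you fancy:

# Squeeze a kitchen-sink feature matrix into a few PCA superfeatures,
# then hand those to a neighborly method. Toy data, illustrative numbers only.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=100, n_informative=5,
                           random_state=0)

model = make_pipeline(PCA(n_components=5), KNeighborsClassifier())
print(cross_val_score(model, X, y, cv=5).mean())   # k-NN on 5 superfeatures instead of 100 columns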

To summarize: As you add more and more features (columns), you need an exponentially growing number of examples (rows) to overcome how spread out your datapoints are in space. Some methods only work well on tall, skinny datasets, so you might need to put your dataset on a diet if you’re feeling cursed.

P.S. In case you interpreted “close in space” as having to do with scale, let me put that right. This isn’t about the effect of measuring in miles versus centimetres, so we won’t try to blame an expanding universe for our woes — and you can’t dodge the curse by simple multiplication. Instead, maybe this picture will help you intuit it in 3D; it’s less a matter of how big this spherical cow, er, I mean, meow-emitter is… and more a matter of how many packing peanuts it is covered in.

Thanks for reading! Liked the author?

If you’re keen to read more of my writing, most of the links in this article take you to my other musings. Can’t choose? Try this one:

Connect with Cassie Kozyrkov

Let’s be friends! You can find me on Twitter, YouTube, Substack, and LinkedIn. Interested in having me speak at your event? Use this form to get in touch.

How about an AI course?

If you had fun here and you’re looking for an applied AI course designed to be fun for beginners and experts alike, here’s one I made for your amusement:

Enjoy the entire course playlist here: bit.ly/machinefriend
