A Billionaire Walks into a Bar Chart…

Michael Bagalman
Data Science Rabbit Hole
3 min read · Apr 26, 2024


image by Michael Bagalman & DALL-E

I’ve got a soft spot for outliers. Really.

Outliers keep us on our toes, adding a bit of spice to our data stew. We analyze, we strategize, and then — bam! — an outlier struts in and dropkicks our well-calculated average into the realm of the absurd. It bellows, “You can’t constrain me in a scatterplot or imprison me in a bar chart!”

But here’s the secret about these rabble-rousers: they aren’t all packing the same punch. Some outliers jab at the arithmetic mean; others toss it around like a ragdoll. Let’s spelunk briefly into the data science rabbit hole and see what’s lurking down there.

Imagine, if you will, a dive bar frequented by average Joes and Janes — you know, your run-of-the-mill millionaires with pocket change amounting to a measly million or two. Suddenly, Elon Musk saunters in. The net worth thermometer in the bar doesn’t just spike, it explodes. Why? Because our modest million average morphs into a blazing billion faster than you can mutter, “outlier.”

We all know that Musk’s Tesla-sized fortune would warp the average. It’s no secret. But here’s where the plot gets thicker than a double chocolate fudge brownie. If you’ve got a swanky lounge full of billionaires and a penniless patron sneaks in, the average doesn’t nose-dive with the same gusto. And that’s not just because the bouncer swiftly sends the pauper packing!
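Here’s a minimal sketch that puts rough numbers on the two bars. All figures are made-up assumptions for illustration. The dive bar holds 50 patrons worth $1.5 million each, and the newcomer is worth $200 billion, which is a round number and not a tracked value of Musk’s fortune. The lounge holds 50 billionaires at $1 billion each, and the walk-in is worth $0.

```python
from statistics import mean

# Hypothetical net worths in dollars; every figure here is an illustrative assumption.
dive_bar = [1_500_000] * 50        # 50 run-of-the-mill millionaires at $1.5M each
lounge = [1_000_000_000] * 50      # 50 billionaires at $1B each

def pct_change(before, after):
    """Percentage change in the mean when the crowd goes from `before` to `after`."""
    return 100 * (mean(after) - mean(before)) / mean(before)

# A $200B newcomer saunters into the dive bar.
musk_effect = pct_change(dive_bar, dive_bar + [200_000_000_000])

# A patron worth $0 sneaks into the billionaire lounge.
pauper_effect = pct_change(lounge, lounge + [0])

print(f"Billionaire in the dive bar: {musk_effect:+,.0f}%")
print(f"Pauper in the lounge:        {pauper_effect:+,.2f}%")
```

With these numbers, the dive-bar average leaps by roughly 261,000 percent, from $1.5 million to about $3.9 billion. The lounge average slips by only about 2 percent.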

Even in the cold and calculated world of descriptive statistics, the top 1% hog the spotlight. What’s the deal?

When we’re calculating the arithmetic mean, we add up all the values and divide by the total number of values. The ripple created by a small value in an ocean of large ones is like a mouse squeaking at a Metallica concert: barely a blip. But parachute a billionaire into a crowd of barely-millionaires and you’ve got a one-man band in the library. Good luck ignoring that!
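The lopsidedness falls straight out of how one newcomer updates the mean. Call the current mean m, the crowd size n, and the newcomer’s net worth x. These symbols are introduced here for illustration.

```latex
m_{\text{new}} = \frac{n\,m + x}{n + 1} = m + \frac{x - m}{n + 1}
```

Assume nobody’s net worth dips below zero, setting debt aside. Then a penniless newcomer (x = 0) can lower the mean by at most m/(n + 1), a small slice of the mean itself. Going up, there is no such ceiling: the rise is (x - m)/(n + 1), and for a billionaire among millionaires, x - m is many thousands of times larger than m.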

What’s the solution? Flee to Canada and adopt socialist statistics? You know how long they have to wait for a median up there? The waiting lists are longer than a sloth’s marathon. And let’s not even start on metric.

In the wild west of data science, understanding this quirky, lopsided impact of outliers on the average is no minor detail. We can’t afford to be blindsided by rogue billionaires (or their dataset equivalents). As thrilling as the prospect of a billionaire turbocharging our income averages might be, in the stark daylight of reality this kind of distortion can play havoc with accurate analysis and decision-making.

When wrestling with data, never forget: outliers are a motley crew. Some explode onto the scene, others tiptoe in like ninjas, but all demand our unswerving vigilance. To keep your averages from turning into rollercoaster rides, explore alternatives like the median or mode, which aren’t as ruffled by these statistical mavericks.
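Here’s a minimal sketch of that advice using Python’s built-in statistics module on a hypothetical bar crowd. Net worths are in whole millions of dollars, so that a mode exists at all. The same $200 billion walk-in is an assumed figure. The mean leaps, while the median and mode barely notice.

```python
from statistics import mean, median, mode

# Hypothetical bar crowd, net worth in millions of dollars (illustrative only).
regulars = [1, 1, 1, 2, 2, 3, 5]
with_billionaire = regulars + [200_000]   # a $200B walk-in, expressed in millions

def summarize(crowd):
    """Return (mean, median, mode) for a list of net worths."""
    return mean(crowd), median(crowd), mode(crowd)

before = summarize(regulars)
after = summarize(with_billionaire)

for label, (avg, mid, common) in [("regulars", before), ("with billionaire", after)]:
    print(f"{label:>16}: mean={avg:,.1f}  median={mid:g}  mode={common}")
```

Here the mean jumps from about 2.1 to about 25,001.9 (millions of dollars). The median stays at 2 and the mode stays at 1.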

Remember, the devil isn’t just lurking in the details; it also loves masquerading as an outlier. And sometimes, that devil is a grinning billionaire, gleefully twirling your averages like a mustache and chuckling all the way to the offshore bank.
