Ozzie Gooen
Mar 28, 2016 · 4 min read

TLDR: If you are estimating only positive numbers, you probably want to use lognormal distributions.

Take 3 seconds to guess the number of cars on the road in the United States. Then take 5 more to guess a range you are 90% confident it is within.

Here’s the real question: what distribution does your range represent?

If this seems abstract, imagine that you have a few buckets to choose from. Zero to 100 Million, 101 Million to 200 Million, 201 Million to 300 Million, etc. For each you have to guess the chance that the true number of cars will fall into that bucket.

Now imagine this with many more, much narrower buckets. The shape of that fine-grained breakdown represents your full confidence distribution.
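To make the bucket picture concrete, here is a minimal Python sketch that turns a cumulative distribution function (CDF) into per-bucket probabilities using the standard library's `statistics.NormalDist`. The belief is a made-up example: a lognormal with a median of 150 Million and a log-space spread of 0.6. It is not an estimate of the real answer. `bucket_probabilities` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def bucket_probabilities(cdf, edges):
    """Probability mass in each bucket [edges[i], edges[i+1])."""
    return [cdf(hi) - cdf(lo) for lo, hi in zip(edges, edges[1:])]

# Hypothetical belief (assumption): ln(cars in millions) ~ Normal(ln 150, 0.6).
belief = NormalDist(mu=math.log(150), sigma=0.6)

def cdf(x):
    # Lognormal CDF: P(X <= x) = Phi((ln x - mu) / sigma), and 0 for x <= 0.
    return belief.cdf(math.log(x)) if x > 0 else 0.0

edges = list(range(0, 1001, 100))  # buckets 0-100, 100-200, ..., 900-1000 Million
probs = bucket_probabilities(cdf, edges)
for lo, p in zip(edges, probs):
    print(f"{lo:>4}-{lo + 100:<4} Million: {p:6.1%}")
```

Shrinking the bucket width toward zero turns this list of probabilities into the smooth density curve that the rest of the post discusses.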

A Uniform Distribution

Your confidence distribution is probably not uniform. Say your answer is 50 Million to 400 Million. A uniform distribution would imply that you think there is a 0% chance the answer is under 50 Million and a 0% chance it is over 400 Million. Strictly speaking, because your range is a 90% interval, the uniform distribution would extend slightly past both ends. It would still give a 0% chance to anything beyond that.

A uniform distribution would also imply that you consider the number equally likely to be anything inside this range. Under a uniform distribution, it is just as likely to be between 200 and 300 Million (the middle) as between 50 and 150 Million (the bottom). This probably isn’t right. If you had to bet that the number was in some bucket within your range, would you really have no preference?
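Here is a small sketch of what "uniform" commits you to, under the assumption that 50 to 400 Million is the central 90% of the distribution. The helper names `uniform_from_90ci` and `uniform_prob` are hypothetical. Equal-width buckets get equal probability, and everything past the padded ends gets exactly zero.

```python
def uniform_from_90ci(low, high):
    """Bounds (a, b) of a uniform distribution whose central 90% is exactly [low, high]."""
    width = (high - low) / 0.9   # [low, high] covers 90% of the full width
    pad = 0.05 * width           # the remaining 10% is split evenly: 5% beyond each end
    return low - pad, high + pad

def uniform_prob(a, b, lo, hi):
    """P(lo <= X < hi) for X ~ Uniform(a, b)."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)

a, b = uniform_from_90ci(50, 400)  # millions of cars
print(f"support: {a:.1f} to {b:.1f} Million")
for lo, hi in [(50, 150), (200, 300), (450, float("inf"))]:
    print(f"P({lo} to {hi}) = {uniform_prob(a, b, lo, hi):.3f}")
```

The bottom bucket and the middle bucket come out identical, and the chance of more than 450 Million cars is flatly zero.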

A Normal Distribution

The next option is the Normal Distribution. This is more reasonable. There’s a cluster in the middle of your range, and the extremes become less and less likely. There’s a small chance it could be over 500 Million, and a tiny chance it could be over 600 Million.

However, it also gives a real chance, about 1.7% for this range, that the number is less than 0.

A full histogram of a normal distribution with a 90% confidence interval of 50 to 400

In the image above, you can see that this normal distribution implies a real chance that there are fewer than 0, or even fewer than -100 Million, cars in the United States. Unless you have a surprisingly convincing definition of a negative car, this is impossible.
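Here is a minimal sketch of fitting a normal distribution to a 90% interval with Python's `statistics.NormalDist` (Python 3.8+). The mean is set to the midpoint. The standard deviation is the half-width divided by the 95th-percentile z-score (about 1.645). `normal_from_90ci` is a hypothetical helper name.

```python
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645: a central 90% interval spans +/- 1.645 sd

def normal_from_90ci(low, high):
    """Normal distribution whose 5th and 95th percentiles are low and high."""
    mean = (low + high) / 2
    sd = (high - low) / (2 * Z90)
    return NormalDist(mean, sd)

cars = normal_from_90ci(50, 400)  # millions of cars
print(f"mean = {cars.mean:.1f}, sd = {cars.stdev:.1f}")
print(f"P(< 0)    = {cars.cdf(0):.2%}")
print(f"P(< -100) = {cars.cdf(-100):.2%}")
print(f"P(> 500)  = {1 - cars.cdf(500):.2%}")
print(f"P(> 600)  = {1 - cars.cdf(600):.3%}")
```

The probability of a negative car count comes out larger than the probability of more than 500 Million cars. This is the symmetry problem in numbers.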

One way to fix this is to simply remove all the negative values, producing a truncated normal distribution. For a problem like this, that is a somewhat crude hack.
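One way to see why truncation is crude: dropping the negative mass and rescaling what is left quietly changes the interval you stated. The sketch below assumes the normal fitted to 50 to 400 Million (mean at the midpoint, standard deviation equal to half-width / 1.645). It uses a hypothetical helper, `truncated_cdf`, built on the standard library's `NormalDist`.

```python
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)
# Normal fitted to the 50-400 Million range: mean = midpoint, sd = half-width / 1.645.
base = NormalDist((50 + 400) / 2, (400 - 50) / (2 * Z90))

def truncated_cdf(dist, x, lower=0.0):
    """CDF of `dist` conditioned on X > lower: drop the mass below `lower`, rescale the rest."""
    if x <= lower:
        return 0.0
    kept = 1 - dist.cdf(lower)
    return (dist.cdf(x) - dist.cdf(lower)) / kept

print(f"mass removed below 0: {base.cdf(0):.2%}")
print(f"P(X < 50)  after truncation: {truncated_cdf(base, 50):.2%}")  # no longer 5%
print(f"P(X > 400) after truncation: {1 - truncated_cdf(base, 400):.2%}")
```

The lower 5% tail shrinks, while the right tail keeps the normal's thin shape. The hack removes the impossible values without adding the long positive tail.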

What this really reflects is a more fundamental problem: normal distributions are symmetric. The tails on both sides of the median have the same shape. That is strange here, because extreme values are plausible only on the positive side. Surely there’s some chance of there being 800 Million cars. There’s a long possible tail of values above the median, but a much shorter one below it.

A Lognormal Distribution

The last option here is the lognormal distribution. It is similar to the normal distribution, but it only takes positive values and has a much longer positive tail.

With a lognormal distribution, a 90% confidence interval from 50 to 400 Million would have a median of about 140 Million, but there would be a 1% chance it could be over 600 Million. Plus, there’s a very small but real chance it could be over 800 Million. Maybe Ford made a secret fleet of tiny vehicles for the military that are being used underground.
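These lognormal figures can be checked by fitting in log space. The assumption is that ln(X) is normal, with its 5th and 95th percentiles at ln(50) and ln(400). This makes the median the geometric mean of the bounds, sqrt(50 × 400) ≈ 141 Million. This is a sketch using the standard library, and `lognormal_from_90ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)

def lognormal_from_90ci(low, high):
    """Distribution of ln(X) such that X has central 90% interval [low, high]."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return NormalDist(mu, sigma)

log_cars = lognormal_from_90ci(50, 400)  # ln(millions of cars)

def p_over(x):
    # P(X > x) = P(ln X > ln x) for a lognormal X.
    return 1 - log_cars.cdf(math.log(x))

median = math.exp(log_cars.mean)  # equals sqrt(50 * 400)
print(f"median   = {median:.1f} Million")
print(f"P(> 600) = {p_over(600):.2%}")
print(f"P(> 800) = {p_over(800):.2%}")
```

Negative values never appear, because `math.exp` of any real number is positive.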

Doesn’t this feel right? There’s a long tail to the right, and a short tail to the left. This distribution may not exactly match your intuition, but it’s probably a lot better than the normal or uniform ones.
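To see the long right tail and short left tail directly, this sketch prints quantiles of the same assumed lognormal. It is fitted so that 50 and 400 Million are the 5th and 95th percentiles. Each quantile is reported together with its distance from the median. `lognormal_quantile` is a hypothetical helper.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)
mu = (math.log(50) + math.log(400)) / 2
sigma = (math.log(400) - math.log(50)) / (2 * Z90)
log_cars = NormalDist(mu, sigma)  # distribution of ln(millions of cars)

def lognormal_quantile(p):
    # Quantiles map through exp: the p-th quantile of X is exp(p-th quantile of ln X).
    return math.exp(log_cars.inv_cdf(p))

median = lognormal_quantile(0.5)
for p in (0.01, 0.05, 0.5, 0.95, 0.99):
    q = lognormal_quantile(p)
    print(f"{p:>4.0%} quantile: {q:7.1f} Million ({q - median:+7.1f} from median)")
```

The 95th percentile sits much farther above the median than the 5th percentile sits below it. Even the 1st percentile stays above zero.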

How This Works in Guesstimate

We support uniform, normal, and lognormal distributions in Guesstimate, but try to help you decide. Guesstimate attempts to infer the correct distribution type for your range, so you often don’t have to think about it. It often selects lognormal.

One specific exception is when the range includes negative numbers or zero. In these cases lognormal distributions are impossible, and normal distributions make more sense.
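The exception described above can be written as a simple rule of thumb. This is a hypothetical sketch of that one constraint, not Guesstimate's actual inference logic. The post does not say how Guesstimate decides in every case, including when it picks uniform.

```python
def pick_distribution(low: float, high: float) -> str:
    """Hypothetical rule of thumb for a (low, high) range; NOT Guesstimate's real implementation."""
    if low > high:
        raise ValueError("low must not exceed high")
    if low <= 0:
        # ln(x) is undefined for x <= 0, so a lognormal cannot cover this range.
        return "normal"
    return "lognormal"

print(pick_distribution(50, 400))  # strictly positive range
print(pick_distribution(-5, 5))    # range includes negative numbers
```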

The Answer

So what was the right answer to the number of cars in the US? It’s around 250 Million. Not exactly the median of the range shown above, but not too far out either.

The Guesstimate Blog

A spreadsheet for things that aren’t certain

Thanks to Matthew McDermott

