The Linda Problem and what you can do about it
Dipankar Dutta

Great article and discussion of the problem! One thing I wondered about when I first read Kahneman’s analysis, though (and that has continued to nag me while reading subsequent analyses), is the tendency to ignore the role that the grammatical structure of the problem plays in priming a respondent’s choice.

When we present all of the information about Linda and ask the question “Is she more likely to be a bank teller or a bank teller and active in the feminist movement?”, the way those options are presented implies that the first category excludes the possibility of her being active in the feminist movement.

For example, suppose you were to ask someone:

Which ice cream is your sprinkle-loving friend more likely to choose:
1. Chocolate ice cream
2. Chocolate ice cream with sprinkles

Even though the first option doesn’t explicitly exclude the possibility of there being sprinkles on the chocolate ice cream, the second option creates a mental category for sprinkles in the respondent’s mind, and the respondent then compares the two options on that basis.

The biggest problem with this question is not the way our brains work but the way our language works.

Any pairing of things in which the second option is simply a sub-category of the first (without explicitly stating it as such) will always prime respondents to think that the probabilities being compared are joint probabilities of two conditions, rather than a marginal probability and a joint probability.

Some examples without all the backstory of the Linda problem:

  • A glass or a glass filled with water
  • A coat or a coat that is buttoned
  • A trip to Spain or a trip to Spain with free breakfast

It’s virtually impossible for someone to read that list and not have the “homunculus in their head” read the following:

  • A(n empty) glass or a glass filled with water
  • A(n unbuttoned) coat or a coat that is buttoned
  • A trip to Spain (without free breakfast) or a trip to Spain with free breakfast

We read both conditions into each option not because of some fundamental misunderstanding of statistics, but because both the hidden structure of our language and the kinds of decisions we make on a regular basis suggest that if one option presents two conditions, we must be choosing between the options on the basis of both conditions.

Perhaps the best evidence for this point is the way that you narrate your variation of the problem:

Most people now would pick #1. Because intuitively, it is less likely that at any given day, there is going to be both thunderstorm and sunshine than just sunshine.

The first option isn’t “just sunshine”; technically, it’s sunshine with or without thunderstorms in the afternoon. But that’s such an unnatural comparison that we don’t even have a concise way of articulating it.

Because the marginal probability of one condition can never be smaller than the joint probability of that condition and a second one, it’s not actually a choice. As a result, our language doesn’t facilitate presenting it as such, and instead reframes both options as joint probabilities. That is why the question is more a test of grammatical rigor than of statistical intuition.
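To make the two readings concrete, here is a rough Python sketch. The joint probabilities are numbers I made up purely for illustration (they are not from Kahneman’s study), and the variable names are my own. It compares option 2 against option 1 read literally (“bank teller, feminist or not”) and against option 1 read the way I think most people hear it (“bank teller and not a feminist”).

```python
# Hypothetical joint probabilities over (job, activism) for someone like Linda.
# These values are invented for illustration only; they sum to 1.
joint = {
    ("teller", "feminist"): 0.04,
    ("teller", "not feminist"): 0.01,
    ("not teller", "feminist"): 0.60,
    ("not teller", "not feminist"): 0.35,
}

# Option 1, literal reading: marginal probability of "teller",
# summed over both values of the second condition.
p_teller = sum(p for (job, _), p in joint.items() if job == "teller")

# Option 2: joint probability of "teller" and "feminist".
p_teller_and_feminist = joint[("teller", "feminist")]

# Option 1, "exclusive" reading: "teller" and "not feminist".
p_teller_not_feminist = joint[("teller", "not feminist")]

# Literal reading: option 1 can never be less likely than option 2,
# whatever numbers go into the table.
print("literal reading, option 1 >= option 2:",
      p_teller >= p_teller_and_feminist)

# Exclusive reading: with these made-up numbers, option 2 is the likelier one.
print("exclusive reading, option 2 > option 1:",
      p_teller_and_feminist > p_teller_not_feminist)
```

Under the literal reading the comparison comes out the same way no matter what numbers you put in the table, which is why it isn’t really a choice. Under the exclusive reading the answer depends on the numbers, and picking option 2 can be perfectly reasonable.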