What is Probability, Really?

Armin Nikkhah Shirazi
Technological Singularity
7 min read · Dec 1, 2023
Photo by ZSun Fu on Unsplash

Most people, if asked what probability actually is, would be hard pressed to give a clear answer. This is too bad, because while it is something to which we may not give much thought, it plays an exceedingly important role in our lives: every time you weigh a risk vs. a benefit, consider whether you should take an action or not, should believe something or not, and so on, you use probabilistic reasoning.

It turns out that like many foundational ideas about reality, the question of what it actually is can refer either to the concept of probability or to its interpretation. I will answer each in turn.

After you read this essay and absorb the information in it, you will have a clearer idea about probability than about 98% of the population, with high probability (pun intended).

The Concept

The concept of probability is that it is a unit measure over possibilities.

A measure is a quantitative indicator of size, so probability tells you about the sizes of different possibilities. A unit measure is an indicator of size such that all sizes have to add up to 1.

Let me walk through an example a bit slowly. Suppose that you wish to flip a coin.

First, we assume that it is impossible for the coin to land on its edge. We express this by giving this possible outcome of a coin flip a size of zero. In mathematical notation, we write

P(edge)=0

This is a concise way of saying that the probability of obtaining the edge of the coin upon a coin flip is zero. Zero is the lowest one can go; there are no negative probabilities.

Next, we assume that it is (in principle) impossible for the outcome of the flip to yield the number 6, the letter F, the queen of hearts, an elephant, etc., so we also give all these possibilities size zero.

There are only two outcomes of a coin flip which we deem to be possible, Heads and Tails, and we express this by giving these two, and only these two, a non-zero size for their respective possibilities.

How large a size should we give each? That depends on how fair or biased the coin is (or how fair or biased you think it is; more on that below, when we get to the interpretation of probability).

Suppose the coin is fair. “The coin is fair” is a statement of the idea that each outcome is equally possible. So you express this by assigning to each possibility the same size.

Now remember, since probability is a unit measure, all sizes have to add up to 1. So, we have two possibilities, each has the same size and they have to add up to 1. That means each possibility must have size 0.5 or, in the language of percentages, 50%. Symbolizing Heads and Tails by H and T, respectively, we write this as

P(H)=0.5 and P(T)=0.5

What if a coin is biased? Then you increase the size of the possibility towards which it is biased by the amount by which it is biased. Since both possibilities must add up to 1, increasing the size of one possibility must decrease the size of the other possibility by the same amount. So if the bias is strong enough to increase the size of the possibility of Heads (say) by 0.2 or 20%, then it will decrease the size of the possibility of Tails also by 0.2 or 20%, and we end up with

P(H)=0.7 and P(T)=0.3.

Most situations have more than just two possibilities, so the math may become more complicated, but the idea remains the same: you assign to each possibility a size according to its likelihood such that when all sizes are added up, you get 1. That is conceptually what probability is.
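To make the idea of a unit measure concrete, here is a minimal Python sketch of the coin example. The names is_unit_measure and bias_toward are hypothetical helpers invented for this illustration, and exact fractions are used only so that the sums come out to exactly 1.

```python
from fractions import Fraction


def is_unit_measure(sizes):
    """Return True if no size is negative and all sizes add up to exactly 1."""
    return all(s >= 0 for s in sizes.values()) and sum(sizes.values()) == 1


def bias_toward(sizes, favored, other, amount):
    """Move `amount` of size from `other` to `favored`; the total is unchanged."""
    shifted = dict(sizes)
    shifted[favored] += amount
    shifted[other] -= amount
    return shifted


# A fair coin: Heads and Tails get equal size, the edge gets size zero.
fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2), "edge": Fraction(0)}

# A bias of 0.2 toward Heads takes the same 0.2 away from Tails.
biased_coin = bias_toward(fair_coin, "H", "T", Fraction(2, 10))

print(is_unit_measure(fair_coin))          # True
print(biased_coin["H"], biased_coin["T"])  # 7/10 3/10
print(is_unit_measure(biased_coin))        # True
```

However the sizes are shifted between possibilities, the check that they add up to 1 is what makes the assignment a probability.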

One basic rule of probability that allows you to apply it usefully in your life is the following:

Alternative probabilities add, consecutive probabilities multiply.

For example, P(H) and P(T) are alternative probabilities: the two outcomes exclude each other, and since they are the only available alternatives, we have

P(H)+P(T)=1

On the other hand, the probability of obtaining H then T when throwing a fair coin twice in a row, where the first flip has no influence on the second, is:

P(H)×P(T)=0.25

The remaining 0.75 probability is evenly distributed over the other three available alternatives, so the complete distribution for this situation is:

P(H)×P(H)=0.25

P(H)×P(T)=0.25

P(T)×P(H)=0.25

P(T)×P(T)=0.25

so that when we add all alternatives, we once again obtain 1 for the total probability of attaining any possible outcome in this situation.
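The rule "alternatives add, consecutive flips multiply" can be checked mechanically. The following Python sketch enumerates every two-flip sequence of a fair coin; sequence_distribution is a hypothetical helper written for this illustration, and it assumes the flips do not influence each other.

```python
from fractions import Fraction
from itertools import product


def sequence_distribution(single_flip, flips):
    """Probability of every ordered sequence of `flips` independent flips."""
    dist = {}
    for seq in product(single_flip, repeat=flips):
        p = Fraction(1)
        for outcome in seq:
            p *= single_flip[outcome]  # consecutive probabilities multiply
        dist["".join(seq)] = p
    return dist


fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
two_flips = sequence_distribution(fair_coin, 2)

for seq, p in two_flips.items():
    print(seq, p)                # HH 1/4, HT 1/4, TH 1/4, TT 1/4 (one per line)

print(sum(two_flips.values()))   # 1, because alternative probabilities add
```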

Notice that if the coin were biased as mentioned above, it would still be the case that

P(H)+P(T)=1

But now the probability distribution of two consecutive throws would be

P(H)×P(H)=0.49

P(H)×P(T)=0.21

P(T)×P(H)=0.21

P(T)×P(T)=0.09

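which again all add up to 1. A perhaps surprising result is that when one outcome of a coin flip is more probable than the other by 0.4 (40 percentage points), obtaining it twice in a row is still more probable by 0.4 than obtaining the less likely outcome twice in a row: 0.49−0.09=0.4. This is no accident, because P(H)×P(H)−P(T)×P(T)=(P(H)−P(T))×(P(H)+P(T)), and P(H)+P(T)=1.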
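As a quick check of the biased two-flip numbers, the following Python sketch recomputes the distribution and compares the gap between the two outcomes on one flip with the gap between the two "same outcome twice" sequences. The function name gaps is a hypothetical helper for this illustration, and the extra bias values in the loop are arbitrary test values, not taken from the text above.

```python
from fractions import Fraction


def gaps(p_heads):
    """Return (P(H) - P(T), P(HH) - P(TT)) for independent flips of one coin."""
    p_tails = 1 - p_heads
    return p_heads - p_tails, p_heads * p_heads - p_tails * p_tails


p_heads = Fraction(7, 10)
p_tails = 1 - p_heads

two_flips = {
    "HH": p_heads * p_heads,
    "HT": p_heads * p_tails,
    "TH": p_tails * p_heads,
    "TT": p_tails * p_tails,
}

print(sum(two_flips.values()))  # 1
single_gap, double_gap = gaps(p_heads)
print(single_gap, double_gap)   # 2/5 2/5, i.e. 0.4 in both cases

# The two gaps agree for any bias, since P(H) + P(T) = 1.
for p in (Fraction(1, 2), Fraction(3, 5), Fraction(9, 10)):
    assert gaps(p)[0] == gaps(p)[1]
```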

Interestingly, the standard axiomatic theory of probability (formulated by Kolmogorov in the 1930s) fails to capture the concept of probability. That is because the theory cannot tell the difference between a set of possibilities and, say, regions on a unit stick, a unit volume or other unit quantities which are definitely not possibilities.

That this is almost never pointed out (and probably not even realized by most people who use it) has two reasons, I think:

  • Contemporary mathematicians largely have had no use for the distinction between possibilities and actualities (such as regions on a unit stick), and have consequently largely ignored it.
  • Even if the rare situation came up where the distinction was recognized to be relevant, they could always “fall back on words”, i.e. give some verbal story about what the concept “means” in that context while leaving the math as is.

A negative side effect of this is that it tends to blind people to all the possibilities that surround them, and especially to the difference between an outcome with unit probability, which we may also call a certainty, and an actual fact.

Part of my research involves separating possibilities from facts in the mathematics itself, but this is already going a bit deep into the philosophical foundations of probability.

The Interpretation

The interpretation of probability is that it indicates the nature of the likelihood of some outcome.

Now, likelihood is often used synonymously with probability, so in order to prevent the previous statement from becoming circular, I have to assign a different meaning to “likelihood” than just probability.

It turns out that over the history of probability (started in the 1660s by mathematicians who were asked by gamblers to help them attain an edge over other gamblers) people have given different meanings to what it is for something to be likely. Each of these meanings constitutes an interpretation because it tells us what the nature of this likelihood is.

I will give the four most popular ones that I know of:

  • Frequentist Interpretation: This is a popular interpretation which says that the likelihood of an outcome simply reflects how often it occurs in a large (in principle infinite) number of identical situations, compared to alternative outcomes. We call this the frequency of each outcome. Consider the coin flip example. If the coin is fair, then in the limit of an infinite number of coin flips, the fraction of flips that come up heads approaches one half, and so does the fraction that come up tails, which we denote by P(H)=0.5 and P(T)=0.5. That’s what probability is, really, according to this interpretation.
  • Bayesian (Degrees of Belief) Interpretation: This is another popular interpretation which says that likelihood just reflects your degrees of belief in favor of some outcome. So, taking the coin flip, if you believe that the coin is fair, then you will assign degrees of belief equally to Heads and Tails, and since probability requires that these add up to 1, your degrees of belief give P(H)=0.5 and P(T)=0.5. That’s what probability is, really, according to this interpretation.
  • Propensity Interpretation: This was advanced by the philosopher of science Karl Popper, and it says that likelihood simply reflects the propensity of something towards an outcome. So, again, taking the coin flip, a fair coin is equally disposed to yield Heads on a flip as it is to yield Tails. Since the rules of probability require all these propensities to add up to 1, the propensities give P(H)=0.5 and P(T)=0.5. That’s what probability is, really, according to this interpretation.
  • Logical Interpretation: This interpretation was especially favored by the physicist Edwin Jaynes. In classical deductive logic, propositions are either True (Tr) or False (F). These are called Truth Values. But it is possible to regard Tr and F as opposite extremes on a unit scale by identifying Tr=1 and F=0. Numbers in between are then regarded as expressing intermediate truth values, and probability thereby becomes a kind of logic. Jaynes chose to subsume this under a Bayesian interpretation, but I think it can stand on its own as a logic, similar to the way other logics stand on their own without people trying to give “interpretations” to them. In this interpretation, then, likelihood is the truth value of a proposition. So, taking the coin flip, a fair coin simply instantiates a situation in which the proposition that the coin will yield H has the same intermediate truth value as the proposition that the coin will yield T. Since by the rules of probability the intermediate truth values have to add up to 1, this yields P(H)=0.5 and P(T)=0.5. That’s what probability is, really, according to this interpretation.
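The frequentist reading in the first bullet above lends itself to a small simulation. The sketch below is only an illustration, not a proof: it uses Python's pseudo-random number generator as a stand-in for a physical coin, and relative_frequency_of_heads, along with its seed parameter, is a hypothetical helper made up for this example.

```python
import random


def relative_frequency_of_heads(flips, p_heads=0.5, seed=0):
    """Simulate `flips` coin flips and return the fraction that came up heads."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    heads = sum(1 for _ in range(flips) if rng.random() < p_heads)
    return heads / flips


# With more flips, the relative frequency tends to settle near P(H).
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

With only a handful of flips the fraction can be far from 0.5; the frequentist claim concerns what happens as the number of flips grows without bound.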

There are other interpretations and different variations of these, and it is not at all settled which interpretation is the “right” one. The reason is that each interpretation runs into trouble in some situations. I’ll mention one kind of problem for each interpretation:

  • How do you assign a frequentist interpretation to the probability for a unique event?
  • How do you account for objective likelihoods purely in terms of degrees of belief?
  • How do you characterize propensity as a physical property?
  • How do you relate logical propositions to things “out there” in the world?

Because choosing an interpretation has little effect on how probability is applied, most people who use probability have their own pet interpretations.

My own view is that what we call “probability” is actually a set of distinct classes of phenomena, unified by a common concept. This is called a pluralist view.

Armin Nikkhah Shirazi
Technological Singularity

I am a physicist, philosopher and composer-pianist. My main interest lies in the foundations of physics and related topics, and anything to do with philosophy