What an odd meaning! From vague to quantitative statements

Huy Tran
Tamara Tech & Product
Oct 24, 2023

One important skill of a data-driven person is translating a vague question or statement into a more quantitative one. It is also a very enjoyable practice!

Let us take a look at these examples:

  • Example 1: Imagine that today you bump into a friend you haven’t seen for a long time: “Hey, it’s odd to see you here too!”
  • Example 2: Today is Dec 24th in Berlin, and it is sunny and 25°C. “That’s odd,” many people would say.
  • Example 3: An excerpt from a recent article “Stress, Depression Won’t Raise Your Odds for Cancer: Study” (dated Aug. 9, 2023).
  • Example 4: “Goldman Sachs says the odds of a government shutdown are now 90% — and it could last two to three weeks” (article, dated Sep 28, 2023)
  • Example 5: “Man City trophy odds • FA Cup Winners — 3/1” (article, dated July 26, 2023)

What does “odd” mean in these examples? Disclaimer: I am neither a linguist nor a native English speaker, but I’ll try to explain here.

In the first two examples, “odd” is an adjective meaning strange. Can we quantify it somehow? Definitely! One way to do that is by:

  • Estimating that the probability of meeting that friend there is about 0.1%.
  • Knowing that, statistically, the temperature in Berlin on Dec 24th has been above 25°C only once in the last 50 years. In other words, the chance of this event is 1/50 = 2%.

In the last three examples, “odds” is a plural noun meaning chance, probability, or possibility. If we look up the singular “odd” in the Cambridge Dictionary, we will not find a noun sense.

The third and fourth examples are aligned with the meaning of probability. Example 4 even states the odds as 90%, and a probability is a number between 0 and 1. However, what does “odds 3/1” mean in the last example? According to Wikipedia:

“Odds provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of events that produce that outcome to the number that do not.”

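In Example 5, “odds 3/1” follows the bookmakers’ convention of quoting fractional odds against the outcome. This means 3 unfavorable to 1 favorable. Translated into a probability, Man City’s chance of winning the FA Cup is 1/(1+3) = 1/4 = 25%, ignoring the bookmaker’s margin.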
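To make these conversions concrete, here is a minimal Python sketch. It uses the standard library’s `fractions.Fraction` for exact arithmetic. The function names are hypothetical helpers written for illustration. The sketch assumes the usual bookmaker convention that fractional odds such as 3/1 are quoted against the outcome.

```python
from fractions import Fraction


def odds_to_probability(in_favor: int, against: int) -> Fraction:
    """Odds of in_favor:against for an outcome -> probability of that outcome."""
    return Fraction(in_favor, in_favor + against)


def probability_to_odds(p: Fraction) -> Fraction:
    """Probability -> odds in favor, as the single ratio p / (1 - p)."""
    return p / (1 - p)


def fractional_odds_to_probability(against: int, in_favor: int) -> Fraction:
    """Bookmaker fractional odds such as 3/1 -> implied probability.

    Fractional odds are quoted against the outcome, so 3/1 means
    3 unfavorable to 1 favorable. The bookmaker's margin is ignored.
    """
    return odds_to_probability(in_favor, against)


# Example 5: Man City FA Cup odds of 3/1
p_city = fractional_odds_to_probability(3, 1)
print(p_city, float(p_city))  # 1/4 0.25
```

The same `odds_to_probability(1, 5)` call turns the dice example in Appendix [2] (odds 1:5 of rolling a 6) into a probability of 1/6.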

Now an interesting question arises: why do people use the word “odds” instead of “probability”? One answer lies in how easily odds are recalculated when something new happens.

The beauty of “odd” thinking

First, recall the definition of “odds” from Wikipedia above: odds measure the likelihood of an outcome. In reality, that likelihood changes when new information arrives. We distinguish the two likelihoods by calling them pre-odds (before the new information) and post-odds (after it). These are also known as prior odds and posterior odds.

Loosely, the formula for the post-odds (or new odds) is:

post-odds = a factor * pre-odds (1)

(The factor is known as the likelihood ratio. See [1] in the Appendix below for the full version.)

To illustrate, consider a classification system: detecting COVID-19 with a rapid test kit. The kit’s instructions list the following statistics (rounded for easier calculation later):

  • Sensitivity (i.e. recall, true positive rate): 95%.
  • Specificity (i.e. true negative rate): 99%.
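Both statistics are ratios of counts in a confusion matrix. The Python sketch below recovers them from counts. The counts are hypothetical, chosen only so the rates match the rounded figures above. They are not taken from any real kit.

```python
from fractions import Fraction

# Hypothetical confusion-matrix counts, chosen only so that the rates match
# the rounded figures on the kit's instructions (95% and 99%).
true_positives = 95    # infected, test positive
false_negatives = 5    # infected, test negative
true_negatives = 990   # not infected, test negative
false_positives = 10   # not infected, test positive

# Sensitivity (recall, true positive rate): share of infected people the test catches.
sensitivity = Fraction(true_positives, true_positives + false_negatives)
# Specificity (true negative rate): share of non-infected people the test clears.
specificity = Fraction(true_negatives, true_negatives + false_positives)

print(float(sensitivity), float(specificity))  # 0.95 0.99
```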

Suppose that before the test, the odds of having the virus are 1:9 (i.e. the probability is 1/(1+9) = 10%; this number is called the prevalence). The test comes back positive. What are the odds of having the virus now?

From (1), we have

post-odds = a factor * (1:9) (2)

In this situation, the factor in (2) equals sensitivity / (1 - specificity) = 0.95 / 0.01 = 95 (see [3] in the Appendix for the detailed calculation).

Plugging this into (2) gives

post-odds = 95 * (1:9) = 95:9

This means the probability of having COVID-19, given a positive test, is 95/(95+9) = 95/104 ≈ 91.3%.
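The same calculation can be written as a short Python sketch. It uses `fractions.Fraction` because in floating point, 0.95 / (1 - 0.99) may not come out as exactly 95. The variable names are illustrative.

```python
from fractions import Fraction

sensitivity = Fraction(95, 100)  # P(test positive | infected)
specificity = Fraction(99, 100)  # P(test negative | not infected)

# The multiplication factor for a positive result (likelihood ratio):
# P(test positive | infected) / P(test positive | not infected)
factor_positive = sensitivity / (1 - specificity)

pre_odds = Fraction(1, 9)                   # odds 1:9, i.e. a prevalence of 10%
post_odds = factor_positive * pre_odds      # equation (1)
probability = post_odds / (1 + post_odds)   # odds a:b -> probability a / (a + b)

print(factor_positive)                # 95
print(post_odds)                      # 95/9
print(round(float(probability), 3))  # 0.913
```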

What happens if the odds before the test are 1:19 (i.e. the probability is 1/20 = 5%)? Then the post-odds are 95 * (1:19) = 95:19 = 5:1, which gives a probability of 5/(5+1) = 5/6 ≈ 83%.

So even though the test kit is very good, the chance of having COVID-19 given a positive test depends heavily on the pre-odds. The same reasoning applies to detecting a rare cancer: when the pre-odds are very low, the post-odds can remain low even after a positive result.
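To see how strongly the result depends on the pre-odds, the Python sketch below sweeps a few values. It applies the same factor of 95 to each. The pre-odds 1:9 and 1:19 come from the text. The values 1:99 and 1:999 are hypothetical lower prevalences added for comparison. The helper name is made up for illustration.

```python
from fractions import Fraction

FACTOR_POSITIVE = Fraction(95, 100) / (1 - Fraction(99, 100))  # = 95 for this kit


def probability_after_positive(pre_odds: Fraction) -> Fraction:
    """Update the pre-odds with a positive result and convert back to a probability."""
    post_odds = FACTOR_POSITIVE * pre_odds
    return post_odds / (1 + post_odds)


# Pre-odds 1:n; 1:9 and 1:19 are the cases in the text, the others are
# hypothetical lower prevalences added for comparison.
results = {n: probability_after_positive(Fraction(1, n)) for n in (9, 19, 99, 999)}
for n, p in results.items():
    print(f"pre-odds 1:{n:<3} -> P(infected | positive) = {float(p):.1%}")
```

With these numbers, the same positive result gives about 91.3% at 1:9, 83.3% at 1:19, 49.0% at 1:99, and only 8.7% at 1:999.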

Equation (2) also shows that a positive result multiplies the pre-odds by 95, whatever the pre-odds are. This factor is one way to quantify how useful the test kit is.

The usefulness does not end here. By the same argument, each time new evidence arrives, the odds are multiplied by another factor. So the final odds can be the product of the pre-odds and several factors, provided the pieces of evidence are independent given the true outcome.
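Here is a minimal Python sketch of chaining several updates, using the kit from the example. It assumes the test results are independent given the true infection status. That assumption is made here for illustration; repeated tests with the same kit may not satisfy it in practice. The factor for a negative result, (1 - sensitivity) / specificity, follows from Bayes’ rule in the same way as the positive one. `FACTORS`, `update` and `to_probability` are hypothetical names.

```python
from fractions import Fraction
from typing import Iterable

sensitivity = Fraction(95, 100)
specificity = Fraction(99, 100)

# Multiplication factors (likelihood ratios) for each possible test result.
FACTORS = {
    "positive": sensitivity / (1 - specificity),  # 95
    "negative": (1 - sensitivity) / specificity,  # 5/99
}


def update(pre_odds: Fraction, results: Iterable[str]) -> Fraction:
    """Multiply the pre-odds by one factor per result.

    Assumes the results are independent given the true infection status.
    """
    odds = pre_odds
    for result in results:
        odds *= FACTORS[result]
    return odds


def to_probability(odds: Fraction) -> Fraction:
    return odds / (1 + odds)


two_positives = update(Fraction(1, 9), ["positive", "positive"])
positive_then_negative = update(Fraction(1, 9), ["positive", "negative"])
print(f"{float(to_probability(two_positives)):.3f}")           # 0.999
print(f"{float(to_probability(positive_then_negative)):.3f}")  # 0.348
```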

In data modeling, the word “odds” is no stranger either. It appears in logistic regression, an algorithm that assumes the logarithm of the odds (the log-odds) is a linear function of the inputs:

“log (odds) = a linear function of inputs”

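If we exponentiate both sides, we see that the odds are once again a product of numbers: odds = exp(b0) * exp(b1*x1) * ... * exp(bn*xn). Here x1, ..., xn are the inputs and b0, ..., bn are the coefficients of the linear function.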
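Here is a minimal Python sketch of that identity. The intercept, coefficients and inputs are made-up numbers, not a fitted model. The sketch only checks that exponentiating the linear log-odds gives the same value as multiplying one factor per term.

```python
import math

# Hypothetical coefficients and one input example, for illustration only.
intercept = -2.0
coefficients = [0.5, 1.2]
inputs = [1.0, 3.0]

# Logistic regression: log(odds) is linear in the inputs.
log_odds = intercept + sum(b * x for b, x in zip(coefficients, inputs))
odds = math.exp(log_odds)

# Exponentiating turns the sum into a product: one multiplicative factor per term.
factors = [math.exp(intercept)] + [math.exp(b * x) for b, x in zip(coefficients, inputs)]
odds_as_product = math.prod(factors)  # math.prod needs Python 3.8+

# Converting the odds back to a probability gives the familiar sigmoid.
probability = odds / (1 + odds)  # equals 1 / (1 + exp(-log_odds))

print(round(log_odds, 2), round(odds, 3), round(probability, 3))
```

Read this way, exp(bi) is the factor by which the odds are multiplied when xi increases by one unit and the other inputs stay fixed.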

We will discuss this algorithm in more detail in a later blog post. For now, you can practice this odd thinking with the following exercises:

Exercise 1: Suppose there are two fraud detection systems. System 1 has a sensitivity of 90% and a specificity of 50%. System 2 has a sensitivity of 60% and a specificity of 90%. A sample goes through both systems: System 1 flags it as fraud, but System 2 does not. Which system is more likely to be right?
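If you want a tool for checking your reasoning, the Python sketch below computes each system’s multiplication factor on the fraud odds. A factor above 1 pushes the odds toward fraud, and a factor below 1 pushes them away from it. Combining the two factors by multiplication assumes the systems’ verdicts are independent given the true label. The exercise does not state this, so it is an assumption. Interpreting the numbers is left to you. The `factor` helper is a hypothetical name.

```python
from fractions import Fraction


def factor(sensitivity: Fraction, specificity: Fraction, flagged: bool) -> Fraction:
    """Multiplication factor on the fraud odds for one system's verdict."""
    if flagged:
        return sensitivity / (1 - specificity)  # P(flag | fraud) / P(flag | not fraud)
    return (1 - sensitivity) / specificity      # P(no flag | fraud) / P(no flag | not fraud)


system_1 = factor(Fraction(90, 100), Fraction(50, 100), flagged=True)
system_2 = factor(Fraction(60, 100), Fraction(90, 100), flagged=False)
# Assumes the two systems' verdicts are independent given the true label.
combined = system_1 * system_2
print(system_1, system_2, combined)
```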

Exercise 2: (interview question at a trading firm)

Suppose we began to dig into fraudulent user accounts and found large differences in likelihood if we segmented by their first traded asset. Specifically, we found that:

  • 70% of the first trades on the platform are equities.
  • 1% of these users were found to be fraudulent.
  • 20% of first trades are options.
  • 3% of these users were found to be fraudulent.
  • 10% of first trades are crypto.
  • 7% of these users were found to be fraudulent.

Given that a user is found to be fraudulent, what is the probability that they traded options first?
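Once you have worked Exercise 2 out by hand, the Python sketch below lets you check your answer in two ways. The first applies Bayes’ rule directly. The second uses the odds form “post-odds = factor * pre-odds” from this post, treating the outcome as “options” vs. “not options”. Both should agree. The dictionary and variable names are illustrative only.

```python
from fractions import Fraction

# Exercise 2 numbers: share of first trades and fraud rate per segment.
share = {"equities": Fraction(70, 100), "options": Fraction(20, 100), "crypto": Fraction(10, 100)}
fraud_rate = {"equities": Fraction(1, 100), "options": Fraction(3, 100), "crypto": Fraction(7, 100)}

# Route 1: Bayes' rule directly.
p_fraud = sum(share[s] * fraud_rate[s] for s in share)
options_and_fraud = share["options"] * fraud_rate["options"]  # P(options and fraud)
p_options_given_fraud = options_and_fraud / p_fraud

# Route 2: the odds form, treating the outcome as options vs. not options.
pre_odds = share["options"] / (1 - share["options"])
p_fraud_given_not_options = (p_fraud - options_and_fraud) / (1 - share["options"])
factor = fraud_rate["options"] / p_fraud_given_not_options  # likelihood ratio of "fraud"
post_odds = factor * pre_odds
p_from_odds = post_odds / (1 + post_odds)

print(p_options_given_fraud, p_from_odds)
```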

Appendix:

[1]:

The full version of (1) is

P(positive | predicted positive)/P(negative | predicted positive) = (P(predicted positive | positive)/ P(predicted positive | negative)) * (P(positive)/ P(negative))

where the left-hand side is the post-odds, the second factor on the right-hand side is the pre-odds, and the first factor is the multiplication factor (the likelihood ratio).

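The formula follows from Bayes’ rule. Expand P(positive | predicted positive) and P(negative | predicted positive) using Bayes’ rule and take their ratio; the common denominator P(predicted positive) cancels.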
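As a numerical sanity check of the identity above, the Python sketch below uses the COVID-19 example: prevalence 10%, sensitivity 95%, specificity 99%. It computes the left-hand side by applying Bayes’ rule to each posterior separately. It computes the right-hand side as likelihood ratio times pre-odds. Exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction

# Numbers from the COVID-19 example: prevalence 10%, sensitivity 95%, specificity 99%.
p_positive = Fraction(1, 10)
p_negative = 1 - p_positive
p_pred_pos_given_pos = Fraction(95, 100)      # sensitivity
p_pred_pos_given_neg = 1 - Fraction(99, 100)  # 1 - specificity

# Left-hand side: compute each posterior with Bayes' rule, then take their ratio.
p_pred_pos = p_pred_pos_given_pos * p_positive + p_pred_pos_given_neg * p_negative
lhs = (p_pred_pos_given_pos * p_positive / p_pred_pos) / (
    p_pred_pos_given_neg * p_negative / p_pred_pos
)

# Right-hand side: likelihood ratio times pre-odds.
rhs = (p_pred_pos_given_pos / p_pred_pos_given_neg) * (p_positive / p_negative)

print(lhs, rhs, lhs == rhs)  # 95/9 95/9 True
```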

[2]: I find the plural/singular usage of the word “odds” funny.

Example 2.1 (link):

“The odds of rolling a 6 is 1 to 5 (abbreviated 1:5)”

Example 2.2 (same link):

“the odds of an outcome are the ratio of the probability…”

Since I am not an expert in English, the verb agreement (“odds is” vs. “odds are”) confuses me.

[3]: The factor in (2) is calculated using Bayes’ rule:

P(test positive | really positive) / P(test positive | really negative),

which, from the definition of sensitivity and specificity, equals

sensitivity / (1 - specificity) = 0.95 / 0.01 = 95.
