“Data Science without data” for decision-making

Minimum score in Fuvest 2012

Danilo de Jesus da Silva Bellini
Customertimes
7 min read · Jul 3, 2024


This is a practical example showing why theoretical knowledge about statistics matters, even when the context is not about machine learning, artificial intelligence, or computers!

The code examples that follow are written in Python.

Brief historical context

USP, the University of São Paulo, is perhaps the most prestigious university in Latin America and is certainly well known in Brazil for its courses and research. It’s also famous for its entrance exam, called “Fuvest,” whose selection process starts with a multiple-choice test.

In 2010/2011, I was a member of the Central Undergraduate Council (CoG) at USP and of its Curriculum and Entrance Exam Chamber. I was part of the group that planned the changes for Fuvest 2012, the entrance exam held in late 2011 for candidates starting their undergraduate courses in 2012. I wanted to “raise the bar” by increasing the minimum valid score for the first phase, but first I had to convince the entire council with a solid argument.

Multiple choice test description

At that time, the first phase of Fuvest consisted of 90 equal-weight questions, each with 5 choices/alternatives, of which only one was correct. The exam lasted 5 contiguous hours. The cut-off score was usually determined by a candidate’s rank (a multiple of the number of vacancies) and varied for each course. However, the minimum cut-off score for any course was 22: candidates scoring below that were disqualified regardless of their rank.

So far, this is purely domain information — no statistics and no number of candidates, just a description of the exam rules. Yet, this leads us to ask: by guessing the answers randomly, how many correct answers can we expect? The answer is 18:

>>> n = 90  # Number of questions
>>> p = 1 / 5  # Probability of a correct guess (5 equally likely choices)
>>> n * p
18.0

However, not all attempts would result in exactly 18 correct answers. How can we describe the statistical behavior for the proposed experiment of “blindly” guessing all the answers?
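To get a feel for that variability before formalizing it, here’s a quick simulation sketch (my addition, not part of the original argument) that “guesses” ten full tests and counts the correct answers in each:

import numpy as np

rng = np.random.default_rng(seed=42)  # Fixed seed only for reproducibility
guessed_tests = rng.random((10, n)) < p  # 10 tests, one random guess per question
print(guessed_tests.sum(axis=1))  # Score of each guessed test, scattered around 18

The simulated scores typically land within a few points of 18, only occasionally hitting 18 exactly; the next section describes that spread precisely.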

Binomial distribution

A guess can be either correct, for which we assign a probability “p”, or incorrect, for which we assign “1-p” as its probability. Moreover, let’s consider the “correct” outcome as 1, and the “incorrect” outcome as 0. In statistics terminology, we say each individual guess follows a Bernoulli distribution.

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

plt.bar(["Incorrect (0)", "Correct (1)"], stats.bernoulli.pmf([0, 1], p))
plt.show() # Not required in a notebook, henceforth omitted
Probability mass function of a single guess (Bernoulli trial)

Each guess should be deemed independent; it’s as if we answered each question by rolling a fair 10-sided die whose five opposite-face pairs map to the five alternatives. Since the outcome of each guess is 1 or 0, summing the outcomes of all guesses belonging to a single test gives the score of the purely guessed test. That means we’re summing “n” i.i.d. (independent and identically distributed) Bernoulli random variables, one for each question in a single test. Such a sum follows what we call a binomial distribution, and this is its probability mass function for the values we’re talking about (n=90, p=0.2):

k = np.arange(n + 1)  # All possible outcomes (from 0 to n, inclusive)
plt.plot(k, stats.binom.pmf(k, n, p))
Probability mass function of the test score by guessing (binomial distribution)

The peak of that curve is the “mode”, which in this case is 18. As demonstrated when computing the expected value, the mean of this distribution is also 18: it’s what should happen “on average”. The median is where a vertical line would split the area under the probability mass function into two equal halves, and in this case it’s also 18.

>>> stats.binom.mean(n, p)
18.0
>>> stats.binom.median(n, p)
18.0
>>> np.argmax(stats.binom.pmf(k, n, p)) # Mode
18
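As a side check (my addition, not in the original analysis), the mode of a binomial distribution also has a well-known closed form: floor((n + 1) * p), valid when (n + 1) * p is not an integer. It agrees with the argmax above:

>>> import math
>>> math.floor((n + 1) * p)  # Closed-form binomial mode
18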

Survival function and the core argument

The binomial distribution gives the probability of getting a specific score just by guessing. But what about reaching at least a specific score by guessing? That’s the job of the survival function. One caveat: SciPy’s sf(k) computes P(X > k), the complement of the cumulative distribution function, so it counts scores strictly above k.

plt.plot(k, stats.binom.sf(k, n, p))
Survival function for the binomial distribution (probability of having at least certain score by guessing)
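Since the survival function is just the complement of the CDF, that relation is easy to verify; here is a one-line sanity check (my addition):

>>> # P(X > 22) = 1 - P(X <= 22)
>>> bool(np.isclose(stats.binom.sf(22, n, p), 1 - stats.binom.cdf(22, n, p)))
True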

The core argument boiled down to answering the following question: how many guessers would have to take the exam before we’d expect at least one of them to meet the minimum cut-off score? We analyzed two candidate scores: 22 and 27. The survival function gives the probabilities:

>>> # Probability of scoring more than 22
>>> print(f"{stats.binom.sf(22, n, p) * 100:.2f}%")
11.95%
>>> # Probability of scoring more than 27
>>> print(f"{stats.binom.sf(27, n, p) * 100:.2f}%")
0.83%

The inverse of each of these probabilities (i.e., 1/probability) is how many guessers it takes, on average, before a single one is approved, assuming the cut-off score for the specific course is already at the minimum. That’s because the number of independent trials until the first success follows a geometric distribution, whose mean is exactly 1/probability. We should round it up.

>>> # Cut-off of 22 requires 9 guessers to expect that 1 will pass
>>> print(1 / stats.binom.sf(22, n, p))
8.367645426742701
>>> # Cut-off of 27 requires 121 guessers to expect that 1 will pass
>>> print(1 / stats.binom.sf(27, n, p))
120.05611998193699
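As a cross-check (my addition), SciPy’s geometric distribution yields the same counts directly, since its mean is exactly 1/probability:

>>> # Guessers needed until the first success follow a geometric distribution
>>> stats.geom.mean(stats.binom.sf(22, n, p))
8.367645426742701
>>> stats.geom.mean(stats.binom.sf(27, n, p))
120.05611998193699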

That is, with a cut-off score of 22, we would expect 1 out of every 9 guessers to pass the first phase. Raising the minimum cut-off to 27 would require 121 guessers before we’d expect a single one to pass: an order of magnitude more. With this change, the risk of a stunt designed to undermine the exam’s reputation, for instance through hired guessers, would decrease meaningfully. The proposed change to the minimum cut-off score could also be advertised as “raising the bar,” which in turn would promote a positive perception of a qualifying exam rather than a mere runoff.

This argument was presented to the stakeholders (the CoG-USP council) with a single “fancy” plot highlighting this information, originally created with GNU Octave (link to the original image). Below is a similar reconstruction in English using Python.

fig, ax = plt.subplots(1, 1, figsize=(12, 5))
ax.plot(k, stats.binom.sf(k, n, p), color="blue", linewidth=3,
        label="Binomial survival function")
ax.grid(color="black", linestyle=(0, (3, 3)), linewidth=.75)
ax.set(
    facecolor="#ffffee",
    title='Probability of at least $Q$ correct questions by "guessing"\n'
          f"(From cumulative binomial distribution for ${n=}$ and ${p=}$)",
    xlim=[0, 90],
    xticks=np.sort(np.concat((np.arange(10) * 10, [22, 27]))),
    ylim=[0, 1],
    yticks=np.linspace(0, 1, 11),
)
ax.legend(loc="upper right", facecolor="black", labelcolor="white")

def mark_value(x, ytext, extra_arrowprops):
    # Circle the survival function at x and annotate it with its percentage
    circle_scatter_kwargs = dict(s=200, fc="none", ec="black", lw=1.5)
    common_arrowprops = dict(arrowstyle="<-", color="black", lw=3, shrinkB=7.5)
    ax.scatter(x, stats.binom.sf(x, n, p), **circle_scatter_kwargs)
    ax.annotate(
        text=f"{stats.binom.sf(x, n, p) * 100:.2f}%",
        xy=(x, stats.binom.sf(x, n, p)),
        xytext=(28.5, ytext),
        fontsize=18,
        arrowprops={**common_arrowprops, **extra_arrowprops},
        bbox=dict(boxstyle="square,pad=0.35", fc="cyan", ec="black", lw=1.5),
        verticalalignment="baseline",
    )

msg = f"""
Data for this model:
- Number of questions: n = {n}
- Probability of a correct guess: p = {p} = {p * 100}% = 1/{int(1/p)}

Using only this information, how many people would need to take the
exam by guessing all the answers so that we expect at least one of
them to reach Q correct answers?

For Q = 22, this number is less than 10 people ({
    1 / stats.binom.sf(22, n, p):.6f}).
For Q = 27, this number is greater than 100 people ({
    1 / stats.binom.sf(27, n, p):.6f}).

The expected number of "guesser" people is just the inverse of the
highlighted probabilities, that is, 1 / probability.
""".strip()

mark_value(x=22, ytext=.31, extra_arrowprops=dict(relpos=(0, 1), shrinkA=3))
mark_value(x=27, ytext=.17, extra_arrowprops=dict(relpos=(.1, 0), shrinkA=2))
ax.text(40, .2, msg, fontsize=12, bbox=dict(boxstyle="square,pad=0.5",
                                            fc="#aaccee", ec="black", lw=1.5))

fig.tight_layout()

Result

The CoG-USP council approved the change! 🥳 Here’s an example announcement (translated from Portuguese):

Changes at Fuvest: New Cutoff Score for the First Phase of Fuvest. In line with the new rule that increases the weight of the first phase of the Fuvest entrance exam, the minimum cut-off score was also changed. This minimum cut-off score, which was previously 22 points, is now 27 points, reducing the number of candidates who pass to the second phase.

Fuvest’s rules were changed using a statistical argument based only on the information that defines the exam, namely the number of questions and the number of alternatives per question.

Conclusion

Business decisions can be based on theoretical knowledge of statistics when it yields meaningful results, such as risk minimization. Technologies related to data science, especially data visualization tools and techniques, enable such analyses to be presented to non-technical stakeholders. This allows for clear quantitative predictions of consequences, and some approaches can be applied even when no data is collected at all.

In this post, I used tools from Python’s data science stack (NumPy, Matplotlib, and SciPy) not only to highlight the modeling aspects but also to reconstruct an image similar to the one originally shown to decision-makers to convince them of the risk of a low cut-off score.
