Bernoulli trials in Python

Cristiane Silva
Published in Analytics Vidhya
Aug 17, 2020

Default risk is the risk that a company, individual, or state will be unable to meet its debt obligations, that is, to pay the contractual interest or repay the principal.

I did this exercise during a DataCamp course. Suppose we look at the loans a bank makes and write down the number 1 for each loan that defaults and the number 0 for each loan that is paid back.
There is a probability of default that we cannot know exactly, but we can assume it is constant.
In addition, whether or not one loan defaults does not affect the probability that the next loan is paid.

The random module and Bernoulli trials

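The np.random.random() function from NumPy's random module returns a random floating-point number in the half-open interval [0, 1), that is, between zero (inclusive) and one (exclusive).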
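Comparing a uniform draw with p is what turns a random number into a Bernoulli trial. A number drawn uniformly from [0, 1) falls below p with probability exactly p. The sketch below checks this empirically. It uses NumPy's newer Generator interface (np.random.default_rng) in place of the legacy np.random.random() that appears later in the article, and the value p = 0.05 is only illustrative:

```python
import numpy as np

# Illustrative success probability (assumption, same value as the loan example)
p = 0.05

# Seeded generator so the result is reproducible
rng = np.random.default_rng(42)

# 100,000 uniform draws in the half-open interval [0, 1)
u = rng.random(100_000)

# Each comparison u < p is one Bernoulli trial: True with probability p
trials = u < p

# The fraction of successes should be close to p
success_rate = trials.mean()
print(f"fraction of draws below p: {success_rate:.4f}")
```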

The Bernoulli process has the following characteristics:

  • Binary outcome: the random variable can take only the values 0 or 1. The value 1 is called a success, and 0 a failure;
  • Constant probability: the probability of a success, p, stays the same throughout the experiment;
  • Independence: the results of different trials are independent of each other.
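The constant-probability and independence properties can be checked on simulated data. The following rough sketch uses simulated 0/1 outcomes with an illustrative p = 0.05. If trials are independent, the fraction of successes among trials that immediately follow a success should be about p, the same as the overall fraction:

```python
import numpy as np

p = 0.05  # illustrative success probability (assumption)
rng = np.random.default_rng(0)

# 200,000 simulated trials encoded as 1 (success) or 0 (failure)
x = (rng.random(200_000) < p).astype(int)

# Constant probability: the overall fraction of 1s estimates p
overall_rate = x.mean()

# Independence: among trials that follow a success, the fraction of 1s
# should also be close to p, because the previous outcome carries no information
rate_after_success = x[1:][x[:-1] == 1].mean()

print(f"overall: {overall_rate:.4f}, after a success: {rate_after_success:.4f}")
```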

The probability function can be written as follows:

p(0) = P(X = 0) = 1 - p

p(1) = P(X = 1) = p

X is called a Bernoulli random variable with parameter p.
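Because X can only be 0 or 1, the two cases above are often combined into a single expression, p(x) = p^x (1 - p)^(1 - x). Here is a small sketch of that formula in Python. The function name bernoulli_pmf is hypothetical and does not come from any library. The sketch also recovers the standard mean E[X] = p and variance Var(X) = p(1 - p):

```python
def bernoulli_pmf(x, p):
    """Probability P(X = x) for a Bernoulli random variable with parameter p."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 or 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # x = 1 gives p**1 * (1 - p)**0 = p; x = 0 gives p**0 * (1 - p)**1 = 1 - p
    return p**x * (1 - p) ** (1 - x)


p = 0.05
# Mean: sum of x * P(X = x) over both outcomes
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
# Variance: sum of (x - mean)^2 * P(X = x) over both outcomes
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
print(mean, variance)
```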

import numpy as np

def perform_bernoulli_trials(n, p):
    """Perform n Bernoulli trials with success probability p
    and return number of successes."""
    # Initialize number of successes: n_success
    n_success = 0
    # Perform trials
    for _ in range(n):
        # Choose random number between zero and one: random_number
        random_number = np.random.random()
        # If less than p, it's a success so add one to n_success
        if random_number < p:
            n_success += 1
    return n_success
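The loop above draws one random number at a time. NumPy can draw all n numbers at once and count how many fall below p. This gives the same distribution of outcomes and runs much faster for large n. The following is a minimal sketch. The name perform_bernoulli_trials_vectorized and the optional rng argument are hypothetical additions and are not part of the original exercise:

```python
import numpy as np


def perform_bernoulli_trials_vectorized(n, p, rng=None):
    """Count successes in n Bernoulli trials with success probability p.

    Hypothetical vectorized variant of perform_bernoulli_trials().
    """
    if rng is None:
        rng = np.random.default_rng()
    # Draw n uniform numbers in [0, 1); each one below p counts as a success
    return int(np.sum(rng.random(n) < p))


rng = np.random.default_rng(42)
print(perform_bernoulli_trials_vectorized(100, 0.05, rng))
```

The total number of successes in n independent Bernoulli trials follows a binomial distribution, so rng.binomial(n, p) would draw the same quantity in a single call.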

Now consider a bank that made 100 mortgage loans. Anywhere between 0 and 100 of the loans could be defaulted upon. Assuming the probability of default is p = 0.05, the next code uses the perform_bernoulli_trials() function to run 100 Bernoulli trials, one per loan. It then repeats this simulation 1000 times to see how the number of defaults varies. As mentioned before, a success here is a default.

import numpy as np
import matplotlib.pyplot as plt

# Seed random number generator
np.random.seed(42)
# Initialize an array for the number of defaults in each simulation: n_defaults
n_defaults = np.empty(1000)
# Simulate 1000 banks with 100 loans each and record the number of defaults
for i in range(1000):
    n_defaults[i] = perform_bernoulli_trials(100, 0.05)
# Plot the histogram with default number of bins; label your axes
_ = plt.hist(n_defaults, density=True)
_ = plt.xlabel('number of defaults out of 100 loans')
_ = plt.ylabel('probability')
# Show the plot
plt.show()
Figure 1: Defaults probability

Figure 1 presents a histogram describing the probability of the number of defaults.
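Counting the successes in a fixed number of independent Bernoulli trials with the same p gives a binomial random variable. The number of defaults out of 100 loans therefore follows Binomial(n = 100, p = 0.05), with mean n * p = 5 and variance n * p * (1 - p) = 4.75. As a cross-check on the histogram, the following sketch samples 1000 simulated banks directly with NumPy's Generator.binomial. This replaces the loop above and is not the method used in the article. The variable name binomial_defaults is chosen so that it does not overwrite n_defaults:

```python
import numpy as np

n_loans, p_default, n_banks = 100, 0.05, 1000

rng = np.random.default_rng(42)
# Each entry is the number of defaults for one simulated bank with 100 loans
binomial_defaults = rng.binomial(n=n_loans, p=p_default, size=n_banks)

# Compare sample statistics with the binomial formulas
print("sample mean:", binomial_defaults.mean(), "theory:", n_loans * p_default)
print("sample variance:", binomial_defaults.var(),
      "theory:", n_loans * p_default * (1 - p_default))

# Empirical probability of each number of defaults (index = number of defaults)
probabilities = np.bincount(binomial_defaults) / n_banks
```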

How likely is the bank to lose money?

Next, we plot the number of defaults from the previous simulation as an empirical cumulative distribution function (ECDF).

Suppose interest rates are such that the bank loses money if 10 or more of its loans are defaulted upon. What is the probability that the bank will lose money?

# Compute ECDF (empirical cumulative distribution function): x, y
def ecdf(data):
    """Compute ECDF for a one-dimensional array of measurements."""
    # Number of data points: n
    n = len(data)
    # x-data for the ECDF: x
    x = np.sort(data)
    # y-data for the ECDF: y
    y = np.arange(1, 1 + n) / n
    return x, y

x, y = ecdf(n_defaults)
# Plot the CDF with labeled axes
_ = plt.plot(x, y, marker='.', linestyle='none')
_ = plt.xlabel('number of defaults out of 100')
_ = plt.ylabel('CDF')
# Show the plot
plt.show()
# Compute the number of 100-loan simulations with 10 or more defaults: n_lose_money
n_lose_money = np.sum(n_defaults >= 10)
# Compute and print probability of losing money
print('Probability of losing money =', n_lose_money / len(n_defaults))
Figure 2: Probability of losing money
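Because the number of defaults is binomial, the probability of losing money can also be computed exactly instead of estimated by simulation: P(X >= 10) = 1 - P(X <= 9). The sketch below uses only the standard library's math.comb (Python 3.8+). The helper name binomial_tail is hypothetical. The simulated estimate printed above should land close to this value, with some sampling noise because it uses only 1000 simulations:

```python
from math import comb


def binomial_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p), summing the pmf from k to n."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Probability that 10 or more of the 100 loans default when p = 0.05
prob_lose_money = binomial_tail(100, 0.05, 10)
print(f"Exact probability of losing money: {prob_lose_money:.4f}")
```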


Engineer, MBA in Finance & Investment, and Data Scientist who contributes code to the community. linkedin.com/in/ssilvacris/