Quantile, Percentile (one tail and two tail distribution), Confidence Interval, Box Plot

Anuj shah (Exploring Neurons)
Sep 23, 2023


The more I read about probabilistic models in deep learning, the more I realize how important, and how confusing, this basic topic can be, so here I am trying to decipher it.

Quantiles, Quartiles and Percentiles

Quantiles are values that split sorted data into equal parts. In general terms, a q-quantile divides sorted data into q parts. The most commonly used quantiles have special names:

  • Quartiles (4-quantiles): Three quartiles split the data into four parts.
  • Deciles (10-quantiles): Nine deciles split the data into 10 parts.
  • Percentiles (100-quantiles): 99 percentiles split the data into 100 parts.
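As a minimal sketch, assuming NumPy is available, the q - 1 cut points of a q-quantile can be computed with np.quantile by asking for the probabilities 1/q, 2/q, ..., (q-1)/q. The helper name q_quantiles is hypothetical, and NumPy's default (linear interpolation) method is assumed.

```python
import numpy as np

def q_quantiles(values, q):
    """Hypothetical helper: return the q - 1 cut points that split `values` into q parts.

    Uses np.quantile with its default linear interpolation.
    """
    probs = np.arange(1, q) / q  # 1/q, 2/q, ..., (q-1)/q
    return np.quantile(values, probs)

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150]

quartiles = q_quantiles(data, 4)      # 3 cut points -> 4 parts
deciles = q_quantiles(data, 10)       # 9 cut points -> 10 parts
percentiles = q_quantiles(data, 100)  # 99 cut points -> 100 parts

print(quartiles.tolist())             # [45.0, 80.0, 115.0]
print(len(deciles), len(percentiles)) # 9 99
```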

We are going to focus on quartiles and percentiles.

As we will see, quartiles are special cases of percentiles.

Let's take a dataset of 15 samples:

data = [10,20,30,40,50,60,70,80,90,100,110,120,130,140,150]

When we ask for the 20th percentile, we are asking for the value in the data above below which 20% of the data lies.

Since we have 15 samples, 20% of 15 is 3. So we need the value with 3 samples below it: 10, 20 and 30 lie below 40, so in our case that value is 40.

So the 20th percentile (the value below which 20% of the data lies) of the dataset above is 40.
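The counting argument above can be written out directly. This is a minimal sketch: count_below is a hypothetical helper, and it uses the "strictly below" reading from the text. Note that pandas' quantile uses a different convention by default (linear interpolation between neighbouring samples), so it returns 38 rather than 40 for the same data.

```python
import pandas as pd

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150]

def count_below(values, threshold):
    """Hypothetical helper: number of samples strictly below `threshold`."""
    return sum(v < threshold for v in values)

target = len(data) * 20 // 100  # 20% of 15 samples -> 3 samples
print(target)                   # 3
print(count_below(data, 40))    # 3 (10, 20 and 30 lie below 40)

# pandas interpolates linearly at position q * (n - 1) = 0.2 * 14 = 2.8,
# i.e. 80% of the way from 30 to 40.
print(round(pd.Series(data).quantile(0.2), 6))  # 38.0
```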

The quartiles are the three special percentiles that divide the data into four parts:

Q1: first quartile, or lower quartile (25th percentile)

Q2: second quartile, or median (50th percentile)

Q3: third quartile, or upper quartile (75th percentile)

To compute percentile values, we can use the pandas quantile function:

import pandas as pd
data = pd.DataFrame([10,20,30,40,50,60,70,80,90,100,110,120,130,140,150])

To get Q1, Q2 and Q3

data.quantile(0.25) ==> 45

data.quantile(0.5) ==> 80

data.quantile(0.75) ==> 115

The quantile function accepts other parameters too, such as interpolation, which controls how a value falling between two samples is computed: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.quantile.html
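As a hedged illustration of the interpolation parameter (see the pandas documentation linked above), here is what each option returns for the 40th percentile of our data. That percentile sits at position 0.4 × (15 - 1) = 5.6 in the sorted data, between 60 and 70. A pandas Series is used here so that quantile returns a single number rather than a Series.

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150])

# The 40th percentile lies at position 0.4 * (15 - 1) = 5.6 in the sorted data,
# between data[5] = 60 and data[6] = 70.
# Expected values: linear -> 66, lower -> 60, higher -> 70, midpoint -> 65,
# nearest -> 70 (the printed type, int or float, may vary).
for method in ["linear", "lower", "higher", "midpoint", "nearest"]:
    print(method, data.quantile(0.4, interpolation=method))
```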

To get any other percentile, say the 40th or the 60th:

data.quantile(0.4) ==> 66

data.quantile(0.6) ==> 94
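To see where 45, 80, 115, 66 and 94 come from, here is a minimal from-scratch sketch of linear interpolation, the default method of pandas' quantile: sort the data, compute the fractional position q × (n - 1), and interpolate between the two neighbouring samples. The function name linear_quantile is hypothetical; its results are compared against pandas.

```python
import math
import pandas as pd

def linear_quantile(values, q):
    """Hypothetical helper mimicking pandas' default (linear) quantile."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)        # fractional index into the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)  # clamp so q = 1.0 stays in range
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150]

for q in [0.25, 0.5, 0.75, 0.4, 0.6]:
    ours = linear_quantile(data, q)
    theirs = pd.Series(data).quantile(q)
    print(q, round(ours, 6), round(theirs, 6))
```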

  • In a sample or dataset, the quartiles divide the data into four groups with equal numbers of observations.
  • In a probability distribution, the quartiles divide the distribution’s range into four intervals with equal probability.
Image source: Quartiles & Quantiles | Calculation, Definition & Interpretation, Shaun Turney
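The two bullet points can be checked numerically. In this minimal sketch (assuming NumPy and SciPy, with an arbitrarily chosen random seed), the theoretical quartiles from norm.ppf cut the standard normal distribution into four intervals of probability 0.25 each, and roughly a quarter of a large random sample lands in each interval.

```python
import numpy as np
from scipy import stats

norm = stats.norm(loc=0, scale=1)

# Theoretical quartiles of the standard normal distribution
q1, q2, q3 = norm.ppf([0.25, 0.5, 0.75])

# Probability mass in each of the four intervals split by the quartiles
edges = np.array([-np.inf, q1, q2, q3, np.inf])
probs = np.diff(norm.cdf(edges))
print(probs)  # each entry is (numerically) 0.25

# Empirical check on a large random sample (seed chosen arbitrarily)
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)
groups = np.digitize(samples, [q1, q2, q3])  # group index 0..3 for each sample
fractions = np.bincount(groups, minlength=4) / len(samples)
print(fractions)  # each fraction should be close to 0.25
print(np.quantile(samples, [0.25, 0.5, 0.75]), (q1, q2, q3))
```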

For a Gaussian (normal) distribution, we can compute percentile values using SciPy's scipy.stats.norm.ppf function:

from scipy import stats
norm = stats.norm(loc=0, scale=1)

PPF: percent point function, the inverse of the CDF. Given a percentile (a probability), it returns the corresponding value.
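Since the ppf is the inverse of the CDF, passing a percentile through ppf and then through cdf should give the same percentile back. A minimal sketch using the same kind of frozen norm object:

```python
import numpy as np
from scipy import stats

norm = stats.norm(loc=0, scale=1)

probs = np.array([0.01, 0.25, 0.5, 0.75, 0.95, 0.99])
values = norm.ppf(probs)      # percentile -> value on the x-axis
recovered = norm.cdf(values)  # value -> fraction of the distribution below it

for p, x, r in zip(probs, values, recovered):
    print(f"p={p:.2f}  ppf(p)={x:+.5f}  cdf(ppf(p))={r:.5f}")
```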

To get the quartile values Q1, Q2 and Q3:

norm.ppf(0.25) → -0.67448
norm.ppf(0.5) → 0.0
norm.ppf(0.75) → 0.67448

To get any other percentile:
norm.ppf(0.95) → 1.644853
norm.ppf(0.99) → 2.32634
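The values above are for the standard normal (loc=0, scale=1). With the same frozen-distribution API, a normal with a different mean and standard deviation has percentiles that are shifted and scaled versions of the standard ones, x = loc + scale × z. The mean of 100 and standard deviation of 15 below are arbitrary example numbers.

```python
from scipy import stats

standard = stats.norm(loc=0, scale=1)
custom = stats.norm(loc=100, scale=15)  # arbitrary example parameters

for p in [0.25, 0.5, 0.75, 0.95, 0.99]:
    z = standard.ppf(p)
    x = custom.ppf(p)
    # The percentile of N(100, 15^2) equals 100 + 15 * z
    print(f"p={p:.2f}  z={z:+.4f}  x={x:.4f}  100 + 15*z={100 + 15 * z:.4f}")
```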

Percentiles for one-tail and two-tail distributions

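The value passed into the ppf function is a cumulative probability: the fraction of the distribution that lies below the returned value. The probability left over in the tail (or tails) is called the significance level and is denoted alpha (α); for a single lower tail, the ppf input is α itself.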

There is one more thing to understand: the difference between the one-tail and the two-tail case.

The one-tail case is straightforward, as the plots for the 25th, 50th and 80th percentiles show. To compute the value for each of these percentiles, we use norm.ppf.

The 25th percentile is the value below which 25% of the data lies, so we use alpha = 0.25: norm.ppf(0.25) → -0.67448

For the one-tail case, the 25th and 50th percentiles are -0.67 and 0.0, respectively.
For the one-tail case, the 80th and 95th percentiles are 0.84 and 1.64, respectively.
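A minimal sketch of the one-tail case, assuming the same standard normal: the value with probability p below it is norm.ppf(p). For an upper tail holding probability α, the value is norm.ppf(1 - α), which SciPy also exposes as the inverse survival function norm.isf(α). This reproduces the numbers quoted above.

```python
from scipy import stats

norm = stats.norm(loc=0, scale=1)

# One tail: the p-th percentile is simply ppf(p)
for p in [0.25, 0.5, 0.8, 0.95]:
    print(f"{round(p * 100)}th percentile: {norm.ppf(p):+.2f}")

# Upper tail holding probability alpha to the right of the value
alpha = 0.05
upper = norm.ppf(1 - alpha)
print(upper, norm.isf(alpha))  # both about 1.645
```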

For the two-tail case, the data is taken symmetrically about the mean. So when we ask for the central 50% of the data, we want the values that enclose 25% to the left of the mean and 25% to the right of it. The remaining probability, the significance level α = 1 - 0.5 = 0.5, is split equally between the lower and upper tails (α/2 = 0.25 in each), and each tail gives one critical value.

data_percent = 0.5

# Significance level (alpha) for a two-tailed test
alpha = 1 - data_percent  # 0.5

# Calculate the critical values for the tails
critical_value_left = norm.ppf(alpha / 2)       # norm.ppf(0.25) → -0.67448
critical_value_right = norm.ppf(1 - alpha / 2)  # norm.ppf(0.75) → 0.67448

For the two-tail case covering the central 50% of the data, the lower and upper ppf inputs are α/2 = 0.25 and 1 - α/2 = 0.75, and the ppf returns -0.67 and 0.67, respectively.
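The two steps in the code above can be wrapped in a small helper. This is a sketch, and the function name two_tail_bounds is a hypothetical choice: it returns the lower and upper values enclosing a given central fraction of the data, and the loop checks that the probability between them equals that fraction.

```python
from scipy import stats

norm = stats.norm(loc=0, scale=1)

def two_tail_bounds(data_percent, dist=norm):
    """Hypothetical helper: bounds enclosing the central `data_percent` of `dist`.

    The leftover probability alpha = 1 - data_percent is split equally,
    alpha / 2 in each tail.
    """
    alpha = 1 - data_percent
    lower = dist.ppf(alpha / 2)
    upper = dist.ppf(1 - alpha / 2)
    return lower, upper

for data_percent in [0.5, 0.95]:
    lower, upper = two_tail_bounds(data_percent)
    covered = norm.cdf(upper) - norm.cdf(lower)
    print(f"{data_percent:.0%}: [{lower:.2f}, {upper:.2f}], probability between = {covered:.4f}")
```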

Similarly, for the central 95% of the data, the remaining 5% is split between the left tail (2.5%) and the right tail (2.5%):

data_percent = 0.95

# Significance level (alpha) for a two-tailed test
alpha = 1 - data_percent  # 0.05

# Calculate the critical values for the tails
critical_value_left = norm.ppf(alpha / 2)       # norm.ppf(0.025) → -1.95996
critical_value_right = norm.ppf(1 - alpha / 2)  # norm.ppf(0.975) → 1.95996

For the two-tail case covering the central 95% of the data, the lower and upper ppf inputs are α/2 = 0.025 and 1 - α/2 = 0.975, and the ppf returns -1.96 and 1.96, respectively.
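SciPy frozen distributions also provide an interval method that returns this same pair of equal-tail bounds. A short sketch comparing it against the ppf-based calculation for a few common levels (90%, 95% and 99% are just example choices). The level is passed positionally because the keyword name of interval has changed between SciPy versions.

```python
import numpy as np
from scipy import stats

norm = stats.norm(loc=0, scale=1)

for data_percent in [0.90, 0.95, 0.99]:
    alpha = 1 - data_percent
    by_ppf = (norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2))
    by_interval = norm.interval(data_percent)  # positional argument on purpose
    print(f"{data_percent:.0%}: ppf {np.round(by_ppf, 3)}, interval {np.round(by_interval, 3)}")
```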

To Dos

Confidence Interval & Box Plot

References

  1. Shaun Turney, Quartiles & Quantiles | Calculation, Definition & Interpretation
  2. Stack Overflow: https://stackoverflow.com/questions/60699836/how-to-use-norm-ppf
