Univariate Normality Test: Manual Calculation Using Python

MD. Tarekujjaman Riad
Jun 11, 2023


A normality test is an important step in statistical analysis: it assesses whether a dataset follows a normal distribution. Many statistical methods assume normality, and violating this assumption can undermine their results, for example by making p-values and confidence intervals unreliable. By performing a normality test, we can check whether the data meet this assumption and decide whether to use methods that rely on it or to turn to alternatives such as data transformations or nonparametric tests. The Shapiro-Wilk test is a widely used normality test, and the Q-Q plot provides a graphical view of how far the data depart from normality. Together, they help assess the normality assumption and guide further analysis and interpretation.
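The Shapiro-Wilk test mentioned above is available in SciPy as scipy.stats.shapiro. The sketch below runs it on two synthetic samples, one normal and one right-skewed (exponential), generated here purely for illustration; they are not the article's radiation data. A small p-value, for example below 0.05, counts as evidence against normality.

```python
import numpy as np
import scipy.stats as stats

# Synthetic samples for illustration only (assumption: not the article's dataset)
rng = np.random.default_rng(seed=0)
samples = {
    "normal": rng.normal(loc=10.0, scale=2.0, size=42),
    "skewed": rng.exponential(scale=2.0, size=42),
}

results = {}
for name, sample in samples.items():
    res = stats.shapiro(sample)  # returns the W statistic and a p-value
    results[name] = res
    print(f"{name}: W = {res.statistic:.4f}, p-value = {res.pvalue:.4f}")
```

The walkthrough that follows takes a different route, the Q-Q plot correlation coefficient test, because every step of it is easy to reproduce by hand.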

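Below, I work through a normality test by hand in Python to show what happens under the hood. The test used is the Q-Q plot correlation coefficient test: it correlates the ordered data with standard normal quantiles and compares the result with a tabulated critical value.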

1. Importing the necessary libraries:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.special import ndtri

2. Reading the dataset and checking its first 10 rows:

# "?raw=true" makes GitHub serve the raw CSV file instead of the HTML page
df = pd.read_csv("https://github.com/tarekujjaman/Data-Science/blob/main/Normality%20Test/Univariate%20Normality%20Test/univariate_data.csv?raw=true")
df.head(10)

Output:

[Figure: Distribution of Radiation Data]

3. Adding a new column with the probability level (j − 0.5)/n for each row, where j is the serial number in the “Sl” column and n is the number of observations:

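# Probability level p_j = (j - 0.5) / n, where j is the serial number in "Sl"
# (assumes "Sl" runs 1..n in the order of the sorted radiation values)
n = len(df)
df['prob_level'] = (df['Sl'] - 0.5) / n
print(df.head())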

Output: [Table: first five rows of df, now including the prob_level column]
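The probability levels depend only on the sample size, not on the data values. A minimal sketch of the same (j - 0.5)/n computation with NumPy alone, on a small hypothetical sample; the helper name plotting_positions is made up for this example.

```python
import numpy as np

def plotting_positions(n):
    """Probability levels p_j = (j - 0.5) / n for ranks j = 1..n (hypothetical helper)."""
    j = np.arange(1, n + 1)
    return (j - 0.5) / n

# Hypothetical four-value sample; sorting puts the j-th smallest value at rank j
x = np.sort(np.array([3.1, 0.4, 2.2, 1.7]))
p = plotting_positions(len(x))

for rank, (value, level) in enumerate(zip(x, p), start=1):
    print(f"j = {rank}: x_(j) = {value}, p_j = {level}")
```

With n = 4 the levels are 0.125, 0.375, 0.625 and 0.875: evenly spaced and symmetric around 0.5.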

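4. Getting the z-scores of the “prob_level” column:

Note that stats.zscore only standardizes the values, computing (x − mean)/sd; it does not return standard normal quantiles, so these numbers are not the ones used in the Q-Q plot.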

stats.zscore(df["prob_level"].head(10))

Output: [z-scores of the first 10 prob_level values]
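To see why z-scores of the probability levels differ from the values a Q-Q plot needs, the sketch below compares stats.zscore with ndtri on the four probability levels for n = 4, a toy case chosen for illustration.

```python
import numpy as np
import scipy.stats as stats
from scipy.special import ndtri

p = (np.arange(1, 5) - 0.5) / 4   # 0.125, 0.375, 0.625, 0.875

# zscore: (p - mean(p)) / std(p), using the population std (ddof=0) by default
z_standardized = stats.zscore(p)
# ndtri: inverse of the standard normal CDF, i.e. the normal quantile of each p
z_quantiles = ndtri(p)

print(np.round(z_standardized, 4))  # approx. -1.3416, -0.4472, 0.4472, 1.3416
print(np.round(z_quantiles, 4))     # approx. -1.1503, -0.3186, 0.3186, 1.1503
```

Standardizing evenly spaced levels keeps them evenly spaced, whereas the normal quantiles spread out toward the tails, which is exactly the shape a Q-Q plot compares the data against.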

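5. Calculating the standard normal quantiles, q_j = Φ⁻¹(p_j), where Φ⁻¹ is the inverse of the standard normal CDF (computed by scipy.special.ndtri):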

ndtri(df["prob_level"].head(10))

Output: [standard normal quantiles of the first 10 prob_level values]
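ndtri evaluates q_j = Φ⁻¹(p_j). As a sanity check, the sketch below confirms for the n = 42 probability levels that it matches scipy.stats.norm.ppf, that applying the normal CDF returns the original levels, and that the quantiles are symmetric around 0.

```python
import numpy as np
import scipy.stats as stats
from scipy.special import ndtri

n = 42
p = (np.arange(1, n + 1) - 0.5) / n   # probability levels (j - 0.5) / n

q_ndtri = ndtri(p)                    # inverse standard normal CDF
q_ppf = stats.norm.ppf(p)             # same quantity via the distribution object

print(np.allclose(q_ndtri, q_ppf))              # True
print(np.allclose(stats.norm.cdf(q_ndtri), p))  # True
# The levels are symmetric around 0.5, so the quantiles are symmetric around 0
print(np.allclose(q_ndtri, -q_ndtri[::-1]))     # True
```

norm.ppf also accepts loc and scale, which helps when comparing against a non-standard normal; ndtri is the lower-level special function for the standard case.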

6. Adding a new column, “Standard Normal Quantile”:

df["Standard Normal Quantile"] = ndtri(df["prob_level"])
df.head(10)

Output: [Table: first 10 rows of df, now including the Standard Normal Quantile column]

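7. Calculating the correlation coefficient between the standard normal quantiles and the ordered radiation values:

# Pearson correlation; Series.corr avoids errors from any non-numeric columns in df
corr_cal = df["Standard Normal Quantile"].corr(df["Ordered Radiation"])
corr_cal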

Output:

0.927
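This value is Pearson's correlation between the ordered observations x_(j) and the standard normal quantiles q_j:

r_Q = Σ(x_(j) − x̄)(q_j − q̄) / sqrt( Σ(x_(j) − x̄)² · Σ(q_j − q̄)² )

A minimal sketch computing it by hand; the function name qq_correlation and the synthetic sample are assumptions for illustration. The manual result agrees with NumPy's np.corrcoef.

```python
import numpy as np
from scipy.special import ndtri

def qq_correlation(x):
    """r_Q between the ordered sample and standard normal quantiles (hypothetical helper)."""
    x = np.sort(np.asarray(x, dtype=float))     # ordered observations x_(j)
    n = x.size
    q = ndtri((np.arange(1, n + 1) - 0.5) / n)  # quantiles at p_j = (j - 0.5) / n
    dx = x - x.mean()
    dq = q - q.mean()
    return np.sum(dx * dq) / np.sqrt(np.sum(dx ** 2) * np.sum(dq ** 2))

rng = np.random.default_rng(seed=1)
sample = rng.normal(size=42)   # synthetic data, not the radiation dataset

r_manual = qq_correlation(sample)
q = ndtri((np.arange(1, 43) - 0.5) / 42)
r_numpy = np.corrcoef(np.sort(sample), q)[0, 1]
print(f"manual r_Q = {r_manual:.6f}, np.corrcoef = {r_numpy:.6f}")
```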

From the table of critical points for the Q-Q plot correlation coefficient test for normality, with n = 42 and a 5% level of significance:

Link: http://www.dm.unibo.it/~simoncin/QQCritVal.pdf

corr_tabulated = 0.9749
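Tabulated critical points like this one are lower percentiles of the distribution of r_Q when the data really are normal. As a rough cross-check (a simulation-based approximation, not necessarily how the linked table was produced), the sketch below draws many normal samples of size 42, computes r_Q for each, and takes the 5th percentile. The function name and simulation settings are assumptions.

```python
import numpy as np
from scipy.special import ndtri

def simulated_critical_value(n, alpha=0.05, n_sims=20_000, seed=0):
    """Approximate the lower-alpha critical point of r_Q under normality (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    q = ndtri((np.arange(1, n + 1) - 0.5) / n)   # standard normal quantiles
    dq = q - q.mean()
    # Each row is one simulated normal sample, sorted ascending
    samples = np.sort(rng.standard_normal((n_sims, n)), axis=1)
    dx = samples - samples.mean(axis=1, keepdims=True)
    # Pearson correlation of every row with q, computed in one vectorized step
    r = (dx @ dq) / np.sqrt((dx ** 2).sum(axis=1) * (dq ** 2).sum())
    return np.quantile(r, alpha)

crit = simulated_critical_value(42)
print(f"simulated 5% critical value for n = 42: {crit:.4f}")
```

The simulated value should land close to the tabulated one; increasing n_sims reduces the simulation noise.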

8. Checking the Q-Q plot, drawn as a regression plot of Standard Normal Quantile vs. Ordered Radiation (both sorted in ascending order):

sns.regplot(x=df["Ordered Radiation"].sort_values(), y=df["Standard Normal Quantile"].sort_values(), ci=None)
plt.show()

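Output: [Figure: Q-Q plot of Standard Normal Quantile vs. Ordered Radiation with the fitted regression line]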

Many data points fall away from the fitted regression line. Judging from the plot alone, the data do not appear to be normally distributed.
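SciPy can produce the same kind of plot and report the Q-Q correlation directly with scipy.stats.probplot. A minimal sketch on a synthetic right-skewed sample (an assumption for illustration). probplot places its theoretical quantiles using Filliben's approximation rather than (j - 0.5)/n, so its r can differ slightly from a manual calculation.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(seed=2)
sample = rng.exponential(scale=1.0, size=42)   # synthetic right-skewed data

# probplot returns (theoretical quantiles, ordered data) and (slope, intercept, r)
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"Q-Q correlation from probplot: r = {r:.4f}")
```

Passing plot=plt (with matplotlib.pyplot imported as plt) makes probplot draw the Q-Q plot together with its least-squares line.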

9. Hypothesis testing:

H0 = "Distribution is normal"
Ha = "Distribution is not normal"

# Reject normality when the Q-Q correlation falls below the critical value
if corr_cal < corr_tabulated:
    print("The null hypothesis is rejected. Hence", Ha)
else:
    print("We can't reject the null hypothesis. The data are consistent with:", H0)

Output:

The null hypothesis is rejected. Hence Distribution is not normal
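Putting the steps together, a minimal sketch of the whole test as one function. The name qq_correlation_test is hypothetical, 0.9749 is the tabulated value quoted in this article, and the two inputs are constructed for illustration: exact normal quantiles (which give r_Q = 1) and lognormal quantiles, which are strongly right-skewed.

```python
import numpy as np
from scipy.special import ndtri

def qq_correlation_test(x, r_crit):
    """Q-Q plot correlation coefficient test (hypothetical helper).

    H0: the data come from a normal distribution.
    H0 is rejected when r_Q falls below the critical value r_crit.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    q = ndtri((np.arange(1, n + 1) - 0.5) / n)
    r_q = np.corrcoef(x, q)[0, 1]
    return float(r_q), bool(r_q < r_crit)

n = 42
p = (np.arange(1, n + 1) - 0.5) / n
normal_like = ndtri(p)          # exact standard normal quantiles
skewed = np.exp(normal_like)    # lognormal quantiles: strongly right-skewed

for name, data in [("normal_like", normal_like), ("skewed", skewed)]:
    r_q, reject = qq_correlation_test(data, r_crit=0.9749)
    print(f"{name}: r_Q = {r_q:.4f}, reject H0: {reject}")
```

With the article's values (r_Q = 0.927 against 0.9749), the same rule rejects normality.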
