Hypothesis testing part 2: A real example of performing a Z-test on share market data

Sep 30, 2022

Please refer to my article on the Z-test for background on hypothesis testing and the Z-test.

Let’s start by importing the libraries

import pandas as pd
import numpy as np

We have the dataset of MRF Limited share prices from Jan-2020 to Jan-2022, taken from finance.yahoo.com. Let’s check whether the average share price in the year 2020 differed from the average share price in the year 2021. To test this difference we will use the Z-test.

dataframe = pd.read_csv("/Users/pritul/Downloads/MRF.NS (1).csv")
dataframe

Let’s take the NSE adjusted closing price (the Adj Close column) for our comparison

dataframe = dataframe[['Date','Adj Close']]

Let’s take the share prices for the years 2020 and 2021 for comparison

share_df_2020 = dataframe[dataframe['Date'].apply(lambda x: x[:4]=='2020')]['Adj Close']
share_df_2021 = dataframe[dataframe['Date'].apply(lambda x: x[:4]=='2021')]['Adj Close']
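The string slicing above works when the Date column holds text that starts with the four-digit year. A slightly more robust variant, sketched here on a tiny made-up frame (the rows and the names `sample`, `prices_2020` and `prices_2021` are illustrative, not the MRF data), parses the column with `pd.to_datetime` and filters on the `.dt.year` accessor:

```python
import pandas as pd

# Illustrative rows only; these are not real MRF prices
sample = pd.DataFrame({
    "Date": ["2020-01-02", "2020-06-15", "2021-03-01", "2021-12-30"],
    "Adj Close": [100.0, 110.0, 150.0, 170.0],
})

# Parse the text dates once, then select rows by calendar year
sample["Date"] = pd.to_datetime(sample["Date"])
prices_2020 = sample.loc[sample["Date"].dt.year == 2020, "Adj Close"]
prices_2021 = sample.loc[sample["Date"].dt.year == 2021, "Adj Close"]

print(prices_2020.mean(), prices_2021.mean())
```

Filtering on a real datetime column keeps working if the export format of the dates changes, as long as pandas can still parse it.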

1) Finding the mean and standard deviation

N1 = share_df_2020.shape[0]
N2 = share_df_2021.shape[0]
print("N1: ",N1)
print("N2: ",N2)

N1:  251
N2:  248

mean1 = share_df_2020.mean()
mean2 = share_df_2021.mean()
print("Mean 1",mean1)
print("Mean 2",mean2)

Mean 1 64400.12678991235
Mean 2 80959.9554879113

sd1 = share_df_2020.std()
sd2 = share_df_2021.std()
print("Standard Deviation 1",sd1)
print("Standard Deviation 2",sd2)

Standard Deviation 1 6486.874736439581
Standard Deviation 2 4831.957866173906

sd_full_data = dataframe['Adj Close'].std()
print("Standard Deviation of full data",sd_full_data)

Standard Deviation of full data 10069.586605030592
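One detail worth knowing when reading these numbers: pandas' `.std()` computes the sample standard deviation (dividing by N-1) by default, while NumPy's `np.std` divides by N unless told otherwise. A small sketch on toy numbers, not the MRF data, shows the difference:

```python
import numpy as np
import pandas as pd

# Toy values chosen so the population standard deviation is exactly 2
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_sd = values.std()                    # pandas default: ddof=1
population_sd = np.std(values.to_numpy())   # NumPy default: ddof=0

print("Sample SD (ddof=1):", sample_sd)
print("Population SD (ddof=0):", population_sd)
```

With a few hundred trading days per year the two versions differ only slightly, so either choice leads to the same conclusion here.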

Calculating the standard error

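We do not have the standard deviation of the whole population, that is, of every price since the stock was listed on the NSE. So we use the standard deviation of the full dataset as an estimate and compute the standard error of the difference between the two yearly means, which the z-transformation is built on.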

# Standard error of the difference between the two yearly means:
# each year contributes (variance / number of observations in that year)
y = (sd_full_data**2)*((1/N1)+(1/N2))
std_err = np.sqrt(y)
print("Standard error",round(std_err, 2))

Standard error 901.57

Calculating the z-transformation

z_value = abs(mean1-mean2)/std_err
print("Z-value",round(z_value, 2))

Z-value 18.37
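For reference, the textbook two-sample z statistic with a shared standard deviation sigma is z = |mean1 - mean2| / sqrt(sigma^2 * (1/N1 + 1/N2)), where N1 and N2 are the sample sizes. Below is a minimal sketch of that formula as a reusable function; the name `two_sample_z` and its parameters are my own choice for illustration and do not come from any library:

```python
import math

def two_sample_z(mean1, mean2, sigma, n1, n2):
    """Z statistic for the difference of two sample means with a shared sigma."""
    # Standard error of the difference between the two means
    std_err = math.sqrt(sigma**2 * (1 / n1 + 1 / n2))
    return abs(mean1 - mean2) / std_err

# Toy numbers chosen so the arithmetic is easy to check by hand:
# std_err = sqrt(100 * (1/50 + 1/50)) = 2, so z = 10 / 2 = 5
z = two_sample_z(mean1=100.0, mean2=110.0, sigma=10.0, n1=50, n2=50)
print(z)
```

Calling it with the means, the full-data standard deviation and the two sample sizes from this article would give the z-value for the MRF comparison.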

Now, let’s compare the z-value with the critical value at the 5% significance level

For a two-tailed test at 5% significance, the null hypothesis is retained only if the z-value lies between -1.96 and 1.96. That is not the case here, so the null hypothesis is rejected in favour of the alternative hypothesis: there is a significant difference between the share prices in 2020 and 2021.
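The 1.96 cutoff and the matching p-value can be reproduced with the standard library's `statistics.NormalDist`. This is only a sketch: the `z_value` below is a placeholder chosen for illustration, so substitute the statistic computed earlier.

```python
from statistics import NormalDist

alpha = 0.05                      # 5% significance level
standard_normal = NormalDist()    # mean 0, standard deviation 1

# Two-tailed critical value: leaves alpha/2 in each tail
z_critical = standard_normal.inv_cdf(1 - alpha / 2)

z_value = 2.5                     # illustrative value, not the MRF result
# Two-tailed p-value: probability of a |z| at least this large under H0
p_value = 2 * (1 - standard_normal.cdf(abs(z_value)))

print("Critical value:", round(z_critical, 2))
print("p-value:", p_value)
print("Reject H0:", abs(z_value) > z_critical)
```

Rejecting when |z| exceeds the critical value is equivalent to rejecting when the p-value falls below alpha.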

import seaborn as sns
sns.kdeplot(share_df_2020)

<AxesSubplot:>

sns.kdeplot(share_df_2021)

<AxesSubplot:>

The two figures above also make it clear that the closing prices in 2020 and 2021 follow quite different distributions. The Z-test confirms this difference statistically.

Thank you for reading my article!