Hypothesis testing part 2: Real example of performing a Z-test on share market data

Pritul Dave :)
2 min read · Sep 30, 2022

Please refer to my article on the Z-test to understand hypothesis testing and the Z-test.

Let’s start by importing the libraries

import pandas as pd
import numpy as np

We have a dataset of MRF Limited share prices from Jan 2020 to Jan 2022, taken from finance.yahoo.com. Let’s check whether the share price differed significantly between the year 2020 and the year 2021. To test this, we will use the Z-test.

dataframe = pd.read_csv("/Users/pritul/Downloads/MRF.NS (1).csv")
dataframe

Let’s take the NSE adjusted closing price for our comparison

dataframe = dataframe[['Date','Adj Close']]

Let’s take the share prices for the years 2020 and 2021 for comparison

share_df_2020 = dataframe[dataframe['Date'].apply(lambda x: x[:4]=='2020')]['Adj Close']
share_df_2021 = dataframe[dataframe['Date'].apply(lambda x: x[:4]=='2021')]['Adj Close']
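The same split can also be done by parsing the dates rather than slicing strings. The sketch below is only an illustration: the `sample` frame and its values are hypothetical stand-ins for the MRF data, and it assumes the `Date` column holds `YYYY-MM-DD` strings, as the string slicing above suggests.

```python
import pandas as pd

# Hypothetical stand-in for the MRF data: a few rows with the same two columns.
sample = pd.DataFrame({
    "Date": ["2020-01-01", "2020-06-15", "2021-01-04", "2021-12-31", "2022-01-03"],
    "Adj Close": [100.0, 110.0, 120.0, 130.0, 140.0],
})

# Parse the date strings once, then select rows by calendar year.
years = pd.to_datetime(sample["Date"]).dt.year
prices_2020 = sample.loc[years == 2020, "Adj Close"]
prices_2021 = sample.loc[years == 2021, "Adj Close"]

print(len(prices_2020), len(prices_2021))
```

Parsing the dates keeps the filter working even if the date format changes, as long as `pd.to_datetime` can read it.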

Finding the mean and standard deviation

N1 = share_df_2020.shape[0]
N2 = share_df_2021.shape[0]
print("N1: ",N1)
print("N2: ",N2)
N1: 251
N2: 248
mean1 = share_df_2020.mean()
mean2 = share_df_2021.mean()
print("Mean 1",mean1)
print("Mean 2",mean2)
Mean 1 64400.12678991235
Mean 2 80959.9554879113
sd1 = share_df_2020.std()
sd2 = share_df_2021.std()
print("Standard Deviation 1",sd1)
print("Standard Deviation 2",sd2)
Standard Deviation 1 6486.874736439581
Standard Deviation 2 4831.957866173906
sd_full_data = dataframe['Adj Close'].std()
print("Standard Deviation of full data",sd_full_data)
Standard Deviation of full data 10069.586605030592

Calculating the standard error

We do not have the standard deviation of the whole population, that is, the standard deviation of the price since the stock was first listed on the NSE. So we use the standard deviation of the full dataset as an estimate and compute the standard error needed for the z-transformation.
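For reference, a common textbook form of the two-sample z statistic is sketched below. It assumes both years share one population standard deviation $\sigma$ (estimated here by `sd_full_data`); $N_1, N_2$ are the sample sizes and $\bar{x}_1, \bar{x}_2$ the sample means (`N1`, `N2`, `mean1`, `mean2` in the code).

```latex
\mathrm{SE} = \sqrt{\sigma^{2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)},
\qquad
z = \frac{\lvert \bar{x}_1 - \bar{x}_2 \rvert}{\mathrm{SE}}
```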

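import numpy as np
# Standard error of the difference between two sample means:
# sqrt(sigma^2 * (1/N1 + 1/N2)), with sigma estimated by sd_full_data.
# The sample sizes N1 and N2 go in the denominators, not the standard deviations.
y = (sd_full_data**2)*((1/N1)+(1/N2))
std_err = np.sqrt(y)
print("Standard error",std_err)
Standard error 901.57 (approx., recomputed from the values printed above)

Calculating the z-transformation

z_value = abs(mean1-mean2)/std_err
print("Z-value",z_value)
Z-value 18.37 (approx., recomputed from the values printed above)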
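A common variant of this calculation uses each sample's own standard deviation instead of a single full-data value. The sketch below shows that unpooled form; it is an alternative technique, not the calculation used in this article, and the function name `unpooled_z` and the toy numbers are hypothetical.

```python
import math

def unpooled_z(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Two-sample z statistic using each sample's own standard deviation."""
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return abs(mean_a - mean_b) / std_err

# Toy numbers chosen so the arithmetic is easy to follow by hand:
# 3^2/9 = 1 and 4^2/16 = 1, so the standard error is sqrt(2).
z = unpooled_z(mean_a=10.0, mean_b=12.0, sd_a=3.0, sd_b=4.0, n_a=9, n_b=16)
print(round(z, 4))
```

With the real data one would pass `mean1`, `mean2`, `sd1`, `sd2`, `N1` and `N2` instead of the toy values.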

Now, let’s compare the z-value with the critical value at the 5% significance level

For a two-tailed test at 5% significance, the null hypothesis is retained only if the z-value lies between -1.96 and 1.96. Here the z-value is far outside this range, so the null hypothesis is rejected in favor of the alternative: there is a significant difference between the share price in 2020 and 2021.
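The ±1.96 cutoff and the matching p-value can be reproduced with Python's standard library. This is a minimal sketch, assuming a standard normal reference distribution; the helper name `two_tailed_decision` is hypothetical and the z-value of 2.5 is only an illustrative input.

```python
from statistics import NormalDist

def two_tailed_decision(z_value, alpha=0.05):
    """Return (critical value, p-value, reject?) for a two-tailed z-test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Cutoff that leaves alpha/2 of the probability in each tail.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of a value at least this extreme in either tail.
    p_value = 2 * (1 - std_normal.cdf(abs(z_value)))
    return critical, p_value, abs(z_value) > critical

critical, p_value, reject = two_tailed_decision(2.5)
print(round(critical, 2), reject)
```

For very large z-values the computed p-value underflows to 0.0 in floating point, which should be read as "smaller than can be represented", not as exactly zero.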

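import seaborn as sns
sns.kdeplot(share_df_2020)
[Figure: kernel density estimate of the 2020 adjusted closing prices]
sns.kdeplot(share_df_2021)
[Figure: kernel density estimate of the 2021 adjusted closing prices]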

The two figures above also make it clear that the closing prices in 2020 and 2021 are distributed very differently. This is what the z-test confirms.

Thank you for reading my article !!

Pritul Dave :)

❖ Writes about Data Science ❖ MS CS @UTDallas ❖ Ex Researcher @ISRO , @IITDelhi, @MillitaryCollege-AI COE ❖ 3+ Publications ❖ linktr.ee/prituldave