Hypothesis testing part 2: A real example of performing a Z-test on share market data
Please refer to my article on the Z-test for an introduction to hypothesis testing and the Z-test.
Let’s start by importing the libraries.
import pandas as pd
import numpy as np
We have a dataset of MRF Limited share prices from Jan-2020 to Jan-2022, taken from finance.yahoo.com. Let’s check whether the average share price differed between the calendar years 2020 and 2021. To test this difference we will use the Z-test.
dataframe = pd.read_csv("/Users/pritul/Downloads/MRF.NS (1).csv")
dataframe
Let’s take the NSE adjusted closing price for our comparison
dataframe = dataframe[['Date','Adj Close']]
Let’s take the share prices for the years 2020 and 2021 for comparison
share_df_2020 = dataframe[dataframe['Date'].apply(lambda x: x[:4]=='2020')]['Adj Close']
share_df_2021 = dataframe[dataframe['Date'].apply(lambda x: x[:4]=='2021')]['Adj Close']
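The string slice above relies on the dates being stored as text that begins with the four-digit year. A slightly more robust variant, sketched here on a small made-up frame (the `toy` rows are hypothetical, not MRF data), parses the column with `pd.to_datetime` and filters on the calendar year instead.

```python
import pandas as pd

# Hypothetical rows standing in for the downloaded CSV; values are illustrative only.
toy = pd.DataFrame({
    "Date": ["2020-01-02", "2020-06-15", "2021-03-01", "2021-12-30", "2022-01-03"],
    "Adj Close": [100.0, 110.0, 120.0, 130.0, 140.0],
})

# Parse the text dates once, then select rows by calendar year.
years = pd.to_datetime(toy["Date"]).dt.year
toy_2020 = toy.loc[years == 2020, "Adj Close"]
toy_2021 = toy.loc[years == 2021, "Adj Close"]

print(len(toy_2020), len(toy_2021))
```

Filtering on a parsed year keeps working if the date format in the file changes, as long as `pd.to_datetime` can still read it.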
1) Finding the mean and standard deviation
N1 = share_df_2020.shape[0]
N2 = share_df_2021.shape[0]
print("N1: ",N1)
print("N2: ",N2)

Output:
N1: 251
N2: 248

mean1 = share_df_2020.mean()
mean2 = share_df_2021.mean()
print("Mean 1",mean1)
print("Mean 2",mean2)

Output:
Mean 1 64400.12678991235
Mean 2 80959.9554879113

sd1 = share_df_2020.std()
sd2 = share_df_2021.std()
print("Standard Deviation 1",sd1)
print("Standard Deviation 2",sd2)

Output:
Standard Deviation 1 6486.874736439581
Standard Deviation 2 4831.957866173906

sd_full_data = dataframe['Adj Close'].std()
print("Standard Deviation of full data",sd_full_data)

Output:
Standard Deviation of full data 10069.586605030592
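The count, mean and standard deviation per year can also be collected in one table. The sketch below does this with `groupby` on a small made-up frame (the `toy` rows and the `Year` label are hypothetical, chosen only for illustration). Note that pandas' `std` uses the sample standard deviation (`ddof=1`) by default, which is also what the `.std()` calls above compute.

```python
import pandas as pd

# Hypothetical rows standing in for the downloaded CSV; values are illustrative only.
toy = pd.DataFrame({
    "Date": ["2020-01-02", "2020-06-15", "2021-03-01", "2021-12-30"],
    "Adj Close": [100.0, 110.0, 120.0, 140.0],
})

# Group the adjusted close by calendar year and summarise each group.
year = pd.to_datetime(toy["Date"]).dt.year.rename("Year")
summary = toy.groupby(year)["Adj Close"].agg(["count", "mean", "std"])

print(summary)
```

Each row of `summary` then holds the sample size, mean and standard deviation for one year, which are the inputs the test needs.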
2) Calculating the standard error
We do not know the standard deviation of the whole population, that is, of every closing price since the stock was listed on the NSE. We therefore use the standard deviation of the full dataset as an estimate of it and compute the standard error of the difference between the two means, dividing the variance by the sample sizes N1 and N2.
y = (sd_full_data**2)*((1/N1)+(1/N2))
std_err = np.sqrt(y)
print("Standard error",round(std_err, 2))

Output:
Standard error 901.57

3) Calculating the z-transformation
z_value = abs(mean1-mean2)/std_err
print("Z-value",round(z_value, 2))

Output:
Z-value 18.37
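For reference, the textbook two-sample z statistic with one shared population standard deviation sigma uses the standard error sqrt(sigma^2 * (1/n1 + 1/n2)), where n1 and n2 are the sample sizes. The sketch below wraps that formula in a small helper; the function name `two_sample_z` and the input numbers are hypothetical and chosen only so the arithmetic is easy to follow, they are not MRF figures.

```python
import math

def two_sample_z(mean1, mean2, sigma, n1, n2):
    """Return (standard_error, z) for two samples that share one population sigma."""
    # The standard error of the difference of two means scales with the sample sizes.
    std_err = math.sqrt(sigma ** 2 * (1 / n1 + 1 / n2))
    # Absolute difference, matching the two-tailed comparison used in this article.
    z = abs(mean1 - mean2) / std_err
    return std_err, z

# Illustrative inputs: sigma = 10 and 50 observations per sample.
std_err, z = two_sample_z(mean1=100.0, mean2=103.0, sigma=10.0, n1=50, n2=50)
print(std_err, z)
```

With these inputs the standard error is about 2 and the z-value about 1.5, so larger samples or a smaller sigma would both push the z-value up for the same difference in means.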
Now, let’s compare the z-value with the critical value at the 5% significance level
For a two-tailed test at 5% significance, we fail to reject the null hypothesis only if the z-value lies between -1.96 and 1.96. That is not the case here, so the null hypothesis is rejected in favour of the alternative hypothesis: there is a significant difference between the share price in 2020 and in 2021.
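The ±1.96 cut-off and the matching two-tailed p-value can be computed rather than looked up. The following is a minimal sketch using only the Python standard library; the helper name `two_tailed_decision` and the example z-values are hypothetical.

```python
import math
from statistics import NormalDist

def two_tailed_decision(z, alpha=0.05):
    """Return (critical_value, p_value, reject_null) for a two-tailed z-test."""
    # The critical value leaves alpha/2 in each tail of the standard normal.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Two-tailed p-value: probability of seeing |Z| at least as large as |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return critical, p_value, abs(z) > critical

# Illustrative z-values on either side of the cut-off.
for z in (1.5, 2.5):
    critical, p_value, reject = two_tailed_decision(z)
    print(z, round(critical, 2), round(p_value, 4), reject)
```

Comparing |z| with the critical value and comparing the p-value with alpha are two views of the same decision, so they always agree.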
import seaborn as sns
sns.kdeplot(share_df_2020)
sns.kdeplot(share_df_2021)
The two density plots above also make it clear that the closing prices in 2020 and 2021 follow quite different distributions. The z-test confirms this difference statistically.
Thank you for reading my article!