Hypotheses Testing: P-Test and Z-Test (Part-II)

Pritul Dave :)
7 min read · Sep 30, 2022


Introduction

When working through research questions, one often makes assumptions. These assumptions are known as hypotheses. When someone makes an assumption, it is either accepted or rejected based on the results of specific statistical tests. The p-test and z-test help a data scientist decide whether an assumption is accepted or rejected. This is termed hypothesis testing.

There are many more tests in hypothesis testing, but the p-test and z-test form the foundation of statistical testing. So, let's understand the p-test and z-test.

Please refer to Part-1 of my articles on hypothesis testing to get a clear idea first!

What is the p-value:

  • It is the probability of the observation (or one more extreme), given that the null hypothesis is true. Basically, it represents how confident we can be in the null hypothesis.
  • Thus p-value is given as conditional probability:

p(observation | null-hypothesis)

Testing the p-value

We select a threshold value (called a significance value) α and determine whether our p-value is greater than a threshold value or less than a threshold value.

Now, since the p-value reflects how well the observation agrees with the null hypothesis:

p-value <= α: reject the null hypothesis (the observation is too unlikely under it).
p-value > α: accept the null hypothesis.

The common value chosen for α is 0.05 (5%); a lower value may be chosen instead, because the smaller the α, the more stringent the test.

Example:

Let's say a coin is tossed 5 times and lands on heads every time.

Null hypothesis: The coin is not biased towards the head.
Alternate hypothesis: Coin is biased towards the head.

The p-value will be defined as:
p(X=5 | Coin is not biased towards the head)

Now, if the coin is not biased towards the head, the probability of 5 heads in a row is 1/2 * 1/2 * 1/2 * 1/2 * 1/2 = 1/32 ≈ 3.1%. This 3.1% is our p-value.

Taking the usual significance threshold α = 5%:
p-value <= 5% → Coin is biased (Alternate wins)
p-value > 5% → Coin is not biased (Null wins)

Since 3.1% <= 5%, the coin is declared biased and the alternate hypothesis wins. (Had we observed only 3 heads out of 5, the p-value P(X >= 3) = 16/32 = 50% would be well above 5%, and the null hypothesis would win.)

Thus, this is how the p-test works.
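The coin-toss arithmetic above can be double-checked in Python. This is a minimal sketch using scipy's binomial distribution (the variable names are my own, not from the article):

```python
# Sketch: verifying the coin-toss p-value with scipy.
# Under the null hypothesis (fair coin, p = 0.5), the probability of
# observing 5 heads in 5 tosses is (1/2)**5 = 1/32.
from scipy.stats import binom

p_five_heads = binom.pmf(k=5, n=5, p=0.5)  # P(X = 5 | fair coin)
print(round(p_five_heads, 5))  # 0.03125, i.e. about 3.1%
```

Since 0.03125 is below the 5% significance threshold, the null hypothesis would be rejected.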

P-Value and the Critical Region

For the above example, if we plot the critical region assuming a Gaussian distribution, it looks like this:

[Figure: critical region of the test under a Gaussian distribution]

Thus, if the p-value is higher than the significance level (the test statistic stays outside the critical region), the null hypothesis holds; otherwise the alternate hypothesis wins.

z-value

The z-value is used to convert a value into the standard normal distribution.

Consider an upper-tailed test and let z denote the computed value of the test statistic Z. The null hypothesis is rejected if z ≥ z*, the critical value, and the p-value is the smallest α for which this is the case.

Hence, if the area under the curve to the left of z is Φ(z), then p-value = 1 − Φ(z).

Hence, if

|Z| < Z(α): accept the null hypothesis

|Z| > Z(α): reject the null hypothesis

Properties of standard normal distribution
→ mean = 0
→ std = 1

Moreover, the standard normal distribution effectively ranges over [−4, +4]; the interval within ±3σ already contains about 99.7% of the data.

Application of standard normal distribution
If you take any statistical table, like the z-table or t-table, it is pre-assumed that the mean is 0 and the standard deviation is 1.

The formula for calculating the z-statistics
z = (x-µ) / σ
Where µ is the mean of the sample and σ is the standard deviation.

Let's consider an array with a somewhat normal distribution

import seaborn as sns
x = [2,3,1,1,2,4,8,10,11,20,21,15]
sns.kdeplot(x)
[Figure: KDE plot of x]

Now, let’s apply the standard normal distribution using NumPy

import numpy as np
arr = np.array(x)
mean_ = np.mean(arr)
sigma = np.std(arr)
print(">>>> Mean and standard deviation of array is ",round(mean_,2),round(sigma,2))
>>>> Mean and standard deviation of array is 8.17 6.99

Now calculating the z-score using the above formula.

standard_normal_distributed = (arr - mean_)/sigma
sns.kdeplot(standard_normal_distributed)

[Figure: KDE plot of the standardized array]

Thus, our mean becomes 0, and the standard deviation is 1.

Let’s say you have a test score of 190. The test has a mean (μ) of 150 and a standard deviation (σ) of 25. Assuming a normal distribution, your z score would be:

z = (x - μ) / σ
= (190 - 150) / 25 = 1.6

The z score tells you how many standard deviations from the mean your score is. In this example, your score is 1.6 standard deviations above the mean.
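The test-score example can be sketched in a couple of lines of Python (variable names are my own):

```python
# Sketch of the test-score example: x = 190, mu = 150, sigma = 25.
x, mu, sigma = 190, 150, 25
z = (x - mu) / sigma
print(z)  # 1.6 -> the score is 1.6 standard deviations above the mean
```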

Standard Error of Mean

When you have multiple samples and want to describe the standard deviation of those sample means (the standard error), you would use this z score formula:

z = (x - μ) / (σ / √n)

This z-score tells you how many standard errors lie between the sample mean and the population mean.

In short, the standard deviation of the sample means is called the standard error.
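The standard error σ/√n can be computed with NumPy; here is a minimal sketch reusing the array from earlier in the article:

```python
# Sketch: standard error of the mean (sigma / sqrt(n)) for the sample above.
import numpy as np

sample = np.array([2, 3, 1, 1, 2, 4, 8, 10, 11, 20, 21, 15])
sem = np.std(sample) / np.sqrt(len(sample))  # ~ 6.99 / sqrt(12)
print(round(sem, 2))  # ≈ 2.02
```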

Converting z-score to Probability

Method 1: Using z-table
The z-table is used to convert a z-score into the corresponding cumulative probability (the area under the curve up to that z-score).

Method 2: Using Graph
In figure 1, the x-axis shows the z-score and the y-axis shows its probability density, so we can also read the probability value off the graph, or vice versa.

Method 3: Using Python
To convert a z-score into a probability, use scipy.stats.norm.cdf.
To convert a probability back into a z-score, use scipy.stats.norm.ppf.
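These two functions are inverses of each other, as a quick round trip shows:

```python
# Sketch: converting between z-scores and probabilities with scipy.
from scipy.stats import norm

p = norm.cdf(1.25)  # z-score -> cumulative probability
z = norm.ppf(p)     # probability -> z-score (inverse of cdf)
print(round(p, 4), round(z, 2))  # 0.8944 1.25
```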

Let’s consider an example and let’s try to convert the value into a probability value

A company manufactures light bulbs with an average life of 900 hours and a standard deviation of 80 hours. If one light bulb is taken at random, what is the probability that the bulb lasts at most 1000 hours?

For finding a probability we would normally need the favourable and total outcomes, but the given data has neither. However, we are provided with the mean and the standard deviation, so we can apply the z-value transformation and, using any of the three methods above, convert the z-value into a probability.

Step 1: Finding the z-value for 1000hr
z_value = (x - avg)/std = (1000 - 900)/80 = 100/80 = 1.25

Step 2: Converting the z-value into the probability value
Using method 1
Refer to the z-table. The probability value for 1.25 is found at row 1.2, column 0.05, as shown in the figure.

[Figure: z-table lookup for z = 1.25]

The probability value according to the z-table is 0.8944.

Method 2: Using the graph

[Figure: standard normal curve with cumulative probabilities marked at each σ]

The graph brackets the probability between 84.1% (z = 1) and 97.7% (z = 2), with z = 1.25 falling in between. So, in this way, we can approximate the probability from the chart.

Method 3: Using the scipy

from scipy.stats import norm
norm.cdf(1.25)
0.8943502263331446

Thus, using any of the three methods we will always get the same value. This is how we can calculate the probability of the event provided the mean and standard deviation is given.
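As a side note, scipy can also skip the manual z-transformation: norm.cdf accepts the mean and standard deviation directly through its loc and scale parameters. A minimal sketch for the same bulb example:

```python
# Sketch: letting scipy handle the z-transformation via loc and scale.
from scipy.stats import norm

p = norm.cdf(1000, loc=900, scale=80)  # P(X <= 1000) for N(mean=900, std=80)
print(p)  # 0.8943502263331446, same as norm.cdf(1.25)
```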

Performing Hypothesis Testing using z-test

Example
The mean height of a random sample of 100 students is 64 inches and the standard deviation is 3 inches. Test the statement that the mean height of the population is 67 inches at a 5% level of significance.

Performing testing

We are given a sample with

n = 100, sample mean x̄ = 64 inches, sample standard deviation s = 3 inches.

We need to test the null hypothesis about the population mean,

H0: μ = 67 inches,
HA: μ ≠ 67 inches,
Significance level α = 5%

Now, let's z-transform x̄ = 64 inches. Unlike the previous example, we have the standard deviation of a sample, so we convert it into the standard error: 3/√100 = 3/10 = 0.3. So, z-value = |64 − 67| / 0.3 = 10.

Now, let's convert the alpha value into a z-value. Since this is a two-tailed test, we split the alpha in half: 0.05/2 = 0.025. We then look up the z-values at the left tail (2.5%) and the right tail (97.5%), which are ±1.96.

[Figures: z-table lookups for the 2.5% and 97.5% tails, giving z = ±1.96]

Now, the calculated z-value (10) does not fall in the range −1.96 < z < 1.96; it lands in the rejection region. Since it exceeds the critical z-value, we reject the null hypothesis and accept the alternate hypothesis.
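The whole height example can be sketched in a few lines of Python (the variable names are my own, not from the article):

```python
# Sketch of the height example: n = 100, sample mean 64, sample std 3,
# hypothesized population mean 67, two-tailed test at alpha = 0.05.
import math
from scipy.stats import norm

n, x_bar, s, mu0, alpha = 100, 64, 3, 67, 0.05
se = s / math.sqrt(n)             # standard error = 0.3
z = abs(x_bar - mu0) / se         # test statistic = 10.0
z_crit = norm.ppf(1 - alpha / 2)  # two-tailed critical value ~ 1.96
print(z, round(z_crit, 2), z > z_crit)  # 10.0 1.96 True -> reject the null
```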

So this is all about the p-test and z-test. In the next part, I will cover the t-test, the ANOVA test, and the chi-squared test. Please refer to my next part!

Thank you for reading my article !!!

--

❖ Writes about Data Science ❖ MS CS @UTDallas ❖ Ex Researcher @ISRO , @IITDelhi, @MillitaryCollege-AI COE ❖ 3+ Publications ❖ linktr.ee/prituldave