Concepts of Hypothesis Testing

Sriram Chunduri
Published in AlmaBetter
17 min read · Mar 23, 2021

This blog is about the concepts of Hypothesis testing.

Let’s start with the definition of Hypothesis testing with an example.

Hypothesis testing :

Hypothesis testing is really a systematic way to test claims or ideas about a group or population. Hypothesis testing or significance testing is a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample. In this method, we test some hypotheses by determining the likelihood that a sample statistic could have been selected, if the hypothesis regarding the population parameter were true.

It is a statistical method that is used in making statistical decisions using experimental data. A hypothesis is basically an assumption that we make about a population parameter.

A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. When we say that a finding is statistically significant, it’s thanks to a hypothesis test.

Example :

Suppose we read an article stating that children in the United States watch an average of 3 hours of TV per week. To test whether this claim is true, we record the time (in hours) that a group of 20 American children (the sample), among all children in the United States (the population), watch TV. The mean we measure for these 20 children is a sample mean. We can then compare the sample mean we select to the population mean stated in the article.

Method of Hypothesis Testing:

  • To begin, we identify a hypothesis or claim that we feel should be tested. For example, we might want to test the claim that the mean number of hours that children in the United States watch TV is 3 hours.
  • We select a criterion upon which we decide whether the claim being tested is true or not. For example, the claim is that children watch 3 hours of TV per week. Most samples we select should have a mean close to or equal to 3 hours if the claim we are testing is true. So at what point do we decide that the discrepancy between the sample mean and 3 is so big that the claim we are testing is likely not true? We answer this question in this step of hypothesis testing.
  • Select a random sample from the population and measure the sample mean. For example, we could select 20 children and measure the mean time (in hours) that they watch TV per week.
  • Compare what we observe in the sample to what we expect to observe if the claim we are testing is true. We expect the sample mean to be around 3 hours. If the discrepancy between the sample mean and the claimed population mean is small, then we will likely decide to retain the claim we are testing. If the discrepancy is too large, then we will likely decide to reject the claim as being not true.

Four Steps to Hypothesis Testing:

The goal of hypothesis testing is to determine how likely it is that a value claimed for a population parameter, such as the mean, is true.

Step 1: State the hypotheses.

Step 2: Set the criteria for a decision.

Step 3: Compute the test statistic.

Step 4: Make a decision.

Step 1: State the hypotheses :

We begin by stating the value of a population mean in a null hypothesis, which we presume is true. For the children watching TV example, we state the null hypothesis that children in the United States watch an average of 3 hours of TV per week. In hypothesis testing, we start by assuming that the hypothesis or claim we are testing is true. This is stated in the null hypothesis. The basis of the decision is to determine whether this assumption is likely to be true.

The null hypothesis (H0), stated as the null, is a statement about a population parameter, such as the population mean, that is assumed to be true. The null hypothesis is a starting point. We will test whether the value stated in the null hypothesis is likely to be true.

Keep in mind that the only reason we are testing the null hypothesis is that we think it is wrong. We state what we think is wrong about the null hypothesis in an alternative hypothesis. For the children watching TV example, we may have reason to believe that children watch more than (>) or less than (<) 3 hours of TV per week. When we are uncertain of the direction, we can state that the value in the null hypothesis is not equal to (≠) 3 hours.

An alternative hypothesis (Ha) is a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis.

Step 2: Set the criteria for a decision :

To set the criteria for a decision, we state the level of significance for a test. In hypothesis testing, we collect data to show that the null hypothesis is not true, based on the likelihood of selecting a sample mean from a population (the likelihood is the criterion). The likelihood or level of significance is typically set at 5% in behavioral research studies.

Level of significance, or significance level, refers to a criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis. The criterion is based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true. In behavioral science, the criterion or level of significance is typically set at 5%. When the probability of obtaining a sample mean is less than 5% if the null hypothesis were true, then we reject the value stated in the null hypothesis.

Step 3: Compute the test statistic :

Suppose we measure a sample mean equal to 4 hours per week that children watch TV. To make a decision, we need to evaluate how likely this sample outcome is if the population mean stated by the null hypothesis (3 hours per week) is true. We use a test statistic to determine this likelihood. Specifically, a test statistic tells us how far, or how many standard deviations, a sample mean is from the population mean. The larger the value of the test statistic, the further the distance, or the number of standard deviations, a sample mean is from the population mean stated in the null hypothesis.

The test statistic is a mathematical formula that allows researchers to determine the likelihood of obtaining sample outcomes if the null hypothesis were true. The value of the test statistic is used to make a decision regarding the null hypothesis.

Step 4: Make a decision :

We use the value of the test statistic to make a decision about the null hypothesis. The decision is based on the probability of obtaining a sample mean, given that the value stated in the null hypothesis is true. If the probability of obtaining a sample mean is less than 5% when the null hypothesis is true, then the decision is to reject the null hypothesis. If the probability of obtaining a sample mean is greater than 5% when the null hypothesis is true, then the decision is to retain the null hypothesis. In sum, there are two decisions a researcher can make:

  1. Reject the null hypothesis. The sample mean is associated with a low probability of occurrence when the null hypothesis is true.
  2. Fail to Reject the null hypothesis. The sample mean is associated with a high probability of occurrence when the null hypothesis is true.

The probability of obtaining a sample mean, given that the value stated in the null hypothesis is true, is stated by the p-value. The p-value is a probability: It varies between 0 and 1 and can never be negative. In Step 2, we stated the criterion or probability of obtaining a sample mean at which point we will decide to reject the value stated in the null hypothesis, which is typically set at 5% in behavioral research. To make a decision, we compare the p-value to the criterion we set in Step 2.

A p-value is a probability of obtaining a sample outcome, given that the value stated in the null hypothesis is true. The p-value for obtaining a sample outcome is compared to the level of significance.

The p-value (or P-value, or probability value) is the probability of getting a value of the test statistic that is at least as extreme as the one representing the sample data, assuming that the null hypothesis is true. The null hypothesis is rejected if the p-value is very small, such as 0.05 or less.

Significance, or statistical significance, describes a decision made concerning a value stated in the null hypothesis. When the null hypothesis is rejected, we reach significance. When the null hypothesis is retained, we fail to reach significance.

When the p-value is less than 5% (p < 0.05), we reject the null hypothesis. We will refer to p < 0.05 as the criterion for deciding to reject the null hypothesis, although note that when p = 0.05, the decision is also to reject the null hypothesis. When the p-value is greater than 5% (p > 0.05), we retain the null hypothesis. The decision to reject or retain the null hypothesis is called significance. When the p-value is less than 0.05, we reach significance; the decision is to reject the null hypothesis. When the p-value is greater than 0.05, we fail to reach significance; the decision is to retain the null hypothesis.
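The decision rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the example p-values are hypothetical, and the rule treats p = 0.05 as a rejection, following the convention stated in this article.

```python
def decide(p_value, alpha=0.05):
    """Return the Step 4 decision for a p-value and a significance level.

    Hypothetical helper for illustration only.
    """
    # Following the article's convention, p exactly equal to alpha also rejects.
    if p_value <= alpha:
        return "reject H0"
    return "retain H0"


# Illustrative p-values, not taken from any real study.
print(decide(0.03))  # below the 5% criterion
print(decide(0.20))  # above the 5% criterion
```

Keeping `alpha` as a parameter makes it explicit that the criterion is chosen in Step 2, before the p-value is known.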

Types of test

Non-Directional, Two-Tailed Hypothesis Tests (Ha : ≠) :

We will understand this by using the z test for a nondirectional, or two-tailed test, where the alternative hypothesis is stated as not equal to (≠) the null hypothesis. For this test, we will place the level of significance in both tails of the sampling distribution.

Nondirectional tests, or two-tailed tests, are hypothesis tests where the alternative hypothesis is stated as not equal to (≠). The researcher is interested in any alternative from the null hypothesis.

Consider an example. It was reported that the population mean score on the quantitative portion of a Graduate Examination for students taking the exam between 2014 and 2017 was 558 ± 139 (μ ± σ). Suppose we select a sample of 100 participants (n = 100). We record a sample mean equal to 585 (M = 585). Compute the one–independent sample z test for whether or not we will retain the null hypothesis (μ = 558) at a 0.05 level of significance (α = 0.05).

Step 1: State the hypotheses

H0 : μ = 558 Mean test scores are equal to 558 in the population.

Ha : μ ≠ 558 Mean test scores are not equal to 558 in the population.

Step 2: Set the criteria for a decision -

The level of significance is 0.05, which makes the alpha level α = 0.05. To locate the probability of obtaining a sample mean from a given population, we use the standard normal distribution. We will locate the z scores in a standard normal distribution that are the cutoffs, or critical values, for the sample mean values with less than a 5% probability of occurrence if the value stated in the null (μ = 558) is true.

What is a critical value?

A critical value is a cutoff value that defines the boundaries beyond which less than 5% of sample means can be obtained if the null hypothesis is true. Sample means obtained beyond a critical value will result in a decision to reject the null hypothesis.

In a nondirectional two-tailed test, we divide the alpha value in half so that an equal proportion of the area is placed in the upper and lower tail.

Splitting α in half: α/2 = 0.05/2 = 0.025 in each tail.

The regions beyond the critical values are called the rejection regions. If the value of the test statistic falls in these regions, then the decision is to reject the null hypothesis; otherwise, we retain the null hypothesis.

Rejection region or Critical region:

The rejection region is the region beyond a critical value in a hypothesis test. When the value of a test statistic is in the rejection region, we decide to reject the null hypothesis; otherwise, we retain the null hypothesis.
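As a rough sketch of how the two-tailed critical values and the rejection region can be obtained without a printed table, the snippet below uses `statistics.NormalDist` from the Python standard library. The function names are hypothetical and chosen only for this illustration.

```python
from statistics import NormalDist


def two_tailed_critical_values(alpha=0.05):
    """Return the (lower, upper) z cutoffs with alpha/2 in each tail."""
    # inv_cdf gives the z score below which the stated proportion of area lies.
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper


def in_rejection_region(z, alpha=0.05):
    """True when z lies beyond either critical value."""
    lower, upper = two_tailed_critical_values(alpha)
    return z < lower or z > upper


lower, upper = two_tailed_critical_values(0.05)
print(f"critical values: {lower:.2f}, {upper:.2f}")  # critical values: -1.96, 1.96
```

With α = 0.05 split into 0.025 per tail, the cutoffs come out at z = ±1.96, so any obtained value between these two cutoffs falls outside the rejection region.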

Step 3: Compute the test statistic

In this step, we will compute a test statistic to determine whether the sample mean we selected is beyond or within the critical values.

The test statistic for a one–independent sample z test is called the z statistic. The z statistic converts any sampling distribution into a standard normal distribution. The z statistic is therefore a z transformation. The solution of the formula gives the number of standard deviations, or z-scores, that a sample mean falls above or below the population mean stated in the null hypothesis. We can then compare the value of the z statistic, called the obtained value, to the critical values we determined in Step 2. The z statistic formula is the sample mean minus the population mean stated in the null hypothesis, divided by the standard error of the mean:

$$z_{obt} = \frac{M - \mu}{\sigma_M}, \qquad \sigma_M = \frac{\sigma}{\sqrt{n}}$$

The z statistic is an inferential statistic used to determine the number of standard deviations in a standard normal distribution that a sample mean deviates from the population mean stated in the null hypothesis.

The obtained value is the value of a test statistic. This value is compared to the critical value(s) of a hypothesis test to make a decision. When the obtained value exceeds a critical value, we decide to reject the null hypothesis; otherwise, we retain the null hypothesis.

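Compute the z statistic. The standard error of the mean is $\sigma_M = 139/\sqrt{100} = 13.9$, so

$$Z = \frac{585 - 558}{13.9} = \frac{27}{13.9} \approx 1.94$$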
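The same calculation can be sketched in Python. The function name `z_statistic` and its parameter names are hypothetical; the numbers are the ones given in the example (M = 585, μ = 558, σ = 139, n = 100).

```python
from math import sqrt


def z_statistic(sample_mean, null_mean, sigma, n):
    """One-sample z statistic: distance of the sample mean from the null mean,
    measured in standard errors."""
    standard_error = sigma / sqrt(n)  # 139 / 10 = 13.9 in the example
    return (sample_mean - null_mean) / standard_error


z = z_statistic(sample_mean=585, null_mean=558, sigma=139, n=100)
print(round(z, 2))  # 1.94
```

Note that this form of the test assumes the population standard deviation is known, as it is in the example.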

Step 4: Make a decision

To make a decision, we compare the obtained value to the critical values. We reject the null hypothesis if the obtained value exceeds a critical value.

Here, the obtained value (Z = 1.94) is less than the upper critical value (z = 1.96); it does not fall in the rejection region. The decision is to retain the null hypothesis.

The probability of obtaining Z = 1.94 is stated by the p-value.

Look for a z score equal to 1.94 in column A of the unit normal table, then locate the probability toward the tail in column C. The value is .0262. Finally, multiply the value given in column C by the number of tails for alpha. Since this is a two-tailed test, we multiply .0262 by 2: p = (.0262) × 2 tails = .0524.
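The table lookup can also be reproduced with the standard normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `two_tailed_p_value` is hypothetical. It uses the rounded value z = 1.94 so that the result matches the table.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Area beyond |z| in one tail of the standard normal, doubled for two tails."""
    upper_tail = 1 - NormalDist().cdf(abs(z))  # the table's "column C" value
    return 2 * upper_tail


p = two_tailed_p_value(1.94)
print(f"{p:.4f}")  # 0.0524
```

Taking the absolute value of z means the same function works whether the sample mean lies above or below the value stated in the null hypothesis.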

Since p is greater than 5%, we decide to retain the null hypothesis. We do not have enough evidence to conclude that the mean score on the test in this population differs from 558 (the value stated in the null hypothesis).

Directional, Upper-Tail (Right Tailed) Critical Hypothesis Tests (Ha : >)

Directional tests, or one-tailed tests, are hypothesis tests where the alternative hypothesis is stated as greater than (>) or less than (<) a value stated in the null hypothesis. Hence, the researcher is interested in a specific alternative from the null hypothesis.

From the same example: it was reported that the population mean score on the quantitative portion of a Graduate Examination for students taking the exam between 2014 and 2017 was 558 ± 139 (μ ± σ). Suppose we select a sample of 100 participants (n = 100) from an elite school. We hypothesize that students at this elite school will score higher than the general population. We record a sample mean equal to 585 (M = 585). Compute the one–independent sample z test at the α = 0.05 level of significance.

Step 1: State the hypotheses :

The population mean is 558, and we are testing whether the alternative is greater than (>) this value:

H0:μ=558 Mean test scores are equal to 558 in the population of students at the elite school.

Ha:μ>558 Mean test scores are greater than 558 in the population of students at the elite school.

Step 2: Set the criteria for a decision :

The level of significance is 0.05, which makes the alpha level α = 0.05. The z-score associated with this probability is between z = 1.64 and z = 1.65. The average of these z-scores is z = 1.645. This is the critical value or cutoff for the rejection region. For this test, we place all of alpha in the upper tail of the standard normal distribution.

Step 3: Compute the test statistic :

Step 2 sets the stage for making a decision because the criterion is set. The probability is less than 5% that we will obtain a sample mean that is at least 1.645 standard deviations above the value of the population mean stated in the null hypothesis. In this step, we will compute a test statistic to determine whether or not the sample mean we selected is beyond the critical value we stated in Step 2.

We changed only the location of the rejection region in Step 2.

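Compute the z statistic using the same values as before:

$$Z = \frac{M - \mu}{\sigma/\sqrt{n}} = \frac{585 - 558}{139/\sqrt{100}} = \frac{27}{13.9} \approx 1.94$$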

Step 4: Make a decision.

To make a decision, we compare the obtained value to the critical value. We reject the null hypothesis if the obtained value exceeds the critical value. The obtained value (Z = 1.94) is greater than the critical value (z = 1.645); it falls in the rejection region. The decision is to reject the null hypothesis. The p-value for this test is 0.0262 (p = 0.0262).

We do not double the p-value for one-tailed tests.

We found that if the null hypothesis were true, the probability of selecting a sample mean at least this large from this population is p = 0.0262. The criterion we set in Step 2 was that this probability must be less than 5%. Since p is less than 5%, we decide to reject the null hypothesis. We decide that the mean score on the test in this population is greater than 558, the value stated in the null hypothesis.
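The whole upper-tail test can be put together in one short sketch. The function name `upper_tail_z_test` is hypothetical, and the inputs are the example's values. Because the code does not round z to 1.94 before looking up the tail area, the p-value it reports is slightly smaller than the table value of 0.0262 (roughly 0.026).

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """One-tailed (Ha: mu > null_mean) z test.

    Returns (z, critical value, p-value, reject?).
    """
    std_normal = NormalDist()
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    critical = std_normal.inv_cdf(1 - alpha)  # all of alpha in the upper tail
    p_value = 1 - std_normal.cdf(z)           # one tail only, so no doubling
    return z, critical, p_value, z > critical


z, critical, p_value, reject = upper_tail_z_test(585, 558, 139, 100)
print(f"z = {z:.2f}, critical = {critical:.3f}, p = {p_value:.4f}, reject = {reject}")
```

Comparing z with the critical value and comparing p with α are two views of the same decision, so the two comparisons agree.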

Directional, Lower-Tail (Left Tailed) Critical Hypothesis Tests (Ha : <)

We will use the z test for a directional, or one-tailed test, where the alternative hypothesis is stated as less than (<) the null hypothesis. For a lower-tail critical test, or a less-than statement, we place the level of significance or critical value in the lower tail of the sampling distribution.

From the same example: it was reported that the population mean score on the quantitative portion of a Graduate Examination for students taking the exam between 2014 and 2017 was 558 ± 139 (μ ± σ). Suppose we select a sample of 100 participants (n = 100) from a particular school. We hypothesize that students at this school will score lower than the general population. We record a sample mean equal to 585 (M = 585). Compute the one–independent sample z test at the α = 0.05 level of significance.

Step 1: State the hypotheses :

The population mean is 558, and we are testing whether the alternative is less than (<) this value:

H0 : μ = 558 Mean test scores are equal to 558 in the population at this school.

Ha : μ < 558 Mean test scores are less than 558 in the population at this school.

Step 2: Set the criteria for a decision :

The level of significance is 0.05, which makes the alpha level α = 0.05. To determine the critical value for a lower-tail critical test, we locate the probability 0.05 toward the tail in column C in the unit normal table. The z-score associated with this probability is again z = 1.645. Since this test is a lower-tail critical test, we place the critical value the same distance below the mean: the critical value for this test is z = −1.645. All of alpha is placed in the lower tail of the distribution beyond the critical value.

Step 3: Compute the test statistic :

Step 2 sets the stage for making a decision because the criterion is set. The probability is less than 5% that we will obtain a sample mean that is at least 1.645 standard deviations below the value of the population mean stated in the null hypothesis. In this step, we will compute a test statistic to determine whether or not the sample mean we selected is beyond the critical value we stated in Step 2.

We changed only the location of the rejection region in Step 2.

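Compute the z statistic using the same values as before:

$$Z = \frac{M - \mu}{\sigma/\sqrt{n}} = \frac{585 - 558}{139/\sqrt{100}} = \frac{27}{13.9} \approx 1.94$$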

Step 4: Make a decision :

To make a decision, we compare the obtained value to the critical value. For a lower-tail critical test, we reject the null hypothesis if the obtained value falls below the critical value (z = −1.645). The obtained value (Z = +1.94) does not fall below the critical value. Instead, the value we obtained is located in the opposite tail. The decision is to retain the null hypothesis.
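For comparison with the upper-tail case, here is a minimal sketch of the lower-tail version. The function name `lower_tail_z_test` is hypothetical. The only changes are the sign of the critical value and the tail used for the p-value.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """One-tailed (Ha: mu < null_mean) z test.

    Returns (z, critical value, p-value, reject?).
    """
    std_normal = NormalDist()
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    critical = std_normal.inv_cdf(alpha)  # negative cutoff in the lower tail
    p_value = std_normal.cdf(z)           # area at or below the obtained z
    return z, critical, p_value, z < critical


z, critical, p_value, reject = lower_tail_z_test(585, 558, 139, 100)
print(f"z = {z:.2f}, critical = {critical:.3f}, p = {p_value:.3f}, reject = {reject}")
```

Because the obtained z is positive while the rejection region is in the lower tail, the p-value here is large (above 0.95), which matches the decision to retain the null hypothesis.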

Making a Decision: Types of Error

Type I error

When we reject the null hypothesis when the null hypothesis is true.

Type II error

When we fail to reject the null hypothesis when the null hypothesis is false.

The “reality”, or truth, about the null hypothesis, is unknown and therefore we do not know if we have made the correct decision or if we committed an error. We can, however, define the likelihood of these events.

α (‘Alpha’)

The probability of committing a Type I error. Also known as the significance level.

β (‘Beta’)

The probability of committing a Type II error.
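One way to see what α means in practice is a small simulation: if the null hypothesis is true and we repeat the two-tailed z test many times at α = 0.05, we should wrongly reject in roughly 5% of the repetitions. The sketch below assumes normally distributed scores and reuses the example's μ = 558, σ = 139, and n = 100; the function name, the number of trials, and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_error_rate(mu0=558, sigma=139, n=100, alpha=0.05,
                               trials=4000, seed=0):
    """Fraction of samples drawn under a TRUE null that a two-tailed z test rejects."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null hypothesis really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if abs(z) > critical:
            rejections += 1  # rejecting a true null: a Type I error
    return rejections / trials


rate = simulate_type_i_error_rate()
print(f"observed Type I error rate: {rate:.3f}")
```

The observed rate will not be exactly 0.05 because of sampling variability, but it should land close to the chosen significance level.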

Power of Hypothesis testing:

Power is the probability that the null hypothesis is rejected given that it is false (i.e., 1 − β).

α and β are probabilities of committing an error, so we want these values to be low. However, for a fixed sample size we cannot decrease both: as α decreases, β increases.

Type I error is also thought of as the event that we reject the null hypothesis GIVEN the null is true. In other words, Type I error is a conditional event and α is a conditional probability. The same idea applies to Type II error and β.
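The trade-off between α and β can be illustrated numerically for the upper-tail z test. Computing β requires assuming a specific true population mean, which is never known in practice; the sketch below assumes, purely for illustration, that the true mean is 585 while the null states 558 (with σ = 139 and n = 100 as in the earlier example). The function name `upper_tail_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_power(true_mean, null_mean, sigma, n, alpha):
    """Power of an upper-tail z test: P(reject H0 | the mean really is true_mean)."""
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    critical = std_normal.inv_cdf(1 - alpha)
    # Under the assumed true mean, the z statistic is Normal(shift, 1).
    shift = (true_mean - null_mean) / standard_error
    return 1 - std_normal.cdf(critical - shift)


# Hypothetical true mean of 585; smaller alpha gives larger beta.
for alpha in (0.10, 0.05, 0.01):
    power = upper_tail_power(585, 558, 139, 100, alpha)
    print(f"alpha = {alpha:.2f}  beta = {1 - power:.3f}  power = {power:.3f}")
```

Running the loop shows β growing as α shrinks, with the sample size held fixed. Increasing n is the usual way to reduce β without raising α.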

Let’s take an example :

A man, Mr. Orangejuice, goes to trial and is tried for the murder of his ex-wife. He is either guilty or not guilty.

H0 : Mr. Orangejuice is innocent

Ha : Mr. Orangejuice is guilty

Let’s Interpret Type I error, α, Type II error, β

Type I Error:

Type I error is committed if we reject the null hypothesis when it is true. In other words, when the man is innocent but found guilty.

α:

It is the probability of a Type I error, or in other words, it is the probability that Mr. Orangejuice is found guilty given that he is innocent.

Type II Error:

Type II error is committed if we fail to reject the null hypothesis when it is false. In other words, when the man is guilty but found not guilty.

β:

It is the probability of a Type II error, or in other words, it is the probability that Mr. Orangejuice is found not guilty given that he is guilty. As you can see here, the Type I error (putting an innocent man in jail) is the more serious error. Ethically, it is more serious to put an innocent man in jail than to let a guilty man go free. So to minimize the probability of a Type I error we would choose a smaller significance level.


Sriram Chunduri
AlmaBetter

Data enthusiast, passionate about solving business use cases with ML methodologies.