What are the p-value and test statistic in statistical testing? A visual guide.

Ramez Shendy
𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨
9 min read · Sep 29, 2023
A histogram distribution of two similar data groups (Left), and the results of a two-tailed t-test to measure the statistical significance of the difference between these two groups (Right).

Introduction and definitions

What is statistical testing?

Statistical testing is a fundamental principle in statistics, playing a key role in both scientific research and industrial applications. It serves as the basis for making informed decisions and drawing meaningful conclusions from data. It involves subjecting data to rigorous analysis, allowing us to assess hypotheses, validate theories, and make predictions with a quantifiable degree of certainty. Statistical testing is the means by which the statistical significance of an experimental result is determined, distinguishing real effects from mere chance. In industry, it empowers organizations to optimize processes and make data-driven decisions that directly affect efficiency and profitability.

What is statistical testing for?

One crucial application is comparing statistical measures, such as the mean, between two data groups, a process known as hypothesis testing. By subjecting data to tests like the t-test or analysis of variance (ANOVA), we can determine whether a statistically significant difference exists between the groups. Statistical testing is also essential for assessing the goodness-of-fit of data to specific probability distributions, such as the normal distribution. Beyond comparing means and assessing distributions, statistical testing offers a wide array of applications. Here are a few more examples:

  1. A/B Testing: In online marketing and product development, A/B testing involves comparing two versions of a web page or app to determine which one performs better. Statistical tests are used to analyze user behavior data and ascertain if there’s a statistically significant difference in user engagement, conversion rates, or other key metrics between the two versions.
  2. Chi-Square Test: This test is employed to analyze categorical data and determine if there’s an association or independence between two or more categorical variables.
  3. Regression Analysis: Statistical tests, such as the F-test and t-test, are used in regression analysis to assess the significance of predictors in a regression model. This helps identify which variables have a statistically significant impact on the dependent variable, allowing for more robust modeling and prediction.
  4. Quality Control: In manufacturing and quality control processes, statistical testing is crucial for ensuring product quality and consistency. Techniques like control charts and hypothesis testing are used to monitor production processes and detect deviations that may indicate defects or variations in product quality.
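To make the chi-square example (item 2) concrete, here is a minimal sketch of a chi-square test of independence using SciPy on a hypothetical 2x2 contingency table (the counts and variable names are invented for illustration):

```python
import numpy as np
import scipy.stats as stats

## Hypothetical contingency table: rows = customer segment, columns = preferred product
observed = np.array([[30, 20],
                     [25, 45]])

## Chi-square test of independence (SciPy applies Yates' continuity correction for 2x2 tables)
chi2, p, dof, expected = stats.chi2_contingency(observed)

print(f"chi2 = {chi2:.3f}, p-value = {p:.4f}, dof = {dof}")
if p < 0.05:
    print("Evidence of an association between the two variables.")
else:
    print("No significant association detected.")
```

A small p-value here indicates that the two categorical variables are unlikely to be independent.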

What are the test statistic and p-value in statistical testing?

The test statistic is a numerical summary of sample data that measures the discrepancy between the observed data and what is expected under the null hypothesis, while the p-value provides a quantitative measure of the strength of evidence against the null hypothesis based on the test statistic. These two components are central to hypothesis testing, helping researchers and analysts make informed decisions about whether to accept or reject a null hypothesis in favor of an alternative hypothesis.
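As a minimal numeric illustration of these two definitions before the fuller walkthrough below, consider a hypothetical one-sample z-test (all numbers here are invented): the test statistic measures how many standard errors the sample mean lies from the null-hypothesis mean, and the p-value is the probability of a statistic at least that extreme under the null.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=10.5, scale=2, size=100)  ## data drawn with a true mean of 10.5

mu0, sigma = 10, 2  ## null-hypothesis mean and (assumed known) population SD

## Test statistic: distance of the sample mean from mu0, in standard-error units
z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))

## Two-sided p-value: probability of a statistic at least as extreme as |z|
p = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p:.4f}")
```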

Example and a visual guide (Python)

In the following example, we will use visual aids to help understand the concepts of test statistic and p-value more intuitively. Grasping these ideas becomes easier when one can see them in action and learn through practical examples, making them less abstract and more comprehensible.

T-test

The t-test is the statistical method used in this example. It is a widely used test that helps determine whether there is a significant difference between the means of two groups or samples.

t-test formula for two independent samples
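The referenced formula, in its standard pooled-variance form for two independent samples of sizes n1 and n2, is:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
```

where the x-bars and the s-squares are the sample means and variances, and s_p is the pooled standard deviation.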

T-test Assumptions:

  1. Random Sampling: The data should be collected through random sampling to ensure its representativeness of the population.
  2. Normality: The data within each group or sample should follow a roughly normal distribution.
  3. Homogeneity of Variance: The variances of the two groups being compared should be roughly equal.
  4. Independence: Observations within each group should be independent of one another. The values in one group should not depend on or be influenced by the values in the other group.

Violations of any of these assumptions may affect the reliability of the test.
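Assumptions 2 and 3 can be checked directly in SciPy. A minimal sketch, using freshly generated data with a fixed seed (so the numbers will differ from the article's): the Shapiro-Wilk test probes normality within each group, and Levene's test probes equality of variances; in both cases a small p-value signals a violated assumption.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
group1 = rng.normal(loc=10, scale=2, size=500)
group2 = rng.normal(loc=10, scale=2, size=500)

## Normality (assumption 2): Shapiro-Wilk per group; a small p-value suggests non-normal data
_, p_norm1 = stats.shapiro(group1)
_, p_norm2 = stats.shapiro(group2)

## Homogeneity of variance (assumption 3): Levene's test; a small p-value suggests unequal variances
_, p_var = stats.levene(group1, group2)

print(f"Shapiro p-values: {p_norm1:.3f}, {p_norm2:.3f}")
print(f"Levene p-value: {p_var:.3f}")
```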

Now that we’ve identified the t-test as our chosen statistical method and have a clear understanding of its underlying assumptions, let’s proceed to delve into a practical visual example using Python.

Python imports:

import numpy as np ## for creating the samples
import scipy.stats as stats ## for the statistical testing
import matplotlib.pyplot as plt ## for visualization

Generating the two samples:

n = 500 ## number of samples
group1 = np.random.normal(loc=10, scale=2, size=n)
group2 = np.random.normal(loc=10, scale=2, size=n)

Consider these two generated samples as price data for a specific vegetable, in Euros, collected across various cities and local markets in two different countries (two groups).

These three lines of Python code can be understood as follows:

  • n = 500 sets the number of data samples generated for each group to 500.
  • The np.random.normal function draws random (t-test assumption 1), normally distributed (assumption 2) values, with loc setting each group’s mean and an equal scale (assumption 3) setting each group’s standard deviation.
  • The values are drawn independently, both within and between groups (assumption 4).

Having satisfied all four assumptions of the t-test, we can confidently utilize this statistical method, assured that the outcomes will be reliable and accurate.

Visualizing the Samples:

plt.hist(group1, bins=15, alpha=0.5, label='Group 1')
plt.hist(group2, bins=15, alpha=0.5, label='Group 2')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Distributions of Two Groups')
plt.legend()
Distributions of the two generated samples.

The plot shows that both sample distributions are approximately normal and have roughly equal means, as expected: we generated them randomly with the same mean (10) and standard deviation (2).

Perform independent t-test (two tailed):

Null Hypothesis: The two independent samples have identical average (expected) values.

alpha (the significance level) = 0.05. Setting alpha to 0.05 (or 5%) means you are willing to accept a 5% chance of making a Type I error: incorrectly rejecting a true null hypothesis.

t_statistic, p_value = stats.ttest_ind(group1, group2)

test statistic = -0.450
p-value = 0.6527

With p-value > alpha, we fail to reject the null hypothesis that the two samples have identical means. (Strictly speaking, we never “accept” the null hypothesis; we simply lack evidence against it.)
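A side note: if the equal-variance assumption were in doubt, SciPy’s ttest_ind can run Welch’s t-test instead, which drops that assumption, via equal_var=False. A sketch with deliberately unequal spreads (parameters invented for illustration):

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)
group1 = rng.normal(loc=10, scale=2, size=500)
group2 = rng.normal(loc=10, scale=4, size=500)  ## deliberately larger spread

## Welch's t-test: does not assume equal variances
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)
print(f"Welch t = {t_welch:.3f}, p = {p_welch:.4f}")
```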

But what does this actually mean?

## Generate the t-distribution with appropriate degrees of freedom
df = len(group1) + len(group2) - 2
x = np.linspace(-5, 5, 1000) # Range of values for the t-distribution
t_dist = stats.t.pdf(x, df)

## Plot the t-distribution
plt.plot(x, t_dist, label='t-distribution')
plt.axvline(x=t_statistic, color='red', linestyle='--', label='t-statistic')

## Shade the p-value: the area under the curve in both tails beyond |t|
plt.fill_between(x, 0, t_dist, where=np.abs(x) >= np.abs(t_statistic), color='red', alpha=0.3,
                 label='p-value')

## Plot annotations
plt.xlabel('t-value')
plt.ylabel('Probability Density')
plt.title('t-distribution with t-statistic and p-value')
plt.legend()
The t-distribution along with the t-test test statistic and the p-value as the area under the curve.

The above code snippet does the following:

  1. Generates the probability density function of the t-distribution, the statistical distribution used in the t-test.
  2. Plots a vertical red line at the test statistic obtained above (test statistic = -0.450).
  3. Shades the region under the curve beyond the observed test statistic in both tails. Because this is a two-tailed test and the distribution is symmetric, the p-value equals twice the single-tail area.

The total area of the shaded region in the above plot is the p-value (p-value = 0.6527).

Keep in mind that the p-value is a probability, and its values always fall within the range of 0 to 1.
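As a sanity check, the p-value reported by ttest_ind can be reproduced from the test statistic as twice the tail area of the t-distribution beyond |t| (seeded data is used here, so the numbers differ from those above):

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
group1 = rng.normal(loc=10, scale=2, size=500)
group2 = rng.normal(loc=10, scale=2, size=500)

t_statistic, p_value = stats.ttest_ind(group1, group2)
df = len(group1) + len(group2) - 2

## Two-tailed p-value: twice the upper-tail area beyond |t|
p_manual = 2 * stats.t.sf(abs(t_statistic), df)
print(f"scipy p = {p_value:.6f}, manual p = {p_manual:.6f}")
```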

But what is the difference between one-tailed and two-tailed tests?

One-tailed and two-tailed tests represent distinct strategies in hypothesis testing, and their selection depends on the particular research question and whether there is a specific expected direction for the effect under investigation.

Two-Tailed Test:

  1. Non-Directional Hypothesis: In a two-tailed test, you don’t make a specific prediction about the direction of the effect. You are interested in whether a parameter is different from a certain value, and you consider both greater and less than alternatives. In our example, this implies that we are testing whether there is a difference in the average prices between the two distinct countries. We are not anticipating that one country’s prices will be lower or higher than the other, instead, we are investigating the presence of any difference, regardless of its direction.
  2. Hypotheses: Null Hypothesis (H0): States that there is no effect or no difference. Alternative Hypothesis (Ha): Typically states that there is an effect or difference, without specifying the direction.
  3. P-Value Interpretation: The p-value for a two-tailed test represents the probability of observing a result as extreme as the one in the sample, in either direction (greater or less than), assuming the null hypothesis is true. That is why the p-value is doubled in our case.

One-Tailed Test:

  1. Directional Hypothesis: In a one-tailed test, you have a specific expectation about the direction of the effect or difference you’re testing. You are interested in whether a parameter is greater than or less than a certain value, but not both.
  2. Hypotheses: Null Hypothesis (H0): Typically states that there is no effect or no difference. Alternative Hypothesis (Ha): Specifies the direction of the effect and that it exists.
  3. P-Value Interpretation: The p-value for a one-tailed test represents the probability of observing a result as extreme as the one in the sample, in the specified direction, assuming the null hypothesis is true.
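In newer versions of SciPy (1.6+), ttest_ind accepts an alternative parameter for one-tailed tests. A sketch with a deliberately lowered first mean (parameters invented): when the observed statistic falls in the hypothesized direction, the one-tailed p-value is half the two-tailed one.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
group1 = rng.normal(loc=9.5, scale=2, size=500)
group2 = rng.normal(loc=10, scale=2, size=500)

## Two-tailed: is there any difference in means?
t_two, p_two = stats.ttest_ind(group1, group2)

## One-tailed: is group1's mean specifically *less* than group2's?
t_one, p_one = stats.ttest_ind(group1, group2, alternative='less')

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```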

Now, let’s modify the mean of the first group and observe how this change impacts the outcomes.

n = 500 ## number of samples
group1 = np.random.normal(loc=9.8, scale=2, size=n)
group2 = np.random.normal(loc=10, scale=2, size=n)

Everything is the same as the previous experiment except for the change of the mean of the first group:

  • mean of the first group = 9.8
  • mean of the second group = 10

test statistic = -1.219
p-value = 0.223
Fail to Reject Null Hypothesis — The difference is not significant

Now, let’s drastically decrease the mean of the first group and observe how this change impacts the outcomes.

  • mean of the first group = 6
  • mean of the second group = 10
n = 500 ## number of samples
group1 = np.random.normal(loc=6, scale=2, size=n)
group2 = np.random.normal(loc=10, scale=2, size=n)

test statistic = -30.304
p-value = 1.473e-143 <<< alpha
Reject Null Hypothesis — Significant difference between two groups
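Significance depends on sample size as well as on the size of the difference. Revisiting the small 9.8 vs 10 gap from above but with a much larger n (a sketch; the exact numbers depend on the random draw), even that small difference becomes highly significant:

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)

n = 50_000  ## much larger sample than before
group1 = rng.normal(loc=9.8, scale=2, size=n)
group2 = rng.normal(loc=10.0, scale=2, size=n)

t_statistic, p_value = stats.ttest_ind(group1, group2)
print(f"t = {t_statistic:.3f}, p = {p_value:.2e}")
```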

In conclusion, the p-value in statistical testing is a critical metric that quantifies the strength of evidence against the null hypothesis. It is derived from the test statistic and corresponds to the cumulative shaded area under the distribution curve associated with a given statistical test: the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in the sample data, assuming the null hypothesis is true. In a two-tailed test, this area covers both tails, accounting for significant effects in either direction. The p-value provides a clear, quantifiable measure of evidence against the null hypothesis; it does not prove a hypothesis, but it helps researchers and analysts make informed decisions based on their data.

Full code can be found on:

👏 Clap for the story if you liked it.

❓ Feel free to drop your question in the comments.
