What is the p-value in Statistics?

Neelam Tyagi
Published in Analytics Steps
Feb 28, 2020

The p-value is used to check whether the result of a hypothesis test is statistically significant. This post explains how to interpret p-values accurately.

Researchers in data science draw on ideas from a family of disciplines, including mathematics, computer science, applied science, and statistics. Among the many statistical terms they use, the p-value is becoming increasingly prominent in data science.

In this blog post, we review p-values step by step: the concept of hypothesis testing, the significance of the p-value, the significance level, and the confidence interval, followed by an example.

Introduction

The American Statistical Association (ASA) has published a “Statement on Statistical Significance and P-Values” with six principles governing the use and interpretation of the p-value.

The six principles, which address misconceptions about and misuse of the p-value, are the following:

  1. P-values can indicate how incompatible the data are with a specified statistical model.
  2. P-values do not measure the probability that the examined hypothesis is true, or the probability that the data were produced by random chance alone.
  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a particular threshold.
  4. Proper inference requires full reporting and transparency.
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

Setting up a hypothesis test

Hypothesis testing is a conventional approach to drawing insights from data, and it is used in essentially every quantitative domain. The usual way to begin is to state a question in terms of the variable you are interested in.

Many people get this wrong by starting off on the wrong foot. Setting up a hypothesis is something of an art (learn another art of gaming through Game Theory), so let’s play it safe with the following steps:

“The p-value is probably the most ubiquitous and at the same time, misunderstood, misinterpreted, and occasionally miscalculated index in all of biomedical research.”

(Steven Goodman)

Step 1: Address the Default Action

The first step in writing a hypothesis is to figure out what you will do regardless of what the data reveal. Everything begins with a concrete action or decision that you are committed to taking, known as the default action.

So start with an action, not a belief, and gather data accordingly.

Step 2: Draft the Alternate Action

Let’s keep the decision in simple binary terms: either do something or do not. Whatever the default action is, its negation is the alternative action. Selecting the default, however, should be the province of a trio of decision-makers (learn how a team of researchers take decisions using an ML technique).

The choice is made with the business in mind, after due reflection. In short, selecting a default action demands business acumen and is the responsibility of those decision-makers.

Step 3: Specify the Null Hypothesis (H0)

The null hypothesis is the assumption that an intervention or operation produces no difference or no effect. Even if the calculations show a numerical difference or effect, it is assumed to be due to random chance variation and therefore not a real one.

Step 4: Draw the Alternative Hypothesis (H1)

Under the alternative hypothesis, the null is assumed not to be true. Statistical methods are then used to compute the probability of what we observe, given that the null hypothesis is correct.

The null and alternative hypotheses are complementary: exactly one of them holds. In addition, a biased picture of the data can create uncertainty about the default action, because the default action is chosen to reflect the business's priorities accurately.

Significance of p-value

A p-value is the probability, assuming the null hypothesis is true, of observing data at or beyond the current observation. To assess what a p-value means, first a 95% confidence interval is constructed, marking where the data are expected to fall if the null hypothesis holds.

Next, a value observed outside this interval is unexpected: it falls within the most extreme 5% of outcomes, which counts as evidence against the null hypothesis. Similarly, if the confidence level is set to 97.5%, a p-value below 2.5% is required to reject the null hypothesis.
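The link between the 95% interval and the p-value can be sketched in a few lines of Python. This is a minimal illustration that assumes the test statistic follows a standard normal distribution under the null hypothesis (an assumption made here, since the post does not name a distribution). The helper `two_sided_p_value` and the sample z-values are hypothetical; `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Assumption: under H0 the test statistic is standard normal (mean 0, sd 1).
null_dist = NormalDist(mu=0.0, sigma=1.0)

def two_sided_p_value(z):
    """Probability, under H0, of a statistic at least as extreme as z."""
    return 2.0 * (1.0 - null_dist.cdf(abs(z)))

# Bounds of the central 95% interval under H0 (roughly -1.96 and +1.96).
lower = null_dist.inv_cdf(0.025)
upper = null_dist.inv_cdf(0.975)

for z in (1.0, 2.5):  # illustrative values, not real data
    inside = lower <= z <= upper
    print(f"z={z}: inside 95% interval={inside}, p={two_sided_p_value(z):.4f}")
```

A statistic inside the interval yields a p-value above 0.05, and one outside it yields a p-value below 0.05, which is the correspondence described above.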

Understanding p-value through an example

Consider a problem: a company wants to know whether its goods are bought more by men or by women, based on a sample of the population. For this, two groups are drawn from the population: a monitor (control) group and an exploratory (experimental) group.

The exploratory group is a random sample from the population on which a test is carried out; it is then compared with the monitor group. Whether the two groups differ is judged with a test statistic, such as the one produced by a t-test.
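As a rough sketch of such a test statistic, the snippet below computes Welch's t statistic for two samples using only the standard library. Welch's variant is chosen here for illustration (the post only says "t-test"), and the function name `welch_t_statistic` and the purchase counts are made up.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    # Standard error of the difference, using each sample's own variance.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Made-up purchase counts, for illustration only.
monitor = [12, 15, 11, 14, 13, 12]
exploratory = [16, 18, 15, 17, 19, 16]

t = welch_t_statistic(exploratory, monitor)
print(f"t = {t:.2f}")
```

Turning the statistic into a p-value requires the Student's t distribution, which the standard library does not provide; a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` is one option if SciPy is available.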

Next comes setting up the hypothesis test. The null hypothesis (H0) states that there is no difference between the two groups, whereas the alternative hypothesis (H1) states that there is a significant difference between them.

Assuming the null hypothesis is true, an analysis is conducted on the exploratory group to check whether there is any effect on the group. To obtain the p-value, we calculate what percentage of the time a difference at least as large as the observed one would appear in the exploratory group by chance alone.
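One way to make "what percentage of the time a difference appears by chance" concrete is a permutation test: shuffle the group labels many times and count how often the shuffled difference is at least as large as the observed one. The sketch below is an illustration of that technique, not necessarily the calculation the example has in mind; the function name `permutation_p_value`, its parameters, and the data are hypothetical.

```python
import random

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Share of random relabelings whose absolute difference in means
    is at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel at random, as H0 allows
        left, right = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Made-up purchase counts, for illustration only.
monitor = [12, 15, 11, 14, 13, 12]
exploratory = [16, 18, 15, 17, 19, 16]

p = permutation_p_value(exploratory, monitor)
print(f"estimated p-value: {p:.4f}")
```

A small estimate means that random relabeling rarely reproduces a gap as large as the one observed, so chance alone is an unlikely explanation.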

“A p-value doesn’t *prove* anything. It’s simply a way to use surprise as a basis for making a reasonable decision.”

Cassie Kozyrkov

The p-value is used here to weigh the null hypothesis against the alternative. It is a number between 0 and 1, sometimes expressed as a percentage, that serves as probabilistic evidence for the decision.

If the analysis yields a p-value greater than 0.05, we fail to reject the null hypothesis: the analysis does not show a difference between the two groups. Here 0.05 is the level of significance (α).

In contrast, if the p-value is less than 0.05, the null hypothesis can be rejected. A value near or equal to 0.05 (or the chosen level of significance) is borderline, and the researchers must make the call.
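The decision rule can be written as a small function. This is only a sketch: the name `decide` is hypothetical, and treating a p-value exactly equal to the significance level as "fail to reject" is a convention assumed here, not something the post prescribes.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Assumed convention: reject only when p is strictly below alpha.
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

for p_example in (0.003, 0.049, 0.20):  # illustrative p-values
    print(p_example, "->", decide(p_example))
```

Passing a different `alpha`, for example `decide(0.03, alpha=0.01)`, shows how the same p-value can lead to a different decision under a stricter significance level.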

In data science, p-values are also used to interpret the relationship between the predictor variable and the response variable in a simple linear model.

A small p-value indicates that a relationship this strong between the predictor and the response variable would be unlikely to arise by chance alone, so the null hypothesis of no relationship can be rejected.

Moreover, in most regression problems, a variable is considered significant when its p-value is well below 0.05.
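The regression case can be sketched as follows. Regression software usually reports a p-value from a t-test on the slope (for example, the `pvalue` field returned by `scipy.stats.linregress`); to stay within the standard library, this sketch swaps in a permutation test on the slope instead. The function names and the data points are hypothetical.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x for a simple linear model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def slope_p_value(x, y, n_permutations=5_000, seed=0):
    """Permutation p-value for H0: no relationship between x and y."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between x and y
        if abs(slope(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Made-up predictor and response values, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

print(f"slope = {slope(x, y):.3f}, p-value = {slope_p_value(x, y):.4f}")
```

Because shuffling the response almost never reproduces a slope as steep as the observed one, the estimated p-value is small and the predictor would be treated as significant.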

Conclusion

From a statistical point of view, the p-value and the significance level play a crucial role in hypothesis testing and in statistical methods such as regression (read the blog, 7 types of regression technique in ML, to learn more about regression).

P-values are among the most widely used notions in statistical analysis, and researchers, analysts, and statisticians rely on them to draw inferences from data and make informed decisions.

You have now read how to interpret p-values correctly and how to set up a hypothesis test. Follow us on LinkedIn and Twitter for the latest updates.

Neelam Tyagi
Analytics Steps

The Single-minded determination to win is crucial- Dr. Daisaku Ikeda | LinkedIn: http://linkedin.com/in/neelam-tyagi-32011410b