Basics of statistical significance

By Archana Shukla, a citizen scientist working with Dr. Sukant Khurana

(This article was written solely by the student, who is not yet aware of all the recent debates about significance scores. It is written for laymen.)

Is it beneficial for me to read this article, and will I get something important from it?

If this thought comes to your mind, what will your next step be? You will read the article and decide whether your perception was right or not. In the process you will be using, in a small way, the idea behind Statistical Significance, but are you aware of it? The aim of this article is to explain the term Statistical Significance in layman's language.

The concept of Statistical Significance was introduced by Ronald Fisher.

The word "significance" is important in its own right, even apart from its technical meaning. If we look at our own daily life, we notice a habit of always keeping what is important and worthwhile. Statistics is a highly applied subject, and one of its terms, Statistical Significance, is a tool that helps you, the researcher, decide which results in the data are worth keeping. Once we start noticing, there are many situations in which this idea and its procedure are applied.

To see how it applies in daily life, take an example. If you have to show your cooking skills to someone, what will your first step be? You decide the menu, keeping in mind that it must contain the dishes you are expert in; in other words, you choose what to keep according to your belief about what matters. There are many other everyday examples, which become easy to spot once we understand the term Statistical Significance. From a student's perspective, a UPSC aspirant preparing for the exam always wants to study what is worthwhile and significant for the exam, so that the syllabus can be completed in a limited time.

Since Statistics has been called a mathematical theory of ignorance, we will first understand this concept in layman's language and then go through its mathematical details.

To understand the term Significance from a statistical point of view, we must first know a few statistical terms: hypothesis and its types, test statistic, significance level, size of the test, p-value, Type 1 error, Type 2 error, one-tailed or two-tailed test, and critical region. These terms will help us answer several questions. How should we frame the null and alternative hypotheses? What should the rejection criterion be? Is a one-tailed or a two-tailed test needed? What are the steps of testing? What should the significance level be?

First we will go through the definitions of these terms, and then make them easier to understand by extending the previous example.

1. HYPOTHESIS - A hypothesis is a statement about a population, or about some characteristic of a population, on which we want to draw inferences. For testing, hypotheses are divided into two types: null and alternative. The null hypothesis is a hypothesis of no difference, an assertion about the population that favours the existing situation, whereas the alternative hypothesis is the result that the investigator or researcher wants to establish from the data.

2. TEST STATISTIC - A test statistic is a sample statistic (a function of the sample observations) used to decide whether or not to reject the null hypothesis.

3. SIGNIFICANCE LEVEL and p-VALUE - Before testing, the investigator fixes in advance an acceptable risk of error, say 5%: the probability of rejecting the null hypothesis when it is actually true. This chosen percentage is called the level of significance. The p-value is the smallest level of significance at which the investigator would reject the null hypothesis for the observed data. Equivalently, it is the probability, assuming the null hypothesis is true, of obtaining data at least as extreme as those actually observed.

4. SIZE OF THE TEST - The size of the test is the probability of a Type 1 error, that is, the risk we (the investigator) are willing to incur of rejecting a true null hypothesis. It is fixed by the chosen significance level.

5. TYPE 1 ERROR - Rejecting the null hypothesis when it is true is known as a Type 1 error. Its probability is the size of the test, usually denoted α.

6. TYPE 2 ERROR - Not rejecting the null hypothesis when it is false is known as a Type 2 error. Its probability is usually denoted β.

7. CRITICAL REGION - The part of the sample space (the set of possible sample observations) for which the null hypothesis is rejected is known as the critical region.

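8. ONE-TAILED OR TWO-TAILED TEST - The alternative hypothesis decides whether a test is one-tailed or two-tailed, that is, whether the critical region lies in only one tail of the distribution of the test statistic or is split between both tails. For example, an alternative of the form "the mean is less than 100" gives a one-tailed test, while "the mean is not equal to 100" gives a two-tailed test.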

Note: besides the critical-region criterion, a p-value criterion can be used to decide whether to reject the null hypothesis. If the p-value is less than the level of significance, the null hypothesis is rejected.
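To see why the critical-region rule and the p-value rule always give the same answer, here is a minimal Python sketch. It assumes a lower-tailed z-test with a known population standard deviation, and a hypothetical setting (H_0 mean 100, standard deviation 15, samples of 10); none of these numbers or function names come from the article. Because the normal cumulative distribution function is increasing, the statistic falls in the critical region exactly when its p-value is below the significance level.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_statistic(sample_mean, mu0, sigma, n):
    """Number of standard errors by which the sample mean differs from the H_0 mean mu0."""
    return (sample_mean - mu0) / (sigma / n ** 0.5)


def in_critical_region(z, alpha):
    """Lower-tailed test: reject H_0 when z falls below the alpha quantile."""
    return z < STANDARD_NORMAL.inv_cdf(alpha)


def p_value(z):
    """Probability under H_0 of a statistic at least this far into the lower tail."""
    return STANDARD_NORMAL.cdf(z)


alpha = 0.05
# Hypothetical setting: H_0 mean 100, known sigma 15, samples of size 10.
for sample_mean in (90, 92, 94, 102):
    z = z_statistic(sample_mean, mu0=100, sigma=15, n=10)
    by_region = in_critical_region(z, alpha)
    by_p_value = p_value(z) < alpha
    print(f"mean {sample_mean}: z = {z:.2f}, p = {p_value(z):.4f}, "
          f"reject by region: {by_region}, reject by p-value: {by_p_value}")
```

The two "reject" columns always match; only the way the threshold is expressed differs.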

Now, to understand these terms in layman's language, we will extend the previous example. Let us assume two friends, Karan and Arjun, have been working hard for the last two years to clear the UPSC CSE exam. In their first attempt they cleared the preliminary stage but could not clear the mains examination. In their second attempt both cleared the mains but were eliminated in the interview round. Now they are convinced that their coaching faculty is the reason for their failure: the syllabus was not completed on time, no test series was available, and there was no proper guidance on how to study and from which sources. Using the concept of Statistical Significance, both of them want to prove that the reason for their failure is only the carelessness of their coaching faculty.

First they must decide the null and alternative hypotheses. Since the null hypothesis is a hypothesis of no difference, they take H_0: the coaching faculty is working properly, where H_0 denotes the null hypothesis. Since they (the investigators) want to prove that the coaching institute is responsible for their failure, the alternative hypothesis is H_1: the coaching institute is not working properly. H_1 plays a decisive role in classifying a test as one-sided or two-sided; here it tells them whether to look for evidence in one tail of the critical region or in both. The next step is to set the level of significance, the risk of error they accept in advance. Let us assume they take it as 5%. Loosely speaking, this is the risk of blaming the coaching faculty when their failure was really due to other reasons, such as bad health, bad company, or the non-availability of required facilities during preparation.

The next step is to choose a test statistic, so that they can use the marks from their previous attempts to decide whether their claim is right. They take the mean mains-examination marks as the test statistic, use successful candidates (here they take ten students) as a benchmark, and fix the critical region as a sample mean of less than 100. So if the mean of their own mains marks across both attempts is less than 100, the null hypothesis is rejected and they have evidence that the coaching faculty is at fault. If their mean marks are 100 or more, the null hypothesis is not rejected, and their failure may be due to other reasons rather than the coaching faculty.
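The decision rule in this example can be written down directly. The Python sketch below is only an illustration: the marks are hypothetical numbers invented for the demonstration, and the function name decide is not from the article. It applies the critical region "mean marks below 100" and reports the corresponding conclusion.

```python
from statistics import mean

CRITICAL_MEAN = 100  # critical region from the example: sample mean below 100


def decide(marks):
    """Apply the example's critical-region rule to a list of mains marks."""
    sample_mean = mean(marks)
    if sample_mean < CRITICAL_MEAN:
        return sample_mean, "reject H_0: evidence that the coaching institute is not working properly"
    return sample_mean, "do not reject H_0: the failure may be due to other reasons"


# Hypothetical marks for two mains attempts, invented for illustration.
print(decide([92, 97]))
print(decide([104, 101]))
```

A mean of exactly 100 lies outside the critical region, so in that case H_0 is not rejected.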

The other way of deciding whether to reject the null hypothesis is the p-value. To understand it, consider another situation. Since their intermediate (higher secondary) days, both friends have believed (H_0) that their coaching institute is the best in their area. While doing their graduation, they study a graph of the number of successful candidates from the institute. As the number of selected candidates keeps declining, the p-value, that is, the probability of seeing results at least this poor if H_0 were true, becomes smaller and smaller. At the point where the p-value falls below the level of significance, H_0 is rejected: they no longer hold a positive perception of the coaching institute.
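The idea that the p-value keeps shrinking as unfavourable evidence accumulates can be sketched numerically. The Python below assumes, purely for illustration, that each year's result falls on average 2 units short of what H_0 predicts, with a standard deviation of 5, and uses a one-sided z-test; these numbers and the function names are hypothetical, not from the article. With the same average shortfall, each extra year shrinks the standard error, so the p-value falls until it crosses 5%.

```python
from statistics import NormalDist


def lower_tail_p(shortfall, sigma, n):
    """One-sided p-value when the mean of n observations sits `shortfall` below the H_0 mean."""
    z = -shortfall / (sigma / n ** 0.5)
    return NormalDist().cdf(z)


def first_rejection(shortfall, sigma, alpha=0.05, max_years=100):
    """Return (years, p) for the first year at which H_0 is rejected, or None."""
    for years in range(1, max_years + 1):
        p = lower_tail_p(shortfall, sigma, years)
        if p < alpha:
            return years, p
    return None


for years in (1, 5, 10, 16, 17):
    print(years, round(lower_tail_p(2, 5, years), 4))
print(first_rejection(2, 5))
```

With these assumed numbers, H_0 is first rejected in year 17.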

The p-value can thus be defined as the probability, assuming H_0 is true, of getting data at least as extreme as the data observed. The smaller the p-value, the less likely such data would be if the null hypothesis were true; therefore, the smaller the p-value, the stronger the evidence against the null hypothesis. The general rule is that if the p-value is less than the level of significance (generally 0.05), we reject the null hypothesis.
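The phrase "at least as extreme" can be made concrete by simulation: generate many samples in a world where H_0 is true and count how often they look as unfavourable as the real data. The Python sketch below assumes a hypothetical H_0 mean of 100, a standard deviation of 15, samples of 10 and an observed mean of 90 (none of these come from the article), and compares the simulated p-value with the exact normal-theory value.

```python
import random
from statistics import NormalDist


def simulated_p_value(observed_mean, mu0, sigma, n, trials=50_000, seed=1):
    """Fraction of samples drawn under H_0 whose mean is at or below observed_mean."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if sum(sample) / n <= observed_mean:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


def exact_p_value(observed_mean, mu0, sigma, n):
    """Lower-tail probability of the sample mean under H_0 (normal model)."""
    return NormalDist(mu0, sigma / n ** 0.5).cdf(observed_mean)


print(simulated_p_value(90, 100, 15, 10))
print(exact_p_value(90, 100, 15, 10))
```

Both values should be close to 0.0175; the small gap between them is simulation noise.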

Drawing a conclusion from the data with the rules above looks easy, but in practice it is not always so. Interpretations drawn from statistical analysis using the p-value concept of Statistical Significance are often questionable, especially in medicine and psychology, where it is most widely used.

The concept is built on deductive reasoning: first we deduce what we expect to observe, and then compare it with what actually happens. In terms of real effects, the null hypothesis says there is no real effect, and the p-value is the probability of making observations at least as extreme as ours if there were no real effect, that is, what would be expected if there were no real effect. Clearly, the smaller the p-value, the less plausible the null hypothesis looks. All you then have to decide is how small the p-value must be before you declare that you have made a discovery, and that turns out to be very difficult. (Science is an exercise in inductive reasoning: we make observations and try to infer general rules from them, and induction can never be certain.) What we really want to know is not the probability of the observations given a hypothesis about the existence of a real effect, but the probability that there is a real effect, that is, that the hypothesis is true, given the observations. In other words, a better process would be to observe first and then decide, rather than to form a hypothesis first and then make observations according to it.
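The gap between "the probability of the data given no real effect" and "the probability of no real effect given the data" can be shown with Bayes' theorem. This Python sketch assumes, for illustration only, that 10% of tested hypotheses describe real effects and that tests have 80% power; these figures and the function name are assumptions, not results from the article.

```python
def prob_no_effect_given_significant(prior_real, power, alpha):
    """P(no real effect | p < alpha), computed with Bayes' theorem."""
    false_positives = alpha * (1 - prior_real)  # significant although there is no real effect
    true_positives = power * prior_real         # significant because there is a real effect
    return false_positives / (false_positives + true_positives)


risk = prob_no_effect_given_significant(prior_real=0.1, power=0.8, alpha=0.05)
print(f"{risk:.2f}")
```

Under these assumptions about 36% of "significant" results come from cases with no real effect, far more than the 5% the significance level might suggest. The answer depends heavily on the assumed prior, which is exactly the difficulty described above.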

The p-value is often interpreted as the probability that the observations occurred by chance alone, but this reading is not correct (see principle 2 of the ASA statement below).

The line between ‘significant’ and ‘not significant’ can be very thin and somewhat arbitrary. For example, with a significance level of 5%, a p-value of 4.6% is called significant, while a p-value of 5.3% is called not significant, even though the two results hardly differ.
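One way to see how little separates these two results is to convert each p-value back into the test statistic it came from. The Python sketch below assumes a two-sided test with a normally distributed statistic, which the article does not specify; the function name is hypothetical.

```python
from statistics import NormalDist


def z_from_two_sided_p(p):
    """Absolute z value that gives two-sided p-value p under a standard normal."""
    return NormalDist().inv_cdf(1 - p / 2)


for p in (0.046, 0.053):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"p = {p:.3f}  |z| = {z_from_two_sided_p(p):.3f}  {verdict}")
```

The two statistics differ by only about 0.06 (roughly 1.995 versus 1.935), yet they land on opposite sides of the 5% line.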

Most casual readers of scientific research know that for results to be declared “statistically significant,” they need to pass a simple test. The result of this test is called a p-value, and if your p-value is less than 5%, you have got yourself a statistically significant result.

Rejecting the null is kind of like the “innocent until proven guilty” principle in court cases, explains Regina Nuzzo, a mathematics professor at Gallaudet University. In court, you start with the assumption that the defendant is innocent. Then you look at the evidence: the bloody knife with his fingerprints on it, his history of violence, eyewitness accounts. As the evidence mounts, that presumption of innocence starts to look naive. At a certain point, jurors become convinced, beyond a reasonable doubt, that the defendant is not innocent.

A p-value cannot replace scientific reasoning. Well-reasoned statistical arguments contain much more than the value of a single number and whether that number crosses an arbitrary threshold. Yet the p-value has become a gatekeeper that decides whether work is publishable. This encourages p-hacking: trying different analyses, subsets of the data, or stopping points until a result falls below the threshold, so that a study's significance is decided by the threshold alone.
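Why threshold-chasing is dangerous can be shown with a short simulation: if H_0 is true in every one of 20 independent tests, the chance that at least one of them gives p < 0.05 is already about 64%. The Python sketch relies on the standard fact that, when H_0 is true and the test statistic is continuous, the p-value is uniformly distributed between 0 and 1. The function name and the choice of 20 tests are illustrative assumptions.

```python
import random


def chance_of_false_hit(num_tests, alpha=0.05, trials=20_000, seed=0):
    """Estimate P(at least one p < alpha) when H_0 is true in all num_tests tests."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under H_0 each p-value is uniform on [0, 1], so draw it directly.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


simulated = chance_of_false_hit(20)
exact = 1 - (1 - 0.05) ** 20
print(f"simulated: {simulated:.3f}, exact: {exact:.3f}")
```

Real analyses of the same data are usually correlated rather than independent, so this is a simplified picture, but it shows how quickly "something significant" turns up when enough attempts are made.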

“AMERICAN STATISTICAL ASSOCIATION RELEASES STATEMENT ON STATISTICAL SIGNIFICANCE AND p-VALUES” provides principles to improve the conduct and interpretation of quantitative science. The statement’s six principles, many of which address misconceptions and misuse of the p-value, are the following:

1. P-values can indicate how incompatible the data are with a specified statistical model.

2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

4. Proper inference requires full reporting and transparency.

5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.

6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
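Principle 5 can be illustrated with two hypothetical studies. In the Python sketch below (a two-sided z-test with known standard deviation; all numbers and the function name are invented for illustration), a tiny effect measured on a huge sample gets a far smaller p-value than a large effect measured on a handful of observations.

```python
from statistics import NormalDist


def two_sided_p(effect, sigma, n):
    """Two-sided z-test p-value for an observed mean difference `effect`."""
    z = effect / (sigma / n ** 0.5)
    return 2 * NormalDist().cdf(-abs(z))


tiny_effect_huge_sample = two_sided_p(effect=0.5, sigma=15, n=100_000)
large_effect_small_sample = two_sided_p(effect=10, sigma=15, n=5)
print(f"effect 0.5, n = 100000: p = {tiny_effect_huge_sample:.2e}")
print(f"effect 10,  n = 5:      p = {large_effect_small_sample:.3f}")
```

The p-value reflects both the size of the effect and the size of the sample, so on its own it cannot tell you how large or important an effect is.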

The statement also mentions other approaches, such as confidence and prediction intervals, Bayesian methods, and likelihood ratios, which can be used alongside or instead of p-values.
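A confidence interval, one of the alternatives mentioned in the statement, reports a plausible range for the quantity of interest rather than a single yes/no verdict. This Python sketch computes an approximate 95% interval for a mean using the normal distribution, which is reasonable for large samples; the simulated data and the function name are assumptions for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_confidence_interval(data, level=0.95):
    """Approximate confidence interval for the mean, using a normal critical value."""
    centre = mean(data)
    standard_error = stdev(data) / len(data) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return centre - z * standard_error, centre + z * standard_error


rng = random.Random(42)
data = [rng.gauss(102, 15) for _ in range(200)]  # hypothetical measurements
low, high = normal_confidence_interval(data)
print(f"mean = {mean(data):.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```

For small samples a t-based critical value would be more appropriate; the normal value is used here only to keep the example within the standard library.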
