# Understanding Significance Levels

## The Difference Between Alpha & Standardized Alpha

Apr 19 · 4 min read

Statistical tests are an integral part of a data scientist's repertoire. Every day, we clean, sort, and model data with the assumption that the differences we find in the numbers actually matter.

Does the salary of Segment A and Segment B of the population really differ enough to matter? Can we explain the two-dollar difference by sheer luck in sampling? That is why we run statistical tests: to show that the differences we see are unlikely to be the result of chance alone.

## Hypothesis Tests Overview

The samples we work with stand in for the larger population. Because of this, we have to make certain that our sampling doesn’t lead us to false conclusions about the population. That is why we do statistical tests and why significance levels are so important.

Statistical tests allow us to determine whether the differences we see between two segments of the sample reflect a real difference. And that’s where significance levels come into play. When we conduct a hypothesis test, we are testing one statement, known as the null hypothesis. That statement is: “There is no difference between Segment A and Segment B.”

Our results either allow us to reject this statement or mean we fail to reject it. The significance level is usually set at 0.05, or 5%. So if a p-value comes in at 0.12, or 12%, we fail to reject the null hypothesis: the data do not give us enough evidence of a difference between the two segments. But if the p-value is 0.03, or 3%, we reject the null hypothesis in favor of the alternative hypothesis, which states that there is a difference between Segment A and Segment B.
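The decision rule described above can be sketched in a few lines. The function name `decide` and its return strings are hypothetical, chosen only for illustration; the default of 0.05 is the usual significance level mentioned in the text.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a given p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # Reject the null hypothesis only when the p-value is below alpha.
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.12))  # fail to reject H0
print(decide(0.03))  # reject H0
```

Note that "fail to reject" is deliberately not worded as "accept": a large p-value only says the sample does not contradict the null hypothesis.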

## Significance Levels

If I had two data sets, one with 200 observations and another with 30,000 observations, the statistical differences would have to be measured differently. A difference between segments in a sample of 200 is very different from the same difference in a sample of 30,000. The larger data set could lead the hypothesis test to flag a difference as significant even when it is too small to matter.

As the number of observations grows, even a small difference between the segments becomes statistically significant and can lead to very low p-values. Lowering the alpha in these situations compensates for the naturally lower p-values.
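To see this effect with numbers, here is a minimal sketch that holds the difference between segments fixed and changes only the sample size. It assumes a two-sided two-sample z-test (a normal approximation), two equal-sized segments, and a shared standard deviation. The two-dollar difference comes from the salary example earlier; the standard deviation of 20 and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_p_value(mean_diff: float, std_dev: float, n_per_segment: int) -> float:
    """Two-sided p-value for a difference in means (normal approximation)."""
    # Standard error of the difference between two means when both
    # segments have the same size and the same standard deviation.
    std_error = std_dev * sqrt(2.0 / n_per_segment)
    z = abs(mean_diff) / std_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Same two-dollar difference, same spread, different sample sizes.
for n in (200, 30_000):
    print(n, two_sample_p_value(mean_diff=2.0, std_dev=20.0, n_per_segment=n))
```

With 200 observations per segment the p-value is about 0.32, well above 0.05. With 30,000 it is effectively zero, even though the difference itself has not changed.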

## Standardized Alpha

Standardized alpha is computed from a formula that is inversely related to the number of observations, so it delivers a smaller significance level as the number of observations increases.
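The formula itself is not reproduced in this text. One form consistent with the description, offered here as an assumption rather than as the exact expression intended, scales the nominal alpha by the square root of the ratio between the number of observations $n$ and the constant 100:

$$
\alpha_{\text{std}} = \frac{\alpha}{\sqrt{n / 100}}
$$

Under this assumed form, $n = 100$ leaves $\alpha$ unchanged, and every hundredfold increase in $n$ shrinks the threshold by a factor of 10. The square root is part of the assumption; what the text fixes is only that the result falls as $n$ grows and that $n$ is divided by 100.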

The use of 100 as the constant that divides the number of observations is an arbitrary choice made in several studies, and it can be replaced with other numbers. The point is to use the number of observations as a reducing factor when the dataset is much larger than the constant.

The common flaw in the argument for picking a smaller significance level automatically (5% instead of 10%, or 1%) is that these numbers are also arbitrarily set. An alpha of 5% applied to a 1,000-observation data set and to a 1-million-observation data set will flag very different sizes of difference as significant, so it shouldn't be used interchangeably.
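A minimal sketch of such an adjustment, with the constant exposed as a parameter so it can be swapped out. The function name, the `reference_n` parameter, and the square-root form are assumptions made for illustration; the text only specifies that the number of observations is divided by a constant such as 100 and that the result shrinks as the data set grows.

```python
from math import sqrt


def standardized_alpha(alpha: float, n_obs: int, reference_n: int = 100) -> float:
    """Shrink alpha as n_obs grows past reference_n (assumed square-root form)."""
    if n_obs <= 0 or reference_n <= 0:
        raise ValueError("n_obs and reference_n must be positive")
    # Assumed form: alpha / sqrt(n_obs / reference_n).
    # For n_obs below reference_n this returns a value larger than alpha.
    return alpha / sqrt(n_obs / reference_n)


for n in (200, 1_000, 30_000, 1_000_000):
    print(n, round(standardized_alpha(0.05, n), 5))
```

Under this assumed form, a nominal 5% level becomes roughly 3.5% at 200 observations and roughly 0.29% at 30,000. Passing a different `reference_n` shows how much the result depends on that arbitrary constant.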

# Connect

You can check out my projects on GitHub and give me a shout if there is something there that interests you.

I am also on Twitter where I share my projects, data puns, and thoughts on cool uses for data in contemporary ways.

Written by

## Paul Torres

Data Scientist with a Physicist’s heart. Looking through numbers to tell a story that people will care about.

## Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
