Standard Error — A clear intuition from scratch
Standard error is one of the most important concepts in inferential statistics. It is also widely misunderstood: many resources on the internet explain it differently, and few of the explanations are intuitive. In this story, let's build a clear intuition for the standard error from scratch.
Before getting to the standard error, let us recall some basic concepts of inferential statistics.
Population
The data about which we want to draw insights. This is our target data: the complete set of data on which we would like to perform an experiment.
Sample
A part of the population on which we actually perform the experiment. In the real world, we use samples because collecting data on an entire population is hard.
Okay! Let's get a clear understanding of this with a scenario.
Assume that you operate a website that provides movie ratings. A new movie was released recently and you want to know its overall rating. To get it exactly, you would have to collect a rating from every person who watched the movie. Those viewers are our population, and asking all of them is nearly impossible. Instead of spending that time and those resources, you randomly pick 179 people who watched the movie, collect their ratings, and treat the result as the overall rating of the movie. These 179 people are our sample: it is easy to collect and, because it is random, it generalizes to the population.
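The scenario can be sketched in a few lines of Python. This is only an illustration: the population of 50,000 viewers and their uniformly random ratings are made up, since the story does not describe the real ratings. Only the sample size of 179 comes from the scenario.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: one rating (1 to 10) per viewer.
# The population size and the rating distribution are assumptions.
population = [random.randint(1, 10) for _ in range(50_000)]

# Simple random sample of 179 viewers, drawn without replacement.
sample = random.sample(population, k=179)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean rating:     {population_mean:.2f}")
print(f"Sample mean rating (n=179): {sample_mean:.2f}")
```

The two printed means will usually be close but not identical. That gap between a sample statistic and the population value is what the standard error quantifies.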
Inferential statistics is all about drawing inferences about a population using a sample. Samples stand in for the population because gathering the whole data is hard, especially in an era where the amount of data is growing rapidly in every field.
Now let’s discuss the standard error.
Standard Error
Standard error belongs to inferential statistics and is often confused with standard deviation, which belongs to descriptive statistics.
Standard deviation tells you how far, on average, the data points lie from their mean.
Standard error is the standard deviation of the sampling distribution of a statistic.
Okay, now what is a sampling distribution?
To get more general inferences about the population, we gather many random samples. Each sample has its own mean (average) and standard deviation. The distribution of the means of all the samples is known as the sampling distribution of the mean. Similarly, the distribution of the standard deviations of the samples is known as the sampling distribution of the standard deviation.
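As a rough sketch of this idea, the snippet below draws many random samples from a made-up population and collects the mean and standard deviation of each one. The population (normally distributed with mean 7 and standard deviation 2), the number of samples, and the sample size are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 values, roughly normal (mean 7, SD 2).
population = [random.gauss(7, 2) for _ in range(100_000)]

num_samples = 1_000  # how many random samples we draw (assumed)
sample_size = 40     # observations per sample (assumed)

sample_means = []
sample_sds = []
for _ in range(num_samples):
    sample = random.sample(population, sample_size)
    sample_means.append(statistics.mean(sample))
    sample_sds.append(statistics.stdev(sample))

# sample_means approximates the sampling distribution of the mean;
# sample_sds approximates the sampling distribution of the standard deviation.
print(f"Mean of the sample means: {statistics.mean(sample_means):.2f}")
print(f"Mean of the sample SDs:   {statistics.mean(sample_sds):.2f}")
```

Plotting a histogram of `sample_means` (or `sample_sds`) would show the shape of each sampling distribution.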
The standard deviation of the sampling distribution of the mean, that is, the standard error of the mean, is given by the formula
SE = σ / √n
where the numerator σ (sigma) is the population standard deviation and n is the size of each sample considered in the sampling distribution.
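The standard error of the mean, the population standard deviation divided by the square root of the sample size, can be written as a small helper. The function name `standard_error_of_mean` is made up for this sketch, and the inputs (sigma = 2, n = 40) are the values used in the movie rating example that follows.

```python
import math


def standard_error_of_mean(sigma: float, n: int) -> float:
    """Return sigma / sqrt(n), the standard error of the mean."""
    if n <= 0:
        raise ValueError("sample size n must be a positive integer")
    return sigma / math.sqrt(n)


se = standard_error_of_mean(sigma=2, n=40)
print(f"Standard error of the mean: {se:.3f}")  # prints 0.316
```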
Let us understand this concept of standard error with an example.
We have 20 random samples, each of size 40, drawn from the population of ratings of a movie, whose standard deviation is 2. Let us find the sampling distribution and the standard error for these samples.
Calculate the means of all the samples and plot the distribution of the means.
This is called the sampling distribution of the mean. The standard deviation of this sampling distribution is our standard error.
The standard error measures how much a sample mean typically deviates from the mean of the sampling distribution.
Why is standard error a key factor in inferential statistics?
We discussed that inferential statistics is all about getting inferences on the population using samples.
There is a concept known as the Central Limit Theorem that helps explain the importance of the standard error.
Central Limit Theorem
It states that, irrespective of the shape of the population's distribution (as long as its variance is finite), the sampling distribution of the mean becomes approximately normal as the sample size grows. The mean of the sampling distribution is equal to the mean of the population.
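A quick way to see the theorem at work is to start from a population that is clearly not normal. The sketch below assumes an exponential population (strongly right-skewed, with mean 1 and standard deviation 1), which is not part of the movie example, and checks how many sample means fall within one and two standard errors of the population mean. For a normal distribution these shares are about 68% and 95%.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 50     # observations per sample (assumed)
NUM_SAMPLES = 5_000  # number of samples drawn (assumed)

# Exponential distribution with rate 1: right-skewed, mean 1, SD 1.
POP_MEAN = 1.0
POP_SD = 1.0

sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

se = POP_SD / math.sqrt(SAMPLE_SIZE)

# Share of sample means within 1 and 2 standard errors of the population mean.
within_1_se = sum(abs(m - POP_MEAN) <= se for m in sample_means) / NUM_SAMPLES
within_2_se = sum(abs(m - POP_MEAN) <= 2 * se for m in sample_means) / NUM_SAMPLES

print(f"Mean of the sample means: {statistics.fmean(sample_means):.3f}")
print(f"Share within 1 SE: {within_1_se:.1%}")
print(f"Share within 2 SE: {within_2_se:.1%}")
```

Even though individual exponential values are heavily skewed, the shares come out close to the normal benchmarks, and the mean of the sample means sits close to the population mean.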
Let us get back to our movie rating example. We have 20 samples of size 40 and we already have our sampling distribution, so now we calculate its mean. This mean is approximately equal to the mean of the population, and the approximation improves as the sample size and the number of samples grow.
The mean of the sampling distribution is 7.01
In the definition of standard error, we said that it is the standard deviation of a sampling distribution. So let us calculate the standard error by calculating the standard deviation of our sampling distribution of means.
This standard deviation tells us how much the sample means deviate from the mean of the sampling distribution.
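The original samples are not available here, so the following is only a simulation of the same setup: 20 samples of size 40 from a population with standard deviation 2. The population mean of 7 and the normal shape of the ratings are assumptions, and the simulated values are not clipped to a rating scale. The numbers it prints will therefore differ from the ones in this story.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

POP_MEAN = 7.0    # assumed population mean
POP_SD = 2.0      # population standard deviation from the example
NUM_SAMPLES = 20  # number of samples from the example
SAMPLE_SIZE = 40  # size of each sample from the example

# Draw 20 samples of 40 ratings each and record every sample's mean.
sample_means = [
    statistics.mean([random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

# Mean and standard deviation of the sampling distribution of the mean.
mean_of_means = statistics.mean(sample_means)
empirical_se = statistics.stdev(sample_means)

# Value predicted by the formula sigma / sqrt(n).
theoretical_se = POP_SD / math.sqrt(SAMPLE_SIZE)

print(f"Mean of the sampling distribution: {mean_of_means:.2f}")
print(f"Empirical standard error:          {empirical_se:.3f}")
print(f"Theoretical standard error:        {theoretical_se:.3f}")
```

With only 20 samples, the empirical standard error will be near the theoretical value of about 0.316 without matching it exactly.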
The essence of the standard error is to tell us how much a sample statistic is expected to deviate from the population parameter. It is a measure of the accuracy of a sample statistic.
The goal is to work with samples that have a small standard error, because their statistics tend to be closer to the population parameter. The standard error shrinks toward zero as the sample comes closer to covering the entire population.
The standard error can be reduced by increasing the sample size. The sample size n appears in the denominator of the formula, under a square root, so the standard error decreases as the sample size increases: to halve the standard error, you need four times as many observations.
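The effect of the sample size can be checked numerically. This sketch keeps the population standard deviation of 2 from the movie rating example and evaluates the formula for a few sample sizes. The sizes themselves are arbitrary, chosen so that each one is four times the previous one.

```python
import math

SIGMA = 2.0  # population standard deviation from the movie rating example


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


sample_sizes = [10, 40, 160, 640]  # each size is 4x the previous one
errors = {n: standard_error(SIGMA, n) for n in sample_sizes}

for n, se in errors.items():
    print(f"n = {n:>3}  ->  SE = {se:.3f}")
```

Each step down the list multiplies the sample size by four and cuts the standard error in half, which is why large gains in precision get expensive quickly.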
The standard error helps you judge how well a sample of a given size represents the population, and therefore how large a sample you need.