Bootstrapping

Sai Krishna Dammalapati
3 min read · Feb 14, 2024


Check this for Python Implementation of Bootstrapping

I hope you’ve read my article on Confidence Intervals. If not, it is better to read that before proceeding.

https://www.facebook.com/statsmemes?__tn__=-UC*F

You did an experiment/survey and collected a few values, say 2, 4, 9, 12, the average of which is 6.75.

But is that the true mean value? Can you confidently say that the true mean value of the experiment is 6.75? No.

For that, you would want to conduct the experiment multiple times, in different scenarios, to get close to the true mean value.

But India’s expenditure on R&D is among the lowest in the world. You don’t have enough money to conduct many experiments. And hence, you take the help of Statistics. Because Indians are good at math.

Bootstrapping is a statistical method in which we resample the original sample with replacement, many times over, to estimate how much a stat (such as the mean) would vary from one sample to the next.

Resampling means we randomly pick 4 values again from our original values, putting each picked value back before the next pick. So we can end up with 2, 4, 2, 4; 2, 2, 2, 2; 12, 2, 9, 2 and many other combinations. We will do this resampling 9999 times now. Because as per numerology, 9999 is an Angel Number that takes the energy of 9 and quadruples it!
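To make the "pick, put back, pick again" idea concrete, here is a minimal sketch of one resample. It assumes NumPy; the names `rng`, `sample` and `resample` and the seed value are my own choices for illustration, not taken from the linked code.

```python
import numpy as np

# The seed is an arbitrary choice so the run is repeatable
rng = np.random.default_rng(seed=42)

# The four values collected in the experiment/survey
sample = np.array([2, 4, 9, 12])

# Pick 4 values from the original 4, putting each one back after it is picked,
# so the same value can show up more than once
resample = rng.choice(sample, size=sample.size, replace=True)

print("one resample:", resample)
print("its mean:", resample.mean())
```

Run it a few times without the seed and you will see different combinations, some with repeated values.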

If you believed the last sentence, don’t do statistics any more.

So you did the resampling 9999 times and collected the mean of each resample. Plot them on a histogram!

Histogram of the bootstrapping samples
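The 9999 resamples and their means can be sketched in a few lines. This is an illustrative version assuming NumPy, not necessarily how the linked implementation does it; `boot_means`, `n_resamples` and the seed are names and values I picked. It prints a rough text histogram so it runs without any plotting library.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for repeatability
sample = np.array([2, 4, 9, 12])
n_resamples = 9999

# Each row holds 4 random positions into `sample`, drawn with replacement
idx = rng.integers(0, sample.size, size=(n_resamples, sample.size))

# One mean per resample: this is the bootstrap distribution of the mean
boot_means = sample[idx].mean(axis=1)
print("mean of the resampled means:", boot_means.mean())

# Rough text histogram: one '#' per 50 resamples in the bin
counts, edges = np.histogram(boot_means, bins=10)
for count, left in zip(counts, edges[:-1]):
    print(f"{left:5.2f} | " + "#" * (int(count) // 50))
```

If you have matplotlib installed, passing `boot_means` to `plt.hist` would draw the same picture as a proper chart.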

Basically, you can think of this as having performed the experiment 9999 times. Not truly, just for the sake of it.

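Now you can see that the histogram is peaking somewhere near 6.73 (that extreme detail you can see in the code I shared above). That does not mean there is some fixed chance that the true mean is exactly 6.73; it only means the resampled means cluster around the original sample mean of 6.75. With 95% confidence, I can tell you that my true mean value will be between roughly 3 and 10 (2.5th percentile to 97.5th percentile of the resampled means). I told you the confidence interval without assuming any particular shape for the distribution!!! The one thing I am still assuming is that my 4 values are representative of the population, which is a big ask for such a small sample.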
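The percentile interval is just two percentiles of the resampled means. A minimal sketch, again assuming NumPy and using my own variable names and seed:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for repeatability
sample = np.array([2, 4, 9, 12])

# 9999 resamples of size 4, drawn with replacement, and the mean of each
idx = rng.integers(0, sample.size, size=(9999, sample.size))
boot_means = sample[idx].mean(axis=1)

# 95% percentile interval: cut off 2.5% of the resampled means on each side
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap interval for the mean: {lower:.2f} to {upper:.2f}")
```

With only 4 original values the resampled means can take only a limited set of values, so the interval ends land on one of those values rather than moving smoothly.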

Also, you can observe that the sampling distribution looks roughly like a normal distribution. So you can use z-stats on this bootstrapped sampling distribution as an approximation.
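One way to read "use z-stats" here is to take the spread of the resampled means as the standard error and build the usual mean ± z × SE interval. That reading is my assumption, and so are the names below; treat it as a sketch, not the definitive recipe.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(seed=42)  # arbitrary seed, for repeatability
sample = np.array([2, 4, 9, 12])

# Bootstrap distribution of the mean, built as before
idx = rng.integers(0, sample.size, size=(9999, sample.size))
boot_means = sample[idx].mean(axis=1)

# Standard deviation of the resampled means = bootstrap standard error
boot_se = boot_means.std(ddof=1)

# z value for a two-sided 95% interval (about 1.96)
z = NormalDist().inv_cdf(0.975)

center = sample.mean()  # 6.75, the original sample mean
lower, upper = center - z * boot_se, center + z * boot_se
print(f"bootstrap SE: {boot_se:.2f}")
print(f"95% normal-approximation interval: {lower:.2f} to {upper:.2f}")
```

This interval is symmetric around 6.75 by construction, so it will only agree with the percentile interval to the extent that the histogram really is bell-shaped.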

See, you can also calculate the confidence interval using the formula I shared in the Confidence Intervals blog. You just have to know some Greek. But there you have to make assumptions about the nature of the sampling distribution: is it a normal distribution? Should I use the z-stat or the t-stat? With Bootstrapping, you can avoid this. So it is more friendly for Data Scientists who skipped Statistics classes.

