The bootstrap. The Swiss army knife of any data scientist

Applications of bootstrap techniques in R

Gianluca Malato
Data Science Reporter

Every measurement must be accompanied by an error estimate. There’s no way around this. If I tell you “I’m 1.93 metres tall”, I’m not giving you any information about the precision of that measurement. You might assume that it is precise to the second decimal place, but you can’t be sure.

So, what we really need is a way to assess the precision of our measurement starting from the data sample we have.

If our observable is the mean value calculated over a sample, a simple precision estimate is given by the standard error of the mean, that is, the sample standard deviation divided by the square root of the sample size. But what can we do if we are measuring something other than the mean? That’s where the bootstrap comes to the rescue.
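
As a quick illustration of the standard error of the mean, here is a minimal sketch in R. The `heights` vector is simulated data invented for this example, not a real measurement, and the variable names are only illustrative.

```r
# Simulated sample of 50 heights in metres (hypothetical data for illustration)
set.seed(1)
heights <- rnorm(50, mean = 1.75, sd = 0.08)

n <- length(heights)

# Point estimate: the sample mean
sample_mean <- mean(heights)

# Standard error of the mean: sample standard deviation divided by sqrt(n)
standard_error <- sd(heights) / sqrt(n)

cat("mean =", sample_mean, "+/-", standard_error, "\n")
```

This shortcut only applies to the mean. For the median, a percentile or any other statistic, we need a more general tool.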

Bootstrap in a nutshell

The bootstrap is a technique for estimating the confidence interval and/or the standard error of any observable that can be calculated on a sample.

It relies on the concept of resampling: starting from a data sample of N values, we simulate a new sample of the same size by drawing values from the original one with replacement. At each draw, every original value has the same probability of being picked (1/N), so a value may appear more than once in the new sample, or not at all.
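
The resampling idea can be sketched in a few lines of base R. This is only an illustrative sketch under stated assumptions: the data are simulated, the observable chosen is the median, the number of resamples (`n_boot = 2000`) is an arbitrary choice, and the confidence interval uses the simple percentile method, which is one of several possible bootstrap intervals.

```r
# Simulated, skewed sample (hypothetical data for illustration)
set.seed(42)
x <- rexp(200, rate = 1)
n <- length(x)

# A single resample: n values drawn from x with replacement,
# each original value having probability 1/n at every draw
resample <- sample(x, size = n, replace = TRUE)

# Repeat the resampling many times, computing the observable
# (here the median) on each simulated sample
n_boot <- 2000  # arbitrary number of resamples, chosen for this example
boot_medians <- replicate(n_boot, median(sample(x, size = n, replace = TRUE)))

# Bootstrap standard error: spread of the observable across resamples
boot_se <- sd(boot_medians)

# 95% confidence interval with the percentile method (an assumption of this sketch)
boot_ci <- quantile(boot_medians, probs = c(0.025, 0.975))

cat("median =", median(x), "\n")
cat("bootstrap standard error =", boot_se, "\n")
cat("95% percentile interval =", boot_ci[1], "to", boot_ci[2], "\n")
```

Replacing `median` with any other function of the sample gives the same kind of estimate for that observable, which is what makes the procedure so general.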

Gianluca Malato
Data Science Reporter

Theoretical physicist, data scientist and fiction author. I teach Data Science, statistics and SQL on YourDataTeacher.com. E-mail: gianluca@gianlucamalato.it