The bootstrap. The Swiss army knife of any data scientist

Applications of bootstrap techniques in R

Published in

Data Science Reporter

4 min read, Feb 11, 2019

Every measurement should come with an error estimate; there is no way around it. If I tell you “I’m 1.93 metres tall”, I’m giving you no information about the precision of that measurement. You might assume it is precise to the second decimal place, but you can’t be sure.

So what we really need is a way to assess the precision of our measurement starting from the data sample we have.

If our observable is the mean of a sample, a simple precision estimate is given by the standard error of the mean, that is, the sample standard deviation divided by the square root of the sample size. But what can we do when we are measuring something other than the mean? That is where the bootstrap comes to the rescue.
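For the mean, the standard error can be computed directly. The following is a minimal sketch in R; the `heights` vector is simulated purely for illustration, and `standard_error` is a helper name chosen here, not a built-in R function.

```r
# Hypothetical sample: 100 heights in metres, simulated only for illustration
set.seed(42)
heights <- rnorm(100, mean = 1.75, sd = 0.10)

# Standard error of the mean: sample standard deviation divided by sqrt(N)
standard_error <- function(x) sd(x) / sqrt(length(x))

sample_mean <- mean(heights)
se_mean <- standard_error(heights)

cat("mean =", sample_mean, "+/-", se_mean, "\n")
```

This formula applies to the mean only; it gives no precision estimate for the median, a quantile, or any other statistic.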

Bootstrap in a nutshell

The bootstrap is a technique for estimating confidence intervals and/or the standard error of any observable that can be calculated on a sample.

It relies on resampling: starting from a data sample of N values, we simulate a new sample of the same size by drawing from the original values with replacement. At each draw, every original value has the same probability of being selected, namely 1/N.
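The resampling procedure described above can be sketched in a few lines of R. This is an illustrative sketch under stated assumptions, not a definitive implementation: the helper names `resample` and `bootstrap_statistic`, the choice of 2000 resamples, the simulated sample `x`, and the use of a percentile interval as the confidence interval are all choices made for this example.

```r
# One bootstrap resample: same size as x, drawn with replacement,
# so each original value has probability 1/N at every draw
resample <- function(x) x[sample.int(length(x), replace = TRUE)]

# Compute a statistic on n_boot independent resamples (n_boot is an assumed default)
bootstrap_statistic <- function(x, statistic, n_boot = 2000) {
  replicate(n_boot, statistic(resample(x)))
}

# Hypothetical skewed sample, simulated only for illustration
set.seed(123)
x <- rexp(200, rate = 1)

# Bootstrap distribution of the median, a statistic with no simple standard-error formula
boot_medians <- bootstrap_statistic(x, median)

# Bootstrap standard error: standard deviation of the resampled statistics
se_median <- sd(boot_medians)

# 95% percentile interval (one of several possible bootstrap intervals)
ci_median <- quantile(boot_medians, probs = c(0.025, 0.975))
```

Indexing with `sample.int` rather than calling `sample(x, ...)` directly avoids a known quirk of `sample`, which treats a single number as the range `1:x`.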

Written by Gianluca Malato