bootstrapping (statistics)
From the Wikipedia page
1. approach
Given a sample from a population, we can make an inference (sample \(\rightarrow\) population) about the population
- ex: "the population mean is the sample mean"
How reliable is that inference? We could assume that sample means are normally distributed and compute confidence intervals from a Gaussian.
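A minimal sketch of that Gaussian approach, for comparison with the bootstrap. The function name `normal_ci`, the seed, and the made-up height data are illustrative assumptions, not something from the source.

```python
import math
import random
import statistics

def normal_ci(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, estimated from the sample standard deviation.
    sem = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem

# Hypothetical data: 100 simulated heights in cm.
rng = random.Random(0)
heights = [rng.gauss(170, 10) for _ in range(100)]
low, high = normal_ci(heights)
print(f"mean={statistics.fmean(heights):.1f}, 95% CI=({low:.1f}, {high:.1f})")
```

This leans entirely on the normality assumption for the sample mean; the bootstrap replaces that assumption with re-sampling.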
Bootstrapping approach: Model the population using the sample and model the inferences (sample \(\rightarrow\) population) using re-sampled inferences (re-sampled \(\rightarrow\) sample).
- So, grab 100 people and bin their heights – use this as a model of the population height distribution
- Draw with replacement from the sample to obtain re-samples
- Now, how much do the re-sample means vary around the sample mean? The range that contains, say, 95% of the re-sample means is the bootstrap confidence interval!
- Key: we know the sample mean, so we can directly check the validity of a given inference (re-sampled \(\rightarrow\) sample).
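The steps above can be sketched as a percentile bootstrap for the mean. This is one common variant, not necessarily the one the Wikipedia page has in mind; `bootstrap_ci`, the number of re-samples, the seed, and the simulated heights are all assumptions for illustration.

```python
import random
import statistics

def bootstrap_ci(sample, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Each re-sample draws n values from the sample *with replacement*.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Keep the central `confidence` fraction of the re-sample means.
    alpha = 1 - confidence
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical data: "grab 100 people" and record their heights in cm.
rng = random.Random(0)
heights = [rng.gauss(170, 10) for _ in range(100)]
low, high = bootstrap_ci(heights)
print(f"sample mean={statistics.fmean(heights):.1f}, 95% CI=({low:.1f}, {high:.1f})")
```

The sample plays the role of the population here, so the spread of the re-sample means around the known sample mean stands in for the unknown spread of sample means around the population mean.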
2. relationships with other concepts
- resampling: how to make your peers believe your inferences
- confidence that the data we've observed is representative of a data generating process we've summarized in our statistic of choice: bootstrapping
- how precise are our observations? Is the data generating process very noisy?
- what the observations would look like if the data generating process were random: permutation test
- is the data generating process going to continue to produce the type of observations we've seen?
- bootstrapping:
- pretend that we're taking new data from our old data generating function
- in R: create a huge data frame made of many re-sampled copies of your original data frame, each with its own bootstrap-id
- do operations grouped by the bootstrap-id
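The "one huge table, grouped by bootstrap-id" idea can be sketched in plain Python instead of R. This assumes each copy is drawn from the original rows with replacement (the note only says "copies"); the names `stacked_bootstrap` and `mean_by_bootstrap_id` are hypothetical.

```python
import random
import statistics
from collections import defaultdict

def stacked_bootstrap(sample, n_boot=1000, seed=0):
    """Build one long table of (bootstrap_id, value) rows."""
    rng = random.Random(seed)
    n = len(sample)
    # Each bootstrap_id labels one re-sampled copy of the original data.
    return [
        (boot_id, rng.choice(sample))
        for boot_id in range(n_boot)
        for _ in range(n)
    ]

def mean_by_bootstrap_id(rows):
    """Group rows by bootstrap_id and compute the statistic per group."""
    groups = defaultdict(list)
    for boot_id, value in rows:
        groups[boot_id].append(value)
    return {boot_id: statistics.fmean(values) for boot_id, values in groups.items()}

# Hypothetical data: 50 simulated heights in cm.
rng = random.Random(1)
heights = [rng.gauss(170, 10) for _ in range(50)]
rows = stacked_bootstrap(heights, n_boot=200)
boot_means = mean_by_bootstrap_id(rows)
# The spread of the per-id means estimates the standard error of the mean.
print(len(rows), len(boot_means), statistics.stdev(list(boot_means.values())))
```

Stacking everything into one table costs memory (rows = copies × sample size), but any grouped operation then yields one statistic per bootstrap-id.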
2.1. helpful links
- Wright et al. – Using Bootstrap Estimation and the Plug-in Principle for Clinical Psychology Data
- 2021-09-13
- sampling distribution – the distribution of a statistic, considered as a random variable
- YouTube video by Very Normal