bootstrapping (statistics)

From the Wikipedia page

1. approach

Given a sample from a population, we can make an inference (sample \(\rightarrow\) population) about the population

  • ex: "the population mean is the sample mean"

How reliable is that inference? The classical approach is to assume that sample means are normally distributed and compute a confidence interval from a Gaussian.
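A minimal sketch of that Gaussian approach, for comparison with the bootstrap. The `heights` values and the `normal_ci` helper are made up for illustration, and z = 1.96 assumes a 95% interval:

```python
import math
import statistics

def normal_ci(sample, z=1.96):
    """Approximate confidence interval for the mean, assuming the sample
    mean is roughly normally distributed (z=1.96 corresponds to 95%)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * sem, mean + z * sem

# Hypothetical heights in cm, invented for this example
heights = [162.0, 175.5, 168.2, 181.0, 170.3, 159.8, 177.1, 166.4]
low, high = normal_ci(heights)
print(f"mean = {statistics.fmean(heights):.1f}, 95% CI ~ ({low:.1f}, {high:.1f})")
```

With a sample this small, a t-distribution quantile would usually replace 1.96; the fixed z value keeps the sketch dependency-free.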

Bootstrapping approach: Model the population using the sample and model the inferences (sample \(\rightarrow\) population) using re-sampled inferences (re-sampled \(\rightarrow\) sample).

  • So, grab 100 people and bin their heights – use this as a model of the population height distribution
  • Draw with replacement from the sample to obtain re-samples
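  • Now, compute the statistic (here, the mean) on each re-sample. The spread of the re-sample means around the sample mean estimates how far a sample mean tends to fall from the population mean; for example, the middle 95% of the re-sample means gives an approximate 95% confidence interval.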
  • Key: we know the sample mean, so we can directly check the validity of a given inference (re-sampled \(\rightarrow\) sample).
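The steps above can be sketched as a percentile bootstrap for the mean. This is one common variant, not necessarily the one the notes had in mind; the `heights` data, the `bootstrap_ci` helper, and the choice of 2000 re-samples are assumptions for illustration:

```python
import random
import statistics

def bootstrap_ci(sample, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n values with replacement from the original sample
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    means.sort()
    # Drop (1 - level) / 2 of the re-sample means from each tail
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * n_resamples)
    hi_idx = int((1.0 - tail) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical heights in cm, invented for this example
heights = [162.0, 175.5, 168.2, 181.0, 170.3, 159.8, 177.1, 166.4]
low, high = bootstrap_ci(heights)
print(f"sample mean = {statistics.fmean(heights):.1f}")
print(f"bootstrap 95% CI ~ ({low:.1f}, {high:.1f})")
```

No normality assumption enters here: the interval comes entirely from the re-sampled means.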

2. relationships with other concepts

  • resampling: how to make your peers believe your inferences
    • confidence that the data we've observed is representative of a data generating process we've summarized in our statistic of choice: bootstrapping
      • how precise are our observations? Is the data generating process very noisy?
    • what the observations would look like if the data generating process were random: permutation test
    • is the data generating process going to continue to produce the type of observations we've seen?
  • bootstrapping:
    • pretend that we're taking new data from our old data generating function
    • in R: build one large data frame that stacks many re-sampled copies of the original data frame (rows drawn with replacement), each copy tagged with a bootstrap id
      • then do operations grouped by the bootstrap id
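The stacked-table idea can be sketched without any data-frame library. The notes describe it in R; the version below mirrors it in plain Python, with rows as `(bootstrap_id, value)` tuples. The helper names and the `heights` data are hypothetical:

```python
import random
import statistics
from collections import defaultdict

def stacked_bootstrap(sample, n_resamples=1000, seed=0):
    """Build one long table of (bootstrap_id, value) rows. Each id labels
    one re-sample of len(sample) values drawn with replacement."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    rows = []
    for boot_id in range(n_resamples):
        for value in rng.choices(sample, k=len(sample)):
            rows.append((boot_id, value))
    return rows

def mean_by_bootstrap_id(rows):
    """Group rows by bootstrap id and summarize each group with its mean."""
    groups = defaultdict(list)
    for boot_id, value in rows:
        groups[boot_id].append(value)
    return {boot_id: statistics.fmean(vals) for boot_id, vals in groups.items()}

# Hypothetical heights in cm, invented for this example
heights = [162.0, 175.5, 168.2, 181.0, 170.3, 159.8, 177.1, 166.4]
rows = stacked_bootstrap(heights, n_resamples=1000)
boot_means = mean_by_bootstrap_id(rows)
print(len(rows), "rows,", len(boot_means), "bootstrap means")
```

Keeping everything in one table means any grouped summary (mean, median, a fitted coefficient) can be swapped in without changing the re-sampling step, at the cost of holding all re-samples in memory at once.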

Created: 2024-07-15 Mon 01:27