permutation test

1. simple example from Wikipedia

  • We have two populations \(A\) and \(B\)
  • We want to determine whether the population mean of \(A\) is different from the population mean of \(B\)
  • We draw \(n_a\) samples from \(A\) and \(n_b\) samples from \(B\) and calculate \(\hat{\mu}_a\) and \(\hat{\mu}_b\)
  • Let our null hypothesis be: the distributions of \(A\) and \(B\) are the same. Under this assumption the labels are exchangeable: we could reassign the labels among our \(n_a + n_b\) points however we want, and the relabeled sample would be just as probable as the one we actually drew.
  • So, we enumerate every possible re-labeling of our sampled points, such that we have \(n_a\) points labeled as \(A\) and \(n_b\) labeled as \(B\).
  • Then, for each re-labeling (or permutation), what is the difference between \(\hat{\mu}_a\) and \(\hat{\mu}_b\)?
    • Look at the distribution of differences to get the p-value
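    • Let's say we are doing a one-sided test. Then we want to know the probability, under the null hypothesis, of seeing a difference at least as drastic as the one we observed originally. We would just find the fraction of re-labelings that result in a difference at least as large as the observed one; the original labeling is itself one of the re-labelings, so it counts too. (see hypothesis testing)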
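
The steps above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `permutation_test` and the toy data are made up for illustration, the test is one-sided in the direction \(\hat{\mu}_a - \hat{\mu}_b\), and it enumerates every re-labeling, so it is only practical for small samples.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a, b):
    """Exact one-sided permutation test on the difference of sample means.

    Hypothetical helper for illustration. Returns the observed difference
    mean(a) - mean(b) and the fraction of re-labelings whose difference is
    at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = mean(a) - mean(b)

    n_extreme = 0  # re-labelings at least as drastic as the observed one
    n_total = 0    # all re-labelings with n_a points labeled A

    # Choosing which n_a indices get label A fixes the whole re-labeling.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if mean(group_a) - mean(group_b) >= observed:
            n_extreme += 1
        n_total += 1

    return observed, n_extreme / n_total


# Toy data (made up): every point in a is larger than every point in b,
# so only the original labeling reaches the observed difference.
obs, p = permutation_test([5, 6, 7], [1, 2, 3])
print(obs, p)  # 4 0.05
```

With 3 points in each group there are only 20 re-labelings, so the p-value here is 1/20.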

2. some questions

  • how does this vary with sample size?
  • Let's say that our p-value represents the probability of seeing a difference at least as drastic as the observed one
    • Usually when we give a p-value, we have a distribution for the null hypothesis. Here, we don't have that. How can we say that we have the p-value? TODO: come back to this question
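
One partial angle on the sample-size question: the number of distinct re-labelings is \(\binom{n_a + n_b}{n_a}\), and since the original labeling always counts, the smallest p-value an exact test can report is one over that number. The sketch below just computes these two quantities; the helper name `n_relabelings` is made up for illustration.

```python
from math import comb


def n_relabelings(n_a, n_b):
    """Number of ways to label n_a of the n_a + n_b pooled points as A."""
    return comb(n_a + n_b, n_a)


# Equal group sizes, chosen arbitrarily for illustration.
for n in (3, 5, 10):
    total = n_relabelings(n, n)
    # The original labeling is always counted, so p >= 1 / total.
    print(n, total, 1 / total)
```

The count grows very quickly (20, 252, 184756 for the sizes above), so full enumeration stops being practical fairly soon, while very small samples cannot produce a small p-value at all.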

3. sources

4. see also

Created: 2024-07-15 Mon 01:28