permutation test
1. simple example from Wikipedia
- We have two populations \(A\) and \(B\)
- We want to determine whether the population mean of \(A\) is different from the population mean of \(B\)
- We draw \(n_a\) samples from \(A\) and \(n_b\) samples from \(B\) and calculate \(\hat{\mu}_a\) and \(\hat{\mu}_b\)
- Let our null hypothesis be: the distributions of \(A\) and \(B\) are the same. Under this assumption the labels are exchangeable: we could pool our \(n_a + n_b\) points and relabel them however we want, and every relabeling would be equally likely.
- So, we enumerate every possible re-labeling of our sampled points, such that we have \(n_a\) points labeled as \(A\) and \(n_b\) labeled as \(B\).
- Then, for each re-labeling (or permutation), what is the difference between \(\hat{\mu}_a\) and \(\hat{\mu}_b\)?
- Look at the distribution of differences to get the p-value
- Let's say we are doing a one-sided test. So we want to know the probability, under the null hypothesis, of seeing a difference at least as drastic as the one we observed originally. We would just find the fraction of permutations that result in a difference at least as large as the observed one (the original labeling counts as one of them). (see hypothesis testing)
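The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `permutation_test`, the choice of "mean of \(A\) is larger" as the one-sided direction, and the small floating-point tolerance are all assumptions made here. It enumerates every relabeling, so it is only practical for small samples.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a, b):
    """Exact one-sided permutation test for mean(a) > mean(b).

    Returns (observed difference, p-value). Hypothetical helper for
    illustration; enumerates all relabelings, so keep samples small.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = mean(a) - mean(b)

    count = 0    # relabelings with a difference at least as large as observed
    n_perms = 0  # total number of relabelings
    # Choosing which n_a indices get label A fixes the whole relabeling.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        # Small tolerance so float rounding does not drop exact ties.
        if diff >= observed - 1e-12:
            count += 1
        n_perms += 1
    return observed, count / n_perms


observed, p_value = permutation_test([5, 6, 7], [1, 2, 3])
print(observed, p_value)
```

With these toy numbers only the original labeling reaches the observed difference, so the p-value is 1 out of the 20 possible relabelings.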
2. some questions
- how does this vary with sample size?
- Let's say that our p-value represents the probability of seeing a difference at least as drastic as the observed one
- Usually when we give a p-value, we have a distribution for the null hypothesis. Here, we don't have that. How can we say that we have the p-value? TODO: come back to this question
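On the sample-size question, one thing that can be computed directly is how many relabelings exist: choosing which \(n_a\) of the \(n_a + n_b\) pooled points get label \(A\) gives \(\binom{n_a + n_b}{n_a}\) of them. The sketch below (the helper name `n_relabelings` is made up for illustration) shows how fast that count grows, and with it the smallest p-value an exact test could report, which is one over the count. It only covers this one aspect of the question.

```python
from math import comb


def n_relabelings(n_a, n_b):
    """Number of ways to label n_a of the n_a + n_b pooled points as A."""
    return comb(n_a + n_b, n_a)


# Equal group sizes, purely as an example.
for n in (3, 5, 10, 20):
    total = n_relabelings(n, n)
    # 1 / total is the smallest p-value the exact test can produce.
    print(n, total, 1 / total)
```

Full enumeration quickly becomes infeasible as the count grows, which is why in practice a random subset of relabelings is usually sampled instead.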