Fisher exact test
1. derivation for \(2 \times 2\) contingency table
- setting: you have \(a+b\) blue balls and \(c+d\) red balls
- There are \(a\) blue balls and \(c\) red balls in class I. There are \(b\) blue balls and \(d\) red balls in class II. What is the probability of this split under the null hypothesis, where color is independent of class and the totals per color and per class are fixed?
- The answer is given by the hypergeometric distribution
1.1. story 1
- Under the null model, you can imagine you have \(a+b+c+d\) total balls. Let's say they are labeled. You put them into labeled slots. Take the first \(a+c\) slots to be the class I balls. The rest are the class II balls.
- There are \(n!\) ways to arrange the balls in the slots, where \(n = a+b+c+d\). This is the denominator.
- Now, we want to count the number of arrangements with exactly \(a\) blue balls in class I (and hence \(c\) red balls there).
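- Imagine the first \(a+c\) slots that you have reserved for class I balls. Choose \(a\) of those slots to be reserved for blue balls, which can be done in \(\binom{a+c}{a} = \binom{a+c}{c}\) ways. Similarly, choose \(b\) of the \(b+d\) class II slots and reserve those for blue balls, which can be done in \(\binom{b+d}{b}\) ways. The remaining slots are reserved for red balls.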
- The labeled blue balls can be permuted in \((a+b)!\) ways within their reserved slots, and the labeled red balls in \((c+d)!\) ways within theirs. The total number of settings is then: \(\binom{a+c}{c}\binom{b+d}{b}(a+b)!(c+d)!\). This is the numerator.
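The count in story 1 can be checked numerically. Below is a minimal sketch using only the Python standard library; the function name `story1_probability` and the example counts are illustrative assumptions, not part of the derivation.

```python
from fractions import Fraction
from math import comb, factorial


def story1_probability(a, b, c, d):
    """Probability of the table (a, b, c, d), counting labeled arrangements."""
    n = a + b + c + d
    # Ways to pick which class I slots and which class II slots hold blue balls.
    slot_choices = comb(a + c, a) * comb(b + d, b)
    # Ways to order the labeled blue and red balls within their reserved slots.
    orderings = factorial(a + b) * factorial(c + d)
    # Divide by all n! arrangements; Fraction keeps the result exact.
    return Fraction(slot_choices * orderings, factorial(n))


# Hypothetical example table: a=1, b=2, c=3, d=1.
p = story1_probability(1, 2, 3, 1)
print(p)
```

Using `Fraction` avoids floating-point rounding, which makes it easier to compare the result against other ways of writing the same probability.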
1.2. story 2
- There are \(\binom{n}{a+c}\) ways to choose the set of \(a+c\) balls that goes into class I, where \(n = a+b+c+d\). This is the denominator.
- There are \(\binom{a+b}{a}\) ways to choose the \(a\) blue balls which will go into the class I set. There are \(\binom{c+d}{c}\) ways to choose the \(c\) red balls which go into the class I set. So the numerator is \(\binom{a+b}{a}\binom{c+d}{c}\).
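Both stories should give the same probability, and the probabilities over all tables with the same margins should sum to 1. The sketch below checks this with exact arithmetic; the function names and the example margins are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb, factorial


def story2_probability(a, b, c, d):
    """Story 2: choose which a+c balls land in class I."""
    n = a + b + c + d
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(n, a + c))


def story1_probability(a, b, c, d):
    """Story 1: count arrangements of labeled balls in labeled slots."""
    n = a + b + c + d
    numerator = (comb(a + c, a) * comb(b + d, b)
                 * factorial(a + b) * factorial(c + d))
    return Fraction(numerator, factorial(n))


# Hypothetical margins: 3 blue balls, 4 red balls, 4 balls in class I.
blue, red, class_one = 3, 4, 4
total = Fraction(0)
# a can only take values that keep b, c, and d non-negative.
for a in range(max(0, class_one - red), min(blue, class_one) + 1):
    b, c = blue - a, class_one - a
    d = red - c
    # The two counting arguments must agree for every table.
    assert story1_probability(a, b, c, d) == story2_probability(a, b, c, d)
    total += story2_probability(a, b, c, d)

print(total)  # expected to be 1, since the tables exhaust all outcomes
```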