cohen's kappa
- measure of inter-rater agreement, \(\kappa = 1 - \frac{1 - p_o}{1 - p_e} = \frac{p_o - p_e}{1 - p_e}\), where \(p_o\) is the observed agreement and \(p_e\) is the agreement you would expect to see by chance
- higher means more agreement: \(\kappa = 1\) is perfect agreement, \(\kappa \le 0\) is chance-level or worse
- Intuition: it's the agreement achieved beyond chance, \(p_o - p_e\), normalized by the maximum possible agreement beyond chance, \(1 - p_e\).
- What agreement would you expect to see by chance?
- For each rater \(i\) and each label \(k\), let \(n_{i,k}\) be the number of samples rater \(i\) labeled \(k\). With \(N\) observations, you would expect raters \(i\) and \(j\) to agree on label \(k\) by chance on a \(\frac{n_{i,k}}{N}\frac{n_{j,k}}{N}\) fraction of the observations, so \(p_e = \sum_k \frac{n_{i,k}}{N}\frac{n_{j,k}}{N}\) (computed in the sketch after this list).
- Interpretation: the so-called "Landis and Koch" criteria map \(\kappa\) to qualitative bands: \(< 0\) poor, \(0\)–\(0.20\) slight, \(0.21\)–\(0.40\) fair, \(0.41\)–\(0.60\) moderate, \(0.61\)–\(0.80\) substantial, \(0.81\)–\(1.00\) almost perfect agreement.
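A minimal sketch (Python, not part of the original notes) of the computation above for two raters: the label lists `rater_a` / `rater_b` are made-up data, and the call to scikit-learn's `cohen_kappa_score` is only a cross-check.

```python
from collections import Counter

from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two raters over the same 8 samples.
rater_a = ["cat", "cat", "dog", "dog", "cat", "bird", "dog", "cat"]
rater_b = ["cat", "dog", "dog", "dog", "cat", "bird", "cat", "cat"]
N = len(rater_a)

# Observed agreement p_o: fraction of samples where the raters match.
p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / N

# Chance agreement p_e: probability both raters independently pick label k,
# (n_{a,k}/N) * (n_{b,k}/N), summed over all labels k.
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
p_e = sum((counts_a[k] / N) * (counts_b[k] / N)
          for k in set(counts_a) | set(counts_b))

kappa = (p_o - p_e) / (1 - p_e)  # equivalently 1 - (1 - p_o) / (1 - p_e)

print(f"p_o={p_o:.3f}  p_e={p_e:.3f}  kappa={kappa:.3f}")
print("sklearn cross-check:", cohen_kappa_score(rater_a, rater_b))
```

On the made-up data above this prints \(p_o = 0.75\), \(p_e \approx 0.41\), and \(\kappa \approx 0.58\): the raters agree on 75% of samples, but because about 41% agreement is expected by chance, the kappa lands in the "moderate" band.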