How to tell if one sample mean is bigger than a set of other sample means
A reviewer asked us to show that our model is better than all other models in the table. They suggested running a paired test against each other model and then correcting for multiple comparisons with Bonferroni. This struck me as odd: shouldn't we do the opposite of correcting for multiple comparisons? Corrections such as Bonferroni control the family-wise error rate, i.e. the probability of making at least one type I error across the whole set of comparisons. That matters when the claim is "there exists a model that ours is better than", because every extra comparison is another chance at a false positive. But the claim we are making is that our model is better than all other models, so every single comparison has to come out in our favor. That claim already puts us at increased risk of type II errors, because, as a family, the other models have many chances to beat us.
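A quick simulation makes the asymmetry between the two claims concrete. This is only a sketch under assumptions of my own choosing: every model has the same true mean (so any declared win is a false positive), scores are unit-variance Gaussian, each comparison is an uncorrected one-sided paired z-test with known variance, and the values of `k`, `n`, and `trials` are arbitrary.

```python
import random
from statistics import NormalDist, mean


def simulate(k=5, n=20, alpha=0.05, trials=2000, seed=0):
    """Estimate false-positive rates for 'beats at least one' vs 'beats all'.

    All k+1 models share the same true mean, so every rejection is a type I error.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    sd_diff = 2 ** 0.5  # sd of (ours - other) for independent unit-variance scores
    any_hits = 0
    all_hits = 0
    for _ in range(trials):
        ours = [rng.gauss(0, 1) for _ in range(n)]
        rejections = []
        for _ in range(k):
            other = [rng.gauss(0, 1) for _ in range(n)]
            # Paired z statistic on the per-item differences.
            z = mean(o - b for o, b in zip(ours, other)) * n ** 0.5 / sd_diff
            p = 1 - std_normal.cdf(z)  # one-sided: ours > other
            rejections.append(p < alpha)  # note: no multiplicity correction
        any_hits += any(rejections)
        all_hits += all(rejections)
    return any_hits / trials, all_hits / trials


any_rate, all_rate = simulate()
print(f"false 'beats at least one' rate: {any_rate:.3f}")
print(f"false 'beats all' rate:          {all_rate:.3f}")
```

In this setup the uncorrected "beats at least one" rate lands well above alpha, while the "beats all" rate stays below it. The least favorable case for the "beats all" claim is a tie with one model and a clear lead over the rest, where the rate approaches alpha but does not exceed it.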
In the end I came around to their point of view, because at the very least we should err on the conservative side. And in a sense, if we are doing multiple comparisons, we should always correct for them, so that every comparison we report can be said to hold simultaneously at the stated significance level.
Other solutions:
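- Use Tukey's test, which compares every pair of models, to show that your model is pairwise better than all other models.
- Use Dunnett's test, which compares each of the other models against a single control (here, your model), to show that all other models are worse than yours.
- Question: Do these tests handle exactly what I care about?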
See also the discussion here: