receiver operating characteristic
Recall that the solution to a hypothesis testing problem is a decision rule. How do we evaluate the quality of a decision rule?
There are two relevant quantities, plus some related ideas and terminology.
1. detection and false alarm
\[\begin{align*} P_D &= \mathbb{P}(\hat{H}(\mathsf{y}) = H_1 \mid \mathsf{H} = H_1) = \int_{\mathcal{Y}_1} p_{\mathsf{y} \mid \mathsf{H}} (\mathbf{y} \mid H_1) d\mathbf{y}\\ P_F &= \mathbb{P}(\hat{H}(\mathsf{y}) = H_1 \mid \mathsf{H} = H_0) = \int_{\mathcal{Y}_0} p_{\mathsf{y} \mid \mathsf{H}} (\mathbf{y} \mid H_0) d\mathbf{y} \end{align*}\] where \(\mathcal{Y}_i\) is the set of all \(\mathbf{y}\) such that \(\hat{H}(\mathbf{y}) = H_i\)
Then
- \(P_D\) is called the "probability of detection". I think of it as:
- "Imagine that the sender is sending a bit. The bit can be corrupted. For all the true 1's that we receive, what is the fraction that we will end up detecting?"
- or "In the case that hypothesis 1 is true, what is the probability that it will produce an observation that we will classify as being from hypothesis 1?"
- or "if a person has a disease, what is the probability that they will present symptoms so that our decision rule will give them a positive diagnosis"
- \(P_F\) is called the "false alarm probability"
- "For all the true 0's that we receive, what is the fraction that we will end up incorrectly identifying as 1's?"
- "In the case that hypothesis 0 is true, what is the probability that it will produce an observation that we will classify as being from hypothesis 1?"
2. threshold of a rule
For a binary decision rule, e.g. the likelihood ratio test (LRT), the rule may assign a scalar quantity to each \(\mathbf{y}\), and we decide based on whether that value is above or below a threshold \(\eta\): \(LRT(\mathbf{y}) \underset{H_0}{\overset{H_1}{\gtreqless}} \eta\). Each choice of \(\eta\) then gives a different pair \((P_D, P_F)\). When \(\eta \rightarrow \infty\), we classify nothing as \(H_1\): we detect nothing, but we also raise no false alarms, so \((P_D, P_F) \rightarrow (0, 0)\). When \(\eta \rightarrow 0\), we classify everything as \(H_1\): we detect everything, but every observation from \(H_0\) becomes a false alarm, so \((P_D, P_F) \rightarrow (1, 1)\).
The \((P_F, P_D)\) points swept out in between trace the ROC curve.
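As a sketch of how the curve is traced, the snippet below sweeps \(\eta\) for the same assumed Gaussian example as above (not from the notes themselves): \(\mathsf{y} \sim \mathcal{N}(0, 1)\) under \(H_0\) and \(\mathsf{y} \sim \mathcal{N}(\mu, 1)\) under \(H_1\). For that model the likelihood ratio is \(\exp(\mu y - \mu^2/2)\), so the LRT exceeds \(\eta\) exactly when \(y > \ln(\eta)/\mu + \mu/2\), and each \(\eta\) maps to a pair of Gaussian tail probabilities. The value of \(\mu\) and the \(\eta\) grid are illustrative choices.

```python
# Sketch: trace the ROC curve for the assumed Gaussian example
#   H0: y ~ N(0, 1),  H1: y ~ N(mu, 1).
# The LRT is L(y) = exp(mu*y - mu^2/2); L(y) > eta  <=>  y > ln(eta)/mu + mu/2,
# so each threshold eta gives one (P_F, P_D) point.
import numpy as np
from scipy.stats import norm

mu = 2.0                              # assumed mean shift under H1
etas = np.logspace(-4, 4, 200)        # sweep eta from near 0 to very large

gammas = np.log(etas) / mu + mu / 2   # equivalent cutoff on y
P_F = norm.sf(gammas)                 # P(y > gamma | H0)
P_D = norm.sf(gammas, loc=mu)         # P(y > gamma | H1)

# Endpoints behave as described: eta -> infinity gives (P_F, P_D) -> (0, 0),
# and eta -> 0 gives (P_F, P_D) -> (1, 1).
for eta, pf, pd in zip(etas[::50], P_F[::50], P_D[::50]):
    print(f"eta = {eta:9.4f}:  P_F = {pf:.3f},  P_D = {pd:.3f}")
```

Plotting \(P_D\) against \(P_F\) over the sweep gives the ROC curve; a detector that separates the hypotheses well bows toward the corner \((P_F, P_D) = (0, 1)\).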
3. other commonly used metrics in detection theory
- \(P_F\) is often called the size of the test (think: you get a large \(P_F\) by making the detection region \(\mathcal{Y}_1\) "large", so that it covers more observations)
- \(P_D\) is often called the power of the test (think: a powerful test is one that captures as many true positives as possible)
- \(P_F\) is also called the "probability of Type 1 error" (a false positive)
- \(1 - P_D\) is the "probability of Type 2 error" (a false negative)
- \(P_D\) is also called recall