confounds

Consider the following dependence relations:

     Z
    / \
   /   \
  /     \
 /       \
v         v
X -------> Y

Here \(X\) is the drug, \(Y\) is the recovery event and \(Z\) is gender. We want to know how effective \(X\) is: \[ P(Y \mid do(X)) \] That is, if every sick person were forced to take the drug, what would the recovery rate be? We need to be careful, because we can't just say that this is the same as \[ P(Y\mid X) \] Gender, \(Z\), affects both \(X\) and \(Y\), so we could imagine the case where:

\(X\) is not effective for men.
\(X\) is 100% effective for women.
Most women take \(X\), and most men do not take \(X\).

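Then \(P(Y\mid X=\text{take drug})\) is close to 1, because almost everyone we observe taking the drug is a woman. That overstates the effect of giving the drug to everyone, since the men would get no benefit from it. What we observe is confounded by gender.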

How should we adjust for this? We look at the men who actually took the drug and find their recovery rate. Then we look at the women who actually took the drug and find their recovery rate. We weight each rate by the proportion of that gender in the population: \[ P(Y \mid do(X)) = P(Z=\text{men})P(Y \mid Z=\text{men}, X) + P(Z=\text{women})P(Y \mid Z=\text{women}, X) \] Another way to interpret this adjustment:

let's find the rate of recovery among drug-takers of each gender (\(P(Y \mid Z=z, X)\)) and then pretend that every member of that gender took the drug and recovered at that rate.
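
To make the adjustment concrete, here is a minimal sketch in Python. All the numbers are made up for illustration (a 50/50 gender split, 90% of women and 10% of men taking the drug, and men never recovering on it); none of them come from real data. It compares the naive conditional \(P(Y \mid X)\) with the adjusted \(P(Y \mid do(X))\).

```python
# Hypothetical numbers, chosen only to illustrate the gender example.
p_z = {"men": 0.5, "women": 0.5}                 # P(Z)
p_take_given_z = {"men": 0.1, "women": 0.9}      # P(X = take | Z)
p_rec_given_take_z = {"men": 0.0, "women": 1.0}  # P(Y = recover | X = take, Z)

# Naive observational estimate: P(Y | X = take).
# Among people who took the drug, what fraction recovered?
p_take = sum(p_z[z] * p_take_given_z[z] for z in p_z)
naive = sum(
    p_z[z] * p_take_given_z[z] * p_rec_given_take_z[z] for z in p_z
) / p_take

# Adjusted estimate: P(Y | do(X = take)) = sum_z P(Z = z) P(Y | Z = z, X = take).
# Each gender's recovery rate is weighted by its share of the population,
# not by its share of the people who happened to take the drug.
adjusted = sum(p_z[z] * p_rec_given_take_z[z] for z in p_z)

print(f"naive    P(Y | X = take)     = {naive:.2f}")
print(f"adjusted P(Y | do(X = take)) = {adjusted:.2f}")
```

With these assumed numbers the naive estimate comes out well above the adjusted one, because the people who take the drug are mostly women.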

1. Correction to the above

Upon reading more, I think a better teaching example would be: \(X\) is the treatment, \(Y\) is the outcome, and \(Z\) is the severity of the disease.

If \(x_1\) is a stronger drug but is only given in more severe cases, it could happen that \(p(y=\text{recover}\mid x_1)\) is lower than the corresponding recovery rate for the weaker drug.

Importantly, severity causally affects the outcome as well as the choice of medication.
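
A small sketch of how this can play out, using hypothetical patient counts (not real data). The name `x0` for the weaker drug is my own label; only \(x_1\) appears above. The stronger drug `x1` is given mostly to severe cases, so its pooled recovery rate looks worse even though it does better within each severity level.

```python
# Hypothetical counts: (severity, drug) -> (patients, recoveries).
counts = {
    ("mild", "x0"): (80, 72),
    ("mild", "x1"): (20, 19),
    ("severe", "x0"): (20, 6),
    ("severe", "x1"): (80, 32),
}
severities = ["mild", "severe"]
drugs = ["x0", "x1"]
total = sum(n for n, _ in counts.values())


def naive_rate(drug):
    """P(y = recover | x): pool all patients who received the drug."""
    n = sum(counts[(s, drug)][0] for s in severities)
    r = sum(counts[(s, drug)][1] for s in severities)
    return r / n


def adjusted_rate(drug):
    """P(y = recover | do(x)): weight per-severity rates by P(severity)."""
    rate = 0.0
    for s in severities:
        p_s = sum(counts[(s, d)][0] for d in drugs) / total
        n, r = counts[(s, drug)]
        rate += p_s * r / n
    return rate


for d in drugs:
    print(d, f"naive={naive_rate(d):.3f}", f"adjusted={adjusted_rate(d):.3f}")
```

Under these assumed counts the naive comparison favours `x0`, while the severity-adjusted comparison favours `x1`.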

Note the subtle difference between this and the previously given example. For the "severity" example, you might imagine that the effectiveness of the drugs is the same at every severity level and that, keeping the drug fixed, the outcome varies linearly with severity. So the "severe" drug-outcome distribution is just a scaled version of the "mild" drug-outcome distribution.

In contrast, in the "gender" example, there is an entirely different distribution of outcomes per drug, dependent on gender.

2. sources

confounds wikipedia page

3. see also

conditional dependence – especially "explaining away"
wikipedia page on interactions – if we don't include an interaction term in our statistical model, we may be ignoring the fact that the main effect is confounded with an interaction effect between two covariates. – TODO come back to this