ignorability
1. motivation
Let \(Y^1_i\) be the potential outcome that subject \(x_i\) sees when they receive treatment \(1\). Let \(Y^0_i\) be the potential outcome if they receive treatment \(0\).
So \(Y^1_i\) could be \(x_i\)'s probability of recovery after receiving medication \(1\).
Ideally, we would like to know \(E[Y^1 - Y^0]\), but of course each subject will only receive one treatment, so we never observe both potential outcomes for the same subject.
If the ignorability condition is met, then we will be able to estimate \(E[Y^1]\) from the fraction of subjects who recovered among those who did receive treatment 1.
We have ignorability if the realized treatment \(Tx_i\) is independent of the pair of potential outcomes \((Y^1_i, Y^0_i)\): \[(Y^1_i, Y^0_i) \perp Tx_i\]
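To spell out why this condition helps, here is a short derivation sketch. It relies on one extra assumption not stated above, often called consistency: the observed outcome, written here as \(Y_i\) (a symbol introduced only for this sketch), equals the potential outcome of the treatment actually received.
\[E[Y^1] = E[Y^1 \mid Tx = 1] = E[Y \mid Tx = 1]\]
The first equality uses the independence above, and the second uses consistency. The same argument with treatment \(0\) gives \(E[Y^1 - Y^0] = E[Y \mid Tx = 1] - E[Y \mid Tx = 0]\), where both terms on the right can be estimated from observed data.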
2. what can we do if we have the assumption?
The subjects each have an ideal responsiveness to the treatments. This responsiveness is a property of the subject. The independence relationship above means that conditioning on a certain treatment shouldn't move us to a region in the outcome space where the subjects have a different responsiveness than the population-wide profile.
If we can ensure that the realized treatments are doled out randomly, without any relationship to the subject's inherent responsiveness, then we can get a sense of the inherent responsiveness of the population to the drug.
Let's say that we decide to give the drug to a randomly chosen 10% of the population. Then our 10% sample is representative, in expectation, of the population distribution of responsiveness to the drug.
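The random 10% scenario can be checked with a small simulation. This is only a sketch under made-up assumptions: the function name `simulate_random_assignment`, the uniform distribution of \(Y^1_i\), and the population size are all hypothetical choices for illustration, not part of the argument above.

```python
import random


def simulate_random_assignment(n=100_000, treated_fraction=0.10, seed=0):
    """Compare the population mean of Y^1 with the recovery rate
    among subjects who were randomly given treatment 1."""
    rng = random.Random(seed)
    # Assumption: each subject's Y^1 is a recovery probability, uniform on [0, 1).
    y1 = [rng.random() for _ in range(n)]
    # Treatment is assigned by a coin flip that never looks at Y^1.
    treated = [rng.random() < treated_fraction for _ in range(n)]
    # A treated subject recovers with probability Y^1.
    recovered = [rng.random() < p for p, t in zip(y1, treated) if t]
    population_mean = sum(y1) / n
    treated_recovery_rate = sum(recovered) / len(recovered)
    return population_mean, treated_recovery_rate


population_mean, treated_recovery_rate = simulate_random_assignment()
print(f"population mean of Y^1:      {population_mean:.3f}")
print(f"recovery rate among treated: {treated_recovery_rate:.3f}")
```

With random assignment the two numbers should be close, differing only by sampling noise.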
3. what happens if we don't have that assumption?
If we don't have ignorability, then our estimate is confounded by other factors. Let's say that only men choose to take drug 1, and that \(Y^1_i\) is very low for men, i.e., they have very low responsiveness to treatment 1. Let's say that no women choose to take drug 1, but \(Y^1_i\) is very high for women. Then we don't have ignorability, because if we learn that \(x_i\) took treatment 1, then we know that they are a man, and that their \((Y^1_i, Y^0_i)\) has the male-specific profile.
Then, we run into a problem if we try to claim that the fraction of subjects who recovered among those who took treatment 1 should represent the effectiveness of treatment \(1\) across the population. That fraction reflects only the men's low responsiveness.
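The same kind of simulation illustrates the confounded case. Again this is a sketch with invented numbers: the function name `simulate_self_selection`, the values \(Y^1_i = 0.1\) for men and \(0.9\) for women, the 50/50 split, and the 20% uptake among men are all assumptions chosen only to make the effect visible.

```python
import random


def simulate_self_selection(n=100_000, seed=0):
    """Compare the population mean of Y^1 with the recovery rate
    among subjects who chose treatment 1, when only men choose it."""
    rng = random.Random(seed)
    # Assumption: half the population are men.
    is_man = [rng.random() < 0.5 for _ in range(n)]
    # Assumption: Y^1 is 0.1 for men and 0.9 for women.
    y1 = [0.1 if m else 0.9 for m in is_man]
    # Only men take treatment 1; assume 20% of them do. No women do.
    treated = [m and rng.random() < 0.2 for m in is_man]
    # A treated subject recovers with probability Y^1.
    recovered = [rng.random() < p for p, t in zip(y1, treated) if t]
    population_mean = sum(y1) / n
    naive_estimate = sum(recovered) / len(recovered)
    return population_mean, naive_estimate


population_mean, naive_estimate = simulate_self_selection()
print(f"population mean of Y^1:         {population_mean:.3f}")
print(f"recovery rate among self-chosen: {naive_estimate:.3f}")
```

Here the recovery rate among the treated tracks the men's \(Y^1\) rather than the population average, so it badly understates how effective treatment 1 would be across everyone.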