
fisher information

1. definition

\(I(\theta) = E \left[ \left( \frac{\partial l (X; \theta)}{\partial \theta} \right)^2 \right]\)

Where \(\frac{\partial l (X; \theta)}{\partial \theta}\) is the score

If certain regularity conditions are met (e.g., \(l\) is twice differentiable in \(\theta\) and differentiation and integration can be interchanged), this can equivalently be written as \(I(\theta) = -E \left[ \frac{\partial^2 l(X;\theta) }{\partial \theta^2} \mid \theta \right]\)

where \(X\) is a r.v. – the observation – \(\theta\) parameterizes the distribution from which \(X\) is drawn, and \(l(X;\theta) = \log f(X;\theta)\) is the log-likelihood (with \(f\) the density or mass function of \(X\)).
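
As a concrete worked example (the Bernoulli model here is my own illustration, not part of the original note): for \(X \sim \mathrm{Bernoulli}(p)\), the log-likelihood of a single observation is \(l(X; p) = X \log p + (1-X) \log (1-p)\), so \[\begin{align*} I(p) &= -E\left[ \frac{\partial^2 l(X;p)}{\partial p^2} \right] = E\left[ \frac{X}{p^2} + \frac{1-X}{(1-p)^2} \right]\\ &= \frac{p}{p^2} + \frac{1-p}{(1-p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)} \end{align*} \] which is smallest at \(p = 1/2\), where a single observation pins down \(p\) least precisely.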

The Fisher information can also be written as the variance of the score: \[\begin{align*} Var \left( \frac{\partial l (X; \theta)}{\partial \theta} \right) &= E\left[ \left( \frac{\partial l(X; \theta)}{\partial \theta} \right)^2 \right] - E\left[ \frac{\partial l(X; \theta)}{\partial \theta} \right]^2\\ &= E\left[ \left( \frac{\partial l(X; \theta)}{\partial \theta} \right)^2 \right] - 0\\ &= I(\theta) \end{align*} \] The second line comes from the fact that the expectation of the score is zero (see score).
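
A minimal Monte Carlo sanity check of this identity (the Bernoulli model, sample size, and seed are assumptions of the sketch, not from the note):

import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                   # true parameter, chosen for the demo
x = rng.binomial(1, p, size=1_000_000)    # draws of X ~ Bernoulli(p)

# score of one observation at the true p:
#   d/dp [x log p + (1-x) log(1-p)] = x/p - (1-x)/(1-p)
score = x / p - (1 - x) / (1 - p)

print(score.mean())        # ~ 0: the expectation of the score
print(score.var())         # ~ 4.76: the variance of the score
print(1 / (p * (1 - p)))   # = 4.76...: the Fisher information I(p)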

2. intuition

Note that the Fisher information is a function of \(\theta\). Let's first consider the term \(\frac{\partial^2 l(X;\theta)}{\partial \theta^2}\). What is this quantity? It is the curvature of the log-likelihood curve. Let's say that \(X\) is fixed, i.e., is a realization. If the log-likelihood were very flat around \(\theta\), then all the parameter values near \(\theta\) would have about the same likelihood. If we're trying to identify \(\theta\) by its likelihood, this can be interpreted as meaning that the observation \(X\) doesn't tell us which parameter value generated it. Indeed, if we gave all parameter values the same prior weight and had to choose based on maximum likelihood, a flat log-likelihood would make this choice very hard. On the other hand, if the log-likelihood were sharply curved around \(\theta\), we could be much more confident that \(\theta\) generated \(X\).

Now, how hard is our choice on average? Take the expectation over all possible values of the observation \(X\), assuming that \(X\) is drawn from the distribution with parameter \(\theta\).
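
The same Monte Carlo idea illustrates this averaging step (again a sketch that assumes a Bernoulli model): the negative curvature of the log-likelihood, averaged over draws of \(X\), recovers \(I(\theta)\).

import numpy as np

rng = np.random.default_rng(1)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)

# curvature of the Bernoulli log-likelihood at the true p:
#   d^2/dp^2 [x log p + (1-x) log(1-p)] = -x/p^2 - (1-x)/(1-p)^2
curvature = -x / p**2 - (1 - x) / (1 - p) ** 2

print(-curvature.mean())   # ~ 1/(p(1-p)): the average steepness over draws of X
print(1 / (p * (1 - p)))   # the Fisher information I(p)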

Note: there are some flaws in this intuitive story: see this answer. The only rigorous way to connect the Fisher information to the difficulty of estimation is through the Cramér-Rao bound.

3. links

Created: 2025-11-02 Sun 18:55