
fisher information

1. definition

\(I(\theta) = E \left[ \left( \frac{\partial l (X; \theta)}{\partial \theta} \right)^2 \right]\)

Where \(\frac{\partial l (X; \theta)}{\partial \theta}\) is the score

If certain regularity conditions are met (e.g., \(l\) is twice differentiable in \(\theta\) and differentiation and integration can be interchanged), this can equivalently be written as \(I(\theta) = -E \left[ \frac{\partial^2 l(X;\theta) }{\partial \theta^2} \mid \theta \right]\)

where \(X\) is a r.v. – the observation – \(\theta\) parameterizes the distribution from which \(X\) is drawn, and \(l(X;\theta) = \log f(X;\theta)\) is the log-likelihood (with \(f\) the density or mass function of \(X\)).
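
As a concrete worked example (the Bernoulli model here is my own illustration, not part of the original note): for \(X \sim \mathrm{Bernoulli}(p)\), the log-likelihood of a single observation is \(l(X; p) = X \log p + (1-X) \log (1-p)\), so \[\begin{align*} I(p) &= -E\left[ \frac{\partial^2 l(X;p)}{\partial p^2} \right] = E\left[ \frac{X}{p^2} + \frac{1-X}{(1-p)^2} \right]\\ &= \frac{p}{p^2} + \frac{1-p}{(1-p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)} \end{align*} \] which is smallest at \(p = 1/2\), where a single observation pins down \(p\) least precisely.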

The Fisher information can also be written as the variance of the score: \[\begin{align*} Var \left( \frac{\partial l (X; \theta)}{\partial \theta} \right) &= E\left[ \left( \frac{\partial l(X; \theta)}{\partial \theta} \right)^2 \right] - E\left[ \frac{\partial l(X; \theta)}{\partial \theta} \right]^2\\ &= E\left[ \left( \frac{\partial l(X; \theta)}{\partial \theta} \right)^2 \right] - 0\\ &= I(\theta) \end{align*} \] The second line comes from the fact that the expectation of the score is zero (see score).
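
A minimal Monte Carlo sanity check of this identity (the Bernoulli model, sample size, and seed are assumptions of the sketch, not from the note):

import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                   # true parameter, chosen for the demo
x = rng.binomial(1, p, size=1_000_000)    # draws of X ~ Bernoulli(p)

# score of one observation at the true p:
#   d/dp [x log p + (1-x) log(1-p)] = x/p - (1-x)/(1-p)
score = x / p - (1 - x) / (1 - p)

print(score.mean())        # ~ 0: the expectation of the score
print(score.var())         # ~ 4.76: the variance of the score
print(1 / (p * (1 - p)))   # = 4.76...: the Fisher information I(p)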

2. intuition

Note that the Fisher information is a function of \(\theta\). Let's first consider the term \(\frac{\partial^2 l(X;\theta)}{\partial \theta^2}\). What is this quantity? It is the curvature of the log-likelihood curve. Let's say that \(X\) is fixed, i.e., is a realization. If the log-likelihood were very flat around \(\theta\), then all the parameter values near \(\theta\) would have about the same likelihood. If we're trying to identify \(\theta\) by its likelihood, this can be interpreted as meaning that the observation \(X\) doesn't tell us which parameter value generated it. Indeed, if we gave all parameter values the same prior weight and had to choose based on maximum likelihood, a flat log-likelihood would make this choice very hard. On the other hand, if the log-likelihood were sharply curved around \(\theta\), we could be much more confident that \(\theta\) generated \(X\).

Now, how hard is our choice on average? Take the expectation over all possible values of the observation \(X\), assuming that \(X\) is drawn from the distribution with parameter \(\theta\).
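
The same Monte Carlo idea illustrates this averaging step (again a sketch that assumes a Bernoulli model): the negative curvature of the log-likelihood, averaged over draws of \(X\), recovers \(I(\theta)\).

import numpy as np

rng = np.random.default_rng(1)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)

# curvature of the Bernoulli log-likelihood at the true p:
#   d^2/dp^2 [x log p + (1-x) log(1-p)] = -x/p^2 - (1-x)/(1-p)^2
curvature = -x / p**2 - (1 - x) / (1 - p) ** 2

print(-curvature.mean())   # ~ 1/(p(1-p)): the average steepness over draws of X
print(1 / (p * (1 - p)))   # the Fisher information I(p)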

Note: there are some flaws in this intuitive story: see this answer. The only rigorous way to connect the Fisher information to the difficulty of estimation is through the Cramér-Rao bound.

3. links

Created: 2025-11-02 Sun 18:55