score
1. definition
- \(s = \frac{\partial l(X; \theta)}{\partial \theta}\), where \(l\) is the log likelihood
- Note: in score matching, what is called the "score" is instead the derivative of the log density with respect to \(X\), not \(\theta\) (see the Andy Jones blog)
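A minimal numeric sketch of the two notions of score, using a Gaussian \(N(\mu, 1)\) as an illustrative example (the `log_lik` helper and the test point are my own choices, not from the notes). For this model the Fisher score is \((x - \mu)\) and the score-matching score is \(-(x - \mu)\), which we can confirm by finite differences:

```python
import numpy as np

def log_lik(x, mu, sigma=1.0):
    # Log density of N(mu, sigma^2) evaluated at x
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def score_theta(x, mu, eps=1e-6):
    # Fisher score: derivative of the log likelihood w.r.t. the parameter mu.
    # For N(mu, 1) this is (x - mu) analytically.
    return (log_lik(x, mu + eps) - log_lik(x, mu - eps)) / (2 * eps)

def score_x(x, mu, eps=1e-6):
    # Score-matching "score": derivative w.r.t. the data x instead.
    # For N(mu, 1) this is -(x - mu) analytically.
    return (log_lik(x + eps, mu) - log_lik(x - eps, mu)) / (2 * eps)

print(score_theta(2.0, 0.5))  # ≈ 1.5  (x - mu)
print(score_x(2.0, 0.5))      # ≈ -1.5 (-(x - mu))
```

The two quantities differ only in which argument of \(l\) is differentiated, which is the source of the naming clash.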
2. expectation
The expectation of the score is 0. We can show this using the log-derivative trick:
\[\begin{align*}
E[s] &= E\left[ \frac{\partial l (X; \theta)}{\partial \theta} \right]\\
&= E\left[ \frac{\partial \log p(x \mid \theta)}{\partial \theta} \right]\\
&= E\left[ \frac{1}{p(x \mid \theta)} \frac{\partial p(x \mid \theta)}{\partial \theta} \right]\\
&= \int \frac{1}{p(x \mid \theta)} \frac{\partial p(x \mid \theta)}{\partial \theta} p(x \mid \theta) dx\\
&= \int \frac{\partial p(x \mid \theta)}{\partial \theta} dx\\
&= \frac{\partial}{\partial \theta} \int p(x \mid \theta) dx\\
&= \frac{\partial}{\partial \theta} 1\\
&= 0\\
\end{align*}\]
- The third line comes from the log-derivative trick (just the chain rule applied to \(\log p\))
- The second-to-last line comes from moving the derivative outside the integral, which is valid under regularity conditions (see the Leibniz integral rule)
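The zero-expectation property can be checked by Monte Carlo: draw samples from the model itself and average the analytic score. This sketch uses a Gaussian with known variance (my own choice of example), where \(\partial_\mu \log p = (x - \mu)/\sigma^2\):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0

# Crucially, x is drawn from the same model p(x | mu) the score is taken under;
# the expectation would generally not be 0 under a different distribution.
x = rng.normal(mu, sigma, size=1_000_000)

# Analytic score for N(mu, sigma^2) with sigma known: (x - mu) / sigma^2
score = (x - mu) / sigma**2

print(score.mean())  # close to 0, up to Monte Carlo error
```

The key assumption in the derivation shows up here: the samples must come from \(p(x \mid \theta)\) evaluated at the same \(\theta\) used in the score.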