perplexity
The perplexity of a model \(q\), given a test set \(\{x_1, \ldots, x_N\}\), is: \[ \exp_{b}\left(-\frac{1}{N} \sum_{i=1}^{N} \log_b q(x_i)\right) \] which measures how well the test set is explained by the model (lower is better).
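As a concrete illustration, here is a minimal Python sketch that computes this quantity directly from the probabilities \(q(x_i)\) the model assigns to each test sample (the function name and the list-of-probabilities input are just for this example):

```python
import math

def perplexity(probs, base=math.e):
    """Perplexity of a model on a test set, given the model's probability
    q(x_i) for each test sample x_i (lower is better)."""
    n = len(probs)
    # Exponent: the negative average log-probability of the test samples.
    exponent = -sum(math.log(p, base) for p in probs) / n
    return base ** exponent
```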
Let \(\tilde{p}(x) = \frac{n_x}{N}\) be the fraction of times that \(x\) appears in the test set; that is, \(n_x\) of the \(N\) samples equal \(x\). Then the exponent in the perplexity is the cross entropy between the empirical distribution \(\tilde{p}\) and \(q\): \[ H(\tilde{p}, q) = -\sum_{x} \tilde{p}(x) \log_b q(x) \] so the perplexity is \(b^{H(\tilde{p}, q)}\).
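To see the equivalence numerically, the same value can be computed from counts. A minimal sketch, assuming the model is given as a dict mapping each \(x\) to \(q(x)\) (the names and example inputs are illustrative):

```python
from collections import Counter
import math

def cross_entropy(test_set, q, base=2):
    """H(p~, q): cross entropy between the empirical distribution of the
    test set and the model q (a dict mapping x -> q(x))."""
    counts = Counter(test_set)  # n_x: how many times each x appears
    N = len(test_set)
    return -sum((n / N) * math.log(q[x], base) for x, n in counts.items())

# Perplexity is the base raised to the cross entropy.
test_set = ["a", "a", "b", "c"]
q = {"a": 0.5, "b": 0.25, "c": 0.25}
print(2 ** cross_entropy(test_set, q))  # ~2.83

# Same value as the per-sample computation from the definition above.
direct = 2 ** (-sum(math.log2(q[x]) for x in test_set) / len(test_set))
print(direct)  # ~2.83
```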
Yoon Kim explains perplexity as the "branching factor" of your language model: how many choices, on average, does the model have at each step of token generation?
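A quick sanity check of this interpretation: if the model assigns uniform probability \(1/k\) to each of \(k\) possible tokens at every step, then \(-\frac{1}{N} \sum_{i=1}^{N} \log_b q(x_i) = \log_b k\), so the perplexity is exactly \(k\): the model behaves as if it were choosing among \(k\) equally likely options at each step.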