
mutual information

The mutual information between random variables X and Y is the Kullback-Leibler divergence between P(X,Y) and P(X)P(Y):

\[ I(X;Y) = D_{KL}\big(P(X,Y) \,\|\, P(X)P(Y)\big) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \]

Expressed in terms of entropy:

\[ I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \]

In the first line, you can think of I(X;Y) as measuring how far the joint distribution P(X,Y) is from the product of the marginals P(X)P(Y), i.e. how far X and Y are from being independent.
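To make the definition concrete, here is a minimal sketch in Python (using a hypothetical 2x2 joint distribution, not from this note) that computes I(X;Y) both directly from the KL-divergence sum and via the entropy identity, and checks that the two agree.

    # Minimal sketch: compute I(X;Y) for a small discrete joint distribution,
    # both from the definition and via I(X;Y) = H(X) - H(X|Y).
    import numpy as np

    # Hypothetical 2x2 joint p(x, y); rows index x, columns index y.
    p_xy = np.array([[0.3, 0.2],
                     [0.1, 0.4]])
    p_x = p_xy.sum(axis=1)   # marginal p(x)
    p_y = p_xy.sum(axis=0)   # marginal p(y)

    # Definition: sum over (x, y) of p(x,y) * log( p(x,y) / (p(x) p(y)) )
    mi = sum(p_xy[i, j] * np.log(p_xy[i, j] / (p_x[i] * p_y[j]))
             for i in range(2) for j in range(2) if p_xy[i, j] > 0)

    # Entropy form: H(X) - H(X|Y)
    H_x = -sum(p * np.log(p) for p in p_x if p > 0)
    H_x_given_y = -sum(p_xy[i, j] * np.log(p_xy[i, j] / p_y[j])
                       for i in range(2) for j in range(2) if p_xy[i, j] > 0)

    print(mi, H_x - H_x_given_y)   # both ≈ 0.086 nats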

You can also think of I(X;Y) as a measure of the information shared by X and Y: how much does knowing one variable reduce uncertainty about the other? If X and Y are independent, then I(X;Y) is zero, because knowing the value of Y doesn't change the distribution of X at all. However, if X is a deterministic function of Y, then I(X;Y) = H(X) - H(X|Y) = H(X), because H(X|Y) = 0: there is no uncertainty about X left after observing Y. In that case the mutual information is H(X); the uncertainty reduction we get from observing Y is exactly all the uncertainty that X had to begin with.
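As a sanity check of the deterministic case, here is a small sketch (a hypothetical example, not from this note) where Y is uniform on {0, 1, 2, 3} and X = Y mod 2; the computed mutual information equals H(X) = log 2.

    # Sketch of the deterministic case: X = f(Y) implies H(X|Y) = 0,
    # so I(X;Y) = H(X).
    import numpy as np

    y_vals = np.arange(4)
    p_y = np.full(4, 0.25)          # Y uniform on {0, 1, 2, 3}
    x_of_y = y_vals % 2             # X = Y mod 2, a deterministic function of Y

    # Joint p(x, y): all the mass of each y sits on the single x = f(y).
    p_xy = np.zeros((2, 4))
    p_xy[x_of_y, y_vals] = p_y
    p_x = p_xy.sum(axis=1)          # marginal of X: [0.5, 0.5]

    mi = sum(p_xy[i, j] * np.log(p_xy[i, j] / (p_x[i] * p_y[j]))
             for i in range(2) for j in range(4) if p_xy[i, j] > 0)
    H_x = -sum(p * np.log(p) for p in p_x if p > 0)

    print(mi, H_x)                  # both equal log(2) ≈ 0.693 nats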

You can also take a close look at the definition. Just like entropy, mutual information is obtained by averaging over a distribution. Here, we average over the joint distribution, and at each point (x, y) we take a measure of how far from independence we are.

It turns out that I(X;Y)=0 if and only if X and Y are independent.
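A quick sketch of the independent case (again a hypothetical example): when the joint factorizes as p(x,y) = p(x)p(y), every log-ratio term is log 1 = 0, so the sum, and hence I(X;Y), is exactly zero.

    # Sketch of the independent case: p(x, y) = p(x) p(y) gives I(X;Y) = 0.
    import numpy as np

    p_x = np.array([0.7, 0.3])
    p_y = np.array([0.4, 0.6])
    p_xy = np.outer(p_x, p_y)       # independent joint

    mi = sum(p_xy[i, j] * np.log(p_xy[i, j] / (p_x[i] * p_y[j]))
             for i in range(2) for j in range(2) if p_xy[i, j] > 0)
    print(mi)                       # 0.0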

Note that I(X;Y) = I(Y;X): mutual information is a symmetric measure.

Notice that \(\log p(x,y) - \log p(x)p(y)\) can be thought of as a distance from independence.

1. useful links

Created: 2024-07-15 Mon 01:28