Mutual information intuition
Problem
I was creating an example for a casual talk on mutual information. I considered a system of two fair coins, $X$ and $Y$: with probability 1/2 they are copies of each other, and with probability 1/2 they are independent.
Then $P(X=Y)=1/2+1/2\times 1/2=3/4$, so $P(HH)=P(TT)=3/8$ and $P(HT)=P(TH)=1/8$.
The joint entropy is then $H(X,Y)=E[-\log_2 p]\approx 1.81$ bits.
Therefore the mutual information is $I(X;Y)=H(X)+H(Y)-H(X,Y)=1+1-1.81\approx 0.19$ bits.
Is this correct? Intuitively one might expect the mutual information to be $0.5$ bits, since the coins are copies of each other 50% of the time.
Is there any simple intuition for why it is less than $0.5$ bits?
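A minimal Python sketch to verify these numbers, with the joint distribution hard-coded from the probabilities above:

```python
from math import log2

# Joint distribution of the two coins:
# with prob 1/2 they are copies (P(HH) = P(TT) = 1/4 in that branch),
# with prob 1/2 they are independent fair coins (each pair has prob 1/4).
joint = {("H", "H"): 3/8, ("T", "T"): 3/8,
         ("H", "T"): 1/8, ("T", "H"): 1/8}

H_XY = -sum(p * log2(p) for p in joint.values())  # joint entropy
H_X = H_Y = 1.0  # both marginals are fair coins, 1 bit each

print(f"H(X,Y) = {H_XY:.4f} bits")             # ~1.8113
print(f"I(X;Y) = {H_X + H_Y - H_XY:.4f} bits")  # ~0.1887
```

This reproduces $H(X,Y)\approx 1.81$ and $I(X;Y)\approx 0.19$ bits.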
Solution
Mutual information tells you how much you learn about $X$ from knowing the value of $Y$ (on average over the choice of $Y$). In other words, mutual information measures how many fewer bits you need to encode $X$ if you already know $Y$ than if you didn't. In your example, learning $Y$ enables you to guess $X$ correctly with probability $0.75$, but since $H_2(0.75)\approx 0.81>0.5$ ($H_2$ being the binary entropy function), you still need more than half a bit to encode the correct value of $X$ even if you know $Y$. So your "savings" in encoding length are less than $0.5$ bits.
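Spelled out as a worked equation:
$$I(X;Y) = H(X) - H(X \mid Y) = 1 - H_2(3/4) \approx 1 - 0.811 = 0.189\ \text{bits}.$$
For the intuitive answer of $0.5$ bits you would need $H(X \mid Y) = 0.5$, and since $H_2(p) = 0.5$ at $p \approx 0.89$, that would require guessing $X$ correctly about 89% of the time from $Y$, not just 75%.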
Here is a more illustrative example, I think. You have three random bits, $X$, $Y$, $Z$: $X$ and $Y$ are independent and unbiased, while $Z$ is the exclusive or of $X$ and $Y$. Then $H(X,Y) = 2$, while $H(X,Y \mid Z=z) = 1$ for either $z \in \{0,1\}$, because knowing $z$ and the value of $X$ determines $Y$; averaging over $z$ gives $H(X,Y \mid Z) = 1$ as well. Then $I(X,Y;Z) = H(X,Y) - H(X,Y \mid Z) = 2 - 1 = 1$. In words, if you know $Z$, you only really need to encode $X$ rather than both $X$ and $Y$.
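The same brute-force check works here; a minimal Python sketch, relying on the fact that all four $(x,y)$ outcomes are equally likely:

```python
from itertools import product
from math import log2

# X and Y are independent fair bits; Z = X xor Y.
# The four (x, y) outcomes are equally likely, each with probability 1/4.
outcomes = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)]

H_XY = log2(len(outcomes))  # 4 equally likely (x, y) pairs -> 2 bits

# H(X,Y | Z): conditioned on Z = z, the surviving (x, y) pairs
# remain equally likely among themselves.
H_XY_given_Z = 0.0
for z in (0, 1):
    survivors = [o for o in outcomes if o[2] == z]
    p_z = len(survivors) / len(outcomes)        # P(Z = z) = 1/2
    H_XY_given_Z += p_z * log2(len(survivors))  # 1 bit per branch

print(f"I(X,Y;Z) = {H_XY - H_XY_given_Z:.1f} bit")  # 1.0
```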
Context
StackExchange Computer Science Q#30155, answer score: 7