patternMinor
Understanding of self-information
information understanding self
Problem
I am trying to fully understand the meaning behind the self-information of events. While I understand how to calculate it, all the explanations I find online lack an explanation of why less likely events have more self-information.
I can't figure out what consequences this has, and I can't put it into perspective the way I can with other concepts in computer science.
What does self-information really mean? Can someone put it into context and explain why exactly less likely events have higher self-information?
Solution
Self-information measures how far an outcome deviates from what we expect when sampling a random variable. It is usually given in shannons (bits); the unit may vary depending on the context (it follows the base of the logarithm).
Straight from the definition, it is the logarithm of the reciprocal of the event's probability, I(x) = log2(1/P(x)) = -log2 P(x), so the smaller the probability, the larger the value. A little example would help.
Imagine a place where it rains in only one month of the year (for some random number of days). From the event "it is a sunny day" we learn almost nothing: it could be any month, and it is the common (most probable) outcome, so it is not a very important piece of information. But the event "it rains" gives us the exact month, so it is worth more in the sense that we have learned more, hence the higher self-information.
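To put numbers on the rain example, here is a minimal sketch in Python. It assumes the standard definition I(x) = -log2 P(x) and a made-up figure of 10 rainy days per 365-day year; the function name `self_information` and both constants are only illustrative, not part of the original answer.

```python
import math


def self_information(p: float) -> float:
    """Self-information in shannons (bits) of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# Hypothetical numbers chosen only for illustration.
DAYS_PER_YEAR = 365
RAINY_DAYS = 10

p_rain = RAINY_DAYS / DAYS_PER_YEAR
p_sun = 1 - p_rain

info_rain = self_information(p_rain)  # rare event, roughly 5.19 bits
info_sun = self_information(p_sun)    # common event, roughly 0.04 bits

print(f"rain:  p = {p_rain:.3f}, self-information = {info_rain:.2f} bits")
print(f"sunny: p = {p_sun:.3f}, self-information = {info_sun:.2f} bits")
```

Under these assumed numbers, a rainy day carries more than a hundred times as much information as a sunny one, which matches the intuition that the rare outcome is the one that tells us something.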
Some people call it surprisal, which means "not the expected outcome". One consequence shows up in compression (sorry for the circular reference): you need more memory to store rare events. Think about entropy encoding, where the least expected events take the most bits to store. Huffman coding is one example of how code length relates to self-information.
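The link to compression can be sketched with a small Huffman construction. This is an illustrative sketch, not a production encoder: the helper `huffman_code_lengths` and the weather distribution are hypothetical, and the probabilities are deliberately powers of 1/2 so that each code length equals the self-information exactly. For other distributions the lengths are whole numbers of bits and only approximate it.

```python
import heapq
import math


def huffman_code_lengths(probs: dict[str, float]) -> dict[str, int]:
    """Return the Huffman code length (in bits) for each symbol."""
    # Heap entries: (probability, unique tie-breaker, {symbol: depth so far}).
    # The tie-breaker keeps Python from ever comparing the dicts.
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol in them
        # moves one level deeper, i.e. gets one more bit.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical distribution with dyadic probabilities.
probs = {"sunny": 0.5, "cloudy": 0.25, "rain": 0.125, "snow": 0.125}
lengths = huffman_code_lengths(probs)

for sym, p in probs.items():
    print(f"{sym:6s} p = {p:<5} code length = {lengths[sym]} bits, "
          f"self-information = {-math.log2(p):.0f} bits")
```

The least likely symbols end up deepest in the tree and so get the longest codewords, which is the "rare events cost more bits" consequence described above.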
Context
StackExchange Computer Science Q#80399, answer score: 3