How do I measure the reliability of a confidence value in a predictive algorithm?
Problem
Supposing I have some algorithm that can provide me with a confidence value for some event occurring. Let's say on day 1 it tells me there is an 80% chance it will rain, on day 2 it tells me there is a 20% chance the mail will be late, and on day 3 it tells me there is a 90% chance my milk will go off.
3 days later, I measure what actually happened and come up with this result:
Confidence   Occurrence
80.000%      false
20.000%      false
90.000%      true

In other words, it did not rain on day 1, the mail was not late on day 2, and my milk did go off on day 3.
Supposing that the data set is large, but that the confidence values are spread across the whole range rather than confined to any particular interval, how do I go about measuring the "reliability" of my algorithm, and what metrics could I use?
Note that the floating-point precision of the confidence value is high (say an 8-byte double). I know I could simply divide the confidences up and measure the samples in "buckets", but this would reduce the reliability of the result: it forces a trade-off between a smaller sample set for each range and a larger error in the form of a broader test range. I want to use all of the information to get as accurate a result as possible.
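For concreteness, here is a minimal sketch of the bucketing approach described above (Python; the function name and data layout are my own choices, not from the question). Each prediction is assigned to a fixed-width bucket, and the mean predicted confidence per bucket is compared with the observed frequency of the event:

    import numpy as np

    def reliability_by_bucket(confidences, outcomes, n_buckets=10):
        # confidences: predicted probabilities in [0, 1]
        # outcomes:    0/1 (or bool) observed occurrences
        # Returns (mean_confidence, observed_frequency, count) per non-empty bucket.
        confidences = np.asarray(confidences, dtype=float)
        outcomes = np.asarray(outcomes, dtype=float)
        edges = np.linspace(0.0, 1.0, n_buckets + 1)
        # digitize returns 1-based bin indices; the clip keeps 1.0 in the last bucket.
        bucket_ids = np.clip(np.digitize(confidences, edges) - 1, 0, n_buckets - 1)
        stats = []
        for b in range(n_buckets):
            mask = bucket_ids == b
            if mask.any():
                stats.append((confidences[mask].mean(),
                              outcomes[mask].mean(),
                              int(mask.sum())))
        return stats

The trade-off mentioned above shows up directly in n_buckets: more buckets means fewer samples per bucket (noisier frequencies), while fewer buckets means each comparison spans a broader range of confidences.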
Solution
Measuring a probability from a single occurrence doesn't really make sense, because the only confidence values a single sample can match exactly are 100% and 0%. If you have many samples, that is, if you have a sample distribution, then you can do some measuring.
For example, you could take the KL-Divergence as a measure (although it's not symmetric).
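As a sketch of that idea (one possible setup among several; the bucketing and the direction of the divergence are my assumptions), you could compute, per bucket, the KL divergence between the observed frequency and the mean predicted confidence, treating each as a Bernoulli distribution, and weight by bucket size:

    import numpy as np

    def bernoulli_kl(p, q, eps=1e-12):
        # KL(Bernoulli(p) || Bernoulli(q)); note the asymmetry in p and q.
        p = np.clip(p, eps, 1.0 - eps)
        q = np.clip(q, eps, 1.0 - eps)
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    def calibration_kl(bucket_stats):
        # bucket_stats: (mean_confidence, observed_frequency, count) triples,
        # e.g. as produced by the reliability_by_bucket sketch above.
        total = sum(n for _, _, n in bucket_stats)
        return sum(n * bernoulli_kl(freq, conf)
                   for conf, freq, n in bucket_stats) / total

A perfectly calibrated algorithm would give a value near zero; larger values mean the stated confidences drift further from the observed frequencies.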
Once you have a reliability measure for your algorithm, you can update it every time an additional sample arrives, in an on-line manner.
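A minimal sketch of such an on-line update, using a running Brier score (the mean squared difference between confidence and outcome; the choice of this particular metric is my assumption, not the answer's):

    class OnlineBrierScore:
        # Running mean of (confidence - outcome)^2, updated one sample at a time.
        def __init__(self):
            self.n = 0
            self.score = 0.0

        def update(self, confidence, occurred):
            # Incremental mean: new_mean = old_mean + (x - old_mean) / n
            self.n += 1
            error = (confidence - (1.0 if occurred else 0.0)) ** 2
            self.score += (error - self.score) / self.n
            return self.score

    # Hypothetical usage with the three observations from the question:
    tracker = OnlineBrierScore()
    for conf, occurred in [(0.80, False), (0.20, False), (0.90, True)]:
        tracker.update(conf, occurred)
    print(tracker.score)  # lower is better; a perfect oracle would score 0.0

Unlike the bucketed approaches, this uses every sample at full floating-point precision, which matches the questioner's wish to use all of the information.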
If you have more details regarding the specific algorithm, or the specific type of predictions, you can probably tailor something that better suits your scenario.
Context
StackExchange Computer Science Q#29505, answer score: 2