How do I measure the reliability of a confidence value in a predictive algorithm?
Problem
Supposing I have some algorithm that can provide me with a confidence value for some event occurring. Let's say on day 1 it tells me there is an 80% chance it will rain, on day 2 it tells me there is a 20% chance the mail will be late, and on day 3 it tells me there is a 90% chance my milk will go off.
3 days later, I measure what actually happened and come up with this result:
Confidence   Occurrence
80.000%      false
20.000%      false
90.000%      true

In other words, it did not rain on day 1, the mail was not late on day 2, and my milk did go off on day 3.
Supposing that the data set is large, but that the confidence values are spread across the whole range rather than confined to any particular interval, how do I go about measuring the "reliability" of my algorithm, and what metrics could I use?
Note that the floating-point precision of the confidence value is high (say an 8-byte double). I know I could simply divide the confidences up and measure the samples in "buckets", but this would reduce the reliability of the result: it forces a trade-off between a smaller sample set for each range and a larger error in the form of a broader test range. I want to use all of the information to get as accurate a result as possible.
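For concreteness, here is a minimal sketch of the bucketing approach described above (Python; the function name and data layout are my own choices, not from the question). Each prediction is assigned to a fixed-width bucket, and the mean predicted confidence per bucket is compared with the observed frequency of the event:

    import numpy as np

    def reliability_by_bucket(confidences, outcomes, n_buckets=10):
        # confidences: predicted probabilities in [0, 1]
        # outcomes:    0/1 (or bool) observed occurrences
        # Returns (mean_confidence, observed_frequency, count) per non-empty bucket.
        confidences = np.asarray(confidences, dtype=float)
        outcomes = np.asarray(outcomes, dtype=float)
        edges = np.linspace(0.0, 1.0, n_buckets + 1)
        # digitize returns 1-based bin indices; the clip keeps 1.0 in the last bucket.
        bucket_ids = np.clip(np.digitize(confidences, edges) - 1, 0, n_buckets - 1)
        stats = []
        for b in range(n_buckets):
            mask = bucket_ids == b
            if mask.any():
                stats.append((confidences[mask].mean(),
                              outcomes[mask].mean(),
                              int(mask.sum())))
        return stats

The trade-off mentioned above shows up directly in n_buckets: more buckets means fewer samples per bucket (noisier frequencies), while fewer buckets means each comparison spans a broader range of confidences.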
Solution
Measuring a probability from a single occurrence doesn't really make sense, because the only confidence values a single sample can match exactly are 100% and 0%. If you have many samples, that is, if you have a sample distribution, then you can do some measuring.
For example, you could take the KL-Divergence as a measure (although it's not symmetric).
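As a sketch of that idea (one possible setup among several; the bucketing and the direction of the divergence are my assumptions), you could compute, per bucket, the KL divergence between the observed frequency and the mean predicted confidence, treating each as a Bernoulli distribution, and weight by bucket size:

    import numpy as np

    def bernoulli_kl(p, q, eps=1e-12):
        # KL(Bernoulli(p) || Bernoulli(q)); note the asymmetry in p and q.
        p = np.clip(p, eps, 1.0 - eps)
        q = np.clip(q, eps, 1.0 - eps)
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    def calibration_kl(bucket_stats):
        # bucket_stats: (mean_confidence, observed_frequency, count) triples,
        # e.g. as produced by the reliability_by_bucket sketch above.
        total = sum(n for _, _, n in bucket_stats)
        return sum(n * bernoulli_kl(freq, conf)
                   for conf, freq, n in bucket_stats) / total

A perfectly calibrated algorithm would give a value near zero; larger values mean the stated confidences drift further from the observed frequencies.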
Once you have a reliability measure for your algorithm, you can update it every time an additional sample arrives, in an on-line manner.
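A minimal sketch of such an on-line update, using a running Brier score (the mean squared difference between confidence and outcome; the choice of this particular metric is my assumption, not the answer's):

    class OnlineBrierScore:
        # Running mean of (confidence - outcome)^2, updated one sample at a time.
        def __init__(self):
            self.n = 0
            self.score = 0.0

        def update(self, confidence, occurred):
            # Incremental mean: new_mean = old_mean + (x - old_mean) / n
            self.n += 1
            error = (confidence - (1.0 if occurred else 0.0)) ** 2
            self.score += (error - self.score) / self.n
            return self.score

    # Hypothetical usage with the three observations from the question:
    tracker = OnlineBrierScore()
    for conf, occurred in [(0.80, False), (0.20, False), (0.90, True)]:
        tracker.update(conf, occurred)
    print(tracker.score)  # lower is better; a perfect oracle would score 0.0

Unlike the bucketed approaches, this uses every sample at full floating-point precision, which matches the questioner's wish to use all of the information.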
If you have more details regarding the specific algorithm, or the specific type of predictions, you can probably tailor something that better suits your scenario.
Context
StackExchange Computer Science Q#29505, answer score: 2