patternMinor

Finding the dichotomy that maximizes information gain for a classifier?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

machine-learning information-theory cs stackoverflow optimization entropy probability-theory

thegainmaximizesclassifierthatforfindingdichotomyinformation

Problem

Suppose that $\Omega$ is a finite probability space,with measure $P$ (we can take $P$ uniform). Let $C \in \{\pm 1 \}$ be a random variable on $\Omega$, the classifier. Let

$$H(C) = H(C, \Omega, P) = \sum_{i \in \pm 1} P(C = i) \log P(C = i)$$

be the entropy of $C$.

For a random variable $Y$ on $\Omega$ and $r \in \mathbb{R}$, define the level set $\{Y = r\} = V_r$, and define

$$H(C | Y) = \Sigma_{r \in \mathbb{R}} P(Y = r) H(C|_{V_r}, V_r, P_{V_r})$$

where $P_{V_r}(B) = P(B \cap V_r) / P(V_r)$ is a measure on $V_r$. $H(C|_{V_r}, V_r, P_{V_r})$ is the entropy of the random variable restricted to the the subset $V_r$, with the conditional probability measure.

As in a decision tree classifier, I want to determine the subset $A \subset \Omega$ that maximizes $H(C) - H(C|1_A)$. That is, I want to maximize the information gain. What is the "best" algorithm for doing so? What is the complexity of this question? Are there good probabilistic algorithms ?

Some computation allows one to reformulate this as finding the subset $A$ that maximizes

$$\sum_e \sum_{i = \pm 1} P(C^i \cap A^e) \log(P(C^i \cap A^e)) + \sum_e P(A^e) \log P(A^e),$$

provided we let $A^e$ mean $A$ or $A^c$, and $C^i = \{C = i\}$.

Solution

As in the discussion, there are too many "features" for this to be in interesting question. In particular, letting $A = C^{-1}(1)$, the set of positive examples, then $H(C|A) = H(C|C) = 0$ so information gain is $H(C)$, which is maximal.

This is obvious in retrospect, in order to decrease the entropy maximally, make $C$ deterministic on $A$ and $A^c$.

For decision trees, the point is that we have a limited collection of features, specifically excluding C, to split our data set around.

Context

StackExchange Computer Science Q#73778, answer score: 2

Revisions (0)

No revisions yet.