HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Showing that Bayes classifier is optimal

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
showingoptimalbayesclassifierthat

Problem

Consider domain $X$, label set $ Y=\{0,1\}$ and the zero-one loss.

Given any probability distribution D over $ X\times \{0,1\} $, we've defined the Bayes classifier $ f_D $ to be-

$$ f_{D}(x)=
\begin{cases}
1 & \text{if }\mathbb{P}[y=1|x]\geq\tfrac{1}{2}\\
0 & \text{otherwise.}
\end{cases} $$

I wish to prove that, for any classifer $ g\colon X\rightarrow\{0,1\}$,
$ L_D(f_D)\leq L_D(g)$, which means that $ f_D$ is optimal.

$L_D(h) $ is defined to be the "true error" of the classifier $h$. That is, $L_D(h)=D\{(x,y)\mid h(x)\not = y\}$.

I'm having some hard time proving this given the definitions above, and some hints/intuition will be appreciated.

Solution

The true error of a classifier $h$ is
$$
\begin{align*}
L_D(h) &= \sideset{\mathbb{E}}{}{}_{x,y \sim D} \Pr[h(x) \neq y] \\ &=
\sideset{\mathbb{E}}{}{}_{x,y \sim D}
\begin{cases}
\Pr[y \neq 0|x] & \text{if } h(x) = 0, \\
\Pr[y \neq 1|x] & \text{if } h(x) = 1.
\end{cases}
\end{align*}
$$
(All probabilities are with respect to $D$.)

The optimal classifier is thus the one that minimizes the loss function
$$
\phi(x) = \begin{cases}
\Pr[y \neq 0|x] & \text{if } h(x) = 0, \\
\Pr[y \neq 1|x] & \text{if } h(x) = 1
\end{cases}
$$
for all $x$. We can rewrite the loss function as
$$
\phi(x) = \begin{cases}
\Pr[y = 1|x] & \text{if } h(x) = 0, \\
1-\Pr[y = 1|x] & \text{if } h(x) = 1
\end{cases}
$$
So if $\Pr[y=1|x] 1-\Pr[y=1|x]$ we should choose $h(x) = 1$; if $\Pr[y=1|x] = 1-\Pr[y=1|x]$ then the choice doesn't matter.

Finally, $\Pr[y=1|x] < 1-\Pr[y=1|x]$ is the same as $\Pr[y=1|x] < 1/2$, and this explains the formula for $f_D(x)$.

Context

StackExchange Computer Science Q#72295, answer score: 11

Revisions (0)

No revisions yet.