patternMinor
AdaBoost - why use this particular alpha function?
Problem
I'm reading the paper where AdaBoost was invented (link), and I couldn't understand why they chose the formula $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$.
snippet:
AdaBoost algorithm from the paper
What is the motivation behind that specific formula?
Why not use something that feels more natural, like $\epsilon_t$?
Solution
Recall that the final hypothesis after $T$ rounds is $H(x)=\operatorname{sign}\left(\sum\limits_{t=1}^T \alpha_t h_t(x)\right)$, i.e. $\alpha_t$ is the weight of $h_t$ in $H$. If $\epsilon_t$ is high (near one), $h_t$ is wrong most of the time, so you want to answer the opposite of $h_t$: $\alpha_t$ should be negative and very large in absolute value. If on the other hand $\epsilon_t$ is very low (near zero), then you want $\alpha_t$ to be positive and very large. The worst case is $\epsilon_t=\frac{1}{2}$, in which case $h_t$ is no better than a coin flip and of no use to you, so $\alpha_t$ should be $0$.
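The three cases can be illustrated with a minimal sketch of the weighted vote. The function name `weighted_vote` and the numbers below are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def weighted_vote(alphas, predictions):
    """Return sign(sum_t alpha_t * h_t(x)) for predictions in {-1, +1}."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0  # tie: the weighted votes cancel exactly


# Hypothetical outputs h_1(x), h_2(x), h_3(x) of three weak hypotheses on one input x.
predictions = [+1, -1, -1]
# Hypothetical weights: h_1 is reliable (large positive weight),
# h_2 is a coin flip (zero weight), h_3 is mostly wrong (negative weight).
alphas = [2.0, 0.0, -1.5]

# The negative weight turns h_3's vote of -1 into support for +1,
# and the zero weight silences h_2 entirely.
result = weighted_vote(alphas, predictions)
print(result)
```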
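The function $\log\frac{1-\epsilon_t}{\epsilon_t}$ satisfies those properties: it is positive for $\epsilon_t<\frac{1}{2}$, zero at $\epsilon_t=\frac{1}{2}$, negative for $\epsilon_t>\frac{1}{2}$, and unbounded as $\epsilon_t$ approaches $0$ or $1$. The factor $\frac{1}{2}$ in the paper's formula is a positive constant, so it changes none of this. This quantity is known as the log odds (where the probability considered is $p=1-\epsilon_t$, the success probability). Intuitively, the odds $\frac{p}{1-p}$ tell you how many times more often an event with probability $p$ occurs than fails to occur; in our case, if e.g. $\epsilon_t=1/3$ then $\frac{1-\epsilon_t}{\epsilon_t}=2$, i.e. $h_t$ is expected to succeed with odds $2:1$.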
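To see these properties numerically, here is a small sketch that evaluates the formula from the question at a few error rates. The helper name `alpha` and the sample values of the error are assumptions made for illustration.

```python
import math


def alpha(eps):
    """Weight 1/2 * ln((1 - eps) / eps) for a weighted error 0 < eps < 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - eps) / eps)


# Print the odds of success and the resulting weight for a few error rates.
for eps in (0.01, 1 / 3, 0.5, 2 / 3, 0.99):
    odds = (1.0 - eps) / eps
    print(f"eps={eps:.2f}  odds={odds:8.3f}  alpha={alpha(eps):+.3f}")
```

The weight is antisymmetric around $\epsilon_t=\frac{1}{2}$: swapping $\epsilon_t$ with $1-\epsilon_t$ inverts the odds and therefore flips the sign of the log.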
Context
StackExchange Computer Science Q#136958, answer score: 4