
Why do we try to maximize Lagrangian in SVMs?

Submitted by: @import:stackexchange-cs

Problem

I was learning about support vector machines from MIT OpenCourseWare. I understand why we try to minimize $\frac{1}{2}\|w\|^2$; I just did not get why we try to maximize the Lagrange expression, as stated at 35:56 in the YouTube video.

The margin (width) between the support vectors is $2/\|w\|$. We want to maximize the width, which means we want to minimize $\|w\|$, or equivalently $\frac{1}{2}\|w\|^2$. Now we use the Lagrange expression, $$L = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i\big(y_i(w \bullet x_i + b) - 1\big).$$ To find the minimum of $\frac{1}{2}\|w\|^2$, we set the gradient to zero: $\nabla L = 0$. From $\nabla L = 0$ we get $\sum_i \alpha_i y_i = 0$ and $w = \sum_i \alpha_i y_i x_i$. Putting these two equations back into the Lagrange expression, we end up with $$L(w,b,\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j (x_i \bullet x_j).$$
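Writing the substitution out explicitly (using $\sum_i \alpha_i y_i = 0$ to drop the $b$ term): $$\begin{aligned} L &= \frac{1}{2}\|w\|^2 - \sum_i \alpha_i y_i (w \bullet x_i) - b \sum_i \alpha_i y_i + \sum_i \alpha_i \\ &= \frac{1}{2}\|w\|^2 - w \bullet \Big(\sum_i \alpha_i y_i x_i\Big) + \sum_i \alpha_i \\ &= \sum_i \alpha_i - \frac{1}{2}\|w\|^2 = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j (x_i \bullet x_j). \end{aligned}$$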

I follow everything up to here, but then the professor says that we want to maximize $L$. Why are we trying to maximize the Lagrange expression when I thought we were trying to minimize it?

Solution

I think the answer to this deals with convex functions and duality.

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i\big(y_i(w \bullet x_i + b) - 1\big).$$

When you minimize this, you minimize it over $w$ and $b$, with $\alpha$ held fixed. As you stated, this produces the Lagrangian dual function:

$$\Theta(\alpha) = \inf_{w,b} L(w, b, \alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j (x_i \bullet x_j).$$

The maximization must be done here, but over the function $\Theta(\alpha)$ (the Lagrangian dual function), subject to $\alpha_i \geq 0$.
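To make the maximization concrete, here is a minimal numerical sketch (my illustration, not from the original answer): it maximizes $\Theta(\alpha)$ over $\alpha_i \geq 0$ with $\sum_i \alpha_i y_i = 0$ by minimizing $-\Theta(\alpha)$ with SciPy's SLSQP solver, on a small made-up linearly separable dataset.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data, assumed for illustration only.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# H[i, j] = y_i y_j (x_i . x_j)
H = (y[:, None] * X) @ (y[:, None] * X).T

def neg_theta(alpha):
    # -Theta(alpha) = 1/2 sum_ij y_i y_j a_i a_j (x_i . x_j) - sum_i a_i
    return 0.5 * alpha @ H @ alpha - alpha.sum()

res = minimize(neg_theta, np.zeros(len(y)),
               bounds=[(0.0, None)] * len(y),                       # alpha_i >= 0
               constraints={"type": "eq", "fun": lambda a: a @ y},  # sum_i a_i y_i = 0
               method="SLSQP")
print("alpha* =", res.x)
print("Theta(alpha*) =", -res.fun)
```

The nonzero entries of $\alpha^*$ mark the support vectors, and $w$ can then be recovered as $w = \sum_i \alpha_i^* y_i x_i$.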

Here is some background on why we are maximizing:

1) Let $p^*$ be the optimal value of the problem of minimizing $\frac{1}{2}\|w\|^2$ subject to the margin constraints (the primal). The Lagrangian dual function has the property that $\Theta(\alpha) \leq p^*$ for every $\alpha$ with $\alpha_i \geq 0$: it is a lower bound on the primal optimum. Instead of solving the primal problem, we want to get the best (largest) lower bound on $p^*$ by maximizing the Lagrangian dual function (the dual problem).
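The lower-bound claim in 1) follows in one line: take any $\alpha$ with $\alpha_i \geq 0$ and let $(w^*, b^*)$ be the primal optimizer, which satisfies $y_i(w^* \bullet x_i + b^*) - 1 \geq 0$ for all $i$. Then $$\Theta(\alpha) = \inf_{w,b} L(w,b,\alpha) \leq L(w^*, b^*, \alpha) = \frac{1}{2}\|w^*\|^2 - \sum_i \alpha_i\big(y_i(w^* \bullet x_i + b^*) - 1\big) \leq \frac{1}{2}\|w^*\|^2 = p^*,$$ since each term in the sum is a product of two nonnegative numbers.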

2) The primal objective is a convex function. The Lagrangian dual function is concave (this can be verified: it is a pointwise infimum of functions that are affine in $\alpha$). Maximizing a concave function is equivalent to minimizing a convex function, so the dual problem is just as tractable.
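Point 2) can also be checked directly: in matrix form, $\Theta(\alpha) = \sum_i \alpha_i - \frac{1}{2}\alpha^\top H \alpha$ with $H_{ij} = y_i y_j (x_i \bullet x_j)$, so the Hessian of $\Theta$ is $-H$; $H$ is a Gram matrix of the vectors $y_i x_i$ and therefore positive semidefinite. A quick numerical sketch of this check (mine, with made-up random data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))        # 10 hypothetical points in R^3
y = rng.choice([-1.0, 1.0], size=10)

Z = y[:, None] * X                  # rows are y_i x_i
H = Z @ Z.T                         # H[i, j] = y_i y_j (x_i . x_j)

# All eigenvalues of H should be >= 0 (up to rounding), so the Hessian
# of Theta, which is -H, is negative semidefinite: Theta is concave.
print("min eigenvalue of H:", np.linalg.eigvalsh(H).min())
```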

3) Why maximizing $\Theta(\alpha)$ is as good as minimizing $\frac{1}{2}\|w\|^2$: let $d^*$ be the value of $\max_{\alpha \geq 0} \Theta(\alpha)$. This primal convex optimization problem satisfies Slater's constraint qualification, which gives us strong duality ($d^* = p^*$). Therefore, solving $\max_{\alpha \geq 0} \Theta(\alpha)$ is the same as solving $\min \frac{1}{2}\|w\|^2$.
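To see strong duality numerically, here is a sketch (again mine, with a hypothetical toy dataset) that solves the primal and the dual separately and compares the two optimal values:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Primal: minimize 1/2 ||w||^2 over (w, b), s.t. y_i (w . x_i + b) >= 1.
p_star = minimize(lambda wb: 0.5 * wb[:2] @ wb[:2], np.zeros(3),
                  constraints={"type": "ineq",
                               "fun": lambda wb: y * (X @ wb[:2] + wb[2]) - 1.0},
                  method="SLSQP").fun

# Dual: maximize Theta(alpha), s.t. alpha_i >= 0 and sum_i alpha_i y_i = 0.
H = (y[:, None] * X) @ (y[:, None] * X).T
d_star = -minimize(lambda a: 0.5 * a @ H @ a - a.sum(), np.zeros(len(y)),
                   bounds=[(0.0, None)] * len(y),
                   constraints={"type": "eq", "fun": lambda a: a @ y},
                   method="SLSQP").fun

print("p* =", p_star, " d* =", d_star)  # equal up to solver tolerance
```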

Context

StackExchange Computer Science Q#77368, answer score: 4
