This post is the first part of a two-part series about SVMs. This part clarifies how SVMs work in theory (with two fully developed examples). The second part (not published yet) will explain the algorithms used to solve this problem with a computer: Quadratic Programming and SMO.
Index
- Basic definition of SVMs for binary classification
- Mathematical explanation
- Kernel trick
- Example 1: 2 points in a plane
- Example 2: 3 points in a plane
- Appendix A: Scalar Projection of Vectors (1 example)
- Appendix B: Lagrange Multipliers (3 examples)
Basic definition
Support vector machines (SVMs from now on) are supervised learning models used for classification and regression. The basic idea of SVMs in binary classification is to find the separating boundary that lies as far as possible from each class.
Therefore, evaluating a simple mathematical function on the coordinates (features) of a sample tells whether the sample belongs to one region (class) or the other. The number of input features determines the dimension of the problem. To keep it simple, the explanation will use examples in 2 dimensions.
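As a minimal sketch of that idea in Python (the weight vector `w`, the bias `b`, and the sample below are made-up values; how to obtain them is the subject of the rest of the post):

```python
import numpy as np

def classify(w, b, x):
    """Assign a sample to class A (+1) or class B (-1) by the sign of w.x + b."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical trained parameters and a 2-dimensional sample.
w = np.array([-0.5, -0.5])
b = 2.0
sample = np.array([1.0, 1.0])
print(classify(w, b, sample))  # -> 1 (class A)
```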
Mathematical explanation
The vector $\vec{w}$ is perpendicular to the separating boundary, and an unknown sample is represented by the vector $\vec{u}$. If both vectors are multiplied (dot product), the result is the length of the projection of $\vec{u}$ onto $\vec{w}$ (the purple vector in the figure). The decision rule follows: the sample is classified as positive if

$$\vec{w} \cdot \vec{u} + b \geq 0$$
Let us say that:

$$\vec{w} \cdot \vec{x}_+ + b \geq 1$$

$$\vec{w} \cdot \vec{x}_- + b \leq -1$$

where $\vec{x}_+$ is a positive sample (class A) and $\vec{x}_-$ is a negative sample (class B).

A new variable $y_i$ is now introduced:

$$y_i = \begin{cases} +1 & \text{for positive samples} \\ -1 & \text{for negative samples} \end{cases}$$

Multiply each of the previous equations by $y_i$. The result is the same equation for both classes. Therefore, we only need one formula:

$$y_i (\vec{w} \cdot \vec{x}_i + b) \geq 1$$

Finally, we add an additional constraint for the samples lying exactly on the margin (the support vectors):

$$y_i (\vec{w} \cdot \vec{x}_i + b) - 1 = 0$$
The next step is to maximize the width of the margin, i.e. the projection of $(\vec{x}_+ - \vec{x}_-)$ onto the unit vector perpendicular to the boundary, $\frac{\vec{w}}{\|\vec{w}\|}$. The length of the projection is given by the following formula:

$$\text{width} = (\vec{x}_+ - \vec{x}_-) \cdot \frac{\vec{w}}{\|\vec{w}\|}$$

From the previous constraint, substituting both positive and negative samples gives $\vec{w} \cdot \vec{x}_+ = 1 - b$ and $\vec{w} \cdot \vec{x}_- = -1 - b$, so that:

$$\text{width} = \frac{(1 - b) - (-1 - b)}{\|\vec{w}\|}$$

Therefore:

$$\text{width} = \frac{2}{\|\vec{w}\|}$$

The goal is to maximize $\frac{2}{\|\vec{w}\|}$, which is the same as minimizing $\|\vec{w}\|$ or, to make it more mathematically convenient, $\frac{1}{2}\|\vec{w}\|^2$.
Thus, we have a function to minimize subject to a constraint ($y_i (\vec{w} \cdot \vec{x}_i + b) - 1 = 0$), so Lagrange multipliers are applied. In case you want to know more about Lagrange multipliers, you can check Appendix B.
First we have the function we want to minimize, and then the constraints. Plugging these two into the Lagrange function $\mathcal{L}$:

$$\mathcal{L} = \frac{1}{2}\|\vec{w}\|^2 - \sum_i \alpha_i \left[ y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \right]$$

Setting the partial derivatives with respect to $\vec{w}$ and $b$ to zero gives:

$$\frac{\partial \mathcal{L}}{\partial \vec{w}} = \vec{w} - \sum_i \alpha_i y_i \vec{x}_i = 0 \;\Rightarrow\; \vec{w} = \sum_i \alpha_i y_i \vec{x}_i$$

$$\frac{\partial \mathcal{L}}{\partial b} = -\sum_i \alpha_i y_i = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0$$

Hence, substituting back, we aim to optimize:

$$\mathcal{L} = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, (\vec{x}_i \cdot \vec{x}_j)$$

The optimization depends on the samples only through their dot products $\vec{x}_i \cdot \vec{x}_j$.
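To make the final expression concrete, here is a small NumPy sketch that, for hypothetical multipliers `alpha` (the values are illustrative, not the solution of any particular problem), recovers $\vec{w}$ and evaluates $\mathcal{L}$ using nothing but dot products:

```python
import numpy as np

# Hypothetical training data: rows of X are samples x_i, y holds labels +/-1,
# and alpha holds illustrative Lagrange multipliers.
X = np.array([[1.0, 1.0], [3.0, 3.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])  # must satisfy sum(alpha * y) == 0

# w = sum_i alpha_i * y_i * x_i
w = (alpha * y) @ X

# Dual objective: sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)
K = X @ X.T  # Gram matrix of dot products -- the only way samples appear
L = alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

print(w, L)
```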
Kernel trick
One of the most interesting properties of SVMs is that we can transform a problem from its original space into a higher-dimensional space. This flexibility, also known as the kernel trick, allows SVMs to classify nonlinear problems. Since the optimization depends on the samples only through dot products, we can replace $\vec{x}_i \cdot \vec{x}_j$ with a kernel function that computes the dot product in the higher-dimensional space without ever mapping the samples explicitly.
The following example shows how to graphically solve the XOR problem using 3 dimensions.
After mapping the samples to 3 dimensions, it is not difficult to imagine a plane that separates the blue samples from the red ones.
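A sketch of that mapping in Python, assuming the XOR samples are encoded as $\pm 1$ and using the feature map $\varphi(x_1, x_2) = (x_1, x_2, x_1 x_2)$, which is one common choice for this problem:

```python
import numpy as np

# XOR with +/-1 encoding: equal signs -> class +1, different signs -> class -1.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
y = np.array([1, -1, -1, 1])

def phi(x):
    """Map a 2-D sample to 3-D by adding the product of its coordinates."""
    return np.array([x[0], x[1], x[0] * x[1]])

# In 3 dimensions the classes are separated by the plane z = 0:
# the third coordinate is +1 for class +1 and -1 for class -1.
for x, label in zip(X, y):
    print(x, '->', phi(x), 'class', label)
```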
Example 1: 2 points in a plane
Each sample is given by its coordinates (coordinate x ($x_1$), coordinate y ($x_2$)) and its class/output ($y$):

Point 1: $\vec{x}_1$, class (output): $y_1 = +1$

Point 2: $\vec{x}_2$, class (output): $y_2 = -1$
We want to minimize:

$$\mathcal{L} = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, (\vec{x}_i \cdot \vec{x}_j)$$

We know that:

$$\sum_i \alpha_i y_i = 0 \;\Rightarrow\; \alpha_1 - \alpha_2 = 0 \;\Rightarrow\; \alpha_1 = \alpha_2 = \alpha$$

and

$$\vec{w} = \sum_i \alpha_i y_i \vec{x}_i$$

Let us calculate the second part of the function we want to minimize first, to keep it simple. Expanding the double sum for $i, j \in \{1, 2\}$:

$$\frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, (\vec{x}_i \cdot \vec{x}_j) = \frac{\alpha^2}{2} \left( \vec{x}_1 \cdot \vec{x}_1 - 2\, \vec{x}_1 \cdot \vec{x}_2 + \vec{x}_2 \cdot \vec{x}_2 \right) = \frac{\alpha^2}{2} \|\vec{x}_1 - \vec{x}_2\|^2$$

Ergo:

$$\mathcal{L} = 2\alpha - \frac{\alpha^2}{2} \|\vec{x}_1 - \vec{x}_2\|^2 \;\Rightarrow\; \frac{d\mathcal{L}}{d\alpha} = 0 \;\Rightarrow\; \alpha = \frac{2}{\|\vec{x}_1 - \vec{x}_2\|^2}$$

Now let us calculate $\vec{w}$:

$$\vec{w} = \alpha_1 y_1 \vec{x}_1 + \alpha_2 y_2 \vec{x}_2 = \alpha (\vec{x}_1 - \vec{x}_2)$$

Now we have to figure out the bias $b$. Both points are support vectors, so $y_i (\vec{w} \cdot \vec{x}_i + b) - 1 = 0$ holds for each of them:

$$b = y_1 - \vec{w} \cdot \vec{x}_1$$

Solution: the separating boundary is the line $\vec{w} \cdot \vec{x} + b = 0$.
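Since the original coordinates are not reproduced above, the following sketch solves the two-point case in closed form for a pair of hypothetical points, using the formulas just derived ($\alpha = \frac{2}{\|\vec{x}_1 - \vec{x}_2\|^2}$, $\vec{w} = \alpha(\vec{x}_1 - \vec{x}_2)$, $b = y_1 - \vec{w} \cdot \vec{x}_1$):

```python
import numpy as np

# Hypothetical points (the original post's coordinates are not reproduced here).
x_pos = np.array([1.0, 1.0])   # class +1
x_neg = np.array([3.0, 3.0])   # class -1

# For exactly two support vectors: w = alpha * (x_pos - x_neg),
# with alpha = 2 / ||x_pos - x_neg||^2.
diff = x_pos - x_neg
alpha = 2.0 / np.dot(diff, diff)
w = alpha * diff

# Bias from the support-vector condition w.x_pos + b = 1.
b = 1.0 - np.dot(w, x_pos)

print('alpha =', alpha)   # 0.25
print('w =', w)           # [-0.5 -0.5]
print('b =', b)           # 2.0
# Sanity check: w.x_neg + b should be -1.
print(np.dot(w, x_neg) + b)  # -1.0
```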
Example 2: 3 points in a plane
Each sample is given by its coordinates (coordinate x ($x_1$), coordinate y ($x_2$)) and its class/output ($y$):

Point 1: $\vec{x}_1$, class (output): $y_1$

Point 2: $\vec{x}_2$, class (output): $y_2$

Point 3: $\vec{x}_3$, class (output): $y_3$
First let us calculate the second part of the function we want to minimize. In each term of the double sum you can see a pair of alphas, $\alpha_i \alpha_j$, being multiplied by two numbers. The first number is the product of the labels, $y_i y_j$. The second number is the dot product between the two coordinate vectors, $\vec{x}_i \cdot \vec{x}_j$.

Result: the optimal multipliers $\alpha_i$, from which $\vec{w}$ and the bias $b$ follow exactly as in Example 1.
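With three points the dual is easier to solve numerically. Here is a sketch using scikit-learn's linear SVC on hypothetical points (a very large C approximates the hard-margin problem derived in this post):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 3-point problem (not the original post's coordinates).
X = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 4.0]])
y = np.array([-1, -1, 1])

clf = SVC(kernel='linear', C=1e6)  # large C ~ hard margin
clf.fit(X, y)

print('w =', clf.coef_[0])
print('b =', clf.intercept_[0])
print('support vectors:', clf.support_vectors_)
print('y_i * alpha_i =', clf.dual_coef_[0])
```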
Appendix A: Scalar Projection of Vectors
We have two points and the vector that joins them:

And we want to calculate the length of this vector's projection (purple) onto the orange vector (which, by the way, is perpendicular to both green lines). The result is the blue vector, which lies along the orange one within the green region.

For this, we have to solve the scalar projection formula:

$$\text{proj}_{\vec{b}}\,\vec{a} = \frac{\vec{a} \cdot \vec{b}}{\|\vec{b}\|} = \|\vec{a}\| \cos \theta$$
Now with the length and the angle, we can calculate the coordinates using sine and cosine functions.
B was at $(9, 7)$, so the point at the other end of the projection follows from those coordinates.
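A small Python sketch of the computation (the point A and the direction d below are hypothetical, since only B = (9, 7) is reproduced above); note that the projected point can also be obtained directly, without the sine/cosine step:

```python
import numpy as np

# Hypothetical data: project vector AB onto direction d.
A = np.array([4.0, 2.0])
B = np.array([9.0, 7.0])
d = np.array([1.0, 0.0])  # direction of the orange vector (hypothetical)

ab = B - A
length = np.dot(ab, d) / np.linalg.norm(d)          # scalar projection
foot = A + (np.dot(ab, d) / np.dot(d, d)) * d       # projected point

print('projection length =', length)  # 5.0
print('projected point =', foot)      # [9. 2.]
```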
Appendix B: Lagrange Multipliers
Lagrange multipliers are a strategy for finding the local maxima and minima of a function subject to equality constraints, i.e. maximize $f(x)$ subject to $g(x) = 0$, where $f$ and $g$ need to have continuous first partial derivatives. If $x_0$ is a maximum of $f(x)$ for the original constrained problem, then there exists a $\lambda_0$ such that $(x_0, \lambda_0)$ is a stationary point of the Lagrange function

$$\mathcal{L}(x, \lambda) = f(x) - \lambda\, g(x)$$

(so the gradient $\nabla \mathcal{L}(x_0, \lambda_0)$ is 0).
In mathematics, a stationary point of a differentiable function of one variable is a point of the function's domain where the derivative is zero. For a function of several variables, a stationary point is an input at which all partial derivatives are zero (zero gradient). Stationary points correspond to local maxima, local minima, or saddle points.
To make it clear, let us say that we have a surface $f(x, y)$ and a constraint $g(x, y) = 0$. Walking along the constraint curve, $f$ stops increasing exactly where its level curve is tangent to the constraint curve, i.e. where their gradients are parallel: $\nabla f = \lambda \nabla g$.
Example 1: 1 constraint, 2 dimensions
Maximize $f(x, y)$ on the unit circle $x^2 + y^2 = 1$. We form the Lagrange function $\mathcal{L}(x, y, \lambda) = f(x, y) - \lambda (x^2 + y^2 - 1)$ and set its partial derivatives to zero.
Now we plug these results into the original equation.
Therefore we have two points: the constrained maximum and the constrained minimum.
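A sketch of the same procedure with sympy, using the hypothetical objective $f(x, y) = x + y$ (the post's original function is not reproduced above):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

f = x + y                      # hypothetical objective
g = x**2 + y**2 - 1            # unit-circle constraint
L = f - lam * g                # Lagrange function

# Stationary points: all partial derivatives equal to zero.
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in sols:
    print(s, '-> f =', f.subs(s))
# Two points: (sqrt(2)/2, sqrt(2)/2) is the maximum,
# (-sqrt(2)/2, -sqrt(2)/2) is the minimum.
```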
Example 2: 1 constraint, 2 dimensions
Find the rectangle of maximal perimeter that can be inscribed in the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$. Taking the corner of the rectangle in the first quadrant at $(x, y)$, we want to maximize the perimeter, which is

$$P = 4x + 4y$$

subject to

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

The Lagrange function is $\mathcal{L} = 4x + 4y - \lambda \left( \frac{x^2}{a^2} + \frac{y^2}{b^2} - 1 \right)$, and setting its partial derivatives to zero gives

$$x = \frac{2a^2}{\lambda}, \qquad y = \frac{2b^2}{\lambda}$$

Now plug these into the original constraint equation:

$$\frac{4a^2}{\lambda^2} + \frac{4b^2}{\lambda^2} = 1 \;\Rightarrow\; \lambda = 2\sqrt{a^2 + b^2}$$

(we take the positive root because it is a maximum). Then,

$$x = \frac{a^2}{\sqrt{a^2 + b^2}}, \qquad y = \frac{b^2}{\sqrt{a^2 + b^2}}$$

So the maximum perimeter is:

$$P = \frac{4a^2 + 4b^2}{\sqrt{a^2 + b^2}} = 4\sqrt{a^2 + b^2}$$
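A symbolic check of this result with sympy, assuming the general ellipse with semi-axes $a$ and $b$ used above:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', positive=True)
a, b = sp.symbols('a b', positive=True)

P = 4*x + 4*y                              # perimeter of the rectangle
g = x**2/a**2 + y**2/b**2 - 1              # ellipse constraint
L = P - lam * g

sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in sols:
    print(sp.simplify(P.subs(s)))          # -> 4*sqrt(a**2 + b**2)
```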
Example 3: 2 constraints, 3 dimensions
Now plug the result into the two constraint equations:
Since the function is a paraboloid (a parabola in 3 dimensions), it has no maximum, so the stationary point is a minimum.
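Since the original functions are not reproduced above, here is a sketch of the general two-constraint setup with a hypothetical objective and hypothetical constraints; the key point is that each constraint gets its own multiplier:

```python
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lam1 lam2', real=True)

f = x**2 + y**2 + z          # hypothetical paraboloid-like objective
g1 = x + y + z - 1           # hypothetical constraint 1
g2 = x - y - 2               # hypothetical constraint 2

# One multiplier per constraint.
L = f - l1 * g1 - l2 * g2

sols = sp.solve([sp.diff(L, v) for v in (x, y, z, l1, l2)],
                [x, y, z, l1, l2], dict=True)
print(sols)  # a single stationary point: the constrained minimum
```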