Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It tries to find the best boundary, known as a hyperplane, that separates the different classes in the data. It is useful for binary classification problems such as spam vs. not spam or cat vs. dog.
The main goal of SVM is to maximize the margin between the two classes. The larger the margin, the better the model performs on new and unseen data.
Key Concepts of Support Vector Machine
The key idea behind the SVM algorithm is to find the hyperplane that best separates two classes by maximizing the margin between them. This margin is the distance from the hyperplane to the nearest data points (the support vectors) on each side.
Figure: Multiple hyperplanes separating the data from two classes
The best hyperplane, also known as the "hard margin" hyperplane, is the one that maximizes the distance between the hyperplane and the nearest data points of both classes. This ensures a clear separation between the classes. From the figure above we choose L2 as the hard margin. Now consider a scenario like the one shown below:
Figure: Selecting a hyperplane for data with an outlier
Here, we have one blue ball inside the boundary of the red balls.
How does SVM classify the data?
The blue ball lying among the red ones is an outlier of the blue class. The SVM algorithm ignores such outliers and still finds the hyperplane that maximizes the margin, which makes SVM robust to outliers.
Figure: Hyperplane which is the most optimized one
A soft margin allows some misclassifications or margin violations in order to improve generalization. SVM optimizes the following objective, which balances margin maximization against penalty minimization:
\text{Objective Function} = (\frac{1}{\text{margin}}) + \lambda \sum \text{penalty }
The penalty used for violations is often the hinge loss: it is zero for points that are classified correctly outside the margin and grows linearly with the amount by which a point violates the margin.
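For reference, if y \in \{-1, +1\} is the true label and f(x) is the model's output score, the hinge loss can be written as:
\text{hinge}(y, f(x)) = \max(0, \; 1 - y \cdot f(x))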
So far we have been talking about linearly separable data, that is, data where the group of blue balls and the group of red balls can be separated by a straight line.
What if data is not linearly separable?
When data is not linearly separable, i.e. it cannot be divided by a straight line, SVM uses a technique called kernels to map the data into a higher-dimensional space where it becomes separable. This transformation helps SVM find a decision boundary even for non-linear data.
Figure: Original 1D dataset for classification
A kernel is a function that maps data points into a higher-dimensional space without explicitly computing the coordinates in that space. This allows SVM to work efficiently with non-linear data by performing the mapping implicitly. For example, consider data points that are not linearly separable. By applying a kernel function, SVM transforms the data points into a higher-dimensional space where they become linearly separable.
In this case, the new variable y is created as a function of the distance from the origin.
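As a small illustration of this idea, the sketch below uses made-up 1D points and a squared-distance feature y = x^2 (both chosen purely for illustration) to show that the added feature makes the classes linearly separable:

import numpy as np
from sklearn.svm import SVC

# Hypothetical 1D data: points near the origin belong to class 0,
# points far from the origin belong to class 1, so no single threshold
# on x can separate them.
x = np.array([-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)
labels = np.array([1, 1, 0, 0, 0, 0, 0, 1, 1])

# Create the new variable y = x^2 (a function of the distance from the origin)
# and use it as a second feature. In the (x, y) plane the classes become
# linearly separable.
X_mapped = np.hstack([x, x ** 2])

svm = SVC(kernel="linear", C=1)
svm.fit(X_mapped, labels)
print(svm.score(X_mapped, labels))  # expected to print 1.0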
Mathematical Computation of SVM
Consider a binary classification problem with two classes, labeled as +1 and -1. We have a training dataset consisting of input feature vectors X and their corresponding class labels Y. The equation for the linear hyperplane can be written as:
w^Tx+ b = 0
Where w is the weight vector normal to the hyperplane, x is the input feature vector and b is the bias term, which controls the offset of the hyperplane from the origin.
The distance between a data point x_i and the decision boundary can be calculated as:
d_i = \frac{|w^T x_i + b|}{||w||}
where ||w|| represents the Euclidean norm of the weight vector w.
Linear SVM Classifier
Based on the sign of w^Tx + b, the predicted class for a data point x is:
\hat{y} = \left\{ \begin{array}{cl} 1 & : \ w^Tx+b \geq 0 \\ -1 & : \ w^Tx+b < 0 \end{array} \right.
Where \hat{y} is the predicted label of a data point.
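A minimal sketch of the distance and prediction formulas above, using hypothetical weights, bias and data points (all values below are made up for illustration):

import numpy as np

# Hypothetical weight vector, bias and data points, chosen only for illustration.
w = np.array([2.0, -1.0])
b = -3.0
X = np.array([[4.0, 1.0],   # w^T x + b = 4  -> predicted class  1
              [1.0, 2.0]])  # w^T x + b = -3 -> predicted class -1

scores = X @ w + b                               # w^T x + b for each point
distances = np.abs(scores) / np.linalg.norm(w)   # distance d_i to the hyperplane
y_hat = np.where(scores >= 0, 1, -1)             # prediction rule from the formula above

print(scores, distances, y_hat)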
Optimization Problem for SVM
For a linearly separable dataset, the goal is to find the hyperplane that maximizes the margin between the two classes while ensuring that all data points are correctly classified. This leads to the following optimization problem:
\underset{w,b}{\text{minimize}}\frac{1}{2}\left\| w \right\|^{2}
Subject to the constraint:
y_i(w^T x_i + b) \geq 1 \quad \text{for } i = 1, 2, \dots, m
Where y_i is the class label (+1 or -1) of the i-th training sample, x_i is its input feature vector and m is the number of training samples.
The condition y_i (w^T x_i + b) \geq 1 ensures that each data point is correctly classified and lies outside the margin.
Soft Margin in Linear SVM Classifier
In the presence of outliers or non-separable data, the SVM allows some misclassification by introducing slack variables \zeta_i. The optimization problem is modified as:
\underset{w, b}{\text{minimize }} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \zeta_i
Subject to the constraints:
y_i (w^T x_i + b) \geq 1 - \zeta_i \quad \text{and} \quad \zeta_i \geq 0 \quad \text{for } i = 1, 2, \dots, m
Where \zeta_i is the slack variable measuring how much the i-th data point violates the margin and C is the regularization parameter that controls the trade-off between maximizing the margin and penalizing margin violations.
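A minimal sketch, using scikit-learn's SVC on made-up 2D points, of how the regularization parameter C controls this trade-off (the data values are purely illustrative):

import numpy as np
from sklearn.svm import SVC

# Toy 2D data in which one point of the +1 class sits close to the -1 class.
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8], [3, 4]])
y = np.array([-1, -1, -1, 1, 1, 1, 1])

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # A small C tolerates margin violations (wider margin, more slack zeta_i);
    # a large C penalizes violations heavily (narrower margin, fewer violations).
    print(C, clf.n_support_)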
Dual Problem for SVM
The dual problem involves maximizing the dual objective with respect to the Lagrange multipliers associated with the training samples. This transformation allows solving the SVM optimization using kernel functions for non-linear classification.
The dual objective function is given by:
\underset{\alpha}{\text{maximize }} \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j t_i t_j K(x_i, x_j)
Where \alpha_i are the Lagrange multipliers associated with the training samples, t_i is the class label (+1 or -1) of the i-th sample and K(x_i, x_j) is the kernel function that computes the similarity between samples x_i and x_j.
The dual formulation optimizes the Lagrange multipliers \alpha_i and the support vectors are those training samples where \alpha_i > 0 .
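For completeness, in the usual soft-margin formulation this objective is maximized subject to the following constraints (not stated above):
\sum_{i=1}^{m} \alpha_i t_i = 0 \quad \text{and} \quad 0 \leq \alpha_i \leq C \quad \text{for } i = 1, 2, \dots, m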
SVM Decision Boundary
Once the dual problem is solved, the decision function used to classify a new point x is given by:
f(x) = \sum_{i=1}^{m} \alpha_i t_i K(x_i, x) + b
Where x is the test data point, b is the bias term and the sign of f(x) gives the predicted class. The bias term b is determined by the support vectors, which satisfy:
t_i (w^T x_i + b) = 1 \quad \Rightarrow \quad b = t_i - w^T x_i
Where x_i is any support vector (since t_i = \pm 1, dividing by t_i is the same as multiplying by t_i).
This completes the mathematical framework of the Support Vector Machine algorithm which allows for both linear and non-linear classification using the dual problem and kernel trick.
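As a sanity check of these relations, the sketch below fits a linear-kernel SVC on made-up separable points and recovers w and b from the dual solution; it relies on scikit-learn storing the products \alpha_i t_i of the support vectors in dual_coef_ (the data values are purely illustrative):

import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable 2D data used only for this check.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
t = np.array([-1, -1, -1, 1, 1, 1])

svm = SVC(kernel="linear", C=1e6).fit(X, t)  # very large C approximates a hard margin

# dual_coef_ holds alpha_i * t_i for the support vectors, so for a linear
# kernel the weight vector is w = sum_i alpha_i t_i x_i.
w = (svm.dual_coef_ @ svm.support_vectors_).ravel()
b = svm.intercept_[0]

# Every support vector should satisfy t_i (w^T x_i + b) = 1 (up to numerical error).
for x_i, t_i in zip(svm.support_vectors_, t[svm.support_]):
    print(t_i * (x_i @ w + b))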
Types of Support Vector Machine
Based on the nature of the decision boundary, Support Vector Machines (SVM) can be divided into two main types: Linear SVM, which uses a straight-line (linear) decision boundary for data that is linearly separable, and Non-Linear SVM, which uses kernel functions to handle data that cannot be separated by a straight line.
We will predict whether cancer is Benign or Malignant using historical data about patients diagnosed with cancer. This data includes independent attributes such as tumor size, texture, and others. To perform this classification, we will use an SVM (Support Vector Machine) classifier to differentiate between benign and malignant cases effectively.
# Import the breast cancer dataset, the SVM classifier and plotting utilities
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.svm import SVC
import matplotlib.pyplot as plt

# Load the dataset and keep only the first two features
# so the decision boundary can be plotted in 2D
cancer = load_breast_cancer()
X = cancer.data[:, :2]
y = cancer.target

# Train a linear SVM classifier
svm = SVC(kernel="linear", C=1)
svm.fit(X, y)

# Plot the decision regions learned by the classifier
DecisionBoundaryDisplay.from_estimator(
    svm,
    X,
    response_method="predict",
    alpha=0.8,
    cmap="Pastel1",
    xlabel=cancer.feature_names[0],
    ylabel=cancer.feature_names[1],
)

# Overlay the training points, colored by class
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()
Output:
Figure: Decision regions of the linear SVM classifier over the first two features of the breast cancer dataset, with the training points overlaid.