
Commit 236927a

committed
major SVC update
1 parent 3828716 commit 236927a

2 files changed

Lines changed: 280 additions & 43 deletions

File tree

book/2_models/7_SVM.md

Lines changed: 258 additions & 43 deletions
@@ -17,9 +17,7 @@ myst:

# <i class="fa-solid fa-gear"></i> Support Vector Machines

After a brief excursion into generative models such as [LDA & QDA](5_LDA_QDA) or [Naïve Bayes](6_Naive_Bayes), we will now again discuss a discriminative family of models: Support Vector Machines (SVM). SVMs are powerful supervised learning models used for classification and regression tasks. When used for classification, they are called Support Vector Classifiers (SVC).

Let's consider some simulated classification data:

@@ -28,100 +26,317 @@
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from matplotlib.lines import Line2D
from sklearn.datasets import make_classification
sns.set_theme(style="darkgrid")

X, y = make_classification(n_samples=50, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, class_sep=2.0, random_state=0)

fig, ax = plt.subplots()
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=60, ax=ax)
ax.set(xlabel="Feature 1", ylabel="Feature 2")

# Custom legend
legend_elements = [
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 0', markerfacecolor="#0173B2", markeredgecolor='None'),
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 1', markerfacecolor="#DE8F05", markeredgecolor='None')]
ax.legend(handles=legend_elements, loc="upper left", handlelength=1);
```

```{code-cell} ipython3
:tags: ["remove-input"]
from jupyterquiz import display_quiz
display_quiz("quiz/SVC.json", shuffle_answers=True)
```

```{admonition} Solution
:class: dropdown

There are infinitely many ways to separate the two classes, because you can draw an infinite number of lines that separate them perfectly.
```

If we visualise this and add a new data point to classify, a potential issue becomes apparent. For some models, this data point would fall into Class 0 and for others into Class 1:

```{code-cell} ipython3
:tags: [remove-input]
fig, ax = plt.subplots()
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, ax=ax, s=60)

x_vals = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
np.random.seed(42)
slopes = np.random.uniform(1.5, 4, 20)
intercepts = np.random.uniform(1, 3, 20)

for i, (m, b) in enumerate(zip(slopes, intercepts)):
    alpha = 0.4
    ax.plot(x_vals, m * x_vals + b, color='black', alpha=alpha, label='Decision boundaries' if i == 0 else None)

ax.plot(-0.8, 0, 'x', color='red', markeredgewidth=3, markersize=10, label="New data")

ax.set_xlim(-4, 3)
ax.set_ylim(-1, 5)
ax.set(xlabel="Feature 1", ylabel="Feature 2")

# Custom legend
legend_elements = [
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 0', markerfacecolor="#0173B2", markeredgecolor='None'),
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 1', markerfacecolor="#DE8F05", markeredgecolor='None'),
    Line2D([0], [0], color='k', linestyle='-', label='Decision boundaries'),
    Line2D([0], [0], marker='x', color='red', markersize=10, markeredgewidth=3, label='New data', markerfacecolor='None', linestyle='None')]
ax.legend(handles=legend_elements, loc="upper left", handlelength=1);
```
## Support Vector Classifiers (SVC)

So evidently, we can't be satisfied with an infinite number of possible solutions; we need a more justifiable way of choosing one. If you remember, we already solved a similar problem for linear regression: there, the least squares method chose the line that minimised the total squared distance between predictions and true values.

Support Vector Classifiers take a slightly different approach. As Robert Tibshirani put it, they are

> An approach to the classification problem in a way computer scientists would approach it.

Rather than minimising a squared error, they aim to find the hyperplane that maximises the margin — the distance between the separating hyperplane and the closest data points from each class. The idea is that by maximising this margin, we obtain a decision boundary that is both robust and generalisable.

- **Hyperplane**: A decision boundary that separates classes. In p dimensions, it is a p−1 dimensional subspace, given by the equation: $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p = 0$. So in the case of two predictors the hyperplane is one dimensional (a line).
- **Separating Hyperplane**: A hyperplane that correctly separates the data by class label.
- **Margin**: The (perpendicular) distance between the hyperplane and the closest training points. A maximal margin classifier chooses the hyperplane that maximises this margin.
- **Support Vectors**: Observations closest to the decision boundary. They define the margin and the classifier.
- **Soft Margin**: A method used when the data is not linearly separable. Allows some observations to violate the margin. Controlled via the hyperparameter $C$.
- **Kernel Trick**: Implicitly maps data into a higher-dimensional space to make it linearly separable using functions like polynomial or RBF (Gaussian) kernels.

To formalise this intuition, SVCs look for the maximal margin classifier — a hyperplane that not only separates the classes but does so with the greatest possible distance to the closest training samples. These closest samples are known as support vectors, and they uniquely determine the position of the hyperplane. All other samples can be moved without changing the decision boundary, making SVCs especially robust to outliers away from the margin.
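
Written out (a sketch of the standard textbook formulation, with class labels coded as $y_i \in \{-1, 1\}$), the maximal margin classifier solves

$$
\begin{aligned}
&\underset{\beta_0, \beta_1, \dots, \beta_p,\, M}{\text{maximise}} \quad M \\
&\text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 = 1, \\
&\qquad y_i \left(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}\right) \geq M \quad \text{for all } i = 1, \dots, n.
\end{aligned}
$$

The soft-margin version relaxes the last constraint to $y_i(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) \geq M(1 - \epsilon_i)$ with slack variables $\epsilon_i \geq 0$ whose total size is limited by a budget. Be aware that scikit-learn's `C` parameter works in the opposite direction of such a budget: a larger `C` penalises margin violations more heavily and therefore produces a narrower, stricter margin.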
## Using SVCs

As you learned in the lecture, SVCs are considered to be one of the best "out of the box" classifiers and can be used in many scenarios. This includes:

- When the number of features is large relative to the number of samples
- When classes are not linearly separable
- When a robust and generalisable classifier is needed

If the data is not perfectly separable (either because the classes overlap or because no straight line can separate them), SVCs become creative in two ways (see the sketch after this list):

1. "Soften" what is meant by separating the classes and allow for some errors
2. Map feature space into a higher dimension (kernel trick)
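
To get a feeling for the first idea, the sketch below (which assumes the `X` and `y` from the simulated classification data at the top of this page) fits a linear SVC with different values of the regularisation parameter `C` and counts how many training points end up as support vectors. A smaller `C` tolerates more margin violations, so the margin becomes wider and more points act as support vectors:

```python
# Minimal sketch (assumes X, y from the simulated classification data above):
# how the soft-margin parameter C changes the number of support vectors.
from sklearn.svm import SVC

for C in [0.01, 1, 100]:
    clf_c = SVC(kernel='linear', C=C).fit(X, y)
    # n_support_ holds the number of support vectors per class
    print(f"C={C:>6}: {clf_c.n_support_.sum()} support vectors")
```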
### Example 1: Linear Classification

Fitting an SVC is straightforward:

```{code-cell} ipython3
from sklearn.svm import SVC

clf = SVC(kernel='linear')
clf.fit(X, y);
```

We can then write a little helper function to visualize the decision function and the support vectors:

```{code-cell} ipython3
def plot_svc_decision_function(model, ax=None):
    """Plot the decision boundary and margins for a trained 2D SVC model."""
    if ax is None:  # fall back to the current axes if none is given
        ax = plt.gca()

    # Set up grid
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    xx, yy = np.meshgrid(np.linspace(*xlim, 100), np.linspace(*ylim, 100))
    grid = np.c_[xx.ravel(), yy.ravel()]
    decision_values = model.decision_function(grid).reshape(xx.shape)

    # Plot decision boundary and margins
    ax.contour(xx, yy, decision_values, levels=[-1, 0, 1], linestyles=['--', '-', '--'], colors='k', alpha=0.5)

    # Support vectors
    ax.scatter(*model.support_vectors_.T, s=200, linewidth=0.5, facecolors='none', edgecolors='k')

fig, ax = plt.subplots()
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=60, ax=ax)
ax.set(xlabel="Feature 1", ylabel="Feature 2", xlim=(-5,3))
plot_svc_decision_function(clf, ax=ax)

# Custom legend
legend_elements = [
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 0', markerfacecolor="#0173B2", markeredgecolor='None'),
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 1', markerfacecolor="#DE8F05", markeredgecolor='None'),
    Line2D([0], [0], color='k', linestyle='-', label='Decision boundary'),
    Line2D([0], [0], color='k', linestyle='--', label='Decision margins'),
    Line2D([0], [0], marker='o', color='k', markersize=8, label='Support vectors', markerfacecolor='None', linestyle='None')]
ax.legend(handles=legend_elements, loc="upper left", handlelength=1);
```

### Example 2: Nonlinear Classification

Let's consider different data, which is not linearly separable:

```{code-cell} ipython3
from sklearn.datasets import make_circles
X, y = make_circles(100, factor=.1, noise=.15)

fig, ax = plt.subplots()
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=60, ax=ax)
ax.set(xlabel="Feature 1", ylabel="Feature 2")

# Custom legend
legend_elements = [
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 0', markerfacecolor="#0173B2", markeredgecolor='None'),
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 1', markerfacecolor="#DE8F05", markeredgecolor='None')]
ax.legend(handles=legend_elements, loc="upper left", handlelength=1);
```

In that case, a non-linear SVC can be applied. For example, a simple projection would be a radial basis function centered on the middle clump. As you can see, the data becomes linearly separable in three dimensions:

```{code-cell} ipython3
from mpl_toolkits import mplot3d

# Apply radial basis function to the feature space
r = np.exp(-(X ** 2).sum(1))

# Plot features in 3D
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

colors = np.array(["#0173B2", "#DE8F05"])[y]  # colors for each class
ax.scatter(X[:, 0], X[:, 1], r, c=colors, s=50, alpha=0.5, edgecolors=colors)
ax.view_init(elev=20, azim=30)
ax.set(xlabel='x', ylabel='y', zlabel='r');

# Custom legend
legend_elements = [
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 0', markerfacecolor="#0173B2", markeredgecolor='None'),
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 1', markerfacecolor="#DE8F05", markeredgecolor='None')]
ax.legend(handles=legend_elements, loc="upper left", handlelength=1);
```

We can create a similar plot to the one above, first with a linear SVC and then with an RBF SVC, to visualize the decision boundary, margins, and support vectors:

```{code-cell} ipython3
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Linear SVC
clf_lin = SVC(kernel='linear')
clf_lin.fit(X_train, y_train)

y_pred = clf_lin.predict(X_test)
print("Linear SVC classification report:\n", classification_report(y_test, y_pred))

# Plot
fig, ax = plt.subplots()
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=60, ax=ax)
ax.set(xlabel="Feature 1", ylabel="Feature 2", xlim=(-5,3))
plot_svc_decision_function(clf_lin, ax=ax)

# Custom legend
legend_elements = [
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 0', markerfacecolor="#0173B2", markeredgecolor='None'),
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 1', markerfacecolor="#DE8F05", markeredgecolor='None'),
    Line2D([0], [0], color='k', linestyle='-', label='Decision boundary'),
    Line2D([0], [0], color='k', linestyle='--', label='Decision margins'),
    Line2D([0], [0], marker='o', color='k', markersize=8, label='Support vectors', markerfacecolor='None', linestyle='None')]
ax.legend(handles=legend_elements, loc="upper left", handlelength=1);
```

```{code-cell} ipython3
# RBF SVC
clf_rbf = SVC(kernel='rbf')
clf_rbf.fit(X_train, y_train)

y_pred = clf_rbf.predict(X_test)
print("RBF SVC classification report:\n", classification_report(y_test, y_pred))

# Plot
fig, ax = plt.subplots()
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=60, ax=ax)
ax.set(xlabel="Feature 1", ylabel="Feature 2", xlim=(-5,3))
plot_svc_decision_function(clf_rbf, ax=ax)

# Custom legend
legend_elements = [
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 0', markerfacecolor="#0173B2", markeredgecolor='None'),
    Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 1', markerfacecolor="#DE8F05", markeredgecolor='None'),
    Line2D([0], [0], color='k', linestyle='-', label='Decision boundary'),
    Line2D([0], [0], color='k', linestyle='--', label='Decision margins'),
    Line2D([0], [0], marker='o', color='k', markersize=8, label='Support vectors', markerfacecolor='None', linestyle='None')]
ax.legend(handles=legend_elements, loc="upper left", handlelength=1);
```

Notably, there are now a lot more support vectors, especially when we fit a linear SVC to this data. This is a consequence of the soft margin: every observation that lies on the margin or on the wrong side of it becomes a support vector, and because a straight line cannot separate these two classes, many points end up violating the margin.
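
We can check this directly with the `n_support_` attribute of the two models fitted above (a quick sketch reusing `clf_lin` and `clf_rbf`):

```python
# Number of support vectors per class for the two fitted models
print("Linear SVC:", clf_lin.n_support_, "-> total:", clf_lin.n_support_.sum())
print("RBF SVC:   ", clf_rbf.n_support_, "-> total:", clf_rbf.n_support_.sum())
```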
## Multiclass Classification

SVCs are inherently binary classifiers, but they can be extended to more than two classes (see the sketch after this list):

* **One-vs-One**: $\binom{K}{2}$ classifiers for each pair of classes.
* **One-vs-All**: K classifiers, each comparing one class against the rest.
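
As a rough sketch of what this looks like in scikit-learn (using a hypothetical three-class toy dataset, not the data from above): `SVC` applies the one-vs-one strategy internally, while `OneVsRestClassifier` wraps any binary classifier in a one-vs-all scheme.

```python
# Sketch: multiclass strategies on a hypothetical 3-class toy dataset
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X_mc, y_mc = make_blobs(n_samples=150, centers=3, random_state=0)

# SVC handles multiclass via one-vs-one internally: K(K-1)/2 binary classifiers
clf_ovo = SVC(kernel='linear').fit(X_mc, y_mc)

# One-vs-all: K classifiers, each separating one class from the rest
clf_ovr = OneVsRestClassifier(SVC(kernel='linear')).fit(X_mc, y_mc)

print("One-vs-one training accuracy:", clf_ovo.score(X_mc, y_mc))
print("One-vs-all training accuracy:", clf_ovr.score(X_mc, y_mc))
```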
## Choosing Hyperparameters

SVCs have a few hyperparameters. Please have a look at the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC) for a more in-depth overview. For the SVC used in the previous examples, the most important ones are:

* `C`: Regularisation parameter; trade-off between margin width and classification error.
* `kernel`: `'linear'`, `'poly'`, `'rbf'`, `'sigmoid'`, or custom.
* `gamma`: Kernel coefficient for the RBF, polynomial, and sigmoid kernels (see the formula below)
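
For reference, the RBF kernel used by `SVC(kernel='rbf')` measures the similarity of two samples $x$ and $x'$ as

$$
K(x, x') = \exp\!\left(-\gamma \, \lVert x - x' \rVert^2\right),
$$

so a large `gamma` means that only very close points influence each other (flexible, wiggly boundaries), while a small `gamma` yields smoother boundaries. With the default setting `gamma='scale'`, scikit-learn chooses $\gamma = 1 / (p \cdot \mathrm{Var}(X))$, where $p$ is the number of features.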
As always, hyperparameters should be tuned using [cross-validation](book/1_basics/3_resampling) to balance bias and variance. It often makes sense to use a [grid search](https://scikit-learn.org/stable/modules/grid_search.html) or related strategies to find the optimal solution:
```{code-cell} ipython3
import pandas as pd
from sklearn.model_selection import GridSearchCV

# Generate data
X, y = make_circles(100, factor=.1, noise=.3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Grid search
C_vals = np.logspace(-3, 3, 30)      # 0.001 to 1000
gamma_vals = np.logspace(-3, 1, 30)  # 0.001 to 10
param_grid = {'C': C_vals, 'kernel': ['rbf'], 'gamma': gamma_vals}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

# Results
print("Best parameters:", grid.best_params_)
print("Best cross-validation score:", grid.best_score_)
print("Test set score:", grid.score(X_test, y_test))

# Plot heatmap
results = pd.DataFrame(grid.cv_results_)
scores_matrix = results.pivot(index='param_gamma', columns='param_C', values='mean_test_score')  # Pivot table to make a matrix of mean test scores

fig, ax = plt.subplots()
sns.heatmap(
    scores_matrix,
    cmap="viridis",
    xticklabels=False,
    yticklabels=False,
    ax=ax)

# Plot custom ticks
n_ticks = 10  # plot n ticks
xticks = np.linspace(0, len(scores_matrix.columns) - 1, n_ticks, dtype=int)
yticks = np.linspace(0, len(scores_matrix.index) - 1, n_ticks, dtype=int)

xticklabels = [f"{scores_matrix.columns[i]:.3g}" for i in xticks]
yticklabels = [f"{scores_matrix.index[i]:.3g}" for i in yticks]

ax.set(xticks=xticks, yticks=yticks, yticklabels=yticklabels, title="Mean CV Accuracy")
ax.set_xticklabels(xticklabels, rotation=45);
```

```{admonition} Summary
:class: note

- Support Vector Classifiers are a robust and versatile tool for classification tasks
- The key ideas are rooted in geometry: finding the optimal hyperplane that separates data with maximum margin
- With the use of kernels, SVCs extend effectively to non-linear decision boundaries
- Multiclass classification can be done in a one-vs-one or one-vs-all approach
```

book/2_models/quiz/SVC.json

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
[{
    "question": "How many solutions exist for linearly separating the two classes?",
    "type": "multiple_choice",
    "answers": [
        {
            "answer": "None",
            "correct": false
        },
        {
            "answer": "Infinite",
            "correct": true
        },
        {
            "answer": "One",
            "correct": false
        },
        {
            "answer": "Two",
            "correct": false
        }
    ]
}]
