Commit 3bbdb48

initial preparation for decision trees
1 parent 7404127 commit 3bbdb48

3 files changed

Lines changed: 46 additions & 4 deletions

book/2_models/7_SVM.md

Lines changed: 10 additions & 2 deletions
````diff
@@ -135,9 +135,13 @@ clf = SVC(kernel='linear')
 clf.fit(X, y);
 ```
 
-We can then write a little helper function to visualize the decision function and supports:
+With a little helper function we can visualize the decision function and supports:
 
 ```{code-cell} ipython3
+---
+tags:
+- hide-input
+---
 def plot_svc_decision_function(model, ax=None):
     """Plot the decision boundary and margins for a trained 2D SVC model."""
     # Set up grid
@@ -181,7 +185,7 @@ fig, ax = plt.subplots()
 sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=60, ax=ax)
 ax.set(xlabel="Feature 1", ylabel="Feature 2")
 
-# Custom egend
+# Custom legend
 legend_elements = [
     Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 0', markerfacecolor="#0173B2", markeredgecolor='None'),
     Line2D([0], [0], marker='o', linestyle='None', markersize=8, label='Class 1', markerfacecolor="#DE8F05", markeredgecolor='None')]
@@ -317,6 +321,10 @@ As always, hyperparameters should be tuned using [cross-validation](book/1_basic
 
 
 ```{code-cell} ipython3
+---
+tags:
+- hide-input
+---
 import pandas as pd
 from sklearn.model_selection import GridSearchCV
 
````
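The last hunk above hides the input of a grid-search cell; the diff does not show the search itself. A minimal sketch of how an SVC grid search with `GridSearchCV` might look (the data and parameter grid here are illustrative assumptions, not taken from the commit):

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative two-class data; the chapter's actual dataset is not in this diff
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Hypothetical parameter grid; the committed grid is not shown in this hunk
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# 5-fold cross-validated search over the grid
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

`search.best_params_` holds the winning combination, and `search.cv_results_` can be loaded into a pandas DataFrame for inspection, which would explain the `import pandas as pd` added in the same cell.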
book/2_models/8_Trees.md

Lines changed: 33 additions & 1 deletion
````diff
@@ -21,4 +21,36 @@ https://mlu-explain.github.io/decision-tree/
 
 https://animlbook.com/classification/trees/index.html
 
-https://mlu-explain.github.io/random-forest/
+https://mlu-explain.github.io/random-forest/
+
+
+
+```{code-cell} ipython3
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, plot_tree
+from sklearn.model_selection import (train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV, cross_validate, KFold)
+from sklearn.metrics import accuracy_score, mean_squared_error
+from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
+from xgboost import XGBClassifier
+```
+
+We will work with two datasets:
+
+- Breast Cancer (binary classification)
+  - Features: 30 real-valued measurements of tumors
+  - Target: malignant (1) vs. benign (0)
+- Diabetes (continuous regression)
+  - Features: 10 baseline variables (age, sex, BMI, …)
+  - Target: a quantitative measure of disease progression
+
+
+```{code-cell} ipython3
+X = [[0, 0], [1, 1]]
+Y = [0, 1]
+
+clf = DecisionTreeClassifier()
+clf = clf.fit(X, Y)
+```
````
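The new Trees.md content names the two datasets but does not yet load them. Assuming the standard scikit-learn loaders (the diff does not confirm which loading path the chapter will use), the setup might look like:

```python
from sklearn.datasets import load_breast_cancer, load_diabetes

# Breast Cancer: 30 features, binary target (malignant vs. benign)
X_clf, y_clf = load_breast_cancer(return_X_y=True)

# Diabetes: 10 baseline features, continuous disease-progression target
X_reg, y_reg = load_diabetes(return_X_y=True)

print(X_clf.shape, X_reg.shape)  # (569, 30) (442, 10)
```

Both loaders ship with scikit-learn, so no download step is needed, which fits a reproducible jupyter-book build.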

requirements.txt

Lines changed: 3 additions & 1 deletion
```diff
@@ -1,6 +1,7 @@
 jupyter-book
 numpy
 pandas
+scipy
 matplotlib
 seaborn
 plotly
@@ -12,4 +13,5 @@ statsmodels
 mlxtend
 patsy
 ipywidgets
-ISLP
+ISLP
+xgboost
```
