# Getting Started

This page shows how to generate an operating profile in a notebook and how to interpret it for common binary classifiers.

## Setup

```bash
pip install -e .
```

```python
import numpy as np
from opproplot import operating_profile_plot
```

## Basic example

```python
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5000)
scores = rng.random(size=5000)

fig, ax_hist, ax_metric = operating_profile_plot(y_true, scores, bins=30)
```
| 25 | + |
| 26 | +- Left axis: stacked histogram of scores by class. |
| 27 | +- Right axis: TPR, FPR, and Accuracy evaluated at each bin midpoint threshold. |
| 28 | +- Choose thresholds where TPR/FPR trade-offs make sense for your application. |
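Concretely, each point on the right axis is just a confusion-matrix summary at one threshold. A minimal sketch of that computation on the same synthetic data (the helper `metrics_at_threshold` is illustrative, not part of `opproplot`):

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5000)
scores = rng.random(size=5000)

def metrics_at_threshold(y_true, scores, t):
    """Classify scores >= t as positive; return (TPR, FPR, accuracy)."""
    y_pred = (scores >= t).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    acc = (tp + tn) / len(y_true)
    return tpr, fpr, acc

tpr, fpr, acc = metrics_at_threshold(y_true, scores, 0.5)
```

Because the scores and labels here are independent random draws, all three metrics hover near 0.5 at the midpoint threshold, which is exactly what the flat curves in the basic example show.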

## With scikit-learn (real example)

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

clf = LogisticRegression(max_iter=500)
clf.fit(X_train, y_train)

y_score = clf.predict_proba(X_test)[:, 1]

fig, ax_hist, ax_metric = operating_profile_plot(y_test, y_score, bins=30)
ax_hist.set_title("Breast cancer classifier operating profile")
```

The same pattern applies to other models:

- Random forest / gradient boosting: use `model.predict_proba(X)[:, 1]`.
- XGBoost / LightGBM: with the scikit-learn wrappers, use `predict_proba(X)[:, 1]`; the native `Booster.predict` returns probabilities directly for binary objectives.
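For example, swapping in a random forest only changes how the scores are produced; the plotting call itself is unchanged. A sketch (the model settings below are arbitrary illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

# Any estimator exposing predict_proba works the same way.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Probability of the positive class, one score per test sample,
# ready to pass to operating_profile_plot exactly as above.
y_score = clf.predict_proba(X_test)[:, 1]
```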

## Interpreting the plot

- Separability: a wider gap between the class histograms indicates better discrimination.
- Threshold effects: steep drops in TPR highlight regions where small threshold changes have large effects.
- Accuracy peak: the dashed accuracy curve shows where accuracy is maximized, without trial-and-error threshold tuning.
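The accuracy maximizer that the dashed curve reveals can also be located with a brute-force threshold sweep. A sketch on synthetic, partially separated scores (the data-generating choices here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5000)
# Positives centered at 0.7, negatives at 0.3, so classes overlap partially.
scores = np.clip(rng.normal(loc=0.3 + 0.4 * y_true, scale=0.15), 0.0, 1.0)

# Evaluate accuracy on a grid of candidate thresholds.
thresholds = np.linspace(0.0, 1.0, 101)
accuracies = [np.mean((scores >= t).astype(int) == y_true) for t in thresholds]
best_t = thresholds[int(np.argmax(accuracies))]
```

With these centers the best threshold lands near 0.5, matching where the class histograms cross on the plot.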

For deeper theory and metric formulas, see [Theory](theory.md).