feat(surrogate): GP/RF interpolation from results table (#82) #108

Merged: jc-macdonald merged 1 commit into main from feat/82-surrogate-models on May 12, 2026

@jc-macdonald
Contributor

Closes #82.

Summary

Adds a new trade_study.surrogate module that fits scikit-learn surrogates (Gaussian process or random forest) to a ResultsTable and predicts any observable at unseen factor combinations.

API

```python
from trade_study import fit_surrogate

model = fit_surrogate(results, factors, method="gp", seed=0)
model.predict({"alpha": 0.5, "beta": 0.3})         # {"y": ..., "z": ...}
model.predict_batch([{...}, {...}])                # {"y": ndarray, ...}
model.uncertainty({"alpha": 0.5, "beta": 0.3})     # GP only; RF raises
```

  • Continuous factors are min-max scaled to [0, 1].
  • Categorical / discrete factors are one-hot encoded against factor.levels.
  • NaN rows are dropped per-observable, so partial sweeps still fit.
  • method="gp" uses ConstantKernel * Matern(nu=1.5) + WhiteKernel with normalize_y=True; method="rf" uses RandomForestRegressor.
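
For readers unfamiliar with the kernel named above, here is a minimal scikit-learn sketch of a GP built the same way. It is not the module's actual code: the toy data, the default kernel hyperparameters, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

# Kernel structure named in the PR; starting hyperparameters are library defaults (assumption).
kernel = ConstantKernel() * Matern(nu=1.5) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)

# Toy training data (illustrative only): two factors already scaled to [0, 1], one observable.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 2))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]

gp.fit(X, y)

# return_std=True returns the posterior standard deviation alongside the mean.
mean, std = gp.predict(np.array([[0.5, 0.3]]), return_std=True)
```

The `WhiteKernel` term lets the fit absorb observation noise, and `normalize_y=True` standardizes the target before fitting, which helps when observables differ widely in magnitude.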

Packaging

  • New optional extra: pip install "trade-study[surrogate]" (scikit-learn ≥ 1.3).
  • Also rolled into the all aggregate.
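
One common way to support an optional extra like this is to check for the dependency at call time and point users to the install command. The sketch below shows that pattern only; the helper name `require_optional` is hypothetical and not taken from the PR.

```python
import importlib.util


def require_optional(module_name: str, extra: str) -> None:
    """Raise ImportError with an install hint if an optional module is missing.

    Hypothetical helper for illustration; not part of trade_study.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f'install it with: pip install "trade-study[{extra}]"'
        )


# Example: a standard-library module is always present, so this does not raise.
require_optional("json", "surrogate")
```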

Docs

  • New docs/api/surrogate.md page, wired into the mkdocs nav between Stacking and Visualization, with an install hint for the optional extra.

Tests

13 new tests covering both backends, GP uncertainty, mixed-factor encoding, NaN row handling, and input validation. surrogate.py coverage: 100%. Total: 296 passing, 99.5% project coverage.
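
To illustrate the NaN row handling the tests cover, here is a pure-Python sketch of per-observable filtering. It assumes results can be viewed as a list of row dictionaries; the function name and the data layout are hypothetical, and the real `ResultsTable` may be structured differently.

```python
import math


def rows_for_observable(rows, factor_names, observable):
    """Return (X, y) using only rows where `observable` is not NaN.

    Hypothetical helper: `rows` is a list of dicts holding factor and observable values.
    """
    X, y = [], []
    for row in rows:
        value = row[observable]
        if math.isnan(value):
            continue  # drop this row for this observable only
        X.append([row[name] for name in factor_names])
        y.append(value)
    return X, y


rows = [
    {"alpha": 0.1, "beta": 0.2, "y": 1.0, "z": float("nan")},
    {"alpha": 0.5, "beta": 0.3, "y": float("nan"), "z": 2.0},
    {"alpha": 0.9, "beta": 0.7, "y": 3.0, "z": 4.0},
]

X_y, y_vals = rows_for_observable(rows, ["alpha", "beta"], "y")
X_z, z_vals = rows_for_observable(rows, ["alpha", "beta"], "z")
```

Each observable keeps two of the three rows, so a row with a missing `z` still contributes to the `y` fit.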

Follow-ups

Add a new `trade_study.surrogate` module that fits scikit-learn
Gaussian-process or random-forest models to a ResultsTable and lets users
predict any observable at unseen factor combinations.

- `fit_surrogate(results, factors, *, method, seed, n_estimators)` returns
  a `SurrogateModel` carrying one estimator per observable.
- Continuous factors are min-max scaled to [0,1]; categorical/discrete
  factors are one-hot encoded against `factor.levels`.
- `SurrogateModel.predict` / `predict_batch` work for both backends;
  `uncertainty` returns the GP posterior std and raises
  NotImplementedError for RF.
- NaN rows are dropped per-observable.
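
The encoding described in the list above can be sketched in plain Python as follows. The two factor classes, the use of declared bounds for scaling, and the exception types are assumptions for illustration; the module's own factor types and error handling may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContinuousFactor:  # hypothetical stand-in for a continuous factor
    name: str
    low: float
    high: float


@dataclass(frozen=True)
class CategoricalFactor:  # hypothetical stand-in for a factor with `levels`
    name: str
    levels: tuple


def encode(point, factors):
    """Encode one factor combination as a flat feature row."""
    row = []
    for factor in factors:
        if factor.name not in point:
            raise KeyError(f"missing factor: {factor.name}")
        value = point[factor.name]
        if isinstance(factor, ContinuousFactor):
            # Min-max scale to [0, 1] using the factor's bounds.
            row.append((value - factor.low) / (factor.high - factor.low))
        else:
            if value not in factor.levels:
                raise ValueError(f"unknown level {value!r} for {factor.name}")
            # One-hot encode against the declared levels, in order.
            row.extend(1.0 if value == level else 0.0 for level in factor.levels)
    return row


factors = [
    ContinuousFactor("alpha", 0.0, 2.0),
    CategoricalFactor("solver", ("euler", "rk4")),
]
encoded = encode({"alpha": 0.5, "solver": "rk4"}, factors)  # [0.25, 0.0, 1.0]
```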

Packaging:
- New optional extra `surrogate = ["scikit-learn>=1.3"]`, also added to
  the `all` aggregate.

Docs:
- New `docs/api/surrogate.md` page wired into mkdocs nav.

Tests:
- 13 new tests covering fit/predict round-trip for GP and RF, batch
  shape, GP uncertainty, RF uncertainty raising, mixed-factor encoding,
  unknown level / missing factor errors, NaN row handling, and input
  validation. surrogate.py coverage: 100%.

Closes #82.
Unblocks #105 (regime-conditional surrogate).
@jc-macdonald merged commit ae54c34 into main on May 12, 2026
4 checks passed
@jc-macdonald deleted the feat/82-surrogate-models branch on May 12, 2026 at 14:55
Development

Successfully merging this pull request may close these issues.

Surrogate modeling: GP/RF interpolation from results table
