
[ENH] Add Probabilistic boosting and stacking compositors#993

Open
arnavk23 wants to merge 5 commits into sktime:main from arnavk23:feature/probabilistic-ensembling-pipelines

Conversation

@arnavk23 (Contributor) commented Mar 24, 2026

Reference Issues/PRs

Towards #7

What does this implement/fix? Explain your changes.

  • Adds ProbabilisticStackingRegressor and ProbabilisticBoostingRegressor as composable pipeline elements for probabilistic ensembling.
  • Both compositors support mixture-based output, true residual-based boosting, and are extensible for meta-learners and advanced weighting.
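To make the "true residual-based boosting" idea concrete, here is a minimal sketch of the two-stage residual scheme using plain scikit-learn and numpy. This is an illustration of the technique only, not the PR's actual `ProbabilisticBoostingRegressor` (which additionally produces distributional outputs); all data and model choices below are assumptions.

```python
# Conceptual sketch of residual-based boosting for point predictions,
# using plain scikit-learn -- NOT the skpro implementation in this PR.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Stage 1: fit a base learner on the raw target
base = LinearRegression().fit(X, y)
residuals = y - base.predict(X)

# Stage 2: fit the next learner on the residuals of stage 1
booster = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)

# Ensemble prediction = base prediction + residual correction
y_hat = base.predict(X) + booster.predict(X)
print(float(np.mean((y - y_hat) ** 2)))  # boosted training MSE
```

The compositor generalizes this pattern to an arbitrary sequence of probabilistic base estimators.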

Does your contribution introduce a new dependency? If yes, which one?

No new dependencies are introduced.

What should a reviewer concentrate their feedback on?

  • API design and extensibility for future meta-learners and weighting schemes.

Did you add any tests for the change?

Yes.

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic; doc - writing or improving documentation or docstrings; bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR); maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

…e all imports are at the top of the test file to resolve UnboundLocalError. Clean up test file for robust import handling.

@JiwaniZakir left a comment


In _predict_proba of ProbabilisticStackingRegressor, when meta_learner_ is present, the code constructs Mixture(distributions=[("meta", meta_pred)], weights=[1.0]), where meta_pred is either a raw numpy array from predict_proba or a 1D array from predict; neither is a skpro distribution object. Mixture expects its distributions argument to contain actual distribution instances, so this path will almost certainly raise a runtime error, and it appears untested by the get_test_params scenarios (which don't exercise the meta_learner path at all).

Additionally, _fit builds meta-features via est_fitted.predict(X).values.flatten() (point predictions), which discards all distributional information when training the meta-learner. For a "probabilistic stacking" regressor, you'd typically want to include at minimum both the predicted mean and variance (or quantiles) as features, otherwise the meta-learner has no basis for producing calibrated uncertainty estimates.
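To illustrate the meta-feature point, here is a small numpy sketch contrasting point-only meta-features with ones that carry uncertainty. The array names and shapes are purely illustrative, not skpro's internals; the idea is simply to stack per-estimator variances (or quantiles) alongside the means.

```python
# Sketch: meta-features for a stacking meta-learner built from
# probabilistic base predictions. Illustrative only, not skpro code.
import numpy as np

n_samples, n_base = 100, 3
rng = np.random.default_rng(1)

# Hypothetical per-estimator predictive means and variances
means = rng.normal(size=(n_samples, n_base))
variances = rng.uniform(0.1, 1.0, size=(n_samples, n_base))

# Point-prediction-only meta-features (mean per base estimator)
Z_point = means                         # shape (100, 3)

# Richer meta-features: mean AND variance per base estimator,
# giving the meta-learner a basis for calibrated uncertainty
Z_prob = np.hstack([means, variances])  # shape (100, 6)
print(Z_point.shape, Z_prob.shape)
```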

The add_base_estimator method mutates self.estimators in-place post-construction, which conflicts with skpro/sklearn's convention that constructor parameters are not mutated — cloning the estimator after calling this method would not preserve the added estimator, breaking pipeline serialization and cross-validation workflows.
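For reference, the convention being cited can be sketched as follows: __init__ stores its arguments verbatim, and anything derived (including an extended estimator list) lives in trailing-underscore attributes set during fit, so clone() can faithfully reconstruct the object from get_params(). The class and attribute names below are illustrative, not the PR's actual code.

```python
# Sketch of the sklearn/skpro parameter convention: constructor
# parameters are never mutated; derived state is a fitted attribute.
from sklearn.base import BaseEstimator, clone


class StackingSketch(BaseEstimator):
    def __init__(self, estimators=None):
        # store the constructor argument unmodified -- never append to it
        self.estimators = estimators

    def fit(self, X=None, y=None):
        # derived, mutable state goes in a trailing-underscore attribute
        self.estimators_ = list(self.estimators or [])
        return self


est = StackingSketch(estimators=["est_a"])
est.fit()
est.estimators_.append("est_b")  # safe: fitted state only

fresh = clone(est)               # reconstructs from __init__ params
print(fresh.estimators)          # constructor params are unmutated
```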

@arnavk23 (Contributor, Author)

I saw the same issues after my recent commit and am working on a fix. Thanks for the detailed review.

…ying to fix them. I think the issue is that the test is not properly setting up the data or the model, which is causing the predictions to be all zeros. I will try to debug this by adding some print statements to see what is going on with the data and the model. I will also check if there are any issues with the way the ensemble is being created or used in the test.
