[BUG] Fix incorrect variance calculation and capability tagging in Hurdle #973

Open
ANANYA542 wants to merge 1 commit into sktime:main from ANANYA542:fix-hurdle-variance

@ANANYA542
Contributor

Reference Issues/PRs

fixes #972

What does this implement/fix? Explain your changes.

This PR fixes an issue in the Hurdle distribution where the variance was being computed incorrectly.
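The current implementation uses p * var_positive + p * (1 - p) * mean_positive, but the mean term should be squared. The old formula falls short of the correct value by p * (1 - p) * (mean_positive ** 2 - mean_positive), so it underestimates the variance whenever mean_positive > 1 (the usual case for count data, whose positive part takes values of at least 1) and overestimates it when mean_positive < 1. The formula has been corrected to p * var_positive + p * (1 - p) * mean_positive ** 2, in line with the law of total variance.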
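For reference, the corrected expression follows from conditioning on whether the hurdle is crossed. Writing the hurdle variable as X = B·Y, with B ~ Bernoulli(p) the hurdle indicator independent of the positive part Y, and using μ₊ for mean_positive and σ²₊ for var_positive (notation introduced here for brevity):

```math
\begin{aligned}
\mathbb{E}[X \mid B] &= B\,\mu_+, \qquad \operatorname{Var}(X \mid B) = B\,\sigma_+^2 \quad (\text{since } B^2 = B),\\
\operatorname{Var}(X) &= \mathbb{E}\big[\operatorname{Var}(X \mid B)\big] + \operatorname{Var}\big(\mathbb{E}[X \mid B]\big)\\
&= p\,\sigma_+^2 + \mu_+^2 \operatorname{Var}(B)\\
&= p\,\sigma_+^2 + p(1-p)\,\mu_+^2.
\end{aligned}
```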

I also updated how capability tags are determined. Previously, Hurdle checked the base distribution to decide whether mean and var are exact. However, Hurdle internally wraps the base distribution in LeftTruncated, and the truncated mean and var are computed by numerical approximation, so the old tags overstated exactness. The tags are now derived from the truncated distribution instead.
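The tagging change follows a general rule: a wrapper should advertise as exact only what the object it actually delegates to can compute exactly. The toy sketch below illustrates that rule. ToyBase, ToyLeftTruncated, ToyHurdle and the exact_methods attribute are hypothetical stand-ins, not skpro classes or its tag API, and the sketch assumes, as stated above, that truncation turns mean and var into numerical approximations.

```python
# Toy illustration only: these classes and the exact_methods attribute are
# hypothetical, not skpro's distribution classes or capability tags.

class ToyBase:
    # The untruncated base distribution has closed-form mean and var.
    exact_methods = frozenset({"mean", "var", "pdf"})


class ToyLeftTruncated:
    def __init__(self, base):
        self.base = base
        # Assumption: after truncation, mean and var are approximated
        # numerically; this toy leaves pdf exact and only drops mean/var.
        self.exact_methods = base.exact_methods - {"mean", "var"}


class ToyHurdle:
    def __init__(self, base):
        self.base = base
        # The hurdle delegates its positive part to the truncated wrapper.
        self.positive_part = ToyLeftTruncated(base)

    def exact_methods_old(self):
        # Pre-fix behaviour: capabilities read off the untruncated base.
        return self.base.exact_methods

    def exact_methods_new(self):
        # Post-fix behaviour: capabilities read off the distribution actually used.
        return self.positive_part.exact_methods


hurdle = ToyHurdle(ToyBase())
print("mean" in hurdle.exact_methods_old())  # True: overclaims exactness
print("mean" in hurdle.exact_methods_new())  # False
```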

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

  • Whether the updated variance formula aligns with expected behavior across different base distributions
  • Whether capability tagging should consistently be based on the internally used truncated distribution

Did you add any tests for the change?

I verified the fix using Monte Carlo simulation (100k samples), where the empirical variance now matches the analytical result within tolerance. No new formal unit tests have been added yet.
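A minimal, self-contained version of this kind of check is sketched below. It is not the PR's own verification script: it uses only NumPy and assumes, for illustration, an exponential positive part with scale 2 (mean 2, variance 4), which is already supported on (0, ∞) and so needs no truncation. It compares the sample variance of 100k hurdle draws against both the corrected and the pre-fix expressions.

```python
import numpy as np


def hurdle_variance(p, mean_pos, var_pos):
    """Law of total variance: zero w.p. 1 - p, positive part w.p. p."""
    return p * var_pos + p * (1 - p) * mean_pos**2


def hurdle_variance_old(p, mean_pos, var_pos):
    """Pre-fix expression, with the mean term not squared (for comparison)."""
    return p * var_pos + p * (1 - p) * mean_pos


# Assumed example: positive part ~ Exponential(scale=2), so mean 2, variance 4.
p, scale = 0.3, 2.0
mean_pos, var_pos = scale, scale**2

rng = np.random.default_rng(0)
n = 100_000
crossed = rng.random(n) < p                 # hurdle crossed with probability p
positive_draws = rng.exponential(scale, n)  # draws from the positive part
samples = np.where(crossed, positive_draws, 0.0)

empirical = samples.var()
correct = hurdle_variance(p, mean_pos, var_pos)   # 0.3 * 4 + 0.21 * 4 = 2.04
old = hurdle_variance_old(p, mean_pos, var_pos)   # 0.3 * 4 + 0.21 * 2 = 1.62
print(f"empirical={empirical:.3f} corrected={correct:.3f} old={old:.3f}")
```

With these parameters the standard error of the sample variance is roughly 0.03, so the empirical value should sit close to 2.04 and well away from the pre-fix 1.62.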

Any other comments?

A screenshot showing the before/after behavior has also been added for clarity.

[Screenshot 2026-03-20 at 2:23:44 AM: before/after behavior]

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

@ANANYA542
Contributor Author

Hi @fkiraly, just following up on this PR. If you could take some time to review it and let me know whether any changes or further explanation are required, I’d really appreciate it.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Incorrect variance computation and misleading capability tags in Hurdle

1 participant