Skip to content

[ENH] add truncated_mean interface to BaseDistribution#995

Open
patelchaitany wants to merge 1 commit intosktime:mainfrom
patelchaitany:feature/truncated-mean-interface
Open

[ENH] add truncated_mean interface to BaseDistribution#995
patelchaitany wants to merge 1 commit intosktime:mainfrom
patelchaitany:feature/truncated-mean-interface

Conversation

@patelchaitany
Copy link
Copy Markdown
Contributor

Reference Issues/PRs

Addresses #221.

What does this implement/fix? Explain your changes.

Adds truncated_mean(lower, upper) to BaseDistribution. Returns E[X | lower < X < upper]. Closed-form implementations for Normal, Exponential, Laplace, and Uniform. Other distributions fall back to a ppf-based numerical approximation.

The base class _energy_x now uses truncated means when available: if a distribution has exact truncated_mean and cdf, it computes E[|X - c|] from the energy identity instead of Monte Carlo sampling.

TruncatedDistribution gets a _mean() that calls the inner distribution's truncated_mean, so TruncatedDistribution(Normal(...), lower=0).mean() is exact now.

Changes

  • base/_base.py - new truncated_mean / _truncated_mean with ppf/MC default, updated _energy_x
  • normal.py - exact _truncated_mean
  • exponential.py - exact _truncated_mean
  • laplace.py - exact _truncated_mean
  • uniform.py - exact _truncated_mean
  • truncated.py - _mean() via inner distribution's truncated_mean, updated tag logi

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

No

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

Copy link
Copy Markdown

@JiwaniZakir JiwaniZakir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sampling-based fallback in _truncated_mean calls self.sample(1) inside a loop of approx_spl_size (default 1000) iterations, which is orders of magnitude slower than calling self.sample(approx_spl_size) once and filtering. The existing _mean fallback at the end of the file does exactly this with _sample_mean, so the pattern is already established — the loop approach here is inconsistent with the rest of the codebase and will be a significant performance bottleneck.

In _energy_x, the added docstring block (around line 1235) describes the truncated-mean identity used in the new code path, but it's inserted into the existing _energy_x docstring rather than being clearly separated from the original parameter/return docs. This makes the docstring structurally confusing — the formula section appears before the "Parameters" block, which is fine, but it references _truncated_mean and cdf without explaining that this only triggers when both are exact capabilities, which is important context for subclass implementors.

The zero_mass guard in the ppf-based approximation (np.abs(cdf_u - cdf_l) < 1e-15) correctly handles degenerate intervals, but the same guard is absent in the _energy_x shortcut path — if a point x lies exactly at a probability-0 region boundary, right_mean or left_mean could return nan and the nan_to_num(..., nan=0.0) silently masks what might be a meaningful divergence rather than a true zero-contribution interval.

Copy link
Copy Markdown
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

Can you please add a test to TestAllDistributions that checks that the method works? I think the method just has to be added to the right places.

Comment thread skpro/distributions/base/_base.py Outdated
if spl_arr.ndim == 1:
spl_arr = spl_arr.reshape(-1, 1, 1)
elif spl_arr.ndim == 2:
spl_arr = spl_arr.reshape(spl_arr.shape[0], 1, -1)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this correct? or is it a mistake?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, those reshape branches are dead code on my end. sample(1) always returns a DataFrame, so .values gives 2D, stacking gives 3D neither ndim == 1 nor ndim == 2 ever fires. I added them as defensive guards but they're unnecessary. Happy to remove them.

@patelchaitany patelchaitany force-pushed the feature/truncated-mean-interface branch from dde2b4d to da87e95 Compare March 28, 2026 07:54
Signed-off-by: Chaitany patel <patelchaitany93@gmail.com>
@patelchaitany patelchaitany force-pushed the feature/truncated-mean-interface branch from da87e95 to 4c21cf3 Compare March 28, 2026 07:59
@patelchaitany
Copy link
Copy Markdown
Contributor Author

@fkiraly, I have added truncated_mean to METHODS_SCALAR plus two new tests one checks that unbounded truncated_mean() matches mean(), the other verifies output format with actual bounds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants