
ENH: implement pooled score feature#76

Open
FirePheonix wants to merge 8 commits into pysal:main from FirePheonix:pooled_score_

Conversation

@FirePheonix
Contributor

I realized there was a TODO about the pooled score requirement. This PR implements it.

# TODO: score_ should be an alias of pooled_score_ - this is different from MGWR

@FirePheonix
Contributor Author

If any test cases should be added for this, or similar changes should be made to other models, please let me know and I'll do it immediately.

@codecov

codecov bot commented Jan 25, 2026

Codecov Report

❌ Patch coverage is 96.22642% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.11%. Comparing base (5b8382d) to head (36fb0f2).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| gwlearn/linear_model.py | 92.85% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #76      +/-   ##
==========================================
+ Coverage   91.36%   93.11%   +1.75%     
==========================================
  Files           6        6              
  Lines         799      872      +73     
==========================================
+ Hits          730      812      +82     
+ Misses         69       60       -9     


Member

@martinfleis left a comment

PR titles are used in release notes. Can you have a look at how the titles of merged PRs look and adjust yours, so I don't have to?

Comment thread gwlearn/ensemble.py Outdated
Comment on lines +313 to +317
if self.oob_y_pooled_.size == 0 or self.oob_pred_pooled_.size == 0:
return float("nan")
y_true = self.oob_y_pooled_.ravel()
y_pred = self.oob_pred_pooled_.ravel()
return (y_true == y_pred).mean()
Member

Use sklearn.metrics.accuracy_score, do not reimplement.

Contributor Author

Yes, you're right. Changed in the latest commit.
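The suggested replacement might look like the following sketch, written as a standalone function for illustration (the `oob_*_pooled_` names come from the snippet above; the real change lives in a method on the estimator):

```python
import numpy as np
from sklearn.metrics import accuracy_score


def oob_pooled_score(oob_y_pooled, oob_pred_pooled):
    # Guard against an empty OOB pool, as the original snippet does.
    if oob_y_pooled.size == 0 or oob_pred_pooled.size == 0:
        return np.nan
    # Delegate the metric itself to scikit-learn instead of reimplementing it.
    return accuracy_score(oob_y_pooled.ravel(), oob_pred_pooled.ravel())
```

Delegating to `accuracy_score` also buys input validation (shape and length checks) for free, which the hand-rolled `(y_true == y_pred).mean()` lacks.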

Comment thread gwlearn/ensemble.py Outdated
Comment on lines +835 to +841
if len(self.oob_y_pooled_) == 0:
return float("nan")
y_true = self.oob_y_pooled_.ravel()
y_pred = self.oob_pred_pooled_.ravel()
ss_res = ((y_true - y_pred) ** 2).sum()
ss_tot = ((y_true - y_true.mean()) ** 2).sum()
return 1 - ss_res / ss_tot if ss_tot != 0 else float("nan")
Member

Use sklearn.metrics.r2_score, do not reimplement. I don't want to have to think about whether this is correct or not. Minimise the maintenance burden.

Contributor Author

As above.
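Analogously to the accuracy case, the R² branch could delegate to scikit-learn; a sketch as a standalone function, using the attribute names from the snippet above:

```python
import numpy as np
from sklearn.metrics import r2_score


def oob_pooled_r2(oob_y_pooled, oob_pred_pooled):
    # Guard against an empty OOB pool, as the original snippet does.
    if len(oob_y_pooled) == 0:
        return np.nan
    # Delegate to scikit-learn rather than hand-rolling ss_res / ss_tot.
    return r2_score(oob_y_pooled.ravel(), oob_pred_pooled.ravel())
```

One behavioral difference worth noting: the hand-rolled version returns NaN when `y_true` is constant (`ss_tot == 0`), whereas `r2_score` handles that edge case itself (recent scikit-learn versions warn and return a defined value), so the two are not bit-for-bit identical there.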

Comment thread gwlearn/ensemble.py Outdated
R² computed from all out-of-bag predictions pooled together.
"""
if len(self.oob_y_pooled_) == 0:
return float("nan")
Member

Suggested change:
-    return float("nan")
+    return np.nan

Contributor Author

Changed in the latest commit.

@FirePheonix FirePheonix changed the title pooled-score-feature ENH: implement pooled score feature Jan 25, 2026
@FirePheonix
Contributor Author

May I add tests for this as well? Asking because then Codecov won't fail. I can create a file named test_pooled_score.py.

Member

@martinfleis left a comment

I am wondering whether this is actually wise or not. We can pool linear models, and we can pool OOB values from a random forest, but there is not much we can do about gradient boosting, for example, or other models. We could always leave it out and alias score_ to the focal score. I am not sure what is best, but I would rather not rush to get something in before thinking it through properly. What is your take?

Comment thread gwlearn/linear_model.py Outdated
Comment on lines +519 to +521
# Store pooled y for score computation
self.y_pooled_ = y.values
self.pred_pooled_ = self.pred_.values
Member

This is wrong. This is not pooled y; this is focal y and the focal prediction.

Contributor Author

Fixed.

Comment thread gwlearn/ensemble.py Outdated
Comment on lines +838 to +839
y_true = self.oob_y_pooled_.ravel()
y_pred = self.oob_pred_pooled_.ravel()
Member

Do we need ravel() here? How come? Is this not a flat array?

Contributor Author

You're right. I'll remove .ravel() from oob_pooled_score and move it to _get_oob_score_data.
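For context on why the flattening matters at all: comparing a column vector against a flat array silently broadcasts instead of raising, which is exactly the kind of bug `.ravel()` guards against. A minimal illustration:

```python
import numpy as np

# A (n, 1) column (e.g. y kept as a one-column DataFrame) vs a flat (n,) array.
y_true = np.array([[1], [0], [1]])
y_pred = np.array([1, 0, 0])

# Without flattening, == broadcasts to a (3, 3) matrix, so the mean is
# taken over 9 pairwise comparisons instead of 3 element-wise ones.
broadcast_acc = (y_true == y_pred).mean()  # NOT the accuracy

# Flattening first restores the intended element-wise comparison.
elementwise_acc = (y_true.ravel() == y_pred).mean()
```

So keeping the flattening in one place (`_get_oob_score_data`, as proposed here) means every downstream metric sees arrays that are already one-dimensional.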

@FirePheonix
Contributor Author

FirePheonix commented Feb 8, 2026

> I am wondering whether this is actually wise or not. We can pool linear models, and we can pool OOB values from a random forest, but there is not much we can do about gradient boosting, for example, or other models. We could always leave it out and alias score_ to the focal score. I am not sure what is best, but I would rather not rush to get something in before thinking it through properly. What is your take?

Yeah, pooling works well for linear models (local y + predictions) and random forest (OOB values), but there is no natural mechanism for gradient boosting.
So I think we can leave pooled_score_ out for models without proper pooling support rather than aliasing it to the focal score; that is less confusing.

@FirePheonix
Contributor Author

I implemented it for linear models (local y + predictions) and random forest, with some tests.
I think we should let this PR sit for now (someone else might use it as a reference) and wait for a proper discussion on what to do about gradient boosting before moving forward.

@jigyasaba
Contributor

Thanks for linking the earlier discussion. I’ll step back from this for now and follow the design discussion first.
Happy to help once the direction is clearer.

@FirePheonix
Contributor Author

Please let me know if there are any updates on this.

@martinfleis
Member

I am simply not convinced that this is the right thing to do, despite the to-do I left in a comment in the code. Both self.y_pooled_ and self.pred_pooled_ are exposed to the user, so passing them to an sklearn function to get accuracy is trivial. The only reason to wrap it would be to have score_ available, but again, I am just not sure this is what should be there, as it is inconsistent across the models. I would rather have users do these steps explicitly and knowingly themselves.
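The explicit user-side computation described here is indeed a one-liner. A sketch, using stand-in arrays in place of the exposed `self.y_pooled_` / `self.pred_pooled_` attributes (a fitted gwlearn estimator would supply the real ones):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Stand-ins for model.y_pooled_ and model.pred_pooled_ on a fitted estimator.
y_pooled = np.array([1, 0, 1, 1])
pred_pooled = np.array([1, 0, 0, 1])

# Classification: pooled accuracy in one explicit call.
pooled_accuracy = accuracy_score(y_pooled, pred_pooled)

# Regression would be the analogous call with sklearn.metrics.r2_score:
# pooled_r2 = r2_score(model.y_pooled_, model.pred_pooled_)
```

This keeps the metric choice in the user's hands, which is the consistency argument being made: no score_ alias that means different things on different models.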
