
Add method to facilitate tabulation of multiple performance evaluations.#1045

Open
ablaom wants to merge 3 commits into dev from describe

Conversation

@ablaom (Member) commented Apr 5, 2026

This PR is a follow-up to the enhancements in #1034.

This PR extends DataAPI.describe (also extended by DataFrames) to summarise an MLJ performance evaluation, as a named tuple. Multiple evaluations can then be combined in a table:

```julia
using MLJ
X, y = @load_iris # a vector and a table

# instantiate two models:
knn = (@load KNNClassifier pkg=NearestNeighborModels)()
tree = (@load DecisionTreeClassifier pkg=DecisionTree)()

named_models = [
    "Dummy" => ConstantClassifier(),  # a built-in model
    "K-nearest neighbors" => knn,
    "Decision Tree" => tree,
]
performance_evaluations = evaluate(named_models, X, y; measures=[brier_score, accuracy])
```

```julia
julia> describe(performance_evaluations[1])
(tag = "Dummy", BrierScore = -0.573 ± 0.1, Accuracy = 0.33 ± 0.23)

julia> table = describe.(performance_evaluations);

julia> pretty(table)
┌─────────────────────┬──────────────────────┬──────────────────────┐
│ tag                 │ BrierScore           │ Accuracy             │
│ String              │ Measurement{Float64} │ Measurement{Float64} │
│ Textual             │ Continuous           │ Continuous           │
├─────────────────────┼──────────────────────┼──────────────────────┤
│ Dummy               │ -0.573±0.1           │ 0.33±0.23            │
│ K-nearest neighbors │ -0.21±0.21           │ 0.92±0.18            │
│ Decision Tree       │ -0.00118977±0.0      │ 1.0±0.0              │
└─────────────────────┴──────────────────────┴──────────────────────┘
```
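Because each `describe` result is a named tuple, broadcasting produces a vector of named tuples, which already satisfies the Tables.jl row-table interface. As a sketch (using mock `Float64` values in place of real MLJ `Measurement` output, so this snippet is illustrative only), such a vector converts directly to a `DataFrame`:

```julia
using DataFrames  # assumes DataFrames.jl is available

# Mock stand-ins for `describe.(performance_evaluations)`; the real
# entries would carry Measurement{Float64} values rather than Float64.
rows = [
    (tag = "Dummy", Accuracy = 0.33),
    (tag = "K-nearest neighbors", Accuracy = 0.92),
    (tag = "Decision Tree", Accuracy = 1.0),
]

# A vector of named tuples is a valid Tables.jl row table:
df = DataFrame(rows)
```

This means the tabulated evaluations interoperate with any Tables.jl-aware sink, not just `pretty`.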

I'm not wedded to the name describe. Maybe summarize is better, but I don't know of other uses of summarize in the ML ecosystem, and I was reluctant to add yet another method to the namespace. Thoughts or suggestions welcome.

cc @OkonSamuel @LucasMatSP

@codecov codecov bot commented Apr 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.08%. Comparing base (be6d92f) to head (24d7d65).

Additional details and impacted files

```
@@            Coverage Diff             @@
##              dev    #1045      +/-   ##
==========================================
- Coverage   90.26%   89.08%   -1.19%     
==========================================
  Files          33       34       +1     
  Lines        2600     2574      -26     
==========================================
- Hits         2347     2293      -54     
- Misses        253      281      +28     
```

☔ View full report in Codecov by Sentry.

@tylerjthomas9

The describe name is more intuitive and consistent with other similar naming patterns I've seen.

One small thing I noticed is that individualize iterates over the input twice. In practice, I don't think this will cause issues here, because nobody would feed in an IO stream or another non-re-iterable collection, but the docstring could be tweaked to reflect this.

Overall, it looks great to me.

@LucasMatSP (Collaborator) commented Apr 6, 2026

Personally, I prefer summarize, but I don't have a strong opinion about this. The feature is nice.

@ablaom (Member, Author) commented Apr 6, 2026

Thanks for the input! And thank you kindly @tylerjthomas9 for the review.

> One small thing I noticed is that individualize iterates over the input twice.

Good catch. I have tweaked the docstring. But I also realised we already have a similar helper method in the pipeline code, called individuate. Therefore I will refactor this and move the method to "src/utilities.jl".
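For the record, the usual way to make such a helper safe for single-shot iterators is to traverse the input only once. A minimal sketch of the pattern (the name `split_pairs` and its signature are hypothetical, not the actual individuate/individualize API):

```julia
# Hypothetical sketch: split an iterable of `name => value` pairs into
# two vectors in a single traversal, so stateful iterators are safe.
function split_pairs(named)
    names, values = String[], Any[]
    for (k, v) in named
        push!(names, k)
        push!(values, v)
    end
    return names, values
end

split_pairs(["Dummy" => 1, "Tree" => 2])  # (["Dummy", "Tree"], Any[1, 2])
```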

Regarding the name, I'm inclining towards describe, as I couldn't find summarize and Base.summary doesn't apply. But two other alternatives I thought of are:

  • overload NamedTuple(::AbstractPerformanceEvaluation)
  • report, which we already have for machines

Does anyone like these better than describe?
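For comparison, the NamedTuple option would amount to overloading the constructor rather than adding a new verb, along these lines (a sketch with a made-up `MyEval` type; MLJ's actual evaluation type and fields differ):

```julia
# Hypothetical stand-in for MLJ's evaluation object:
struct MyEval
    tag::String
    accuracy::Float64
end

# Overloading the constructor instead of exporting a new function:
Base.NamedTuple(e::MyEval) = (tag = e.tag, Accuracy = e.accuracy)

NamedTuple(MyEval("Dummy", 0.33))  # (tag = "Dummy", Accuracy = 0.33)
```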

@tylerjthomas9

> Regarding the name, I'm inclining towards describe, as I couldn't find summarize and Base.summary doesn't apply. But two other alternatives I thought of are:
>
> • overload NamedTuple(::AbstractPerformanceEvaluation)
> • report, which we already have for machines
>
> Does anyone like these better than describe?

I am a fan of describe, but I may be alone on this one. It's intuitive to me and fits with other libraries like DataFrames.jl's describe.
