Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##              dev    #1045      +/-   ##
==========================================
- Coverage   90.26%   89.08%    -1.19%
==========================================
  Files          33       34       +1
  Lines        2600     2574      -26
==========================================
- Hits         2347     2293      -54
- Misses        253      281      +28
```

☔ View full report in Codecov by Sentry.
The `describe` name is more intuitive and consistent with other similar naming patterns I've seen. One small thing I noticed is that `individualize` iterates over the input twice. In practice, I don't think it will cause issues here, because nobody would feed in an IO stream or other non-re-iterable collection, but the docstring could be tweaked to reflect this. Overall, it looks great to me.
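For anyone unfamiliar with the caveat above, here is a minimal illustration, using a `Channel` as a stand-in for an IO stream or other single-pass iterable: the first traversal consumes the items, so a second traversal sees nothing.

```julia
# A Channel behaves like an IO stream: items are consumed as they are read,
# so a second pass over the same object yields nothing.
ch = Channel{Int}(3)            # buffered channel holding up to 3 items
foreach(i -> put!(ch, i), 1:3)
close(ch)                       # no more items will be added

first_pass  = collect(ch)       # drains the channel: [1, 2, 3]
second_pass = collect(ch)       # already exhausted: Int[]
```

Any function that walks its input twice silently drops data on such collections, which is why documenting the assumption in the docstring is worthwhile.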
Personally, I prefer …
Thanks for the input! And thank you kindly @tylerjthomas9 for the review.

Good catch. I have tweaked the docstring. But I also realised we already have a similar helper method in the pipeline code, called …. Regarding the name, I'm inclined towards …
Does anyone like these better than …?
I am a fan of `describe`, but I may be alone on this one. It's intuitive to me and fits with other libraries, like DataFrames.jl's `describe`.
This PR is a follow-up to the enhancements at #1034.
This PR extends `DataAPI.describe` (also extended by DataFrames) to summarise an MLJ performance evaluation as a named tuple. Multiple evaluations can then be combined in a table:

```julia
using MLJ

X, y = @load_iris  # a vector and a table

# instantiate two models:
knn = (@load KNNClassifier pkg=NearestNeighborModels)()
tree = (@load DecisionTreeClassifier pkg=DecisionTree)()

named_models = [
    "Dummy" => ConstantClassifier(),  # a built-in model
    "K-nearest neighbors" => knn,
    "Decision Tree" => tree,
]

performance_evaluations = evaluate(named_models, X, y; measures=[brier_score, accuracy])
```

```julia
julia> describe(performance_evaluations[1])
(tag = "Dummy", BrierScore = -0.573 ± 0.1, Accuracy = 0.33 ± 0.23)
```

```julia
table = describe.(performance_evaluations)

julia> pretty(table)
┌─────────────────────┬──────────────────────┬──────────────────────┐
│ tag                 │ BrierScore           │ Accuracy             │
│ String              │ Measurement{Float64} │ Measurement{Float64} │
│ Textual             │ Continuous           │ Continuous           │
├─────────────────────┼──────────────────────┼──────────────────────┤
│ Dummy               │ -0.573±0.1           │ 0.33±0.23            │
│ K-nearest neighbors │ -0.21±0.21           │ 0.92±0.18            │
│ Decision Tree       │ -0.00118977±0.0      │ 1.0±0.0              │
└─────────────────────┴──────────────────────┴──────────────────────┘
```

I'm not wedded to the name `describe`. Maybe `summarize` is better, but I don't know of other examples of `summarize` in the ML ecosystem, and I was reluctant to add yet another method to the namespace. Thoughts or suggestions welcome.

cc @OkonSamuel @LucasMatSP
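For readers curious how a summary of this shape might be produced, here is a hypothetical sketch, not the PR's actual implementation: `FakeEvaluation` and `summarise` are made-up names, and per-fold scores are reduced to a `(mean, std)` pair rather than a `Measurement` as in the real output.

```julia
using Statistics

# Hypothetical stand-in for MLJ's evaluation object (not the real type).
struct FakeEvaluation
    tag::String
    measure_names::Vector{Symbol}
    per_fold::Vector{Vector{Float64}}  # one vector of per-fold scores per measure
end

# Reduce each measure's fold scores to (mean, std) and return a named tuple,
# mirroring the shape of the `describe` output shown above.
function summarise(e::FakeEvaluation)
    measure_pairs = map(zip(e.measure_names, e.per_fold)) do (name, scores)
        name => (mean = mean(scores), std = std(scores))
    end
    return (; tag = e.tag, measure_pairs...)
end

e = FakeEvaluation("Dummy", [:Accuracy], [[0.30, 0.35, 0.31]])
s = summarise(e)  # (tag = "Dummy", Accuracy = (mean = ..., std = ...))
```

The actual implementation would dispatch on MLJ's evaluation type and wrap the spread in a `Measurement`; the sketch only shows the reduce-then-splat-into-a-named-tuple idea.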