feat: support average='binary' in precision_score() #2080
bigframes/ml/metrics/_metrics.py
Outdated
    if y_true.drop_duplicates().count() != 2 or y_pred.drop_duplicates().count() != 2:
        raise ValueError(
            "Target is multiclass but average='binary'. Please choose another average setting."
        )

    total_labels = set(
        y_true.drop_duplicates().to_list() + y_pred.drop_duplicates().to_list()
    )
Should probably avoid drop_duplicates: it has overhead from trying to preserve ordering. Try unique(keep_order=False) instead. Also, try to minimize the query count.
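To illustrate the point about ordering, here is a minimal sketch in plain Python, not bigframes code. It assumes the binary check only needs the set of distinct labels, so no ordering has to be preserved, and it gathers the labels of both inputs in one traversal. The helper names `distinct_labels` and `check_binary` are hypothetical, and plain iterables stand in for the Series.

```python
from typing import Hashable, Iterable, Set, Tuple


def distinct_labels(
    y_true: Iterable[Hashable], y_pred: Iterable[Hashable]
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Collect the distinct labels of both inputs in a single traversal.

    Assumes both inputs have the same length (zip stops at the shorter one).
    """
    true_labels: Set[Hashable] = set()
    pred_labels: Set[Hashable] = set()
    for t, p in zip(y_true, y_pred):
        # Sets are unordered, which is all the binary check requires.
        true_labels.add(t)
        pred_labels.add(p)
    return true_labels, pred_labels


def check_binary(
    y_true: Iterable[Hashable], y_pred: Iterable[Hashable]
) -> Set[Hashable]:
    """Mirror the quoted check, then return the combined label set."""
    true_labels, pred_labels = distinct_labels(y_true, y_pred)
    if len(true_labels) != 2 or len(pred_labels) != 2:
        raise ValueError(
            "Target is multiclass but average='binary'. "
            "Please choose another average setting."
        )
    return true_labels | pred_labels
```

In bigframes the distinct step would presumably be the unique(keep_order=False) call suggested above; the sketch only shows that the result can be order-insensitive and reused.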
Code updated. This is the execution output: https://screenshot.googleplex.com/9aFGAUSHzuPDPtB.
It's weird that no query job links are provided.
bigframes/ml/metrics/_metrics.py
Outdated
def _precision_score_binary_pos_only(
    y_true: bpd.Series, y_pred: bpd.Series, pos_label: int | float | bool | str
) -> float:
    if y_true.drop_duplicates().count() != 2 or y_pred.drop_duplicates().count() != 2:
This may create extra queries together with the y_true.drop_duplicates().to_list() call on line 340. We may want to merge them.
Can you take a look at how many queries are created when running this function?
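As a rough way to reason about the query count, here is a toy model in plain Python. It assumes every distinct operation costs one query, which may not match how bigframes actually plans or caches work. `CountingSeries`, `labels_unmerged`, `labels_merged`, and `count_distinct_calls` are hypothetical names for illustration, not bigframes APIs.

```python
from typing import Callable, Hashable, Iterable, Set


class CountingSeries:
    """Hypothetical stand-in for a remote Series; counts distinct() calls."""

    def __init__(self, values: Iterable[Hashable]) -> None:
        self._values = list(values)
        self.distinct_calls = 0

    def distinct(self) -> Set[Hashable]:
        # Toy assumption: each call here corresponds to one query.
        self.distinct_calls += 1
        return set(self._values)


def labels_unmerged(y_true: CountingSeries, y_pred: CountingSeries) -> Set[Hashable]:
    """Shape of the quoted code: distinct is recomputed for the union."""
    if len(y_true.distinct()) != 2 or len(y_pred.distinct()) != 2:
        raise ValueError("Target is multiclass but average='binary'.")
    return y_true.distinct() | y_pred.distinct()


def labels_merged(y_true: CountingSeries, y_pred: CountingSeries) -> Set[Hashable]:
    """Merged variant: each distinct result is computed once and reused."""
    true_labels = y_true.distinct()
    pred_labels = y_pred.distinct()
    if len(true_labels) != 2 or len(pred_labels) != 2:
        raise ValueError("Target is multiclass but average='binary'.")
    return true_labels | pred_labels


def count_distinct_calls(
    fn: Callable[[CountingSeries, CountingSeries], Set[Hashable]]
) -> int:
    """Run fn on a small binary example and report total distinct() calls."""
    y_true = CountingSeries([0, 1, 1, 0])
    y_pred = CountingSeries([1, 1, 0, 0])
    fn(y_true, y_pred)
    return y_true.distinct_calls + y_pred.distinct_calls
```

Under this toy model the unmerged shape issues four distinct operations on valid binary input and the merged shape issues two; the real numbers would have to come from the actual query job output.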
This is the result: https://screenshot.googleplex.com/9aFGAUSHzuPDPtB. It feels weird because no query jobs are printed out.
Fixes #437198383 🦕