Add --no_balance flag to not balance datasets #287
lauritowal left a comment
Ran it with
elk elicit gpt2 imdb --no_balance True --disable_cache --max_examples 100 100 --num_gpus 1 --max_inlp_iter 4
and it seems to work. I added some comments, though.
binarize: bool = False
"""Whether to binarize the dataset labels for multi-class datasets."""

no_balance: bool = False
Why not just make it
balance: bool = True?
That would also avoid having to write:
balance=not cfg.no_balance
Because it would be unclear how to use the flag to disable balancing from the command line. --balance False or something like it is weirder than --no_balance.
--balance False does not seem weirder than --no_balance True to me. But okay, it's fine with me.
Yeah, I think I agree with you now.
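The thread weighs a negative no_balance switch against a positive balance field. As a hedged illustration only, here is how each style looks with Python's standard `argparse`, including the `balance=not cfg.no_balance` inversion. This is not the parser elk actually uses (the run command above passes `--no_balance True` with an explicit value, which suggests elk's parser expects one), and the function names are hypothetical:

```python
import argparse


def parse_negative_flag(argv: list[str]) -> bool:
    """Style used in this PR: a no_balance switch that must be inverted."""
    parser = argparse.ArgumentParser()
    # store_true: the field is False unless --no_balance is present.
    parser.add_argument("--no_balance", action="store_true")
    args = parser.parse_args(argv)
    return not args.no_balance  # the `balance=not cfg.no_balance` step


def parse_positive_flag(argv: list[str]) -> bool:
    """Alternative from the review: a balance field that defaults to True."""
    parser = argparse.ArgumentParser()
    # BooleanOptionalAction (Python 3.9+) registers both --balance and
    # --no-balance, both writing to the same `balance` destination.
    parser.add_argument(
        "--balance", action=argparse.BooleanOptionalAction, default=True
    )
    return parser.parse_args(argv).balance


if __name__ == "__main__":
    print(parse_negative_flag(["--no_balance"]))  # False
    print(parse_positive_flag(["--no-balance"]))  # False
```

With `BooleanOptionalAction`, the positive field still gives users a readable way to turn balancing off, without the inversion step. Note that argparse spells the generated option with a hyphen (`--no-balance`), not an underscore.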

if max_iter is not None:
    d = min(d, max_iter)
max_iter = max_iter or d
That's just some refactoring unrelated to the balancing, I guess?
Right, I also added a max_iter flag, and this refactoring was necessary for it.
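Whether this is a behavior-preserving refactor depends on the edge cases. A minimal comparison, assuming both forms exist to produce an iteration bound from `d` (the surrounding code is not shown in this excerpt, and `bound_before`/`bound_after` are hypothetical names):

```python
from __future__ import annotations


def bound_before(d: int, max_iter: int | None) -> int:
    """Old form: cap d at max_iter only when max_iter is given."""
    if max_iter is not None:
        d = min(d, max_iter)
    return d


def bound_after(d: int, max_iter: int | None) -> int:
    """New form: fall back to d whenever max_iter is falsy (None or 0)."""
    return max_iter or d


if __name__ == "__main__":
    for max_iter in (None, 4, 1000, 0):
        print(max_iter, bound_before(768, max_iter), bound_after(768, max_iter))
```

The two agree for `None` and for values below `d`, but differ in two places: `max_iter=0` now means "use the default" rather than "zero iterations", and a `max_iter` larger than `d` is no longer clamped. If `d` is the feature dimension of an INLP loop, the missing clamp may matter, since at most `d` independent directions can be projected out.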

def train_supervised(
    data: dict[str, tuple], device: str, mode: str
    data: dict[str, tuple], device: str, mode: str, max_inlp_iter: int | None = None
That's a new feature unrelated to the balancing as well, right?
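For context on what a `max_inlp_iter` bound caps: INLP (iterative nullspace projection) repeatedly fits a linear probe and then projects the features onto the nullspace of the probe's direction. The toy sketch below is not elk's implementation. It swaps in a least-squares probe for simplicity, and `inlp_directions` is a hypothetical name; it only shows how an optional iteration cap interacts with the feature dimension `d`:

```python
from __future__ import annotations

import numpy as np


def inlp_directions(
    x: np.ndarray, y: np.ndarray, max_iter: int | None = None
) -> list[np.ndarray]:
    """Toy INLP: fit a linear probe, project its direction out, repeat.

    Uses a least-squares probe as a stand-in for whatever classifier the
    real code trains. max_iter caps the rounds (default: feature dim d).
    """
    d = x.shape[-1]
    n_iter = d if max_iter is None else min(d, max_iter)
    x = x - x.mean(axis=0)  # center features so no bias term is needed
    targets = y - y.mean()  # center labels for the same reason
    directions: list[np.ndarray] = []
    for _ in range(n_iter):
        # Minimum-norm solution, so w stays in the remaining row space of x.
        w, *_ = np.linalg.lstsq(x, targets, rcond=None)
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # nothing linearly predictive is left
            break
        u = w / norm
        directions.append(u)
        x = x - np.outer(x @ u, u)  # project onto the nullspace of u
    return directions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 5))
    labels = (feats[:, 0] > 0).astype(float)
    print(len(inlp_directions(feats, labels, max_iter=2)))
```

Because each probe is fit on already-projected features, the recovered directions are mutually orthogonal, and no more than `d` of them can exist, which is why the old code clamped the cap at `d`.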
No description provided.