[GRPO] Make dataloader deterministic #609
LGTM. can you verify that this makes the dataloader deterministic beyond the first epoch too? you can test it by truncating the dataset for example.
      f"Dataset epoch {self._epoch - 1} completed. Starting epoch {self._epoch}"
  )
- self._base_dataset.set_epoch(self._epoch)
+ self._base_dataset.set_epoch(self._epoch)  # for determinism
- self._base_dataset.set_epoch(self._epoch)  # for determinism
+ self._base_dataset.set_epoch(self._epoch)
  self._base_dataset = self._base_dataset.map(gsm8k_transform)
- self._base_dataset = self._base_dataset.shuffle()
+ self._base_dataset = self._base_dataset.shuffle(seed=self.seed)
+ self._base_dataset.set_epoch(self._epoch)  # for determinism
- self._base_dataset.set_epoch(self._epoch)  # for determinism
+ self._base_dataset.set_epoch(self._epoch)
i suggest we remove the "for determinism". It does a bit more than that: it reshuffles on every epoch, and the epoch number works as a seed
please just remove the comments, thanks for adding the seed on shuffle
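To illustrate the point above about the epoch number working as a seed, here is a minimal pure-Python sketch. `EpochShuffledDataset` and `orders` are hypothetical names made up for this example, and folding the epoch into the seed as `seed + epoch` is an assumption about how such a scheme could work, not a copy of the dataset library's implementation.

```python
import random


class EpochShuffledDataset:
    """Toy stand-in for a shuffled dataset whose order depends on (seed, epoch)."""

    def __init__(self, items, seed):
        self._items = list(items)
        self._seed = seed
        self._epoch = 0

    def set_epoch(self, epoch):
        # Changing the epoch changes the permutation used on the next pass.
        self._epoch = epoch

    def __iter__(self):
        # Assumption: the epoch is folded into the seed, so each epoch gets
        # its own reproducible permutation.
        rng = random.Random(self._seed + self._epoch)
        order = list(self._items)
        rng.shuffle(order)
        return iter(order)


def orders(seed, num_epochs=3):
    """Return the item order seen in each of the first num_epochs epochs."""
    ds = EpochShuffledDataset(range(8), seed=seed)
    result = []
    for epoch in range(num_epochs):
        ds.set_epoch(epoch)
        result.append(list(ds))
    return result


# Two runs with the same seed see the same order in every epoch.
assert orders(seed=0) == orders(seed=0)
```

With a fixed seed, the per-epoch reshuffle stays reproducible across runs, which is why both the `seed` argument and the `set_epoch` call matter here.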
This makes the dataloader deliver the data in the same order on every run.
Test Plan
Added a breakpoint and printed out the `prompt` in pdb. Before, `prompt` was different across runs; after this PR, `prompt` is the same across runs.
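The reviewer's request to check determinism beyond the first epoch could be approximated with a check like the one below. This is only a sketch: `sample_stream` is a hypothetical helper, the four-item list stands in for a truncated dataset, and reshuffling with `seed + epoch` at each epoch boundary is an assumption, not the PR's actual dataloader code.

```python
import random


def sample_stream(items, seed, num_samples):
    """Yield num_samples items, reshuffling with (seed + epoch) at each epoch boundary."""
    items = list(items)
    if not items:
        return
    epoch = 0
    produced = 0
    while produced < num_samples:
        order = list(items)
        random.Random(seed + epoch).shuffle(order)
        for item in order:
            if produced == num_samples:
                return
            yield item
            produced += 1
        epoch += 1  # dataset exhausted, start the next epoch


# A tiny "truncated" dataset so that several epochs go by quickly.
truncated = ["q0", "q1", "q2", "q3"]

# Ten samples from four items spans three epochs.
run_a = list(sample_stream(truncated, seed=42, num_samples=10))
run_b = list(sample_stream(truncated, seed=42, num_samples=10))

# Same seed, same order, including past the first epoch boundary.
assert run_a == run_b
```

Comparing two full sample streams, rather than only the first prompt, covers the epoch rollover that a single breakpoint would miss.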