Conversation

@JasonCZH4

No description provided.

JasonCZH4 and others added 2 commits February 7, 2026 14:47
Removed commented-out SFTDataset class and its methods.
@wxhcore
Owner

wxhcore commented Feb 8, 2026

@JasonCZH4 Thank you very much for submitting a PR to this project! I think adding the sequence packing feature is a great optimization that can significantly improve GPU utilization and training efficiency. After reviewing the newly added code, I have the following suggestions:

  1. Several utility functions have been added in dataset.py; could we move these utility functions into a new file for unified management?

  2. I notice that only SFTDataset has been modified. To leave room for other dataset types in the future, I suggest using inheritance or the strategy pattern for extensibility. If you have a better approach, that works too; the implementation in supervised.py from LLaMA-Factory may also be a useful reference.

  3. The current code doesn’t work correctly yet: in base_trainer.py, the loss computation explicitly passes only input_ids and attention_mask, so position_ids is never passed to the model. It may need to be changed to pass the whole batch using **batch.

  4. I see you’ve added two new parameters, so the corresponding YAML config and argparse setup likely need to be updated to ensure the parameters are passed correctly.

  5. Since I haven’t integrated CI on GitHub yet, I’ll need you to run the existing tests and add unit or functional tests for the new feature to ensure everything works as expected.
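
To illustrate point 3, here is a minimal sketch in plain Python (lists instead of tensors). The names `pack_sequences` and `forward_stub` are hypothetical and do not come from this PR or from base_trainer.py; the sketch only assumes that a packed batch carries a `position_ids` key that restarts at 0 for each sample, and shows how that key is lost when only two arguments are passed explicitly.

```python
from typing import Dict, List


def pack_sequences(sequences: List[List[int]]) -> Dict[str, List[int]]:
    """Concatenate tokenized samples into one packed row (hypothetical helper).

    position_ids restart at 0 for every sample, so each sample keeps the
    positions it would have had if it were trained on its own.
    """
    input_ids: List[int] = []
    attention_mask: List[int] = []
    position_ids: List[int] = []
    for seq in sequences:
        input_ids.extend(seq)
        attention_mask.extend([1] * len(seq))
        position_ids.extend(range(len(seq)))
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "position_ids": position_ids,
    }


def forward_stub(input_ids, attention_mask, position_ids=None, **unused):
    """Stand-in for a model forward pass; reports whether position_ids arrived."""
    return {"got_position_ids": position_ids is not None}


batch = pack_sequences([[11, 12, 13], [21, 22]])

# Passing only the two explicit arguments silently drops position_ids.
explicit = forward_stub(batch["input_ids"], batch["attention_mask"])

# Unpacking the whole batch forwards every key the dataset produced.
unpacked = forward_stub(**batch)
```

In a real trainer the batch would hold tensors and the model call would replace `forward_stub`, but the difference between the two call styles is the same.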

Finally, thank you again for contributing to this project!

@JasonCZH4
Author

Thanks for reviewing! This is my first time submitting a PR. There are still many issues with the current code, and I am working hard to fix them. I am currently running many tests, and I will let you know once everything is OK. Thanks again!

@wxhcore
Owner

wxhcore commented Feb 8, 2026

> Thanks for reviewing! It is my first time to submit PR. There are still many issues with the current code, and I am working hard to fix them in the coming time. In fact, I am currently conducting many tests. Once everything is OK, I will let you know. Thanks again!

Thank you very much for your efforts! This is also BumbleCore's first PR, so I attach great importance to it. Wish you all the best!
