Support packing algorithm to improve training speed #1

JasonCZH4 · 2026-02-07T14:50:56Z

No description provided.

Removed commented-out SFTDataset class and its methods.

wxhcore · 2026-02-08T10:24:57Z

@JasonCZH4 Thank you very much for submitting a PR to this project! I think adding the sequence packing feature is a great optimization that can significantly improve GPU utilization and training efficiency. After reviewing the newly added code, I have the following suggestions:

1. Several utility functions have been added in dataset.py; could we move these utility functions into a new file for unified management?
2. I notice that only SFTDataset has been modified. Considering potential future support for other dataset types, I suggest using inheritance or the strategy pattern for extensibility. Alternatively, if you have a better approach, you could also refer to the implementation in supervised.py from LLaMA-Factory.
3. Actually, your current code doesn’t work correctly: in base_trainer.py, the loss computation explicitly passes input_ids and attention_mask, but your code doesn’t pass position_ids to the model, so it may need to be changed to pass the batch using **batch.
4. I see you’ve added two new parameters, so the corresponding YAML config and argparse setup likely need to be updated to ensure the parameters are passed correctly.
5. Since I haven’t integrated CI on GitHub yet, I’ll need you to run the existing tests and add unit or functional tests for the new feature to ensure everything works as expected.
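
To make the position_ids point concrete, here is a minimal sketch. It is not the code from this PR or from base_trainer.py. It assumes a greedy first-fit-decreasing packer that concatenates tokenized samples up to a hypothetical `max_len` and restarts `position_ids` at 0 for each original sample. The names `pack_sequences`, `max_len`, and `model_forward` are made up for illustration, and `model_forward` is only a stand-in for the real model call.

```python
from typing import Dict, List


def pack_sequences(samples: List[List[int]], max_len: int) -> List[Dict[str, List[int]]]:
    """Greedy first-fit-decreasing packing (illustrative, not the PR's code)."""
    bins: List[List[List[int]]] = []  # each bin holds the samples packed together
    for sample in sorted(samples, key=len, reverse=True):
        sample = sample[:max_len]  # assumption: oversize samples are truncated
        for b in bins:
            # place the sample in the first bin that still has room
            if sum(len(s) for s in b) + len(sample) <= max_len:
                b.append(sample)
                break
        else:
            bins.append([sample])  # no bin had room, so open a new one

    packed = []
    for b in bins:
        input_ids = [tok for s in b for tok in s]
        # position_ids restart at 0 at every original sample boundary
        position_ids = [i for s in b for i in range(len(s))]
        packed.append({
            "input_ids": input_ids,
            "attention_mask": [1] * len(input_ids),
            "position_ids": position_ids,
        })
    return packed


def model_forward(input_ids, attention_mask, position_ids=None):
    """Hypothetical stand-in for a model call; reports whether position_ids arrived."""
    return position_ids is not None


batch = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_len=5)[0]

# Passing keys explicitly silently drops position_ids ...
explicit = model_forward(input_ids=batch["input_ids"],
                         attention_mask=batch["attention_mask"])
# ... while unpacking the whole batch forwards every key.
unpacked = model_forward(**batch)
```

In this sketch the explicit call never sees `position_ids`, whereas `model_forward(**batch)` does. The all-ones `attention_mask` here does not by itself stop tokens from attending across packed samples; how that is handled depends on the attention implementation and is outside this sketch.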

Finally, thank you again for contributing to this project!

JasonCZH4 · 2026-02-08T10:31:54Z

Thanks for reviewing! This is my first time submitting a PR. The current code still has many issues, and I am working to fix them. I am currently running many tests and will let you know once everything is OK. Thanks again!

wxhcore · 2026-02-08T10:38:01Z

Thank you very much for your efforts! This is also BumbleCore's first PR, so I attach great importance to it. Wish you all the best!

JasonCZH4 and others added 2 commits February 7, 2026 14:47

Support packing algorithm to improve training speed

6f54f60

Remove commented-out SFTDataset class

5ea95d3
