Hi! Thanks for your attention and enthusiasm.

The author said: "(Note: dividing dt can make training unstable on some flagship models.)" However, more experiments may be needed to validate that.

From my experience, I think GRPO-Guard is overclaimed, so I'm sticking with the original GRPO loss, which works quite well. Maybe I'm wrong. We need more experiments to validate this, as you said.
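For readers who want the baseline spelled out, here is a minimal sketch of the clipped GRPO objective as it is commonly described: rewards are normalized within a group, then a PPO-style clipped surrogate is averaged over samples. This is an illustration only, not Flow-Factory's or flow_grpo's actual code. The function names, the `clip_range` default, and the toy numbers are assumptions, and the KL term, the dt division, and GRPO-Guard's adjustments are all left out.

```python
import math
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def grpo_clipped_loss(new_logps, old_logps, advantages, clip_range=0.2):
    """PPO-style clipped surrogate, averaged over samples (KL term omitted)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # importance ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Toy group of 4 samples (made-up rewards and log-probabilities).
rewards = [0.2, 0.5, 0.9, 0.4]
adv = group_advantages(rewards)
loss = grpo_clipped_loss([-1.0, -0.9, -1.2, -1.1], [-1.0] * 4, adv)
```

When the new and old log-probabilities are equal, every ratio is 1 and the loss reduces to the negative mean advantage, which is zero after group normalization.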

For the combination of flow-cps and GRPO-Guard, you could refer to yifan123/flow_grpo#192 (comment).

Thanks for pointing this out! I'll look into this.

Also, I strongly suggest making the reward computation more flexible than in flow-grpo. For example, UnifiedReward-Think is currently the most powerful open-source reward model I have tested; it gets better results than any other open-source model. Please check Pref-GRPO to see how well it performs. However, it is a pair-wise preference reward model, so it can only be computed after the whole epoch's sampling process has finished, whereas point-wise rewards can be computed after each very small mini-batch.
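One way to picture that flexibility is a small scheduler that scores point-wise rewards as each mini-batch arrives and defers group-wise rewards until sampling is complete. The sketch below is purely illustrative: `RewardScheduler`, its methods, and the toy reward functions are hypothetical names, not part of flow-grpo, Flow-Factory, or any reward model's API.

```python
from typing import Callable, Dict, List

# Both kinds map a list of samples to a list of scores; they differ in WHEN
# they can be called (per mini-batch vs. only on the complete group).
RewardFn = Callable[[List[str]], List[float]]

class RewardScheduler:
    """Hypothetical helper: eager point-wise rewards, deferred group-wise ones."""

    def __init__(self, pointwise: Dict[str, RewardFn],
                 groupwise: Dict[str, RewardFn], group_size: int):
        self.pointwise = pointwise
        self.groupwise = groupwise
        self.group_size = group_size
        self.samples: List[str] = []
        self.scores: Dict[str, List[float]] = {name: [] for name in pointwise}

    def add_minibatch(self, batch: List[str]) -> None:
        """Store samples and score them right away with point-wise rewards."""
        self.samples.extend(batch)
        for name, fn in self.pointwise.items():
            self.scores[name].extend(fn(batch))

    def finalize(self) -> Dict[str, List[float]]:
        """Score group-wise rewards once every sample of the group is in."""
        if len(self.samples) != self.group_size:
            raise ValueError("group is not complete yet")
        out = {name: list(vals) for name, vals in self.scores.items()}
        for name, fn in self.groupwise.items():
            out[name] = fn(self.samples)
        return out

# Toy rewards: string length (point-wise) and length relative to the
# group mean (needs the whole group).
def length_reward(batch: List[str]) -> List[float]:
    return [float(len(s)) for s in batch]

def relative_length_reward(group: List[str]) -> List[float]:
    avg = sum(len(s) for s in group) / len(group)
    return [len(s) - avg for s in group]

sched = RewardScheduler({"length": length_reward},
                        {"relative": relative_length_reward}, group_size=4)
sched.add_minibatch(["a", "bb"])      # "length" is already available here
sched.add_minibatch(["ccc", "dddd"])
all_rewards = sched.finalize()        # "relative" only becomes available now
```

Keeping the two kinds behind the same call signature means the trainer only has to decide when to call each reward, not how.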

I know Pref-GRPO and have run some experiments with it. My impression was that it's way too slow, since it requires pair-wise computation across the whole group, which results in quadratic time complexity. But anyway, thanks for pointing this out; I will try to make it compatible. As far as I know, there are other algorithms such as Branch-GRPO/Tree-GRPO, but I don't know how to add them yet. From my experience, the idea behind Flow-GRPO-Fast/Mix-GRPO provides the best performance and training speed, which is why I prioritize it.
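To make the quadratic cost concrete, here is a rough sketch of a pair-wise win-rate reward over one group that also counts the comparator calls. It assumes every unordered pair is compared exactly once; Pref-GRPO's actual procedure may differ. `pairwise_win_rates` and the `prefer` callback are hypothetical stand-ins for a preference reward model.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def pairwise_win_rates(samples: Sequence[T],
                       prefer: Callable[[T, T], bool]) -> Tuple[List[float], int]:
    """Return each sample's win rate and the number of comparator calls.

    prefer(a, b) is assumed to return True when a is preferred over b.
    """
    n = len(samples)
    wins = [0] * n
    calls = 0
    for i, j in combinations(range(n), 2):  # n * (n - 1) / 2 unordered pairs
        calls += 1
        if prefer(samples[i], samples[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    return [w / (n - 1) for w in wins], calls

# Toy example: the "samples" are numbers and larger is preferred.
rates, calls = pairwise_win_rates([3, 1, 2, 4], lambda a, b: a > b)
# A group of 4 needs 6 comparisons; 16 needs 120 and 32 needs 496.
```

A point-wise reward needs only one model call per sample, so the gap widens quickly as the group size grows.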

And in the long run, the following methods, which combine RL with distillation, could be added:
Decoupled-DMD: The Acceleration Magic Behind Z-Image
DMDR: Fusing DMD with Reinforcement Learning

Great! I can try to add them.

Anyway, you are all warmly welcome to raise issues or open PRs to contribute.🤗 Thanks again.❤️

PS: About the name Flow-Factory: it is indeed named after Llama-factory, which is a very good repo. Also, ff is quite easy to type and makes launching training easier, haha.😃
