
Question: Z-image-turbo GRPO (LoRA) training shows train/ratio_min=1 train/ratio_max=1 train/ratio_mean=1 train/ratio_std=0 #157

@hanbaobao950123


The configuration file is as follows:

default.yaml

[Figure 1: train/ratio_* curves] [Figure 2: train_reward curve]

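I have trained for 400+ steps, and the ratio in Figure 1 above has stayed at exactly 1 the whole time. Does this mean the policy is not being updated? Yet train_reward in Figure 2 is clearly rising, so the model does seem to be updating and improving. Why is that? When debugging, is it enough to monitor only the trend of train_reward?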
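For context, here is a minimal sketch of one situation in which both observations can hold at once. It is not the Z-image-turbo trainer: it assumes a toy 1-D Gaussian policy, a made-up reward, and a single gradient step per sampled batch, and every name in it (`log_prob`, `reward`, `train`) is hypothetical. Under that assumption the weights used to recompute the log-probabilities are the same weights that sampled the batch, so ratio = exp(logp_new - logp_old) = exp(0) = 1 when it is logged, while the gradient of ratio * advantage is still non-zero and the policy keeps moving. Whether this matches my config I cannot tell from the plots alone.

```python
import math
import random
import statistics


def log_prob(action, theta):
    # Log-density of a unit-variance Gaussian policy, up to a constant.
    return -0.5 * (action - theta) ** 2


def reward(action):
    # Hypothetical reward: highest when the action is close to 3.0.
    return -(action - 3.0) ** 2


def train(steps=200, group_size=16, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    history = []
    for _ in range(steps):
        theta_old = theta  # weights that sampled this batch
        actions = [rng.gauss(theta_old, 1.0) for _ in range(group_size)]
        rewards = [reward(a) for a in actions]

        # Group-relative advantages: normalize rewards within the group.
        mean_r = statistics.fmean(rewards)
        std_r = statistics.pstdev(rewards) + 1e-8
        advantages = [(r - mean_r) / std_r for r in rewards]

        # Assumption: one update per batch, so theta == theta_old here
        # and every ratio is exp(0) == 1.
        ratios = [
            math.exp(log_prob(a, theta) - log_prob(a, theta_old))
            for a in actions
        ]

        # d/dtheta [ratio * A] = ratio * A * (a - theta): non-zero at ratio 1.
        grads = [r * adv * (a - theta)
                 for r, adv, a in zip(ratios, advantages, actions)]
        theta += lr * statistics.fmean(grads)

        history.append({
            "ratio_min": min(ratios),
            "ratio_max": max(ratios),
            "ratio_mean": statistics.fmean(ratios),
            "ratio_std": statistics.pstdev(ratios),
            "reward_mean": mean_r,
        })
    return theta, history


if __name__ == "__main__":
    final_theta, hist = train()
    print("final theta:", round(final_theta, 3))
    print("first step:", hist[0])
    print("last step:", hist[-1])
```

In this toy setup the ratio statistics only move away from 1 if the same batch is reused for more than one gradient step, because only then do the current weights differ from the sampling weights. If that also applies to the real trainer, the clipping term would never be active and train_reward would be the more informative curve, but that is an inference from the sketch, not something confirmed for this repo.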
