How has the behavior cloning data been generated? #14

@subhojitdas

Hi,

I am able to run the GRPO training from Train/verifier/script.py.
According to the WebAgentR1 paper, the GRPO training is done on the data generated by behavior cloning.

Could you please point me to where in the code the behavior cloning data is generated, and explain how it is generated?

Thanks,
Subhojit
