Hi,
I am able to run the GRPO training from Train/verifier/script.py.
According to the WebAgentR1 paper, the GRPO training is done on the data generated by behavior cloning.
Could you please point me to where in the code the behavior cloning data is generated, and how?
Thanks,
Subhojit