How was the behavior cloning data generated?

Hi,

I am able to run the GRPO training from Train/verifier/script.py.
According to the WebAgentR1 paper, the GRPO training is done on the data generated by behavior cloning.

Could you please point me to where in the code the behavior cloning data is generated, and explain how it is produced?

Thanks,
Subhojit