Hi, thank you for the great work and code release.
I noticed that in your code, the agent is pretrained on the walker-walk task for 10 million steps. However, judging by the evaluation rewards, the agent appears to converge long before the 10M-step mark.
So I tried fine-tuning after pretraining for only 500k steps (roughly as in the sketch below), but the downstream performance was significantly worse.
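To be concrete, this is a minimal sketch of the setup I mean; the `Agent` class, checkpoint path, and dimensions are placeholders I made up for illustration, not your actual classes or files:

```python
import torch
import torch.nn as nn

# Placeholder path to a checkpoint saved after 500k pretraining steps
# (hypothetical; your repo's checkpoint naming and format will differ).
PRETRAIN_CKPT = "checkpoints/walker_walk_500k.pt"

class Agent(nn.Module):
    """Stand-in for the repo's agent: a shared encoder plus an actor head."""
    def __init__(self, obs_dim: int = 24, action_dim: int = 6, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.actor = nn.Linear(hidden, action_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.actor(self.encoder(obs)))

# Load the 500k-step pretrained weights and fine-tune *all* parameters
# on the downstream task; nothing is frozen in my attempt.
agent = Agent()
agent.load_state_dict(torch.load(PRETRAIN_CKPT, map_location="cpu"))
optimizer = torch.optim.Adam(agent.parameters(), lr=1e-4)
```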
Is there a reason why fine-tuning works better after the full 10M-step pretraining?
Is it feasible to use a model pretrained for fewer steps (e.g., 1M or 2M) for fine-tuning without a significant drop in downstream performance? Or is the full pretraining necessary for good transferability?
Also, would it be possible to get access to the pretrained model weights?
Thanks in advance!
