- We are about to release an updated version of the paper that reports resource usage. Overall, the pre-training stage uses 8 machines with 8 GPU cards each (64 GPUs in total).
- Stage 1 trains only the VGM (WAN 2.2-5B); we have not open-sourced the training code for this part. We also have not open-sourced the training code for the VAE part of Stage 2. However, the training code for Motus Stage 2 and the subsequent downstream-task fine-tuning is the same: you just need to prepare latent action data and configure Motus/configs/lerobot.yaml.
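As a rough illustration of the kind of settings such a fine-tuning config might carry, here is a hypothetical sketch. None of these keys are confirmed by the source; only the file path Motus/configs/lerobot.yaml is real, and the field names and values below are placeholders you would replace with the actual schema in the repository.

```yaml
# Hypothetical sketch of Motus/configs/lerobot.yaml — field names are
# illustrative only; consult the real file in the repo for the actual schema.
dataset:
  # Path to your prepared latent action data (placeholder path).
  latent_action_path: /data/latent_actions
training:
  num_gpus: 64        # e.g. 8 machines x 8 GPUs, as in pre-training
  batch_size: 32      # placeholder value
  learning_rate: 1e-4 # placeholder value
```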
Originally posted by @HongzheBi in #9