Create a conda environment with Python 3.13:
conda create -n meco python=3.13
conda activate meco

Install torchtune as a submodule:
git submodule update --init --recursive
cd torchtune
pip install -e .
cd ../

Install required dependencies:
pip install -r requirement_list.txt

mkdir ckpt
cd ckpt
git clone https://huggingface.co/robinwitch/MECo_BEAT2_2_RVQVAE
git clone https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct
git clone https://huggingface.co/robinwitch/MECo_BEAT2_2_qwen2.5_0.5b_stage1
git clone https://huggingface.co/robinwitch/MECo_BEAT2_2_qwen2.5_0.5b_stage2
git clone https://huggingface.co/robinwitch/MECo_BEAT2_2_qwen2.5_0.5b_stage3
cd ../

Download the original BEAT2 dataset:
huggingface-cli download H-Liu1997/BEAT2 \
--repo-type dataset \
--local-dir ./dataset/BEAT2 \
--include "beat_english_v2.0.0/*"

Download the additional dataset preprocessing model from Google Drive:
gdown "https://drive.google.com/drive/folders/1MCks7CMNBtAzU2XihYezNmiGT_6pWex8?usp=drive_link" -O ./dataset/hub --folder

Download the tokenized BEAT2 dataset:
git clone https://huggingface.co/datasets/robinwitch/meco_mhubert1000_beat2_2 dataset/meco_mhubert1000_beat2_2

mkdir ckpt/mhubert_base_1000
wget -P ckpt/mhubert_base_1000 https://dl.fbaipublicfiles.com/hubert/mhubert_base_vp_en_es_fr_it3.pt
wget -P ckpt/mhubert_base_1000 https://dl.fbaipublicfiles.com/hubert/mhubert_base_vp_en_es_fr_it3_L11_km1000.bin

Clone fairseq repository:
mkdir refer
git clone https://github.com/facebookresearch/fairseq.git refer/fairseq

When using fairseq with Python 3.13, you may encounter a ValueError related to mutable defaults. To fix this issue:
- Locate your Python dataclasses file:
  /path/to/your/env/lib/python3.13/dataclasses.py
- Comment out lines 859-861:
# if f._field_type is _FIELD and f.default.__class__.__hash__ is None:
# raise ValueError(f'mutable default {type(f.default)} for field '
# f'{f.name} is not allowed: use default_factory')

Follow these steps to train the model:
python scripts_vqvae/0_get_mean_std.py
python scripts_vqvae/1_train_vqvae.py
python scripts/0_build_qwen2.5_0.5b.py
python scripts/1_detied_embedding.py
python scripts/2_build_dataset_mhubert1000.py

Execute the three training commands in scripts/3_train_llm.md sequentially to complete the three-stage training of the LLM.
python scripts/4_eval_gpt_decode_mhubert1000.py
python scripts/5_eval_gpt_decode_mhubert1000_withprompt.py

Following EMAGE, you can download the SMPLX Blender addon and install it in Blender 3.x or 4.x. Click the Add Animation button to visualize a generated smplx file (such as xxx.npz).
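Before importing a generated file into Blender, it can help to check what it contains. The following is a minimal sketch, assuming the file is a standard `.npz` archive as written by NumPy (a zip of `.npy` members); the helper name `describe_npz` is hypothetical and not part of this repository. It uses only the standard library and reports the shape of each stored array.

```python
import ast
import struct
import zipfile


def describe_npz(path):
    """Return {array_name: shape} for every .npy member of an .npz archive."""
    shapes = {}
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if not member.endswith(".npy"):
                continue
            with archive.open(member) as fh:
                if fh.read(6) != b"\x93NUMPY":
                    raise ValueError(f"{member} is not a valid .npy file")
                major, _minor = fh.read(2)
                # Format 1.x stores the header length in 2 bytes, later versions in 4.
                if major == 1:
                    (header_len,) = struct.unpack("<H", fh.read(2))
                else:
                    (header_len,) = struct.unpack("<I", fh.read(4))
                # The header is a Python dict literal with descr, fortran_order, shape.
                header = ast.literal_eval(fh.read(header_len).decode("latin1"))
            shapes[member[:-4]] = header["shape"]
    return shapes
```

Calling `describe_npz("xxx.npz")` returns a dictionary mapping each array name to its shape, which makes it easy to spot an empty or truncated result before opening Blender.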
Because GPT model inference is inherently diverse, evaluation results vary between runs:
- Stage 2 (Audio-only generation): FID scores typically range from 0.32 to 0.36
- Stage 3 (With motion example control): FID scores typically range from 0.27 to 0.29
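Since scores vary between runs, it may be useful to evaluate several times and compare against the ranges above. The sketch below is one way to do that; the function name `summarize_fid` and the stage keys are hypothetical, and the scores in the usage example are placeholders rather than measured results.

```python
from statistics import mean, stdev

# Typical FID ranges quoted above, keyed by a hypothetical stage label.
EXPECTED_FID = {"stage2": (0.32, 0.36), "stage3": (0.27, 0.29)}


def summarize_fid(stage, scores):
    """Summarize FID scores from repeated runs and flag values outside the typical range."""
    low, high = EXPECTED_FID[stage]
    return {
        "mean": mean(scores),
        # stdev needs at least two samples; report 0.0 for a single run.
        "stdev": stdev(scores) if len(scores) > 1 else 0.0,
        "outside_range": [s for s in scores if not low <= s <= high],
    }


# Placeholder numbers for illustration only, not measured results.
summary = summarize_fid("stage2", [0.33, 0.35, 0.40])
print(summary["outside_range"])
```

A run that lands outside the typical range is not necessarily a bug, but several in a row may point to a mismatch in checkpoints or data preprocessing.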
If you find this work useful, please consider citing:
@inproceedings{chen2025meco,
author = {Bohong Chen and Yumeng Li and Youyi Zheng and Yao-Xiang Ding and Kun Zhou},
title = {Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models},
year = {2025},
isbn = {9798400715402},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3721238.3730611},
doi = {10.1145/3721238.3730611},
booktitle = {Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
series = {SIGGRAPH Conference Papers '25}
}

Thanks to EMAGE, torchtune, ichigo, T2M-GPT, MoMask, and SynTalker; our code partially borrows from them. Please check out these useful repos.