For Common Voice, download from: https://commonvoice.mozilla.org/en/datasets
Some audio files in Common Voice are broken, so you can use validated_common_voice.py to filter them out and keep only the valid ones. Before running it, make sure to replace root_dir, language, and split in the Python file.
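The kind of filtering described above can be sketched roughly as follows. This is a minimal illustration, not the contents of validated_common_voice.py, which may work differently. It assumes the usual Common Voice layout (root_dir/language/split.tsv with a path column, and audio under root_dir/language/clips/). The function names filter_readable_clips and decodes_with_librosa are hypothetical, and a clip is treated as "broken" if librosa cannot decode it.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List


def decodes_with_librosa(path: Path) -> bool:
    """Return True if librosa can decode the whole file (assumed validity check)."""
    import librosa  # imported lazily so the TSV logic works without librosa

    try:
        librosa.load(str(path), sr=None)  # sr=None keeps the native sample rate
        return True
    except Exception:
        return False


def filter_readable_clips(
    root_dir: str,
    language: str,
    split: str,
    can_decode: Callable[[Path], bool] = decodes_with_librosa,
) -> List[Dict[str, str]]:
    """Return the rows of <split>.tsv whose audio file exists and decodes."""
    lang_dir = Path(root_dir) / language
    tsv_path = lang_dir / f"{split}.tsv"
    kept = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # Common Voice metadata is tab-separated; sentences may contain quotes.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            clip = lang_dir / "clips" / row["path"]
            if clip.is_file() and can_decode(clip):
                kept.append(row)
    return kept
```

For example, filter_readable_clips("/data/common_voice", "en", "test") would return only the rows of test.tsv whose clips load; the path and language shown here are placeholders. Passing a different can_decode callable lets you swap in another check without touching the TSV handling.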
For NTUML2021, download from: https://huggingface.co/datasets/ky552/ML2021_ASR_ST
For Fisher, download from: https://catalog.ldc.upenn.edu/LDC2010S01
It is recommended to create a Python 3.10 virtual environment using conda:
conda create --name csstllm python=3.10 -y
conda activate csstllm
cd xtuner
pip install -e '.[all]'
pip install -U openai-whisper
pip install evaluate
pip install sacrebleu
pip install jiwer==3.1.0
pip install peft==0.12.0
pip install torch==2.4.0
pip install torchvision==0.19.0
pip install datasets==2.21.0
pip install librosa==0.11.0 soundfile==0.13.0
pip install deepspeed==0.17.4
Taking NTUML2021 as an example:
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage1_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage2_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage3_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage4_ntuml.py --deepspeed deepspeed_zero2
Make sure to replace root_dir in the Python file.
NPROC_PER_NODE=4 xtuner test workspace/9b_llama3_chat_stage4_ntuml.py --checkpoint work_dir/9b_llama3_chat_stage4_ntuml/epoch_1.pth/mp_rank_00_model_states.pt
XTuner: the codebase we built upon. We greatly appreciate the excellent foundation provided by the authors.
You can refer to XTuner for more detailed information.
The extended version with appendices is available on arXiv.
If you find this repository helpful for your research, please consider citing:
@article{gao2025towards,
title={Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment},
author={Gao, Yan and Yang, Yazheng and Lan, Zhibin and Chen, Yidong and Zhang, Min and Wei, Daimeng and Wong, Derek F and Su, Jinsong},
journal={arXiv preprint arXiv:2511.10670},
year={2025}
}