Dataset

For Common Voice, download from: https://commonvoice.mozilla.org/en/datasets

Some audio files in Common Voice are broken, so use validated_common_voice.py to keep only the valid ones. Before running it, set root_dir, language, and split in that Python file.
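
The filtering idea can be sketched as follows. This is a minimal illustration, not the contents of validated_common_voice.py: the function name `filter_readable` and its `loader` parameter are hypothetical, and it assumes a clip counts as broken when loading it raises an exception or yields zero samples.

```python
from typing import Callable, Iterable, List, Sequence


def filter_readable(paths: Iterable[str],
                    loader: Callable[[str], Sequence]) -> List[str]:
    """Return the paths whose audio loads without error and is non-empty.

    `loader` is any function that maps a file path to a sequence of samples
    (hypothetical interface, chosen only for this sketch).
    """
    kept = []
    for path in paths:
        try:
            samples = loader(path)
        except Exception:
            # Treat any decoding failure as a broken clip and skip it.
            continue
        if len(samples) > 0:
            kept.append(path)
    return kept
```

With librosa installed, a loader such as `lambda p: librosa.load(p, sr=None)[0]` could be passed in, since `librosa.load` returns the samples and the sampling rate as a pair.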

For NTUML2021, download from: https://huggingface.co/datasets/ky552/ML2021_ASR_ST

For Fisher, download from: https://catalog.ldc.upenn.edu/LDC2010S01

Installation

We recommend creating a Python 3.10 virtual environment with conda:

conda create --name csstllm python=3.10 -y
conda activate csstllm
cd xtuner
pip install -e '.[all]'
pip install -U openai-whisper
pip install evaluate
pip install sacrebleu
pip install jiwer==3.1.0
pip install peft==0.12.0
pip install torch==2.4.0
pip install torchvision==0.19.0
pip install datasets==2.21.0
pip install librosa==0.11.0 soundfile==0.13.0
pip install deepspeed==0.17.4
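
Because several packages are pinned, it can help to confirm that the environment matches before training. The following is a sketch only: the helper names `installed_version` and `find_mismatches` are hypothetical, the pins are copied from the commands above, and it assumes that a local build tag such as `+cu121` on the installed version should be ignored when comparing.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Pinned versions copied from the install commands above.
PINS = {
    "jiwer": "3.1.0",
    "peft": "0.12.0",
    "torch": "2.4.0",
    "torchvision": "0.19.0",
    "datasets": "2.21.0",
    "librosa": "0.11.0",
    "soundfile": "0.13.0",
    "deepspeed": "0.17.4",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose version differs from its pin to what was found."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # Strip a local build tag (e.g. "2.4.0+cu121") before comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = found
    return mismatches
```

Calling `find_mismatches(PINS)` inside the activated environment returns an empty dict when every pin is satisfied; a value of `None` means the package is not installed.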

Training

Taking NTUML2021 as an example:

NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage1_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage2_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage3_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage4_ntuml.py --deepspeed deepspeed_zero2

Make sure to replace root_dir in each of the Python config files used above.
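
The four training commands can also be launched from one script. This is a sketch under assumptions: the helper names `stage_commands` and `run_stages` are hypothetical, and it assumes that the stages are meant to run in order and that a failed stage should stop the run, which the stage numbering suggests but the text does not state.

```python
import os
import subprocess
from typing import List


def stage_commands(stages: int = 4) -> List[List[str]]:
    """Build the `xtuner train` command for each NTUML2021 stage config."""
    return [
        ["xtuner", "train",
         f"workspace/9b_llama3_chat_stage{i}_ntuml.py",
         "--deepspeed", "deepspeed_zero2"]
        for i in range(1, stages + 1)
    ]


def run_stages(nproc_per_node: int = 4) -> None:
    """Run every stage in order, passing NPROC_PER_NODE via the environment."""
    env = dict(os.environ, NPROC_PER_NODE=str(nproc_per_node))
    for cmd in stage_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later stages are not started after a failure.
        subprocess.run(cmd, env=env, check=True)
```

Run `run_stages()` from the repository root so that the relative `workspace/` paths resolve.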

Evaluation

NPROC_PER_NODE=4 xtuner test workspace/9b_llama3_chat_stage4_ntuml.py --checkpoint work_dir/9b_llama3_chat_stage4_ntuml/epoch_1.pth/mp_rank_00_model_states.pt
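
The checkpoint argument above follows a directory layout that can be assembled programmatically. The helper below is hypothetical and only mirrors the path shown in the command; it assumes the same layout holds for other config names and epochs, which the text does not confirm.

```python
from pathlib import Path


def checkpoint_path(config_name: str, epoch: int,
                    work_dir: str = "work_dir") -> Path:
    """Compose the model-states path in the layout used by the command above."""
    return (Path(work_dir) / config_name
            / f"epoch_{epoch}.pth" / "mp_rank_00_model_states.pt")


ckpt = checkpoint_path("9b_llama3_chat_stage4_ntuml", epoch=1)
if not ckpt.exists():
    # Warn early instead of letting the evaluation job fail at load time.
    print(f"checkpoint not found: {ckpt}")
```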

Acknowledgement

XTuner: the codebase we built upon. We greatly appreciate the excellent foundation provided by the authors.

You can refer to XTuner for more detailed information.

Citation

The extended version with appendices is available on arXiv.

If you find this repository helpful for your research, please consider citing:

@article{gao2025towards,
      title={Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment}, 
      author={Gao, Yan and Yang, Yazheng and Lan, Zhibin and Chen, Yidong and Zhang, Min and Wei, Daimeng and Wong, Derek F and Su, Jinsong},
      journal={arXiv preprint arXiv:2511.10670},
      year={2025}
}
