XMUDeepLIT/CSST-SSA

Dataset

For Common Voice, download from: https://commonvoice.mozilla.org/en/datasets

Some audio files in Common Voice are broken, so run validated_common_voice.py to keep only the valid ones. Before running it, set root_dir, language, and split in that Python file.
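The filtering idea can be sketched as follows. This is not the repository's validated_common_voice.py; it is a minimal illustration that assumes the soundfile package (pinned in the Installation section) can open the clips. The names probe_with_soundfile and split_readable are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable


def probe_with_soundfile(path: Path) -> None:
    """Raise if the clip cannot be opened or contains no audio frames."""
    import soundfile as sf  # imported lazily; assumed to be installed

    if sf.info(str(path)).frames <= 0:
        raise ValueError(f"empty audio file: {path}")


def split_readable(
    paths: Iterable[Path],
    probe: Callable[[Path], None] = probe_with_soundfile,
) -> tuple[list[Path], list[Path]]:
    """Return (readable, broken) lists, treating any probe error as broken."""
    readable, broken = [], []
    for path in paths:
        try:
            probe(path)
        except Exception:  # any failure is treated as a broken clip
            broken.append(path)
        else:
            readable.append(path)
    return readable, broken
```

Passing the probe as a parameter keeps the loop independent of the audio backend, so a different decoder can be swapped in if soundfile cannot read a given format.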

For NTUML2021, download from: https://huggingface.co/datasets/ky552/ML2021_ASR_ST

For Fisher, download from: https://catalog.ldc.upenn.edu/LDC2010S01

Installation

We recommend creating a Python 3.10 virtual environment with conda:

conda create --name csstllm python=3.10 -y
conda activate csstllm
cd xtuner
pip install -e '.[all]'
pip install -U openai-whisper
pip install evaluate
pip install sacrebleu
pip install jiwer==3.1.0
pip install peft==0.12.0
pip install torch==2.4.0
pip install torchvision==0.19.0
pip install datasets==2.21.0
pip install librosa==0.11.0 soundfile==0.13.0
pip install deepspeed==0.17.4
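After installation, it can help to confirm that the pinned versions are the ones actually installed, since a later pip command may silently change an earlier pin. The sketch below uses only the standard library; the helper name find_mismatches is hypothetical, and the pins are copied from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands above.
PINNED = {
    "jiwer": "3.1.0",
    "peft": "0.12.0",
    "torch": "2.4.0",
    "torchvision": "0.19.0",
    "datasets": "2.21.0",
    "librosa": "0.11.0",
    "soundfile": "0.13.0",
    "deepspeed": "0.17.4",
}


def find_mismatches(pinned=PINNED, get_version=version):
    """Return {package: (expected, installed)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore local build tags such as "+cu121" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling find_mismatches() inside the csstllm environment returns an empty dict when every pin is satisfied.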

Training

Taking NTUML2021 as an example:

NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage1_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage2_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage3_ntuml.py --deepspeed deepspeed_zero2
NPROC_PER_NODE=4 xtuner train workspace/9b_llama3_chat_stage4_ntuml.py --deepspeed deepspeed_zero2

Make sure to replace root_dir in each of these Python config files.
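The four commands can also be launched from one script that stops at the first failure. This is a sketch under assumptions: it presumes the stages are meant to run in numerical order, and the dataset parameter is hypothetical, since only the ntuml config names appear in this README.

```python
import os
import subprocess

STAGES = (1, 2, 3, 4)


def build_commands(dataset="ntuml", stages=STAGES):
    """Build one `xtuner train` command per stage, mirroring the lines above."""
    return [
        [
            "xtuner", "train",
            f"workspace/9b_llama3_chat_stage{stage}_{dataset}.py",
            "--deepspeed", "deepspeed_zero2",
        ]
        for stage in stages
    ]


def run_stages(commands, nproc_per_node=4, dry_run=False):
    """Run the commands in order; check=True stops at the first failure."""
    env = {**os.environ, "NPROC_PER_NODE": str(nproc_per_node)}
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)
```

Calling run_stages(build_commands(), dry_run=True) prints the commands without starting any training, which is a cheap way to check the config paths first.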

Evaluation

NPROC_PER_NODE=4 xtuner test workspace/9b_llama3_chat_stage4_ntuml.py --checkpoint work_dir/9b_llama3_chat_stage4_ntuml/epoch_1.pth/mp_rank_00_model_states.pt
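The --checkpoint argument follows a work_dir/<config name>/epoch_<n>.pth/mp_rank_00_model_states.pt pattern. The hypothetical helper below composes that path from a config file name; it assumes other epochs and configs follow the same naming as the single example above, which this README does not confirm.

```python
from pathlib import Path


def checkpoint_path(config_file, epoch=1, work_dir="work_dir"):
    """Compose the checkpoint path using the pattern of the command above."""
    run_name = Path(config_file).stem  # e.g. 9b_llama3_chat_stage4_ntuml
    return (
        Path(work_dir) / run_name / f"epoch_{epoch}.pth"
        / "mp_rank_00_model_states.pt"
    )
```

Checking checkpoint_path(...).exists() before launching xtuner test gives an early error if training wrote its output somewhere else.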

Acknowledgement

XTuner: the codebase we built upon. We greatly appreciate the excellent foundation provided by the authors.

You can refer to XTuner for more detailed information.

Citation

The extended version with appendices is available on arXiv.

If you find this repository helpful for your research, please consider citing:

@article{gao2025towards,
      title={Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment}, 
      author={Gao, Yan and Yang, Yazheng and Lan, Zhibin and Chen, Yidong and Zhang, Min and Wei, Daimeng and Wong, Derek F and Su, Jinsong},
      journal={arXiv preprint arXiv:2511.10670},
      year={2025}
}

About

Code for "Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment" (IJCAI 2026 Main Track)
