Init GLM-4-Voice into utterance_metrics by Stanwang1210 · Pull Request #46 · wavlab-speech/versa

Stanwang1210 · 2025-06-24T09:13:06Z

🚀 This PR integrates GLM-4-Voice into the evaluation pipeline

The implementation follows the structure used in qwen2_audio.py, ensuring consistency with other audio model integrations.

✅ Key Features
• Preprocessing and tokenization of audio and text inputs using GLM-4-Voice’s tokenizer
• Maintains compatibility with the existing evaluation framework

⚠️ Known Issues
• Tokenization Pipeline: The current audio tokenization step may need further verification for correctness and completeness.
• Unexpected Output: The model is currently generating outputs like:
“I’m sorry, I can’t listen to or analyze audio. However, if you can transcribe the audio for me, I can help you count the number of distinct speakers. Let me know if you need assistance with that!”

It is unclear whether this is caused by:
• A bug in the integration (e.g., incorrect prompt formatting or audio tokenization), or
• A limitation of the model’s current capabilities
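
To help tell these two causes apart, it may be useful to measure how often the model falls back to a text-only refusal across a batch of utterances. The snippet below is a minimal sketch, not part of this PR: the function names and the phrase list are hypothetical and would need tuning against real outputs. If nearly every response is flagged, the audio tokens are probably not reaching the model (pointing to a formatting or tokenization bug); if only some are flagged, a model limitation is more plausible.

```python
import re
from typing import Iterable

# Hypothetical phrase list (an assumption, not taken from the PR code);
# it is modelled on the refusal quoted in the Known Issues above.
REFUSAL_PATTERNS = [
    r"can(?:'|’)?t (?:listen|hear|analy[sz]e|process)",
    r"cannot (?:listen|hear|analy[sz]e|process)",
    r"unable to (?:listen|hear|analy[sz]e|process)",
    r"transcribe the audio",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def looks_like_audio_refusal(response: str) -> bool:
    """Return True if the response matches any refusal-style phrase."""
    return bool(_REFUSAL_RE.search(response))


def refusal_rate(responses: Iterable[str]) -> float:
    """Fraction of responses flagged as refusals (0.0 for an empty batch)."""
    responses = list(responses)
    if not responses:
        return 0.0
    flagged = sum(looks_like_audio_refusal(r) for r in responses)
    return flagged / len(responses)
```

This is only a string heuristic, so it will miss refusals phrased differently and should be treated as a triage aid rather than a correctness check.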

init glm 4 voice

e12bb6a

Stanwang1210 force-pushed the glm_4_voice branch from a7bb4b8 to e12bb6a on November 25, 2025 23:56

add dependency instruction

cdd5b9d

Stanwang1210 wants to merge 2 commits into wavlab-speech:main from Stanwang1210:glm_4_voice
