Skip to content

Init GLM-4-Voice into utterance_metrics #46

Open
Stanwang1210 wants to merge 2 commits intowavlab-speech:mainfrom
Stanwang1210:glm_4_voice
Open

Init GLM-4-Voice into utterance_metrics #46
Stanwang1210 wants to merge 2 commits intowavlab-speech:mainfrom
Stanwang1210:glm_4_voice

Conversation

@Stanwang1210
Copy link

🚀 This PR integrates GLM-4-Voice into the evaluation pipeline

The implementation follows the structure used in qwen2_audio.py, ensuring consistency with other audio model integrations.

✅ Key Features
• Preprocessing and tokenization of audio and text inputs using GLM-4-Voice’s tokenizer
• Maintains compatibility with the existing evaluation framework

⚠️ Known Issues
• Tokenization Pipeline: The current audio tokenization step may need further verification for correctness and completeness.
• Unexpected Output: The model is currently generating outputs like:
“I’m sorry, I can’t listen to or analyze audio. However, if you can transcribe the audio for me, I can help you count the number of distinct speakers. Let me know if you need assistance with that!”

It’s unclear whether this is caused by:
• A bug in the integration (e.g., incorrect formatting or tokenization)
• Or a limitation of the model’s current capabilities

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant