
support vLLM >=0.11.0 (V1 engine) for better performance #1640

Merged: aluminumbox merged 2 commits into FunAudioLLM:main from Jzz1943:main on Dec 31, 2025.


Jzz1943 (Contributor) commented on Nov 10, 2025:

Support running CosyVoice2 inference with vLLM 0.11.0 (V1 engine only) for better performance.
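The PR text says the change targets vLLM >=0.11.0 with the V1 engine, and the downstream commit message below says it remains backward compatible with vLLM 0.9.0, but neither shows how the two code paths are selected. As a hedged illustration only, a loader that supports both could gate on the installed vLLM version roughly like this. `parse_version`, `select_engine`, the `"v1"`/`"v0"` labels, and treating every 0.9.x release as V0-compatible are hypothetical assumptions, not taken from CosyVoice's code.

```python
import re
from typing import Tuple

# Hypothetical thresholds derived from the PR text, not from CosyVoice's code:
# V1 path for vLLM >= 0.11.0, legacy V0 path assumed for the 0.9.x series.
V1_MIN = (0, 11, 0)
V0_SERIES = (0, 9)


def parse_version(v: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor[.patch]' of strings like '0.11.0' or '0.9.0.post1'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def select_engine(vllm_version: str) -> str:
    """Return a hypothetical engine label for the given vLLM version string."""
    ver = parse_version(vllm_version)
    if ver >= V1_MIN:
        return "v1"  # new V1-engine code path
    if ver[:2] == V0_SERIES:
        return "v0"  # assumed legacy V0-engine code path
    # Versions outside both ranges (e.g. 0.10.x) are treated as untested here.
    raise RuntimeError(f"vLLM {vllm_version} is not a supported version in this sketch")
```

In a real environment the version string could be read with the standard library's `importlib.metadata.version("vllm")` and passed to `select_engine`. Failing loudly on untested versions, rather than silently picking a path, is a design choice of this sketch.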
Under the same conditions, compared with vLLM 0.9.0 (V0 engine), inference with vLLM 0.11.0 (V1 engine) reduces first-chunk latency by roughly 15 ms or more. The first-chunk latency is also more stable, with much smaller fluctuations than with the V0 engine.
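The comparison above concerns first-chunk latency (time from issuing a request until the first audio chunk arrives) and its run-to-run spread. The benchmark used for the comparison is not included in the PR text, so the following is only a hypothetical, framework-agnostic way to measure both, assuming the inference call returns a Python iterator of audio chunks. `first_chunk_latency`, `summarize`, and `fake_tts_stream` are illustrative names; `fake_tts_stream` merely simulates a streaming call and is not a CosyVoice API.

```python
import statistics
import time
from typing import Dict, Iterable, Iterator, List, Tuple


def first_chunk_latency(stream: Iterable[bytes]) -> Tuple[float, int]:
    """Consume a chunk stream; return (seconds until first chunk, chunk count)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _chunk in stream:
        if first is None:
            # Latency is taken at the moment the first chunk is yielded.
            first = time.perf_counter() - start
        count += 1
    if first is None:
        raise ValueError("stream produced no chunks")
    return first, count


def summarize(latencies_s: List[float]) -> Dict[str, float]:
    """Mean, population standard deviation, and worst case, in milliseconds."""
    return {
        "mean_ms": statistics.mean(latencies_s) * 1000,
        "stdev_ms": statistics.pstdev(latencies_s) * 1000,
        "max_ms": max(latencies_s) * 1000,
    }


def fake_tts_stream(first_delay_s: float, n_chunks: int, gap_s: float) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call (not a CosyVoice API)."""
    time.sleep(first_delay_s)  # simulated time to first chunk
    for i in range(n_chunks):
        if i > 0:
            time.sleep(gap_s)  # simulated delay between later chunks
        yield b"\x00" * 1024  # placeholder audio bytes


if __name__ == "__main__":
    runs = [first_chunk_latency(fake_tts_stream(0.05, 4, 0.01))[0] for _ in range(5)]
    print(summarize(runs))
```

The population standard deviation is used here as a simple spread measure; percentiles over many runs (for example p95) would be a reasonable alternative for judging a "smaller fluctuations" claim.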

Jzz1943 changed the title from "support vLLM >=0.11.0 (V1 engine only)" to "support vLLM >=0.11.0 (V1 engine) for better performance" on Nov 13, 2025.
ayutaz pushed a commit to ayutaz/CosyVoice that referenced this pull request on Dec 10, 2025:
Upstream improvements from FunAudioLLM/CosyVoice:

- PR FunAudioLLM#1640: Support vLLM 0.11.0+ (V1 engine) for better performance
  - First-chunk latency reduced by ~15ms
  - More stable latency with smaller fluctuations
  - Backward compatible with vLLM 0.9.0

- PR FunAudioLLM#1129: Add limited support for MPS devices (Apple Silicon)
  - Enables partial compatibility with M1/M2/M3/M4 Macs
  - Auto-enables JIT on MPS for better performance
  - ONNX models fall back to CPU (ONNX Runtime limitation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
aluminumbox (Collaborator) commented:

thanks for the update, will merge it soon

aluminumbox merged commit 8811772 into FunAudioLLM:main on Dec 31, 2025.
oneliey pushed a commit to oneliey/CosyVoice that referenced this pull request on Jan 11, 2026, with a commit message identical to that of ayutaz's Dec 10, 2025 commit.