Pull requests: vllm-project/vllm
[KV Transfer] Add MooncakeStoreConnector for KV cache offloading via Mooncake distributed store
documentation
Improvements or additions to documentation
kv-connector
v1
#40900
opened Apr 26, 2026 by
LCAIZJ
Contributor
DeepSeek V4 support on SM12x with Triton sparse MLA fallback
ci/build
deepseek
Related to DeepSeek models
documentation
Improvements or additions to documentation
frontend
gpt-oss
Related to GPT-OSS models
kv-connector
new-model
Requests to new models
nvidia
speculative-decoding
tool-calling
v1
#40899
opened Apr 26, 2026 by
jasl
Contributor
[Spec Decode] Add Sliding Window Attention support to DFlash drafter
qwen
Related to Qwen models
speculative-decoding
v1
#40898
opened Apr 26, 2026 by
jianc99
[Metrics] Export parallel config info
v1
#40895
opened Apr 26, 2026 by
voipmonitor
Contributor
•
Draft
feat: integrate builtin_structural_tag to support more models' tool-calling.
deepseek
Related to DeepSeek models
needs-rebase
qwen
Related to Qwen models
tool-calling
#40894
opened Apr 26, 2026 by
Seven-Streams
3 of 5 tasks
[Bugfix] Size FlashInfer NVLink MNNVL workspace to EP group
bug
Something isn't working
deepseek
Related to DeepSeek models
kv-connector
ready
ONLY add when PR is ready to merge/full CI is needed
tool-calling
#40893
opened Apr 26, 2026 by
Dao007forever
[ROCm][DSv4] Make AITER sparse MLA decode cudagraph-clean (follow-up to #40889)
ci/build
deepseek
Related to DeepSeek models
documentation
Improvements or additions to documentation
gpt-oss
Related to GPT-OSS models
kv-connector
needs-rebase
new-model
Requests to new models
nvidia
performance
Performance-related issues
rocm
Related to AMD ROCm
speculative-decoding
tool-calling
v1
#40892
opened Apr 26, 2026 by
ChuanLi1101
Collaborator
[Core] Avoid using extra thread in UniProcExecutor
v1
#40891
opened Apr 26, 2026 by
njhill
Member
[ROCm] Add AITER-accelerated MLA decode for DeepSeek V4 on MI355X
ci/build
deepseek
Related to DeepSeek models
documentation
Improvements or additions to documentation
gpt-oss
Related to GPT-OSS models
kv-connector
needs-rebase
new-model
Requests to new models
nvidia
performance
Performance-related issues
rocm
Related to AMD ROCm
speculative-decoding
tool-calling
v1
#40889
opened Apr 25, 2026 by
ChuanLi1101
Collaborator
3 of 5 tasks
[Bugfix] Run FlashInfer autotuning before KV cache allocation
bug
Something isn't working
v1
#40887
opened Apr 25, 2026 by
bhoomit
Contributor
fix(gemma4): remap compressed-tensors AWQ MoE keys in _weight_iterator
#40886
opened Apr 25, 2026 by
tajwali
[Doc] Clarify Qwen3-Omni OpenAI transcription client and docs (#29405)
documentation
Improvements or additions to documentation
qwen
Related to Qwen models
#40884
opened Apr 25, 2026 by
happybhati
[vLLM IR] Fixes for Triton implementations
#40883
opened Apr 25, 2026 by
ProExpertProg
Collaborator
•
Draft
elastic_ep: stage/commit MoE prepare/finalize on reconfigure
#40881
opened Apr 25, 2026 by
itayalroy
Contributor
[V1][Scheduler] Use list-slice compare in _has_repeating_pattern
v1
#40879
opened Apr 25, 2026 by
aaronagent
3 tasks done
[CPU][Sampler] Drop redundant q.exponential_() in TopKTopPSampler.forward_cpu all-seeded branch
v1
#40878
opened Apr 25, 2026 by
aaronagent
3 tasks done
[V1][Spec Decode] Skip global RNG fill in rejection sampler when all drafted requests are seeded
v1
#40877
opened Apr 25, 2026 by
aaronagent
2 tasks done
[V1][Spec Decode] Avoid O(N*K) membership scan in NgramProposer.batch_propose
speculative-decoding
v1
#40876
opened Apr 25, 2026 by
aaronagent
3 tasks done
[New Model][ROCm] Add AMD support for DeepSeek V4
ci/build
deepseek
Related to DeepSeek models
documentation
Improvements or additions to documentation
gpt-oss
Related to GPT-OSS models
kv-connector
needs-rebase
new-model
Requests to new models
nvidia
performance
Performance-related issues
rocm
Related to AMD ROCm
speculative-decoding
tool-calling
v1
[vLLM IR] Propagate lowering annotations through source_fn_stack
#40870
opened Apr 25, 2026 by
Goomelo
Fix Cohere embed task prefix rendering
frontend
#40866
opened Apr 25, 2026 by
maxiaosong1124