Pull requests: alibaba/rtp-llm
Defer engine and RPC loop start until after full server init
#916
opened Apr 21, 2026 by
xinfei-shi
Collaborator
Support batch_prefill && TPS bench mode
#914
opened Apr 21, 2026 by
alibaba-miji
Collaborator
6 tasks done
fix: split CI timeout logic for PENDING and RUNNING states
#913
opened Apr 21, 2026 by
guoj14
Contributor
perf: optimize MoE model weight loading (8.6x speedup)
#908
opened Apr 17, 2026 by
netaddi
Collaborator
3 tasks
feat: support input_embeddings in inference pipeline
#905
opened Apr 17, 2026 by
KrisCheng9
Collaborator
perf: add masked aware top-k op to boost performance of beam search with constrained decoding
#901
opened Apr 16, 2026 by
zhangjianning-zjn
Collaborator
[ROCm] Optimize Qwen3.5 with fused kernel and allreduce merging
#900
opened Apr 16, 2026 by
chengshu-lcc
Collaborator
feat: add Kimi Linear (KDA) model support
#899
opened Apr 16, 2026 by
theNiemand
Collaborator
feat: Qwen3.5 Blackwell GDN prefill optimization
#897
opened Apr 15, 2026 by
netaddi
Collaborator
3 tasks
fix: fix nvfp4 dp2 cuda graph smoke crash bug
#887
opened Apr 14, 2026 by
JackTan25
Collaborator
Implement true EP (Expert Parallelism) mode for Qwen3 ROCm MoE
#884
opened Apr 13, 2026 by
Xu-Sheng-lin
Collaborator
feat: [ROCm] support FP8 PTPC/PerBlock quantization for Qwen3.5
#882
opened Apr 13, 2026 by
chengshu-lcc
Collaborator
feat - optimize gemm weights load logic
#880
opened Apr 13, 2026 by
alibaba-miji
Collaborator