alibaba / rtp-llm Public

Fork 175
Star 1.1k

Pull requests: alibaba/rtp-llm

Labels 10 Milestones 0

71 Open 730 Closed

Pull requests list

update: update kvcm client

#918 opened Apr 21, 2026 by lucky-zzz Collaborator

feat: refactor py model device

#917 opened Apr 21, 2026 by JackTan25 Collaborator

Defer engine and RPC loop start until after full server init

#916 opened Apr 21, 2026 by xinfei-shi Collaborator

Support batch_prefill && TPS bench mode

#914 opened Apr 21, 2026 by alibaba-miji Collaborator

6 tasks done

fix: split CI timeout logic for PENDING and RUNNING states

#913 opened Apr 21, 2026 by guoj14 Contributor

Feature/p2p connector complete

#910 opened Apr 17, 2026 by ZhihanYan Collaborator

refactor: refactor codes

#909 opened Apr 17, 2026 by JackTan25 Collaborator

perf: optimize MoE model weight loading (8.6x speedup)

#908 opened Apr 17, 2026 by netaddi Collaborator

3 tasks

Feat/hybrid cp gdn

#906 opened Apr 17, 2026 by yang1556 Collaborator

feat: support input_embeddings in inference pipeline

#905 opened Apr 17, 2026 by KrisCheng9 Collaborator

feat: upgrade rocm6.4.3 to rocm7.2.0

#904 opened Apr 16, 2026 by liaocz Collaborator

optimize beam search

#903 opened Apr 16, 2026 by parkerpang

feat: support xgrammar

#902 opened Apr 16, 2026 by wanglining97 Collaborator

perf: add mask-aware top-k op to boost performance of beam search with constrained decoding

#901 opened Apr 16, 2026 by zhangjianning-zjn Collaborator

[ROCm] Optimize Qwen3.5 with fused kernel and allreduce merging

#900 opened Apr 16, 2026 by chengshu-lcc Collaborator

feat: add Kimi Linear (KDA) model support

#899 opened Apr 16, 2026 by theNiemand Collaborator

feat: Qwen3.5 Blackwell GDN prefill optimization

#897 opened Apr 15, 2026 by netaddi Collaborator

3 tasks

Constrained decoding modifications

#893 opened Apr 14, 2026 by Glen11111Z

feature: support tpsize > kv heads

#891 opened Apr 14, 2026 by ZhangZhiPku Collaborator

Gb200 Qwen3.5 NVFP4

#888 opened Apr 14, 2026 by qqbbiu Collaborator

fix: fix nvfp4 dp2 cuda graph smoke crash bug

#887 opened Apr 14, 2026 by JackTan25 Collaborator

feat: more production robust

#885 opened Apr 13, 2026 by yyhclimacool

Implement true EP (Expert Parallelism) mode for Qwen3 ROCm MoE

#884 opened Apr 13, 2026 by Xu-Sheng-lin Collaborator

feat: [ROCm] support FP8 PTPC/PerBlock quantization for Qwen3.5

#882 opened Apr 13, 2026 by chengshu-lcc Collaborator

feat - optimize gemm weights load logic

#880 opened Apr 13, 2026 by alibaba-miji Collaborator
