Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Weichao Luo <luoweichao@sensetime.com>
Co-authored-by: shihaobai <1798930569@qq.com>
Co-authored-by: sufubao <sufubao@sensetime.com>
Adds multi-instance port isolation to allow multiple LightLLM servers on the same machine without port conflicts. Each instance gets a dedicated 1000-port range (instance 0: 10000-10999, etc.).

Changes:
- Added `--lightllm_instance_id` CLI arg (0-7) for instance selection
- Refactored port allocation to use deterministic ranges instead of random selection via portpicker
- Removed the portpicker dependency from requirements.txt
- Base port configurable via the `LIGHTLLM_BASE_PORT` env var
- Removed SO_REUSEADDR from the port probe to avoid false positives
- Simplified to a single linear scan (removed ineffective retry logic)
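A minimal sketch of how such deterministic per-instance ranges might be probed, assuming the `LIGHTLLM_BASE_PORT` variable and the 1000-port stride described above; the helper names (`instance_port_range`, `alloc_ports`, `_port_is_free`) are illustrative, not the PR's actual API:

```python
import os
import socket

PORTS_PER_INSTANCE = 1000  # each instance owns a dedicated, non-overlapping range

def instance_port_range(instance_id: int) -> range:
    # Instance 0 gets base..base+999, instance 1 the next 1000 ports, and so on.
    base = int(os.environ.get("LIGHTLLM_BASE_PORT", "10000"))
    start = base + instance_id * PORTS_PER_INSTANCE
    return range(start, start + PORTS_PER_INSTANCE)

def _port_is_free(port: int) -> bool:
    # No SO_REUSEADDR here: with it set, bind() can succeed on a port that is
    # still in use (e.g. TIME_WAIT), which yields exactly the false positives
    # the PR description mentions.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("", port))
            return True
        except OSError:
            return False

def alloc_ports(instance_id: int, num_ports: int) -> list:
    # A single linear scan is enough: the ranges of different instances never
    # overlap, so no retry/backoff logic is needed.
    free = []
    for port in instance_port_range(instance_id):
        if _port_is_free(port):
            free.append(port)
            if len(free) == num_ports:
                return free
    raise RuntimeError(f"not enough free ports in range for instance {instance_id}")
```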
Code Review
This pull request introduces several significant enhancements to the LightLLM framework, including support for Multi-Token Prediction (MTP) with optimized Triton kernels, memory management improvements (including a torch memory saver), and support for new model architectures like NeoChat. It also adds a routing data capture mechanism for MoE models and improves the robustness of the server launch process. My review identified several critical issues in the new MTP state management and kernel logic, including incorrect assertion logic, device mismatch errors when indexing tensors, and incorrect method names being called. I have provided specific suggestions to address these bugs.
```python
):
    start_args = get_env_start_args()
    if self.size is not None:
        assert self.size < start_args.running_max_req_size * 2, (
```
```python
if mask.sum() > 0:
    actual_req_idxes = model_input.b_req_idx[b_req_mtp_start_loc[mask]]
```
There are two issues here:

1. `mask.sum() > 0` on a boolean tensor returns a tensor. It's safer to use `.any()`.
2. `b_req_mtp_start_loc` is a list (initialized at line 255), so it cannot be indexed by a boolean tensor `mask`. You should convert it to a tensor or use a list comprehension.
```diff
-if mask.sum() > 0:
-    actual_req_idxes = model_input.b_req_idx[b_req_mtp_start_loc[mask]]
+if mask.any():
+    mask_cpu = mask.cpu()
+    actual_req_idxes = model_input.b_req_idx[[b_req_mtp_start_loc[i] for i, m in enumerate(mask_cpu) if m]]
```
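The first point is easy to verify in isolation; the snippet below is a standalone illustration, not code from the PR:

```python
import torch

mask = torch.tensor([True, False, True])

print(mask.sum() > 0)  # tensor(True) -- a 0-dim tensor, not a Python bool
print(mask.any())      # tensor(True) -- the idiomatic check for boolean masks

b_req_mtp_start_loc = [0, 3, 5]  # a plain Python list, as in the PR
try:
    b_req_mtp_start_loc[mask]    # lists do not accept boolean-tensor indices
except (TypeError, IndexError) as e:
    print(type(e).__name__, e)
```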
```python
if mask.sum() > 0:
    actual_req_idxes = b_req_idx[b_req_mtp_start_loc[mask]]
    src_buffer_indexes = g_infer_context.req_manager.req_to_buffer_index[
```
Similar to the issue in chunked_prefill/impl.py:

1. `mask.sum() > 0` should be `mask.any()`.
2. `b_req_mtp_start_loc` is a list and cannot be indexed by a tensor.
3. `mtp_accept_len[mask]` should be moved to CPU to avoid a device mismatch with `req_to_buffer_index`.
```diff
-if mask.sum() > 0:
-    actual_req_idxes = b_req_idx[b_req_mtp_start_loc[mask]]
-    src_buffer_indexes = g_infer_context.req_manager.req_to_buffer_index[
+if mask.any():
+    mask_cpu = mask.cpu()
+    actual_req_idxes = b_req_idx[[b_req_mtp_start_loc[i] for i, m in enumerate(mask_cpu) if m]]
+    src_buffer_indexes = g_infer_context.req_manager.req_to_buffer_index[
+        actual_req_idxes, mtp_accept_len[mask].cpu() - 1
+    ]
```
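Point 3 (the device mismatch) can also be reproduced standalone; the tensor shapes below are made up, and the GPU branch is guarded so the snippet runs anywhere:

```python
import torch

# Stand-in for ReqManager.req_to_buffer_index, assumed here to live on the CPU.
req_to_buffer_index = torch.zeros((8, 4), dtype=torch.long)

if torch.cuda.is_available():
    rows = torch.tensor([0, 1])
    accept_len = torch.tensor([2, 3], device="cuda")
    try:
        req_to_buffer_index[rows, accept_len - 1]  # CUDA indices into a CPU tensor
    except RuntimeError as e:
        print(e)  # indices must be on the CPU or on the indexed tensor's device
    # Moving the index tensor to the CPU first makes the lookup valid:
    print(req_to_buffer_index[rows, (accept_len - 1).cpu()])
```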
```python
def resume_all(self):
    torch.cuda.empty_cache()
    gc.collect()
    self.torch_memory_saver.resume(tag=MemoryTag.WEIGHT)
    self.torch_memory_saver.resume(tag=MemoryTag.KV_CACHE)
    self.torch_memory_saver.resume(tag=MemoryTag.GRAPH)
    self.mem_manager.free_all()
```
The `resume_all` method is missing a call to `self.req_manager.resume()`, which is present in `resume_kv_cache`. Without this, the request manager might not be properly re-initialized after a memory resume operation.
```diff
 def resume_all(self):
     torch.cuda.empty_cache()
     gc.collect()
     self.torch_memory_saver.resume(tag=MemoryTag.WEIGHT)
     self.torch_memory_saver.resume(tag=MemoryTag.KV_CACHE)
     self.torch_memory_saver.resume(tag=MemoryTag.GRAPH)
     self.mem_manager.free_all()
+    self.req_manager.resume()
```
```diff
     cur_group_reqs, is_busy, new_batch_first_router_need_tokens
 )
-if ok_insert:
+if ok_insert and False:
```
```python
actual_req_idxes = model_input.b_req_idx[b_req_mtp_start_loc[mask]]
# Source: the accepted buffer (at index accept_len - 1)
src_buffer_indexes = g_infer_context.req_manager.req_to_buffer_index[
    actual_req_idxes, mtp_accept_len[mask] - 1
```
Potential device mismatch. `mtp_accept_len` is a GPU tensor, while `req_to_buffer_index` is likely a CPU tensor managed by `ReqManager`. You should move the indices to CPU before indexing.
```python
src_buffer_indexes = g_infer_context.req_manager.req_to_buffer_index[
    actual_req_idxes, mtp_accept_len[mask].cpu() - 1
]
```

```python
if hasattr(g_infer_context.req_manager.buffer_mem_manager, "copy_buffer_p2p"):
    g_infer_context.req_manager.buffer_mem_manager.copy_buffer_p2p(
        src_buffer_indexes, dst_buffer_indexes
    )
```
The method name in `MambaCacheManager` is `copy_state_buffers`, not `copy_buffer_p2p`. Because the `hasattr` check returns False for the wrong name, the state copy is silently skipped rather than raising an error.
```diff
-if hasattr(g_infer_context.req_manager.buffer_mem_manager, "copy_buffer_p2p"):
-    g_infer_context.req_manager.buffer_mem_manager.copy_buffer_p2p(
+if hasattr(g_infer_context.req_manager.buffer_mem_manager, "copy_state_buffers"):
+    g_infer_context.req_manager.buffer_mem_manager.copy_state_buffers(
         src_buffer_indexes, dst_buffer_indexes
     )
```
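Because the guard is a `hasattr` check, the wrong method name never raises; the branch is just skipped, which is why this bug is easy to miss. A toy illustration (the class below is a stand-in, not the real `MambaCacheManager`):

```python
class FakeCacheManager:
    def copy_state_buffers(self, src, dst):
        print("copied state buffers:", src, "->", dst)

mgr = FakeCacheManager()

# Wrong name: hasattr() returns False, so the copy is silently skipped.
if hasattr(mgr, "copy_buffer_p2p"):
    mgr.copy_buffer_p2p([0], [1])

# Correct name: the state copy actually runs.
if hasattr(mgr, "copy_state_buffers"):
    mgr.copy_state_buffers([0], [1])
```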
```python
# NOTE: The pg_options parameter was renamed to backend_options in PyTorch 2.6.0
# https://github.com/pytorch/pytorch/commit/a0c7029a75628cd5fa8df83c0de0ea98ee7fd844
# We need to determine the appropriate parameter name based on the PyTorch version
pg_options_param_name = "backend_options" if str(torch.__version__) >= "2.6" else "pg_options"
```
Using string comparison on `torch.__version__` (e.g., `'2.10' < '2.6'`) can lead to incorrect results. It is safer to use `version.parse`, since it is already imported.
```diff
-pg_options_param_name = "backend_options" if str(torch.__version__) >= "2.6" else "pg_options"
+pg_options_param_name = "backend_options" if version.parse(torch.__version__) >= version.parse("2.6") else "pg_options"
```
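The failure mode is concrete: lexicographically, "2.10" sorts before "2.6". A quick standalone check with `packaging.version`, which also copes with local suffixes like "2.6.0+cu124":

```python
from packaging import version

print("2.10" >= "2.6")                                        # False -- per-character comparison
print(version.parse("2.10") >= version.parse("2.6"))          # True  -- proper numeric comparison
print(version.parse("2.6.0+cu124") >= version.parse("2.6"))   # True  -- local version tags handled
```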