Propagate FINISHED_ERROR from detokenization init failure#1299

Open
sufubao wants to merge 2 commits into ModelTC:main from sufubao:pr/detoken-error-propagation

Conversation

sufubao (Collaborator) commented May 9, 2026

Summary

When the detokenization process fails to initialize a request (during the shm link, decode-mode fix, or token-healing init) or the entire _add_new_group_req_index recv batch fails, the error currently has no path back to the http server: the out-tokens queue stays empty, the client hangs until it disconnects, and the shm slot leaks because can_released_mark is never set.

This PR introduces a terminal error state and wires it through:

  • FinishStatus.FINISHED_ERROR (status 3, rendered as "error" by the OpenAI finish-reason mapper) added to lightllm/server/core/objs/req.py, with is_finished extended to include it.
  • Per-req init failure — wrap the body of _add_new_group_req_index in try/except. On failure, mark finish_status = FINISHED_ERROR, set finish_token_index = input_len, push an empty sentinel into out_tokens_queue, and set can_released_mark = True so the http loop drains a terminal status and the shm slot is reclaimed. Continue with the rest of the group instead of aborting the whole batch (see the sketch after this list).
  • Whole-batch failure — if the _add_new_group_req_index call itself raises (malformed recv_obj, etc.), still publish a wake-up sentinel via pub_to_httpserver.send_pyobj(None) so the http loop drains the error sentinels we just pushed instead of waiting on a queue that no longer has activity.
  • OpenAI API mapping (api_openai.py) — translate the new "error" finish reason into the standard OpenAI error envelope so streaming and non-streaming clients both see a clean error instead of a stalled connection.
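
A minimal, self-contained sketch of both failure paths described above. Everything here is an illustrative stand-in rather than the actual lightllm code: only FINISHED_ERROR = 3 and the attribute names (finish_status, finish_token_index, out_tokens_queue, can_released_mark) come from the PR text; the other enum members, the Req shape, and the queue/publisher types are assumptions.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from queue import SimpleQueue


class FinishStatus(IntEnum):
    NO_FINISH = 0        # values other than FINISHED_ERROR are illustrative
    FINISHED_STOP = 1
    FINISHED_LENGTH = 2
    FINISHED_ERROR = 3   # new terminal state, rendered as "error"

    def is_finished(self) -> bool:
        # extended so the error state counts as terminal
        return self >= FinishStatus.FINISHED_STOP


@dataclass
class Req:  # stand-in for the shm-backed request object
    input_len: int
    finish_status: FinishStatus = FinishStatus.NO_FINISH
    finish_token_index: int = -1
    can_released_mark: bool = False
    out_tokens_queue: SimpleQueue = field(default_factory=SimpleQueue)


def init_req(req: Req) -> None:
    """Stand-in for the shm link / decode-mode fix / token-healing init."""
    raise RuntimeError("injected init failure")


def add_new_group_req_index(reqs: list[Req]) -> None:
    for req in reqs:
        try:
            init_req(req)
        except Exception:
            req.finish_status = FinishStatus.FINISHED_ERROR
            req.finish_token_index = req.input_len
            req.out_tokens_queue.put("")  # empty sentinel: http loop drains a terminal status
            req.can_released_mark = True  # shm slot becomes reclaimable
            continue  # one bad req must not abort the rest of the group


class FakePub:
    """Stand-in for the zmq socket publishing to the http server."""

    def send_pyobj(self, obj) -> None:
        print("wake-up sentinel:", obj)


pub_to_httpserver = FakePub()


def handle_recv_batch(batch: list[Req]) -> None:
    try:
        add_new_group_req_index(batch)
    except Exception:
        # whole-batch failure: still wake the http loop so it drains the
        # error sentinels instead of waiting on a queue with no activity
        pub_to_httpserver.send_pyobj(None)
```

The key design point is the `continue`: a single bad request degrades to one error response rather than failing the whole batch.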

Why a separate PR

This was originally bundled with the logging/cache-stats work in #1289 but changes request lifecycle semantics, API error mapping, and resource-release timing — all of which warrant independent review.

Test plan

  • Inject a synthetic exception in _add_new_group_req_index (e.g. raise inside link_prompt_ids_shm_array) and verify both /generate and /v1/chat/completions clients return promptly with an error envelope instead of hanging (a fault-injection sketch follows this list)
  • Stream and non-stream paths both get finish_reason: "error" (or the OpenAI-mapped equivalent) without truncating to a prior chunk
  • Verify the failed req's shm slot is reclaimed (can_released_mark set, shm_req_manager.put_back_req_obj runs) — no leak across many failures
  • Whole-batch failure path (raise outside the inner loop): confirm the wake-up sentinel reaches the http loop and does not get re-interpreted as a real out-token frame
  • Confirm normal happy path is unchanged — no spurious FINISHED_ERROR reqs under load
  • Metrics sanity: FINISHED_ERROR reqs do not pollute output-TPS / EMA stats (router_statics)
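
A hedged sketch of the fault injection in the first bullet, written pytest-style. The Req class name, the client fixture, and the harness are all assumptions; in the real deployment the detokenization manager runs in its own process, so an in-process monkeypatch like this only applies to a test setup that runs everything in one process.

```python
def test_detoken_init_failure_returns_error(monkeypatch, client):
    # Assumed class name inside the module path the PR names
    # (lightllm/server/core/objs/req.py); adjust to the real symbols.
    from lightllm.server.core.objs import req as req_mod

    def boom(*args, **kwargs):
        raise RuntimeError("injected shm-link failure")

    # Force the per-req init path to raise, the way a broken shm link would.
    monkeypatch.setattr(req_mod.Req, "link_prompt_ids_shm_array", boom)

    resp = client.post(
        "/v1/chat/completions",
        json={"model": "test", "messages": [{"role": "user", "content": "hi"}]},
    )
    # A controlled error envelope, not a hang until client disconnect.
    assert resp.status_code == 500
```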

sufubao added 2 commits May 9, 2026 11:49
Add FinishStatus.FINISHED_ERROR (status 3, render as "error") and wrap
detokenization _add_new_group_req_index in try/except. On init failure
(shm link, decode-mode fix, token-healing init), mark the req with
FINISHED_ERROR so the http loop forwards a terminal status instead of
hanging until client disconnect.
…r" to API error path

- detoken: on _add_new_group_req_index failure, set FINISHED_ERROR, push an
  empty-string sentinel into out_tokens_queue at finish_token_index, mark
  can_released_mark, and continue with the rest of the group instead of
  re-raising. Without this the http loop stays blocked (queue empty, no
  finish ever forwarded) and the shm req leaks until client disconnect.
- openai: surface FINISHED_ERROR as a controlled error response. Non-stream
  chat / completions return HTTP 500; streaming chat / completions yield an
  SSE error event followed by [DONE] and stop. Previously "error" leaked
  into ChatCompletionResponseChoice / CompletionChoice whose finish_reason
  literals reject it, raising Pydantic ValidationError.
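
A rough sketch of the streaming branch this commit describes. The generator shape, the chunk_payload helper, and the error-message text are assumptions; only the behavior (an SSE error event followed by [DONE], then stop) comes from the commit message.

```python
import json
from typing import AsyncIterator


def chunk_payload(metadata: dict) -> dict:
    # Stand-in for building a normal ChatCompletionStreamResponse chunk.
    return {"choices": [{"delta": {"content": metadata.get("text", "")}}]}


async def stream_chat(results) -> AsyncIterator[str]:
    # `results` yields (metadata, finish_status) pairs; FinishStatus is the
    # enum sketched under Summary, with FINISHED_ERROR = 3.
    async for metadata, finish_status in results:
        if finish_status == 3:  # FinishStatus.FINISHED_ERROR
            # Controlled SSE error event instead of leaking "error" into a
            # finish_reason literal that Pydantic would reject.
            err = {"error": {"message": "internal error during generation",
                             "type": "internal_error"}}
            yield f"data: {json.dumps(err)}\n\n"
            yield "data: [DONE]\n\n"
            return
        yield f"data: {json.dumps(chunk_payload(metadata))}\n\n"
    yield "data: [DONE]\n\n"
```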
gemini-code-assist (Bot) left a comment

Code Review

This pull request implements structured error handling for internal pipeline failures by introducing a FINISHED_ERROR status. The OpenAI API and detokenization manager are updated to catch initialization and processing errors, ensuring they are reported to the client as controlled error responses or SSE payloads instead of causing hangs. A review comment points out a missing call to put_back_req_obj in the detokenization error path, which is necessary to prevent shared memory resource leaks.

Comment on lines +86 to +87:

    req.can_released_mark = True
    failed_count += 1

Severity: high

The shared memory request object must be returned to the manager in the error path to prevent a reference count leak. In the successful path, this is handled in remove_finished_reqs (line 191). Without this call, the ref_count will not reach 1, and the HttpServerManager will not be able to reclaim the shared memory slot, eventually leading to resource exhaustion.

Suggested change:

      req.can_released_mark = True
    + self.shm_req_manager.put_back_req_obj(req)
      failed_count += 1

