Propagate FINISHED_ERROR from detokenization init failure#1299

Open
sufubao wants to merge 2 commits into ModelTC:main from sufubao:pr/detoken-error-propagation

Conversation

sufubao (Collaborator) commented May 9, 2026

Summary

When the detokenization process fails to initialize a request (during the shm link, decode-mode fix, or token-healing init) or the entire _add_new_group_req_index recv batch fails, the error currently has no path back to the http server: the out-tokens queue stays empty, the client hangs until it disconnects, and the shm slot leaks because can_released_mark is never set.

This PR introduces a terminal error state and wires it through:

  • FinishStatus.FINISHED_ERROR (status 3, rendered as "error" by the OpenAI finish-reason mapper) added to lightllm/server/core/objs/req.py, with is_finished extended to include it.
  • Per-req init failure — wrap the body of _add_new_group_req_index in try/except. On failure, mark finish_status = FINISHED_ERROR, set finish_token_index = input_len, push an empty sentinel into out_tokens_queue, and set can_released_mark = True so the http loop drains a terminal status and the shm slot is reclaimed. Continue with the rest of the group instead of aborting the whole batch (see the sketch after this list).
  • Whole-batch failure — if the _add_new_group_req_index call itself raises (malformed recv_obj, etc.), still publish a wake-up sentinel via pub_to_httpserver.send_pyobj(None) so the http loop drains the error sentinels we just pushed instead of waiting on a queue that no longer has activity.
  • OpenAI API mapping (api_openai.py) — translate the new "error" finish reason into the standard OpenAI error envelope so streaming and non-streaming clients both see a clean error instead of a stalled connection.
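
A minimal, self-contained sketch of both failure paths described above. Everything here is an illustrative stand-in rather than the actual lightllm code: only FINISHED_ERROR = 3 and the attribute names (finish_status, finish_token_index, out_tokens_queue, can_released_mark) come from the PR text; the other enum members, the Req shape, and the queue/publisher types are assumptions.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from queue import SimpleQueue


class FinishStatus(IntEnum):
    NO_FINISH = 0        # values other than FINISHED_ERROR are illustrative
    FINISHED_STOP = 1
    FINISHED_LENGTH = 2
    FINISHED_ERROR = 3   # new terminal state, rendered as "error"

    def is_finished(self) -> bool:
        # extended so the error state counts as terminal
        return self >= FinishStatus.FINISHED_STOP


@dataclass
class Req:  # stand-in for the shm-backed request object
    input_len: int
    finish_status: FinishStatus = FinishStatus.NO_FINISH
    finish_token_index: int = -1
    can_released_mark: bool = False
    out_tokens_queue: SimpleQueue = field(default_factory=SimpleQueue)


def init_req(req: Req) -> None:
    """Stand-in for the shm link / decode-mode fix / token-healing init."""
    raise RuntimeError("injected init failure")


def add_new_group_req_index(reqs: list[Req]) -> None:
    for req in reqs:
        try:
            init_req(req)
        except Exception:
            req.finish_status = FinishStatus.FINISHED_ERROR
            req.finish_token_index = req.input_len
            req.out_tokens_queue.put("")  # empty sentinel: http loop drains a terminal status
            req.can_released_mark = True  # shm slot becomes reclaimable
            continue  # one bad req must not abort the rest of the group


class FakePub:
    """Stand-in for the zmq socket publishing to the http server."""

    def send_pyobj(self, obj) -> None:
        print("wake-up sentinel:", obj)


pub_to_httpserver = FakePub()


def handle_recv_batch(batch: list[Req]) -> None:
    try:
        add_new_group_req_index(batch)
    except Exception:
        # whole-batch failure: still wake the http loop so it drains the
        # error sentinels instead of waiting on a queue with no activity
        pub_to_httpserver.send_pyobj(None)
```

The key design point is the `continue`: a single bad request degrades to one error response rather than failing the whole batch.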

Why a separate PR

This was originally bundled with the logging/cache-stats work in #1289 but changes request lifecycle semantics, API error mapping, and resource-release timing — all of which warrant independent review.

Test plan

  • Inject a synthetic exception in _add_new_group_req_index (e.g. raise inside link_prompt_ids_shm_array) and verify both /generate and /v1/chat/completions clients return promptly with an error envelope instead of hanging (a fault-injection sketch follows this list)
  • Stream and non-stream paths both get finish_reason: "error" (or the OpenAI-mapped equivalent) without truncating to a prior chunk
  • Verify the failed req's shm slot is reclaimed (can_released_mark set, shm_req_manager.put_back_req_obj runs) — no leak across many failures
  • Whole-batch failure path (raise outside the inner loop): confirm the wake-up sentinel reaches the http loop and does not get re-interpreted as a real out-token frame
  • Confirm normal happy path is unchanged — no spurious FINISHED_ERROR reqs under load
  • Metrics sanity: FINISHED_ERROR reqs do not pollute output-TPS / EMA stats (router_statics)
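
A hedged sketch of the fault injection in the first bullet, written pytest-style. The Req class name, the client fixture, and the harness are all assumptions; in the real deployment the detokenization manager runs in its own process, so an in-process monkeypatch like this only applies to a test setup that runs everything in one process.

```python
def test_detoken_init_failure_returns_error(monkeypatch, client):
    # Assumed class name inside the module path the PR names
    # (lightllm/server/core/objs/req.py); adjust to the real symbols.
    from lightllm.server.core.objs import req as req_mod

    def boom(*args, **kwargs):
        raise RuntimeError("injected shm-link failure")

    # Force the per-req init path to raise, the way a broken shm link would.
    monkeypatch.setattr(req_mod.Req, "link_prompt_ids_shm_array", boom)

    resp = client.post(
        "/v1/chat/completions",
        json={"model": "test", "messages": [{"role": "user", "content": "hi"}]},
    )
    # A controlled error envelope, not a hang until client disconnect.
    assert resp.status_code == 500
```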

sufubao added 2 commits May 9, 2026 11:49
Add FinishStatus.FINISHED_ERROR (status 3, render as "error") and wrap
detokenization _add_new_group_req_index in try/except. On init failure
(shm link, decode-mode fix, token-healing init), mark the req with
FINISHED_ERROR so the http loop forwards a terminal status instead of
hanging until client disconnect.
…r" to API error path

- detoken: on _add_new_group_req_index failure, set FINISHED_ERROR, push an
  empty-string sentinel into out_tokens_queue at finish_token_index, mark
  can_released_mark, and continue with the rest of the group instead of
  re-raising. Without this the http loop stays blocked (queue empty, no
  finish ever forwarded) and the shm req leaks until client disconnect.
- openai: surface FINISHED_ERROR as a controlled error response. Non-stream
  chat / completions return HTTP 500; streaming chat / completions yield an
  SSE error event followed by [DONE] and stop. Previously "error" leaked
  into ChatCompletionResponseChoice / CompletionChoice whose finish_reason
  literals reject it, raising Pydantic ValidationError.
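
A rough sketch of the streaming branch this commit describes. The generator shape, the chunk_payload helper, and the error-message text are assumptions; only the behavior (an SSE error event followed by [DONE], then stop) comes from the commit message.

```python
import json
from typing import AsyncIterator


def chunk_payload(metadata: dict) -> dict:
    # Stand-in for building a normal ChatCompletionStreamResponse chunk.
    return {"choices": [{"delta": {"content": metadata.get("text", "")}}]}


async def stream_chat(results) -> AsyncIterator[str]:
    # `results` yields (metadata, finish_status) pairs; FinishStatus is the
    # enum sketched under Summary, with FINISHED_ERROR = 3.
    async for metadata, finish_status in results:
        if finish_status == 3:  # FinishStatus.FINISHED_ERROR
            # Controlled SSE error event instead of leaking "error" into a
            # finish_reason literal that Pydantic would reject.
            err = {"error": {"message": "internal error during generation",
                             "type": "internal_error"}}
            yield f"data: {json.dumps(err)}\n\n"
            yield "data: [DONE]\n\n"
            return
        yield f"data: {json.dumps(chunk_payload(metadata))}\n\n"
    yield "data: [DONE]\n\n"
```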
gemini-code-assist (Bot) left a comment

Code Review

This pull request implements structured error handling for internal pipeline failures by introducing a FINISHED_ERROR status. The OpenAI API and detokenization manager are updated to catch initialization and processing errors, ensuring they are reported to the client as controlled error responses or SSE payloads instead of causing hangs. A review comment points out a missing call to put_back_req_obj in the detokenization error path, which is necessary to prevent shared memory resource leaks.

Comment on lines +86 to +87:

    req.can_released_mark = True
    failed_count += 1

Severity: high

The shared memory request object must be returned to the manager in the error path to prevent a reference count leak. In the successful path, this is handled in remove_finished_reqs (line 191). Without this call, the ref_count will not reach 1, and the HttpServerManager will not be able to reclaim the shared memory slot, eventually leading to resource exhaustion.

Suggested change:

      req.can_released_mark = True
    + self.shm_req_manager.put_back_req_obj(req)
      failed_count += 1

