Propagate FINISHED_ERROR from detokenization init failure #1299
sufubao wants to merge 2 commits into ModelTC:main
Conversation
Add `FinishStatus.FINISHED_ERROR` (status 3, rendered as "error") and wrap detokenization `_add_new_group_req_index` in try/except. On init failure (shm link, decode-mode fix, token-healing init), mark the req with `FINISHED_ERROR` so the http loop forwards a terminal status instead of hanging until client disconnect.
…r" to API error path - detoken: on _add_new_group_req_index failure, set FINISHED_ERROR, push an empty-string sentinel into out_tokens_queue at finish_token_index, mark can_released_mark, and continue with the rest of the group instead of re-raising. Without this the http loop stays blocked (queue empty, no finish ever forwarded) and the shm req leaks until client disconnect. - openai: surface FINISHED_ERROR as a controlled error response. Non-stream chat / completions return HTTP 500; streaming chat / completions yield an SSE error event followed by [DONE] and stop. Previously "error" leaked into ChatCompletionResponseChoice / CompletionChoice whose finish_reason literals reject it, raising Pydantic ValidationError.
Code Review
This pull request implements structured error handling for internal pipeline failures by introducing a `FINISHED_ERROR` status. The OpenAI API and detokenization manager are updated to catch initialization and processing errors, ensuring they are reported to the client as controlled error responses or SSE payloads instead of causing hangs. A review comment points out a missing call to `put_back_req_obj` in the detokenization error path, which is necessary to prevent shared-memory resource leaks.
On the error path:

    req.can_released_mark = True
    failed_count += 1

The shared memory request object must be returned to the manager in the error path to prevent a reference-count leak. In the successful path, this is handled in `remove_finished_reqs` (line 191). Without this call, the `ref_count` will not reach 1, and the `HttpServerManager` will not be able to reclaim the shared-memory slot, eventually leading to resource exhaustion.

Suggested change:

    req.can_released_mark = True
    self.shm_req_manager.put_back_req_obj(req)
    failed_count += 1
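A self-contained sketch of the error branch with the reviewer's fix folded in. The classes below are stand-ins, not lightllm's real request or shm-manager types; only the field and method names already quoted in this PR (`finish_status`, `finish_token_index`, `out_tokens_queue`, `can_released_mark`, `put_back_req_obj`) are carried over.

```python
from dataclasses import dataclass, field

FINISHED_ERROR = 3  # matches the new FinishStatus.FINISHED_ERROR value


@dataclass
class FakeReq:
    request_id: int
    input_len: int
    finish_status: int = 0          # 0 = not finished
    finish_token_index: int = -1
    can_released_mark: bool = False
    out_tokens_queue: list = field(default_factory=list)  # stand-in for the shm token queue


class FakeShmReqManager:
    def __init__(self):
        self.returned = []

    def put_back_req_obj(self, req):
        # In lightllm this drops the detokenizer's reference so the
        # HttpServerManager can reclaim the shared-memory slot.
        self.returned.append(req.request_id)


def init_group(reqs, shm_req_manager, init_one):
    """Mimics the per-request init loop: one failure must not abort the group."""
    failed_count = 0
    for req in reqs:
        try:
            init_one(req)  # shm link / decode-mode fix / token-healing init
        except Exception:
            req.finish_status = FINISHED_ERROR
            req.finish_token_index = req.input_len   # nothing was decoded
            req.out_tokens_queue.append("")          # empty sentinel so the http loop sees a terminal status
            req.can_released_mark = True
            shm_req_manager.put_back_req_obj(req)    # reviewer's fix: avoid the ref-count leak
            failed_count += 1
            continue                                 # keep going with the rest of the group
    return failed_count


if __name__ == "__main__":
    mgr = FakeShmReqManager()
    reqs = [FakeReq(request_id=i, input_len=4) for i in range(3)]

    def init_one(req):
        if req.request_id == 1:
            raise RuntimeError("shm link failed")

    print(init_group(reqs, mgr, init_one), mgr.returned)  # -> 1 [1]
```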
Summary
When the detokenization process fails to initialize a request (shm link, decode-mode fix, token-healing init, or the entire `_add_new_group_req_index` recv batch), the error currently has no path back to the http server: the out-tokens queue stays empty, the client hangs until disconnect, and the shm slot leaks because `can_released_mark` is never set.

This PR introduces a terminal error state and wires it through:

- `FinishStatus.FINISHED_ERROR` (status `3`, rendered as `"error"` by the OpenAI finish-reason mapper) added to `lightllm/server/core/objs/req.py`, with `is_finished` extended to include it.
- Wrap `_add_new_group_req_index` in try/except. On failure, mark `finish_status = FINISHED_ERROR`, set `finish_token_index = input_len`, push an empty sentinel into `out_tokens_queue`, and set `can_released_mark = True` so the http loop drains a terminal status and the shm slot is reclaimed. Continue with the rest of the group instead of aborting the whole batch.
- If the `_add_new_group_req_index` call itself raises (malformed `recv_obj`, etc.), still publish a wake-up sentinel via `pub_to_httpserver.send_pyobj(None)` so the http loop drains the error sentinels we just pushed instead of waiting on a queue that no longer has activity.
- OpenAI API layer (`api_openai.py`): translate the new `"error"` finish reason into the standard OpenAI error envelope so streaming and non-streaming clients both see a clean error instead of a stalled connection.
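As a rough illustration of the first bullet, a sketch of what the status addition could look like. The real `FinishStatus` in `lightllm/server/core/objs/req.py` is laid out differently and the mapper method name here is illustrative, but the value `3` and the `"error"` rendering follow the PR text.

```python
from typing import Optional


class FinishStatus:
    NO_FINISH = 0
    FINISHED_STOP = 1
    FINISHED_LENGTH = 2
    FINISHED_ERROR = 3  # new terminal state introduced by this PR

    def __init__(self, status: int = NO_FINISH):
        self.status = status

    def is_finished(self) -> bool:
        # Extended so FINISHED_ERROR also counts as terminal.
        return self.status in (self.FINISHED_STOP, self.FINISHED_LENGTH, self.FINISHED_ERROR)

    def get_finish_reason(self) -> Optional[str]:
        # OpenAI finish-reason mapping: the new state renders as "error".
        return {
            self.FINISHED_STOP: "stop",
            self.FINISHED_LENGTH: "length",
            self.FINISHED_ERROR: "error",
        }.get(self.status)
```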
Why a separate PR

This was originally bundled with the logging/cache-stats work in #1289 but changes request lifecycle semantics, API error mapping, and resource-release timing, all of which warrant independent review.
Test plan
- Inject a failure into `_add_new_group_req_index` (e.g. raise inside `link_prompt_ids_shm_array`) and verify both `/generate` and `/v1/chat/completions` clients return promptly with an error envelope instead of hanging.
- Streaming clients receive `finish_reason: "error"` (or the OpenAI-mapped equivalent) without truncating to a prior chunk.
- Shm slots are reclaimed (`can_released_mark` set, `shm_req_manager.put_back_req_obj` runs), with no leak across many failures.
- … `FINISHED_ERROR` reqs under load.
- `FINISHED_ERROR` reqs do not pollute output-TPS / EMA stats (`router_statics`).
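A rough client-side check for the first two bullets: a sketch that assumes a lightllm server already running on localhost:8000 with the failure injected; the port, model name, and timeouts are assumptions, and the endpoint paths follow the PR text.

```python
import json
import requests

BASE = "http://localhost:8000"
payload = {"model": "test", "messages": [{"role": "user", "content": "hi"}], "stream": True}

# Non-streaming: expect a prompt HTTP 500 with an error envelope, not a hang.
resp = requests.post(f"{BASE}/v1/chat/completions", json={**payload, "stream": False}, timeout=30)
assert resp.status_code == 500, resp.status_code
assert "error" in resp.json()

# Streaming: expect one SSE error event followed by [DONE], then the stream closes.
events = []
with requests.post(f"{BASE}/v1/chat/completions", json=payload, stream=True, timeout=30) as r:
    for line in r.iter_lines(decode_unicode=True):
        if line and line.startswith("data: "):
            events.append(line[len("data: "):])
assert events[-1] == "[DONE]"
assert "error" in json.loads(events[-2])
```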