
fix(proxy): finalize completed responses streams after client abort#1251

Merged
ding113 merged 2 commits into ding113:dev from Brisbanehuang:codex/cch-client-abort-sse-sanitize
Jun 11, 2026
Conversation

@Brisbanehuang Brisbanehuang commented Jun 6, 2026

Summary

  • keep internal stream accounting alive briefly after downstream disconnects so completed upstream Responses streams can finalize with their real 2xx status and usage instead of being stored as 499 CLIENT_ABORTED
  • bound that post-disconnect drain window so a hung upstream cannot keep the internal reader alive indefinitely when idle timeout is disabled
  • filter inert chat.completion.chunk SSE data lines from /v1/responses output, while preserving non-empty chunks, finish markers, usage-bearing chunks, and non-Responses sessions

Related Issues:

Approach

  1. Client-abort finalization

    • On pure client disconnect, the internal accounting branch is not immediately cancelled.
    • finalizeDeferredStreamingFinalizationIfNeeded reclassifies client-aborted streams as success only when the upstream status is 2xx, no upstream error payload is detected, and positive usage was parsed.
    • True mid-stream aborts still finalize as 499, and timeout/upstream failures still finalize as failures.
    • A bounded drain timer aborts both the upstream responseController and local accounting loop if the drain window is exceeded.
  2. Responses SSE sanitizer

    • ResponseFixer strips only inert chat.completion.chunk data: lines for session.originalFormat === "response".
    • The immediately following blank SSE separator is also removed so filtering does not leave an orphan empty event frame.
    • Chunks with content, finish_reason, usage, or non-Responses format are preserved.
  3. Rebased on dev
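
The three-part reclassification gate from step 1 can be sketched as a pure function. This is a minimal illustration, not the code in response-handler.ts: the names `classifyClientAbortedStream`, `ClientAbortInput`, and `ParsedUsage` are hypothetical, and the real handler derives these inputs from the drained stream content.

```typescript
// Hypothetical shapes for illustration only; the real handler's types differ.
interface ParsedUsage {
  inputTokens: number;
  outputTokens: number;
}

interface ClientAbortInput {
  upstreamStatusCode: number;
  upstreamErrorDetected: boolean; // e.g. an error payload behind a 2xx status
  usage: ParsedUsage | null; // null when no usage was parsed before the abort
}

interface FinalizationResult {
  statusCode: number;
  errorMessage: string | null;
}

function hasPositiveUsage(usage: ParsedUsage | null): boolean {
  return usage !== null && usage.inputTokens + usage.outputTokens > 0;
}

// Decide how a stream whose client already disconnected should be recorded.
function classifyClientAbortedStream(input: ClientAbortInput): FinalizationResult {
  const is2xx = input.upstreamStatusCode >= 200 && input.upstreamStatusCode < 300;
  const completeSuccess =
    is2xx && !input.upstreamErrorDetected && hasPositiveUsage(input.usage);

  if (completeSuccess) {
    // All three checks hold: keep the real upstream status.
    return { statusCode: input.upstreamStatusCode, errorMessage: null };
  }
  // Any failed check means the abort is recorded as a genuine client abort.
  return { statusCode: 499, errorMessage: "CLIENT_ABORTED" };
}
```

Requiring all three conditions keeps the change conservative: a stream that was cut off before usage arrived still lands on the 499 path.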

Commits

  • fix(proxy): finalize complete responses after client abort
  • fix(proxy): sanitize inert chat chunks in responses streams

Validation

bunx @biomejs/biome@2.4.15 check src/app/v1/_lib/proxy/response-fixer/index.ts src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts src/app/v1/_lib/proxy/response-handler.ts tests/unit/proxy/response-handler-client-abort-drain.test.ts
bun run test src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts tests/unit/proxy/response-handler-client-abort-drain.test.ts
bun run build

Additional local check:

bun run test

This full local test run currently fails in two files outside this PR's changed paths:

  • tests/unit/k8s-deploy-shell-helpers.test.ts - macOS tr: Illegal byte sequence in the /dev/urandom fallback path
  • src/components/ui/__tests__/language-switcher.test.tsx - expected console.error call is not emitted in the blocked sessionStorage test

Those failures are reproducible when run individually and are not in files touched by this PR.

Greptile Summary

This PR addresses two distinct problems in the Responses streaming proxy path: (1) false 499 CLIENT_ABORTED status codes recorded for completed upstream streams when the downstream client disconnects before the internal accounting loop observes the final SSE event, and (2) schema-validation failures in strict Responses clients when OpenAI-compatible upstreams inject empty chat.completion.chunk events into a /v1/responses stream. It also introduces a hedge-loser billing feature that drains losing-race-attempt responses in the background to capture and bill their real token usage.

  • Client-abort drain: On client disconnect, the internal tee'd accounting branch now continues draining buffered data rather than being cancelled immediately; a new clientAbortCompleteSuccess gate reclassifies the stream as the real 2xx status when 2xx upstream code, no fake-error detected, and positive billable tokens are all confirmed.
  • SSE sanitizer: ResponseFixer.filterInertResponsesChatCompletionChunks strips inert chat.completion.chunk data lines (empty content, no tool calls, no finish reason, no usage) from Responses SSE output only when session.originalFormat === "response", with the blank-line separator following each filtered line also elided via the skipNextBlankLine flag.
  • Hedge-loser billing: Losing hedge attempts are kept alive to drain their response bodies, usage is parsed, and costs are accumulated onto the original request row via an idempotent SQL pattern (winner: winnerCost + SUM(hedge_losers); loser: costUsd += delta) that is commutative regardless of ordering.
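
As a rough illustration of the sanitizer bullet above, the sketch below filters a complete SSE text rather than a chunked byte stream. The helper names (`filterInertChunks`, `isInertChatChunk`, `hasMeaningfulValue`) and the exact definition of "meaningful" are assumptions made for this example; only the `skipNextBlankLine` idea and the `originalFormat === "response"` gate come from the PR description.

```typescript
type JsonRecord = Record<string, unknown>;

function isRecord(value: unknown): value is JsonRecord {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Assumed definition: null, empty strings, empty arrays and empty objects carry nothing.
function hasMeaningfulValue(value: unknown): boolean {
  if (value == null) return false;
  if (typeof value === "string") return value.length > 0;
  if (Array.isArray(value)) return value.length > 0;
  if (isRecord(value)) return Object.keys(value).length > 0;
  return true;
}

function isInertChoice(choice: unknown): boolean {
  if (!isRecord(choice)) return false;
  if (choice.finish_reason != null) return false;
  const delta = choice.delta;
  if (!isRecord(delta)) return true;
  return Object.entries(delta).every(
    ([key, value]) => key === "role" || !hasMeaningfulValue(value),
  );
}

function isInertChatChunk(payload: unknown): boolean {
  if (!isRecord(payload) || payload.object !== "chat.completion.chunk") return false;
  if (hasMeaningfulValue(payload.usage)) return false;
  // Be conservative: anything without a choices array is passed through.
  if (!Array.isArray(payload.choices)) return false;
  return payload.choices.every(isInertChoice);
}

function filterInertChunks(sseText: string, originalFormat: string): string {
  if (originalFormat !== "response") return sseText;

  const kept: string[] = [];
  let skipNextBlankLine = false;
  for (const line of sseText.split("\n")) {
    if (skipNextBlankLine && line.trim() === "") {
      skipNextBlankLine = false; // drop the separator of the removed event
      continue;
    }
    skipNextBlankLine = false;
    if (line.startsWith("data:")) {
      try {
        if (isInertChatChunk(JSON.parse(line.slice(5).trim()))) {
          skipNextBlankLine = true;
          continue;
        }
      } catch {
        // Not JSON (for example "[DONE]" or a partial line): pass through unchanged.
      }
    }
    kept.push(line);
  }
  return kept.join("\n");
}
```

Dropping the blank line together with the data line matters because an SSE event is terminated by an empty line; leaving it behind would emit an empty event frame.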

Confidence Score: 5/5

Safe to merge; the drain-based reclassification is conservatively gated on three independent checks, the SSE sanitizer early-exits and returns original bytes when no inert chunk is present, and the hedge-loser billing SQL is commutative and idempotent regardless of winner/loser write ordering.

The clientAbortCompleteSuccess reclassification requires 2xx upstream status, no fake-error detected, and positive billable tokens — all three must hold simultaneously, making false reclassification of genuinely aborted streams effectively impossible. The SSE sanitizer's JSON.parse safety gate means malformed or partial chunks pass through unchanged. The concurrent-write SQL for hedge-loser billing converges to the correct grand total under all Postgres row-lock orderings because the winner uses an idempotent replacement (winnerCost + SUM(current losers)) and each loser uses an additive delta with a JSONB dedup guard. Four new unit tests cover the critical paths including the drain-timeout bound. Only a stale inline comment remains.

No files require special attention; the comment at line 558 of response-handler.ts is a documentation nit with no behavioral impact.
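
The ordering claim about hedge-loser billing can be checked with a small in-memory model. This is only a sketch of the pattern as the summary describes it, not the SQL in src/repository/message.ts: `RequestRow`, `applyWinnerCost`, and `applyLoserCost` are hypothetical names, and a plain object stands in for the hedge_losers JSONB column.

```typescript
// In-memory stand-in for one request row; not the real schema.
interface RequestRow {
  costUsd: number;
  hedgeLosers: Record<string, number>; // attemptId -> billed loser cost
}

function sumLosers(row: RequestRow): number {
  return Object.values(row.hedgeLosers).reduce((total, cost) => total + cost, 0);
}

// Winner write: an idempotent replacement, winnerCost + SUM(current losers).
function applyWinnerCost(row: RequestRow, winnerCost: number): RequestRow {
  return { ...row, costUsd: winnerCost + sumLosers(row) };
}

// Loser write: an additive delta guarded by a dedup check on the attempt id.
function applyLoserCost(row: RequestRow, attemptId: string, loserCost: number): RequestRow {
  if (Object.prototype.hasOwnProperty.call(row.hedgeLosers, attemptId)) {
    return row; // already billed, so a retry changes nothing
  }
  return {
    costUsd: row.costUsd + loserCost,
    hedgeLosers: { ...row.hedgeLosers, [attemptId]: loserCost },
  };
}

const emptyRow: RequestRow = { costUsd: 0, hedgeLosers: {} };

// Winner first, then loser.
const winnerFirst = applyLoserCost(applyWinnerCost(emptyRow, 0.5), "attempt-2", 0.25);
// Loser first, then winner.
const loserFirst = applyWinnerCost(applyLoserCost(emptyRow, "attempt-2", 0.25), 0.5);

console.log(winnerFirst.costUsd === loserFirst.costUsd);
```

Both orders end at the same total because the winner write recomputes from whatever losers are already recorded, while each loser adds its own cost exactly once.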

Important Files Changed

| Filename | Overview |
|---|---|
| src/app/v1/_lib/proxy/response-handler.ts | Core drain logic: removes AsyncTaskManager.cancel/abortController.abort from client-disconnect handler, adds clientAbortDrainTimeoutMs safety bound, removes clientAbortSignal check from internal loop, and adds clientAbortCompleteSuccess reclassification. Also introduces finalizeHedgeLoserBilling and winnerLoserAware cost-write path. One inline comment (line 558) is stale after the new reclassification branch was added. |
| src/app/v1/_lib/proxy/response-fixer/index.ts | Adds filterInertResponsesChatCompletionChunks with skipNextBlankLine to remove inert chat chunks and their SSE separator from Responses streams. Module-level UTF8_DECODER/ENCODER singletons used in stateless mode, correct for per-chunk processing. |
| src/app/v1/_lib/proxy/forwarder.ts | Adds StreamingHedgeAttempt billing fields (billAsLoser, loserBillingStarted, firstChunk, billingSnapshot), startLoserBilling IIFE, handleAttemptFailure guard for billed-loser cleanup, commitWinner billing-context snapshot for initial-provider losers, and billHedgeLosers flag propagation. Idempotency guard and concurrent-write ordering are correct. |
| src/repository/message.ts | Adds updateMessageRequestWinnerCost (replacement write: winnerCost + SUM(hedge_losers)) and addMessageRequestHedgeLoserCost (additive write with JSONB dedup guard). SQL design is commutative and idempotent under retry; both functions bypass the async write buffer for durability. |
| drizzle/0104_watery_thunderbird.sql | Adds hedge_losers JSONB column and bill_hedge_losers boolean (default true). Recreates fn_is_message_request_finalized and fn_compute_message_request_success_rate_outcome to include hedge_loser_billed in finalization/exclusion logic. SQL stored procedures correctly updated. |
| src/lib/utils/hedge-billing.ts | New utility: summarizeHedgeBilling, buildHedgeBillingTable, findHedgeLoserCost. Winner cost derived as total − sum(losers), clamped at 0. Clean pure-function implementation. |
| tests/unit/proxy/response-handler-client-abort-drain.test.ts | Four new integration tests covering: completed stream finalized as 200 after pre-aborted client; reclassification when final usage received before stream error; genuinely aborted stream stays 499; drain timeout with fake timers bounds a hanging upstream and aborts it. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant ProxyResponseHandler
    participant InternalLoop as Internal Accounting Loop
    participant DB as Database

    Client->>ProxyResponseHandler: POST /v1/responses (streaming)
    ProxyResponseHandler->>ProxyResponseHandler: tee() body into client branch + internal branch
    ProxyResponseHandler-->>Client: stream chunks

    Note over Client: Client closes connection early
    Client--xProxyResponseHandler: disconnect (clientAbortSignal.aborted)

    ProxyResponseHandler->>ProxyResponseHandler: bindClientAbortListener fires
    Note over ProxyResponseHandler: start clientAbortDrainTimer (60s), do NOT cancel internal loop

    InternalLoop->>InternalLoop: continues draining buffered tee branch
    InternalLoop->>InternalLoop: reads response.completed with usage tokens

    alt "streamEndedNormally=false, clientAborted=true, 2xx + no fake error + positive usage"
        InternalLoop->>DB: "updateMessageRequestDetails(statusCode=200)"
        Note over InternalLoop: Reclassified as success
    else upstream truly aborted mid-stream
        InternalLoop->>DB: "updateMessageRequestDetails(statusCode=499, CLIENT_ABORTED)"
    end

    ProxyResponseHandler->>ProxyResponseHandler: clearClientAbortDrainTimer()
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
src/app/v1/_lib/proxy/response-handler.ts:558
**Stale inline comment after new reclassification branch**

The comment on this line states that skipping fake-200 detection means the stream "will still ultimately be finalized as abort/timeout failure below" (`最终仍会在下面按中断/超时视为失败结算`). The `clientAbortCompleteSuccess` path added just below can now reclassify a client-aborted stream as 2xx success, so the always-fails assertion no longer holds and should be updated to mention the new path.

Reviews (2): Last reviewed commit: "fix(proxy): sanitize inert chat chunks i..." | Re-trigger Greptile

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Walkthrough

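This PR fixes the false 499 status recording in ResponseFixer and ResponseHandler. When the client aborts but the upstream has already returned HTTP 200 with valid tokens, new clientAbortCompleteSuccess logic reclassifies the request as successful. The PR also introduces SSE inert chunk filtering (Responses format only) and improves drain timeout control at the client-disconnect boundary.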

Changes

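SSE filtering and client-abort handling

| Layer / File(s) | Summary |
|---|---|
| ResponseFixer inert chunk filtering implementation (src/app/v1/_lib/proxy/response-fixer/index.ts) | Adds UTF-8 encoder/decoder constants and private predicates such as isInertChatCompletionPayload; calls filterInertResponsesChatCompletionChunks in the SSE transform and flush stages to filter chat.completion.chunk lines that meet the "inert" condition (finish_reason is null, delta contains only role, and usage and choices carry no real content), and records the filtered_inert_chat_completion_chunk detail. |
| ResponseFixer inert chunk filtering tests (src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts) | Adds a createSseResponse helper and several test cases verifying that, under the Responses format, chunks with empty content are filtered and chunks with real content/finish_reason/usage are preserved, and that nothing is filtered under non-Responses formats; the tests also assert when specialSettings is written. |
| ResponseHandler client-abort complete-success detection (src/app/v1/_lib/proxy/response-handler.ts) | Adds a clientAbortCompleteSuccess IIFE in finalizeDeferredStreamingFinalizationIfNeeded: when the upstream returned HTTP 200 and the client-abort / non-natural-end condition holds, usage is parsed from allContent; if positive billable tokens exist, the stream is marked successful, effectiveStatusCode is set to 200, and errorMessage is cleared, which avoids a false 499 failure record. Also adjusts the preconditions of shouldClearSessionBindingOnFailure and of the non-natural-end handling by adding a !clientAbortCompleteSuccess restriction. |
| ResponseHandler client-abort drain timeout mechanism (src/app/v1/_lib/proxy/response-handler.ts) | Introduces a drain timeout window in the client-disconnect handler bindClientAbortListener: instead of cancelling internal accounting directly, it starts a timer that, on expiry, terminates the upstream and internal processing via responseController?.abort() and abortController.abort(); the read loop now checks only abortController.signal.aborted, and clientAbortDrainTimeoutId is cleared in finally. |
| ResponseHandler client-abort drain test suite (tests/unit/proxy/response-handler-client-abort-drain.test.ts) | Adds a test file that uses a mocked SSE Response and vitest to verify four client-abort scenarios: downstream closed but upstream completed successfully (task not cancelled), reclassification as success after final usage (classified as reason: "request_success"), a genuine upstream abort error (updated with 499 + CLIENT_ABORTED), and forced abort after the drain timeout; uses fake timers and drainAsyncTasks() to ensure async tasks complete and timing is stable. |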

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

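  • ding113/claude-code-hub#1113: That PR centrally refactors the binding and cleanup logic of the client-abort listener in response-handler.ts, which directly overlaps with this PR's abort drain timer and finally cleanup logic.
  • ding113/claude-code-hub#759: That PR also modifies the abort/finalization logic in response-handler.ts, along with abortReason and effective status code handling, covering the same area of responsibility as this PR.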
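🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the PR's main change: finalizing Responses streams after a client abort. |
| Linked Issues check | ✅ Passed | The code changes fully address all linked issues: they fix the incorrect 499 status recorded after client disconnect [#1083, #985, #1242], implemented via clientAbortCompleteSuccess reclassification and Responses SSE filtering. |
| Out of Scope Changes check | ✅ Passed | All code changes fall within the scope of the linked issues: client-disconnect drain handling, SSE inert chunk filtering, and the related test coverage; no out-of-scope modifications are included. |
| Description check | ✅ Passed | The PR description details the fixes (stream accounting after client disconnect, the false 499 reports, SSE filtering) and their linked issues, and fully matches the change set. |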

@github-actions github-actions Bot added bug Something isn't working area:core area:OpenAI labels Jun 6, 2026
@Brisbanehuang Brisbanehuang marked this pull request as ready for review June 6, 2026 14:39

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to filter out inert or empty chat completion chunks from SSE streams and updates the response handler to treat client-aborted streams as successful if final usage metrics have already been received. Unit tests have been added to verify both features. The review feedback suggests simplifying the isInertChatCompletionChoice helper function by removing a redundant check for the 'content' key, which is already covered by the fallback validation.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +49 to +68
function isInertChatCompletionChoice(choice: unknown): boolean {
  if (!isRecord(choice)) return false;
  if (choice.finish_reason != null) return false;

  const delta = choice.delta;
  if (!isRecord(delta)) {
    return true;
  }

  for (const [key, value] of Object.entries(delta)) {
    if (key === "role") continue;
    if (key === "content") {
      if (hasMeaningfulValue(value)) return false;
      continue;
    }
    if (hasMeaningfulValue(value)) return false;
  }

  return true;
}

medium

The key === "content" check inside the loop is redundant because the fallback hasMeaningfulValue(value) check performs the exact same validation and returns false if the value is meaningful. We can simplify the loop by removing this redundant block.

function isInertChatCompletionChoice(choice: unknown): boolean {
  if (!isRecord(choice)) return false;
  if (choice.finish_reason != null) return false;

  const delta = choice.delta;
  if (!isRecord(delta)) {
    return true;
  }

  for (const [key, value] of Object.entries(delta)) {
    if (key === "role") continue;
    if (hasMeaningfulValue(value)) return false;
  }

  return true;
}
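
To see that the suggested simplification does not change behavior, both variants can be run side by side. The sketch assumes simple stand-ins for `isRecord` and `hasMeaningfulValue`, whose real definitions are not shown on this page; the function names `originalCheck` and `simplifiedCheck` are invented for the comparison.

```typescript
type JsonRecord = Record<string, unknown>;

// Assumed helper: plain-object check.
function isRecord(value: unknown): value is JsonRecord {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Assumed helper: the real implementation may differ.
function hasMeaningfulValue(value: unknown): boolean {
  if (value == null) return false;
  if (typeof value === "string") return value.length > 0;
  if (Array.isArray(value)) return value.length > 0;
  return true;
}

// Variant as submitted, with the dedicated "content" branch.
function originalCheck(choice: unknown): boolean {
  if (!isRecord(choice)) return false;
  if (choice.finish_reason != null) return false;
  const delta = choice.delta;
  if (!isRecord(delta)) return true;
  for (const [key, value] of Object.entries(delta)) {
    if (key === "role") continue;
    if (key === "content") {
      if (hasMeaningfulValue(value)) return false;
      continue;
    }
    if (hasMeaningfulValue(value)) return false;
  }
  return true;
}

// Variant suggested in the review, without the "content" branch.
function simplifiedCheck(choice: unknown): boolean {
  if (!isRecord(choice)) return false;
  if (choice.finish_reason != null) return false;
  const delta = choice.delta;
  if (!isRecord(delta)) return true;
  for (const [key, value] of Object.entries(delta)) {
    if (key === "role") continue;
    if (hasMeaningfulValue(value)) return false;
  }
  return true;
}

const samples: unknown[] = [
  { delta: { role: "assistant", content: "" }, finish_reason: null },
  { delta: { content: "hello" }, finish_reason: null },
  { delta: { tool_calls: [{ id: "call_1" }] }, finish_reason: null },
  { delta: {}, finish_reason: "stop" },
  { finish_reason: null },
  "not a record",
];

const disagreements = samples.filter(
  (sample) => originalCheck(sample) !== simplifiedCheck(sample),
);
console.log(disagreements.length);
```

The two loops agree for any `hasMeaningfulValue`, because the "content" branch performs the same test as the fallback and then continues.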

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ac5195544

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +2235 to +2238
// Do not cancel internal accounting on pure client disconnect. If the
// upstream stream has already completed, the tee'd internal branch can
// still drain buffered final usage and record the request as successful.
// Idle/response timeout paths still abort via abortController.

[P1] Keep aborting unfinished streams on client disconnect

When the downstream client closes before the upstream SSE has finished, this listener now leaves the internal tee running instead of cancelling/aborting it. The internal reader can drain the upstream to done, and finalizeStream is then called with streamEndedNormally=true even though clientAborted=true, so the request is persisted as the upstream 2xx success rather than a 499 and the provider keeps generating a response the client will never receive. This change was only intended to rescue already-completed streams, but it turns normal early disconnects into completed billable requests.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
tests/unit/proxy/response-handler-client-abort-drain.test.ts (1)

91-106: ⚡ Quick win

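Missing mock for emitProxyLangfuseTrace

ProxyResponseHandler calls emitProxyLangfuseTrace (from @/lib/langfuse/emit-proxy-trace) during the finalize flow, but this test file does not mock that module.

If the module has external dependencies (such as network calls or configuration reads), the tests may become flaky or fail unexpectedly. Consider adding:

vi.mock("@/lib/langfuse/emit-proxy-trace", () => ({
  emitProxyLangfuseTrace: vi.fn(),
}));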
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/proxy/response-handler-client-abort-drain.test.ts` around lines 91
- 106, The test is missing a mock for the emitProxyLangfuseTrace call used by
ProxyResponseHandler.finalize; add a mock for "@/lib/langfuse/emit-proxy-trace"
that exports emitProxyLangfuseTrace as a vi.fn() so the finalize flow won't
perform external work—look for tests referencing ProxyResponseHandler.finalize
and add vi.mock("@/lib/langfuse/emit-proxy-trace", () => ({
emitProxyLangfuseTrace: vi.fn() })) alongside the other vi.mock calls.
src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts (1)

208-262: 💤 Low value

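Optional: consider adding edge-case test coverage

The current test cases verify the core scenario (inert chunks are filtered), but the following edge cases could be added to increase confidence:

  1. Non-inert chunks are not filtered: verify that chunks containing real content are preserved
  2. Chunks with finish_reason are not filtered: verify that chunks carrying the finish marker are preserved
  3. Chunks with usage are not filtered: verify that chunks containing usage data are preserved
  4. No filtering when originalFormat !== "response": verify that the filter only applies to the Responses format

That said, the current tests already cover the main path, and these additions can be made in a later iteration.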

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts` around lines 208
- 262, Add additional unit tests in response-fixer.test.ts that cover the edge
cases the reviewer suggested: create tests that (1) send a chat.completion.chunk
with a non-empty choices[0].delta.content and assert
ResponseFixer.process(session, response).text() still includes that chunk (use
createSession() with session.originalFormat = "response"), (2) send a chunk
containing a finish_reason field and assert it is preserved, (3) send a chunk
containing usage data and assert it is preserved, and (4) create a session with
originalFormat !== "response" and assert that the filtering logic does not run
(i.e., the empty chat chunk remains). For each test reuse the existing pattern
(ReadableStream + SSE lines, ResponseFixer.process, await fixed.text()) and
assert presence/absence of identifiers like "chat.completion.chunk",
"finish_reason", and usage keys to locate behavior around ResponseFixer.process
and createSession.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5647b540-ded8-4997-ba41-aa5e0c546862

📥 Commits

Reviewing files that changed from the base of the PR and between e891e9a and 1ac5195.

📒 Files selected for processing (4)
  • src/app/v1/_lib/proxy/response-fixer/index.ts
  • src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts
  • src/app/v1/_lib/proxy/response-handler.ts
  • tests/unit/proxy/response-handler-client-abort-drain.test.ts

@github-actions github-actions Bot added the size/L Large PR (< 1000 lines) label Jun 6, 2026
// Do not cancel internal accounting on pure client disconnect. If the
// upstream stream has already completed, the tee'd internal branch can
// still drain buffered final usage and record the request as successful.
// Idle/response timeout paths still abort via abortController.

[Medium] [PERFORMANCE-ISSUE] Unbounded resource usage when idle timeout is disabled and upstream hangs

Why this is a problem: The client abort handler no longer calls abortController.abort() or AsyncTaskManager.cancel(taskId). The processing loop at line 2620 also no longer checks session.clientAbortSignal?.aborted, relying entirely on abortController being triggered by the idle timeout. However, when streamingIdleTimeoutMs is 0 (disabled), startIdleTimer() returns early and no timeout is ever set. In this configuration, if the client disconnects and the upstream hangs (stops sending data but does not close the connection), reader.read() blocks indefinitely:

  • The finally block at line 2883 never executes, so the reader lock is never released
  • Accumulated chunks remain in memory
  • The processing promise never resolves
  • AsyncTaskManager.cleanup() at line 2896 never runs

The AsyncTaskManager has a 10-minute stale threshold that calls cancel(), but that aborts the task's own AbortController (returned by register()), not the local abortController that the processing loop checks, so it does not break the loop.

Suggested fix: Add a bounded drain window in the client abort handler that acts as a safety net when idle timeout is disabled:

let drainTimeoutId: NodeJS.Timeout | null = null;

const cleanupClientAbortListener = bindClientAbortListener(session.clientAbortSignal, () => {
  logger.debug("ResponseHandler: Client disconnected, cleaning up", {
    taskId,
    providerId: provider.id,
    messageId: messageContext.id,
  });

  // Allow internal stream to drain for finalization, but cap the drain
  // window to prevent indefinite resource usage when idle timeout is
  // disabled and the upstream hangs.
  const maxDrainMs = idleTimeoutMs === Infinity ? 60_000 : idleTimeoutMs;
  drainTimeoutId = setTimeout(() => {
    if (!abortController.signal.aborted) {
      logger.info("ResponseHandler: Max drain window exceeded after client disconnect", {
        taskId,
        providerId: provider.id,
      });
      abortController.abort();
    }
  }, maxDrainMs);
});

// ... then in the finally block, add:
if (drainTimeoutId) {
  clearTimeout(drainTimeoutId);
}

This reuses the configured idle timeout when available and falls back to 60 seconds when disabled, preserving the drain behavior while bounding resource usage.
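
A standalone sketch of the bounded drain idea is shown below, separate from the handler code. `resolveDrainWindowMs` and `drainWithDeadline` are hypothetical names; treating both 0 and Infinity as "idle timeout disabled" is an assumption that covers the two conventions mentioned in this review, and the 60-second fallback mirrors the suggestion above.

```typescript
// Pick the drain window: reuse a finite positive idle timeout, otherwise fall back.
function resolveDrainWindowMs(idleTimeoutMs: number, fallbackMs = 60_000): number {
  const enabled = Number.isFinite(idleTimeoutMs) && idleTimeoutMs > 0;
  return enabled ? idleTimeoutMs : fallbackMs;
}

// Drain a stream to completion, but give up once the window has elapsed.
async function drainWithDeadline(
  stream: ReadableStream<Uint8Array>,
  maxDrainMs: number,
): Promise<{ bytes: number; timedOut: boolean }> {
  const abortController = new AbortController();
  const reader = stream.getReader();

  // Cancelling the reader settles a pending read() with done: true,
  // so a hung upstream cannot block the loop forever.
  abortController.signal.addEventListener(
    "abort",
    () => {
      void reader.cancel().catch(() => {});
    },
    { once: true },
  );
  const drainTimer = setTimeout(() => abortController.abort(), maxDrainMs);

  let bytes = 0;
  try {
    while (!abortController.signal.aborted) {
      const { done, value } = await reader.read();
      if (done) break;
      bytes += value.byteLength;
    }
    return { bytes, timedOut: abortController.signal.aborted };
  } finally {
    // Always clear the timer and release the lock, on success and on timeout.
    clearTimeout(drainTimer);
    reader.releaseLock();
  }
}
```

In the merged change the timer is described as aborting the upstream responseController as well as the local accounting loop; the sketch only models the local reader, which is the part that would otherwise block in reader.read().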

@github-actions github-actions Bot left a comment

Code Review Summary

This PR introduces two independent features: (1) allowing internal stream accounting to continue after client disconnect so completed upstream Responses streams can be finalized with their real 2xx status and usage, and (2) filtering inert chat.completion.chunk data lines from Responses SSE output. The implementation is well-structured with conservative reclassification logic and good test coverage. One resource-leak concern was identified when idle timeout is disabled.

PR Size: L

  • Lines changed: 651
  • Files changed: 4

Split suggestion: This PR contains two independent changes that could be reviewed separately:

  1. Response-fixer SSE filtering (response-fixer/index.ts + test) - filters inert chat completion chunks from Responses streams
  2. Client abort drain (response-handler.ts + test) - allows internal accounting to continue after client disconnect

Issues Found

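| Category | Critical | High | Medium | Low |
|----------|----------|------|--------|-----|
| Logic/Bugs | 0 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Performance | 0 | 0 | 1 | 0 |
| Simplification | 0 | 0 | 0 | 0 |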

Medium Priority Issues (Should Fix)

  1. Unbounded resource usage when idle timeout disabled (response-handler.ts:2238)
    • When streamingIdleTimeoutMs is 0 (disabled) and a client disconnects, the internal processing loop has no safety net to abort if the upstream hangs. The old code called abortController.abort() on client disconnect; the new code intentionally skips this but relies on idle timeout as the sole safety mechanism. When idle timeout is disabled, reader.read() can block indefinitely, preventing the finally block from releasing the reader lock, freeing memory, or cleaning up the AsyncTaskManager entry. The suggested fix is to add a bounded drain window (e.g., 60s fallback) in the client abort handler.

Review Coverage

  • Logic and correctness
  • Security (OWASP Top 10)
  • Error handling
  • Type safety
  • Documentation accuracy
  • Test coverage
  • Code clarity

Automated review by Claude AI

@Brisbanehuang Brisbanehuang force-pushed the codex/cch-client-abort-sse-sanitize branch from 1ac5195 to 0e07d38 Compare June 6, 2026 16:20
@Brisbanehuang Brisbanehuang changed the base branch from main to dev June 6, 2026 16:20
@coderabbitai coderabbitai Bot requested a review from ding113 June 6, 2026 16:21
@chatgpt-codex-connector

💡 Codex Review

const actualServiceTier = parseServiceTierFromResponseText(allContent);
const priorityServiceTierApplied =
  (await resolveCodexPriorityBillingDecision(loserSession, actualServiceTier))
    ?.effectivePriority ?? false;

[P2] Preserve loser service tier when billing drained Codex losers

When the initial provider loses to an alternative, commitWinner calls syncWinningAttemptSession, which overwrites the shared loserSession with the winner's provider/request; this call then decides Codex priority billing from that polluted session instead of from the loser provider/snapshot. In a hedge where an initial Codex service_tier: "priority" attempt is drained and billed after a non-Codex (or non-priority) alternative wins, effectivePriority becomes false and the loser cost/Redis spend is under-billed even though the loser actually used priority service tier.

@Brisbanehuang

Author

This looks like a valid issue in the hedge-loser billing path, but it belongs to the current dev base from #1247 rather than this PR’s changes. This branch only adds client-abort finalization and Responses SSE sanitizing; it does not change the finalizeHedgeLoserBilling block around these lines.

I’ll track this separately so #1251 stays scoped to the 499/client-abort fix.

@ding113 ding113 merged commit 9fcb5ac into ding113:dev Jun 11, 2026
2 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Claude Code Hub Roadmap Jun 11, 2026
@github-actions github-actions Bot mentioned this pull request Jun 11, 2026

Labels

area:core area:OpenAI bug Something isn't working size/L Large PR (< 1000 lines)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

  • Why are most requests 499?
  • Calling the gpt 5.5 model logs 499, but the relay call succeeded
  • Status 499 issue

2 participants