fix(proxy): finalize completed responses streams after client abort by Brisbanehuang · Pull Request #1251 · ding113/claude-code-hub

Brisbanehuang · 2026-06-06T14:37:40Z

Summary

keep internal stream accounting alive briefly after downstream disconnects so completed upstream Responses streams can finalize with their real 2xx status and usage instead of being stored as 499 CLIENT_ABORTED
bound that post-disconnect drain window so a hung upstream cannot keep the internal reader alive indefinitely when idle timeout is disabled
filter inert chat.completion.chunk SSE data lines from /v1/responses output, while preserving non-empty chunks, finish markers, usage-bearing chunks, and non-Responses sessions

Related Issues:

Fixes "Why are most requests 499?" #1083 - completed Responses streams can be misreported as 499 CLIENT_ABORTED
Fixes "Status 499 issue" #985 - upstream succeeds but CCH records 499 status
Fixes "Calls to the gpt 5.5 model show 499 in the logs, but the relayed call succeeds" #1242 - GPT-5.5 proxy calls show 499 in logs despite successful completion
Related to Codex /v1/responses stream can crash CCH process via uncaught write EPIPE #1234 - same Codex /v1/responses stream area; EPIPE crash addressed separately in fix(lifecycle): don't crash process on benign broken-pipe from streaming disconnect (#1234) #1239
Alternative approach to fix: false 499 on completed Codex Responses streams #1249 - this PR preserves the internal drain path instead of intercepting terminal state before tee()

Approach

Client-abort finalization
- On pure client disconnect, the internal accounting branch is not immediately cancelled.
- finalizeDeferredStreamingFinalizationIfNeeded reclassifies client-aborted streams as success only when the upstream status is 2xx, no upstream error payload is detected, and positive usage was parsed.
- True mid-stream aborts still finalize as 499, and timeout/upstream failures still finalize as failures.
- A bounded drain timer aborts both the upstream responseController and local accounting loop if the drain window is exceeded.
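
The gating rule in the bullets above can be sketched as a small pure function. This is a minimal illustration, not the code in response-handler.ts: the `AbortedStreamState` shape and the `finalizeAfterClientAbort` name are assumptions made for this example. Only the three conditions (2xx upstream status, no upstream error payload, positive parsed usage) come from the description.

```typescript
// Hypothetical state captured for a stream whose client has disconnected.
interface AbortedStreamState {
  upstreamStatus: number;
  upstreamErrorDetected: boolean;
  // Billable tokens parsed from the stream, or null if no usage was seen.
  billableTokens: number | null;
}

interface FinalStatus {
  statusCode: number;
  errorMessage: string | null;
}

// Decide how a client-aborted stream should be recorded.
function finalizeAfterClientAbort(state: AbortedStreamState): FinalStatus {
  const is2xx = state.upstreamStatus >= 200 && state.upstreamStatus < 300;
  const hasUsage = state.billableTokens !== null && state.billableTokens > 0;

  // All three checks must hold before the abort is treated as a success.
  if (is2xx && !state.upstreamErrorDetected && hasUsage) {
    return { statusCode: state.upstreamStatus, errorMessage: null };
  }
  // Otherwise the request stays a genuine client abort.
  return { statusCode: 499, errorMessage: "CLIENT_ABORTED" };
}
```

Requiring positive usage is what separates "the upstream finished and the client left afterwards" from "the client left mid-stream": a stream cut off before its final usage event fails the third check and stays 499.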
Responses SSE sanitizer
- ResponseFixer strips only inert chat.completion.chunk data: lines for session.originalFormat === "response".
- The immediately following blank SSE separator is also removed so filtering does not leave an orphan empty event frame.
- Chunks with content, finish_reason, usage, or non-Responses format are preserved.
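
A rough sketch of the sanitizer behaviour described above, operating on a whole SSE string rather than on streamed byte chunks. The helper names (`isInertChunk`, `filterInertChunks`) and the exact emptiness rules in `hasMeaningfulValue` are assumptions for illustration; the real ResponseFixer works inside a transform stream and may differ in detail.

```typescript
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Assumed semantics: null, undefined, "", [] and {} carry no meaning.
function hasMeaningfulValue(v: unknown): boolean {
  if (v === null || v === undefined || v === "") return false;
  if (Array.isArray(v)) return v.length > 0;
  if (isRecord(v)) return Object.keys(v).length > 0;
  return true;
}

// A chat.completion.chunk is inert when it has no usage, no finish marker,
// and every delta carries nothing beyond an optional role.
function isInertChunk(payload: unknown): boolean {
  if (!isRecord(payload) || payload.object !== "chat.completion.chunk") return false;
  if (hasMeaningfulValue(payload.usage)) return false;
  const choices: unknown[] = Array.isArray(payload.choices) ? payload.choices : [];
  return choices.every((choice) => {
    if (!isRecord(choice) || choice.finish_reason != null) return false;
    const delta = choice.delta;
    if (!isRecord(delta)) return true;
    return Object.entries(delta).every(([key, value]) => key === "role" || !hasMeaningfulValue(value));
  });
}

function filterInertChunks(sse: string): string {
  const out: string[] = [];
  let skipNextBlankLine = false;
  for (const line of sse.split("\n")) {
    // Drop the separator that belonged to a filtered data line.
    if (skipNextBlankLine && line.trim() === "") {
      skipNextBlankLine = false;
      continue;
    }
    skipNextBlankLine = false;
    if (line.startsWith("data:")) {
      try {
        if (isInertChunk(JSON.parse(line.slice(5).trim()))) {
          skipNextBlankLine = true;
          continue;
        }
      } catch {
        // Malformed or partial JSON passes through unchanged.
      }
    }
    out.push(line);
  }
  return out.join("\n");
}
```

Skipping the blank line together with the data line matters because a lone blank line would otherwise terminate an empty event frame in the SSE stream.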
Rebased on dev
- This branch is now based on dev and preserves the hedge-loser billing semantics of feat(billing): hedge-loser billing for provider racing (keep losers alive to retrieve the response + idempotent additive write-back) #1247.
- The winner cost update still uses finalized.billHedgeLosers, so client-abort success reclassification does not clobber additive loser billing.

Commits

fix(proxy): finalize complete responses after client abort
fix(proxy): sanitize inert chat chunks in responses streams

Validation

bunx @biomejs/biome@2.4.15 check src/app/v1/_lib/proxy/response-fixer/index.ts src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts src/app/v1/_lib/proxy/response-handler.ts tests/unit/proxy/response-handler-client-abort-drain.test.ts
bun run test src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts tests/unit/proxy/response-handler-client-abort-drain.test.ts
bun run build

Additional local check:

bun run test

This full local test run currently fails in two files outside this PR's changed paths:

tests/unit/k8s-deploy-shell-helpers.test.ts - macOS tr: Illegal byte sequence in the /dev/urandom fallback path
src/components/ui/__tests__/language-switcher.test.tsx - expected console.error call is not emitted in the blocked sessionStorage test

Those failures are reproducible when run individually and are not in files touched by this PR.

Greptile Summary

This PR addresses two distinct problems in the Responses streaming proxy path: (1) false 499 CLIENT_ABORTED status codes recorded for completed upstream streams when the downstream client disconnects before the internal accounting loop observes the final SSE event, and (2) schema-validation failures in strict Responses clients when OpenAI-compatible upstreams inject empty chat.completion.chunk events into a /v1/responses stream. It also introduces a hedge-loser billing feature that drains losing-race-attempt responses in the background to capture and bill their real token usage.

Client-abort drain: On client disconnect, the internal tee'd accounting branch now continues draining buffered data rather than being cancelled immediately; a new clientAbortCompleteSuccess gate reclassifies the stream as the real 2xx status when 2xx upstream code, no fake-error detected, and positive billable tokens are all confirmed.
SSE sanitizer: ResponseFixer.filterInertResponsesChatCompletionChunks strips inert chat.completion.chunk data lines (empty content, no tool calls, no finish reason, no usage) from Responses SSE output only when session.originalFormat === "response", with the blank-line separator following each filtered line also elided via the skipNextBlankLine flag.
Hedge-loser billing: Losing hedge attempts are kept alive to drain their response bodies, usage is parsed, and costs are accumulated onto the original request row via an idempotent SQL pattern (winner: winnerCost + SUM(hedge_losers); loser: costUsd += delta) that is commutative regardless of ordering.

Confidence Score: 5/5

Safe to merge; the drain-based reclassification is conservatively gated on three independent checks, the SSE sanitizer early-exits and returns original bytes when no inert chunk is present, and the hedge-loser billing SQL is commutative and idempotent regardless of winner/loser write ordering.

The clientAbortCompleteSuccess reclassification requires 2xx upstream status, no fake-error detected, and positive billable tokens — all three must hold simultaneously, making false reclassification of genuinely aborted streams effectively impossible. The SSE sanitizer's JSON.parse safety gate means malformed or partial chunks pass through unchanged. The concurrent-write SQL for hedge-loser billing converges to the correct grand total under all Postgres row-lock orderings because the winner uses an idempotent replacement (winnerCost + SUM(current losers)) and each loser uses an additive delta with a JSONB dedup guard. Four new unit tests cover the critical paths including the drain-timeout bound. Only a stale inline comment remains.
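
The convergence claim for hedge-loser billing can be illustrated with an in-memory model. This is only a sketch: the real writes are SQL statements in src/repository/message.ts, and the `RequestRow` shape and the `writeWinnerCost` / `addLoserCost` names are invented here. What it borrows from the summary is the pattern: the winner performs a replacement write of winnerCost + SUM(losers), and each loser performs an additive write guarded by a dedup check.

```typescript
// Hypothetical in-memory stand-in for one message_request row.
interface RequestRow {
  costUsd: number;
  // Keyed by attempt id so a retried loser write is applied only once.
  hedgeLosers: Record<string, number>;
}

function sumLosers(row: RequestRow): number {
  return Object.values(row.hedgeLosers).reduce((total, cost) => total + cost, 0);
}

// Winner write: idempotent replacement with winnerCost + SUM(current losers).
function writeWinnerCost(row: RequestRow, winnerCost: number): void {
  row.costUsd = winnerCost + sumLosers(row);
}

// Loser write: additive delta, skipped if this attempt was already recorded.
function addLoserCost(row: RequestRow, attemptId: string, delta: number): void {
  if (Object.prototype.hasOwnProperty.call(row.hedgeLosers, attemptId)) return;
  row.hedgeLosers[attemptId] = delta;
  row.costUsd += delta;
}

// Ordering 1: the winner writes first, the loser drains later.
const winnerFirst: RequestRow = { costUsd: 0, hedgeLosers: {} };
writeWinnerCost(winnerFirst, 10);
addLoserCost(winnerFirst, "attempt-b", 3);

// Ordering 2: the loser finishes first and its write is retried once.
const loserFirst: RequestRow = { costUsd: 0, hedgeLosers: {} };
addLoserCost(loserFirst, "attempt-b", 3);
addLoserCost(loserFirst, "attempt-b", 3); // retried write is ignored
writeWinnerCost(loserFirst, 10);

console.log(winnerFirst.costUsd, loserFirst.costUsd); // 13 13
```

Both orderings end at the same total because the winner never adds to the stored cost; it recomputes it from its own cost plus whatever losers are recorded at that moment.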

No files require special attention; the comment at line 558 of response-handler.ts is a documentation nit with no behavioral impact.

Important Files Changed

Filename	Overview
src/app/v1/_lib/proxy/response-handler.ts	Core drain logic: removes AsyncTaskManager.cancel/abortController.abort from client-disconnect handler, adds clientAbortDrainTimeoutMs safety bound, removes clientAbortSignal check from internal loop, and adds clientAbortCompleteSuccess reclassification. Also introduces finalizeHedgeLoserBilling and winnerLoserAware cost-write path. One inline comment (line 558) is stale after the new reclassification branch was added.
src/app/v1/_lib/proxy/response-fixer/index.ts	Adds filterInertResponsesChatCompletionChunks with skipNextBlankLine to remove inert chat chunks and their SSE separator from Responses streams. Module-level UTF8_DECODER/ENCODER singletons used in stateless mode, correct for per-chunk processing.
src/app/v1/_lib/proxy/forwarder.ts	Adds StreamingHedgeAttempt billing fields (billAsLoser, loserBillingStarted, firstChunk, billingSnapshot), startLoserBilling IIFE, handleAttemptFailure guard for billed-loser cleanup, commitWinner billing-context snapshot for initial-provider losers, and billHedgeLosers flag propagation. Idempotency guard and concurrent-write ordering are correct.
src/repository/message.ts	Adds updateMessageRequestWinnerCost (replacement write: winnerCost + SUM(hedge_losers)) and addMessageRequestHedgeLoserCost (additive write with JSONB dedup guard). SQL design is commutative and idempotent under retry; both functions bypass the async write buffer for durability.
drizzle/0104_watery_thunderbird.sql	Adds hedge_losers JSONB column and bill_hedge_losers boolean (default true). Recreates fn_is_message_request_finalized and fn_compute_message_request_success_rate_outcome to include hedge_loser_billed in finalization/exclusion logic. SQL stored procedures correctly updated.
src/lib/utils/hedge-billing.ts	New utility: summarizeHedgeBilling, buildHedgeBillingTable, findHedgeLoserCost. Winner cost derived as total − sum(losers), clamped at 0. Clean pure-function implementation.
tests/unit/proxy/response-handler-client-abort-drain.test.ts	Four new integration tests covering: completed stream finalized as 200 after pre-aborted client; reclassification when final usage received before stream error; genuinely aborted stream stays 499; drain timeout with fake timers bounds a hanging upstream and aborts it.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ProxyResponseHandler
    participant InternalLoop as Internal Accounting Loop
    participant DB as Database

    Client->>ProxyResponseHandler: POST /v1/responses (streaming)
    ProxyResponseHandler->>ProxyResponseHandler: tee() body into client branch + internal branch
    ProxyResponseHandler-->>Client: stream chunks

    Note over Client: Client closes connection early
    Client--xProxyResponseHandler: disconnect (clientAbortSignal.aborted)

    ProxyResponseHandler->>ProxyResponseHandler: bindClientAbortListener fires
    Note over ProxyResponseHandler: start clientAbortDrainTimer (60s), do NOT cancel internal loop

    InternalLoop->>InternalLoop: continues draining buffered tee branch
    InternalLoop->>InternalLoop: reads response.completed with usage tokens

    alt "streamEndedNormally=false, clientAborted=true, 2xx + no fake error + positive usage"
        InternalLoop->>DB: "updateMessageRequestDetails(statusCode=200)"
        Note over InternalLoop: Reclassified as success
    else upstream truly aborted mid-stream
        InternalLoop->>DB: "updateMessageRequestDetails(statusCode=499, CLIENT_ABORTED)"
    end

    ProxyResponseHandler->>ProxyResponseHandler: clearClientAbortDrainTimer()

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
src/app/v1/_lib/proxy/response-handler.ts:558
**Stale inline comment after new reclassification branch**

The comment on this line states that skipping fake-200 detection means the stream "will still ultimately be finalized as abort/timeout failure below" (`最终仍会在下面按中断/超时视为失败结算`). The `clientAbortCompleteSuccess` path added just below can now reclassify a client-aborted stream as 2xx success, so the always-fails assertion no longer holds and should be updated to mention the new path.

_{Reviews (2): Last reviewed commit: "fix(proxy): sanitize inert chat chunks i..." | Re-trigger Greptile}

coderabbitai · 2026-06-06T14:37:53Z

Walkthrough

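This PR fixes the mis-recorded 499 status in ResponseFixer and ResponseHandler. When the client aborts but the upstream has already returned HTTP 200 with valid tokens, the new clientAbortCompleteSuccess logic reclassifies the request as a success. It also introduces SSE inert-chunk filtering (Responses format only) and tightens drain-timeout control at the client-disconnect boundary.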

Changes

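SSE filtering and client-abort handling

Layer / File(s)	Summary
ResponseFixer inert-chunk filtering implementation `src/app/v1/_lib/proxy/response-fixer/index.ts`	Adds UTF-8 encoder/decoder constants and private predicate functions such as `isInertChatCompletionPayload`; calls `filterInertResponsesChatCompletionChunks` in the SSE transform and flush stages, filtering chat.completion.chunk lines that meet the "inert" condition (finish_reason is null, delta contains only role, and usage and choices carry no real content) and setting the `filtered_inert_chat_completion_chunk` detail.
ResponseFixer inert-chunk filtering tests `src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts`	Adds a `createSseResponse` helper and several test cases verifying that empty-content chunks are filtered in Responses format, that chunks with real content/finish_reason/usage are preserved, and that nothing is filtered in non-Responses formats; the tests also assert when `specialSettings` is written.
ResponseHandler client-abort complete-success detection `src/app/v1/_lib/proxy/response-handler.ts`	Adds a `clientAbortCompleteSuccess` IIFE in `finalizeDeferredStreamingFinalizationIfNeeded`: when the upstream returned HTTP 200 and the client-abort/non-natural-end conditions hold, usage is parsed from `allContent`; if positive billable tokens exist, the request is marked successful, `effectiveStatusCode` is set to 200, and `errorMessage` is cleared, which avoids a false 499 failure record. Also adjusts `shouldClearSessionBindingOnFailure` and the precondition for non-natural-end handling by adding a `!clientAbortCompleteSuccess` guard.
ResponseHandler client-abort drain timeout mechanism `src/app/v1/_lib/proxy/response-handler.ts`	Introduces a drain timeout window in the client-disconnect handler `bindClientAbortListener`: instead of cancelling internal accounting directly, it starts a timer that, on expiry, terminates the upstream and internal processing via `responseController?.abort()` and `abortController.abort()`; the read loop now checks only `abortController.signal.aborted`, and `clientAbortDrainTimeoutId` is cleared in finally.
ResponseHandler client-abort drain test suite `tests/unit/proxy/response-handler-client-abort-drain.test.ts`	Adds a test file that uses mocked SSE Responses and vitest to verify four client-abort scenarios: downstream closed but upstream completes successfully (task not cancelled), reclassification as success after final usage (classified as reason: "request_success"), a genuine upstream abort error (updated with 499 + CLIENT_ABORTED), and forced abort after the drain timeout; uses fake timers and `drainAsyncTasks()` to ensure async tasks complete and timing is stable.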

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

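ding113/claude-code-hub#1113: That PR refactors the client-abort listener binding and cleanup logic in response-handler.ts, which directly overlaps with this PR's abort drain timer and finally cleanup logic.
ding113/claude-code-hub#759: That PR also modifies the abort/finalization logic in response-handler.ts, along with abortReason and effective status code handling, covering the same area of responsibility as this PR.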

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
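Title check	✅ Passed	The title accurately summarizes the PR's main change: finalizing Responses streams after a client abort.
Linked Issues check	✅ Passed	The code changes fully resolve all linked issues: they fix the incorrect 499 status recorded after client disconnect [`#1083`,`#985`,`#1242`], implemented through clientAbortCompleteSuccess reclassification and Responses SSE filtering.
Out of Scope Changes check	✅ Passed	All code changes are within the scope of the linked issues: client-disconnect drain handling, SSE inert chunk filtering, and related test coverage, with no out-of-scope modifications.
Description check	✅ Passed	The PR description details the fixes (stream accounting after client disconnect, the false 499 reports, SSE filtering) and their linked issues, and fully matches the change set.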


gemini-code-assist

Code Review

This pull request introduces a mechanism to filter out inert or empty chat completion chunks from SSE streams and updates the response handler to treat client-aborted streams as successful if final usage metrics have already been received. Unit tests have been added to verify both features. The review feedback suggests simplifying the isInertChatCompletionChoice helper function by removing a redundant check for the 'content' key, which is already covered by the fallback validation.

gemini-code-assist · 2026-06-06T14:39:34Z

+function isInertChatCompletionChoice(choice: unknown): boolean {
+  if (!isRecord(choice)) return false;
+  if (choice.finish_reason != null) return false;
+
+  const delta = choice.delta;
+  if (!isRecord(delta)) {
+    return true;
+  }
+
+  for (const [key, value] of Object.entries(delta)) {
+    if (key === "role") continue;
+    if (key === "content") {
+      if (hasMeaningfulValue(value)) return false;
+      continue;
+    }
+    if (hasMeaningfulValue(value)) return false;
+  }
+
+  return true;
+}


The key === "content" check inside the loop is redundant because the fallback hasMeaningfulValue(value) check performs the exact same validation and returns false if the value is meaningful. We can simplify the loop by removing this redundant block.

function isInertChatCompletionChoice(choice: unknown): boolean {
  if (!isRecord(choice)) return false;
  if (choice.finish_reason != null) return false;

  const delta = choice.delta;
  if (!isRecord(delta)) {
    return true;
  }

  for (const [key, value] of Object.entries(delta)) {
    if (key === "role") continue;
    if (hasMeaningfulValue(value)) return false;
  }

  return true;
}
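
The redundancy claim can be checked by running both versions over the same inputs. In the sketch below, `isRecord` and `hasMeaningfulValue` are stand-ins written for this example (the PR's real helpers are not shown in this thread), and `original` / `simplified` are illustrative names for the two variants of `isInertChatCompletionChoice`.

```typescript
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Assumed semantics: null, undefined, "", [] and {} carry no meaning.
function hasMeaningfulValue(v: unknown): boolean {
  if (v === null || v === undefined || v === "") return false;
  if (Array.isArray(v)) return v.length > 0;
  if (isRecord(v)) return Object.keys(v).length > 0;
  return true;
}

// Version from the diff, with the explicit "content" branch.
function original(choice: unknown): boolean {
  if (!isRecord(choice)) return false;
  if (choice.finish_reason != null) return false;
  const delta = choice.delta;
  if (!isRecord(delta)) return true;
  for (const [key, value] of Object.entries(delta)) {
    if (key === "role") continue;
    if (key === "content") {
      if (hasMeaningfulValue(value)) return false;
      continue;
    }
    if (hasMeaningfulValue(value)) return false;
  }
  return true;
}

// Suggested version without the "content" branch.
function simplified(choice: unknown): boolean {
  if (!isRecord(choice)) return false;
  if (choice.finish_reason != null) return false;
  const delta = choice.delta;
  if (!isRecord(delta)) return true;
  for (const [key, value] of Object.entries(delta)) {
    if (key === "role") continue;
    if (hasMeaningfulValue(value)) return false;
  }
  return true;
}

const samples: unknown[] = [
  null,
  { finish_reason: "stop", delta: {} },
  { finish_reason: null },
  { finish_reason: null, delta: { role: "assistant", content: "" } },
  { finish_reason: null, delta: { content: "hello" } },
  { finish_reason: null, delta: { tool_calls: [{ id: "call_1" }] } },
];

const disagreements = samples.filter((s) => original(s) !== simplified(s));
console.log(disagreements.length); // 0
```

The two versions agree for any definition of `hasMeaningfulValue`, since the "content" branch and the fallback apply the same test to the same value.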

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ac5195544

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-06T14:41:17Z

+      // Do not cancel internal accounting on pure client disconnect. If the
+      // upstream stream has already completed, the tee'd internal branch can
+      // still drain buffered final usage and record the request as successful.
+      // Idle/response timeout paths still abort via abortController.


Keep aborting unfinished streams on client disconnect

When the downstream client closes before the upstream SSE has finished, this listener now leaves the internal tee running instead of cancelling/aborting it. The internal reader can drain the upstream to done, and finalizeStream is then called with streamEndedNormally=true even though clientAborted=true, so the request is persisted as the upstream 2xx success rather than a 499 and the provider keeps generating a response the client will never receive. This was only intended to rescue already-completed streams, but it changes normal early disconnects into completed billable requests.

coderabbitai

🧹 Nitpick comments (2)

tests/unit/proxy/response-handler-client-abort-drain.test.ts (1)
91-106: ⚡ Quick win

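Missing mock for emitProxyLangfuseTrace

ProxyResponseHandler calls emitProxyLangfuseTrace (from @/lib/langfuse/emit-proxy-trace) during the finalize flow, but this test file does not mock that module.

If the module has external dependencies (such as network calls or config reads), the tests may become flaky or fail unexpectedly. Consider adding:
vi.mock("@/lib/langfuse/emit-proxy-trace", () => ({
  emitProxyLangfuseTrace: vi.fn(),
}));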
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/proxy/response-handler-client-abort-drain.test.ts` around lines
91-106, the test is missing a mock for the emitProxyLangfuseTrace call used by
ProxyResponseHandler.finalize; add a mock for "@/lib/langfuse/emit-proxy-trace"
that exports emitProxyLangfuseTrace as a vi.fn() so the finalize flow won't
perform external work; look for tests referencing ProxyResponseHandler.finalize
and add vi.mock("@/lib/langfuse/emit-proxy-trace", () => ({
emitProxyLangfuseTrace: vi.fn() })) alongside the other vi.mock calls.
src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts (1)
208-262: 💤 Low value

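Optional: consider adding edge-case test coverage

The current test cases verify the core scenario (inert chunks are filtered), but the following edge cases could be added to increase confidence:

Non-inert chunks are not filtered: verify that chunks containing real content are preserved

Chunks with finish_reason are not filtered: verify that chunks carrying a finish marker are preserved

Chunks with usage are not filtered: verify that chunks containing usage data are preserved

No filtering when originalFormat !== "response": verify that the filter logic only applies to the Responses format

That said, the current tests already cover the main path, and the additions above can follow in a later iteration.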
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts` around lines 208
- 262, Add additional unit tests in response-fixer.test.ts that cover the edge
cases the reviewer suggested: create tests that (1) send a chat.completion.chunk
with a non-empty choices[0].delta.content and assert
ResponseFixer.process(session, response).text() still includes that chunk (use
createSession() with session.originalFormat = "response"), (2) send a chunk
containing a finish_reason field and assert it is preserved, (3) send a chunk
containing usage data and assert it is preserved, and (4) create a session with
originalFormat !== "response" and assert that the filtering logic does not run
(i.e., the empty chat chunk remains). For each test reuse the existing pattern
(ReadableStream + SSE lines, ResponseFixer.process, await fixed.text()) and
assert presence/absence of identifiers like "chat.completion.chunk",
"finish_reason", and usage keys to locate behavior around ResponseFixer.process
and createSession.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5647b540-ded8-4997-ba41-aa5e0c546862

📥 Commits

Reviewing files that changed from the base of the PR and between e891e9a and 1ac5195.

📒 Files selected for processing (4)

src/app/v1/_lib/proxy/response-fixer/index.ts
src/app/v1/_lib/proxy/response-fixer/response-fixer.test.ts
src/app/v1/_lib/proxy/response-handler.ts
tests/unit/proxy/response-handler-client-abort-drain.test.ts

github-actions · 2026-06-06T14:53:21Z

+      // Do not cancel internal accounting on pure client disconnect. If the
+      // upstream stream has already completed, the tee'd internal branch can
+      // still drain buffered final usage and record the request as successful.
+      // Idle/response timeout paths still abort via abortController.


[Medium] [PERFORMANCE-ISSUE] Unbounded resource usage when idle timeout is disabled and upstream hangs

Why this is a problem: The client abort handler no longer calls abortController.abort() or AsyncTaskManager.cancel(taskId). The processing loop at line 2620 also no longer checks session.clientAbortSignal?.aborted, relying entirely on abortController being triggered by the idle timeout. However, when streamingIdleTimeoutMs is 0 (disabled), startIdleTimer() returns early and no timeout is ever set. In this configuration, if the client disconnects and the upstream hangs (stops sending data but does not close the connection), reader.read() blocks indefinitely:

The finally block at line 2883 never executes, so the reader lock is never released

Accumulated chunks remain in memory

The processing promise never resolves

AsyncTaskManager.cleanup() at line 2896 never runs

The AsyncTaskManager has a 10-minute stale threshold that calls cancel(), but that aborts the task's own AbortController (returned by register()), not the local abortController that the processing loop checks, so it does not break the loop.

Suggested fix: Add a bounded drain window in the client abort handler that acts as a safety net when idle timeout is disabled:

let drainTimeoutId: NodeJS.Timeout | null = null;

const cleanupClientAbortListener = bindClientAbortListener(session.clientAbortSignal, () => {
  logger.debug("ResponseHandler: Client disconnected, cleaning up", {
    taskId,
    providerId: provider.id,
    messageId: messageContext.id,
  });

  // Allow internal stream to drain for finalization, but cap the drain
  // window to prevent indefinite resource usage when idle timeout is
  // disabled and the upstream hangs.
  const maxDrainMs = idleTimeoutMs === Infinity ? 60_000 : idleTimeoutMs;
  drainTimeoutId = setTimeout(() => {
    if (!abortController.signal.aborted) {
      logger.info("ResponseHandler: Max drain window exceeded after client disconnect", {
        taskId,
        providerId: provider.id,
      });
      abortController.abort();
    }
  }, maxDrainMs);
});

// ... then in the finally block, add:
if (drainTimeoutId) {
  clearTimeout(drainTimeoutId);
}

This reuses the configured idle timeout when available and falls back to 60 seconds when disabled, preserving the drain behavior while bounding resource usage.
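
The same idea can be isolated into a small helper that is easy to unit test. This is a sketch under assumptions, not the merged implementation: `startDrainGuard`, `Schedule`, and `Cancel` are hypothetical names, and the injectable scheduler exists only so the timer can be driven without real time. Passing two controllers mirrors the PR description, which says the drain timer aborts both the upstream responseController and the local accounting loop.

```typescript
type Schedule = (fn: () => void, ms: number) => unknown;
type Cancel = (handle: unknown) => void;

interface DrainGuard {
  // Call from the finally block once the reader loop exits.
  clear(): void;
}

// Start a bounded drain window; when it expires, abort every controller
// that is still live so a hung upstream cannot hold the reader open.
function startDrainGuard(
  controllers: AbortController[],
  maxDrainMs: number,
  schedule: Schedule = (fn, ms) => setTimeout(fn, ms),
  cancel: Cancel = (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
): DrainGuard {
  const handle = schedule(() => {
    for (const controller of controllers) {
      if (!controller.signal.aborted) controller.abort();
    }
  }, maxDrainMs);
  return { clear: () => cancel(handle) };
}
```

In a client-disconnect handler this might be used as `const guard = startDrainGuard([responseController, abortController], 60_000)`, with `guard.clear()` in the finally block so that a stream which drains normally never triggers the abort.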

github-actions

Code Review Summary

This PR introduces two independent features: (1) allowing internal stream accounting to continue after client disconnect so completed upstream Responses streams can be finalized with their real 2xx status and usage, and (2) filtering inert chat.completion.chunk data lines from Responses SSE output. The implementation is well-structured with conservative reclassification logic and good test coverage. One resource-leak concern was identified when idle timeout is disabled.

PR Size: L

Lines changed: 651
Files changed: 4

Split suggestion: This PR contains two independent changes that could be reviewed separately:

Response-fixer SSE filtering (response-fixer/index.ts + test) - filters inert chat completion chunks from Responses streams
Client abort drain (response-handler.ts + test) - allows internal accounting to continue after client disconnect

Issues Found

Category	Critical	High	Medium	Low
Logic/Bugs	0	0	0	0
Security	0	0	0	0
Error Handling	0	0	0	0
Types	0	0	0	0
Comments/Docs	0	0	0	0
Tests	0	0	0	0
Performance	0	0	1	0
Simplification	0	0	0	0

Medium Priority Issues (Should Fix)

Unbounded resource usage when idle timeout disabled (response-handler.ts:2238)
- When streamingIdleTimeoutMs is 0 (disabled) and a client disconnects, the internal processing loop has no safety net to abort if the upstream hangs. The old code called abortController.abort() on client disconnect; the new code intentionally skips this but relies on idle timeout as the sole safety mechanism. When idle timeout is disabled, reader.read() can block indefinitely, preventing the finally block from releasing the reader lock, freeing memory, or cleaning up the AsyncTaskManager entry. The suggested fix is to add a bounded drain window (e.g., 60s fallback) in the client abort handler.

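Review Coverage

- [x] Logic and correctness
- [x] Security (OWASP Top 10)
- [x] Error handling
- [x] Type safety
- [x] Documentation accuracy
- [x] Test coverage
- [x] Code clarity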

Automated review by Claude AI

chatgpt-codex-connector · 2026-06-06T16:25:41Z

💡 Codex Review

claude-code-hub/src/app/v1/_lib/proxy/response-handler.ts

Lines 3881 to 3884 in 0e07d38

    
const actualServiceTier = parseServiceTierFromResponseText(allContent);
const priorityServiceTierApplied =
  (await resolveCodexPriorityBillingDecision(loserSession, actualServiceTier))
    ?.effectivePriority ?? false;

Preserve loser service tier when billing drained Codex losers

When the initial provider loses to an alternative, commitWinner calls syncWinningAttemptSession, which overwrites the shared loserSession with the winner's provider/request; this call then decides Codex priority billing from that polluted session instead of from the loser provider/snapshot. In a hedge where an initial Codex service_tier: "priority" attempt is drained and billed after a non-Codex (or non-priority) alternative wins, effectivePriority becomes false and the loser cost/Redis spend is under-billed even though the loser actually used priority service tier.
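
One way to picture the hazard is a billing decision that reads from a shared, mutable session versus one that reads from a snapshot taken when the attempt started. Everything below is hypothetical and heavily simplified: the `Session` and `BillingSnapshot` shapes and the `isPriorityBilled` rule are invented for illustration and do not reflect how resolveCodexPriorityBillingDecision actually works.

```typescript
// Hypothetical, simplified view of the state the billing decision reads.
interface Session {
  providerType: string;
  serviceTier: string | null;
}

interface BillingSnapshot {
  readonly providerType: string;
  readonly serviceTier: string | null;
}

// Copy the fields needed for billing so later session writes cannot change them.
function snapshotBillingContext(session: Session): BillingSnapshot {
  return Object.freeze({
    providerType: session.providerType,
    serviceTier: session.serviceTier,
  });
}

// Invented rule for this sketch: only Codex priority-tier attempts bill as priority.
function isPriorityBilled(ctx: BillingSnapshot): boolean {
  return ctx.providerType === "codex" && ctx.serviceTier === "priority";
}

// The initial attempt starts on a Codex priority request.
const sharedSession: Session = { providerType: "codex", serviceTier: "priority" };
const loserSnapshot = snapshotBillingContext(sharedSession);

// A different provider wins and the shared session is overwritten.
sharedSession.providerType = "other";
sharedSession.serviceTier = null;

console.log(isPriorityBilled(sharedSession)); // false: decided from the overwritten session
console.log(isPriorityBilled(loserSnapshot)); // true: decided from the loser's own context
```

The point of the snapshot is only that the loser's billing inputs are fixed before the winner is committed; how that maps onto the real session and forwarder types is left open here.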

Brisbanehuang · 2026-06-06T17:01:22Z

This looks like a valid issue in the hedge-loser billing path, but it belongs to the current dev base from #1247 rather than this PR’s changes. This branch only adds client-abort finalization and Responses SSE sanitizing; it does not change the finalizeHedgeLoserBilling block around these lines.

I’ll track this separately so #1251 stays scoped to the 499/client-abort fix.

github-project-automation Bot added this to Claude Code Hub Roadmap Jun 6, 2026

github-project-automation Bot moved this to Backlog in Claude Code Hub Roadmap Jun 6, 2026

github-actions Bot added bug Something isn't working area:core area:OpenAI labels Jun 6, 2026

Brisbanehuang marked this pull request as ready for review June 6, 2026 14:39

gemini-code-assist Bot reviewed Jun 6, 2026

chatgpt-codex-connector Bot reviewed Jun 6, 2026

coderabbitai Bot reviewed Jun 6, 2026

coderabbitai Bot approved these changes Jun 6, 2026

github-actions Bot added the size/L Large PR (< 1000 lines) label Jun 6, 2026

github-actions Bot reviewed Jun 6, 2026

Brisbanehuang added 2 commits June 7, 2026 00:19

fix(proxy): finalize complete responses after client abort

7efbef0

fix(proxy): sanitize inert chat chunks in responses streams

0e07d38

Brisbanehuang force-pushed the codex/cch-client-abort-sse-sanitize branch from 1ac5195 to 0e07d38 Compare June 6, 2026 16:20

Brisbanehuang changed the base branch from main to dev June 6, 2026 16:20

coderabbitai Bot requested a review from ding113 June 6, 2026 16:21

ding113 merged commit 9fcb5ac into ding113:dev Jun 11, 2026
2 checks passed

github-project-automation Bot moved this from Backlog to Done in Claude Code Hub Roadmap Jun 11, 2026

github-actions Bot mentioned this pull request Jun 11, 2026

release v0.8.6 #1273

Open
