fix: false 499 on completed Codex Responses streams#1249
Conversation
Codex CLI closes the connection immediately after receiving the terminal SSE event (response.completed/incomplete). Previously CCH observed the client abort before the upstream reader finished, so successful requests were logged as 499 CLIENT_ABORTED. Root cause: the internal reader that builds allContent breaks early on clientAbort, so allContent is truncated and misses the terminal event. Fix: track Codex terminal state, usage, service_tier and prompt_cache_key incrementally in a TransformStream before tee(), so the data is captured regardless of abort timing. Changes: - response-handler.ts: add createCodexResponsesTerminalStateTracker (incremental SSE parser, last-event-wins, finalize() for trailing data without boundary newline). tracker is the primary source of truth for Codex deferred streams; allContent is the fallback. codexTerminalNonSuccess (failed/error) -> 502 + recordFailure. codexTerminalFinished (completed/incomplete) + clientAbort -> 200. providerType captured in deferred meta at set-time for failover. - stream-finalization.ts: add optional providerType to DeferredStreamingFinalization; normalize at set-time from meta or session provider. - tests: 30 cases covering full terminal-state matrix (completed/failed/incomplete/error/response.error, abort timing, provider-mismatch, multiline SSE, trailing-\n\n-less event, non-Codex unaffected, prompt_cache_key preservation). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📝 WalkthroughWalkthrough该 PR 为 Codex "Responses" SSE 流添加了终态追踪与最终化决策:实现按 SSE 边界解析终态/usage/service_tier/prompt_cache_key 的 tracker,forward-side 实时捕获并把终态注入 deferred finalization,收紧假200检测并调整失败早返回与 session 清理,附带大量单元测试覆盖终态与 abort 时序。 ChangesCodex 终态追踪与流最终化
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a terminal state tracker for Codex responses to robustly handle SSE streams and finalize deferred streaming, especially during client aborts or abnormal stream terminations. It also adds comprehensive unit tests to validate these scenarios. The review feedback focuses on enhancing the robustness of the finalization logic in the terminal state tracker, suggesting a fallback to raw remainder text when parsing data lines and removing unnecessary guards to ensure metrics and cache keys are consistently extracted even if JSON parsing fails.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da9bf030fd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Code Review Summary
This PR implements a robust solution to fix false 499 CLIENT_ABORTED accounting for Codex Responses SSE streams. The implementation is well-architected with comprehensive test coverage.
PR Size: XL
- Lines changed: 1581 (1540 additions, 41 deletions)
- Files changed: 3
Note on PR size: While this PR is XL-sized, it is appropriately scoped. The bulk of the changes (1297 lines) are comprehensive unit tests (30 test cases covering the full terminal-state matrix). The core implementation is ~280 lines of well-structured production code. Given that this fixes a critical accounting bug in stream finalization, the extensive test coverage is justified and demonstrates high quality engineering practice.
Review Coverage
- Logic and correctness - Clean
- Security (OWASP Top 10) - Clean
- Error handling - Clean
- Type safety - Clean
- Documentation accuracy - Clean
- Test coverage - Excellent (30 comprehensive test cases)
- Code clarity - Good
Key Strengths
-
Excellent Test Coverage: 30 unit tests covering all terminal state combinations, timing variations, provider mismatch scenarios, and edge cases (multiline SSE, trailing-newline-less events).
-
Proper Separation of Concerns: The
createCodexResponsesTerminalStateTrackeris placed beforetee()in the stream pipeline, ensuring terminal state is captured independently of internal reader timing. -
Correct Failover Handling: Uses
deferredFinalizationMeta.providerTypeas source of truth, ensuring correct behavior under provider failover scenarios. -
Incremental SSE Parsing: The tracker properly handles incomplete SSE events with buffering and boundary detection, including a
finalize()method for trailing events without closing newlines. -
Last-Event-Wins Semantics: Terminal state correctly applies last-event-wins, allowing a
response.failedto override an earlierresponse.completed. -
Type Safety: Optional
providerTypefield properly added toDeferredStreamingFinalizationwith normalization at set-time.
Validation Performed
- ✅ All 6 review perspectives applied (Comment Analyzer, Test Analyzer, Silent Failure Hunter, Type Design Auditor, General Code Reviewer, Code Simplifier)
- ✅ Validated against false positive filters
- ✅ Verified terminal state logic against SSE parsing implementation
- ✅ Confirmed proper circuit breaker and session binding handling
- ✅ Checked confidence scoring (no issues met 80+ threshold)
Automated review by Claude AI
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/unit/proxy/response-handler-abort-listener-cleanup.test.ts (2)
210-225: ⚡ Quick win让
providerChain的测试桩保持自洽。这里的
getProviderChain()永远返回空数组,而addProviderToChain也不会回写providerChain。如果ProxyResponseHandler在同一次处理里先追加 provider chain 再读取它,这个 helper 会和真实ProxySession行为分叉,后续很容易漏掉链路原因/重试计数相关的回归。建议让addProviderToChain写入同一个数组,并让getProviderChain()返回该数组。Also applies to: 233-237
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/proxy/response-handler-abort-listener-cleanup.test.ts` around lines 210 - 225, The test stub uses a separate immutable providerChain and a getProviderChain() that always returns [] while addProviderToChain doesn't mutate that array; update the stub so providerChain is a single array variable referenced by both getProviderChain and addProviderToChain (i.e., make addProviderToChain push into the same providerChain and have getProviderChain return that array) so the test harness mirrors ProxySession/ProxyResponseHandler behavior; apply the same change to the other identical stub block around the later occurrence (lines noted in the review).
1413-1457: ⚡ Quick win这个
response.errorabort 场景已经被前面的用例覆盖了。Line 1219-1263 已经验证了相同输入和相同断言;保留两份会增加维护成本,也让后续调整错误语义时更容易漏改。建议删掉其一,或者把
response.failed/response.error合并成参数化用例。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/proxy/response-handler-abort-listener-cleanup.test.ts` around lines 1413 - 1457, The test labeled "client abort after response.error terminal event — recorded as 502" duplicates an earlier test that exercises the same input and assertions against ProxyResponseHandler.dispatch, readFirstClientChunk, updateMessageRequestDetails, and session.addProviderToChain; remove this redundant test or convert both into a single parameterized test (covering "response.error" and "response.failed") so the same expectations (502 / CODEX_RESPONSE_ERROR and retry_failed behavior) are asserted once, and adjust any shared setup helpers like makeSession/createHangingStream accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/app/v1/_lib/proxy/response-handler.ts`:
- Around line 553-563: The current logic returns the first prompt_cache_key seen
(in getCodexPromptCacheKeyFromSSEText) and the tracker uses the
nullish-assignment (??=) so the first value wins; change both to implement
last-event-wins: in getCodexPromptCacheKeyFromSSEText iterate all parseSSEData
events and keep the most recent non-null promptCacheKey (do not return on first
match), and update usages that do "foo ??= value" (and the fallback break logic
in the tracker) to overwrite with the newest non-null value (use simple
assignment or logic that prefers the later value and do not break out on the
first hit); reference SessionManager.extractCodexPromptCacheKey,
getCodexPromptCacheKeyFromSSEText, and the tracker fallback block where ??= is
used so the final saved session key is the last event's key.
- Around line 541-547: The handler is treating incomplete/unparseable SSE
fragments as terminal by falling back to event.event; change the logic in parse
loop (using parseSSEData, event.data, eventType,
applyCodexResponsesTerminalEvent, terminalState) so that you only derive
eventType from data.type when data?.type is a string and, if that check fails,
skip applying any terminal event (i.e., continue to next event) instead of using
event.event; apply the same change to the similar block around the other
occurrence (lines referenced near where applyCodexResponsesTerminalEvent is
called again).
---
Nitpick comments:
In `@tests/unit/proxy/response-handler-abort-listener-cleanup.test.ts`:
- Around line 210-225: The test stub uses a separate immutable providerChain and
a getProviderChain() that always returns [] while addProviderToChain doesn't
mutate that array; update the stub so providerChain is a single array variable
referenced by both getProviderChain and addProviderToChain (i.e., make
addProviderToChain push into the same providerChain and have getProviderChain
return that array) so the test harness mirrors ProxySession/ProxyResponseHandler
behavior; apply the same change to the other identical stub block around the
later occurrence (lines noted in the review).
- Around line 1413-1457: The test labeled "client abort after response.error
terminal event — recorded as 502" duplicates an earlier test that exercises the
same input and assertions against ProxyResponseHandler.dispatch,
readFirstClientChunk, updateMessageRequestDetails, and
session.addProviderToChain; remove this redundant test or convert both into a
single parameterized test (covering "response.error" and "response.failed") so
the same expectations (502 / CODEX_RESPONSE_ERROR and retry_failed behavior) are
asserted once, and adjust any shared setup helpers like
makeSession/createHangingStream accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7a9488a4-8f64-40f3-9bb7-b6e92fe645ae
📒 Files selected for processing (3)
src/app/v1/_lib/proxy/response-handler.tssrc/app/v1/_lib/proxy/stream-finalization.tstests/unit/proxy/response-handler-abort-listener-cleanup.test.ts
- Gate fake-200 heuristic on streamEndedNormally instead of streamCompletedForFinalization: empty/truncated allContent on client-abort no longer triggers EMPTY_BODY→502 when the forwarded tracker already confirmed response.completed (chatgpt-codex @763) - finalize(): skip terminal classification when JSON parse fails (truncated mid-event buffer) to avoid misidentifying partial data as a completed state; remove redundant parsedObject guard for usage/service_tier (coderabbit @547/@660, gemini @672) - prompt_cache_key: switch to last-event-wins in getCodexPromptCacheKeyFromSSEText, tracker push(), and allContent fallback loop — consistent with usage/service_tier semantics (coderabbit @563) - Flush TextDecoder before tracker finalize() to drain pending multi-byte UTF-8 sequences that abort skips (greptile @2507) - Add explanatory comment on intentional event.event fallback in getCodexResponsesTerminalState (greptile @668) - Add 3 regression tests: P1 fix verification, truncation safety, last-wins prompt_cache_key (33 tests total, all pass) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
The fix requires some work. User-generated response.failed errors can cause issues with provider shutdown. I'm not sure I can provide a reliable solution. |
Summary
Fixes false `499 CLIENT_ABORTED` accounting for Codex Responses SSE streams. Codex CLI closes the HTTP connection immediately after receiving the terminal `response.completed` event, which CCH previously logged as a client abort — making successful requests look aborted in logs and provider-chain / circuit-breaker accounting.
Closes #1083. Related: #1242, #985.
Problem
For a streaming response CCH `tee()`s the upstream body into two branches: one forwarded to the client, one read internally to accumulate `allContent` (used for usage/cost, fake-200 detection, session binding). The internal reader breaks early when `clientAbortSignal` fires (`response-handler.ts`, stream loop). Codex CLI disconnects right after the terminal SSE event, so:
Solution
Capture the Codex terminal signal independently of the internal reader, on the client-forwarding path before `tee()`, so timing of the client disconnect no longer matters.
Terminal-state matrix
Changes
Testing
Checklist
Greptile Summary
This PR fixes false
499 CLIENT_ABORTEDaccounting for Codex Responses SSE streams. When Codex CLI closes the HTTP connection immediately after receiving the terminalresponse.completedevent, CCH was previously logging successful requests as client aborts because the internal reader broke early before consuming the terminal chunk. The fix inserts aTransformStreambeforetee()that feeds an incremental terminal-state tracker (createCodexResponsesTerminalStateTracker), making the terminal verdict available independently of whether the internal reader consumed all content.response-handler.ts: AddscreateCodexResponsesTerminalStateTrackerwith incremental SSE parsing,finalize()for trailing events without boundary newlines, and uses tracker-first priority for usage/service_tier/prompt_cache_key infinalizeStream;shouldTrackForwardedCodexTerminalStateguards use deferred metaproviderType(correct under failover).stream-finalization.ts: Adds optionalproviderTypefield toDeferredStreamingFinalization, normalized at set-time from meta or current session provider.tests/: 30 new test cases covering all terminal states, abort timing, provider mismatch, multiline SSE data, trailing-newline-less terminal events, and non-Codex isolation.Confidence Score: 4/5
Safe to merge; the fix is well-scoped, non-Codex streams are entirely unaffected, and the 30-case test suite covers the full terminal-state matrix including abort timing and provider failover.
The core mechanism is sound. The one gap is in
finalize(): trailing plain-texterrorevents that arrive without a closing blank line after a priorresponse.completedare silently ignored, recording the request as 200 instead of 502. This is an edge-case but a real correctness gap on the changed path.The
finalize()method insrc/app/v1/_lib/proxy/response-handler.ts(lines 629-676) warrants a second look for its handling of plain-text error events without a trailing blank-line boundary.Important Files Changed
createCodexResponsesTerminalStateTrackerand wires it into the stream forwarding path beforetee();finalize()has a minor correctness gap for trailing plain-text error events without a closing blank line that follow a priorresponse.completed.providerTypetoDeferredStreamingFinalizationand normalizes it at set-time from the meta or current session provider.Sequence Diagram
sequenceDiagram participant US as Upstream participant TS as TransformStream<br/>(+ terminal tracker) participant CS as clientStream<br/>(→ Codex CLI) participant IS as internalStream<br/>(internal reader) participant FS as finalizeStream() participant FD as finalizeDeferredStreaming...() US->>TS: SSE chunk (response.completed\n\n) TS->>TS: tracker.push(chunk) Note over TS: tracker state → "completed" TS->>CS: forward chunk TS->>IS: tee branch CS-->>CS: Codex CLI receives response.completed CS-->>CS: Codex CLI closes connection (abort signal fires) Note over IS: abortController.abort()<br/>internal reader breaks early<br/>allContent may be truncated IS->>FS: "streamEndedNormally=false, clientAborted=true" FS->>FS: "forwardedCodexTerminalDecoder.decode()<br/>tracker.finalize()" FS->>FD: getTerminalState() → "completed" FD->>FD: "codexTerminalFinished=true<br/>streamCompletedForFinalization=true<br/>effectiveStatusCode=200" FD-->>FS: "{effectiveStatusCode: 200, ...}" Note over FS: Request recorded as 200 (not 499)Prompt To Fix All With AI
Reviews (2): Last reviewed commit: "fix: address review comments on codex te..." | Re-trigger Greptile