fix(ws): 完善 Responses WebSocket 会话生命周期与测试覆盖#1154
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthrough引入并传播每连接的 Responses WS 会话 id(x-cch-responses-ws-session),导出 isNextDevMode(nodeEnv),重构 server.js 的 WebSocket 关闭/清理与内部转发语义;在 upstream-adapter 中新增持久化/复用上游 WS 会话及其清理接口;调整转发黑名单与会话提取;新增/迁移大量测试与测试工具,并将示例/文档模型统一为 gpt-5.4。 ChangesResponses WS 边缘与内部转发(server.js)
上游 WebSocket 适配器:持久化会话与复用(upstream-adapter.ts)
转发层与会话提取(forwarder / eligibility / internal-secret)
E2E 与 CCH edge harness(tests/e2e/responses-ws-codex-cli-transport.test.ts)
测试工具:Bash 脚本运行器(tests/helpers/bash.ts)与迁移
文档、本地化与测试夹具:模型名与 K8s 超时示例更新
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Possibly related PRs
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 45a5071873
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } catch { | ||
| // ignore | ||
| } | ||
| requestClose(1008, "too_many_requests"); |
There was a problem hiding this comment.
Abort active upstream request when closing on overflow
Switching this path from ws.close(...) to requestClose(...) changes teardown semantics: requestClose sets closed = true immediately, so the later ws.on("close", finalize) becomes a no-op and never destroys currentInternalReq. In the overflow/binary-close scenario where one response.create is already in flight, the internal /v1/responses HTTP request now continues running after the client is closed, which can burn provider tokens/work the client can no longer receive. Please ensure the active internal request is explicitly aborted when initiating these protocol/error closes.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request refactors WebSocket connection handling in server.js by centralizing the closing logic into a requestClose function. It also introduces a new end-to-end test for Codex CLI transport probing and expands unit tests for WebSocket close-handshake scenarios. Feedback highlights a potential resource leak in the new requestClose implementation where active internal requests are not terminated, and a possible TypeError in the unit tests if environment restoration occurs before initialization.
| } catch { | ||
| // ignore | ||
| } | ||
| requestClose(1003, "binary_not_supported"); |
There was a problem hiding this comment.
| restoreEnvVar("PORT", originalEnv.PORT); | ||
| restoreEnvVar("HOSTNAME", originalEnv.HOSTNAME); | ||
| restoreEnvVar("NODE_ENV", originalEnv.NODE_ENV); |
There was a problem hiding this comment.
There was a problem hiding this comment.
Code Review Summary
This PR improves WebSocket close handling by centralizing close operations through requestClose() and adds comprehensive test coverage including an E2E Codex CLI transport probe. The changes are well-structured and follow existing patterns in the codebase.
PR Size: L
- Lines changed: 536 (510 additions + 26 deletions)
- Files changed: 3
Note: This PR is borderline M/L at 536 lines. Consider splitting future PRs by separating the core fix (server.js changes) from test additions for easier review.
Files Changed
server.js- Refactors WebSocket close paths to use centralizedrequestClose()tests/e2e/responses-ws-codex-cli-transport.test.ts- New opt-in E2E test (410 lines)tests/unit/server-ws-close-handshake.test.ts- Additional unit tests for close scenarios
Review Coverage
- Logic and correctness - Clean. The refactoring to
requestClose()removes code duplication and standardizes error handling paths. - Security (OWASP Top 10) - Clean. No security concerns identified in WebSocket handling changes.
- Error handling - Acceptable. Test files use empty catch patterns consistent with existing codebase conventions.
- Type safety - Clean. TypeScript types are well-defined in the new E2E test file.
- Documentation accuracy - Clean. Comments accurately describe the opt-in E2E test behavior.
- Test coverage - Good. New tests cover binary frame rejection, queue overflow, and transport detection.
- Code clarity - Good. The
requestClose()abstraction improves readability over inline try/catch blocks.
Observations (Non-blocking)
-
E2E Test Design: The
responses-ws-codex-cli-transport.test.tsE2E test is well-designed with:- Environment-gated execution (
CCH_CODEX_E2E=1) - Flexible transport assertion (
any/http/websocket) - Proper resource cleanup with
probe.close()
- Environment-gated execution (
-
Test Coverage: New unit tests validate:
- Binary frame protocol errors (1003 close code)
- Queue overflow handling (1008 close code)
- Synchronous queue clearing on overflow
-
Consistency: The refactoring from inline
try/catch { ws.close() }torequestClose()improves maintainability by centralizing the close logic.
No significant issues identified. PR is ready for merge pending any additional reviewer feedback.
Automated review by Claude AI
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server.js`:
- Around line 291-292: 请求关闭路径在调用 requestClose(1008, ...) 后因将 closed 设为 true 导致
ws.on("close") 回调里的 finalize() 提前返回,从而跳过 currentInternalReq.destroy(),需在
requestClose 内同步中止任何 in-flight 请求。修改 requestClose 函数以在设置 closed = true
之前或紧接其后检查并立即调用 currentInternalReq.destroy()(或调用负责中止的 helper),并确保与 finalize()
中的逻辑不重复使得 double-destroy 安全(比如先判断 currentInternalReq
存在且未被销毁后再销毁);参考符号:requestClose, closed, ws.on("close"), finalize,
currentInternalReq.destroy()。
In `@tests/unit/server-ws-close-handshake.test.ts`:
- Around line 31-35: The variable originalEnv is declared but not initialized,
so if beforeAll throws the afterAll cleanup (which reads
originalEnv.PORT/HOSTNAME/NODE_ENV) can throw a secondary error; initialize
originalEnv at declaration with a safe snapshot (e.g., { PORT: process.env.PORT,
HOSTNAME: process.env.HOSTNAME, NODE_ENV: process.env.NODE_ENV } or explicit
undefined strings) so afterAll can always restore safely, and ensure the
afterAll restoration logic (which reads originalEnv.PORT/ HOSTNAME/ NODE_ENV)
uses those initialized values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e7ccb61a-b1e4-487a-b3fc-f228a6f6ef2d
📒 Files selected for processing (3)
server.jstests/e2e/responses-ws-codex-cli-transport.test.tstests/unit/server-ws-close-handshake.test.ts
| observedTransport: transport, | ||
| events: probe.events, | ||
| stderrTail: result.stderr.slice(-2000), | ||
| }, | ||
| null, | ||
| 2 | ||
| ) |
There was a problem hiding this comment.
sawCleanWsClose reason check is always false
The ws.on('close', (code, reason)) event on the probe's server-side WebSocket fires with the code/reason from the client's echoed close frame — not from the server's ws.close(1000, "response_completed") call. Codex CLI (tungstenite-rs or similar) will echo back code 1000 with an empty reason string, so event.reason === "response_completed" will never be true. This means sawCleanWsClose is always false, causing expect(sawCleanWsClose).toBe(true) to fail whenever transport actually is "websocket" — precisely the case this assertion is meant to guard.
| observedTransport: transport, | |
| events: probe.events, | |
| stderrTail: result.stderr.slice(-2000), | |
| }, | |
| null, | |
| 2 | |
| ) | |
| const sawCleanWsClose = probe.events.some( | |
| (event) => event.type === "ws_close" && event.code === 1000 | |
| ); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/e2e/responses-ws-codex-cli-transport.test.ts
Line: 390-396
Comment:
**`sawCleanWsClose` reason check is always false**
The `ws.on('close', (code, reason))` event on the probe's server-side WebSocket fires with the code/reason from the **client's** echoed close frame — not from the server's `ws.close(1000, "response_completed")` call. Codex CLI (tungstenite-rs or similar) will echo back code 1000 with an empty reason string, so `event.reason === "response_completed"` will never be true. This means `sawCleanWsClose` is always `false`, causing `expect(sawCleanWsClose).toBe(true)` to fail whenever transport actually is `"websocket"` — precisely the case this assertion is meant to guard.
```suggestion
const sawCleanWsClose = probe.events.some(
(event) => event.type === "ws_close" && event.code === 1000
);
```
How can I resolve this? If you propose a fix, please make it concise.| expect(upstreamCalls).toBe(0); | ||
| }); |
There was a problem hiding this comment.
Fixed-delay
setTimeout makes the assertion timing-sensitive
Waiting a hard-coded 50 ms after the close event to assert upstreamCalls === 0 can fail on an overloaded or slow CI runner where the pipelined text frame is still in-flight through Node's event loop at assertion time. Consider using a deterministic signal — e.g., a server-side close listener — instead of a fixed delay. The same pattern appears in the overflow test (150 ms window).
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/unit/server-ws-close-handshake.test.ts
Line: 357-358
Comment:
**Fixed-delay `setTimeout` makes the assertion timing-sensitive**
Waiting a hard-coded 50 ms after the close event to assert `upstreamCalls === 0` can fail on an overloaded or slow CI runner where the pipelined text frame is still in-flight through Node's event loop at assertion time. Consider using a deterministic signal — e.g., a server-side close listener — instead of a fixed delay. The same pattern appears in the overflow test (150 ms window).
How can I resolve this? If you propose a fix, please make it concise.|
补充处理了本轮 review 反馈:
本地验证:
|
|
补充一个模型 ID 清理提交:
验证:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5b7fe3acb7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| await new Promise<void>((resolve) => wss.close(() => resolve())); | ||
| await new Promise<void>((resolve) => server.close(() => resolve())); |
There was a problem hiding this comment.
Terminate probe WS clients before awaiting server close
probe.close() awaits wss.close() directly, but with noServer mode the ws docs state that existing connections are not closed automatically by server.close([callback]). If runCodex() times out and kills the child while a WebSocket is still open, this callback may never fire, so the finally block hangs and the test fails on the outer timeout with a non-diagnostic error. Explicitly terminating wss.clients before awaiting wss.close() avoids this stuck cleanup path.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/`[locale]/usage-doc/page.tsx:
- Around line 1254-1255: The locale entries droid.configuration.switching.steps
still reference the old model name ("GPT-5-Codex [cch]") while the configuration
example changed model_display_name to "GPT-5.4 [cch]"; update the step text
(step 3) in all five locale files (messages/zh-CN/usage.json,
messages/zh-TW/usage.json, messages/en/usage.json, messages/ja/usage.json,
messages/ru/usage.json) so the string mentions "GPT-5.4 [cch]" (and keeps Sonnet
4.5 [cch] as appropriate) to match the new model_display_name under the
droid.configuration.switching.steps key.
In `@tests/e2e/responses-ws-codex-cli-transport.test.ts`:
- Around line 345-351: The observedTransport function currently treats a
ws_message with frameType === "response.create" as the sole indicator of
WebSocket usage; change it to detect WebSocket transport by checking for probe
events with type "ws_connection" or "ws_upgrade" that target the "/v1/responses"
path (use the event.path or equivalent field) and treat frameType only as
diagnostic info—if such ws_connection/ws_upgrade events exist return
"websocket", else if any event.type === "http_responses" return "http",
otherwise "none"; update references to ProbeEvent properties (type, path,
frameType) in observedTransport so the decision is based on
ws_connection/ws_upgrade targeting "/v1/responses" rather than frameType ===
"response.create".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7a2ef977-2d5f-45e9-a961-c69e85759b1f
📒 Files selected for processing (49)
messages/en/settings/prices.jsonmessages/en/settings/providers/form/modelSelect.jsonmessages/en/usage.jsonmessages/ja/settings/prices.jsonmessages/ja/settings/providers/form/modelSelect.jsonmessages/ja/settings/providers/form/strings.jsonmessages/ja/usage.jsonmessages/ru/settings/prices.jsonmessages/ru/settings/providers/form/modelSelect.jsonmessages/ru/settings/providers/form/strings.jsonmessages/ru/usage.jsonmessages/zh-CN/settings/prices.jsonmessages/zh-CN/settings/providers/form/modelSelect.jsonmessages/zh-CN/settings/providers/form/strings.jsonmessages/zh-CN/usage.jsonmessages/zh-TW/settings/prices.jsonmessages/zh-TW/settings/providers/form/modelSelect.jsonmessages/zh-TW/settings/providers/form/strings.jsonmessages/zh-TW/usage.jsonsrc/actions/providers.tssrc/app/[locale]/dashboard/sessions/[sessionId]/messages/_components/session-messages-client-actions.test.tsxsrc/app/[locale]/dashboard/sessions/[sessionId]/messages/_components/session-messages-client.test.tsxsrc/app/[locale]/settings/providers/_components/forms/api-test-button.tsxsrc/app/[locale]/usage-doc/page.tsxsrc/app/v1/_lib/responses-ws/__tests__/server-helpers.test.tssrc/app/v1/_lib/responses-ws/__tests__/upstream-adapter.test.tssrc/lib/model-vendor-icons.test.tssrc/lib/provider-testing/data/cx_base.jsonsrc/lib/provider-testing/data/cx_codex_basic.jsonsrc/lib/provider-testing/presets.tssrc/lib/provider-testing/test-service.test.tssrc/lib/provider-testing/utils/test-prompts.tssrc/lib/session-manager-detail-snapshots.test.tstests/api/v1/providers/providers.read.test.tstests/e2e/responses-ws-codex-cli-transport.test.tstests/integration/non-chat-endpoint-fallback-observability.test.tstests/unit/actions/active-sessions-detail-snapshots.test.tstests/unit/actions/model-prices.test.tstests/unit/codex/session-completer.test.tstests/unit/proxy/actual-response-model.test.tstests/unit/proxy/codex-provider-overrides.test.tstests/unit/proxy/non-chat-endpoint-fallback.test.tstests/unit/proxy/non-chat-endpoint-session-context.test.tstests/unit/proxy/proxy-forwarder-large-chunked-response.test.tstests/unit/proxy/proxy-forwarder-nonok-body-hang.test.tstests/unit/proxy/proxy-forwarder-raw-passthrough-regression.test.tstests/unit/server-ws-close-handshake.test.tstests/unit/settings/providers/api-test-button.test.tsxtests/unit/usage-doc/opencode-usage-doc.test.tsx
✅ Files skipped from review due to trivial changes (34)
- src/lib/provider-testing/data/cx_base.json
- src/lib/provider-testing/data/cx_codex_basic.json
- tests/integration/non-chat-endpoint-fallback-observability.test.ts
- messages/zh-CN/settings/providers/form/strings.json
- src/app/[locale]/settings/providers/_components/forms/api-test-button.tsx
- tests/unit/settings/providers/api-test-button.test.tsx
- messages/ja/settings/providers/form/strings.json
- messages/ja/settings/prices.json
- messages/en/usage.json
- messages/ru/settings/providers/form/strings.json
- messages/zh-TW/settings/prices.json
- messages/ru/usage.json
- tests/unit/proxy/non-chat-endpoint-session-context.test.ts
- src/lib/provider-testing/presets.ts
- messages/zh-TW/settings/providers/form/strings.json
- messages/zh-CN/settings/prices.json
- messages/zh-CN/settings/providers/form/modelSelect.json
- messages/ru/settings/providers/form/modelSelect.json
- messages/zh-TW/settings/providers/form/modelSelect.json
- messages/ru/settings/prices.json
- messages/en/settings/providers/form/modelSelect.json
- messages/ja/usage.json
- tests/unit/proxy/proxy-forwarder-large-chunked-response.test.ts
- tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts
- src/actions/providers.ts
- tests/unit/proxy/actual-response-model.test.ts
- src/lib/model-vendor-icons.test.ts
- messages/ja/settings/providers/form/modelSelect.json
- tests/api/v1/providers/providers.read.test.ts
- src/app/v1/_lib/responses-ws/tests/server-helpers.test.ts
- messages/en/settings/prices.json
- messages/zh-TW/usage.json
- src/lib/session-manager-detail-snapshots.test.ts
- tests/unit/codex/session-completer.test.ts
| "model_display_name": "GPT-5.4 [cch]", | ||
| "model": "gpt-5.4", |
There was a problem hiding this comment.
Droid 配置中 model_display_name 已更新为 "GPT-5.4 [cch]",但各语言 locale 文件中的切换步骤仍指向旧名称。
droid.configuration.switching.steps 在各语言文件(如 messages/zh-CN/usage.json 第 621 行)中写的是 "选择 GPT-5-Codex [cch] 或 Sonnet 4.5 [cch]",而本 PR 将配置示例里的 model_display_name 改为 "GPT-5.4 [cch]"。用户按指南配置好 Droid 之后,实际的模型列表里会出现 GPT-5.4 [cch],但切换步骤却让他们去找 GPT-5-Codex [cch],导致用户困惑。
建议将全部 5 个语言文件(zh-CN、zh-TW、en、ja、ru)中 droid.configuration.switching.steps 的第 3 步同步更新,例如:
- "选择 GPT-5-Codex [cch] 或 Sonnet 4.5 [cch]",
+ "选择 GPT-5.4 [cch] 或 Sonnet 4.5 [cch]",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/`[locale]/usage-doc/page.tsx around lines 1254 - 1255, The locale
entries droid.configuration.switching.steps still reference the old model name
("GPT-5-Codex [cch]") while the configuration example changed model_display_name
to "GPT-5.4 [cch]"; update the step text (step 3) in all five locale files
(messages/zh-CN/usage.json, messages/zh-TW/usage.json, messages/en/usage.json,
messages/ja/usage.json, messages/ru/usage.json) so the string mentions "GPT-5.4
[cch]" (and keeps Sonnet 4.5 [cch] as appropriate) to match the new
model_display_name under the droid.configuration.switching.steps key.
|
已处理 CodeRabbit 新一轮的 2 条 actionable comment,追加提交:
复验:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/responses-ws-codex-cli-transport.test.ts`:
- Around line 248-256: The Windows branch uses execFileSync("where.exe",
["codex.cmd"]) which throws when codex.cmd is not found so the friendly check
never runs; wrap the execFileSync call in a try/catch (or use spawnSync and test
.status) inside the process.platform === "win32" block, and on error or no
result throw the clearer Error("Cannot find codex.cmd on PATH. Install Codex CLI
or set CCH_CODEX_E2E_BIN.") before returning nodeInvocationForWindowsCmd;
reference execFileSync, codex.cmd, nodeInvocationForWindowsCmd and the
process.platform === "win32" branch to locate where to add the try/catch and
error translation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1f79f74d-84ff-4f19-aa30-75f4475b3aef
📒 Files selected for processing (6)
messages/en/usage.jsonmessages/ja/usage.jsonmessages/ru/usage.jsonmessages/zh-CN/usage.jsonmessages/zh-TW/usage.jsontests/e2e/responses-ws-codex-cli-transport.test.ts
✅ Files skipped from review due to trivial changes (1)
- messages/ja/usage.json
🚧 Files skipped from review as they are similar to previous changes (4)
- messages/ru/usage.json
- messages/en/usage.json
- messages/zh-TW/usage.json
- messages/zh-CN/usage.json
|
已处理 CodeRabbit 最新意见,追加提交:
复验:
|
|
本轮继续把 Codex 对照依据:
实现内容:
验证结果:
没有新开 WSL2 虚拟机:当前本机 opt-in E2E 已能覆盖 Codex CLI -> CCH WS tunnel -> upstream WS mock 的全链路;考虑到现有两个 WSL2 VM 都有业务在跑,本轮没有打扰它们。 |
|
@coderabbitai review @codex review |
|
Tip For best results, initiate chat on the files or code changes.
[review] |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 24c52e2826
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const dev = isNextDevMode(process.env.NODE_ENV); | ||
| const hostname = process.env.HOSTNAME || "0.0.0.0"; | ||
| const port = parseInt(process.env.PORT || (dev ? "13500" : "3000"), 10); |
There was a problem hiding this comment.
Preserve non-production default port behavior
Changing dev to only NODE_ENV === "development" also changes the default PORT fallback from 13500 to 3000 whenever NODE_ENV is unset or set to values like test/staging. In this repo, npm/bun run start does not set PORT, while docs/tests and helper defaults still target localhost:13500, so this introduces a silent runtime mismatch where the server starts on a different port than the rest of the project expects.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/responses-ws-codex-cli-transport.test.ts`:
- Around line 1399-1402: The test incorrectly assumes the raw ws_message
ordering when detecting warmup/follow-up; instead filter probe.events for
ws_message frames that are actual response.create events and then inspect those
frames' generate and previousResponseId properties to decide warmup vs
follow-up. Specifically, replace uses of wsMessages (from probe.events.filter(e
=> e.type === "ws_message")) with a filtered list of response.create frames
(e.g., filter where event.type === "ws_message" && event.name ===
"response.create" or where payload/type === "response.create"), then use that
list to check generate and previousResponseId; apply the same change to the
similar block around the other occurrence mentioned (1435-1440).
- Around line 201-227: 当前实现对任何入站 WS 帧都回写一整套
response.created/response.completed,导致收到非请求控制帧(如 session.update)时也会错误地标记 probe
成功;修改 ws.on("message", ...) 中的逻辑:基于已解析的 frameType(变量 frameType)只在 frameType ===
"response.create" 时才执行 responseSeq 增加并调用 responseEvents(...)/ws.send(...)
回写模拟响应,同时保留对 generate(变量 generate)为 false 的判断来控制 includeOutput,这样只有真正的
response.create 请求会得到 response.created/response.completed 的模拟回复。
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ec803e56-296e-4c11-8ca0-afe9985e2a0e
📒 Files selected for processing (1)
tests/e2e/responses-ws-codex-cli-transport.test.ts
| const { randomUUID } = require("node:crypto"); | ||
| const { parse } = require("node:url"); | ||
|
|
||
| const dev = process.env.NODE_ENV !== "production"; | ||
| function isNextDevMode(nodeEnv) { | ||
| return nodeEnv === "development"; | ||
| } | ||
|
|
There was a problem hiding this comment.
Behavioral regression:
dev is now false when NODE_ENV is unset or "test"
The old guard was process.env.NODE_ENV !== "production", so any value other than "production" (including undefined, "test", "staging", etc.) enabled dev mode (Next compiler, port 13500). The new guard nodeEnv === "development" silently flips those envs to production mode (port 3000, expects built artifacts).
Running node server.js without setting NODE_ENV used to serve the dev HMR server on 13500; it now tries to serve a prebuilt bundle on 3000. Any deployment that relied on the old default — Docker images that don't set NODE_ENV=production, CI run scripts, or local npm start — will either fail to start (missing .next build output) or unexpectedly move to port 3000.
Prompt To Fix With AI
This is a comment left during a code review.
Path: server.js
Line: 26-32
Comment:
**Behavioral regression: `dev` is now `false` when `NODE_ENV` is unset or `"test"`**
The old guard was `process.env.NODE_ENV !== "production"`, so any value other than `"production"` (including `undefined`, `"test"`, `"staging"`, etc.) enabled dev mode (Next compiler, port 13500). The new guard `nodeEnv === "development"` silently flips those envs to production mode (port 3000, expects built artifacts).
Running `node server.js` without setting `NODE_ENV` used to serve the dev HMR server on 13500; it now tries to serve a prebuilt bundle on 3000. Any deployment that relied on the old default — Docker images that don't set `NODE_ENV=production`, CI run scripts, or local `npm start` — will either fail to start (missing `.next` build output) or unexpectedly move to port 3000.
How can I resolve this? If you propose a fix, please make it concise.|
@coderabbitai review |
✅ Actions performedReview triggered.
|
| persistentEntry.lastUsedAt = Date.now(); | ||
| if (persistentEntry.idleTimer) { | ||
| clearTimeout(persistentEntry.idleTimer); | ||
| persistentEntry.idleTimer = null; | ||
| } | ||
| ws = existing.ws; | ||
| reused = true; |
There was a problem hiding this comment.
Active-session close in
else branch can abort an in-flight upstream request
The else branch fires for three distinct cases: fingerprint mismatch, WS already closing/closed, and existing.active === true. The first two are safe to close eagerly; the third is not. When existing.active is true a request is mid-flight on that upstream WS, and closePersistentEntry(existing, 1000) will trigger onClose → socketClosed = true → resolve(null) in the message-loop, causing the in-progress SSE stream to end with stream_ended_without_terminal and a 1011 close to the client.
The sequential drain queue normally prevents the same sessionId from being active twice, but the condition can be reached if an abort signal or a prior forgetPersistentSession call lands between active = true being set and finishRequest resetting it. Splitting the condition makes the invariant explicit and prevents accidentally closing in-flight work.
// suggested split:
if (
existing.fingerprint === fingerprint &&
!existing.active &&
!isWsClosingOrClosed(existing.ws)
) {
// reuse
} else if (existing.active) {
// Safety: drain queue guarantees sequential execution per session,
// so active===true here indicates a programming error. Log and fall
// through to creating a fresh connection rather than closing in-flight work.
logger.warn("[ResponsesWsAdapter] unexpected active session on new request – creating fresh", { sessionId });
forgetPersistentSession(sessionId, existing.ws); // unregister without closing
} else {
closePersistentEntry(existing, 1000);
}Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/responses-ws/upstream-adapter.ts
Line: 362-368
Comment:
**Active-session close in `else` branch can abort an in-flight upstream request**
The `else` branch fires for three distinct cases: fingerprint mismatch, WS already closing/closed, and `existing.active === true`. The first two are safe to close eagerly; the third is not. When `existing.active` is `true` a request is mid-flight on that upstream WS, and `closePersistentEntry(existing, 1000)` will trigger `onClose` → `socketClosed = true` → `resolve(null)` in the message-loop, causing the in-progress SSE stream to end with `stream_ended_without_terminal` and a 1011 close to the client.
The sequential drain queue normally prevents the same `sessionId` from being active twice, but the condition can be reached if an abort signal or a prior `forgetPersistentSession` call lands between `active = true` being set and `finishRequest` resetting it. Splitting the condition makes the invariant explicit and prevents accidentally closing in-flight work.
```ts
// suggested split:
if (
existing.fingerprint === fingerprint &&
!existing.active &&
!isWsClosingOrClosed(existing.ws)
) {
// reuse
} else if (existing.active) {
// Safety: drain queue guarantees sequential execution per session,
// so active===true here indicates a programming error. Log and fall
// through to creating a fresh connection rather than closing in-flight work.
logger.warn("[ResponsesWsAdapter] unexpected active session on new request – creating fresh", { sessionId });
forgetPersistentSession(sessionId, existing.ws); // unregister without closing
} else {
closePersistentEntry(existing, 1000);
}
```
How can I resolve this? If you propose a fix, please make it concise.|
补充本轮 review 修复与最终验证结果:
本地验证:
远端检查已通过:Code Quality Check、Docker Build Test、Unit Tests、API Tests、Integration Tests、Test Summary、CodeRabbit、Greptile Review 全部 green。 |
|
本轮已处理 AI review 中与本 PR 修改直接相关且有价值的一条 Greptile P1 建议:
已重新验证:
@coderabbitai review |
|
Tip For best results, initiate chat on the files or code changes.
[review] |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db67f8293e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return persistentSessions.size; | ||
| } | ||
|
|
||
| globalThis.__cchCleanupResponsesWsSession = cleanupResponsesWsSession; |
There was a problem hiding this comment.
Keep session cleanup hook stable across module reloads
Assigning globalThis.__cchCleanupResponsesWsSession directly to this module’s closure means any hot-reload/re-import swaps the hook to a new persistentSessions map, while sessions created by the previous module instance become unreachable by server.js cleanup calls. In Next.js dev/test reload scenarios, client disconnects will no longer close those older upstream sockets immediately, so they linger until the 65-minute idle timer, which can accumulate leaked upstream WS connections.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 29d254121b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| async function pickFreePort(): Promise<number> { | ||
| return new Promise((resolve, reject) => { | ||
| const probe = http.createServer(); | ||
| probe.listen(0, "127.0.0.1", () => { | ||
| const address = probe.address(); | ||
| if (!address || typeof address !== "object") { | ||
| probe.close(); | ||
| reject(new Error("failed to allocate local port")); | ||
| return; | ||
| } | ||
| const port = address.port; | ||
| probe.close(() => resolve(port)); | ||
| }); |
There was a problem hiding this comment.
Avoid TOCTOU when choosing test server ports
pickFreePort() binds an ephemeral port, closes the probe server, and later reuses that numeric port for startIsolatedCchEdgeHarness/fault harness startup. Another process (or another test worker) can claim the port in the gap, causing intermittent EADDRINUSE failures that make this E2E suite flaky under concurrent CI load. Allocate with listen(0) on the actual server you are about to use (or keep the probe socket open until handoff) so reservation and use are atomic.
Useful? React with 👍 / 👎.
| await withTimeout( | ||
| serverConnection.close, | ||
| 3000, | ||
| "server WebSocket did not close after binary protocol close" | ||
| ); | ||
| expect(upstreamCalls).toBe(0); | ||
| }); |
There was a problem hiding this comment.
Race condition in binary-close test:
upstreamCalls === 0 asserted without synchronizing on frame processing
The test awaits serverConnection.close to confirm the server-side WebSocket closed, then immediately checks upstreamCalls === 0. However, serverConnection.close resolves as soon as the ws.on('close') event fires — this happens before handleWebSocketConnection has necessarily finished draining its pending queue. The pipelined text frame is still inside the message event handler's void drain().catch(...) microtask at this point, so upstreamCalls could be 0 for the wrong reason (drain hasn't run yet), and the assertion is not verifying the intended invariant. A deterministic synchronization point (e.g., a server-side signal after drain settles) is needed.
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/unit/server-ws-close-handshake.test.ts
Line: 475-481
Comment:
**Race condition in binary-close test: `upstreamCalls === 0` asserted without synchronizing on frame processing**
The test awaits `serverConnection.close` to confirm the server-side WebSocket closed, then immediately checks `upstreamCalls === 0`. However, `serverConnection.close` resolves as soon as the `ws.on('close')` event fires — this happens before `handleWebSocketConnection` has necessarily finished draining its pending queue. The pipelined text frame is still inside the `message` event handler's `void drain().catch(...)` microtask at this point, so `upstreamCalls` could be 0 for the wrong reason (drain hasn't run yet), and the assertion is not verifying the intended invariant. A deterministic synchronization point (e.g., a server-side signal after drain settles) is needed.
How can I resolve this? If you propose a fix, please make it concise.
Summary
Completes the WebSocket close path fixes for
/v1/responsesinbound bridge by standardizing all close operations through therequestClose()helper. Adds regression tests covering binary frame handling, pipelined frame dropping, and pending queue overflow scenarios. Includes an opt-in E2E transport probe for Codex CLI to document and validate transport behavior.Related Issues
dev分支 + CPA 上游:/v1/responsesWebSocket 连接成功后 Codex 仍报 Connection reset without closing handshake #1150 - Fixes remaining close path gaps identified in PR fix(responses-ws): send close frame after terminal event; raise WS payload cap (#1150) #1153 review feedbackProblem
PR #1153 introduced
requestClose()to ensure proper WebSocket closing handshake, but several error paths still used inlinews.close()try/catch blocks:drain().catch()error handler calledws.close()directly, bypassing the synchronousclosed=trueand pending queue cleanupAdditionally, there was no way to verify whether Codex CLI actually uses WebSocket or HTTP transport when feature flags are enabled.
Solution
Core Changes
server.jsto userequestClose()helper:drain().catch()error handler (code 1011)Testing Improvements
PORT,HOSTNAME,NODE_ENV)E2E Transport Probe
tests/e2e/responses-ws-codex-cli-transport.test.tsCCH_CODEX_E2E_EXPECT_TRANSPORTenv var (any,http,websocket)Changes
server.jsrequestClose()helpertests/unit/server-ws-close-handshake.test.tstests/e2e/responses-ws-codex-cli-transport.test.tsVerification
Test Results
E2E Observations
Local testing with Codex CLI 0.128.0 shows that even with
responses_websocketsandresponses_websockets_v2feature flags enabled, Codex still sends HTTPPOST /v1/responsesinstead of WebSocket upgrade. The E2E probe preserves this evidence and allows future validation when Codex builds support true WebSocket transport.Checklist
Description enhanced by Claude AI
Greptile Summary
This PR completes the WebSocket close-path standardization from #1153 by routing all
ws.close()call sites throughrequestClose(), and introduces the larger feature of persistent upstream WebSocket sessions keyed by a per-client UUID (x-cch-responses-ws-session). Terminal events (success and error) no longer close the client connection; the upstream WS is pooled and reused acrossresponse.createturns for the same client. Theforwarder.tseligibility check is correctly moved tosession.headers(the inbound headers that carry the internal secret) rather thanprocessedHeaders(the filtered outgoing headers).Confidence Score: 4/5
Safe to merge with minor follow-ups; no P0/P1 bugs found beyond already-tracked previous review threads.
All critical close paths are now routed through requestClose(); the persistent session pool logic handles concurrent-busy, fingerprint-mismatch, and eviction cases correctly. The forwarder.ts eligibility bug fix (session.headers vs processedHeaders) is a meaningful correctness improvement. Remaining findings are all P2.
server.js (non-spec status field, missing ws_turn_completed log for JSON path); src/app/v1/_lib/responses-ws/upstream-adapter.ts (permanent close/error listeners on pooled sockets)
Important Files Changed
Sequence Diagram
sequenceDiagram participant C as Codex CLI (client WS) participant S as server.js participant N as Next.js app participant U as Upstream WS (OpenAI) C->>S: WS upgrade /v1/responses S->>S: generate responsesWsSessionId (UUID) C->>S: response.create (turn 1) S->>N: POST /v1/responses + x-cch-responses-ws-session N->>U: WS upgrade (new persistent session) U-->>N: SSE events (response.created … response.completed) N-->>S: SSE stream S-->>C: forward events (WS stays OPEN) N->>N: persistentSessions.set(sessionId, entry) C->>S: response.create (turn 2) S->>N: POST /v1/responses + x-cch-responses-ws-session N->>U: send frame (REUSED upstream WS) U-->>N: SSE events N-->>S: SSE stream S-->>C: forward events (WS stays OPEN) C->>S: WS close(1000) S->>S: finalize() → abortCurrentInternalReq + cleanupUpstreamWsSession S->>U: WS close(1000) via globalThis.__cchCleanupResponsesWsSessionComments Outside Diff (1)
src/app/v1/_lib/responses-ws/upstream-adapter.ts, line 724-731 (link)When
existing.active && !isWsClosingOrClosed(existing.ws)is true, the code only logs a warning and falls through so a fresh WS is opened. Later,registerPersistentSessioncallspersistentSessions.set(sessionId, entry), which overwrites the map while the oldpersistentEntryis still active.When the old request's
finishRequest()runs,persistentSessions.get(sessionId) === persistentEntryis now false (retainedForReuse = false), socloseDetachedEntry = !options?.closeCodebecomestrueand the old WS is closed with code 1000 mid-stream. From the upstream's perspective the connection was closed cleanly during a live response, which can cause a truncated stream rather than a clean terminal event.Prompt To Fix With AI
Prompt To Fix All With AI
Reviews (20): Last reviewed commit: "ci: 恢复 Codex 自动化默认模型" | Re-trigger Greptile