fix(proxy): relax missing-tool-call replay guard to 2MB / 1000 items #584
Merged
The 250KB / 80-item gate was returning 413 for legitimate client-driven full replays from Codex CLI /compact, which routinely fall in the 300-800KB / 100-800 item range. Loops that we actually want to block typically blow past several MB before becoming obvious, so the new thresholds still catch runaway behavior without breaking normal conversation continuation after compaction. The integration test that asserts the guard fires keeps its red/green shape: padding is bumped from 80 to 1010 items so the request still crosses the new item-count threshold.
Summary
Raise the thresholds of the missing-tool-call full-history replay guard on /v1/responses from 250KB / 80 items to 2MB / 1000 items.
Background
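When the user runs /compact, or when a session spans a proxy restart, Codex CLI proactively sends a fallback request containing the full input and no previous_response_id. On the proxy side, implicit-resume looks up session affinity and forcibly attaches a prev_response_id; the call_ids of the function_call_output items in the input do not match the stored functionCallIds → resume=off:missing_tool_calls → the guard fires → 413 → every subsequent client retry is blocked too, and the conversation gets stuck. In practice, these client-driven full replays usually fall in the 300-800KB / 100-800 item range, and the old hard 250KB / 80-item thresholds misclassified every one of them as a runaway storm.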
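A minimal TypeScript sketch of the mismatch, assuming simplified shapes: InputItem, ResumeDecision and decideImplicitResume are hypothetical names for illustration, not the proxy's actual code. It shows why a full replay ends up as resume=off:missing_tool_calls once implicit-resume has already attached a prev_response_id from session affinity.

```typescript
// Hypothetical item shape; the proxy's real input types are not shown in this PR.
interface InputItem {
  type: string;
  call_id?: string;
}

type ResumeDecision =
  | { resume: "on"; previousResponseId: string }
  | { resume: "off"; reason: "missing_tool_calls" };

// Implicit-resume has already picked previousResponseId via session affinity.
// Resuming is only consistent if every function_call_output in the new input
// answers a call_id that was stored for that previous response.
function decideImplicitResume(
  previousResponseId: string,
  storedFunctionCallIds: ReadonlySet<string>,
  input: readonly InputItem[],
): ResumeDecision {
  const orphaned = input.some(
    (item) =>
      item.type === "function_call_output" &&
      item.call_id !== undefined &&
      !storedFunctionCallIds.has(item.call_id),
  );
  return orphaned
    ? { resume: "off", reason: "missing_tool_calls" }
    : { resume: "on", previousResponseId };
}

// A full replay after /compact carries call_ids from the whole history,
// most of which the stored previous response never issued.
const decision = decideImplicitResume("resp_prev", new Set(["call_7"]), [
  { type: "message" },
  { type: "function_call_output", call_id: "call_1" },
  { type: "function_call_output", call_id: "call_7" },
]);
console.log(decision); // { resume: 'off', reason: 'missing_tool_calls' }
```

In this sketch a single unknown call_id is enough to turn resume off, which is what routes the request into the payload guard.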
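The guard is meant to stop a full-replay loop that burns tokens after missing_tool_calls has been triggered by mistake. A loop that is genuinely out of control is usually already multi-MB before it becomes a problem, so the new thresholds still stop it, while legitimate fallbacks are no longer caught.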
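A rough sketch of the gate with the old and new limits, assuming the guard rejects when either limit is crossed. GuardLimits, OLD_LIMITS, NEW_LIMITS and wouldReject are hypothetical; only the numbers come from this PR, and the real check in proxy-handler.ts may differ, for example in how "250K" / "2M" are interpreted or which comparison operator is used.

```typescript
// Limits before and after this PR. Reading "250K" / "2M" as binary units and
// using a strict ">" comparison are assumptions of this sketch.
interface GuardLimits {
  bytes: number;
  items: number;
}

const OLD_LIMITS: GuardLimits = { bytes: 250 * 1024, items: 80 };
const NEW_LIMITS: GuardLimits = { bytes: 2 * 1024 * 1024, items: 1000 };

// Hypothetical helper: once resume is off with missing_tool_calls, a full replay
// that crosses either limit is treated as a runaway loop and rejected with 413.
function wouldReject(limits: GuardLimits, bodyBytes: number, itemCount: number): boolean {
  return bodyBytes > limits.bytes || itemCount > limits.items;
}

// A client-driven replay at the top of the reported 300-800KB / 100-800 item range.
const replayBytes = 800 * 1024;
const replayItems = 800;
console.log(wouldReject(OLD_LIMITS, replayBytes, replayItems)); // true: old gate blocks it
console.log(wouldReject(NEW_LIMITS, replayBytes, replayItems)); // false: new gate forwards it

// The integration test's 1010 padding items still cross the new item limit.
console.log(wouldReject(NEW_LIMITS, 64 * 1024, 1010)); // true
```

With either-limit semantics the test only has to cross one threshold, which is why raising the padding to 1010 items is enough to keep it red/green.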
The root-cause side (implicit-resume should not treat a client-initiated full replay as a resume and try to match it) belongs in a separate PR. This PR is a hotfix.
Changes
- src/routes/shared/proxy-handler.ts: PAYLOAD_GUARD_BYTES 250K → 2M; PAYLOAD_GUARD_ITEMS 80 → 1000. A comment explains why the limits were relaxed.
- tests/integration/proxy-handler.test.ts: the padding in the guard-trigger test goes from 80 items to 1010 items, preserving its red/green semantics.
Test Plan
- npx vitest run tests/integration/proxy-handler.test.ts: 34 pass
- npx vitest run: 2266 pass / 1 skipped / 0 fail (230 files)
Notes
Short-term hotfix. When the client initiates a full replay, implicit-resume should simply drop prev instead of forcing a match (→ mismatch → guard); that root-cause fix is left for a follow-up PR.