fix(websocket): skip stale state merge after client-side compact#2266
fix(websocket): skip stale state merge after client-side compact#2266DragonFSKY wants to merge 3 commits intorouter-for-me:mainfrom
Conversation
After a Codex CLI compact, the client sends a full conversation
transcript (with compaction items or assistant messages) as input.
Previously, normalizeResponseSubsequentRequest() unconditionally
merged this with stale lastRequest/lastResponseOutput, breaking
function_call/function_call_output pairings and causing 400 errors
("No tool output found for function call").
Add inputContainsFullTranscript() heuristic that detects compaction
items (type=compaction/compaction_summary) or assistant messages in
the input array, and bypasses the merge when a full transcript is
present.
Fixes router-for-me#2207
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical bug in the websocket response normalization logic that occurred after client-side conversation compaction. By intelligently detecting when a full conversation transcript is provided, the system now correctly avoids merging it with outdated state, preventing data corruption and ensuring the stability of function call interactions. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly addresses an issue where stale state was being merged with full conversation transcripts from the client after a compaction, leading to errors. The introduction of the inputContainsFullTranscript heuristic to detect and bypass this merge for full transcripts is a solid approach. The new unit tests are comprehensive and cover the detection logic, the merge bypass, and ensure that the normal incremental flow remains unaffected.
I've added a few suggestions to improve code structure and test maintainability, primarily by refactoring to use a switch statement and adopting a table-driven approach for some of the new tests.
| for _, item := range input.Array() { | ||
| t := item.Get("type").String() | ||
| if t == "message" && item.Get("role").String() == "assistant" { | ||
| return true | ||
| } | ||
| if t == "compaction" || t == "compaction_summary" { | ||
| return true | ||
| } | ||
| } |
There was a problem hiding this comment.
For better readability and structure, you can refactor this loop to use a switch statement on the item's type. This makes the logic clearer and is more idiomatic in Go for handling multiple cases based on a string value.
| for _, item := range input.Array() { | |
| t := item.Get("type").String() | |
| if t == "message" && item.Get("role").String() == "assistant" { | |
| return true | |
| } | |
| if t == "compaction" || t == "compaction_summary" { | |
| return true | |
| } | |
| } | |
| for _, item := range input.Array() { | |
| switch item.Get("type").String() { | |
| case "message": | |
| if item.Get("role").String() == "assistant" { | |
| return true | |
| } | |
| case "compaction", "compaction_summary": | |
| return true | |
| } | |
| } |
| for _, typ := range []string{"compaction", "compaction_summary"} { | ||
| input := gjson.Parse(`[{"type":"message","role":"user","content":"hello"},{"type":"` + typ + `","encrypted_content":"summary"}]`) | ||
| if !inputContainsFullTranscript(input) { | ||
| t.Fatalf("expected full transcript for type=%s", typ) | ||
| } | ||
| } |
There was a problem hiding this comment.
To improve test clarity and maintainability, consider converting this test to a table-driven test using t.Run. This will create a separate sub-test for each compaction type, making it easier to identify which case fails if the test breaks in the future.
testCases := []struct {
name string
typ string
}{
{name: "compaction", typ: "compaction"},
{name: "compaction_summary", typ: "compaction_summary"},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
input := gjson.Parse(`[{"type":"message","role":"user","content":"hello"},{"type":"` + tc.typ + `","encrypted_content":"summary"}]`)
if !inputContainsFullTranscript(input) {
t.Errorf("expected full transcript for type=%s", tc.typ)
}
})
}| for _, raw := range []string{ | ||
| `[{"type":"function_call_output","call_id":"call-1","output":"result"}]`, | ||
| `[{"type":"message","role":"user","content":"next question"}]`, | ||
| `[]`, | ||
| } { | ||
| if inputContainsFullTranscript(gjson.Parse(raw)) { | ||
| t.Fatalf("incremental input must not be detected as full transcript: %s", raw) | ||
| } | ||
| } |
There was a problem hiding this comment.
This test can be improved by converting it into a table-driven test with t.Run. This approach makes the test more organized and provides clearer output on failures, as each input case will run as a distinct sub-test.
testCases := []struct {
name string
input string
}{
{name: "function call output", input: `[{"type":"function_call_output","call_id":"call-1","output":"result"}]`},
{name: "user message", input: `[{"type":"message","role":"user","content":"next question"}]`},
{name: "empty input", input: `[]`},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
if inputContainsFullTranscript(gjson.Parse(tc.input)) {
t.Errorf("incremental input must not be detected as full transcript: %s", tc.input)
}
})
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26bed8cc1b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if inputContainsFullTranscript(nextInput) { | ||
| log.Infof("responses websocket: full transcript detected, skipping stale merge (input items=%d)", len(nextInput.Array())) | ||
| mergedInput = nextInput.Raw |
There was a problem hiding this comment.
Keep appending history when new input includes assistant items
This branch now treats any payload with an assistant message as a "full transcript" and skips the merge entirely. That regresses valid response.append / follow-up response.create requests that append an assistant exemplar or repaired assistant turn: the upstream request will contain only the new items, and all prior context from lastRequest/lastResponseOutput is dropped. Assistant messages are still accepted input elsewhere in this repo, so this is not just a compaction-only shape.
Useful? React with 👍 / 👎.
| t := item.Get("type").String() | ||
| if t == "message" && item.Get("role").String() == "assistant" { | ||
| return true |
There was a problem hiding this comment.
Recognize role-only assistant messages in the full-transcript check
inputContainsFullTranscript() only matches assistant items when type == "message", but the rest of the Responses pipeline accepts shorthand message objects with just role/content and no explicit type. In that case a replayed full transcript like [{"role":"assistant",...}] will fall through to the stale-merge path, reintroducing the duplicated history / broken tool-call pairing that this fix is trying to avoid.
Useful? React with 👍 / 👎.
luispater
left a comment
There was a problem hiding this comment.
Thanks for the fix and the focused tests around compact replay behavior.
I’m requesting changes for one correctness risk in the new detection heuristic:
- Blocking: The new
inputContainsFullTranscript()logic marks any assistant-role message as “full transcript,” which causesnormalizeResponseSubsequentRequest()to skip merging prior state. That can drop required historical context in incremental (non-compact) requests that legitimately contain assistant items, resulting in behavior changes outside the compact scenario this PR targets.
Suggested adjustment
- Narrow full-transcript detection to stronger replay signals (for example,
compaction/compaction_summary), or gate assistant-message detection with additional conditions that distinguish replay from incremental append.
Tests
- Please add one regression test where incremental input includes an assistant message but should still preserve prior merged history. Current tests cover compact replay and user/function_call_output incremental flows, but not this ambiguity.
Once this heuristic is tightened, I’m happy to re-review.
|
Thanks, that was a valid regression risk. I tightened the full-transcript detection so we now only treat compact replay markers ( I also added a regression test covering incremental assistant input to make sure prior history is still merged in that case, while keeping the compact replay test in place. Targeted verification: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abbff2f524
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if inputContainsFullTranscript(nextInput) { | ||
| log.Infof("responses websocket: full transcript detected, skipping stale merge (input items=%d)", len(nextInput.Array())) | ||
| mergedInput = nextInput.Raw |
There was a problem hiding this comment.
Do not drop merged history for backends that ignore compaction
When /v1/responses/ws is used with a non-Codex backend after a client-side compact, this branch now sends nextInput as-is and skips the lastRequest/lastResponseOutput merge. I checked the downstream request translators in internal/translator/gemini/openai/responses/gemini_openai-responses_request.go:118 and internal/translator/claude/openai/responses/claude_openai-responses_request.go:180: their switch handles message/function_call/function_call_output only, so compaction and compaction_summary items are silently discarded. In that scenario the upstream request loses the prior conversation/tool state entirely and only sees the fresh user items, whereas the pre-change merge path still preserved the old transcript for those providers.
Useful? React with 👍 / 👎.
Summary
compaction/compaction_summaryitems).normalizeResponseSubsequentRequest()was unconditionally merging this with stalelastRequest/lastResponseOutput, breakingfunction_call/function_call_outputpairings and causing 400 errors ("No tool output found for function call").inputContainsFullTranscript()heuristic to detect full transcripts (compaction items or assistant messages), bypassing the stale merge when present.Test plan
full transcript detected, skipping stale merge (input items=127)in logs with no subsequent 400 errorsFixes #2207