fix(websocket): skip stale state merge after client-side compact #2266

Open
DragonFSKY wants to merge 3 commits into router-for-me:main from DragonFSKY:fix/ws-compact-tool-output-mismatch

Conversation

@DragonFSKY
Contributor

Summary

  • After Codex CLI compact, the client sends a full conversation transcript as input (with compaction/compaction_summary items). normalizeResponseSubsequentRequest() was unconditionally merging this with stale lastRequest/lastResponseOutput, breaking function_call/function_call_output pairings and causing 400 errors ("No tool output found for function call").
  • Added inputContainsFullTranscript() heuristic to detect full transcripts (compaction items or assistant messages), bypassing the stale merge when present.
  • Normal incremental flow (user messages / function_call_output only) is unaffected.
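
The merge-or-bypass decision described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's actual code: the `item` struct, `containsFullTranscript`, `nextUpstreamInput`, and `mustParse` are hypothetical names, and the real handler works on gjson values rather than typed structs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// item is a hypothetical, trimmed-down view of one Responses input item.
type item struct {
	Type   string `json:"type,omitempty"`
	Role   string `json:"role,omitempty"`
	CallID string `json:"call_id,omitempty"`
}

// containsFullTranscript mirrors the heuristic described in the summary:
// a compaction marker or an assistant message is taken to mean that the
// client replayed the whole conversation.
func containsFullTranscript(items []item) bool {
	for _, it := range items {
		switch it.Type {
		case "message":
			if it.Role == "assistant" {
				return true
			}
		case "compaction", "compaction_summary":
			return true
		}
	}
	return false
}

// nextUpstreamInput returns the new input alone when it looks like a full
// transcript, otherwise the cached history followed by the new input.
func nextUpstreamInput(history, next []item) []item {
	if containsFullTranscript(next) {
		return next
	}
	merged := make([]item, 0, len(history)+len(next))
	merged = append(merged, history...)
	return append(merged, next...)
}

// mustParse decodes a JSON array of items and panics on malformed input.
func mustParse(raw string) []item {
	var items []item
	if err := json.Unmarshal([]byte(raw), &items); err != nil {
		panic(err)
	}
	return items
}

func main() {
	history := mustParse(`[{"type":"message","role":"user"},{"type":"function_call","call_id":"call-1"}]`)
	incremental := mustParse(`[{"type":"function_call_output","call_id":"call-1"}]`)
	compacted := mustParse(`[{"type":"message","role":"user"},{"type":"compaction"}]`)

	fmt.Println(len(nextUpstreamInput(history, incremental))) // prints 3: history + new item
	fmt.Println(len(nextUpstreamInput(history, compacted)))   // prints 2: new input only
}
```

In the incremental case the cached `function_call` stays paired with its new `function_call_output`; in the compacted case the stale history is not prepended to a transcript that already contains it.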

Test plan

  • Unit tests: 5 new tests covering detection heuristic + merge bypass + incremental merge preservation
  • Production verification: deployed to staging, triggered a compact via Codex CLI, and confirmed that the log line "full transcript detected, skipping stale merge (input items=127)" appeared with no subsequent 400 errors

Fixes #2207

After a Codex CLI compact, the client sends a full conversation
transcript (with compaction items or assistant messages) as input.
Previously, normalizeResponseSubsequentRequest() unconditionally
merged this with stale lastRequest/lastResponseOutput, breaking
function_call/function_call_output pairings and causing 400 errors
("No tool output found for function call").

Add inputContainsFullTranscript() heuristic that detects compaction
items (type=compaction/compaction_summary) or assistant messages in
the input array, and bypasses the merge when a full transcript is
present.

Fixes router-for-me#2207
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical bug in the websocket response normalization logic that occurred after client-side conversation compaction. By intelligently detecting when a full conversation transcript is provided, the system now correctly avoids merging it with outdated state, preventing data corruption and ensuring the stability of function call interactions.

Highlights

  • Stale State Merge Fix: Addressed an issue where normalizeResponseSubsequentRequest() incorrectly merged stale lastRequest/lastResponseOutput with full conversation transcripts after client-side compaction, causing function_call/function_call_output pairing breakage and 400 errors.
  • Full Transcript Detection: Introduced an inputContainsFullTranscript() heuristic to identify full conversation transcripts (containing compaction items or assistant messages), allowing the system to bypass the problematic stale merge in these cases.
  • Incremental Flow Preservation: Ensured that the normal incremental conversation flow, involving only user messages or function call outputs, remains unaffected by these changes, preserving existing functionality.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses an issue where stale state was being merged with full conversation transcripts from the client after a compaction, leading to errors. The introduction of the inputContainsFullTranscript heuristic to detect and bypass this merge for full transcripts is a solid approach. The new unit tests are comprehensive and cover the detection logic, the merge bypass, and ensure that the normal incremental flow remains unaffected.

I've added a few suggestions to improve code structure and test maintainability, primarily by refactoring to use a switch statement and adopting a table-driven approach for some of the new tests.

Comment on lines +623 to +631
for _, item := range input.Array() {
	t := item.Get("type").String()
	if t == "message" && item.Get("role").String() == "assistant" {
		return true
	}
	if t == "compaction" || t == "compaction_summary" {
		return true
	}
}

medium

For better readability and structure, you can refactor this loop to use a switch statement on the item's type. This makes the logic clearer and is more idiomatic in Go for handling multiple cases based on a string value.

Suggested change
- for _, item := range input.Array() {
- 	t := item.Get("type").String()
- 	if t == "message" && item.Get("role").String() == "assistant" {
- 		return true
- 	}
- 	if t == "compaction" || t == "compaction_summary" {
- 		return true
- 	}
- }
+ for _, item := range input.Array() {
+ 	switch item.Get("type").String() {
+ 	case "message":
+ 		if item.Get("role").String() == "assistant" {
+ 			return true
+ 		}
+ 	case "compaction", "compaction_summary":
+ 		return true
+ 	}
+ }

Comment on lines +677 to +682
for _, typ := range []string{"compaction", "compaction_summary"} {
	input := gjson.Parse(`[{"type":"message","role":"user","content":"hello"},{"type":"` + typ + `","encrypted_content":"summary"}]`)
	if !inputContainsFullTranscript(input) {
		t.Fatalf("expected full transcript for type=%s", typ)
	}
}

medium

To improve test clarity and maintainability, consider converting this test to a table-driven test using t.Run. This will create a separate sub-test for each compaction type, making it easier to identify which case fails if the test breaks in the future.

testCases := []struct {
	name string
	typ  string
}{
	{name: "compaction", typ: "compaction"},
	{name: "compaction_summary", typ: "compaction_summary"},
}

for _, tc := range testCases {
	t.Run(tc.name, func(t *testing.T) {
		input := gjson.Parse(`[{"type":"message","role":"user","content":"hello"},{"type":"` + tc.typ + `","encrypted_content":"summary"}]`)
		if !inputContainsFullTranscript(input) {
			t.Errorf("expected full transcript for type=%s", tc.typ)
		}
	})
}

Comment on lines +687 to +695
for _, raw := range []string{
	`[{"type":"function_call_output","call_id":"call-1","output":"result"}]`,
	`[{"type":"message","role":"user","content":"next question"}]`,
	`[]`,
} {
	if inputContainsFullTranscript(gjson.Parse(raw)) {
		t.Fatalf("incremental input must not be detected as full transcript: %s", raw)
	}
}

medium

This test can be improved by converting it into a table-driven test with t.Run. This approach makes the test more organized and provides clearer output on failures, as each input case will run as a distinct sub-test.

testCases := []struct {
	name  string
	input string
}{
	{name: "function call output", input: `[{"type":"function_call_output","call_id":"call-1","output":"result"}]`},
	{name: "user message", input: `[{"type":"message","role":"user","content":"next question"}]`},
	{name: "empty input", input: `[]`},
}

for _, tc := range testCases {
	t.Run(tc.name, func(t *testing.T) {
		if inputContainsFullTranscript(gjson.Parse(tc.input)) {
			t.Errorf("incremental input must not be detected as full transcript: %s", tc.input)
		}
	})
}

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26bed8cc1b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +311 to +313
if inputContainsFullTranscript(nextInput) {
	log.Infof("responses websocket: full transcript detected, skipping stale merge (input items=%d)", len(nextInput.Array()))
	mergedInput = nextInput.Raw

P2: Keep appending history when new input includes assistant items

This branch now treats any payload with an assistant message as a "full transcript" and skips the merge entirely. That regresses valid response.append / follow-up response.create requests that append an assistant exemplar or repaired assistant turn: the upstream request will contain only the new items, and all prior context from lastRequest/lastResponseOutput is dropped. Assistant messages are still accepted input elsewhere in this repo, so this is not just a compaction-only shape.

Comment on lines +624 to +626
t := item.Get("type").String()
if t == "message" && item.Get("role").String() == "assistant" {
	return true

P2: Recognize role-only assistant messages in the full-transcript check

inputContainsFullTranscript() only matches assistant items when type == "message", but the rest of the Responses pipeline accepts shorthand message objects with just role/content and no explicit type. In that case a replayed full transcript like [{"role":"assistant",...}] will fall through to the stale-merge path, reintroducing the duplicated history / broken tool-call pairing that this fix is trying to avoid.
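
One way to read this suggestion is to treat an item that has a role but no explicit type as a message before checking for assistant turns. The sketch below is only an illustration of that idea with the standard library; `inputItem`, `effectiveType`, and `hasAssistantMessage` are hypothetical names, and the assumption that a missing `type` always means "message" is taken from the comment above rather than verified against the pipeline.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inputItem is a hypothetical, minimal view of a Responses input item.
type inputItem struct {
	Type string `json:"type"`
	Role string `json:"role"`
}

// effectiveType treats an item with a role but no explicit type as a
// message, which is the shorthand form the review comment describes.
func effectiveType(it inputItem) string {
	if it.Type == "" && it.Role != "" {
		return "message"
	}
	return it.Type
}

// hasAssistantMessage reports whether any item is an assistant message,
// counting both the explicit and the shorthand form.
func hasAssistantMessage(raw string) (bool, error) {
	var items []inputItem
	if err := json.Unmarshal([]byte(raw), &items); err != nil {
		return false, err
	}
	for _, it := range items {
		if effectiveType(it) == "message" && it.Role == "assistant" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	found, err := hasAssistantMessage(`[{"role":"assistant","content":"done"}]`)
	fmt.Println(found, err) // prints: true <nil>
}
```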

Collaborator

@luispater luispater left a comment

Thanks for the fix and the focused tests around compact replay behavior.

I’m requesting changes for one correctness risk in the new detection heuristic:

  • Blocking: The new inputContainsFullTranscript() logic marks any assistant-role message as “full transcript,” which causes normalizeResponseSubsequentRequest() to skip merging prior state. That can drop required historical context in incremental (non-compact) requests that legitimately contain assistant items, resulting in behavior changes outside the compact scenario this PR targets.

Suggested adjustment

  • Narrow full-transcript detection to stronger replay signals (for example, compaction / compaction_summary), or gate assistant-message detection with additional conditions that distinguish replay from incremental append.

Tests

  • Please add one regression test where incremental input includes an assistant message but should still preserve prior merged history. Current tests cover compact replay and user/function_call_output incremental flows, but not this ambiguity.
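
A narrowed check along the lines suggested above might look like the sketch below: only explicit compaction markers count as a replay signal, so an incremental request carrying an assistant item still goes through the merge path. The names `replayItem` and `isCompactReplay` are hypothetical, and the sketch uses `encoding/json` instead of the gjson calls in the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// replayItem is a hypothetical, minimal view of a Responses input item.
type replayItem struct {
	Type string `json:"type"`
	Role string `json:"role"`
}

// isCompactReplay reports whether the input carries an explicit compaction
// marker. Assistant messages alone are deliberately not treated as a signal.
func isCompactReplay(raw string) bool {
	var items []replayItem
	if err := json.Unmarshal([]byte(raw), &items); err != nil {
		return false // malformed input: stay on the normal merge path
	}
	for _, it := range items {
		if it.Type == "compaction" || it.Type == "compaction_summary" {
			return true
		}
	}
	return false
}

func main() {
	cases := []struct{ name, raw string }{
		{"compaction marker", `[{"type":"message","role":"user"},{"type":"compaction"}]`},
		{"incremental assistant item", `[{"type":"message","role":"assistant"}]`},
		{"tool output only", `[{"type":"function_call_output","call_id":"call-1"}]`},
	}
	for _, c := range cases {
		fmt.Printf("%s: replay=%v\n", c.name, isCompactReplay(c.raw))
	}
}
```

The second case is the ambiguity the requested regression test targets: an assistant item without a compaction marker should leave the prior merged history in place.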

Once this heuristic is tightened, I’m happy to re-review.

@DragonFSKY
Contributor Author

Thanks, that was a valid regression risk.

I tightened the full-transcript detection so we now only treat compact replay markers (compaction / compaction_summary) as a signal to skip the stale merge. Assistant messages alone no longer bypass the merge path.

I also added a regression test covering incremental assistant input to make sure prior history is still merged in that case, while keeping the compact replay test in place.

Targeted verification:
go test ./sdk/api/handlers/openai -run 'Test(InputContainsFullTranscript|NormalizeSubsequentRequest)'

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abbff2f524

Comment on lines +311 to +313
if inputContainsFullTranscript(nextInput) {
	log.Infof("responses websocket: full transcript detected, skipping stale merge (input items=%d)", len(nextInput.Array()))
	mergedInput = nextInput.Raw

P1: Do not drop merged history for backends that ignore compaction

When /v1/responses/ws is used with a non-Codex backend after a client-side compact, this branch now sends nextInput as-is and skips the lastRequest/lastResponseOutput merge. I checked the downstream request translators in internal/translator/gemini/openai/responses/gemini_openai-responses_request.go:118 and internal/translator/claude/openai/responses/claude_openai-responses_request.go:180: their switch handles message/function_call/function_call_output only, so compaction and compaction_summary items are silently discarded. In that scenario the upstream request loses the prior conversation/tool state entirely and only sees the fresh user items, whereas the pre-change merge path still preserved the old transcript for those providers.
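
One possible direction for this concern, which the thread does not settle, is to gate the bypass on whether the target backend actually consumes compaction items. The sketch below is purely illustrative: the `compactionAware` table, its backend keys, and `shouldSkipMerge` are hypothetical and do not exist in the repository. The only facts taken from the comment are that the Gemini and Claude request translators discard `compaction` and `compaction_summary` items.

```go
package main

import "fmt"

// compactionAware is a hypothetical capability table. Per the review
// comment, the Gemini and Claude translators drop compaction items, so
// they are marked false here; the "codex" entry is an assumption.
var compactionAware = map[string]bool{
	"codex":  true,
	"gemini": false,
	"claude": false,
}

// shouldSkipMerge skips the stale merge only when the input carries a
// compaction marker and the backend is known to understand it. Unknown
// backends default to false and keep the existing merge path.
func shouldSkipMerge(backend string, hasCompactionMarker bool) bool {
	return hasCompactionMarker && compactionAware[backend]
}

func main() {
	for _, backend := range []string{"codex", "gemini", "claude"} {
		fmt.Printf("%s: skip merge=%v\n", backend, shouldSkipMerge(backend, true))
	}
}
```

Keeping the merge for backends that drop compaction items preserves the pre-change behavior for them, but whether that merged history is itself well-formed after a compact is a separate question this sketch does not address.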


Development

Successfully merging this pull request may close these issues.

bug(codex): invalid_request_error after auto compact

2 participants