fix(acp): log session persistence failures instead of discarding them#493
fix(acp): log session persistence failures instead of discarding them#493euxaristia wants to merge 1 commit into
Conversation
persistTurn silently ignored errors from Store.AppendEvent for both the user and assistant turn. A store failure (disk full, permission error, backend error) could corrupt or lose conversation history with no trace that anything went wrong, breaking /rewind, checkpoints, and continuity between runs. Log both failures to stderr instead of discarding them. This is a best-effort minimum fix; surfacing the failure to the transcript/user is left as future work. Closes Gitlawb#471. Split out of Gitlawb#468 at reviewer request. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Walkthrough
ChangesPersist turn error logging
Estimated code review effort: 2 (Simple) | ~10 minutes Related issues: Suggested labels: bug, acp, testing Suggested reviewers: (none specified) 🐰 A quiet error once slipped through the cracks, 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/acp/agent.go (1)
393-408: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueSolid minimal fix — matches the issue's suggested minimum viable fix.
Errors are now captured and logged instead of silently discarded, exactly as issue
#471called for. The two blocks are near-identical (differ only by role/label); consider extracting a small helper to avoid repeating theAppendEvent+Fprintfpattern.♻️ Optional dedup
+func (a *Agent) appendTurnEvent(sessionID, role, content string) { + if _, err := a.deps.Store.AppendEvent(sessionID, sessions.AppendEventInput{ + Type: sessions.EventMessage, + Payload: map[string]any{"role": role, "content": content}, + }); err != nil { + fmt.Fprintf(os.Stderr, "warning: failed to persist %s turn: %v\n", role, err) + } +} + func (a *Agent) persistTurn(sess *acpSession, user, assistant string) { if a.deps.Store != nil { - if _, err := a.deps.Store.AppendEvent(sess.id, sessions.AppendEventInput{ - Type: sessions.EventMessage, - Payload: map[string]any{"role": "user", "content": user}, - }); err != nil { - // best-effort; log at least so we notice history loss - // (real fix would surface to transcript / user) - fmt.Fprintf(os.Stderr, "warning: failed to persist user turn: %v\n", err) - } - if assistant != "" { - if _, err := a.deps.Store.AppendEvent(sess.id, sessions.AppendEventInput{ - Type: sessions.EventMessage, - Payload: map[string]any{"role": "assistant", "content": assistant}, - }); err != nil { - fmt.Fprintf(os.Stderr, "warning: failed to persist assistant turn: %v\n", err) - } - } + a.appendTurnEvent(sess.id, "user", user) + if assistant != "" { + a.appendTurnEvent(sess.id, "assistant", assistant) + } } sess.appendHistory(turnRecord{user: user, assistant: assistant}) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/acp/agent.go` around lines 393 - 408, The user and assistant persistence paths in a.deps.Store.AppendEvent already log failures, but the two near-identical blocks in the session save flow still duplicate the same append-and-warning pattern. Extract a small helper around the AppendEvent call used by internal/acp/agent.go’s persistence logic so both the user and assistant turns share one implementation, while keeping the same stderr warning behavior and role-specific payloads.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/acp/agent_test.go`:
- Around line 178-196: The stderr capture in persistTurn testing leaves
os.Stderr redirected if a t.Fatalf happens before manual restoration, so update
the test around a.persistTurn in internal/acp/agent_test.go to restore os.Stderr
with a deferred cleanup immediately after swapping it to the pipe writer. Keep
the existing os.Pipe/read-capture flow, but ensure the original stderr is always
restored regardless of failures in os.Pipe, write.Close, or io.ReadAll, so the
rest of the test process is not left with a dangling stderr.
---
Nitpick comments:
In `@internal/acp/agent.go`:
- Around line 393-408: The user and assistant persistence paths in
a.deps.Store.AppendEvent already log failures, but the two near-identical blocks
in the session save flow still duplicate the same append-and-warning pattern.
Extract a small helper around the AppendEvent call used by
internal/acp/agent.go’s persistence logic so both the user and assistant turns
share one implementation, while keeping the same stderr warning behavior and
role-specific payloads.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3c2edce6-997a-4d10-a859-1b93e425fb5c
📒 Files selected for processing (2)
internal/acp/agent.gointernal/acp/agent_test.go
| origStderr := os.Stderr | ||
| read, write, err := os.Pipe() | ||
| if err != nil { | ||
| t.Fatalf("os.Pipe: %v", err) | ||
| } | ||
| os.Stderr = write | ||
|
|
||
| a.persistTurn(sess, "hello", "world") | ||
|
|
||
| if err := write.Close(); err != nil { | ||
| t.Fatalf("close pipe writer: %v", err) | ||
| } | ||
| os.Stderr = origStderr | ||
|
|
||
| captured, err := io.ReadAll(read) | ||
| if err != nil { | ||
| t.Fatalf("read captured stderr: %v", err) | ||
| } | ||
| output := string(captured) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
os.Stderr isn't restored if any t.Fatalf fires between redirect and restore.
os.Stderr = write happens at Line 183, but restoration at Line 190 only runs after the write.Close() error check at Line 187-189. t.Fatalf calls runtime.Goexit, not os.Exit, so if write.Close() (or the earlier os.Pipe() at Line 179-182) fails, os.Stderr is left pointed at a closed/dangling pipe for the rest of the test binary — silently breaking output capture or crashing subsequent tests in the package. Restore via defer immediately after the swap, as is standard practice for this pattern.
🔧 Proposed fix
origStderr := os.Stderr
read, write, err := os.Pipe()
if err != nil {
t.Fatalf("os.Pipe: %v", err)
}
os.Stderr = write
+ defer func() { os.Stderr = origStderr }()
a.persistTurn(sess, "hello", "world")
if err := write.Close(); err != nil {
t.Fatalf("close pipe writer: %v", err)
}
- os.Stderr = origStderr
captured, err := io.ReadAll(read)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| origStderr := os.Stderr | |
| read, write, err := os.Pipe() | |
| if err != nil { | |
| t.Fatalf("os.Pipe: %v", err) | |
| } | |
| os.Stderr = write | |
| a.persistTurn(sess, "hello", "world") | |
| if err := write.Close(); err != nil { | |
| t.Fatalf("close pipe writer: %v", err) | |
| } | |
| os.Stderr = origStderr | |
| captured, err := io.ReadAll(read) | |
| if err != nil { | |
| t.Fatalf("read captured stderr: %v", err) | |
| } | |
| output := string(captured) | |
| origStderr := os.Stderr | |
| read, write, err := os.Pipe() | |
| if err != nil { | |
| t.Fatalf("os.Pipe: %v", err) | |
| } | |
| os.Stderr = write | |
| defer func() { os.Stderr = origStderr }() | |
| a.persistTurn(sess, "hello", "world") | |
| if err := write.Close(); err != nil { | |
| t.Fatalf("close pipe writer: %v", err) | |
| } | |
| captured, err := io.ReadAll(read) | |
| if err != nil { | |
| t.Fatalf("read captured stderr: %v", err) | |
| } | |
| output := string(captured) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/acp/agent_test.go` around lines 178 - 196, The stderr capture in
persistTurn testing leaves os.Stderr redirected if a t.Fatalf happens before
manual restoration, so update the test around a.persistTurn in
internal/acp/agent_test.go to restore os.Stderr with a deferred cleanup
immediately after swapping it to the pipe writer. Keep the existing
os.Pipe/read-capture flow, but ensure the original stderr is always restored
regardless of failures in os.Pipe, write.Close, or io.ReadAll, so the rest of
the test process is not left with a dangling stderr.
Vasanthdev2004 asked for this PR to be split since it bundled several independent static-analysis fixes. Remove the specialist depth cap, bash output OOM cap, and persistence error logging: each now has its own PR (Gitlawb#491, Gitlawb#492, Gitlawb#493). Also drop the "cat" addition to the Windows POSIX-utility detection in shell_runtime.go. PR Gitlawb#476 already covers cat detection comprehensively under the MSYS/sandbox angle, so keeping it here would duplicate that work. What remains: the Windows cmd.exe quoting-guidance rewrite, the Windows-specific interactive-command suggestions (steering away from Unix head/tail/ps toward native or PowerShell alternatives), the clipboard PowerShell -Command fix, and the CI test-slack change in exec_command_test.go. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
jatmn
left a comment
There was a problem hiding this comment.
I found a few issues that need to be addressed before this is ready.
Findings
-
[P1] Get the linked issue approved before continuing this community PR
CONTRIBUTING.md:19
The contribution policy says community pull requests must be tied to an issue that has already been reviewed by the core team, with approval shown by theissue-approvedlabel, and that PRs opened before that label may be closed without review. This PR closes#471, but that issue currently has no labels, and this PR is from a contributor account rather than a team-member association. Please get#471marked withissue-approvedor get explicit maintainer approval recorded before continuing the implementation review. -
[P2] Coordinate with the older overlapping fix for the same issue
GitHub PR #487
There is already an older open PR,#487, that also fixes#471in the ACP persistence path and changes the sameinternal/acp/agent.go/internal/acp/agent_test.goarea, with passing CI. That PR also surfaces load and append failures as ACP warning thought chunks, while this PR logs only append failures to stderr. Please coordinate which PR/approach maintainers want to keep, or explain why this narrower PR should proceed independently, so the repository does not review and merge competing implementations for the same bug. -
[P2] Complete CodeRabbit's request to restore stderr with a defer
internal/acp/agent_test.go:183
CodeRabbit's unresolved review item is still valid:TestPersistTurnLogsAppendEventFailuresassignsos.Stderr = writeand only restores it afterwrite.Close()succeeds. Ifwrite.Close()or any latert.Fatalfruns before line 190, the rest of the test process is left with stderr pointing at the pipe instead of the original descriptor. Please complete that review request by restoringos.Stderrwith a deferred cleanup immediately after the swap.
|
Closing in favor of #487, which fixes the same issue (#471) in the same files with a broader approach (surfaces both load and append failures as ACP warnings, not just append failures to stderr) and already has passing CI. Filing a separate PR for the same fix was my mistake, thanks to CodeRabbit for catching the overlap. |
Summary
Agent.persistTurnsilently discarded errors fromStore.AppendEventfor both the user and assistant turn (_, _ = a.deps.Store.AppendEvent(...)). A store failure (disk full, permission error, backend error) could corrupt or lose conversation history with the agent continuing as if nothing happened, breaking/rewind, checkpoints, and continuity between runs.Closes #471
Changes
internal/acp/agent.go:persistTurnnow checks the error returned by eachAppendEventcall and logs a warning to stderr (warning: failed to persist user turn: .../warning: failed to persist assistant turn: ...) instead of discarding it.Test plan
TestPersistTurnLogsAppendEventFailuresininternal/acp/agent_test.go: forcesAppendEventto fail deterministically with an invalid session id, captures stderr via a pipe, and asserts both warning messages are logged.go build ./internal/acp/...go vet ./internal/acp/...go test ./internal/acp/... -count=1(all pass)This was split out of #468 at reviewer request (Vasanthdev2004 asked for the bundled fixes to be split into independent PRs).
🤖 Generated with Claude Code
Summary by CodeRabbit