
fix(acp): log session persistence failures instead of discarding them#493

Closed
euxaristia wants to merge 1 commit into
Gitlawb:mainfrom
euxaristia:fix/persist-turn-error-logging

Conversation

@euxaristia

@euxaristia euxaristia commented Jul 4, 2026

Contributor

Summary

Agent.persistTurn silently discarded errors from Store.AppendEvent for both the user and assistant turn (_, _ = a.deps.Store.AppendEvent(...)). A store failure (disk full, permission error, backend error) could corrupt or lose conversation history with the agent continuing as if nothing happened, breaking /rewind, checkpoints, and continuity between runs.

Closes #471

Changes

  • internal/acp/agent.go: persistTurn now checks the error returned by each AppendEvent call and logs a warning to stderr (warning: failed to persist user turn: ... / warning: failed to persist assistant turn: ...) instead of discarding it.
  • This is a minimal, best-effort fix: it at least logs the failure. Surfacing the failure to the transcript or the user is left as future work, per the issue's suggested fix list.

Test plan

  • Added TestPersistTurnLogsAppendEventFailures in internal/acp/agent_test.go: forces AppendEvent to fail deterministically with an invalid session id, captures stderr via a pipe, and asserts both warning messages are logged.
  • go build ./internal/acp/...
  • go vet ./internal/acp/...
  • go test ./internal/acp/... -count=1 (all pass)

This was split out of #468 at reviewer request (Vasanthdev2004 asked for the bundled fixes to be split into independent PRs).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved turn-saving behavior so message persistence issues are now reported instead of being silently ignored.
    • Added visible warnings when user or assistant message writes fail, while continuing best-effort processing.
    • Expanded test coverage to verify persistence failures are surfaced correctly.

persistTurn silently ignored errors from Store.AppendEvent for both the
user and assistant turn. A store failure (disk full, permission error,
backend error) could corrupt or lose conversation history with no trace
that anything went wrong, breaking /rewind, checkpoints, and continuity
between runs.

Log both failures to stderr instead of discarding them. This is a
best-effort minimum fix; surfacing the failure to the transcript/user is
left as future work.

Closes Gitlawb#471. Split out of Gitlawb#468 at reviewer request.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jul 4, 2026


Walkthrough

persistTurn in internal/acp/agent.go now captures errors returned from Store.AppendEvent for user and assistant turns and logs warnings to stderr on failure, instead of silently discarding them. A new test verifies these warnings are emitted when persistence fails.

Changes

Persist turn error logging

| Layer / File(s) | Summary |
|---|---|
| Capture and log AppendEvent errors (internal/acp/agent.go) | Adds fmt/os imports; persistTurn now checks Store.AppendEvent errors for user and assistant turns and prints warnings to os.Stderr on failure instead of discarding results. |
| Test coverage for logging failures (internal/acp/agent_test.go) | Adds TestPersistTurnLogsAppendEventFailures, which redirects stderr, invokes persistTurn with an invalid session id, and asserts both user- and assistant-turn failure warnings appear. |

Estimated code review effort: 2 (Simple) | ~10 minutes

Related issues: #471 — addresses silent errors in session persistence by logging AppendEvent failures instead of discarding them.

Suggested labels: bug, acp, testing

Suggested reviewers: (none specified)


🐰 A quiet error once slipped through the cracks,
Now warnings hop out with sharp little quacks,
Stderr catches what used to be lost,
Persistence failures no longer the cost,
Tests confirm the bunny's watch never relax.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the core change: logging ACP session persistence failures instead of silently discarding them. |
| Linked Issues check | ✅ Passed | The PR satisfies #471 by logging AppendEvent failures to stderr and adding a test that verifies the warnings. |
| Out of Scope Changes check | ✅ Passed | The changes stay focused on persistence-failure logging and its test coverage, with no clear unrelated additions. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/acp/agent.go (1)

393-408: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Solid minimal fix — matches the issue's suggested minimum viable fix.

Errors are now captured and logged instead of silently discarded, exactly as issue #471 called for. The two blocks are near-identical (differ only by role/label); consider extracting a small helper to avoid repeating the AppendEvent + Fprintf pattern.

♻️ Optional dedup
+func (a *Agent) appendTurnEvent(sessionID, role, content string) {
+	if _, err := a.deps.Store.AppendEvent(sessionID, sessions.AppendEventInput{
+		Type:    sessions.EventMessage,
+		Payload: map[string]any{"role": role, "content": content},
+	}); err != nil {
+		fmt.Fprintf(os.Stderr, "warning: failed to persist %s turn: %v\n", role, err)
+	}
+}
+
 func (a *Agent) persistTurn(sess *acpSession, user, assistant string) {
 	if a.deps.Store != nil {
-		if _, err := a.deps.Store.AppendEvent(sess.id, sessions.AppendEventInput{
-			Type:    sessions.EventMessage,
-			Payload: map[string]any{"role": "user", "content": user},
-		}); err != nil {
-			// best-effort; log at least so we notice history loss
-			// (real fix would surface to transcript / user)
-			fmt.Fprintf(os.Stderr, "warning: failed to persist user turn: %v\n", err)
-		}
-		if assistant != "" {
-			if _, err := a.deps.Store.AppendEvent(sess.id, sessions.AppendEventInput{
-				Type:    sessions.EventMessage,
-				Payload: map[string]any{"role": "assistant", "content": assistant},
-			}); err != nil {
-				fmt.Fprintf(os.Stderr, "warning: failed to persist assistant turn: %v\n", err)
-			}
-		}
+		a.appendTurnEvent(sess.id, "user", user)
+		if assistant != "" {
+			a.appendTurnEvent(sess.id, "assistant", assistant)
+		}
 	}
 	sess.appendHistory(turnRecord{user: user, assistant: assistant})
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/acp/agent.go` around lines 393 - 408, The user and assistant
persistence paths in a.deps.Store.AppendEvent already log failures, but the two
near-identical blocks in the session save flow still duplicate the same
append-and-warning pattern. Extract a small helper around the AppendEvent call
used by internal/acp/agent.go’s persistence logic so both the user and assistant
turns share one implementation, while keeping the same stderr warning behavior
and role-specific payloads.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3c2edce6-997a-4d10-a859-1b93e425fb5c

📥 Commits

Reviewing files that changed from the base of the PR and between f401c66 and 67498fe.

📒 Files selected for processing (2)
  • internal/acp/agent.go
  • internal/acp/agent_test.go

Comment on lines +178 to +196
	origStderr := os.Stderr
	read, write, err := os.Pipe()
	if err != nil {
		t.Fatalf("os.Pipe: %v", err)
	}
	os.Stderr = write

	a.persistTurn(sess, "hello", "world")

	if err := write.Close(); err != nil {
		t.Fatalf("close pipe writer: %v", err)
	}
	os.Stderr = origStderr

	captured, err := io.ReadAll(read)
	if err != nil {
		t.Fatalf("read captured stderr: %v", err)
	}
	output := string(captured)


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

os.Stderr isn't restored if any t.Fatalf fires between redirect and restore.

os.Stderr = write happens at line 183, but the restoration at line 190 only runs after the write.Close() error check at lines 187-189. t.Fatalf calls runtime.Goexit, not os.Exit, so if write.Close() fails, os.Stderr is left pointing at the pipe for the rest of the test binary, silently swallowing output or breaking subsequent tests in the package. (A failure in the earlier os.Pipe() call at lines 179-182 is harmless in this respect, since the swap has not happened yet.) Restore via defer immediately after the swap, as is standard practice for this pattern.
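
The reasoning above rests on how runtime.Goexit treats deferred calls, which can be sketched in isolation. In the example below, `current` is a hypothetical package-level variable standing in for os.Stderr, and `swapAndExit` is an invented helper; neither comes from the PR.

```go
package main

import (
	"fmt"
	"runtime"
)

// current stands in for a process-wide value such as os.Stderr.
var current = "original"

// swapAndExit replaces current inside a goroutine, optionally defers the
// restore, then ends the goroutine with runtime.Goexit, the same mechanism
// t.FailNow (and therefore t.Fatalf) uses to stop a test function.
// It returns the value of current once the goroutine has finished.
func swapAndExit(useDefer bool) string {
	current = "original"
	done := make(chan struct{})
	go func() {
		defer close(done)
		orig := current
		current = "redirected"
		if useDefer {
			defer func() { current = orig }()
		}
		runtime.Goexit() // deferred calls still run; statements below do not
		current = orig   // manual restore, never reached
	}()
	<-done
	return current
}

func main() {
	fmt.Println(swapAndExit(false)) // prints "redirected"
	fmt.Println(swapAndExit(true))  // prints "original"
}
```

Without the deferred restore the swapped value leaks past the goroutine's exit, which is the situation the review comment describes for os.Stderr.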

🔧 Proposed fix
 	origStderr := os.Stderr
 	read, write, err := os.Pipe()
 	if err != nil {
 		t.Fatalf("os.Pipe: %v", err)
 	}
 	os.Stderr = write
+	defer func() { os.Stderr = origStderr }()
 
 	a.persistTurn(sess, "hello", "world")
 
 	if err := write.Close(); err != nil {
 		t.Fatalf("close pipe writer: %v", err)
 	}
-	os.Stderr = origStderr
 
 	captured, err := io.ReadAll(read)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/acp/agent_test.go` around lines 178 - 196, The stderr capture in
persistTurn testing leaves os.Stderr redirected if a t.Fatalf happens before
manual restoration, so update the test around a.persistTurn in
internal/acp/agent_test.go to restore os.Stderr with a deferred cleanup
immediately after swapping it to the pipe writer. Keep the existing
os.Pipe/read-capture flow, but ensure the original stderr is always restored
regardless of failures in os.Pipe, write.Close, or io.ReadAll, so the rest of
the test process is not left with a dangling stderr.

euxaristia added a commit to euxaristia/zero that referenced this pull request Jul 4, 2026
Vasanthdev2004 asked for this PR to be split since it bundled several
independent static-analysis fixes. Remove the specialist depth cap, bash
output OOM cap, and persistence error logging: each now has its own PR
(Gitlawb#491, Gitlawb#492, Gitlawb#493).

Also drop the "cat" addition to the Windows POSIX-utility detection in
shell_runtime.go. PR Gitlawb#476 already covers cat detection comprehensively
under the MSYS/sandbox angle, so keeping it here would duplicate that
work.

What remains: the Windows cmd.exe quoting-guidance rewrite, the
Windows-specific interactive-command suggestions (steering away from
Unix head/tail/ps toward native or PowerShell alternatives), the
clipboard PowerShell -Command fix, and the CI test-slack change in
exec_command_test.go.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

@jatmn jatmn left a comment

Collaborator


I found a few issues that need to be addressed before this is ready.

Findings

  • [P1] Get the linked issue approved before continuing this community PR
    CONTRIBUTING.md:19
    The contribution policy says community pull requests must be tied to an issue that has already been reviewed by the core team, with approval shown by the issue-approved label, and that PRs opened before that label may be closed without review. This PR closes #471, but that issue currently has no labels, and this PR is from a contributor account rather than a team-member association. Please get #471 marked with issue-approved or get explicit maintainer approval recorded before continuing the implementation review.

  • [P2] Coordinate with the older overlapping fix for the same issue
    GitHub PR #487
    There is already an older open PR, #487, that also fixes #471 in the ACP persistence path and changes the same internal/acp/agent.go / internal/acp/agent_test.go area, with passing CI. That PR also surfaces load and append failures as ACP warning thought chunks, while this PR logs only append failures to stderr. Please coordinate which PR/approach maintainers want to keep, or explain why this narrower PR should proceed independently, so the repository does not review and merge competing implementations for the same bug.

  • [P2] Complete CodeRabbit's request to restore stderr with a defer
    internal/acp/agent_test.go:183
    CodeRabbit's unresolved review item is still valid: TestPersistTurnLogsAppendEventFailures assigns os.Stderr = write and only restores it after write.Close() succeeds. If write.Close() fails, its t.Fatalf fires before line 190 and the rest of the test process is left with stderr pointing at the pipe instead of the original descriptor. Please complete that review request by restoring os.Stderr with a deferred cleanup immediately after the swap.

@euxaristia

Contributor Author

Closing in favor of #487, which fixes the same issue (#471) in the same files with a broader approach (it surfaces both load and append failures as ACP warnings, not just append failures to stderr) and already has passing CI. Filing a separate PR for the same fix was my mistake; thanks to jatmn for catching the overlap.

@euxaristia euxaristia closed this Jul 4, 2026

Development

Successfully merging this pull request may close these issues.

acp: silent errors in session persistence can lose history

2 participants