Skip to content

fix: serve runtime persistence callbacks bypass engine deadlines and can stall streams/tool execution#835

Closed
sam-saffron-jarvis wants to merge 1 commit into
SamSaffron:mainfrom
sam-saffron-jarvis:feat/codereview-110a9883
Closed

fix: serve runtime persistence callbacks bypass engine deadlines and can stall streams/tool execution#835
sam-saffron-jarvis wants to merge 1 commit into
SamSaffron:mainfrom
sam-saffron-jarvis:feat/codereview-110a9883

Conversation

@sam-saffron-jarvis

Copy link
Copy Markdown
Contributor

What changed

  • Added an inlinePersistContext helper in cmd/serve_runtime.go that keeps the incoming context's cancellation and deadline while still bounding inline DB writes.
  • Switched the callback-driven persistence paths to use that preserved context for:
    • persistSnapshot
    • appendMessages
    • the in-progress assistant upsert inside upsertPendingAssistantLocked
  • Left the existing cancel-proof deferred/final reconciliation path unchanged, so best-effort post-cancel cleanup still uses its own detached timeout after the engine is done.
  • Added tests that verify inline persistence keeps short parent deadlines and that the snapshot callback's AddMessage path stays within the engine callback-scale deadline.

Why this is high-value

These store writes run inline on the engine's streaming/tool orchestration path. Before this change, they wrapped the callback context with context.WithoutCancel(..., 10s), which discarded the engine's 5s callback timeout and could let a slow or locked session DB stall streaming and delay tool execution for up to 10 seconds per callback.

Preserving the callback context prevents that deadline bypass. The exact failure avoided is: a slow SQLite write in assistant snapshot/response/turn persistence no longer outlives the engine's callback budget and freeze the active run while waiting on DB I/O.

Validation

  • Verified the issue is real in current code by inspecting internal/llm/engine.go and cmd/serve_runtime.go: engine callbacks are created with a 5s callback timeout, but serve runtime callback persistence was re-wrapping those contexts with context.WithoutCancel(..., 10s).
  • Added tests:
    • TestServeRuntimePersistSnapshotPreservesParentDeadline
    • TestServeRuntimeAppendMessagesPreservesParentDeadline
    • TestServeRuntimeSnapshotCallbackPreservesEngineDeadline
  • Ran:
    • gofmt -w cmd/serve_runtime.go cmd/serve_runtime_test.go
    • go build ./...
    • go test ./...
    • git diff --stat
    • git diff --check

@SamSaffron

Copy link
Copy Markdown
Owner

Closing as requested: the high-confidence fixes have been applied or superseded in the current working tree.

@SamSaffron SamSaffron closed this Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants