
feat(tui): subscribe to daemon event stream for instant updates#593

Merged
wesm merged 14 commits into roborev-dev:main from cpcloud:worktree-wondrous-growing-kitten
Mar 30, 2026

Conversation

cpcloud (Contributor) commented Mar 30, 2026

Summary

  • The TUI now subscribes to the daemon's NDJSON event stream (/api/stream/events) via a background goroutine, triggering immediate data refreshes instead of waiting for the 2-10s poll cycle
  • The daemon broadcasts review.closed, review.reopened, and job.enqueued events from the corresponding API handlers — previously only worker lifecycle events (review.started/completed/failed/canceled) were broadcast
  • Polling is retained as a 15s fallback (was adaptive 2s/10s) for robustness against SSE connection drops, while the SSE goroutine itself reconnects with exponential backoff
  • The SSE subscription restarts on any successful daemon reconnect (not just endpoint changes), handles in-flight fetch coalescing via a pending-refresh flag, and validates HTTP status before decoding the stream

🤖 Generated with Claude Code

cpcloud and others added 12 commits March 30, 2026 06:42
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Subscribe to daemon NDJSON event stream for instant refresh when
reviews are closed externally. Falls back to existing polling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Close old sseCh on reconnect to let orphaned waitForSSE exit cleanly
- Reset backoff after successful connection (track connected state)
- Use atomic.Int32 for thread-safe test counter
- Remove unused eventWritten channel from test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- waitForSSE now selects on both sseCh and sseStop, so closing sseStop
  cleanly unblocks waiters without closing the data channel (which
  would race with the producer goroutine)
- Run() uses the final model from p.Run() for cleanup, avoiding
  double-close when reconnect already closed the original sseStop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
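
The two-channel select described in this commit can be sketched roughly like this; the free-function signature is a simplification (the PR's `waitForSSE` is presumably a method), but the channel discipline — close `sseStop`, never `sseCh` — is the point:

```go
package main

import "fmt"

// waitForSSE blocks until an event arrives on sseCh or sseStop is
// closed. Closing sseStop (never sseCh) releases waiters without
// racing the producer goroutine that still sends on sseCh.
func waitForSSE(sseCh <-chan string, sseStop <-chan struct{}) (string, bool) {
	select {
	case ev := <-sseCh:
		return ev, true
	case <-sseStop:
		return "", false
	}
}

func main() {
	sseCh := make(chan string, 1)
	sseStop := make(chan struct{})

	sseCh <- "review.closed"
	ev, ok := waitForSSE(sseCh, sseStop)
	fmt.Println(ev, ok) // review.closed true

	close(sseStop) // reconnect path: release any blocked waiter
	_, ok = waitForSSE(sseCh, sseStop)
	fmt.Println(ok) // false
}
```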
- MEDIUM: handleSSEEventMsg now sets ssePendingRefresh instead of
  dropping events when loadingJobs/loadingMore is true. The flag is
  consumed in handleJobsMsg/handleJobsErrMsg to trigger a follow-up
  fetch, preventing stale data.
- LOW: sseReadLoop validates resp.StatusCode before decoding, treating
  non-200 responses as reconnectable errors instead of decoding junk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
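
A minimal sketch of the pending-refresh coalescing described above, with the model reduced to just the flags involved. The counter stands in for dispatching fetch commands; the real handlers return Bubble Tea commands instead.

```go
package main

import "fmt"

// model holds only the coalescing state from the commit message: an
// event arriving while a fetch is in flight sets a pending flag
// instead of starting an overlapping fetch.
type model struct {
	loadingJobs       bool
	ssePendingRefresh bool
	fetches           int // stand-in for dispatched fetchJobs calls
}

// handleSSEEvent starts a refresh if idle, otherwise defers it.
func (m *model) handleSSEEvent() {
	if m.loadingJobs {
		m.ssePendingRefresh = true
		return
	}
	m.loadingJobs = true
	m.fetches++
}

// handleJobsMsg marks the fetch complete and drains the pending flag,
// triggering one follow-up fetch so no event's data is left stale.
func (m *model) handleJobsMsg() {
	m.loadingJobs = false
	if m.ssePendingRefresh {
		m.ssePendingRefresh = false
		m.loadingJobs = true
		m.fetches++
	}
}

func main() {
	var m model
	m.handleSSEEvent() // idle: starts fetch #1
	m.handleSSEEvent() // in flight: coalesced into the flag
	m.handleSSEEvent() // still coalesced, no extra fetch
	m.handleJobsMsg()  // drains the flag: starts fetch #2
	fmt.Println(m.fetches)
}
```

Three events during one in-flight fetch collapse into a single follow-up, so the result is two fetches total rather than four.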
TUI now refreshes instantly when new reviews are queued,
not just when they start/complete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Now that the TUI subscribes to daemon SSE events for real-time updates
(enqueue, start, complete, fail, cancel, close), the aggressive 2s
polling during active jobs is unnecessary. Replace the adaptive
2s/10s intervals with a single 15s fallback that catches anything SSE
misses (connection drops, reconnect gaps, un-broadcast state changes).

The displayTick (1s) for cosmetic repaints (elapsed counters, flash
expiry) is unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…event type

- MEDIUM: consumeSSEPendingRefresh now also called from
  handlePaginationErrMsg so events during pagination aren't dropped
- LOW: consumeSSEPendingRefresh triggers full refresh set (fetchJobs +
  fetchStatus + fetchFixJobs) instead of just fetchJobs
- LOW: handleCloseReview broadcasts review.reopened when Closed=false,
  reserving review.closed for actual close operations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…chine

- Add TestHandleCloseReview_BroadcastsReopenEvent verifying Closed=false
  emits review.reopened
- Add TestSSEPendingRefreshStateMachine covering: flag set during
  loadingJobs, drained on jobs completion; flag set during loadingMore,
  drained on pagination; flag not set when no fetch in flight

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Backoff growth now uses min(backoff*2, maxBackoff) to prevent
  overshooting the 30s cap (was growing 16s -> 32s)
- Replace t.Fatal with require.FailNow in select timeout branches
  per project test conventions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…change

The SSE goroutine could be stuck in exponential backoff after a
same-address daemon restart, leaving the TUI without real-time
updates even though polling had already reconnected. Now
handleReconnectMsg restarts the SSE subscription on any successful
reconnect regardless of whether the endpoint address changed.

LOW finding about newModel starting a goroutine as a constructor
side-effect is noted but intentionally not changed — all test callers
use withExternalIODisabled() which skips the goroutine, and moving
startup to Init would require a larger lifecycle refactor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (0051a75)

Summary verdict: Changes are mostly sound, but there is one medium-severity correctness issue in the TUI SSE refresh coalescing.

Medium

  • cmd/roborev/tui/handlers_msg.go around handleSSEEventMsg: The non-loading branch starts fetchJobs() / fetchStatus() without first setting m.loadingJobs = true. If another stream event arrives before that refresh finishes, handleSSEEventMsg can enter the same branch again instead of setting ssePendingRefresh, which defeats the coalescing logic and can trigger overlapping duplicate refreshes.
    Fix: set m.loadingJobs = true before batching the refresh commands here, consistent with handleReconnectMsg and consumeSSEPendingRefresh(), so later stream events are deferred while the refresh is in flight.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm and others added 2 commits March 30, 2026 08:29
…ical ref on enqueue

review.closed/review.reopened events were broadcast with only JobID,
causing repo-filtered SSE subscribers to never receive them. Load job
metadata via GetJobByID and populate Repo, RepoName, SHA, Agent.

job.enqueued event used the request's original gitRef (possibly a
symbolic ref like HEAD) instead of the stored job.GitRef (resolved SHA),
causing inconsistent ref values across the job lifecycle event stream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set m.loadingJobs = true in handleSSEEventMsg before dispatching
  fetchJobs, so subsequent SSE events are properly deferred instead
  of triggering duplicate overlapping refreshes
- Fix TestHandleEnqueue_BroadcastsEvent path assertion to handle
  macOS symlink resolution (/var -> /private/var) and Windows path
  normalization (short names, forward slashes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (90471c2)

Verdict: No Medium, High, or Critical findings identified.

The reviewed outputs did not surface any issues at Medium severity or above.

One reviewer noted a Low-severity reconnect-backoff issue in cmd/roborev/tui/sse.go:29, but it is omitted here per the reporting threshold.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (966be34)

Summary verdict: changes look solid overall, with one medium-severity concurrency/state-ordering issue to address before merge.

Medium

  • cmd/roborev/tui/handlers_msg.go:713
    The new SSE refresh coalescing only treats loadingJobs and loadingMore as the active in-flight work, but each refresh also triggers fetchStatus() and sometimes fetchFixJobs(). consumeSSEPendingRefresh() can therefore clear the pending flag and start a newer refresh before the older status/fix-jobs requests finish. That allows older responses to land after newer ones and overwrite fresher UI state.
    Suggested fix: gate the full refresh batch with a single generation/token, or keep the pending-refresh lock closed until all requests spawned by the current refresh, including status/fix-jobs, have completed.
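
The generation-token gating the reviewer suggests might look like this sketch. All names here are hypothetical, and the real model would thread the token through Bubble Tea messages rather than call handlers directly:

```go
package main

import "fmt"

// model tags each refresh with a generation number; responses carrying
// an older generation are dropped so stale data cannot overwrite
// fresher UI state.
type model struct {
	gen int // incremented on every refresh dispatch
}

// statusMsg is a hypothetical response message carrying the generation
// of the refresh that requested it.
type statusMsg struct {
	gen  int
	data string
}

// startRefresh begins a new refresh and returns its generation, which
// would be captured by every fetch command in the batch.
func (m *model) startRefresh() int {
	m.gen++
	return m.gen
}

// handleStatus applies a response only if it belongs to the newest
// refresh generation.
func (m *model) handleStatus(msg statusMsg, current *string) {
	if msg.gen != m.gen {
		return // stale response from an older refresh; drop it
	}
	*current = msg.data
}

func main() {
	var m model
	var status string
	g1 := m.startRefresh() // older refresh, still in flight
	g2 := m.startRefresh() // newer refresh supersedes it
	m.handleStatus(statusMsg{gen: g2, data: "fresh"}, &status)
	m.handleStatus(statusMsg{gen: g1, data: "stale"}, &status) // dropped
	fmt.Println(status)
}
```

Even though the older response arrives last, it is discarded, which is exactly the overwrite the finding describes.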

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

cpcloud (Contributor, Author) commented Mar 30, 2026

This is a pre-existing pattern, not a regression from this PR. handleTickMsg (line 843 on main) also fires fetchJobs() + fetchStatus() in one batch with no generation gating — handleStatusMsg has no sequence number and just overwrites. The SSE handler follows the same approach.

Preventing stale status/fix-jobs responses from overwriting fresher data would be a good improvement but would need to touch the existing polling infrastructure too. Filed #596 for follow-up.

wesm (Collaborator) commented Mar 30, 2026

Merging this! I have a fix coming for a TUI rendering issue I spotted while trying this out.

wesm merged commit d50e2c5 into roborev-dev:main Mar 30, 2026
7 of 8 checks passed
cpcloud deleted the worktree-wondrous-growing-kitten branch March 30, 2026 15:47