fix(tui): prevent stale status/fix-jobs responses from overwriting fresher data#598

Open
cpcloud wants to merge 10 commits into roborev-dev:main from cpcloud:worktree-snazzy-hatching-backus

Conversation

@cpcloud
Contributor

@cpcloud cpcloud commented Mar 30, 2026

Summary

Closes #596.

  • Add loadingStatus and loadingFixJobs in-flight guard booleans matching the existing loadingJobs pattern, preventing overlapping tick-based fetches from racing
  • Introduce startFetchStatus() and startFetchFixJobs() helpers that check the loading flag before dispatching, used at all call sites (tick, reconnect, view switch, fix trigger, patch apply)
  • Add statusErrMsg type so status fetch errors clear the loading flag instead of going through the generic errMsg handler
  • Add requestFetchFixJobs() for mutation handlers (fix enqueue, patch apply) that sets a fixJobsStale deferred-refresh flag when a fetch is already in flight — handleFixJobsMsg dispatches a follow-up fetch when this flag is set
  • Force-reset loadingFixJobs on reconnect so a stale in-flight request from the dead connection doesn't block all future fix-jobs fetches

🤖 Generated with Claude Code

cpcloud and others added 5 commits March 30, 2026 13:44
…esher data (roborev-dev#596)

Add loadingStatus and loadingFixJobs in-flight guards matching the
existing loadingJobs pattern. When a fetch is already in flight,
handleTickMsg skips dispatching another, preventing overlapping
requests from racing and overwriting fresher data with stale responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add startFetchStatus and startFetchFixJobs helpers that check the
loading flag before dispatching. Use them at all call sites — tick
handler, reconnect, view switch, fix trigger, and patch apply — so
no path can launch a concurrent request while one is already in flight.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tations

On reconnect, force-reset loadingFixJobs and dispatch a fresh fetch
(matching the existing status handling) so a stale in-flight request
from the dead connection doesn't block all future fix-jobs fetches.

For state-mutating handlers (fix enqueue, patch apply, rebase), use
requestFetchFixJobs which sets a fixJobsStale flag when a fetch is
already in flight. handleFixJobsMsg checks this flag and dispatches
a follow-up fetch to pick up post-mutation state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The fixJobsStale check already covers both success and error paths
(it's outside the if/else), but add an explicit test to prove it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Route fetchStatus/fetchFixJobs in handleSSEEventMsg and
consumeSSEPendingRefresh through the startFetch helpers, consistent
with the tick and reconnect paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cpcloud cpcloud force-pushed the worktree-snazzy-hatching-backus branch from 5e9a2d1 to f90e1a1 on March 30, 2026 at 17:47
@roborev-ci

roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (f90e1a1)

Verdict: Changes improve TUI fetch coordination, but they still leave medium-severity stale-state races in status/fix-job refresh handling.

Medium

  1. Stale refreshes can be dropped while a fetch is in flight

    • Location: cmd/roborev/tui/handlers_msg.go:729, cmd/roborev/tui/handlers_msg.go:751, cmd/roborev/tui/handlers_msg.go:864
    • The new startFetchStatus / startFetchFixJobs guards suppress SSE- and tick-driven refreshes whenever a request is already in flight, but suppressed refreshes are not reliably replayed. fixJobsStale is only set on local mutation paths, and status has no equivalent pending-refresh tracking. If the in-flight request completes with pre-change data, the UI can keep showing stale status or fix-job state until a later poll or event happens.
    • Suggested fix: Track a pending refresh whenever an SSE/tick refresh is skipped, then trigger a follow-up fetch when the current request completes, or add request generations and ignore older responses.
  2. Reconnect can be overwritten by late pre-reconnect responses

    • Location: cmd/roborev/tui/handlers_msg.go:805
    • handleReconnectMsg forces fresh status and fix-job fetches, but it does not invalidate earlier in-flight requests. Because statusMsg / statusErrMsg and fixJobsMsg are not tied to a request generation, a late response or error from before reconnect can overwrite the newer post-reconnect state or incorrectly retrigger connection-error handling.
    • Suggested fix: Add per-request generations/tokens and discard stale responses after reconnect, or otherwise prevent older requests from updating state.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

SSE events signal daemon state changes, so use requestFetchStatus and
requestFetchFixJobs in the SSE handlers. When a fetch is already in
flight, these set a stale flag so handleStatusMsg/handleFixJobsMsg
dispatch a follow-up fetch to pick up post-event state.

Also clear statusStale on reconnect to avoid stale follow-ups from
the dead connection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci

roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (e751b17)

Verdict: Changes are close, but one medium-severity correctness issue remains in the reconnect path.

Medium

  • cmd/roborev/tui/handlers_msg.go:812: handleReconnectMsg starts fresh fetchStatus() and fetchFixJobs() requests on reconnect even if older requests are still in flight, but the corresponding statusMsg and fixJobsMsg handlers still lack a request generation/sequence check. An older pre-reconnect response can therefore arrive later and overwrite newer state, reintroducing the stale/out-of-order update bug on reconnect.
    Suggested fix: add a monotonically increasing generation/sequence to status and fix-jobs requests, and ignore responses from older generations.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Increment a monotonic fetchGen counter on reconnect and embed it in
statusMsg, statusErrMsg, and fixJobsMsg. Response handlers discard
messages from older generations, preventing a late pre-reconnect
response from overwriting post-reconnect state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci

roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (8869dbc)

Verdict: Changes are close, but there is one medium-severity race in the reconnect stale-response handling that should be fixed before merge.

Medium

  • cmd/roborev/tui/handlers_msg.go:225

  • cmd/roborev/tui/handlers_msg.go:439

  • cmd/roborev/tui/handlers_msg.go:957

    The stale-response guard is applied too late in handleStatusMsg, handleFixJobsMsg, and handleStatusErrMsg: loadingStatus / loadingFixJobs are cleared before checking msg.gen < m.fetchGen. After a reconnect, an old pre-reconnect response can therefore clear the active in-flight guard for the new generation, allowing duplicate fetches on the next tick/SSE event and reopening the stale-overwrite race this patch is trying to prevent.

    Fix by checking msg.gen < m.fetchGen before mutating the loading flags, or by tying loading-flag clearing to a request-specific token so only the active fetch can reset it.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

cpcloud and others added 2 commits March 30, 2026 15:55
…flags

Increment fetchSeq on reconnect so pre-reconnect job responses are
also discarded (jobs already had seq-based staleness checking).

Move loadingStatus/loadingFixJobs flag clearing after the fetchGen
check so stale pre-reconnect responses don't clear the in-flight
guard for the current-generation request.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reset loadingMore during reconnect so a stale pre-reconnect pagination
response (now discarded by fetchSeq) doesn't leave the flag stuck true,
which would cause future tick/SSE refreshes to skip full job reloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci

roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (64f6ddc)

Summary verdict: Changes are mostly sound, but there is one medium-severity reconnect bug that can leave fix jobs stale or hidden in the TUI.

Medium

  • cmd/roborev/tui/handlers_msg.go:817: handleReconnectMsg only resets and re-fetches fix jobs when the pre-reconnect tasksWorkflowEnabled() state is already true. If the daemon enables tasks after reconnect, or an older fix-jobs request was still in flight while local state was false, reconnect can skip the refresh and leave loadingFixJobs / hasActiveFixJobs stale. That can hide active fix jobs in the queue view and may block later startFetchFixJobs calls if loadingFixJobs remains stuck true.

Suggested fix: clear loadingFixJobs and fixJobsStale on every reconnect, and trigger the reconnect refresh independently of stale pre-reconnect task state, either unconditionally or after the first post-reconnect status update confirms task support.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Clear loadingFixJobs and fixJobsStale unconditionally on reconnect,
not just when tasksWorkflowEnabled() is true. Prevents a stuck
loadingFixJobs flag if a pre-reconnect request was in flight while
tasks were disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci

roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (967fd37)

Verdict: No medium-or-higher issues found; the change looks clean based on all three reviews.

All reviewers agreed there are no findings to report at Medium, High, or Critical severity. The change appears to correctly add in-flight guards, generation-based stale-response protection, and regression tests for the TUI fetch coordination behavior.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@cpcloud
Contributor Author

cpcloud commented Mar 30, 2026

Failing CI is due to a race condition that I might've introduced in #595. Working on it now.
