feat: replace agent heartbeat with Castellarius-driven liveness detection #506
Merged
…tion

Remove the agent-driven heartbeat mechanism (ct droplet heartbeat) and replace it with Castellarius-driven liveness detection using session log mtime. Agents no longer need to call any heartbeat command: the Castellarius checks whether the tmux pipe-pane session log file has been written to recently, which is a passive signal that requires zero agent cooperation.

Changes:
- Remove ct droplet heartbeat CLI command and HTTP API endpoint
- Remove Heartbeat() from Client and CisternClient interface
- Remove last_heartbeat_at column (migration 020 drops it; schema.sql updated)
- Remove EventHeartbeat event type and display function
- Replace stall detection: session log mtime replaces LastHeartbeatAt
- Rename heartbeat → liveness throughout (interval, goroutine, config field)
- Remove heartbeat instruction from agent prompt in session.go
- Add sessionLogMtime() function alongside isTmuxAlive()
- Add migration 020 to drop last_heartbeat_at column
- Update README, troubleshooting docs, commands docs
- Add 16 liveness regression tests covering exit detection, stall detection, orphan recovery, DB integration, error fallbacks
MichielDean added a commit that referenced this pull request on May 11, 2026
…tion (#507)

## Summary

Extract a shared `internal/sessionlog` package so the session log path (`~/.cistern/session-logs/<id>.log`) is resolved in one place instead of three.

**Before:** Hard-coded `filepath.Join(home, ".cistern", "session-logs", id+".log")` in:
- `internal/cataractae/session.go`: spawn (writes the log)
- `internal/castellarius/scheduler.go`: liveness (reads mtime)
- `cmd/ct/cistern.go`: peek `--raw` (reads content)

**After:** All three use `sessionlog.Path()`, `sessionlog.Mtime()`, `sessionlog.Read()`, and `sessionlog.EnsureDir()`.

## What changed

- New `internal/sessionlog` package with `Path()`, `Mtime()`, `Read()`, `EnsureDir()`
- `LogDirFn` and `MtimeFn` are exported for test overrides (same pattern as `isTmuxAliveFn`)
- Scheduler's `sessionLogMtimeFn` now delegates to `sessionlog.MtimeFn`
- CLI peek's `--raw` mode uses `sessionlog.Read()` instead of manual `os.Open`
- Cataractae spawn uses `sessionlog.Path()` and `sessionlog.EnsureDir()`
- 6 unit tests for the new package
- Removed `sessionLogDir` variable from CLI peek tests (uses `sessionlog.LogDirFn` instead)

This is a follow-up to #506 (heartbeat removal) and does not depend on it being merged first; it builds on the same `sessionLogMtimeFn` that PR already introduced, just moving it to a shared package.

Co-authored-by: Lobsterdog Contributors <noreply@lobsterdog.dev>
Summary
- Remove the agent-driven heartbeat mechanism (`ct droplet heartbeat`) and replace with Castellarius-driven liveness detection using session log mtime
- Agents no longer need to call any heartbeat command; the Castellarius checks whether the tmux `pipe-pane` session log has been written to recently

What changed

Removed:
- `ct droplet heartbeat <id>` CLI command
- `POST /api/droplets/{id}/heartbeat` API endpoint
- `Client.Heartbeat()` method and `CisternClient.Heartbeat()` interface method
- `last_heartbeat_at` database column (migration 020 drops it)
- `EventHeartbeat` event type and `displayInfoHeartbeat()` function
- Heartbeat instruction from agent prompt in `session.go`
- `heartbeat` field from stall event payload
- `HeartbeatInterval` config field (renamed to `LivenessInterval`)
- `heartbeatInterval`/`heartbeatInProgress`/`heartbeatRepo` (renamed to `livenessInterval`/`livenessCheck`/`livenessCheckRepo`)

Added:
- `sessionLogMtime()` function that checks `~/.cistern/session-logs/<repo>-<worker>.log` modification time
- … `updated_at` for orphans)

Why
Agent heartbeats were added when reading agent output was unreliable. Those bugs are now solved, and the heartbeat mechanism:
The session log mtime is a passive signal: the agent is already writing to it via tmux `pipe-pane`, so no agent cooperation is needed.

Testing
TestEndToEndSchemaVerification