Skip to content

fix: Extract Maintain() from Load() to eliminate side-effect coupling#126

Merged
toasterbook88 merged 9 commits into
mainfrom
fix/load-maintain-separation
May 18, 2026
Merged

fix: Extract Maintain() from Load() to eliminate side-effect coupling#126
toasterbook88 merged 9 commits into
mainfrom
fix/load-maintain-separation

Conversation

@toasterbook88
Copy link
Copy Markdown
Owner

Addresses review feedback on PR #125.

  • state.Load() no longer performs maintenance or re-saves
  • Maintenance is now explicit via state.Maintain()
  • Daemon, CLI, and tests updated to call Maintain() after Load()

All tests, race tests, lint, and coverage gates pass.

William and others added 8 commits May 17, 2026 23:08
…ir events, and tests

B-1: Remove 6 fmt.Printf DEBUG leaks from internal/execution/guarded.go
B-3: Add cmd/axis/exit_test.go with 100% coverage of Error() and ExitCode()
E-1: Add internal/versioncmp/versioncmp_test.go with 100% coverage
B-4: Add internal/repairs/types.go with minimal RepairEvent types
B-5: Add Version field to ClusterState with version-gated migrations
G-3: Separate runMigrations() (version-gated) from runMaintenance() (always runs)

Also update golden files for state JSON output including version field.

All tests pass. Lint passes. Coverage gates pass (70.7%).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add Config.IsMeshEnabled() helper that preserves backward compat:
  mesh defaults ON when discovery config is absent, follows Enabled
  when explicitly configured.
- Gate mesh creation in daemon.NewDefault() behind IsMeshEnabled().
- Gate mesh startup in cmd/axis/serve.go behind IsMeshEnabled().
- Add unit tests for IsMeshEnabled() and NewDefault() mesh behavior.

Closes H-4.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Production hardening:
- Remove DEBUG fmt.Printf leaks from execution/guarded.go (B-1)
- Add ClusterState.Version with version-gated migrations (B-5)
- Add minimal RepairEvent types in internal/repairs/types.go (B-4)
- Add mesh disable flag via config Discovery.Enabled (H-4)
- Add pprof endpoints to axis serve --pprof (F-1)
- Delete dead Fatal() code from exit.go (H-6)

Authority audits (11 docs):
- docs/authority-reservation.md (AUTH-1)
- docs/authority-identity.md (AUTH-2)
- docs/authority-freshness.md (AUTH-3)
- docs/authority-observations.md (AUTH-4)
- docs/authority-execution.md (AUTH-5)
- docs/authority-config.md (AUTH-6)
- docs/authority-secrets.md (AUTH-7)
- docs/authority-cache.md (AUTH-8)
- docs/authority-observability.md (AUTH-9)
- docs/authority-violations.md (AUTH-11)

Surface clarity:
- docs/lifecycle.md — taxonomy with inheritance rules (C-1)
- Label all 34 internal packages with lifecycle state (C-2)
- Restructure README into stable/experimental sections (C-5)
- Fix stale docs: phase-tracking, current-state, architecture,
  distributed-cognitive-architecture, hybrid-ai-router-plan,
  AGENTS.md (C-6)

Coverage + benchmarks:
- cmd/axis/exit_test.go — 100% coverage (B-3)
- internal/versioncmp/versioncmp_test.go — 100% coverage (E-1)
- internal/mcp/server_tools_test.go — 43.4% → 88.7% (E-2)
- internal/facts/ tests — 70.7% → 74.0% (E-3)
- internal/execution/guarded_test.go — 60.2% → 79.8% (E-4)
- cmd/axis/summary_test.go — dashboard contract tests (E-5)
- internal/placement/placement_bench_test.go (E-6)
- internal/snapshot/snapshot_bench_test.go (E-6)

Structural:
- examples/ directory with nodes.yaml, basic-usage.md,
  reservations.md, daemon-setup.md (H-1)

All tests pass. Lint passes. Coverage gates pass (70.7%).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Features:
- axis reservations list [--json|--ndjson] (D-1)
- axis reservations inspect <id> [--json] (D-2)
- axis reservations release <id> [--force] [--json] (D-3)
- pprof endpoints via axis serve --pprof (F-1)
- Baseline benchmarks and profiles in benchmarks/ (F-2)
- docs/profiling.md profiling workflow documentation (F-3)

Governance:
- docs/lifecycle.md taxonomy with inheritance rules (C-1)
- Label all 34 internal packages with lifecycle state (C-2)
- Label all CLI commands and API routes (C-3, C-4)
- Restructure README into stable/experimental (C-5)
- Fix stale docs: phase-tracking, current-state, architecture,
  distributed-cognitive-architecture, hybrid-ai-router-plan (C-6)
- hack/lifecycle-check.go CI enforcement (C-7)
- docs/authority-transition.md transition protocol (A-10)

Authority audits (10 docs):
- docs/authority-reservation.md (AUTH-1)
- docs/authority-identity.md (AUTH-2)
- docs/authority-freshness.md (AUTH-3)
- docs/authority-observations.md (AUTH-4)
- docs/authority-execution.md (AUTH-5)
- docs/authority-config.md (AUTH-6)
- docs/authority-secrets.md (AUTH-7)
- docs/authority-cache.md (AUTH-8)
- docs/authority-observability.md (AUTH-9)
- docs/authority-violations.md (AUTH-11)

Structural:
- examples/ directory (H-1)
- docs/decisions/dashboard-command.md → KEEP (H-2)
- docs/decisions/v2-reservations-endpoint.md → STABILIZE (H-3)
- Compile-gate safety/structured with build tag (H-5)
- Delete dead Fatal() code (H-6)
- Deduplicate dashboard UI constants (H-7)
- docs/reservations.md operator documentation (D-4)

Optimizations:
- Precompute rank keys → 54% faster for 50 nodes (G-1)
- Pre-lowercase script registry strings (G-7)

All tests pass. Lint passes. Coverage gates pass (70.7%).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- docs/future/reservation-doctor.md (D-5)
- docs/future/consistency-model.md (H-9)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds internal/transport/ssh_bench_test.go with BenchmarkSSHConnectionReuse
and BenchmarkSSHConnectionReusePerCommand.

Results show ~15x per-command speedup from connection reuse. AXIS already
achieves this within each Collect() (30+ commands over one connection).
Cross-cycle reuse would add complexity for marginal gain in typical
deployments; no profile evidence justifies it.

Also converts test helpers in ssh_lifecycle_test.go to accept testing.TB
so they can be reused from benchmarks.

Phase G status: BLOCKED — insufficient evidence for cross-cycle SSH reuse.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per review feedback: Load() should be idempotent and not perform
maintenance as a side effect. Maintenance is now an explicit
state.Maintain() call that callers invoke after Load() when they need
accurate reservation and failure views.

- state.Load() no longer calls runMaintenance() or re-saves
- runMaintenance renamed to exported Maintain() with godoc
- Daemon calls Maintain() after Load() and saves if mutated
- Daemon metadata fallback calls Maintain() before reading
- CLI commands (task context, context show) call Maintain()
- Contract tests call Maintain() after Load()
- State tests updated to call Maintain() after Load()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…conflict

Conflict: TestLoadClearsLegacyReservationsWithoutExecIDs had a
redundant Save() call in HEAD that origin/main had already fixed.
Resolution: keep origin/main version.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request stabilizes the AXIS substrate by formalizing authority models, introducing a lifecycle taxonomy, and hardening state management. Key updates include establishing the reservation ledger as the canonical authority, implementing schema versioning for cluster state, and adding a pprof profiling flag to the daemon. Documentation is extensively updated with authority audits and decision records. Feedback identifies an unhandled error during state persistence in the daemon's refresh loop and recommends removing state-modifying side effects from the read-only Meta() function to ensure consistency.

I am having trouble creating individual review comments. Click here to see my feedback.

internal/daemon/daemon.go (488)

high

The error returned by st.Save() is being ignored. If saving the state fails after maintenance, the on-disk state will be inconsistent with the in-memory state used by the daemon, which could lead to correctness issues. This error should be handled, at a minimum by logging it, as this occurs in a background refresh loop.

                if err := st.Save(); err != nil {
                    slog.Error("failed to save maintained state", "path", state.Path(), "error", err)
                }

internal/daemon/daemon.go (661)

medium

The Meta() function is intended to be a read-only operation, but it is performing state modification by calling state.Maintain(st). These changes are not persisted as st.Save() is not called, leading to repeated, inefficient maintenance on every call. More importantly, a read-only metadata endpoint should not have side-effects like state modification. The responsibility for maintaining state should lie within the daemon's main refresh loop (doRefresh). This fallback logic can lead to inconsistent views of the system state.

        // state.Maintain(st) // This is a read-only path and should not perform state maintenance.

…ta()

- High: log error when st.Save() fails after maintenance in daemon refresh
- Medium: remove state.Maintain() from Meta() — read-only path should not
  have side effects; maintenance belongs in doRefresh only

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@toasterbook88 toasterbook88 merged commit ed2d516 into main May 18, 2026
8 checks passed
@toasterbook88 toasterbook88 deleted the fix/load-maintain-separation branch May 18, 2026 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant