fix: Extract Maintain() from Load() to eliminate side-effect coupling#126
Conversation
…ir events, and tests B-1: Remove 6 fmt.Printf DEBUG leaks from internal/execution/guarded.go B-3: Add cmd/axis/exit_test.go with 100% coverage of Error() and ExitCode() E-1: Add internal/versioncmp/versioncmp_test.go with 100% coverage B-4: Add internal/repairs/types.go with minimal RepairEvent types B-5: Add Version field to ClusterState with version-gated migrations G-3: Separate runMigrations() (version-gated) from runMaintenance() (always runs) Also update golden files for state JSON output including version field. All tests pass. Lint passes. Coverage gates pass (70.7%). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add Config.IsMeshEnabled() helper that preserves backward compat: mesh defaults ON when discovery config is absent, follows Enabled when explicitly configured. - Gate mesh creation in daemon.NewDefault() behind IsMeshEnabled(). - Gate mesh startup in cmd/axis/serve.go behind IsMeshEnabled(). - Add unit tests for IsMeshEnabled() and NewDefault() mesh behavior. Closes H-4. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Production hardening: - Remove DEBUG fmt.Printf leaks from execution/guarded.go (B-1) - Add ClusterState.Version with version-gated migrations (B-5) - Add minimal RepairEvent types in internal/repairs/types.go (B-4) - Add mesh disable flag via config Discovery.Enabled (H-4) - Add pprof endpoints to axis serve --pprof (F-1) - Delete dead Fatal() code from exit.go (H-6) Authority audits (11 docs): - docs/authority-reservation.md (AUTH-1) - docs/authority-identity.md (AUTH-2) - docs/authority-freshness.md (AUTH-3) - docs/authority-observations.md (AUTH-4) - docs/authority-execution.md (AUTH-5) - docs/authority-config.md (AUTH-6) - docs/authority-secrets.md (AUTH-7) - docs/authority-cache.md (AUTH-8) - docs/authority-observability.md (AUTH-9) - docs/authority-violations.md (AUTH-11) Surface clarity: - docs/lifecycle.md — taxonomy with inheritance rules (C-1) - Label all 34 internal packages with lifecycle state (C-2) - Restructure README into stable/experimental sections (C-5) - Fix stale docs: phase-tracking, current-state, architecture, distributed-cognitive-architecture, hybrid-ai-router-plan, AGENTS.md (C-6) Coverage + benchmarks: - cmd/axis/exit_test.go — 100% coverage (B-3) - internal/versioncmp/versioncmp_test.go — 100% coverage (E-1) - internal/mcp/server_tools_test.go — 43.4% → 88.7% (E-2) - internal/facts/ tests — 70.7% → 74.0% (E-3) - internal/execution/guarded_test.go — 60.2% → 79.8% (E-4) - cmd/axis/summary_test.go — dashboard contract tests (E-5) - internal/placement/placement_bench_test.go (E-6) - internal/snapshot/snapshot_bench_test.go (E-6) Structural: - examples/ directory with nodes.yaml, basic-usage.md, reservations.md, daemon-setup.md (H-1) All tests pass. Lint passes. Coverage gates pass (70.7%). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Features: - axis reservations list [--json|--ndjson] (D-1) - axis reservations inspect <id> [--json] (D-2) - axis reservations release <id> [--force] [--json] (D-3) - pprof endpoints via axis serve --pprof (F-1) - Baseline benchmarks and profiles in benchmarks/ (F-2) - docs/profiling.md profiling workflow documentation (F-3) Governance: - docs/lifecycle.md taxonomy with inheritance rules (C-1) - Label all 34 internal packages with lifecycle state (C-2) - Label all CLI commands and API routes (C-3, C-4) - Restructure README into stable/experimental (C-5) - Fix stale docs: phase-tracking, current-state, architecture, distributed-cognitive-architecture, hybrid-ai-router-plan (C-6) - hack/lifecycle-check.go CI enforcement (C-7) - docs/authority-transition.md transition protocol (A-10) Authority audits (10 docs): - docs/authority-reservation.md (AUTH-1) - docs/authority-identity.md (AUTH-2) - docs/authority-freshness.md (AUTH-3) - docs/authority-observations.md (AUTH-4) - docs/authority-execution.md (AUTH-5) - docs/authority-config.md (AUTH-6) - docs/authority-secrets.md (AUTH-7) - docs/authority-cache.md (AUTH-8) - docs/authority-observability.md (AUTH-9) - docs/authority-violations.md (AUTH-11) Structural: - examples/ directory (H-1) - docs/decisions/dashboard-command.md → KEEP (H-2) - docs/decisions/v2-reservations-endpoint.md → STABILIZE (H-3) - Compile-gate safety/structured with build tag (H-5) - Delete dead Fatal() code (H-6) - Deduplicate dashboard UI constants (H-7) - docs/reservations.md operator documentation (D-4) Optimizations: - Precompute rank keys → 54% faster for 50 nodes (G-1) - Pre-lowercase script registry strings (G-7) All tests pass. Lint passes. Coverage gates pass (70.7%). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- docs/future/reservation-doctor.md (D-5) - docs/future/consistency-model.md (H-9) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds internal/transport/ssh_bench_test.go with BenchmarkSSHConnectionReuse and BenchmarkSSHConnectionReusePerCommand. Results show ~15x per-command speedup from connection reuse. AXIS already achieves this within each Collect() (30+ commands over one connection). Cross-cycle reuse would add complexity for marginal gain in typical deployments; no profile evidence justifies it. Also converts test helpers in ssh_lifecycle_test.go to accept testing.TB so they can be reused from benchmarks. Phase G status: BLOCKED — insufficient evidence for cross-cycle SSH reuse. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per review feedback: Load() should be idempotent and not perform maintenance as a side effect. Maintenance is now an explicit state.Maintain() call that callers invoke after Load() when they need accurate reservation and failure views. - state.Load() no longer calls runMaintenance() or re-saves - runMaintenance renamed to exported Maintain() with godoc - Daemon calls Maintain() after Load() and saves if mutated - Daemon metadata fallback calls Maintain() before reading - CLI commands (task context, context show) call Maintain() - Contract tests call Maintain() after Load() - State tests updated to call Maintain() after Load() Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…conflict Conflict: TestLoadClearsLegacyReservationsWithoutExecIDs had a redundant Save() call in HEAD that origin/main had already fixed. Resolution: keep origin/main version. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Code Review
This pull request stabilizes the AXIS substrate by formalizing authority models, introducing a lifecycle taxonomy, and hardening state management. Key updates include establishing the reservation ledger as the canonical authority, implementing schema versioning for cluster state, and adding a pprof profiling flag to the daemon. Documentation is extensively updated with authority audits and decision records. Feedback identifies an unhandled error during state persistence in the daemon's refresh loop and recommends removing state-modifying side effects from the read-only Meta() function to ensure consistency.
I am having trouble creating individual review comments. Click here to see my feedback.
internal/daemon/daemon.go (488)
The error returned by st.Save() is being ignored. If saving the state fails after maintenance, the on-disk state will be inconsistent with the in-memory state used by the daemon, which could lead to correctness issues. This error should be handled, at a minimum by logging it, as this occurs in a background refresh loop.
if err := st.Save(); err != nil {
slog.Error("failed to save maintained state", "path", state.Path(), "error", err)
}internal/daemon/daemon.go (661)
The Meta() function is intended to be a read-only operation, but it is performing state modification by calling state.Maintain(st). These changes are not persisted as st.Save() is not called, leading to repeated, inefficient maintenance on every call. More importantly, a read-only metadata endpoint should not have side-effects like state modification. The responsibility for maintaining state should lie within the daemon's main refresh loop (doRefresh). This fallback logic can lead to inconsistent views of the system state.
// state.Maintain(st) // This is a read-only path and should not perform state maintenance.…ta() - High: log error when st.Save() fails after maintenance in daemon refresh - Medium: remove state.Maintain() from Meta() — read-only path should not have side effects; maintenance belongs in doRefresh only Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses review feedback on PR #125.
All tests, race tests, lint, and coverage gates pass.