feat(runtime): implement lease-based authority epochs #14

Merged
simonovic86 merged 1 commit into main from claude/serene-keller
Mar 5, 2026

Conversation

@simonovic86
Owner

Summary

Implement a lease-based authority system (Phase 5: Hardening) that enforces time-bounded exclusivity of agent execution across cluster nodes.

Changes

Core Authority System

  • internal/authority/ (new package)
    • authority.go: Epoch ordering logic (MajorVersion + LeaseGeneration) and authority states (ACTIVE_OWNER, HANDOFF_INITIATED, HANDOFF_PENDING, RETIRED, RECOVERY_REQUIRED)
    • lease.go: Lease lifecycle management with renewal, expiry validation, grace periods, and state transitions
    • Comprehensive test coverage for epoch comparison, lease renewal, and state machine transitions
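
As an illustration of the epoch ordering described above, here is a minimal sketch. It assumes the ordering is lexicographic, with MajorVersion compared first and LeaseGeneration used as a tie-breaker; the PR text does not spell this out. The `Epoch` type and `Compare` method below are hypothetical stand-ins, not the code in internal/authority/.

```go
package main

import "fmt"

// Epoch is a hypothetical mirror of the authority epoch named in the PR.
type Epoch struct {
	MajorVersion    uint64
	LeaseGeneration uint64
}

// Compare returns -1, 0, or +1. ASSUMPTION: MajorVersion dominates and
// LeaseGeneration only breaks ties (lexicographic order).
func (e Epoch) Compare(o Epoch) int {
	switch {
	case e.MajorVersion < o.MajorVersion:
		return -1
	case e.MajorVersion > o.MajorVersion:
		return 1
	case e.LeaseGeneration < o.LeaseGeneration:
		return -1
	case e.LeaseGeneration > o.LeaseGeneration:
		return 1
	}
	return 0
}

func main() {
	renewed := Epoch{MajorVersion: 1, LeaseGeneration: 7}
	migrated := Epoch{MajorVersion: 2, LeaseGeneration: 0}
	// Under the assumed ordering, a migration outranks any number of renewals.
	fmt.Println(renewed.Compare(migrated))
}
```

Under this assumption, a node holding an epoch from before a migration can never outrank the node that received the agent, however many local renewals it performed.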

Checkpoint Format Upgrade

  • v0x03 checkpoint format (81-byte header, up from 57 bytes)
    • Added 24 bytes for lease metadata: majorVersion (8B), leaseGeneration (8B), leaseExpiry (8B)
    • Backward compatible with v0x02 checkpoints (parsing one returns a zero epoch)
    • Updated ParseCheckpointHeader() to support both versions
    • Added golden fixture checkpoint_v3.bin for format testing
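
The header change can be sketched roughly as follows. Only the sizes (57 and 81 bytes) and the three 8-byte fields come from the PR. The position of the version byte (offset 0), the placement of the lease fields in the final 24 bytes, the little-endian byte order, and the signed interpretation of leaseExpiry are all assumptions made for illustration. `parseLeaseMeta` is a hypothetical helper, not the project's ParseCheckpointHeader().

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	headerLenV2 = 57 // from the PR description
	headerLenV3 = 81 // 57 + 24 bytes of lease metadata
)

// LeaseMeta holds the three fields added in v0x03.
type LeaseMeta struct {
	MajorVersion    uint64
	LeaseGeneration uint64
	LeaseExpiry     int64 // ASSUMPTION: signed timestamp; unit not stated in the PR
}

// parseLeaseMeta is a hypothetical helper. ASSUMPTIONS: the version byte is at
// offset 0, lease fields fill bytes 57..80, and integers are little-endian.
func parseLeaseMeta(buf []byte) (LeaseMeta, error) {
	if len(buf) == 0 {
		return LeaseMeta{}, fmt.Errorf("empty checkpoint")
	}
	switch buf[0] {
	case 0x02:
		if len(buf) < headerLenV2 {
			return LeaseMeta{}, fmt.Errorf("v0x02 header too short: %d bytes", len(buf))
		}
		return LeaseMeta{}, nil // legacy checkpoint: zero epoch
	case 0x03:
		if len(buf) < headerLenV3 {
			return LeaseMeta{}, fmt.Errorf("v0x03 header too short: %d bytes", len(buf))
		}
		tail := buf[headerLenV2:headerLenV3]
		return LeaseMeta{
			MajorVersion:    binary.LittleEndian.Uint64(tail[0:8]),
			LeaseGeneration: binary.LittleEndian.Uint64(tail[8:16]),
			LeaseExpiry:     int64(binary.LittleEndian.Uint64(tail[16:24])),
		}, nil
	default:
		return LeaseMeta{}, fmt.Errorf("unsupported checkpoint version 0x%02x", buf[0])
	}
}

func main() {
	buf := make([]byte, headerLenV3)
	buf[0] = 0x03
	binary.LittleEndian.PutUint64(buf[57:], 2) // MajorVersion
	binary.LittleEndian.PutUint64(buf[65:], 5) // LeaseGeneration
	meta, err := parseLeaseMeta(buf)
	fmt.Println(meta, err)
}
```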

Agent & Runtime Integration

  • internal/agent/instance.go

    • Added Lease *authority.Lease field to Instance
    • Checkpoint save/load includes lease epoch and expiry metadata
    • Supports checkpoint v0x02→v0x03 migration transparently
  • cmd/igord/main.go

    • New CLI flags: --lease-duration, --lease-grace
    • Lease initialization on agent startup and before each tick
    • Pre-tick lease validation with automatic recovery on expiry
  • internal/runner/runner.go

    • Added CheckAndRenewLease() and HandleLeaseExpiry() for lease lifecycle

Configuration & Services

  • internal/config/config.go: Lease duration, renewal window, and grace period config with validation
  • internal/migration/service.go: Lease config passed through migration service
  • internal/inspector/inspector.go: Checkpoint inspection displays epoch and lease expiry
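
A minimal sketch of what lease config validation might look like. The field names follow the PR (Duration, RenewalWindow, GracePeriod), and Duration=0 disabling leases is stated in the Safety Properties section. The specific rules below (renewal window strictly inside the duration, non-negative grace period) are assumptions, not the checks in internal/config/config.go.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// LeaseConfig mirrors the fields named in the PR.
type LeaseConfig struct {
	Duration      time.Duration
	RenewalWindow time.Duration
	GracePeriod   time.Duration
}

// Validate applies ASSUMED rules; the real validation may differ.
func (c LeaseConfig) Validate() error {
	if c.Duration == 0 {
		return nil // leases disabled, nothing else to check
	}
	if c.Duration < 0 {
		return errors.New("lease duration must not be negative")
	}
	if c.RenewalWindow <= 0 || c.RenewalWindow >= c.Duration {
		return errors.New("renewal window must be positive and shorter than the lease duration")
	}
	if c.GracePeriod < 0 {
		return errors.New("grace period must not be negative")
	}
	return nil
}

func main() {
	ok := LeaseConfig{Duration: 60 * time.Second, RenewalWindow: 20 * time.Second, GracePeriod: 10 * time.Second}
	bad := LeaseConfig{Duration: 60 * time.Second, RenewalWindow: 90 * time.Second}
	fmt.Println(ok.Validate())
	fmt.Println(bad.Validate())
}
```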

Safety Properties (EI-6)

  • Safety over liveness: Expired leases (beyond grace period) trigger RECOVERY_REQUIRED state, halting all ticks
  • Automatic renewal: Agents renew leases before expiry (configurable threshold)
  • Graceful degradation: Leases are optional (Duration=0 disables); zero values in checkpoints for legacy compatibility
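
The three properties above can be sketched as a single pre-tick check with an injectable clock. This is an illustration under stated assumptions, not the code in lease.go or runner.go: the `Lease` struct, `newLease`, `checkAndRenew`, and `fakeClock` are hypothetical names, and the sketch assumes that a lease which has expired but is still inside the grace period may still be renewed.

```go
package main

import (
	"fmt"
	"time"
)

type State string

const (
	ActiveOwner      State = "ACTIVE_OWNER"
	RecoveryRequired State = "RECOVERY_REQUIRED"
)

// Lease is a hypothetical, simplified lease with an injectable clock.
type Lease struct {
	Duration, RenewalWindow, GracePeriod time.Duration
	Expiry                               time.Time
	Generation                           uint64
	State                                State
	now                                  func() time.Time
}

func newLease(duration, renewalWindow, grace time.Duration, now func() time.Time) *Lease {
	return &Lease{
		Duration: duration, RenewalWindow: renewalWindow, GracePeriod: grace,
		Expiry: now().Add(duration), State: ActiveOwner, now: now,
	}
}

// checkAndRenew reports whether the agent may tick. ASSUMPTION: renewal is
// still allowed while inside the grace period.
func (l *Lease) checkAndRenew() bool {
	if l.Duration == 0 {
		return true // leases disabled
	}
	if l.State == RecoveryRequired {
		return false // safety over liveness: stay halted
	}
	now := l.now()
	if now.After(l.Expiry.Add(l.GracePeriod)) {
		l.State = RecoveryRequired
		return false
	}
	if !now.Before(l.Expiry.Add(-l.RenewalWindow)) {
		l.Expiry = now.Add(l.Duration) // renew locally
		l.Generation++
	}
	return true
}

// fakeClock is a test clock that only moves when advanced.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clk := &fakeClock{}
	l := newLease(60*time.Second, 20*time.Second, 10*time.Second, clk.Now)
	clk.Advance(45 * time.Second) // inside the renewal window
	fmt.Println(l.checkAndRenew(), l.Generation)
	clk.Advance(2 * time.Minute) // far beyond expiry plus grace
	fmt.Println(l.checkAndRenew(), l.State)
}
```

Injecting the clock as a function keeps the expiry logic deterministic in tests, which is presumably why the commit message mentions an injectable clock.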

Testing

  • Unit tests for Epoch ordering, LeaseConfig validation, lease renewal, state transitions
  • Integration tests with checkpoint save/restore including v0x02→v0x03 compatibility
  • Golden checkpoint fixtures for format validation
  • Multi-node migration tests updated for v0x03 format

Add time-bounded lease authority so nodes must hold a valid, non-expired
lease to tick an agent. Leases auto-renew locally; expiry triggers
RECOVERY_REQUIRED state (EI-6: safety over liveness). This is the
foundation of Phase 5 (Hardening).

- New internal/authority/ package: Epoch (MajorVersion, LeaseGeneration),
  State machine (5 states), Lease lifecycle with injectable clock
- Checkpoint format bumped to v0x03 (81-byte header): adds epoch and
  lease expiry fields, backward-compatible with v0x02
- Pre-tick lease validation in runner with auto-renewal
- Migration advances epoch (MajorVersion+1) with handoff/retired
  state transitions
- CLI flags: --lease-duration (60s), --lease-grace (10s)
- Inspector displays epoch and lease metadata for v0x03 checkpoints

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@simonovic86 simonovic86 merged commit 0c253dd into main Mar 5, 2026
1 check passed
@simonovic86 simonovic86 deleted the claude/serene-keller branch March 5, 2026 04:02

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ea8a33ce8

RenewalWindow: cfg.LeaseRenewalWindow,
GracePeriod: cfg.LeaseGracePeriod,
}
instance.Lease = authority.NewLease(leaseCfg)

P1: Preserve checkpoint epoch when initializing local lease

This always creates a fresh lease with NewLease, which resets authority to epoch (0,0) even when LoadCheckpointFromStorage has just loaded a v0x03 checkpoint containing a higher epoch from prior renewals or migrations. After a restart, the node can therefore emit regressed epochs on later migrations, which breaks monotonic authority ordering and weakens the anti-clone guarantees introduced by this change. Initialize the lease from the checkpoint epoch instead of unconditionally bootstrapping a new one.
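
One way to picture the suggested fix is the sketch below. It is not the project's API: `newLeaseFromEpoch` and `epochForMigration` are hypothetical names, the `Lease` type is reduced to its epoch, and the sketch assumes a migration bumps MajorVersion by one while leaving LeaseGeneration unchanged (the PR only states MajorVersion+1).

```go
package main

import "fmt"

type Epoch struct {
	MajorVersion    uint64
	LeaseGeneration uint64
}

// Lease is reduced to its epoch for this illustration.
type Lease struct {
	Epoch Epoch
}

// newLeaseFromEpoch is a HYPOTHETICAL constructor that resumes from the epoch
// stored in the loaded checkpoint instead of starting at (0,0).
func newLeaseFromEpoch(checkpoint Epoch) *Lease {
	return &Lease{Epoch: checkpoint}
}

// epochForMigration returns the epoch a migration would hand to the target.
// ASSUMPTION: MajorVersion+1, LeaseGeneration unchanged.
func (l *Lease) epochForMigration() Epoch {
	return Epoch{MajorVersion: l.Epoch.MajorVersion + 1, LeaseGeneration: l.Epoch.LeaseGeneration}
}

// less reports whether a orders strictly before b (MajorVersion first).
func less(a, b Epoch) bool {
	if a.MajorVersion != b.MajorVersion {
		return a.MajorVersion < b.MajorVersion
	}
	return a.LeaseGeneration < b.LeaseGeneration
}

func main() {
	checkpoint := Epoch{MajorVersion: 3, LeaseGeneration: 12}

	fresh := &Lease{} // what an unconditional bootstrap would produce
	restored := newLeaseFromEpoch(checkpoint)

	// A fresh lease migrates with an epoch below the checkpoint (regression);
	// a restored lease migrates with an epoch above it.
	fmt.Println(less(fresh.epochForMigration(), checkpoint))
	fmt.Println(less(checkpoint, restored.epochForMigration()))
}
```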

Comment on lines +165 to +166
MajorVersion: epoch.MajorVersion,
LeaseGeneration: epoch.LeaseGeneration,

P1: Use active lease epoch when building migration package

MajorVersion and LeaseGeneration are taken from the stored checkpoint, but the target-side migration path advances the lease epoch only in memory and does not immediately persist the new epoch. In a rapid A→B→C handoff (before B writes another checkpoint), B still packages A's old epoch from disk, so C receives a non-incremented major version and the epoch rolls back across hops. Package the epoch from the live instance lease, or persist the updated checkpoint before migration.
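
The A→B→C scenario can be reproduced in a small simulation. Everything here is a simplified model under assumptions: `Instance`, `packageEpoch`, and `receive` are hypothetical names, the target is assumed to bump MajorVersion by one in memory while its on-disk checkpoint still holds the epoch it received, and a nil lease stands for disabled leases.

```go
package main

import "fmt"

type Epoch struct {
	MajorVersion    uint64
	LeaseGeneration uint64
}

type Lease struct {
	Epoch Epoch
}

// Instance models a node's view of one agent (simplified, hypothetical).
type Instance struct {
	Lease       *Lease // live, in-memory authority; nil if leases are disabled
	StoredEpoch Epoch  // epoch in the last persisted checkpoint
}

// packageEpoch picks the epoch to ship in a migration package, preferring
// the live lease over the stored checkpoint.
func packageEpoch(inst *Instance) Epoch {
	if inst.Lease != nil {
		return inst.Lease.Epoch
	}
	return inst.StoredEpoch
}

// receive simulates the target side. ASSUMPTION: the target advances
// MajorVersion in memory only and has not yet written a new checkpoint.
func receive(pkg Epoch) *Instance {
	advanced := Epoch{MajorVersion: pkg.MajorVersion + 1, LeaseGeneration: pkg.LeaseGeneration}
	return &Instance{Lease: &Lease{Epoch: advanced}, StoredEpoch: pkg}
}

func main() {
	start := Epoch{MajorVersion: 1}
	a := &Instance{Lease: &Lease{Epoch: start}, StoredEpoch: start}
	b := receive(packageEpoch(a))

	// Packaging from disk: C ends up with the same major version as B.
	stale := receive(b.StoredEpoch)
	// Packaging from the live lease: C is strictly ahead of B.
	live := receive(packageEpoch(b))

	fmt.Println(b.Lease.Epoch.MajorVersion, stale.Lease.Epoch.MajorVersion, live.Lease.Epoch.MajorVersion)
}
```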
