[Phase 10] State Recovery & RDS Migration #15

@kination

Description

Improve orchestrator fault tolerance: after a restart, rebuild in-memory state from the database and reconcile it with remote worker resources.

Tasks

  • Ensure all Task.Process state transitions write to DB audit log
  • Implement startup recovery: rebuild ETS from DB on restart
  • Reconcile pending/running tasks against the Worker Plane on restart (re-subscribe to the K8s Watch, poll Lambda)
  • Add idempotency key to task dispatch to prevent duplicate execution
  • Migrate from SQLite to RDS PostgreSQL (Multi-AZ)
  • Update Ecto repo config for RDS
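
A rough sketch of how the startup-recovery and idempotency-key tasks above might fit together, using only the standard library. Everything named here is an assumption for illustration: the `RecoverySketch` module, the `:task_state_sketch` table, the row fields (`id`, `state`, `attempt`), the state names, and the key format. None of it is taken from the codebase. The `rows` list stands in for whatever an Ecto query against the tasks table would return.

```elixir
# Hypothetical sketch: module, table, field, and state names are assumptions.
defmodule RecoverySketch do
  @table :task_state_sketch

  # Terminal states need no reconciliation after a restart (assumed names).
  @terminal [:succeeded, :failed]

  @doc "Rebuilds the ETS table from persisted rows; returns ids that still need reconciling."
  def rebuild(rows) do
    # Start from a clean table so a repeated call cannot leave stale entries behind.
    if :ets.whereis(@table) != :undefined, do: :ets.delete(@table)
    :ets.new(@table, [:named_table, :set, :public])

    Enum.each(rows, fn row ->
      :ets.insert(@table, {row.id, row.state, row.attempt})
    end)

    # Pending/running tasks are the ones to check against the Worker Plane.
    for row <- rows, row.state not in @terminal, do: row.id
  end

  @doc "Derives a deterministic idempotency key from the task id and attempt number."
  def idempotency_key(task_id, attempt) do
    :crypto.hash(:sha256, "#{task_id}:#{attempt}") |> Base.encode16(case: :lower)
  end

  @doc "Returns the cached entry for a task, or [] if it is unknown."
  def lookup(task_id), do: :ets.lookup(@table, task_id)
end

# Stand-in for rows loaded from the database on startup.
rows = [
  %{id: "t1", state: :running, attempt: 1},
  %{id: "t2", state: :succeeded, attempt: 1},
  %{id: "t3", state: :pending, attempt: 2}
]

to_reconcile = RecoverySketch.rebuild(rows)
IO.inspect(to_reconcile)
# prints ["t1", "t3"]
```

Because the key depends only on the task id and attempt number, re-dispatching the same attempt after a restart produces the same key, so a worker that has already seen it can skip the duplicate. Whether the attempt number belongs in the key depends on the retry semantics the orchestrator wants. Note that an ETS table is destroyed when its owner process exits, so in a real application the rebuild would presumably run inside a long-lived supervised process.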
