Status: updated 2026-03-10 after reading `ROADMAP.md`, `ROADMAP_METAL.md`, `EXO_UNIFIED_INTEGRATION_PLAN.md`, and `../../../docs/audits/2026-03-09-psionic-exo-cluster-integration-audit.md`, after confirming that the cluster-adjacent substrate `PSI-148`, `PSI-160` through `PSI-175`, and `PSI-179` through `PSI-183` is already landed on `main`, after confirming that the former NVIDIA local-runtime gate `#3276 -> #3288 -> #3248` is now closed on GitHub, after confirming that the active native Metal GPT-OSS gate remains `#3286 -> #3285 -> #3269 -> #3262`, after landing `PSI-184` / `#3289` in `64c2a8fc6` and `PSI-185` / `#3290` in `f2e758720`, after landing `PSI-186` / `#3291` in `cc60eea89`, after opening `PSI-188` through `PSI-197` as `#3297` through `#3306`, after landing `PSI-187` / `#3292` through `PSI-190` / `#3299` in `2acc2ecf6`, after landing `PSI-191` / `#3300` in `ad6891b82`, after confirming that `PSI-192` / `#3301` landed in `327944c08`, after landing `PSI-193` / `#3302` in `d88d284c5`, after landing `PSI-194` / `#3303` in `fa7523ada`, after landing `PSI-195` / `#3304` in `1cdcf3058`, after landing `PSI-196` / `#3305` in `7124eefd7`, after landing `PSI-197` / `#3306` in `d424ab1cf`, after opening `PSI-198` through `PSI-201` as `#3307` through `#3310` for the operator-managed multi-subnet follow-on queue, after landing `PSI-198` / `#3307` in `011e0452c`, after landing `PSI-199` / `#3308` in `87d428e43`, after landing `PSI-200` / `#3309` in `86a2c920a`, after landing `PSI-201` / `#3310` in `ac9dd2285`, after opening `PSI-202` through `PSI-205` as `#3311` through `#3314` for the coordinator-authority multi-subnet follow-on queue, after landing `PSI-202` / `#3311` in `1e65c56c9`, after landing `PSI-203` / `#3312` in `ddc092cbb`, after landing `PSI-204` / `#3313` in `313fbdc25`, after landing `PSI-205` / `#3314` in `4732fbc26`, after opening `PSI-206` through `PSI-209` as `#3315` through `#3318` for the command-authorization and payout-provenance follow-on queue, after landing `PSI-206` / `#3315` in `e6888aaa0`, after landing `PSI-207` / `#3316` in `7b7b681f7`, after opening `PSI-210` through `PSI-213` as `#3319` through `#3322` for the compute-market trust hardening follow-on queue, after landing `PSI-210` / `#3319` in `37fb246f1`, after landing `PSI-211` / `#3320` in `4a21d6947`, after landing `PSI-212` / `#3321` in `d0f3e7891`, after landing `PSI-213` / `#3322` in `b0601f662`, after opening `PSI-214` through `PSI-216` as `#3323` through `#3325` for the wider-network discovery follow-on queue, after landing `PSI-214` / `#3323` in `1102bffa4`, after landing `PSI-215` / `#3324` in `47410298a`, after landing `PSI-216` / `#3325` in `7c0f34503`, after opening `PSI-217` through `PSI-219` as `#3329` through `#3331` for the post-E1 follow-on queue, after landing `PSI-217` / `#3329` in `7aa76a2a9`, after landing `PSI-218` / `#3330` as the explicit decision memo in `EXO_INTEROPERABILITY_DECISION.md`, after landing `PSI-219` / `#3331` in `98dc1bdc3`, after opening `PSI-220` through `PSI-222` as `#3332` through `#3334` for the cluster benchmark-receipt follow-on queue, after landing `PSI-220` / `#3332` in `4f64525b4`, after landing `PSI-221` / `#3334` in `a524658b8`, after landing `PSI-222` / `#3333` in `3fe872c96`, after opening `PSI-223` through `PSI-225` as `#3335` through `#3337` for the declared cluster capability-profile follow-on queue, after landing `PSI-223` / `#3335` in `37183c6cb`, after landing `PSI-224` / `#3336` in `9aad9af8d`, after landing `PSI-225` / `#3337` in `efa52005e`, after opening `PSI-226` through `PSI-228` as `#3341`, `#3339`, and `#3340` for the advertised capability-profile publication follow-on queue, after opening `PSI-229` through `PSI-231` as `#3343`, `#3344`, and `#3342` for the cluster trust-publication follow-on queue, after landing `PSI-230` / `#3344` in `0dc54768f`, after landing `PSI-231` / `#3342` in the docs closeout commit that adds the cluster trust-publication validation drill, and after checking live GitHub issue search so this roadmap reflects the current GitHub queue rather than local placeholders.

This is the live roadmap for truthful Psionic cluster support in `crates/psionic-*`. It is intentionally narrower than `docs/ROADMAP.md`: it is about making cluster control, placement, scheduling, and later multi-node execution honest and machine-checkable, not about the full Psionic replacement program.
Agent execution instruction: implement this roadmap in dependency order, not by
raw local ID ordering and not by whichever backend issue looks most exciting at
the moment. Work the GitHub-backed queue below one issue at a time, keep the
local PSI-* IDs aligned with the real issue numbers, and update this document
after each cluster issue lands so it reflects the new GitHub state, landed
commit hash, current backend gate, and remaining cluster queue.
Reference-first instruction: cluster work must not be implemented from memory. Choose the reference that owns the layer being changed:
- start with `~/code/exo` for cluster-control architecture, ordered event routing, catchup, membership, topology-aware placement, and node-role ideas
- start with `~/code/tinygrad` for execution-plan caching, replayable runtime policy, queueing, and machine-checkable execution evidence
- start with `~/code/llama.cpp` and `~/code/gpt-oss` for backend-specific distributed execution truth, model eligibility constraints, and the exact local GPT-OSS execution lane that cluster work must not overclaim
- start with `docs/ROADMAP.md` for local-runtime readiness, capability/evidence surfaces, and the current CUDA execution gate
- start with `docs/ROADMAP_METAL.md` for the current Apple refusal boundary; do not treat Metal as cluster-eligible just because generic Metal substrate exists
Psionic-only execution rule: these references are design, behavior, and performance oracles only. Do not shell out to, proxy through, FFI-wrap, or otherwise delegate cluster execution to Exo or any other external runtime when closing issues in this roadmap. The shipped cluster path must remain Psionic-owned end to end.
Make Psionic's own cluster support truthful enough to become a real OpenAgents execution surface:
- cluster control lives in `crates/psionic-*`, not app glue
- cluster identity, membership, topology, placement, and scheduling are explicit and replayable
- the first shipped cluster scope is a trusted same-network LAN cluster with explicit admission, not an internet-wide compute-market cluster
- cluster behavior is exposed through existing Psionic capability, receipt, and evidence seams rather than through a hidden side channel
- scheduling lands before replication, and replication lands before real sharded execution
- any later Exo interoperability remains optional and never becomes the only execution path
This is not a plan to "add generic distributed systems" to Psionic.
This is also not a license to widen backend claims. Cluster support only counts when the selected backend lane is already truthful locally and the cluster layer can prove what topology was promised, selected, and delivered.
This roadmap must continue to respect docs/OWNERSHIP.md:
- `crates/psionic-*` owns reusable cluster identity, transport, ordered state, topology, placement, scheduling, evidence, and execution truth
- `apps/autopilot-desktop` owns app-level cluster controls, UX, onboarding, status presentation, and product orchestration
- `crates/wgpui*` must not absorb Psionic cluster business logic
- cluster work must not move wallet, payout, mission, or app-pane behavior into `crates/psionic-*`
docs/ROADMAP.md already tracks the full Psionic program, and
docs/EXO_UNIFIED_INTEGRATION_PLAN.md already captures the full
cluster design shape. What is still missing is a roadmap-style, dependency-
ordered cluster queue that matches the format used by the main and Metal
roadmaps.
As of 2026-03-10, the current issue reality is:
- the first dedicated cluster queue now exists on GitHub
- the next cluster phases now also exist on GitHub
- `PSI-188` / #3297 through `PSI-197` / #3306 are landed on `main`
- the first multi-subnet follow-on queue is now landed on `main`
- the coordinator-authority multi-subnet queue is now landed on `main`
- the command authorization and payout provenance queue is now landed on `main`
- the compute-market trust hardening follow-on queue is now landed on `main`
- the wider-network discovery follow-on queue is now landed on `main`
- the post-E1 follow-on queue is now landed on `main`
- the benchmark-receipt follow-on queue is now open on GitHub
- the declared cluster capability-profile follow-on queue is now landed on `main`
- the advertised capability-profile publication follow-on queue is now landed on `main`
- the cluster trust-publication follow-on queue is now landed on `main`
- the current backend execution gates are still real and must remain visible
  - former NVIDIA gate: `#3276 -> #3288 -> #3248` is closed on `main`
  - Metal: `#3286 -> #3285 -> #3269 -> #3262`
That is now concrete enough that cluster work deserves its own roadmap.
main already includes the cluster-adjacent substrate this roadmap will build
on:
- `PSI-148` / #3232
  - minimum hardware validation matrix and claim IDs
- `PSI-160` / #3220
  - local-serving isolation policy truth
- `PSI-161` / #3171
  - backend-neutral fallback lattice
- `PSI-162` / #3233
  - served-artifact identity and reproducibility tuples
- `PSI-163` / #3234
  - cache invalidation and persisted-state upgrade policy
- `PSI-164` / #3235
  - provenance and license gating for local artifacts and compute-market supply
- `PSI-171` / #3223
  - selected-device inventory qualifiers and backend-toolchain truth
- `PSI-172` / #3224
  - execution-profile, queue-policy, and throughput-class truth
- `PSI-173` / #3225
  - `ExecutionTopologyPlan`, `selected_devices`, and stable topology digests
- `PSI-174` / #3226
  - execution-plan cache policy, kernel-cache policy, and compile-path evidence
- `PSI-175` / #3227
  - delivery-proof and settlement-linkage inputs
- `PSI-179` through `PSI-183`
  - truthful local GPT-OSS/NVIDIA enablement now exists on `main`
- `PSI-184` / #3289
  - landed in `64c2a8fc6`
  - initial `psionic-cluster` crate, trusted-LAN namespace/admission config, typed UDP `hello`/`ping` handshake, generated `ClusterId`/`NodeId`, surfaced node-role truth, and integration coverage proving seeded local nodes discover each other without claiming scheduling or execution behavior
- `PSI-185` / #3290
  - landed in `f2e758720`
  - first-class `ClusterNamespace`, `AdmissionToken`, `ClusterAdmissionConfig`, `NodeEpoch`, and file-backed identity persistence; explicit cross-cluster, admission-mismatch, and stale-epoch refusal diagnostics; and integration coverage proving node identity survives restart while epoch truth advances
- `PSI-186` / #3291
  - landed in `cc60eea89`
  - typed `ClusterCommand`, `LocalClusterEvent`, `ClusterEvent`, `ClusterElectionMessage`, and `ClusterConnectionFact` schemas plus a Psionic-owned `ClusterEventLog`, contiguous indexed apply discipline, replayable `ClusterState`/`ClusterSnapshot`, stable state digests, and unit coverage for ordering, replay, and out-of-order refusal
- `PSI-187` / #3292
  - landed in `2acc2ecf6`
  - catchup requests and responses, compacted snapshots, bounded replay tails, full-resync versus snapshot-install recovery dispositions, and unit coverage proving rejoin and schema-mismatch recovery semantics
- `PSI-188` / #3297
  - landed in `2acc2ecf6`
  - cluster topology, link-class, transport, backend-readiness, and node telemetry facts now live in authoritative cluster state with separate topology digests and replay coverage
- `PSI-189` / #3298
  - landed in `2acc2ecf6`
  - artifact residency and cluster staging truth now live beside topology facts as separate digests and explicit residency status records rather than hidden scheduler assumptions
- `PSI-190` / #3299
  - landed in `2acc2ecf6`
  - runtime-owned `ClusterExecutionContext` now flows through `psionic-runtime`, `psionic-serve`, and provider capability and receipt surfaces with policy digests, selected nodes, residency posture, transport class, and fallback history
- `PSI-191` / #3300
  - landed in `ad6891b82`
  - `psionic-cluster` now provides a deterministic whole-request remote scheduler that consumes authoritative membership, telemetry, transport, and artifact-residency facts, emits truthful single-node `ExecutionTopologyPlan` output plus runtime-owned cluster execution evidence, and surfaces explicit machine-checkable refusal and degraded-path diagnostics
- `PSI-192` / #3301
  - landed in `327944c08`
  - `psionic-cluster` now provides explicit cluster serving policy for queue discipline, prefill-versus-decode fairness, cancellation propagation, slow-node backpressure, and reroute behavior on top of truthful whole-request scheduling, while `psionic-runtime` now carries serving policy digests and fallback reasons for those cluster-routing outcomes
- `PSI-193` / #3302
  - landed in `d88d284c5`
  - `psionic-cluster` now provides truthful replicated serving for one lane, including stable replica-lane identity, warm-state snapshots, lifecycle policy digests, warm-standby versus selected routing truth, deterministic refusal when warm replica count is insufficient, and replicated `ExecutionTopologyPlan` output; `psionic-runtime` and `psionic-provider` now surface replica-state digests, replica-node evidence, and replicated topology/device truth consistently through delivered execution, capability, and receipt surfaces
- `PSI-194` / #3303
  - landed in `fa7523ada`
  - `psionic-cluster` now provides a first homogeneous CUDA layer-sharded lane with deterministic multi-node placement, explicit activation and KV handoff facts plus estimated bytes-per-token, refusal when shard geometry, transport, or artifact readiness is insufficient, and truthful `ExecutionTopologyPlan::layer_sharded` output; `psionic-runtime` and `psionic-provider` now preserve shard handoff evidence and layer-sharded topology truth through delivered execution and receipt surfaces
- `PSI-195` / #3304
  - landed in `1cdcf3058`
  - `psionic-cluster` now provides a first homogeneous CUDA tensor-sharded lane with deterministic tensor-axis partitioning, explicit model-eligibility and mesh-transport policy truth, transport-policy digests, tensor-collective handoff evidence, refusal when backend, geometry, or mesh policy is insufficient, and truthful `ExecutionTopologyPlan::tensor_sharded` output; `psionic-runtime` and `psionic-provider` now preserve tensor partition facts through delivered execution and receipt surfaces
- `PSI-196` / #3305
  - landed in `7124eefd7`
  - `psionic-cluster` now ships a reusable integration validation matrix, a restart/rejoin transport test, fault-injected recovery/scheduling/replication/sharding coverage, a release benchmark gate script for cluster planners, and an operator runbook in `docs/CLUSTER_VALIDATION_RUNBOOK.md`
- `PSI-197` / #3306
  - landed in `d424ab1cf`
  - `psionic-cluster` now exposes machine-checkable trust posture via `ClusterTrustPolicy` and `ConfiguredClusterPeer`, persists node signing identity beside node ID and epoch, authenticates configured peers with signed control-plane messages, rejects duplicate replay counters, and extends transport coverage plus the validation runbook to prove signed configured-peer discovery, unknown-peer refusal, tamper refusal, and replay refusal without pretending the default cluster posture is now internet-safe
- `PSI-198` / #3307
  - landed in `011e0452c`
  - `psionic-cluster` now ships a persisted `ClusterOperatorManifest` with a stable rollout digest, JSON load/store, manifest-to-`LocalClusterConfig` conversion, and transport coverage proving authenticated configured-peer nodes can boot from manifests instead of ad hoc code-only config
- `PSI-199` / #3308
  - landed in `87d428e43`
  - `ordered_state` now ships authenticated catchup and snapshot envelopes with stable recovery digests, signer/requester verification, replay refusal, and tests proving signed recovery succeeds while tampered or replayed recovery envelopes are refused explicitly
- `PSI-200` / #3309
  - landed in `86a2c920a`
  - `psionic-cluster` now exposes explicit configured-peer dial policy and health snapshots with backoff, degraded and unreachable reachability posture, and transport coverage proving configured peers degrade honestly when absent and recover cleanly when they later join
- `PSI-201` / #3310
  - landed in `ac9dd2285`
  - `psionic-cluster` now exposes trust-bundle versioning, accepted rollout overlap windows, previous-key acceptance for configured peers, and machine-checkable rollout diagnostics for accepted overlap and stale-bundle refusal, with transport coverage proving key rotation and stale-bundle drift are surfaced honestly
- `PSI-202` / #3311
  - landed in `1e65c56c9`
  - `ordered_state` now exposes explicit coordinator lease policy, lease-aware leadership truth, effective-versus-stale coordinator queries, stable stale-leader diagnostics, lease-aware state digests, and runbook-backed validation for operator-managed multi-subnet coordinator freshness claims
- `PSI-203` / #3312
  - landed in `ddc092cbb`
  - `ordered_state` now exposes a reusable election-term vote ledger, explicit conflicting-vote refusal, explicit same-term split-brain leader refusal, and an authoritative-state guard that refuses conflicting `LeadershipReconciled` events instead of silently switching coordinators in one term
- `PSI-204` / #3313
  - landed in `313fbdc25`
  - current clustered execution evidence now carries coordinator term, commit index, fence token, and authority digest truth through `psionic-runtime`, `psionic-cluster`, and `psionic-provider`, while sharded and whole-request schedules now attach authority digests so stale coordinators cannot present current commit authority implicitly after failover
- `PSI-205` / #3314
  - landed in `4732fbc26`
  - `cluster_validation_matrix` and the operator runbook now cover stale-leader diagnostics, same-term split-brain refusal, and fenced coordinator failover rotation, so the coordinator-authority queue now has explicit validation drills and fail conditions instead of code-only claims
- `PSI-206` / #3315
  - landed in `e6888aaa0`
  - `ordered_state` now exposes typed cluster-command authority scopes, operator-managed authorization policy digests, machine-checkable refusal codes, and coordinator-override versus self/peer/member authorization decisions with stable digests and unit coverage for coordinator-only, self-scoped, peer-scoped, and membership-status-gated command submission
- `PSI-207` / #3316
  - landed in `7b7b681f7`
  - `IndexedClusterEvent`, `ClusterSnapshot`, and `ClusterState` now retain command authorization provenance for memberships, links, telemetry, artifact residency, and leadership facts; compaction, catchup, and snapshot recovery preserve that truth; and unit coverage proves replay and snapshot-install recovery keep provenance intact
This is a real baseline. The cluster roadmap is not starting from zero.
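
As an illustration of the "contiguous indexed apply discipline" and "out-of-order refusal" described for `PSI-186`, here is a minimal sketch. The type names (`IndexedEvent`, `SketchState`, `ApplyRefusal`) are hypothetical stand-ins, not the real `psionic-cluster` API, and the FNV-1a hash is only a dependency-free placeholder for whatever stable digest the crate actually uses.

```rust
// Hypothetical stand-ins: the real ClusterEventLog and ClusterState in
// psionic-cluster may be shaped quite differently.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndexedEvent {
    pub index: u64,
    pub payload: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyRefusal {
    OutOfOrder { expected: u64, got: u64 },
}

#[derive(Debug, Default)]
pub struct SketchState {
    pub next_index: u64,
    pub applied: Vec<String>,
}

// FNV-1a, used only as a placeholder for a real stable digest.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

impl SketchState {
    /// Apply an event only if it carries the next contiguous index.
    pub fn apply(&mut self, event: &IndexedEvent) -> Result<(), ApplyRefusal> {
        if event.index != self.next_index {
            return Err(ApplyRefusal::OutOfOrder {
                expected: self.next_index,
                got: event.index,
            });
        }
        self.applied.push(event.payload.clone());
        self.next_index += 1;
        Ok(())
    }

    /// Digest over the applied history; identical histories give identical digests.
    pub fn digest(&self) -> u64 {
        let mut hash = fnv1a(0xcbf2_9ce4_8422_2325, &self.next_index.to_le_bytes());
        for payload in &self.applied {
            hash = fnv1a(hash, &(payload.len() as u64).to_le_bytes());
            hash = fnv1a(hash, payload.as_bytes());
        }
        hash
    }
}

/// Rebuild state from an ordered log, refusing at the first gap.
pub fn replay(events: &[IndexedEvent]) -> Result<SketchState, ApplyRefusal> {
    let mut state = SketchState::default();
    for event in events {
        state.apply(event)?;
    }
    Ok(state)
}
```

Replaying the same log twice yields the same digest, and an event whose index skips ahead is refused rather than buffered or silently reordered.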
The checked-in repo is not yet a cluster runtime, but it is already shaped for one:
- Psionic already has explicit topology and device-selection truth via `ExecutionTopologyPlan`, `selected_devices`, stable digests, and provider receipt/capability surfaces
- Psionic already has artifact identity, cache invalidation, provenance, hardware validation, and delivery-proof substrate that a cluster lane can extend instead of replacing
- there is now a `psionic-cluster` crate with trusted-LAN hello/ping discovery, persistent node identity, explicit admission policy, machine-checkable join refusals, replayable ordered state, catchup, snapshots, compaction, recovery, topology and telemetry facts, artifact residency truth, cluster execution evidence seams, truthful remote whole-request scheduling, explicit cluster queue/fairness/backpressure policy, truthful replicated serving for one lane, and first homogeneous CUDA layer-sharded and tensor-sharded lanes with explicit handoff truth, but there is still no widened-backend sharding path
- the first honest cluster scope remains a trusted same-network LAN cluster with explicit namespace/admission policy, not an adversarial compute-market fabric
- there is now an explicit operator-managed configured-peer posture for wider networks, but it is opt-in, signed, replay-protected, and still not an internet-wide adversarial trust model
- real cluster execution claims must remain gated on a stable local backend lane rather than on design-doc optimism
  - first truthful lane is now homogeneous CUDA GPT-OSS, with `#3276`, `#3288`, and `#3248` closed on `main`
  - current Metal GPT-OSS nodes remain explicit refusal candidates until the Metal roadmap queue closes
That means the next cluster work is not "make sharding happen somehow." It is:
- keep the replicated and layer-sharded lanes truthful and measurable
- keep tensor sharding bounded by explicit transport and model-eligibility refusal boundaries
- reuse the existing evidence seams instead of inventing a side channel
- widen execution claims only after the corresponding backend truth exists
This roadmap explicitly adopts the conclusions from:
- `docs/EXO_UNIFIED_INTEGRATION_PLAN.md`
- `../../../docs/audits/2026-03-09-psionic-exo-cluster-integration-audit.md`
- `docs/ROADMAP.md`
- `docs/ROADMAP_METAL.md`
Exo contributes the right architecture ideas:
- typed topic separation
- ordered event logs with catchup
- leader-ordered global state
- namespace-based cluster isolation
- topology-aware placement
- coordinator-only versus execution-capable roles
Practical roadmap consequence:
- cluster work should port these semantics into Psionic-owned types and tests
- cluster work must not make Exo a required runtime dependency
The main roadmap already landed the reusable pieces cluster support needs:
- device inventory qualifiers
- execution-topology planning
- artifact identity and cache invalidation
- provider and receipt evidence
- backend validation and fallback policy
Practical roadmap consequence:
- cluster support should extend those contracts
- cluster support should not invent a second hidden truth system
The Metal roadmap now makes the refusal boundary explicit:
- current Metal GPT-OSS still has correctness and architecture blockers
- same-host throughput closure is not yet honest enough for cluster eligibility
Practical roadmap consequence:
- current Metal GPT-OSS nodes should be refused for cluster execution
- do not use cluster work to smuggle Metal GPT-OSS readiness claims into the product
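
The refusal boundary above can be sketched as a small eligibility check. This is an assumption-laden illustration: the `Backend` and `Eligibility` types and the refusal code strings are hypothetical, chosen only to show that a Metal GPT-OSS node receives an explicit, machine-checkable refusal rather than a silent fallback.

```rust
// Hypothetical types and refusal codes, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cuda,
    Metal,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Eligibility {
    Eligible,
    Refused { code: &'static str },
}

/// Decide whether a node may take part in clustered GPT-OSS execution.
/// `local_lane_validated` stands for "the backend lane is already truthful
/// locally", which the roadmap requires before any cluster claim.
pub fn cluster_eligibility(backend: Backend, local_lane_validated: bool) -> Eligibility {
    match backend {
        // Metal stays refused while its roadmap gate is still open.
        Backend::Metal => Eligibility::Refused {
            code: "metal_gpt_oss_gate_open",
        },
        Backend::Cuda if !local_lane_validated => Eligibility::Refused {
            code: "local_lane_not_validated",
        },
        Backend::Cuda => Eligibility::Eligible,
    }
}
```

Returning a typed refusal instead of `false` keeps the reason available to receipts and diagnostics.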
Tracked by PSI-184 through PSI-187, now landed on main.
Current truth:
- `psionic-cluster` now owns reusable cluster transport, identity, and ordered state substrate
- persistent `ClusterId`, `NodeId`, `NodeEpoch`, and node-role truth now exist
- authoritative ordered history, catchup, snapshots, compaction, and recovery semantics are now explicit and replayable
Required outcome:
- the next phases should consume this control-plane substrate rather than rebuilding cluster truth inside scheduling code
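
To make the catchup and recovery dispositions concrete, the sketch below chooses between replaying a bounded tail, installing a compacted snapshot, and a full resync. The decision rules are assumptions made for illustration (for example, that a schema mismatch forces a full resync); the real `psionic-cluster` semantics may differ, and every name here is hypothetical.

```rust
// Hypothetical view of what a responder still holds after compaction.
pub struct LogWindow {
    pub schema_version: u32,
    /// Events below this index were compacted into the snapshot.
    pub first_retained_index: u64,
    /// Index that the next appended event would receive.
    pub head_index: u64,
}

pub struct CatchupRequest {
    pub schema_version: u32,
    /// First index the requester has not applied yet.
    pub next_index: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Recovery {
    UpToDate,
    ReplayTail { from: u64, to_exclusive: u64 },
    SnapshotInstall { snapshot_through_exclusive: u64 },
    FullResync,
}

pub fn plan_recovery(log: &LogWindow, request: &CatchupRequest) -> Recovery {
    // Assumption: incompatible schemas or a requester that claims to be ahead
    // of the responder cannot be repaired incrementally.
    if request.schema_version != log.schema_version || request.next_index > log.head_index {
        return Recovery::FullResync;
    }
    if request.next_index == log.head_index {
        return Recovery::UpToDate;
    }
    if request.next_index >= log.first_retained_index {
        // The missing events are still in the retained tail.
        return Recovery::ReplayTail {
            from: request.next_index,
            to_exclusive: log.head_index,
        };
    }
    // The requester fell behind the compaction point.
    Recovery::SnapshotInstall {
        snapshot_through_exclusive: log.first_retained_index,
    }
}
```

After a snapshot install, the requester would still need the retained tail from `first_retained_index` up to the head; that second step is omitted here for brevity.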
Tracked by PSI-188 through PSI-190, now landed on main.
Current truth:
- authoritative cluster state now carries topology, link-class, transport, and node telemetry facts with stable digests
- artifact residency and placement readiness are now separate cluster truths
- provider capabilities and receipts can now carry cluster-specific digests, selected nodes, residency posture, transport class, and fallback history
Required outcome:
- the next scheduling phase should consume these facts to make remote-node routing decisions truthful rather than purely local
Tracked by PSI-191 through PSI-193, now landed on main.
Current truth:
- Psionic can now choose one best remote node for whole-request execution and express the result as truthful single-node topology and cluster evidence
- cluster serving policy is now explicit for queue discipline, decode fairness, cancellation propagation, slow-node backpressure, and reroute/refusal behavior across whole-request candidates
- replicated serving now exists for one lane with stable replica identity, warm-state snapshots, lifecycle-policy digests, and explicit selected versus warm-standby routing truth reflected in runtime and provider evidence
Required outcome:
- keep replicated routing truthful as the baseline operational scale-out mode while the next phase adds real sharded execution rather than overloading replica evidence with sharding claims
- preserve explicit degraded and refusal diagnostics when replica warm-state or lifecycle policy is insufficient for honest replicated service
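
A deterministic whole-request scheduler of the kind described above might look roughly like this. The selection rule (lowest queue depth, ties broken by node ID), the "degraded when the artifact still needs staging" convention, and all type names are assumptions for illustration; the source does not specify the actual policy.

```rust
// Hypothetical per-node facts drawn from authoritative cluster state.
#[derive(Debug, Clone)]
pub struct NodeFacts {
    pub node_id: String,
    pub admitted: bool,
    pub backend_ready: bool,
    pub artifact_resident: bool,
    pub queue_depth: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Selected { node_id: String, degraded: bool },
    Refused { code: &'static str },
}

/// Pick exactly one node for a whole request, or refuse explicitly.
pub fn schedule(nodes: &[NodeFacts]) -> Decision {
    let ready: Vec<&NodeFacts> = nodes
        .iter()
        .filter(|node| node.admitted && node.backend_ready)
        .collect();
    if ready.is_empty() {
        return Decision::Refused { code: "no_ready_node" };
    }
    // Prefer nodes that already hold the artifact; otherwise fall back to a
    // ready node and mark the path degraded (assumed convention).
    let resident: Vec<&NodeFacts> = ready
        .iter()
        .copied()
        .filter(|node| node.artifact_resident)
        .collect();
    let (pool, degraded) = if resident.is_empty() {
        (ready, true)
    } else {
        (resident, false)
    };
    // A total order over (queue_depth, node_id) keeps the choice deterministic
    // regardless of input ordering.
    let best = pool
        .iter()
        .min_by(|a, b| {
            a.queue_depth
                .cmp(&b.queue_depth)
                .then_with(|| a.node_id.cmp(&b.node_id))
        })
        .expect("pool is non-empty");
    Decision::Selected {
        node_id: best.node_id.clone(),
        degraded,
    }
}
```

Because the ordering is total, two coordinators that hold the same facts reach the same decision, which is what makes the result replayable.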
Tracked by PSI-194 / #3303
and PSI-195 / #3304.
Current truth:
- one homogeneous CUDA layer-sharded lane now exists with deterministic placement, explicit activation and KV handoff truth, bytes-per-token estimates, and provider/runtime evidence for layer boundaries and handoff transport
- one homogeneous CUDA tensor-sharded lane now also exists with deterministic tensor-axis partitioning, explicit model-eligibility truth, explicit mesh transport policy, tensor-collective handoff evidence, and refusal of unsupported backend, ineligible tensor geometry, or unsuitable shard mesh
Required outcome:
- keep both sharded lanes bounded by the new validation matrix, benchmark gate, and operator runbook rather than letting the roadmap outrun the evidence
- continue refusing unsupported cluster sharding explicitly instead of collapsing to whole-request or replica-routed claims
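
As a rough sketch of deterministic layer-sharded placement with explicit refusal, the following splits a layer range contiguously across nodes sorted by ID and estimates activation handoff bytes per token. The even-split rule, the refusal variants, and the estimate (one hidden-state vector per shard boundary, with KV handoff ignored) are simplifying assumptions, not the shipped planner.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct Shard {
    pub node_id: String,
    pub first_layer: u32,
    pub last_layer: u32, // inclusive
}

// Hypothetical refusal reasons for unsupported shard geometry.
#[derive(Debug, PartialEq, Eq)]
pub enum ShardRefusal {
    NoNodes,
    MoreNodesThanLayers,
}

/// Assign contiguous layer ranges to nodes in sorted node-ID order.
pub fn plan_layer_shards(
    mut node_ids: Vec<String>,
    layers: u32,
) -> Result<Vec<Shard>, ShardRefusal> {
    if node_ids.is_empty() {
        return Err(ShardRefusal::NoNodes);
    }
    let count = node_ids.len() as u32;
    if count > layers {
        return Err(ShardRefusal::MoreNodesThanLayers);
    }
    node_ids.sort(); // sorting makes placement independent of input order
    let base = layers / count;
    let extra = layers % count; // the first `extra` shards take one more layer
    let mut start = 0u32;
    let mut shards = Vec::with_capacity(node_ids.len());
    for (position, node_id) in node_ids.into_iter().enumerate() {
        let len = base + u32::from((position as u32) < extra);
        shards.push(Shard {
            node_id,
            first_layer: start,
            last_layer: start + len - 1,
        });
        start += len;
    }
    Ok(shards)
}

/// Estimated activation bytes crossing shard boundaries for one token.
pub fn activation_bytes_per_token(hidden_size: u64, bytes_per_element: u64, shards: u64) -> u64 {
    hidden_size * bytes_per_element * shards.saturating_sub(1)
}
```

A single-shard plan has no boundaries, so its handoff estimate is zero; geometry that cannot give every node at least one layer is refused rather than collapsed into a smaller plan.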
Tracked by landed PSI-196 / #3305
and landed PSI-197 / #3306.
Current truth:
- there is now a reusable cluster validation matrix, fault-injected coverage, a release benchmark gate, and an operator runbook for both the shipped trusted-LAN scope and the widened authenticated configured-peer posture
- authenticated cluster membership now exists through machine-checkable trust policy, configured peers, signed control-plane messages, and replay protection
- adversarial or compute-market trust claims are still out of scope
Required outcome:
- keep the validation and benchmark assets authoritative for both trust postures instead of letting rollout claims outrun the tests
- keep wider trust claims bounded to operator-managed configured peers until a new GitHub-backed queue proves anything stronger
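
The replay-protection and unknown-peer refusal behavior can be sketched without any cryptography. The example below deliberately omits signature verification, which the real transport performs on control-plane messages, and assumes a strictly increasing per-peer counter; `ReplayGuard` and `PeerRefusal` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum PeerRefusal {
    UnknownPeer,
    ReplayedCounter { last_seen: u64, got: u64 },
}

/// Tracks the highest counter accepted from each configured peer.
pub struct ReplayGuard {
    last_seen: HashMap<String, Option<u64>>,
}

impl ReplayGuard {
    /// Only peers listed here are ever accepted.
    pub fn new(configured_peers: &[&str]) -> Self {
        let last_seen: HashMap<String, Option<u64>> = configured_peers
            .iter()
            .map(|peer| (peer.to_string(), None))
            .collect();
        Self { last_seen }
    }

    /// Accept a message counter, refusing unknown peers and any counter that
    /// does not advance past the last accepted value.
    pub fn accept(&mut self, peer: &str, counter: u64) -> Result<(), PeerRefusal> {
        let slot = self
            .last_seen
            .get_mut(peer)
            .ok_or(PeerRefusal::UnknownPeer)?;
        if let Some(last_seen) = *slot {
            if counter <= last_seen {
                return Err(PeerRefusal::ReplayedCounter {
                    last_seen,
                    got: counter,
                });
            }
        }
        *slot = Some(counter);
        Ok(())
    }
}
```

In a fuller version the counter would only be recorded after the message signature verifies, so that forged traffic cannot advance a peer's counter.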
Tracked by landed PSI-198 / #3307
through PSI-201 / #3310.
Current truth:
- authenticated configured-peer posture now has a reusable operator manifest and rollout digest instead of relying entirely on hand-built Rust config
- catchup and snapshot payloads are now signed, digest-checked, and replay-checked for authenticated recovery paths
- configured peers now carry explicit dial policy, backoff, and degraded or unreachable reachability truth instead of looking like implicit LAN retries
- trust-bundle versioning, previous-key overlap, and stale-bundle rollout diagnostics are now explicit and machine-checkable for operator-managed configured-peer clusters
Required outcome:
- keep the operator runbook authoritative for manifest, recovery, dial-health, and rotation drills rather than widening trust claims without evidence
Tracked by landed PSI-202 / #3311
through landed PSI-205 / #3314.
Current truth:
- operator-managed configured-peer clusters now have manifest, signed recovery, dial-health, and trust-rollout truth
- `ordered_state` now has explicit coordinator lease policy, lease-aware leadership records, effective-versus-stale coordinator queries, stable stale-leader diagnostics, reusable election-term vote ledger, and same-term split-brain refusal on `main`
- clustered execution evidence now also carries coordinator term, commit index, fence token, and authority digest truth on `main`
- operator validation drills for fenced coordinator turnover now exist in the validation matrix and runbook on `main`
- that means wider operator-managed clusters still depend on implicit multi-subnet assumptions only to the extent that any stronger future claim now needs a new GitHub-backed queue rather than a placeholder extension here
Required outcome:
- keep the new failover drill authoritative while the next queue adds typed command authorization and payout-grade provenance rather than silently inferring who was allowed to mutate authoritative state
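
The election-term vote ledger and same-term split-brain refusal can be illustrated with a small in-memory structure. This is a sketch under stated assumptions: a voter may repeat the same vote idempotently but not change it within a term, and a term may have at most one recorded leader. The names are hypothetical and not taken from `ordered_state`.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum LedgerRefusal {
    ConflictingVote { term: u64, voter: String, existing: String },
    SplitBrain { term: u64, existing_leader: String },
}

#[derive(Default)]
pub struct VoteLedger {
    votes: BTreeMap<(u64, String), String>, // (term, voter) -> candidate
    leaders: BTreeMap<u64, String>,         // term -> leader
}

impl VoteLedger {
    /// Record a vote; a second, different vote in the same term is refused.
    pub fn record_vote(&mut self, term: u64, voter: &str, candidate: &str) -> Result<(), LedgerRefusal> {
        let key = (term, voter.to_string());
        match self.votes.get(&key) {
            Some(existing) if existing.as_str() != candidate => Err(LedgerRefusal::ConflictingVote {
                term,
                voter: voter.to_string(),
                existing: existing.clone(),
            }),
            Some(_) => Ok(()), // repeating the same vote is harmless
            None => {
                self.votes.insert(key, candidate.to_string());
                Ok(())
            }
        }
    }

    /// Record a leader for a term; a different leader in that term is refused
    /// instead of silently replacing the first.
    pub fn record_leader(&mut self, term: u64, leader: &str) -> Result<(), LedgerRefusal> {
        match self.leaders.get(&term) {
            Some(existing) if existing.as_str() != leader => Err(LedgerRefusal::SplitBrain {
                term,
                existing_leader: existing.clone(),
            }),
            Some(_) => Ok(()),
            None => {
                self.leaders.insert(term, leader.to_string());
                Ok(())
            }
        }
    }
}
```

A new term starts with a clean slate, so leadership can legitimately change across terms while staying fixed within one.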
Tracked by landed PSI-206 / #3315
through PSI-209 / #3318,
with the full D3 queue now landed on main.
Current truth:
- authenticated operator-managed clusters now have signed transport, replay protection, manifest/dial/rollout truth, coordinator lease state, split-brain refusal, and fenced commit authority on `main`
- `ClusterCommand` now carries typed authority scopes and is paired with an operator-managed authorization policy, stable command/policy digests, coordinator override, and machine-checkable refusal diagnostics on `main`
- authoritative ordered events, snapshots, and recovered cluster state now also retain command-authorization provenance for the current facts they expose
- clustered execution evidence and settlement-linkage inputs now also retain bounded command/admission provenance, including scheduler membership, selected-node membership, artifact-residency authorization, and leadership fence truth
- the validation matrix and operator runbook now also cover allowed/refused authorization flows, payout-facing settlement provenance, and the sharded provenance merge path
- no open cluster roadmap issues remain under the current D3 scope; any stronger claim now requires a new explicit follow-on queue
Required outcome:
- preserve command provenance through ordered history, catchup, and snapshot flows so replay can explain who requested a mutation and under which policy
- keep runtime/provider execution and settlement evidence aligned with the same bounded provenance truth, then add validation gates for those claims
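
The typed authority scopes and refusal codes can be pictured as a pure authorization function. The scope semantics below (what "self-scoped", "peer-scoped", and "membership-gated" permit, and that the coordinator may always override) are assumptions made for this sketch, as are all type names and code strings.

```rust
// Hypothetical scope and decision types; not the real ordered_state API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthorityScope {
    CoordinatorOnly,
    SelfScoped,
    PeerScoped,
    MemberGated,
}

pub struct Submitter<'a> {
    pub node_id: &'a str,
    pub is_coordinator: bool,
    pub is_member: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AuthDecision {
    Allowed { via: &'static str },
    Refused { code: &'static str },
}

/// Decide whether `submitter` may issue a command that mutates facts about `target_node`.
pub fn authorize(scope: AuthorityScope, submitter: &Submitter, target_node: &str) -> AuthDecision {
    // Assumed rule: the current coordinator may override any scope.
    if submitter.is_coordinator {
        return AuthDecision::Allowed { via: "coordinator_override" };
    }
    match scope {
        AuthorityScope::CoordinatorOnly => AuthDecision::Refused { code: "coordinator_required" },
        AuthorityScope::SelfScoped if submitter.node_id == target_node => {
            AuthDecision::Allowed { via: "self" }
        }
        AuthorityScope::SelfScoped => AuthDecision::Refused { code: "not_self" },
        AuthorityScope::PeerScoped if submitter.is_member && submitter.node_id != target_node => {
            AuthDecision::Allowed { via: "peer" }
        }
        AuthorityScope::PeerScoped => AuthDecision::Refused { code: "not_admitted_peer" },
        AuthorityScope::MemberGated if submitter.is_member => AuthDecision::Allowed { via: "member" },
        AuthorityScope::MemberGated => AuthDecision::Refused { code: "not_member" },
    }
}
```

Recording the `via` value alongside each applied event is one way the provenance described above could survive replay and snapshot recovery.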
Tracked by landed PSI-210 / #3319
and PSI-211 / #3320,
plus landed PSI-212 / #3321
and PSI-213 / #3322.
Current truth:
- the D1 through D3 queues made operator-managed clusters explicit, signed, and provenance-aware, but they did not make current postures market-safe
- current cluster trust postures now include trusted-LAN, authenticated configured-peer, and attested configured-peer admission, which is stronger than the earlier operator-managed posture but still not a wider compute-market discovery fabric
- `ClusterTrustPolicy` now exposes a machine-checkable compute-market refusal contract, runtime/provider/cluster now expose a signed cluster evidence bundle export, attested configured-peer admission now exists as an explicit seam, and cluster policy/config now surface explicit discovery posture plus a bounded non-LAN discovery assessment
- any compute-market distributed-cluster language would still outrun the code unless a future wider-network discovery queue replaces the current machine-checkable refusal boundary with a real discovery fabric
Required outcome:
- keep current non-market-safe postures refusal-capable instead of doc-only
- keep clustered execution evidence bound into signed exportable bundles before talking about audit or dispute handling outside operator-managed posture
- keep attestation-aware admission explicit and refusal-capable for market-facing node identity claims
- keep current non-LAN discovery posture and refusal boundary explicit before widening cluster claims toward a compute-market fabric
- open a fresh GitHub-backed queue instead of mutating D4 in place when wider-network discovery implementation work actually starts
Phases C1 through C6 are now all landed on GitHub/main. The local PSI-* IDs
below still come from the 2026-03-09 cluster audit, but this roadmap now maps
them to their real GitHub issue numbers directly. The next multi-subnet
follow-on queues now also have real GitHub issue numbers instead of placeholder
notes.
Already on main:
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-184 | #3289 | Closed | Stand up a hello-world local cluster connection in psionic-cluster | psionic-cluster, docs/tests | Landed in 64c2a8fc6: established the crate seam and proved that seeded local Psionic nodes can discover each other, exchange typed hello/ping state, and report explicit role truth without claiming execution behavior. |
| PSI-185 | #3290 | Closed | Define cluster identity, node epoch, and admission policy | psionic-cluster, psionic-runtime, docs | Landed in f2e758720: persistent local node identity, explicit namespace/admission config, role-visible node epoch truth, and machine-checkable refusal of admission mismatch, cluster mismatch, and stale-node ambiguity. |
| PSI-186 | #3291 | Closed | Add typed cluster commands, events, and authoritative ordered state | psionic-cluster | Landed in cc60eea89: typed control-plane schemas, contiguous indexed-event apply rules, replayable authoritative cluster state, and stable digests that later receipts and diagnostics can reference. |
| PSI-187 | #3292 | Closed | Add catchup, snapshots, compaction, and recovery semantics | psionic-cluster, storage/tests | Landed in 2acc2ecf6: event history now supports bounded replay, compacted snapshots, snapshot-install versus full-resync recovery, and rejoin coverage. |

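
The "contiguous indexed-event apply rules" and "stable digests" named for PSI-186 can be pictured with a minimal sketch. Everything here is hypothetical and for illustration only: `SketchEvent`, `SketchState`, and `ApplyError` are not psionic-cluster types, and `DefaultHasher` is a stand-in for whatever digest the crate actually uses (its output is not guaranteed stable across Rust releases).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical event shape; the real schema is not shown in this roadmap.
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct SketchEvent {
    pub index: u64,
    pub payload: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyError {
    /// The event index is not exactly the next expected index.
    NonContiguous { expected: u64, got: u64 },
}

/// Hypothetical authoritative state: next expected index plus a running digest.
#[derive(Debug, Default)]
pub struct SketchState {
    pub next_index: u64,
    pub digest: u64,
}

impl SketchState {
    /// Applies an event only if its index is contiguous; a refused event
    /// leaves both the index and the digest untouched.
    pub fn apply(&mut self, event: &SketchEvent) -> Result<(), ApplyError> {
        if event.index != self.next_index {
            return Err(ApplyError::NonContiguous {
                expected: self.next_index,
                got: event.index,
            });
        }
        // Chain the previous digest with the new event so replaying the same
        // history always yields the same digest.
        let mut hasher = DefaultHasher::new();
        self.digest.hash(&mut hasher);
        event.hash(&mut hasher);
        self.digest = hasher.finish();
        self.next_index += 1;
        Ok(())
    }
}
```

Under these assumptions, two nodes that replay the same event prefix end up with the same digest, and a gap in the index sequence is refused instead of silently applied.
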
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-188 | #3297 | Closed | Publish topology, link-class, and node telemetry facts | psionic-cluster, psionic-runtime, psionic-provider | Landed in 2acc2ecf6: authoritative state now carries topology and telemetry facts with explicit link classes, readiness posture, and stable topology digests. |
| PSI-189 | #3298 | Closed | Add artifact residency and cluster staging truth | psionic-cluster, psionic-models, psionic-catalog, psionic-provider | Landed in 2acc2ecf6: cluster state now tracks artifact residency separately from topology with explicit staging status and dedicated residency digests. |
| PSI-190 | #3299 | Closed | Extend capability and receipt evidence for clustered execution | psionic-runtime, psionic-provider, psionic-serve | Landed in 2acc2ecf6: provider and serve surfaces now expose cluster digests, selected nodes, residency posture, transport class, and fallback history through a runtime-owned evidence type. |

| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-191 | #3300 | Closed | Add whole-request remote scheduling for one-node execution | psionic-cluster, psionic-runtime | Landed in ad6891b82: authoritative cluster facts now drive deterministic whole-request remote scheduling with truthful single-node topology, selected device identity, degraded-path notes, and machine-checkable refusal diagnostics. |
| PSI-192 | #3301 | Closed | Add queue policy, fairness, cancellation, and backpressure rules | psionic-cluster, psionic-runtime | Landed in 327944c08: cluster serving policy is now explicit and replayable, with queue discipline, decode fairness, cancellation propagation, slow-node backpressure, reroute/refusal outcomes, serving-policy digests, and fallback-reason evidence layered on top of truthful whole-request scheduling. |

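
As a rough illustration of "deterministic whole-request remote scheduling" with "machine-checkable refusal diagnostics" (PSI-191), the sketch below picks one node from a set of facts or returns a typed refusal. The `NodeFact` fields, the `ScheduleRefusal` variants, and the ranking rule (most free slots, then lowest node ID) are assumptions made for this example, not the scheduler psionic-cluster ships.

```rust
/// Hypothetical per-node facts; the real authoritative state carries more.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeFact {
    pub node_id: String,
    pub ready: bool,
    pub artifact_resident: bool,
    pub free_slots: u32,
}

/// Typed refusal reasons so callers can branch on them instead of parsing text.
#[derive(Debug, PartialEq, Eq)]
pub enum ScheduleRefusal {
    NoNodes,
    NoReadyNode,
    ArtifactNotResident,
    NoCapacity,
}

/// Selects exactly one node for the whole request, or refuses with a reason.
pub fn schedule_whole_request(nodes: &[NodeFact]) -> Result<String, ScheduleRefusal> {
    if nodes.is_empty() {
        return Err(ScheduleRefusal::NoNodes);
    }
    let ready: Vec<&NodeFact> = nodes.iter().filter(|n| n.ready).collect();
    if ready.is_empty() {
        return Err(ScheduleRefusal::NoReadyNode);
    }
    let resident: Vec<&NodeFact> = ready.into_iter().filter(|n| n.artifact_resident).collect();
    if resident.is_empty() {
        return Err(ScheduleRefusal::ArtifactNotResident);
    }
    let mut eligible: Vec<&NodeFact> = resident.into_iter().filter(|n| n.free_slots > 0).collect();
    if eligible.is_empty() {
        return Err(ScheduleRefusal::NoCapacity);
    }
    // Deterministic order: most free slots first, ties broken by node ID, so
    // the same facts always produce the same selection regardless of input order.
    eligible.sort_by(|a, b| {
        b.free_slots
            .cmp(&a.free_slots)
            .then_with(|| a.node_id.cmp(&b.node_id))
    });
    Ok(eligible[0].node_id.clone())
}
```

The point of the sketch is the shape of the contract: the same facts always yield the same node, and every failure path names a specific refusal rather than falling back silently.
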
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-193 | #3302 | Closed | Ship replicated cluster serving for one validated backend lane | psionic-cluster, psionic-runtime, psionic-serve, psionic-provider | Landed in d88d284c5: Psionic now has a truthful replicated-serving lane with replica warm-state and lifecycle-policy truth, deterministic warm-replica routing and refusal behavior, replicated execution topology output, and consistent capability/receipt evidence for replica selection and standby state. |

| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-194 | #3303 | Closed | Add homogeneous CUDA layer-sharded execution | psionic-backend-cuda, psionic-runtime, psionic-cluster, psionic-provider | Landed in fa7523ada: Psionic now has a first homogeneous CUDA layer-sharded lane with deterministic shard placement, explicit activation/KV handoff evidence and bytes-per-token estimates, truthful ExecutionTopologyPlan::layer_sharded reporting, provider receipt propagation, and refusal coverage for non-CUDA, unsuitable inter-shard links, and insufficient artifact readiness. |
| PSI-195 | #3304 | Closed | Add homogeneous CUDA tensor-sharded execution and transport policy | psionic-backend-cuda, psionic-runtime, psionic-cluster, psionic-provider | Landed in 1cdcf3058: Psionic now has a first homogeneous CUDA tensor-sharded lane with explicit tensor-axis partition geometry, model-eligibility truth, transport-policy digests, tensor-collective handoff evidence, and refusal coverage for unsupported backend, ineligible geometry, and unsuitable shard mesh transport. |

| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-196 | #3305 | Closed | Add cluster validation, fault-injection, and performance gates | docs/tests/validation plus cluster crates | Landed in 7124eefd7: Psionic now ships a reusable cluster validation matrix, restart/rejoin transport coverage, fault-injected recovery/scheduling/replication/sharding tests, a release benchmark gate script, and an operator runbook so cluster claims stay repeatable and evidence-backed. |
| PSI-197 | #3306 | Closed | Harden cluster trust beyond the first LAN scope | psionic-cluster, security/docs | Landed in d424ab1cf: Psionic now exposes machine-checkable trust posture, authenticated configured-peer membership, signed control-plane messages, replay protection, and runbook-backed validation for widened operator-managed cluster posture without retroactively claiming internet-wide safety. |

These issues remain outside the completed first trusted-cluster scope. They are the next honest queue if Psionic widens from operator-managed configured peers toward a more operationally robust multi-subnet substrate.
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-198 | #3307 | Closed | Add operator cluster manifest and trust-bundle digests | psionic-cluster, security/docs | Landed in 011e0452c: Psionic now persists cluster rollout inputs as a ClusterOperatorManifest with a stable digest, JSON load/store, manifest-derived config, and manifest-backed authenticated transport coverage. |
| PSI-199 | #3308 | Closed | Add tamper-evident catchup and snapshot envelopes | psionic-cluster, ordered-state/tests | Landed in 87d428e43: Psionic now signs recovery payloads, verifies cluster/requester/signer identity and recovery digests, rejects replayed envelopes, and documents the signed recovery drill in the validation runbook. |
| PSI-200 | #3309 | Closed | Add explicit multi-subnet peer dial policy and health truth | psionic-cluster, transport/docs | Landed in 86a2c920a: Psionic now exposes configured-peer dial policy in the trust surface, tracks per-peer reachability and backoff truth, and validates degraded-to-reachable transitions in transport tests and the runbook. |
| PSI-201 | #3310 | Closed | Add membership key rotation and rollout diagnostics | psionic-cluster, security/docs | Landed in ac9dd2285: Psionic now exposes trust-bundle version overlap, previous-key rotation windows, and rollout diagnostics for accepted overlap and stale-bundle refusal, with operator runbook coverage for both rotation and drift detection. |

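
The "accepted overlap" versus "stale-bundle refusal" distinction from PSI-201 can be sketched as a small decision function. This is only an illustration under assumptions: `RotationPolicy`, `BundleDiagnostic`, and integer bundle versions are hypothetical names and shapes, not the trust-bundle model in psionic-cluster.

```rust
/// Hypothetical rotation policy; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct RotationPolicy {
    pub current_version: u32,
    /// Previous bundle version still honoured while the overlap window is
    /// open; `None` once the window has closed.
    pub previous_version: Option<u32>,
}

/// Rollout diagnostic a peer check could report.
#[derive(Debug, PartialEq, Eq)]
pub enum BundleDiagnostic {
    Current,
    AcceptedOverlap { peer_version: u32 },
    RefusedMismatch { peer_version: u32, current_version: u32 },
}

/// Classifies a peer's trust-bundle version against the local policy.
pub fn check_peer_bundle(policy: &RotationPolicy, peer_version: u32) -> BundleDiagnostic {
    if peer_version == policy.current_version {
        BundleDiagnostic::Current
    } else if policy.previous_version == Some(peer_version) {
        // Peer has not rotated yet but is still inside the overlap window.
        BundleDiagnostic::AcceptedOverlap { peer_version }
    } else {
        // Anything else is treated as stale or drifted and refused explicitly.
        BundleDiagnostic::RefusedMismatch {
            peer_version,
            current_version: policy.current_version,
        }
    }
}
```

Closing the overlap window in this sketch is just setting `previous_version` to `None`, after which a peer still on the old bundle moves from accepted overlap to explicit refusal.
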
These issues are the next honest queue for operator-managed multi-subnet
clusters now that manifest, recovery, dial-health, and rotation truth exist on
main.
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-202 | #3311 | Closed | Add coordinator lease policy and stale-leader diagnostics | psionic-cluster, ordered-state/tests/docs | Landed in 1e65c56c9: coordinator leadership now carries explicit lease policy and heartbeat ticks, ClusterState exposes effective-versus-stale leadership queries plus stale-leader diagnostics, snapshot digests now reflect lease turnover, and the operator runbook now has a coordinator lease drill. |
| PSI-203 | #3312 | Closed | Add vote ledger and split-brain refusal semantics | psionic-cluster, ordered-state/tests | Landed in ddc092cbb: Psionic now has a reusable multi-term election ledger, deterministic refusal of conflicting vote grants and conflicting same-term leader heartbeats, and an authoritative-state guard that rejects conflicting same-term LeadershipReconciled events instead of silently changing leaders. |
| PSI-204 | #3313 | Closed | Add failover fencing tokens and commit authority truth | psionic-cluster, psionic-runtime, psionic-provider, docs | Landed in 313fbdc25: Psionic now derives stable coordinator fence tokens and authority digests from ordered leadership truth, threads commit-authority evidence through runtime/provider execution context, and attaches authority digests to whole-request and sharded schedules so stale coordinators cannot look current after failover. |
| PSI-205 | #3314 | Closed | Add coordinator failover validation drills and runbook gates | docs/tests/validation plus cluster crates | Landed in 4732fbc26: cluster_validation_matrix now covers stale-leader diagnostics, split-brain refusal, and failover fence rotation, while the operator runbook now has an explicit coordinator failover drill and exit gate for fenced coordinator claims. |

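
A minimal sketch of the effective-versus-stale leadership query (PSI-202) and of failover fencing (PSI-204) follows. It assumes a tick-based lease and uses the leadership term itself as a stand-in fence token; `LeaseSketch`, `LeadershipView`, `evaluate_lease`, and `commit_is_fenced_out` are hypothetical names, and the real fence tokens are described above as derived from ordered leadership truth rather than from a bare term.

```rust
/// Hypothetical lease record for the current coordinator.
#[derive(Debug, Clone)]
pub struct LeaseSketch {
    pub leader: String,
    pub term: u64,
    pub last_heartbeat_tick: u64,
    pub lease_ticks: u64,
}

/// Result of asking "who leads right now?" at a given tick.
#[derive(Debug, PartialEq, Eq)]
pub enum LeadershipView {
    Effective { leader: String, term: u64 },
    Stale { leader: String, term: u64, ticks_overdue: u64 },
}

/// A leader stays effective while heartbeats arrive within the lease window.
pub fn evaluate_lease(lease: &LeaseSketch, now_tick: u64) -> LeadershipView {
    let elapsed = now_tick.saturating_sub(lease.last_heartbeat_tick);
    if elapsed <= lease.lease_ticks {
        LeadershipView::Effective {
            leader: lease.leader.clone(),
            term: lease.term,
        }
    } else {
        LeadershipView::Stale {
            leader: lease.leader.clone(),
            term: lease.term,
            ticks_overdue: elapsed - lease.lease_ticks,
        }
    }
}

/// Fencing check (assumption: the term doubles as the fence token). A commit
/// presented with an older token than the current one is fenced out.
pub fn commit_is_fenced_out(current_token: u64, presented_token: u64) -> bool {
    presented_token < current_token
}
```

With these assumptions, a coordinator that misses its lease window is reported as stale with an explicit overdue count, and any commit it later presents under its old term is refused once a newer term exists.
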
These issues are the next honest queue for operator-managed multi-subnet
clusters now that signed transport and coordinator authority truth exist on
main, but command authorization and payout-grade provenance still do not.
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-206 | #3315 | Closed | Add typed cluster command authorization policy and refusal diagnostics | psionic-cluster, ordered-state/tests | Landed in e6888aaa0: ordered_state now exposes typed command authority scopes, operator-managed authorization policy digests, explicit coordinator override, stable authorization facts, and machine-checkable refusal codes with coverage for coordinator-only, self-node, link-peer, and membership-status-gated command submission. |
| PSI-207 | #3316 | Closed | Preserve command provenance through authoritative cluster events | psionic-cluster, recovery/tests | Landed in 7b7b681f7: IndexedClusterEvent, ClusterSnapshot, and ClusterState now retain command-authorization provenance for the current authoritative facts they expose, while compaction, catchup, and snapshot recovery preserve that provenance and the new replay/recovery tests prove it survives state rebuilds. |
| PSI-208 | #3317 | Closed | Extend cluster execution and settlement evidence with command provenance truth | psionic-runtime, psionic-provider, psionic-cluster | Landed in 24dd4aee8: ClusterExecutionContext and settlement-linkage inputs now retain bounded command/admission provenance, whole-request and sharded cluster planners now emit that truth from authoritative membership/residency/leadership facts, and provider receipts now serialize payout-facing cluster provenance for audit or later dispute handling. |
| PSI-209 | #3318 | Closed | Add cluster authorization and payout-provenance validation gates | docs/tests/validation plus cluster crates | Landed in 715539147: cluster_validation_matrix now covers allowed versus refused command flows plus whole-request and sharded payout-provenance surfaces, while the cluster validation runbook now defines the authorization/payout provenance drill and exit gate for stronger audit or dispute claims. |

| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-210 | #3319 | Closed | Define compute-market trust posture and refusal diagnostics | psionic-cluster, docs/tests | Landed in 37fb246f1: ClusterTrustPolicy now derives a stable ClusterComputeMarketTrustAssessment with explicit refusal reasons for current non-market-safe trust postures and the remaining D4 hardening gaps. |
| PSI-211 | #3320 | Closed | Add signed cluster evidence bundle export | psionic-runtime, psionic-provider, psionic-cluster | Landed in 4a21d6947: Psionic now has stable ClusterEvidenceBundlePayload and SignedClusterEvidenceBundle types, receipt-export helpers in psionic-provider, and cluster-identity verification against control-plane signing keys. |
| PSI-212 | #3321 | Closed | Add attested node-identity admission seams | psionic-cluster, docs/tests | Landed in d0f3e7891: Psionic now has explicit attested configured-peer posture, persisted node-attestation evidence, configured-peer attestation requirements, and machine-checkable refusal diagnostics for missing or mismatched attestation during market-facing cluster admission. |
| PSI-213 | #3322 | Closed | Add non-LAN discovery posture diagnostics | psionic-cluster, docs/tests | Landed in b0601f662: Psionic now carries explicit ClusterDiscoveryPosture, a stable ClusterNonLanDiscoveryAssessment, config/node helpers that report current discovery truth, and validation coverage that keeps LAN-only, operator-managed configured-peer, and explicitly requested-but-unimplemented wider-network discovery claims machine-checkably bounded. |

This queue is now landed on main. It replaced the earlier explicit discovery
refusal boundary with bounded wider-network discovery truth without turning the
discovery substrate into the cluster control plane itself.
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-214 | #3323 | Closed | Add signed cluster introduction envelopes and policy digests | psionic-cluster, docs/tests | Landed in 1102bffa4: Psionic now has ClusterDiscoveryCandidate, signed SignedClusterIntroductionEnvelope, explicit ClusterIntroductionPolicy digests, verification/refusal diagnostics for untrusted or malformed introduction artifacts, and manifest/config surfaces that keep future wider-network introductions separate from admitted membership truth. |
| PSI-215 | #3324 | Closed | Add bounded discovery-candidate ledger and admission reconciliation | psionic-cluster, ordered-state/tests | Landed in 47410298a: Psionic now keeps ClusterDiscoveredCandidateRecord state and provenance separate from admitted membership, exposes deterministic candidate status transitions for introduced/accepted/refused/expired discovery truth, and replays, compacts, and recovers explicit candidate admission into membership without silently widening cluster membership. |
| PSI-216 | #3325 | Closed | Add wider-network discovery validation drills and rollout gates | docs/tests/validation plus cluster crates | Landed in 7c0f34503: cluster_validation_matrix now carries an explicit wider-network discovery gate covering signed introduction intake, untrusted-source refusal, expiry, and admission reconciliation, while CLUSTER_VALIDATION_RUNBOOK.md now defines the wider-network discovery drill and rollout boundary before any broader discovery claims. |

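
The "deterministic candidate status transitions for introduced/accepted/refused/expired discovery truth" from PSI-215 can be pictured as a tiny state machine. The sketch below is an assumption about how such a ledger might behave: `CandidateStatus`, `TransitionRefused`, and the rule that only `Introduced` may move are illustrative, not the actual ClusterDiscoveredCandidateRecord logic.

```rust
/// The four candidate states named in the roadmap text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateStatus {
    Introduced,
    Accepted,
    Refused,
    Expired,
}

/// Diagnostic for a transition the ledger will not perform.
#[derive(Debug, PartialEq, Eq)]
pub struct TransitionRefused {
    pub from: CandidateStatus,
    pub to: CandidateStatus,
}

/// Assumed rule: only an `Introduced` candidate can move, and only to one of
/// the three terminal states. Terminal states never change.
pub fn transition(
    from: CandidateStatus,
    to: CandidateStatus,
) -> Result<CandidateStatus, TransitionRefused> {
    use CandidateStatus::*;
    match (from, to) {
        (Introduced, Accepted) | (Introduced, Refused) | (Introduced, Expired) => Ok(to),
        _ => Err(TransitionRefused { from, to }),
    }
}

/// Discovery never widens membership by itself: only an explicitly accepted
/// candidate may be reconciled into admitted membership.
pub fn may_join_membership(status: CandidateStatus) -> bool {
    status == CandidateStatus::Accepted
}
```

Keeping the membership check separate from the transition function mirrors the stated goal that candidate state and provenance stay distinct from admitted membership.
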
This post-E1 queue is now landed on main. It must remain bounded: Exo may inform placement, but Psionic must keep final scheduling authority, execution, and evidence truth.
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-217 | #3329 | Closed | Add bounded Exo placement-hint adapter for remote scheduling | psionic-cluster, runtime evidence/tests | Landed in 7aa76a2a9: Psionic now has a bounded ExoPlacementHint seam that can bias tie-breaking among already-eligible whole-request candidates, surfaces accepted or ignored hint diagnostics in selection notes and runtime cluster evidence, and retains final placement authority plus eligibility truth inside Psionic-owned scheduling and receipts. |
| PSI-218 | #3330 | Closed | Make an explicit keep/discard decision on optional Exo interoperability | docs/tests plus cluster crates | Landed in EXO_INTEROPERABILITY_DECISION.md: the repo now explicitly keeps only the bounded ExoPlacementHint seam from #3329 and discards the broader Exo orchestrator bridge, required runtime dependency, and execution delegation story. |

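
The boundary described for PSI-217, where a hint may only "bias tie-breaking among already-eligible whole-request candidates", can be sketched as follows. The `Candidate` struct, the numeric `score`, and the `HintDiagnostic` variants are hypothetical; the actual ExoPlacementHint seam and its diagnostics are not reproduced here.

```rust
/// Hypothetical already-eligible candidate with a scheduler-owned score.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub node_id: String,
    pub score: u32,
}

/// Whether an external hint influenced the selection.
#[derive(Debug, PartialEq, Eq)]
pub enum HintDiagnostic {
    NoHint,
    Accepted,
    IgnoredNotTopRanked,
    IgnoredNotEligible,
}

/// Picks a node from `eligible`. The hint can only choose among candidates
/// tied for the best score; it can never promote a lower-ranked or
/// ineligible node.
pub fn select_with_hint(
    eligible: &[Candidate],
    hint: Option<&str>,
) -> Option<(String, HintDiagnostic)> {
    let best = eligible.iter().map(|c| c.score).max()?;
    let mut tied: Vec<&str> = eligible
        .iter()
        .filter(|c| c.score == best)
        .map(|c| c.node_id.as_str())
        .collect();
    tied.sort(); // deterministic default: lowest node ID among the tied set
    let default_pick = tied[0].to_string();
    match hint {
        None => Some((default_pick, HintDiagnostic::NoHint)),
        Some(h) if tied.iter().any(|t| *t == h) => {
            Some((h.to_string(), HintDiagnostic::Accepted))
        }
        Some(h) if eligible.iter().any(|c| c.node_id == h) => {
            Some((default_pick, HintDiagnostic::IgnoredNotTopRanked))
        }
        Some(_) => Some((default_pick, HintDiagnostic::IgnoredNotEligible)),
    }
}
```

In this sketch the scheduler's own ranking always decides the candidate set; the hint only changes which of the equally ranked nodes is chosen, and every ignored hint leaves a diagnostic behind.
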
This queue is now landed on main. The Metal roadmap queue
#3286 -> #3285 -> #3269 -> #3262 still blocks any honest Apple cluster
claim; what landed here is the explicit eligibility contract and refusal
surface, not Metal cluster readiness.
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-219 | #3331 | Closed | Add communication-class eligibility and keep Apple cluster refusal explicit | psionic-cluster, runtime/provider evidence/tests | Landed in 98dc1bdc3: psionic-runtime now carries explicit cluster communication-class eligibility evidence, whole-request and replica lanes now retain backend communication truth in receipts/evidence, sharded planners now refuse by required communication class instead of by backend label alone, and current Metal cluster execution remains explicitly refused with diagnostics pointing at the still-open Metal roadmap gate. |

This queue is now landed on main. It closed the remaining gap between the
landed benchmark gates and the roadmap's requirement that cluster performance
claims be tied to explicit machine-checkable benchmark receipts.
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-220 | #3332 | Closed | Add typed cluster benchmark receipts and gate JSON schema | psionic-cluster, tests | Landed in 4f64525b4: psionic-cluster now exposes typed ClusterBenchmarkReceipt models plus topology/recovery benchmark contexts and stable digest helpers, while the benchmark gates now emit receipt-shaped JSON instead of anonymous summary blobs and release-gate artifacts now preserve benchmark identity, budget truth, context, and pass/fail outcome. |
| PSI-221 | #3334 | Closed | Wire cluster benchmark gate script and outputs to typed receipts | psionic-cluster, scripts/docs | Landed in a524658b8: the cluster benchmark gate script now documents typed benchmark receipts instead of generic summaries, validates the stable receipt filenames and core schema fields after the release gate runs, emits explicit receipt artifact paths for CI and operator consumers, and the runbook now points at receipt artifacts rather than anonymous summary JSON. |
| PSI-222 | #3333 | Closed | Add benchmark receipt validation drill and roadmap closeout | docs/tests/validation plus cluster crates | Landed in 3fe872c96: CLUSTER_VALIDATION_RUNBOOK.md now defines an explicit benchmark receipt drill with exact commands, expected receipt files, and failure interpretation, while this roadmap now closes the G1 queue explicitly instead of leaving typed benchmark receipts as an open-ended follow-on note. |

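
To make "receipt-shaped JSON" that preserves "benchmark identity, budget truth, context, and pass/fail outcome" more concrete, here is a deliberately small sketch. The `BenchmarkReceiptSketch` fields and JSON keys are invented for illustration and do not reflect the real ClusterBenchmarkReceipt schema; the hand-rolled serializer also assumes the benchmark ID contains no characters that need JSON escaping.

```rust
/// Hypothetical receipt: identity, budget, observation, and outcome together.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BenchmarkReceiptSketch {
    pub benchmark_id: String,
    pub budget_ms: u64,
    pub observed_ms: u64,
    pub passed: bool,
}

impl BenchmarkReceiptSketch {
    /// Derives the pass/fail outcome from the budget instead of trusting a
    /// caller-supplied flag.
    pub fn evaluate(benchmark_id: &str, budget_ms: u64, observed_ms: u64) -> Self {
        Self {
            benchmark_id: benchmark_id.to_string(),
            budget_ms,
            observed_ms,
            passed: observed_ms <= budget_ms,
        }
    }

    /// Minimal JSON rendering (assumes `benchmark_id` needs no escaping).
    pub fn to_json(&self) -> String {
        format!(
            "{{\"benchmark_id\":\"{}\",\"budget_ms\":{},\"observed_ms\":{},\"passed\":{}}}",
            self.benchmark_id, self.budget_ms, self.observed_ms, self.passed
        )
    }
}

/// A release gate in this sketch passes only if there is at least one receipt
/// and every receipt passed its budget.
pub fn gate_passes(receipts: &[BenchmarkReceiptSketch]) -> bool {
    !receipts.is_empty() && receipts.iter().all(|r| r.passed)
}
```

The design point the sketch tries to show is that the outcome travels with the budget and the observation, so a consumer can re-check the claim instead of trusting an anonymous summary.
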
This queue is now landed on main. It closes the remaining post-G1 gap between
typed cluster execution evidence, planner eligibility, and operator-facing
validation by making declared capability profiles authoritative for clustered
lanes and by adding an explicit validation drill for those claims.
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-223 | #3335 | Closed | Add runtime-owned cluster execution capability profile | psionic-runtime, tests | Landed in 37183c6cb: psionic-runtime now exposes typed ClusterExecutionCapabilityProfile and ClusterExecutionLane models, stable profile digests, and profile-derived ClusterCommunicationEligibility helpers so clustered lane support can be declared explicitly instead of starting from backend-name heuristics alone. |
| PSI-224 | #3336 | Closed | Make cluster planners consume declared execution capability profiles | psionic-cluster, psionic-runtime, psionic-provider | Landed in 9aad9af8d: whole-request, replicated, layer-sharded, and tensor-sharded planners now consume declared ClusterExecutionCapabilityProfile truth instead of widening lane support from backend labels; ClusterCommunicationEligibility now carries the stable capability-profile digest it was derived from; and provider/runtime evidence surfaces now preserve that declared-profile digest alongside cluster execution context. |
| PSI-225 | #3337 | Closed | Add capability-profile validation drill and roadmap closeout | docs/tests/validation plus cluster crates | Landed in efa52005e: CLUSTER_VALIDATION_RUNBOOK.md now defines an explicit capability-profile drill with exact runtime, cluster, and provider commands plus failure interpretation, and this roadmap now closes the H1 queue explicitly instead of leaving declared capability-profile validation as a vague follow-on. |

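
The H1 idea that planners consume declared capability profiles "instead of widening lane support from backend labels" can be illustrated with a short sketch. `LaneSketch`, `CapabilityProfileSketch`, `profile`, and `plan_lane` are hypothetical stand-ins and do not mirror the real ClusterExecutionCapabilityProfile or ClusterExecutionLane types.

```rust
use std::collections::BTreeSet;

/// The four clustered lanes named in this roadmap.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LaneSketch {
    WholeRequest,
    Replicated,
    LayerSharded,
    TensorSharded,
}

/// Hypothetical declared profile: a label plus an explicit lane set.
#[derive(Debug, Clone)]
pub struct CapabilityProfileSketch {
    pub backend_label: String,
    pub declared_lanes: BTreeSet<LaneSketch>,
}

/// Convenience constructor for the sketch.
pub fn profile(backend_label: &str, lanes: &[LaneSketch]) -> CapabilityProfileSketch {
    CapabilityProfileSketch {
        backend_label: backend_label.to_string(),
        declared_lanes: lanes.iter().copied().collect(),
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum PlanDecision {
    Eligible,
    Refused { lane: LaneSketch, reason: &'static str },
}

/// Eligibility comes only from the declared lane set; the backend label is
/// deliberately never inspected here.
pub fn plan_lane(profile: &CapabilityProfileSketch, lane: LaneSketch) -> PlanDecision {
    if profile.declared_lanes.contains(&lane) {
        PlanDecision::Eligible
    } else {
        PlanDecision::Refused {
            lane,
            reason: "lane not declared in capability profile",
        }
    }
}
```

Under this sketch a backend labelled "cuda" that declares only whole-request and replicated lanes is still refused for sharded lanes, which is the behaviour the H1 queue describes.
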
This post-H1 queue (H2) is now landed on main. It closed the next remaining gap between declared clustered-lane truth and provider-side publication by making advertised capability surfaces publish declared cluster execution capability profiles before any request is planned.
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-226 | #3341 | Closed | Publish declared cluster execution capability profiles in runtime capability surfaces | psionic-runtime, psionic-provider, tests | Landed in 9ebb90a3e: BackendSelection now exposes an optional advertised cluster_execution_capability_profile, capability-side runtime/provider serialization now round-trips that declared truth before any request executes, and provider tests keep that advertised profile distinct from realized cluster_execution evidence. |
| PSI-227 | #3339 | Closed | Thread advertised cluster capability profiles through provider capability envelopes | psionic-provider, psionic-cluster, tests | Landed in e2a4c1f84: provider capability envelopes now expose dedicated with_cluster_execution_capability_profile(...) builders, provider publication tests source declared profiles from cluster whole-request, replica-routed, layer-sharded, and tensor-sharded request builders, and advertised lane support now publishes independently of realized cluster_execution evidence. |
| PSI-228 | #3340 | Closed | Add advertised capability-profile validation drill and roadmap closeout | docs/tests/validation plus cluster crates | Landed in 478387b58: CLUSTER_VALIDATION_RUNBOOK.md now defines an explicit advertised capability-publication drill with exact runtime/provider commands, pass/fail interpretation, and separation between advertised support and realized execution evidence, and this roadmap now closes H2 explicitly instead of leaving provider-side publication validation implicit. |

H3 is now landed on main. It closed the next remaining gap
between advertised clustered-lane support and the trust posture that bounds
those claims by making provider-visible capability publication carry explicit
cluster trust posture and compute-market trust refusal truth.
| Local ID | GitHub | State | Issue | Scope | Why it exists |
|---|---|---|---|---|---|
| PSI-229 | #3343 | Closed | Add runtime-owned cluster trust publication types | psionic-runtime, psionic-cluster, tests | Landed in f1e40e88d: psionic-runtime now owns ClusterTrustPosture, ClusterDiscoveryPosture, and ClusterComputeMarketTrustAssessment plus its stable digest, while psionic-cluster now derives and re-exports those runtime-owned publication types instead of keeping a private duplicate model. |
| PSI-230 | #3344 | Closed | Publish cluster trust assessments through provider capability surfaces | psionic-provider, psionic-cluster, tests | Landed in 0dc54768f: BackendSelection and provider capability envelopes now publish cluster_compute_market_trust_assessment independently of realized execution, and provider tests prove trusted-LAN and attested-configured-peer postures surface bounded refusal truth without fabricating cluster_execution evidence. |
| PSI-231 | #3342 | Closed | Add cluster trust-publication validation drill and roadmap closeout | docs/tests/validation plus cluster crates | Landed in the docs closeout commit that adds the H3 trust-publication drill: CLUSTER_VALIDATION_RUNBOOK.md now defines exact cluster, runtime, and provider commands for validating published trust posture and refusal truth, and this roadmap now closes H3 explicitly instead of leaving provider trust publication as an undocumented surface. |

The shortest honest path from today's main is:
- Treat C1 through C6 as landed on main, with the first trusted-cluster scope closing in d424ab1cf.
- Treat D1 as landed on main, with the operator-managed multi-subnet follow-on queue closing in ac9dd2285.
- Treat D2 as landed on main, with the coordinator-authority follow-on queue closing in 4732fbc26.
- Treat D3 as landed on main, with the authorization and payout-provenance queue closing in 715539147.
- Treat D4 as landed on main, with the compute-market hardening queue now closing in b0601f662.
- Treat E1 as landed on main, with the wider-network discovery queue now closing in 7c0f34503.
- Treat the former local CUDA truth gate #3276 -> #3288 -> #3248 as closed on main, and treat F1 as landed on main.
- Treat F2 as landed on main in 98dc1bdc3: communication-class eligibility is now explicit, and current Metal nodes remain refused for cluster execution while the Metal roadmap queue stays open.
- Treat G1 as landed on main in 3fe872c96, so benchmark-backed performance claims now have typed receipts, a script-level output contract, and an operator drill instead of a vague follow-on gap.
- Treat H1 as landed on main, with the capability-profile validation drill and queue closeout now anchored in efa52005e.
- Treat H2 as landed on main, with the advertised capability-publication validation drill and queue closeout now anchored in 478387b58.
- Treat H3 as landed on main: #3344 now publishes bounded cluster trust assessments through provider capability surfaces, and #3342 closes the queue with an explicit operator validation drill and roadmap closeout.
- Keep current authenticated configured-peer posture explicit and bounded; it is operator-managed, not market-safe.
- If stronger trust or wider network claims are needed beyond H3, open a new GitHub-backed queue instead of extending this roadmap with local placeholders.
- Keep current Metal GPT-OSS nodes refused for cluster execution until the Metal roadmap queue #3286 -> #3285 -> #3269 -> #3262 closes.
Why this order:
- control-plane truth first, because cluster execution without cluster truth is not supportable
- topology, residency, and evidence before scheduling, because otherwise the scheduler has nothing auditable to stand on
- remote whole-request scheduling before replication, because it gives immediate value without pretending cross-node compute is already real
- replication before sharding, because it is safer operationally and easier to prove honestly
- hardening before scope widening, because trusted-LAN cluster claims are not the same thing as compute-market distributed-cluster claims
This roadmap's first truthful cluster scope is complete only when all of the following are true:
- the shipped feature is explicitly a trusted same-network LAN cluster, not an adversarial or internet-wide cluster
- psionic-cluster exists and owns cluster identity, ordered state, topology, placement, and catchup truth
- ClusterId, NodeId, NodeEpoch, and node-role truth are persistent and refusal-capable
- ordered cluster-state history, snapshots, compaction, and rejoin semantics are explicit
- topology, transport class, artifact residency, and policy digests are visible in capability and receipt surfaces
- whole-request remote scheduling works and is reflected truthfully in ExecutionTopologyPlan and provider receipts
- replicated serving works for one validated backend/product lane
- unsupported backends are refused explicitly rather than silently included
- current Metal GPT-OSS nodes remain refused for cluster execution until the Metal roadmap itself is complete
The broader cluster program is not complete until all of the following are true:
- at least one truthful homogeneous sharded CUDA path exists, or unsupported sharded paths are refused with stable diagnostics
- cluster validation covers membership refusal, disconnect/rejoin, catchup, artifact staging, remote scheduling, replication, and sharding
- cluster performance claims are tied to explicit benchmark receipts
- authenticated membership, signed control-plane messages, replay protection, and stronger admission policy exist before any multi-subnet or market-facing scope widening
- downstream OpenAgents systems can tell exactly what cluster topology was promised, selected, delivered, and degraded
- making Exo a required runtime dependency for crates/psionic-*
- proxying or delegating cluster execution through Exo and calling that Psionic cluster support
- treating Nostr, relays, or Nexus as the cluster control plane
- claiming sharded cluster execution when the system is only doing remote whole-request scheduling
- treating the first cluster scope as if it already solved adversarial compute-market security
- making current Metal GPT-OSS nodes cluster-eligible before the Metal roadmap itself is complete
- reopening local model loading, tokenizer, or artifact-format truth as if cluster work replaced the shipped Psionic loader/runtime substrate
- moving app UX or pane orchestration from apps/autopilot-desktop into crates/psionic-*