Summary
Use the reachable local clustered HOMEGOLF devices as the primary proving ground for the Rust-native Parameter Golf lane until system changes are no longer blocked by operational failures and visibly improve retained local scores.
This is the main tracking issue for the next loop.
Why
Current retained reality:
- Psionic's retained real full-validation PGOLF score is still 6.306931747817168
- recent local clustered HOMEGOLF runs exposed real operational blockers instead of a clean score-improvement loop:
  - 20260328f, 20260328k, and 20260328l exported artifacts but still did not close into detached artifact_score_report.json receipts
  - 20260328j died with cudaMalloc failed: out of memory
  - 20260328m hit NaN on step 2 and then panicked during EMA final-surface application
That means the immediate job is not “turn the big GPUs back on and hope.” The immediate job is to drive out Rust runtime, score-closeout, and operator-kink failures on the local clustered HOMEGOLF lane until system changes produce retained score movement.
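One way to make the score-closeout failure visible is to classify each run by whether it exported artifacts and whether a detached artifact_score_report.json receipt exists. The sketch below is hypothetical: it assumes each run writes into its own directory named by run ID, and that an `artifact` subdirectory marks an exported artifact. The `Closeout` enum, the function names, and that layout are illustrative assumptions; only the receipt filename and run IDs come from this issue.

```rust
use std::path::Path;

/// Receipt filename a closed-out run is expected to leave behind.
const RECEIPT: &str = "artifact_score_report.json";

/// Closeout state of one run (hypothetical classification).
#[derive(Debug, PartialEq)]
enum Closeout {
    /// A score receipt exists.
    Scored,
    /// Artifacts exported but no receipt: the 20260328f/k/l failure mode.
    ArtifactOnly,
    /// Nothing exported and no receipt (e.g. the run died early).
    NoArtifact,
}

/// Pure classification, testable without touching the filesystem.
fn classify(artifact_exported: bool, has_receipt: bool) -> Closeout {
    match (artifact_exported, has_receipt) {
        (_, true) => Closeout::Scored,
        (true, false) => Closeout::ArtifactOnly,
        (false, false) => Closeout::NoArtifact,
    }
}

/// Filesystem wrapper. ASSUMPTION: `run_dir/artifact` marks an exported
/// artifact and the receipt sits directly in `run_dir`.
fn classify_run_dir(run_dir: &Path) -> Closeout {
    let exported = run_dir.join("artifact").is_dir();
    let has_receipt = run_dir.join(RECEIPT).is_file();
    classify(exported, has_receipt)
}

/// Returns the IDs of runs that exported artifacts but never produced a receipt.
fn needs_closeout<'a>(runs: &[(&'a str, bool, bool)]) -> Vec<&'a str> {
    runs.iter()
        .filter(|(_, exported, receipt)| classify(*exported, *receipt) == Closeout::ArtifactOnly)
        .map(|(id, _, _)| *id)
        .collect()
}

fn main() {
    // (run ID, artifact exported, receipt present) for the runs named above.
    let runs = [
        ("20260328f", true, false),
        ("20260328k", true, false),
        ("20260328l", true, false),
    ];
    println!("needs closeout: {:?}", needs_closeout(&runs));
    // Hypothetical on-disk check for a single run directory.
    println!("{:?}", classify_run_dir(Path::new("runs/20260328f")));
}
```

Keeping `classify` pure separates the closeout rule from the on-disk layout, which is the part most likely to differ from this sketch.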
Intent
The operating rule from here is:
- local clustered HOMEGOLF devices first for Rust/runtime/ops closure
- require retained local score improvement from those system changes
- promote only de-risked candidates to H100
- use 8xH100 only after the current Rust lane stops tripping over avoidable blockers
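The operating rule above amounts to a one-way promotion ladder: local cluster, then single-H100, then 8xH100. A minimal sketch follows, assuming a candidate carries an optional retained local score from a detached receipt plus flags for the blocker classes named in this issue (OOM, NaN, queue-truth). The `Candidate` struct, its fields, and the lower-is-better score convention are assumptions for illustration, not Psionic's actual types.

```rust
/// Where a candidate may run; each tier gates the next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    LocalCluster,
    SingleH100,
    EightH100,
}

/// Hypothetical summary of a candidate's local evidence.
struct Candidate {
    /// Retained local score from a detached receipt; None if closeout failed.
    local_score: Option<f64>,
    hit_oom: bool,
    hit_nan: bool,
    queue_truth_failure: bool,
    /// Whether the single-H100 run has already passed its own gate.
    single_h100_passed: bool,
}

/// ASSUMPTION: lower scores are better; `baseline` is the retained score to beat.
fn promotion_verdict(c: &Candidate, baseline: f64) -> Tier {
    let clean = !(c.hit_oom || c.hit_nan || c.queue_truth_failure);
    // A missing receipt or a NaN score never counts as an improvement.
    let improved = matches!(c.local_score, Some(s) if s < baseline);
    if !(clean && improved) {
        return Tier::LocalCluster;
    }
    if c.single_h100_passed {
        Tier::EightH100
    } else {
        Tier::SingleH100
    }
}

fn main() {
    let baseline = 6.306931747817168;
    // An artifact-only run has no receipt, so it stays on the local cluster.
    let artifact_only = Candidate {
        local_score: None,
        hit_oom: false,
        hit_nan: false,
        queue_truth_failure: false,
        single_h100_passed: false,
    };
    println!("{:?}", promotion_verdict(&artifact_only, baseline));
}
```

The 8xH100 tier is reachable only through a passed single-H100 gate, which mirrors the rule that the big GPUs come last.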
Scope
This master task tracks the issue stack required to enter an honest improvement loop:
- fix detached score-closeout so artifact-only runs actually produce score receipts
- stabilize the local clustered competitive runner under explicit grad-accum, raw, and EMA postures
- align the exact competitive lane to the public 11L winner family where Psionic is still behind
- build a retained local-cluster scoreboard and stop-condition loop
- define the single-H100 and later 8xH100 promotion gate
- prove whether XTRAIN improves the score path or keep it off the critical path
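A retained local-cluster scoreboard with a stop condition could look like the sketch below. It assumes lower scores are better, that a new receipt replaces the retained best only when it beats it by a minimum margin, and that the loop stops after a fixed number of consecutive non-improving receipts. The `Scoreboard` type, the margin, and the patience value are hypothetical; only the starting score 6.306931747817168 comes from this issue.

```rust
/// Outcome of recording one detached score receipt.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// The receipt became the new retained best.
    Retained,
    /// The receipt did not beat the retained best by `min_delta`.
    Rejected,
    /// Patience exhausted: stop the loop and rethink the lane.
    Stop,
}

/// Hypothetical retained scoreboard for the local clustered lane.
struct Scoreboard {
    /// Best retained score and the run ID that produced it.
    best: (f64, String),
    /// Minimum improvement required to replace the retained best.
    min_delta: f64,
    /// Consecutive non-improving receipts tolerated before stopping.
    patience: u32,
    stale: u32,
}

impl Scoreboard {
    fn new(run_id: &str, score: f64, min_delta: f64, patience: u32) -> Self {
        Scoreboard { best: (score, run_id.to_string()), min_delta, patience, stale: 0 }
    }

    /// Records one receipt. ASSUMPTION: lower is better. A NaN score
    /// compares false, so it is counted as non-improving.
    fn record(&mut self, run_id: &str, score: f64) -> Verdict {
        if score < self.best.0 - self.min_delta {
            self.best = (score, run_id.to_string());
            self.stale = 0;
            Verdict::Retained
        } else {
            self.stale += 1;
            if self.stale >= self.patience {
                Verdict::Stop
            } else {
                Verdict::Rejected
            }
        }
    }
}

fn main() {
    // Seed with the current retained score; margin and patience are illustrative.
    let mut board = Scoreboard::new("retained", 6.306931747817168, 0.001, 3);
    let verdict = board.record("hypothetical-run", 6.30);
    println!("{:?}, best = {:?}", verdict, board.best);
}
```

The minimum margin keeps run-to-run noise from being booked as progress, and the patience counter is one concrete form of the stop condition.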
Acceptance Criteria
- the subordinate issue stack is closed
- the local clustered HOMEGOLF lane can run without the current score-closeout, OOM, NaN, or queue-truth failures
- at least one retained local clustered HOMEGOLF score receipt improves because of concrete Rust-system or model-lane changes
- the current best local clustered candidate has an explicit promotion verdict for single-H100 and later 8xH100
References
docs/2026-03-28-parameter-golf-winner-gap-and-psionic-path-audit.md
docs/2026-03-28-homegolf-xtrain-pgolf-explicit-grad-accum-queue-correction-audit.md
docs/HOMEGOLF_TRACK.md