HOMEGOLF-19: Build the retained local-cluster HOMEGOLF scoreboard, delta ledger, and stop-condition loop #637

@AtlantisPleb

Summary

Turn the local clustered HOMEGOLF lane into a retained score-improvement loop with explicit deltas, promotion rules, and a stop condition.

Tracking issue: #633

Why

The current local HOMEGOLF loop has real runs, but it does not yet have a single, explicit source of operator truth for:

  • which run is the incumbent
  • which run improved the incumbent
  • which changes were system-only versus model-side
  • which failures are blockers versus ordinary regressions
  • when to stop iterating locally and promote a candidate upward

The user goal is to keep improving until there are no meaningful improvements left. That requires a scoreboard and stop condition, not only audits.

Scope

  • define one canonical retained local-cluster HOMEGOLF scoreboard or ledger
  • store per-run config, artifact paths, score receipts, and delta versus current incumbent
  • classify each run as improvement, noise, regression, blocked, or invalid
  • distinguish score changes caused by system or runtime fixes from score changes caused by model-family changes
  • define the local stop condition for “no more meaningful improvements observed”
  • tie promotion gating to that same ledger instead of ad hoc judgment
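One way to picture a ledger row and the five-way classification is the minimal Python sketch below. Everything in it is an assumption for illustration and is not taken from the repository: the names `ScoreRow`, `Verdict`, `ChangeKind`, and `classify`, the `noise_band` threshold, and the convention that a lower score is better.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Verdict(Enum):
    IMPROVEMENT = "improvement"
    NOISE = "noise"
    REGRESSION = "regression"
    BLOCKED = "blocked"
    INVALID = "invalid"


class ChangeKind(Enum):
    SYSTEM = "system"  # system or runtime fix
    MODEL = "model"    # model-family change


@dataclass
class ScoreRow:
    """Hypothetical shape of one machine-legible ledger row."""
    run_id: str
    change_kind: ChangeKind
    score: Optional[float]  # None when the run produced no score receipt
    config: dict = field(default_factory=dict)
    artifact_paths: list = field(default_factory=list)
    blocked: bool = False
    delta_vs_incumbent: Optional[float] = None
    verdict: Optional[Verdict] = None


def classify(score, incumbent_score, blocked=False, noise_band=0.001):
    """Return (delta, verdict). Assumes a LOWER score is better.

    noise_band is a placeholder threshold; a real value would have to come
    from measured run-to-run variance.
    """
    if blocked:
        return None, Verdict.BLOCKED
    if score is None:
        return None, Verdict.INVALID
    if incumbent_score is None:
        # No incumbent yet: the first valid run takes the slot.
        return None, Verdict.IMPROVEMENT
    delta = score - incumbent_score
    if abs(delta) <= noise_band:
        return delta, Verdict.NOISE
    return delta, (Verdict.IMPROVEMENT if delta < 0 else Verdict.REGRESSION)


def score_row(row, incumbent_score, noise_band=0.001):
    """Fill in the delta and verdict fields of a row in place."""
    row.delta_vs_incumbent, row.verdict = classify(
        row.score, incumbent_score, row.blocked, noise_band
    )
    return row
```

Keeping `change_kind` on the row next to the verdict is what would let the ledger separate score movement caused by system fixes from movement caused by model-family changes.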

Acceptance Criteria

  • every retained local HOMEGOLF candidate run has one machine-legible score row
  • the loop can identify the incumbent and the exact delta from each challenger
  • the scoreboard can say honestly when local improvements have plateaued
  • promotion to H100 uses the same scoreboard instead of a separate manual story
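The incumbent lookup and the plateau check implied by these criteria could be sketched as follows. This is a hedged illustration, not the project's design: the function names, the dict-shaped rows, the `patience` window, and the lower-is-better score convention are all assumptions, as is the choice to ignore blocked and invalid runs when judging a plateau.

```python
# Verdicts that actually say something about score movement.
# Treating "blocked" and "invalid" as inconclusive is an assumption.
CONCLUSIVE = {"improvement", "noise", "regression"}


def incumbent_and_deltas(rows):
    """Pick the best valid row and the exact delta of every other valid row.

    rows: list of dicts with "run_id" and "score" (None if no receipt).
    Assumes a LOWER score is better.
    """
    valid = [r for r in rows if r["score"] is not None]
    if not valid:
        return None, {}
    incumbent = min(valid, key=lambda r: r["score"])
    deltas = {
        r["run_id"]: r["score"] - incumbent["score"]
        for r in valid
        if r is not incumbent
    }
    return incumbent, deltas


def has_plateaued(verdicts, patience=5):
    """True when the last `patience` conclusive verdicts hold no improvement.

    verdicts: chronological list of verdict strings, oldest first.
    patience: hypothetical window size; the issue does not specify one.
    """
    conclusive = [v for v in verdicts if v in CONCLUSIVE]
    if len(conclusive) < patience:
        return False  # not enough evidence to declare a plateau
    return "improvement" not in conclusive[-patience:]
```

A promotion gate built on the same ledger might then require `has_plateaued(...)` to be true and promote whichever row `incumbent_and_deltas` returns, so that no separate manual judgment is involved.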

References

  • docs/HOMEGOLF_TRACK.md
  • docs/2026-03-28-parameter-golf-winner-gap-and-psionic-path-audit.md

Labels

  • backend (Backend work)
  • qa (Quality and validation work)
  • roadmap (Roadmap work)
  • type:feature (Feature request)
