ci: run audit-orphans.sh on every PR + weekly cron #8

Merged: ericchansen merged 1 commit into main from ci/audit-orphans-action on May 27, 2026

Conversation

@ericchansen
Owner

Summary

  • Adds an Actions workflow that runs scripts/audit-orphans.sh automatically.
  • Locks in the stewardship invariant introduced in q2mm-data#7 so orphaned benchmark data is visible on PRs and weekly drift checks.
  • Updates the README auditing section to mention the automated workflow while preserving manual invocation instructions.

What the workflow does

  • Runs on pull requests targeting main, every Monday at 09:00 UTC, and manual workflow_dispatch.
  • Checks out q2mm-data with actions/checkout@v4.
  • Checks out ericchansen/q2mm at master into q2mm-sibling with actions/checkout@v4.
  • Runs bash scripts/audit-orphans.sh "${GITHUB_WORKSPACE}/q2mm-sibling" from the q2mm-data checkout.
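The trigger list above can be checked mechanically once the workflow YAML has been parsed. The following is a minimal sketch, not part of this PR: `check_triggers` is a hypothetical helper, and it assumes the workflow declares its triggers in mapping form with the cron string `0 9 * * 1` (Monday 09:00 UTC). It also accounts for PyYAML loading a bare `on:` key as the boolean `True`.

```python
def check_triggers(workflow: dict) -> list[str]:
    """Return a list of problems; an empty list means the triggers match the description."""
    # PyYAML follows YAML 1.1, so an unquoted `on:` key is loaded as True.
    triggers = workflow.get("on", workflow.get(True))
    if not isinstance(triggers, dict):
        return ["triggers are missing or not in mapping form"]

    problems = []

    # Pull requests targeting main (assumed to be listed under `branches`).
    pull_request = triggers.get("pull_request") or {}
    if "main" not in (pull_request.get("branches") or []):
        problems.append("pull_request does not target main")

    # Weekly cron: minute 0, hour 9, any day of month, any month, weekday 1 (Monday).
    crons = [entry.get("cron") for entry in (triggers.get("schedule") or [])]
    if "0 9 * * 1" not in crons:
        problems.append("no Monday 09:00 UTC schedule")

    # Manual dispatch: the key is present even when its value is empty (None).
    if "workflow_dispatch" not in triggers:
        problems.append("workflow_dispatch is not enabled")

    return problems
```

To use it against the real file, pass the result of `yaml.safe_load(open('.github/workflows/audit-orphans.yml'))` (PyYAML required); an empty return value means all three triggers were found.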

Local smoke tests

  • YAML validation: python3 -c "import yaml; yaml.safe_load(open('.github/workflows/audit-orphans.yml'))" passed. (python is not installed in this local shell; python3 is.)
  • Clean audit: bash scripts/audit-orphans.sh /home/eric/repos/q2mm exited 0 with ✅ No orphaned directories.
  • Failure path: after adding benchmarks/_smoke_orphan/scratch/dummy.txt, the same audit exited 1 and reported benchmarks/_smoke_orphan/scratch as orphaned.
  • Cleanup recheck: after removing benchmarks/_smoke_orphan, the audit exited 0 again with ✅ No orphaned directories.
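For readers unfamiliar with the audit, the behaviour exercised by these smoke tests can be approximated in a few lines of Python. This is a hypothetical sketch, not a port of scripts/audit-orphans.sh: it assumes an "orphan" is any directory under benchmarks/ that directly contains files and whose relative path is not mentioned in any Markdown file in the sibling repo's docs directory. The function name, the `top` parameter, and the substring matching are all assumptions made for illustration.

```python
from pathlib import Path


def find_orphans(data_root: Path, docs_dir: Path, top: str = "benchmarks") -> list[str]:
    """List data directories (relative, POSIX-style) that no docs page mentions."""
    # Concatenate all docs pages; a crude substring search stands in for real parsing.
    docs_text = "\n".join(
        page.read_text(encoding="utf-8") for page in sorted(docs_dir.glob("*.md"))
    )

    orphans = []
    for path in sorted((data_root / top).rglob("*")):
        if not path.is_dir():
            continue
        # Only directories that directly hold files count as data directories.
        if not any(child.is_file() for child in path.iterdir()):
            continue
        rel = path.relative_to(data_root).as_posix()
        if rel not in docs_text:
            orphans.append(rel)
    return orphans
```

A wrapper that exits 1 when the list is non-empty and 0 otherwise would mirror the pass/fail behaviour reported in the smoke tests.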

Lock in the stewardship invariant introduced in #7 by running the orphan audit automatically instead of relying on someone remembering to run it by hand.

The workflow runs on pull requests to main, on a weekly Monday 09:00 UTC cron, and via manual dispatch. It checks out q2mm-data plus the q2mm sibling repo and runs scripts/audit-orphans.sh against that checkout.

Smoke tests: validated the workflow YAML with python3 yaml.safe_load; confirmed the audit passes with no orphans, fails after adding benchmarks/_smoke_orphan/scratch, and passes again after removing the smoke directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ericchansen ericchansen merged commit a3cc8d7 into main May 27, 2026
1 check passed
@ericchansen ericchansen deleted the ci/audit-orphans-action branch May 27, 2026 21:28
ericchansen added a commit that referenced this pull request May 28, 2026
…aluation

Re-runs the three "within noise" published-FF systems from q2mm-data#6
with the new --n-evals 10 flag (q2mm#286, landed on master).  The
n=10 samples give a Student-t 95% CI on the improvement that's tight
enough to make confident scientific verdicts:

| System        | Mean Δ% | CI₉₅   | Verdict         |
|---------------|--------:|-------:|-----------------|
| pd-allyl      | -0.029% | ±0.34% | NOT SIGNIFICANT |
| rh-conjugate  | -0.080% | ±1.18% | NOT SIGNIFICANT |
| heck-relay\*  | -0.59%  | ±3.26% | NOT SIGNIFICANT |

\* heck-relay run with --ratio-tol none (ratio=1.378, formally fails
default gate); even with the gate bypassed, the JaxLoss surrogate
broke down (2 non-finite line-search values) and the result is
inside the noise band.

These are statistically defensible "no improvement" verdicts, not
"within noise so we can't tell" verdicts.  The CI₉₅ excludes any
improvement larger than ~0.3 %, ~1.2 %, and ~3.3 % for pd-allyl,
rh-conjugate, and heck-relay respectively — well below any
publishable improvement claim.
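
For reference, a Student-t 95% CI of this kind can be reproduced with the standard library alone. The sketch below is illustrative and is not the q2mm implementation: `mean_ci95`, `verdict`, and the `T_CRIT_95` table are hypothetical names, and the verdict rule (SIGNIFICANT only when the whole interval lies above zero) is an assumption, since the rule is not spelled out here.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {4: 2.776, 9: 2.262}


def mean_ci95(samples: list[float]) -> tuple[float, float]:
    """Return (mean, half-width) of the Student-t 95% CI for the sample mean."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Standard error of the mean, using the sample (n - 1) standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, T_CRIT_95[n - 1] * sem


def verdict(mean: float, half_width: float) -> str:
    """Assumed rule: an improvement is significant if the CI lower bound exceeds zero."""
    return "SIGNIFICANT" if mean - half_width > 0 else "NOT SIGNIFICANT"
```

Under that assumed rule, `verdict(-0.029, 0.34)` returns `"NOT SIGNIFICANT"` because the interval straddles zero.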

Provenance:
- q2mm git_sha: 86d8483 (master, post #286)
- q2mm-data git_sha: a3cc8d7 (main, post #8)
- n_evals: 10
- ratio_tol: 0.15 (default) for pd-allyl/rh-conjugate; null for heck-relay

Wall time:
- pd-allyl:      ~21 min opt + 16 min post-eval
- rh-conjugate:  ~10 min opt + 13 min post-eval
- heck-relay:    ~24 min opt + 38 min post-eval
- Total:        ~2.0 hr GPU on RTX 5090

Companion docs update lives in ericchansen/q2mm
docs/systems/{pd-allyl,rh-conjugate,heck-relay}.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ericchansen added a commit that referenced this pull request May 28, 2026
…nt fix

Companion to q2mm fix branch fix/mm3-non-smooth-gradient (commit
78e72fa, PR #TBD).  Re-runs the convergence pipeline with --n-evals
10 against q2mm patched for the angle-term gradient correctness bug
documented in q2mm#284.

Results — two previously "no improvement" verdicts now SIGNIFICANT:

| System       | Pre-fix Δ%        | Post-fix Δ%      | Verdict      |
|--------------|-------------------|------------------|--------------|
| ch3f         | 99.83 % (det.)    | 99.83 % (det.)   | unchanged ✅ |
| rh-enamide   | 44.66 % ± 0.29 %  | 44.73 % ± 0.29 % | unchanged ✅ |
| pd-allyl     | -0.029 % ± 0.34 % | -0.01 % ± 0.40 % | still NS ❌  |
| rh-conjugate | -0.080 % ± 1.18 % | 18.00 % ± 4.17 % | NEWLY ✅     |
| heck-relay*  | -0.59 % ± 3.26 %  | 52.82 % ± 1.54 % | NEWLY ✅     |

(*) heck-relay run with --ratio-tol none; with the fix the ratio
actually drops from 1.378 → 1.085, so the gate would now pass at
default tolerance.  Bypass retained here for direct comparison
against the pre-fix #9 baseline.
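
The gate arithmetic in the footnote can be made concrete with a small sketch. It assumes the gate accepts a ratio within `ratio_tol` of 1.0, which is consistent with 1.378 failing and 1.085 passing at the default 0.15 but is not confirmed by this commit; `passes_ratio_gate` is a hypothetical name, and `None` stands in for `--ratio-tol none`.

```python
from typing import Optional

DEFAULT_RATIO_TOL = 0.15  # default tolerance quoted in the provenance notes


def passes_ratio_gate(ratio: float, ratio_tol: Optional[float] = DEFAULT_RATIO_TOL) -> bool:
    """Assumed gate: pass when the ratio is within ratio_tol of 1.0."""
    if ratio_tol is None:
        # Corresponds to bypassing the gate (--ratio-tol none).
        return True
    return abs(ratio - 1.0) <= ratio_tol
```

With these assumptions the pre-fix ratio 1.378 is rejected (0.378 > 0.15) and the post-fix ratio 1.085 is accepted (0.085 ≤ 0.15).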

What this PR contains

Per-system, the convergence/ directory now has:
- <system>_optimized.fld — optimized force field
- validation_results.json — n=10 mean+CI numbers, full provenance
- paper_metrics.json — paper-comparable Seminario vs. optimized stats

Provenance (every JSON):
- q2mm git_sha: 78e72fa (the fix branch's HEAD)
- q2mm-data git_sha: a3cc8d7 (main, post-#8)
- n_evals: 10
- ratio_tol: 0.15 (default) for 4 systems; null for heck-relay

pd-allyl's pd-allyl_optimized.fld is bit-identical to the previous
version — the surrogate-guided step still worsened the real OF
slightly (within noise), so ScipyOptimizer reverted to initial
params.  Even the fix doesn't unlock pd-allyl: its FF really does
sit at a JaxLoss local minimum, distinct from the rh-conjugate /
heck-relay cases where the clip-arccos bug was preventing the
optimizer from finding real descent directions.
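
The revert behaviour described for pd-allyl follows a common accept-or-revert pattern: a step proposed by the surrogate is kept only if the real objective function actually improves. The sketch below illustrates that pattern under stated assumptions; it is not the q2mm ScipyOptimizer code, and `accept_or_revert` and its parameters are hypothetical.

```python
from typing import Callable, Sequence


def accept_or_revert(
    initial_params: Sequence[float],
    candidate_params: Sequence[float],
    real_objective: Callable[[Sequence[float]], float],
) -> Sequence[float]:
    """Keep the surrogate-guided candidate only if the real objective decreases."""
    initial_value = real_objective(initial_params)
    candidate_value = real_objective(candidate_params)
    if candidate_value < initial_value:
        return candidate_params
    # The candidate is no better on the real objective, so fall back to the
    # starting parameters; the written force field is then unchanged.
    return initial_params
```

When the candidate is rejected, the output parameters equal the input parameters, which is why the resulting file can be bit-identical to the previous version.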

Wall time on RTX 5090:
- ch3f:        ~3 s (deterministic, n=5)
- rh-enamide:  ~26 min (opt + n=5 post-eval)
- pd-allyl:    ~50 min (opt + n=10 post-eval)
- rh-conjugate: ~36 min (opt + n=10 post-eval)
- heck-relay:  ~98 min (opt + n=10 post-eval)
- Total:       ~3.5 hr GPU

The audit-orphans CI workflow (q2mm-data#8) is expected to pass
since every directory modified is already referenced in
q2mm/docs/systems/*.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>