ci: run audit-orphans.sh on every PR + weekly cron by ericchansen · Pull Request #8 · ericchansen/q2mm-data

ericchansen · 2026-05-27T20:58:42Z

Summary

Adds an Actions workflow that runs scripts/audit-orphans.sh automatically.
Locks in the stewardship invariant introduced in q2mm-data#7 so that orphaned benchmark data is flagged on PRs and in weekly drift checks.
Updates the README auditing section to mention the automated workflow while preserving manual invocation instructions.

What the workflow does

Runs on pull requests targeting main, on a schedule every Monday at 09:00 UTC, and on manual workflow_dispatch.
Checks out q2mm-data with actions/checkout@v4.
Checks out ericchansen/q2mm at master into q2mm-sibling with actions/checkout@v4.
Runs bash scripts/audit-orphans.sh "${GITHUB_WORKSPACE}/q2mm-sibling" from the q2mm-data checkout.

Local smoke tests

YAML validation: python3 -c "import yaml; yaml.safe_load(open('.github/workflows/audit-orphans.yml'))" passed. (python is not installed in this local shell; python3 is.)
Clean audit: bash scripts/audit-orphans.sh /home/eric/repos/q2mm exited 0 with ✅ No orphaned directories.
Failure path: after adding benchmarks/_smoke_orphan/scratch/dummy.txt, the same audit exited 1 and reported benchmarks/_smoke_orphan/scratch as orphaned.
Cleanup recheck: after removing benchmarks/_smoke_orphan, the audit exited 0 again with ✅ No orphaned directories.
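
The pass/fail behavior exercised by these smoke tests can be illustrated with a small Python sketch. This is not the contents of scripts/audit-orphans.sh, which the PR does not show. It assumes that a directory under benchmarks/ counts as orphaned when it directly contains files and its relative path is not mentioned in any docs/systems/*.md file of the sibling checkout. The function name find_orphans and the benchmarks/ch3f/convergence layout are hypothetical.

```python
import tempfile
from pathlib import Path


def find_orphans(data_root: Path, sibling_root: Path) -> list[str]:
    """Return benchmark directories that no sibling doc page mentions.

    Assumed rule (not taken from the real script): a directory is orphaned
    if it directly contains at least one file and its path relative to
    data_root does not appear in any docs/systems/*.md of the sibling repo.
    """
    docs_dir = sibling_root / "docs" / "systems"
    doc_text = "\n".join(
        page.read_text(encoding="utf-8") for page in sorted(docs_dir.glob("*.md"))
    )
    orphans = []
    bench = data_root / "benchmarks"
    for directory in sorted(p for p in bench.rglob("*") if p.is_dir()):
        # Skip pure container directories that hold only subdirectories.
        if not any(child.is_file() for child in directory.iterdir()):
            continue
        rel = directory.relative_to(data_root).as_posix()
        if rel not in doc_text:
            orphans.append(rel)
    return orphans


# Reproduce the clean / failing sequence from the smoke tests in a temp tree.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "q2mm-data"
    sibling = Path(tmp) / "q2mm-sibling"

    # Hypothetical layout: one referenced benchmark directory.
    referenced = data / "benchmarks" / "ch3f" / "convergence"
    referenced.mkdir(parents=True)
    (referenced / "validation_results.json").write_text("{}", encoding="utf-8")
    (sibling / "docs" / "systems").mkdir(parents=True)
    (sibling / "docs" / "systems" / "ch3f.md").write_text(
        "Data lives in benchmarks/ch3f/convergence\n", encoding="utf-8"
    )
    clean = find_orphans(data, sibling)

    # Add the smoke-test orphan and audit again.
    scratch = data / "benchmarks" / "_smoke_orphan" / "scratch"
    scratch.mkdir(parents=True)
    (scratch / "dummy.txt").write_text("x", encoding="utf-8")
    dirty = find_orphans(data, sibling)

print("clean:", clean)
print("dirty:", dirty)
```

A wrapper that exits with status 1 whenever the returned list is non-empty would mirror the exit codes reported above.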

Lock in the stewardship invariant introduced in #7 by running the orphan audit automatically instead of relying on manual memory.

The workflow runs on pull requests to main, on a weekly Monday 09:00 UTC cron, and via manual dispatch. It checks out q2mm-data plus the q2mm sibling repo and runs scripts/audit-orphans.sh against that checkout.

Smoke tests: validated the workflow YAML with python3 yaml.safe_load; confirmed the audit passes with no orphans, fails after adding benchmarks/_smoke_orphan/scratch, and passes again after removing the smoke directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…aluation

Re-runs the three "within noise" published-FF systems from q2mm-data#6 with the new --n-evals 10 flag (q2mm#286, landed on master). The n=10 samples give a Student-t 95% CI on the improvement that's tight enough to make confident scientific verdicts:

| System          | Mean Δ%  | CI₉₅   | Verdict         |
|-----------------|---------:|-------:|-----------------|
| pd-allyl        | -0.029%  | ±0.34% | NOT SIGNIFICANT |
| rh-conjugate    | -0.080%  | ±1.18% | NOT SIGNIFICANT |
| heck-relay\*    | -0.59%   | ±3.26% | NOT SIGNIFICANT |

\* heck-relay run with --ratio-tol none (ratio=1.378, formally fails default gate); even with the gate bypassed, the JaxLoss surrogate broke down (2 non-finite line-search values) and the result is inside the noise band.

These are statistically defensible "no improvement" verdicts, not "within noise so we can't tell" verdicts. The CI₉₅ excludes any improvement larger than ~0.3 %, ~1.2 %, and ~3.3 % for pd-allyl, rh-conjugate, and heck-relay respectively, well below any publishable improvement claim.

Provenance:
- q2mm git_sha: 86d8483 (master, post #286)
- q2mm-data git_sha: a3cc8d7 (main, post #8)
- n_evals: 10
- ratio_tol: 0.15 (default) for pd-allyl/rh-conjugate; null for heck-relay

Wall time:
- pd-allyl: ~21 min opt + 16 min post-eval
- rh-conjugate: ~10 min opt + 13 min post-eval
- heck-relay: ~24 min opt + 38 min post-eval
- Total: ~2.0 hr GPU on RTX 5090

Companion docs update lives in ericchansen/q2mm docs/systems/{pd-allyl,rh-conjugate,heck-relay}.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
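
As a rough illustration of how a Student-t 95% CI over n=10 evaluations turns into a verdict, the following Python sketch computes the mean and CI half-width from ten samples. It is not the q2mm implementation. The function names are hypothetical, the sample values are synthetic and not taken from any run, and the rule "significant only if the lower CI bound is above zero" is an assumption about how the verdicts were assigned.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom (n = 10).
T_CRIT_95_DF9 = 2.262


def mean_and_ci95(samples: list[float]) -> tuple[float, float]:
    """Return (mean, CI half-width) for exactly ten samples."""
    if len(samples) != 10:
        raise ValueError("this sketch hardcodes the critical value for n = 10")
    mean = statistics.fmean(samples)
    # Standard error of the mean, using the sample standard deviation (n - 1).
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, T_CRIT_95_DF9 * sem


def verdict(mean: float, half_width: float) -> str:
    """Assumed rule: an improvement is significant if the CI lies above zero."""
    return "SIGNIFICANT" if mean - half_width > 0 else "NOT SIGNIFICANT"


# Synthetic improvement percentages, for illustration only.
synthetic = [-0.5, 0.4, -0.3, 0.2, 0.1, -0.2, 0.3, -0.4, 0.0, 0.1]
mean, half_width = mean_and_ci95(synthetic)
synthetic_verdict = verdict(mean, half_width)
print(f"mean={mean:.3f}% half-width={half_width:.3f}% -> {synthetic_verdict}")

# Applying the same rule to the pd-allyl row reported in the table.
pd_allyl_verdict = verdict(-0.029, 0.34)
print("pd-allyl:", pd_allyl_verdict)
```

Under this rule a mean of -0.029% with a ±0.34% half-width straddles zero, which matches the NOT SIGNIFICANT label in the table.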

…nt fix

Companion to q2mm fix branch fix/mm3-non-smooth-gradient (commit 78e72fa, PR #TBD). Re-runs the convergence pipeline with --n-evals 10 against q2mm patched for the angle-term gradient correctness bug documented in q2mm#284.

Results: two previously "no improvement" verdicts are now SIGNIFICANT.

| System          | Pre-fix Δ%        | Post-fix Δ%      | Verdict       |
|-----------------|-------------------|------------------|---------------|
| ch3f            | 99.83 % (det.)    | 99.83 % (det.)   | unchanged ✅  |
| rh-enamide      | 44.66 % ± 0.29 %  | 44.73 % ± 0.29 % | unchanged ✅  |
| pd-allyl        | -0.029 % ± 0.34 % | -0.01 % ± 0.40 % | still NS ❌   |
| rh-conjugate    | -0.080 % ± 1.18 % | 18.00 % ± 4.17 % | NEWLY ✅      |
| heck-relay*     | -0.59 % ± 3.26 %  | 52.82 % ± 1.54 % | NEWLY ✅      |

(*) heck-relay run with --ratio-tol none; with the fix the ratio actually drops from 1.378 → 1.085, so the gate would now pass at default tolerance. Bypass retained here for direct comparison against the pre-fix #9 baseline.

What this PR contains

Per-system, the convergence/ directory now has:
- <system>_optimized.fld: optimized force field
- validation_results.json: n=10 mean+CI numbers, full provenance
- paper_metrics.json: paper-comparable Seminario vs. optimized stats

Provenance (every JSON):
- q2mm git_sha: 78e72fa (the fix branch's HEAD)
- q2mm-data git_sha: a3cc8d7 (main, post-#8)
- n_evals: 10
- ratio_tol: 0.15 (default) for 4 systems; null for heck-relay

pd-allyl's pd-allyl_optimized.fld is bit-identical to the previous version: the surrogate-guided step still worsened the real OF slightly (within noise), so ScipyOptimizer reverted to initial params. Even the fix doesn't unlock pd-allyl: its FF really does sit at a JaxLoss local minimum, distinct from the rh-conjugate / heck-relay cases where the clip-arccos bug was preventing the optimizer from finding real descent directions.

Wall time on RTX 5090:
- ch3f: ~3 s (deterministic, n=5)
- rh-enamide: ~26 min (opt + n=5 post-eval)
- pd-allyl: ~50 min (opt + n=10 post-eval)
- rh-conjugate: ~36 min (opt + n=10 post-eval)
- heck-relay: ~98 min (opt + n=10 post-eval)
- Total: ~3.5 hr GPU

The audit-orphans CI workflow (q2mm-data#8) is expected to pass since every directory modified is already referenced in q2mm/docs/systems/*.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ericchansen merged commit a3cc8d7 into main May 27, 2026
1 check passed

ericchansen deleted the ci/audit-orphans-action branch May 27, 2026 21:28

ericchansen mentioned this pull request May 28, 2026

data: rerun pd-allyl/rh-conjugate/heck-relay with n=10 (confirms no real improvement) #9

Merged

ericchansen mentioned this pull request May 28, 2026

data: regenerate all 5 systems with MM3 angle gradient fix (2 newly significant) #10

Merged

ericchansen merged 1 commit into main from ci/audit-orphans-action
