ci: run audit-orphans.sh on every PR + weekly cron #8
Merged
Lock in the stewardship invariant introduced in #7 by running the orphan audit automatically instead of relying on manual memory.

The workflow runs on pull requests to main, on a weekly Monday 09:00 UTC cron, and via manual dispatch. It checks out q2mm-data plus the q2mm sibling repo and runs scripts/audit-orphans.sh against that checkout.

Smoke tests:
- validated the workflow YAML with python3 yaml.safe_load
- confirmed the audit passes with no orphans
- confirmed the audit fails after adding benchmarks/_smoke_orphan/scratch
- confirmed the audit passes again after removing the smoke directory

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ericchansen added a commit that referenced this pull request on May 28, 2026
…aluation

Re-runs the three "within noise" published-FF systems from q2mm-data#6 with the new --n-evals 10 flag (q2mm#286, landed on master). The n=10 samples give a Student-t 95% CI on the improvement that's tight enough to make confident scientific verdicts:

| System       | Mean Δ% | CI₉₅   | Verdict         |
|--------------|--------:|-------:|-----------------|
| pd-allyl     | -0.029% | ±0.34% | NOT SIGNIFICANT |
| rh-conjugate | -0.080% | ±1.18% | NOT SIGNIFICANT |
| heck-relay\* | -0.59%  | ±3.26% | NOT SIGNIFICANT |

\* heck-relay run with --ratio-tol none (ratio=1.378, formally fails default gate); even with the gate bypassed, the JaxLoss surrogate broke down (2 non-finite line-search values) and the result is inside the noise band.

These are statistically defensible "no improvement" verdicts, not "within noise so we can't tell" verdicts. The CI₉₅ excludes any improvement larger than ~0.3 %, ~1.2 %, and ~3.3 % for pd-allyl, rh-conjugate, and heck-relay respectively, well below any publishable improvement claim.

Provenance:
- q2mm git_sha: 86d8483 (master, post #286)
- q2mm-data git_sha: a3cc8d7 (main, post #8)
- n_evals: 10
- ratio_tol: 0.15 (default) for pd-allyl/rh-conjugate; null for heck-relay

Wall time:
- pd-allyl: ~21 min opt + 16 min post-eval
- rh-conjugate: ~10 min opt + 13 min post-eval
- heck-relay: ~24 min opt + 38 min post-eval
- Total: ~2.0 hr GPU on RTX 5090

Companion docs update lives in ericchansen/q2mm docs/systems/{pd-allyl,rh-conjugate,heck-relay}.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
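For readers who want to see where a "mean ± CI₉₅" figure of this kind comes from, here is a minimal sketch of a two-sided Student-t 95% interval for n=10 samples. It is not the q2mm implementation: the function name `mean_and_ci95` is hypothetical, the sample values are made-up placeholders rather than benchmark output, and the critical value 2.262 is the standard tabulated t value for 9 degrees of freedom.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom (n = 10).
T_CRIT_95_DF9 = 2.262


def mean_and_ci95(samples):
    """Return (mean, half_width) of a Student-t 95% CI.

    Hypothetical helper: only valid for exactly 10 samples, because the
    critical value above is hard-coded for 9 degrees of freedom.
    """
    n = len(samples)
    if n != 10:
        raise ValueError("T_CRIT_95_DF9 is only valid for n = 10")
    mean = statistics.fmean(samples)
    # Standard error of the mean, using the sample (n - 1) standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, T_CRIT_95_DF9 * sem


# Placeholder improvement percentages, NOT real benchmark data.
illustrative = [0.4, -0.3, 0.1, -0.5, 0.2, 0.0, -0.1, 0.3, -0.2, 0.1]
mean, half_width = mean_and_ci95(illustrative)
print(f"mean = {mean:+.3f} %, CI95 = ±{half_width:.3f} %")
```

Hard-coding the critical value keeps the sketch free of third-party dependencies; a version that supports other sample sizes would need a t-distribution quantile function from a statistics library.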
ericchansen added a commit that referenced this pull request on May 28, 2026
…nt fix

Companion to q2mm fix branch fix/mm3-non-smooth-gradient (commit 78e72fa, PR #TBD). Re-runs the convergence pipeline with --n-evals 10 against q2mm patched for the angle-term gradient correctness bug documented in q2mm#284.

Results: two previously "no improvement" verdicts are now SIGNIFICANT:

| System       | Pre-fix Δ%        | Post-fix Δ%      | Verdict      |
|--------------|-------------------|------------------|--------------|
| ch3f         | 99.83 % (det.)    | 99.83 % (det.)   | unchanged ✅ |
| rh-enamide   | 44.66 % ± 0.29 %  | 44.73 % ± 0.29 % | unchanged ✅ |
| pd-allyl     | -0.029 % ± 0.34 % | -0.01 % ± 0.40 % | still NS ❌  |
| rh-conjugate | -0.080 % ± 1.18 % | 18.00 % ± 4.17 % | NEWLY ✅     |
| heck-relay*  | -0.59 % ± 3.26 %  | 52.82 % ± 1.54 % | NEWLY ✅     |

(*) heck-relay run with --ratio-tol none; with the fix the ratio actually drops from 1.378 → 1.085, so the gate would now pass at default tolerance. Bypass retained here for direct comparison against the pre-fix #9 baseline.

What this PR contains

Per-system, the convergence/ directory now has:
- <system>_optimized.fld: optimized force field
- validation_results.json: n=10 mean+CI numbers, full provenance
- paper_metrics.json: paper-comparable Seminario vs. optimized stats

Provenance (every JSON):
- q2mm git_sha: 78e72fa (the fix branch's HEAD)
- q2mm-data git_sha: a3cc8d7 (main, post-#8)
- n_evals: 10
- ratio_tol: 0.15 (default) for 4 systems; null for heck-relay

pd-allyl's pd-allyl_optimized.fld is bit-identical to the previous version: the surrogate-guided step still worsened the real OF slightly (within noise), so ScipyOptimizer reverted to initial params. Even the fix doesn't unlock pd-allyl: its FF really does sit at a JaxLoss local minimum, distinct from the rh-conjugate / heck-relay cases where the clip-arccos bug was preventing the optimizer from finding real descent directions.

Wall time on RTX 5090:
- ch3f: ~3 s (deterministic, n=5)
- rh-enamide: ~26 min (opt + n=5 post-eval)
- pd-allyl: ~50 min (opt + n=10 post-eval)
- rh-conjugate: ~36 min (opt + n=10 post-eval)
- heck-relay: ~98 min (opt + n=10 post-eval)
- Total: ~3.5 hr GPU

The audit-orphans CI workflow (q2mm-data#8) is expected to pass since every directory modified is already referenced in q2mm/docs/systems/*.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
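The verdict column in the results for this commit can be read as a simple rule on the reported interval. The sketch below assumes (it is not confirmed by the commit message) that a result counts as significant only when the whole 95% interval lies above zero, and applies that rule to the post-fix numbers quoted in this commit message. The `Result` class and `is_significant` function are hypothetical names; ch3f is omitted because it is reported as deterministic with no interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    system: str
    mean_pct: float  # mean improvement, in percent
    ci95_pct: float  # half-width of the 95% CI, in percent


# Post-fix values as quoted in the commit message.
POST_FIX = [
    Result("rh-enamide", 44.73, 0.29),
    Result("pd-allyl", -0.01, 0.40),
    Result("rh-conjugate", 18.00, 4.17),
    Result("heck-relay", 52.82, 1.54),
]


def is_significant(result: Result) -> bool:
    """Assumed rule: the lower bound of the interval must be above zero."""
    return result.mean_pct - result.ci95_pct > 0.0


for r in POST_FIX:
    label = "SIGNIFICANT" if is_significant(r) else "NOT SIGNIFICANT"
    print(f"{r.system:<13} {r.mean_pct:+7.2f} % ± {r.ci95_pct:.2f} %  {label}")
```

Under this assumed rule the sketch agrees with the reported verdicts: pd-allyl stays not significant, while rh-enamide, rh-conjugate, and heck-relay are significant.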
Summary
- Locks in the stewardship invariant from #7 by running `scripts/audit-orphans.sh` automatically.

What the workflow does
- Runs on pull requests to `main`, every Monday at 09:00 UTC, and on manual `workflow_dispatch`.
- Checks out q2mm-data with `actions/checkout@v4`.
- Checks out `ericchansen/q2mm` at `master` into `q2mm-sibling` with `actions/checkout@v4`.
- Runs `bash scripts/audit-orphans.sh "${GITHUB_WORKSPACE}/q2mm-sibling"` from the q2mm-data checkout.

Local smoke tests
- `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/audit-orphans.yml'))"` passed. (`python` is not installed in this local shell; `python3` is.)
- `bash scripts/audit-orphans.sh /home/eric/repos/q2mm` exited 0 with `✅ No orphaned directories.`
- After adding `benchmarks/_smoke_orphan/scratch/dummy.txt`, the same audit exited 1 and reported `benchmarks/_smoke_orphan/scratch` as orphaned.
- After removing `benchmarks/_smoke_orphan`, the audit exited 0 again with `✅ No orphaned directories.`
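To make the smoke-test behaviour concrete, here is a rough Python sketch of what an orphan check of this kind might do. It is not a port of `scripts/audit-orphans.sh`, whose internals are not shown in this PR. The matching rule is an assumption: a directory under `benchmarks/` that directly contains files counts as referenced if its path, or the path of an ancestor below `benchmarks/`, appears as plain text in the sibling repo's `docs/systems/*.md`. The names `find_orphans`, `is_referenced`, and `main` are hypothetical.

```python
from pathlib import Path, PurePosixPath


def is_referenced(rel: PurePosixPath, corpus: str) -> bool:
    """True if rel, or an ancestor below benchmarks/, is mentioned in corpus.

    Assumption: a plain substring match on the POSIX-style relative path.
    """
    candidates = [rel, *rel.parents]
    return any(len(p.parts) >= 2 and p.as_posix() in corpus for p in candidates)


def find_orphans(data_root: Path, sibling_root: Path) -> list[str]:
    """List benchmark directories that hold files but are never mentioned."""
    docs = sorted((sibling_root / "docs" / "systems").glob("*.md"))
    corpus = "\n".join(doc.read_text(encoding="utf-8") for doc in docs)

    benchmarks = data_root / "benchmarks"
    if not benchmarks.is_dir():
        return []

    orphans = []
    for directory in sorted(p for p in benchmarks.rglob("*") if p.is_dir()):
        # Only directories that directly contain files are audited.
        if not any(child.is_file() for child in directory.iterdir()):
            continue
        rel = PurePosixPath(directory.relative_to(data_root).as_posix())
        if not is_referenced(rel, corpus):
            orphans.append(rel.as_posix())
    return orphans


def main(data_root: str, sibling_root: str) -> int:
    """Return 0 when nothing is orphaned and 1 otherwise."""
    orphans = find_orphans(Path(data_root), Path(sibling_root))
    if orphans:
        for path in orphans:
            print(f"orphaned: {path}")
        return 1
    print("✅ No orphaned directories.")
    return 0
```

With this rule, adding `benchmarks/_smoke_orphan/scratch/dummy.txt` to an otherwise clean checkout would make `main` return 1 and name `benchmarks/_smoke_orphan/scratch`, mirroring the smoke tests above. Substring matching is a deliberate simplification; the real script may use a stricter rule.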