feat(benchmarks)!: flip canonical subdirs to QFUERZA-start default by ericchansen · Pull Request #11 · ericchansen/q2mm-data

ericchansen · 2026-05-28T20:36:57Z

QFUERZA-recovery benchmark artifacts (canonical default flip)

Companion data PR to q2mm#290,
which flipped load_system(starting_point=...) default from "published" to
"qfuerza". The canonical artifact path follows.

⚠ Merge order: this PR depends on q2mm#290 merging first.
The audit-orphans CI job here checks q2mm master for path references; the new
from-published/ subdirs are only referenced from q2mm#290's updated docs (docs/benchmarks/qfuerza-recovery.md,
benchmarks/CONVERGENCE_README.md). Once q2mm#290 lands on master, re-run this PR's CI.

Layout change

Per TS system in benchmarks/<system>/:

| Subdir | Before | After |
| --- | --- | --- |
| `convergence/` | published-start (canonical) | QFUERZA-start (canonical) |
| `from-published/` | (absent) | published-start (opt-out baseline) |
| `from-qfuerza/` | QFUERZA-start (opt-in) | removed |

5 TS systems affected: rh-enamide, heck-relay, pd-allyl-amination,
pd-1,4-conjugate-addition, rh-1,4-conjugate-addition. ch3f/ is unchanged.

What each subdir contains

- convergence/: canonical default produced by scripts/regenerate_convergence_results.py (no extra flags; starting_point="qfuerza" since q2mm#290). The run takes QFUERZA-derived bond/angle scalars on the chemist-provided OPT topology, then applies L-BFGS-B optimization against the full multi-target ReferenceData. Files: validation_results.json, paper_metrics.json, <system>_optimized.fld, per_param_comparison.md.
- from-published/: opt-out baseline produced with --starting-point published. Same file layout, minus per_param_comparison.md, since that comparison is QFUERZA-only.

Results (carried forward from prior commits)

| System | Published-start final OF | QFUERZA-start final OF | Ratio (QFUERZA / published) | Notes |
| --- | --- | --- | --- | --- |
| rh-enamide | 2.70 × 10⁵ | 2.78 × 10⁵ | 1.03× | ✅ same basin |
| pd-allyl | 7.99 × 10⁶ | 7.98 × 10⁶ | 1.00× | ✅ same basin |
| pd-conjugate | 7.24 × 10⁶ | 8.25 × 10⁶ | 1.14× | ⚠ nearby basin |
| rh-conjugate | 5.10 × 10⁶ | 1.78 × 10⁷ | 3.49× | ❌ different basin |
| heck-relay | 1.45 × 10⁶ | 1.45 × 10⁸ | 100× | ❌ JaxLoss diverged |

The result is mixed: two systems land in the same basin, and the 3 that
diverge (pd-conjugate, rh-conjugate, heck-relay) are documented as known
limitations with workarounds in the q2mm qfuerza-recovery doc.
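The Ratio column is the QFUERZA-start final objective divided by the published-start final objective. A short sketch that recomputes it from the rounded values in the results table (the dictionary and function names are illustrative only; the basin labels are the author's judgment and are not derived here):

```python
# (published-start final OF, QFUERZA-start final OF), copied from the table.
FINAL_OBJECTIVES = {
    "rh-enamide": (2.70e5, 2.78e5),
    "pd-allyl": (7.99e6, 7.98e6),
    "pd-conjugate": (7.24e6, 8.25e6),
    "rh-conjugate": (5.10e6, 1.78e7),
    "heck-relay": (1.45e6, 1.45e8),
}


def objective_ratio(published_final, qfuerza_final):
    """Ratio > 1 means the QFUERZA-start run ended at a higher objective."""
    return qfuerza_final / published_final


for system, (published, qfuerza) in FINAL_OBJECTIVES.items():
    print(f"{system:<14s} {objective_ratio(published, qfuerza):.2f}x")
```

Because the inputs are already rounded to three significant figures, the recomputed ratios only agree with the table to about two decimal places.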

Documentation updates

- README.md: directory tree now shows both subdirs per TS system; canonical/opt-out semantics documented.
- benchmarks/CONVERGENCE_README.md: leads with the QFUERZA-default framing; both subdirs explained; per_param_comparison.md added to the file table; regeneration examples updated.

No-code-change rationale

scripts/audit-orphans.sh walks subdirs dynamically, and neither the scripts/
nor the workflows/ directory in q2mm-data contains hardcoded path references,
so the rename needs no code changes.
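To illustrate what "walks subdirs dynamically" can mean, here is a rough Python sketch of an orphan audit. It is not the real scripts/audit-orphans.sh (which is a shell script whose contents are not shown in this PR); the function name, the file-suffix filter, and the exact reference string searched for are all assumptions.

```python
from pathlib import Path


def find_orphans(data_root, reference_root, suffixes=(".md", ".py")):
    """List benchmarks/<system>/<subdir>/ paths never mentioned in reference_root.

    Subdirs are discovered by walking the tree, so renaming a directory
    requires no change to this function.
    """
    # Read every candidate reference file once.
    texts = [
        path.read_text(encoding="utf-8", errors="ignore")
        for path in Path(reference_root).rglob("*")
        if path.is_file() and path.suffix in suffixes
    ]

    orphans = []
    benchmarks = Path(data_root) / "benchmarks"
    for system_dir in sorted(p for p in benchmarks.iterdir() if p.is_dir()):
        for subdir in sorted(p for p in system_dir.iterdir() if p.is_dir()):
            needle = f"benchmarks/{system_dir.name}/{subdir.name}/"
            if not any(needle in text for text in texts):
                orphans.append(needle)
    return orphans
```

Under this model, a new subdir is reported as an orphan until the branch being searched contains a reference to it, which is the situation the merge-order warning describes.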

Breaking change

External scripts or notebooks hardcoding either of the old path
conventions must be updated:

- benchmarks/<system>/from-qfuerza/ → benchmarks/<system>/convergence/
- benchmarks/<system>/convergence/ → benchmarks/<system>/from-published/ (when the consumer specifically wanted the published-start baseline)

Consumers of "whatever the canonical default is" can continue to read
benchmarks/<system>/convergence/ and will automatically pick up the
new QFUERZA-start data.
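For consumers with many hardcoded paths, the two rewrite rules can be applied mechanically. The helper below is a hypothetical sketch (nothing like it ships in q2mm-data); it applies at most one rule per path, because chaining the rules would wrongly send old from-qfuerza/ paths on to from-published/.

```python
from pathlib import PurePosixPath

# The 5 TS systems affected by the flip; ch3f/ is deliberately excluded.
TS_SYSTEMS = {
    "rh-enamide",
    "heck-relay",
    "pd-allyl-amination",
    "pd-1,4-conjugate-addition",
    "rh-1,4-conjugate-addition",
}


def migrate_path(old_path, wants_published_baseline=False):
    """Rewrite one pre-flip path; return it unchanged if no rule applies."""
    parts = PurePosixPath(old_path).parts
    # Expect benchmarks/<system>/<subdir>/...
    if len(parts) < 3 or parts[0] != "benchmarks" or parts[1] not in TS_SYSTEMS:
        return old_path
    subdir = parts[2]
    if subdir == "from-qfuerza":
        new_subdir = "convergence"
    elif subdir == "convergence" and wants_published_baseline:
        new_subdir = "from-published"
    else:
        return old_path
    return str(PurePosixPath(*parts[:2], new_subdir, *parts[3:]))
```

The `wants_published_baseline` flag has to be decided per call site: only the consumer knows whether an old convergence/ path meant "the published-start data" or "whatever is canonical".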

Adds benchmarks/<system>/from-qfuerza/ artifacts for rh-enamide, heck-relay, pd-allyl-amination, pd-1,4-conjugate-addition, and rh-1,4-conjugate-addition.

Each directory contains:

- validation_results.json: full run record with provenance, audit, R²
- paper_metrics.json: published-paper-comparable metrics
- <system>_optimized.fld: final optimized force field

These runs start from QFUERZA Hessian-derived bond/angle values (overwriting the published OPT scalars) and run the standard SciPy L-BFGS-B + JaxLoss pipeline. See ericchansen/q2mm#290 for the loader and CLI code, and docs/benchmarks/qfuerza-recovery.md (in that PR) for the methodology and interpretation.

Summary (QFUERZA vs published-start final OF ratio):

- rh-enamide: 1.03x (same basin)
- pd-allyl: 1.00x (same basin)
- pd-conjugate: 1.14x (nearby basin)
- rh-conjugate: 3.49x (different basin)
- heck-relay: 100x (JaxLoss surrogate diverged)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…diffs

Re-runs all 5 TS systems with the Phase 0 protocol fix from ericchansen/q2mm PR #290 (ftol=1e-12, fractional bounds 20% fc / 5% eq, heck-relay overridden to 5% fc). Real OF improvement now achieved on 4 of 5 systems (was 1/5 with the previous loose protocol).

New artifacts per system:

- per_param_comparison.md: row-by-row diff vs published OPT params, grouped by parameter category and atom motif

New cross-system artifacts:

- CROSS_SYSTEM_R2_RMSD.md: R²/RMSD comparison table (paper vs q2mm@published vs q2mm@QFUERZA)
- paper_r2_reference.json: extracted published-paper R²/RMSD reference values for the recovery-doc comparison

Final QFUERZA-vs-published objective ratios (lower = q2mm objective favors QFUERZA-optimized over published-optimized):

- rh-enamide: 1.09× (same basin, paper-quality RMSD)
- pd-allyl: 0.77× (q2mm engine favors QFUERZA over published)
- pd-conjugate: 0.86× (q2mm engine favors QFUERZA over published)
- rh-conjugate: 1.50× (nearby basin; NEGATIVE fc bug present)
- heck-relay: 91× (JaxLoss exploded, 0 L-BFGS-B iters)

The "q2mm engine favors QFUERZA" finding for pd-allyl/pd-conjugate is analyzed in the recovery doc: it points to a q2mm/MM3 vs MacroModel/MM3* backend parity gap that limits what parameter-recovery experiments can conclude about Wahlers systems. See ericchansen/q2mm PR #290 for the full analysis page and methodology rationale.
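The "fractional bounds 20% fc / 5% eq" in the commit message above can be pictured as a box around each starting parameter. The sketch below is an assumption about the general idea, not the q2mm implementation: it presumes "fc" means force constant and "eq" means equilibrium value, and the function name and `kinds` labels are hypothetical.

```python
def fractional_bounds(values, kinds, fractions=None):
    """Build (low, high) pairs within a fraction of each starting value.

    values: starting parameter values.
    kinds:  one label per value, "fc" or "eq" (illustrative labels).
    """
    if fractions is None:
        fractions = {"fc": 0.20, "eq": 0.05}
    bounds = []
    for value, kind in zip(values, kinds):
        # abs() keeps low <= high even when the starting value is negative.
        half_width = fractions[kind] * abs(value)
        bounds.append((value - half_width, value + half_width))
    return bounds


# Example: one force constant and one equilibrium value.
print(fractional_bounds([5.0, 2.0], ["fc", "eq"]))
```

A list of `(low, high)` tuples like this is the shape accepted by the `bounds` argument of `scipy.optimize.minimize` with `method="L-BFGS-B"`, where the tolerance would be passed as `options={"ftol": 1e-12}`. The heck-relay override would correspond to `fractions={"fc": 0.05, "eq": 0.05}`.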

ericchansen · 2026-05-31T21:59:22Z

Heads-up: `Audit orphaned benchmark data` failure is structural, not a real orphan

The audit-orphans.sh workflow checks out ericchansen/q2mm at master and greps for references to each benchmarks/<system>/<subdir>/. The new from-qfuerza/ directories in this PR are referenced from q2mm — in docs/benchmarks/qfuerza-recovery.md and scripts/regenerate_convergence_results.py — but those references are on the feat/qfuerza-from-scratch branch, not on master yet.

So this red CI is expected for any paired q2mm + q2mm-data PR until the q2mm PR merges. Two options:

1. Merge order: merge q2mm#290 to master first, then re-run this audit on the same 1f11ee7 SHA; it will pass.
2. Workflow improvement (future): update .github/workflows/audit-orphans.yml to also check out the head branch of any linked q2mm PR; out of scope for this PR.

No action needed from me unless you'd like option 2 split out as a separate PR against q2mm-data master.

…efault

Aligns with q2mm PR #290, which flipped `load_system(starting_point=...)` default from `"published"` to `"qfuerza"`. The canonical artifact path must follow the canonical default.

Layout change (per TS system in benchmarks/<system>/):

- `convergence/` <- previously `from-qfuerza/` (canonical: QFUERZA-start)
- `from-published/` <- previously `convergence/` (opt-out baseline)

ch3f/ is unchanged (not a TS system; only has a single convergence/).

Docs:

- README.md: directory tree updated to show both subdirs per system and document canonical (QFUERZA) vs opt-out (published) framing.
- benchmarks/CONVERGENCE_README.md: lead with the QFUERZA-default framing, link to the q2mm qfuerza-recovery doc, document both subdir semantics, add per_param_comparison.md to the file table, update regeneration examples.

No code changes: audit-orphans.sh walks subdirs dynamically, so no hardcoded path updates needed.

BREAKING CHANGE: any external script or notebook that hardcodes `benchmarks/<system>/from-qfuerza/` must switch to `benchmarks/<system>/convergence/`. Scripts that hardcoded the old `benchmarks/<system>/convergence/` (published-start) must switch to `benchmarks/<system>/from-published/`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ericchansen and others added 2 commits May 28, 2026 15:36

ericchansen mentioned this pull request Jun 1, 2026

feat(systems)!: default starting_point to qfuerza ericchansen/q2mm#290

Open

ericchansen changed the title ~~feat(benchmarks): add QFUERZA-recovery results for all 5 TS systems~~ feat(benchmarks)!: flip canonical subdirs to QFUERZA-start default Jun 1, 2026

ericchansen wants to merge 3 commits into main from feat/qfuerza-recovery-results
