feat(benchmarks)!: flip canonical subdirs to QFUERZA-start default#11
Open
ericchansen wants to merge 3 commits into
Adds benchmarks/<system>/from-qfuerza/ artifacts for rh-enamide, heck-relay, pd-allyl-amination, pd-1,4-conjugate-addition, and rh-1,4-conjugate-addition.

Each directory contains:
- validation_results.json: full run record with provenance, audit, R²
- paper_metrics.json: published-paper-comparable metrics
- <system>_optimized.fld: final optimized force field

These runs start from QFUERZA Hessian-derived bond/angle values (overwriting the published OPT scalars) and run the standard SciPy L-BFGS-B + JaxLoss pipeline. See ericchansen/q2mm#290 for the loader and CLI code, and docs/benchmarks/qfuerza-recovery.md (in that PR) for the methodology and interpretation.

Summary (QFUERZA vs published-start final OF ratio):
- rh-enamide: 1.03x (same basin)
- pd-allyl: 1.00x (same basin)
- pd-conjugate: 1.14x (nearby basin)
- rh-conjugate: 3.49x (different basin)
- heck-relay: 100x (JaxLoss surrogate diverged)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…diffs

Re-runs all 5 TS systems with the Phase 0 protocol fix from ericchansen/q2mm PR #290 (ftol=1e-12, fractional bounds 20% fc / 5% eq, heck-relay overridden to 5% fc). Real OF improvement is now achieved on 4 of 5 systems (was 1/5 with the previous loose protocol).

New artifacts per system:
- per_param_comparison.md: row-by-row diff vs published OPT params, grouped by parameter category and atom motif

New cross-system artifacts:
- CROSS_SYSTEM_R2_RMSD.md: R²/RMSD comparison table (paper vs q2mm@published vs q2mm@QFUERZA)
- paper_r2_reference.json: extracted published-paper R²/RMSD reference values for the recovery-doc comparison

Final QFUERZA-vs-published objective ratios (lower = q2mm objective favors QFUERZA-optimized over published-optimized):
- rh-enamide: 1.09× (same basin, paper-quality RMSD)
- pd-allyl: 0.77× (q2mm engine favors QFUERZA over published)
- pd-conjugate: 0.86× (q2mm engine favors QFUERZA over published)
- rh-conjugate: 1.50× (nearby basin; NEGATIVE fc bug present)
- heck-relay: 91× (JaxLoss exploded, 0 L-BFGS-B iters)

The "q2mm engine favors QFUERZA" finding for pd-allyl/pd-conjugate is analyzed in the recovery doc. It points to a q2mm/MM3 vs MacroModel/MM3* backend parity gap that limits what parameter-recovery experiments can conclude about Wahlers systems. See ericchansen/q2mm PR #290 for the full analysis page and methodology rationale.
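The fractional bounds named above (20% on force constants, 5% on equilibrium values) can be illustrated with a minimal sketch. This is not the q2mm implementation: the function name `fractional_bounds` and the `"fc"` / `"eq"` labels are hypothetical, and only the percentages and `ftol=1e-12` come from the commit message.

```python
from typing import List, Sequence, Tuple


def fractional_bounds(
    values: Sequence[float],
    kinds: Sequence[str],
    fc_frac: float = 0.20,
    eq_frac: float = 0.05,
) -> List[Tuple[float, float]]:
    """Return a (low, high) box around each starting value.

    kinds[i] is "fc" for a force constant or "eq" for an equilibrium
    value (hypothetical labels used only in this sketch).
    """
    fractions = {"fc": fc_frac, "eq": eq_frac}
    bounds = []
    for value, kind in zip(values, kinds):
        # abs() keeps low <= high even if a starting value is negative.
        delta = abs(value) * fractions[kind]
        bounds.append((value - delta, value + delta))
    return bounds


# Example: one bond force constant and its equilibrium length.
bounds = fractional_bounds([5.0, 1.5], ["fc", "eq"])

# With SciPy installed, bounds like these would be passed roughly as:
#   scipy.optimize.minimize(loss, x0, method="L-BFGS-B",
#                           bounds=bounds, options={"ftol": 1e-12})
```

The heck-relay override mentioned above would correspond to calling the helper with `fc_frac=0.05`. Note that a starting value of exactly zero yields a zero-width box in this sketch.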
…efault

Aligns with q2mm PR #290, which flipped the `load_system(starting_point=...)` default from `"published"` to `"qfuerza"`. The canonical artifact path must follow the canonical default.

Layout change (per TS system in benchmarks/<system>/):
- `convergence/` <- previously `from-qfuerza/` (canonical: QFUERZA-start)
- `from-published/` <- previously `convergence/` (opt-out baseline)

ch3f/ is unchanged (not a TS system; only has a single convergence/).

Docs:
- README.md: directory tree updated to show both subdirs per system and document canonical (QFUERZA) vs opt-out (published) framing.
- benchmarks/CONVERGENCE_README.md: lead with the QFUERZA-default framing, link to the q2mm qfuerza-recovery doc, document both subdir semantics, add per_param_comparison.md to the file table, update regeneration examples.

No code changes: audit-orphans.sh walks subdirs dynamically, so no hardcoded path updates are needed.

BREAKING CHANGE: any external script or notebook that hardcodes `benchmarks/<system>/from-qfuerza/` must switch to `benchmarks/<system>/convergence/`. Scripts that hardcoded the old `benchmarks/<system>/convergence/` (published-start) must switch to `benchmarks/<system>/from-published/`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
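A checkout can be sanity-checked against the layout described in this commit with a short script. The sketch below is an assumption-laden illustration, not part of the repository: `check_layout` is a hypothetical helper, and it only tests for the directory names listed above.

```python
from pathlib import Path
from typing import Dict, List

# The five TS systems named in this PR; ch3f/ is intentionally excluded.
TS_SYSTEMS = [
    "rh-enamide",
    "heck-relay",
    "pd-allyl-amination",
    "pd-1,4-conjugate-addition",
    "rh-1,4-conjugate-addition",
]
EXPECTED_SUBDIRS = ["convergence", "from-published"]
RETIRED_SUBDIRS = ["from-qfuerza"]


def check_layout(benchmarks_root: Path) -> Dict[str, List[str]]:
    """Return {system: [problem, ...]} for systems that deviate from the new layout."""
    problems: Dict[str, List[str]] = {}
    for system in TS_SYSTEMS:
        system_dir = benchmarks_root / system
        issues = []
        for name in EXPECTED_SUBDIRS:
            if not (system_dir / name).is_dir():
                issues.append(f"missing {name}/")
        for name in RETIRED_SUBDIRS:
            if (system_dir / name).exists():
                issues.append(f"stale {name}/")
        if issues:
            problems[system] = issues
    return problems
```

Calling `check_layout(Path("benchmarks"))` from the repository root returns an empty dict when every TS system has both subdirs and no leftover `from-qfuerza/`.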
QFUERZA-recovery benchmark artifacts (canonical default flip)
Companion data PR to q2mm#290, which flipped the `load_system(starting_point=...)` default from `"published"` to `"qfuerza"`. The canonical artifact path follows.

Layout change
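Per TS system in `benchmarks/<system>/`:
- `convergence/`: now holds the QFUERZA-start results (previously in `from-qfuerza/`).
- `from-published/`: now holds the published-start results (previously in `convergence/`).
- `from-qfuerza/`: renamed to `convergence/`.

5 TS systems affected: rh-enamide, heck-relay, pd-allyl-amination, pd-1,4-conjugate-addition, rh-1,4-conjugate-addition. `ch3f/` is unchanged.

What each subdir contains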
- `convergence/`: canonical default produced by `scripts/regenerate_convergence_results.py` (no extra flags; `starting_point="qfuerza"` since q2mm#290). QFUERZA-derived bond/angle scalars on the chemist-provided OPT topology, then L-BFGS-B optimization against the full multi-target ReferenceData. Files: `validation_results.json`, `paper_metrics.json`, `<system>_optimized.fld`, `per_param_comparison.md`.
- `from-published/`: opt-out baseline produced with `--starting-point published`. Same file layout (no `per_param_comparison.md`, since the comparison is QFUERZA-only).

Results (carried forward from prior commits)
Honest mixed result; the 3 divergent systems are documented as known
limitations with workarounds in the
q2mm qfuerza-recovery doc.
Documentation updates
- `README.md`: directory tree now shows both subdirs per TS system; canonical/opt-out semantics documented.
- `benchmarks/CONVERGENCE_README.md`: leads with the QFUERZA-default framing; both subdirs explained; `per_param_comparison.md` added to the file table; regeneration examples updated.
No-code-change rationale
`scripts/audit-orphans.sh` walks subdirs dynamically. No hardcoded path references in the q2mm-data scripts/ or workflows/ directories.
Breaking change
External scripts or notebooks hardcoding either of the old path
conventions must be updated:
- `benchmarks/<system>/from-qfuerza/` → `benchmarks/<system>/convergence/`
- `benchmarks/<system>/convergence/` → `benchmarks/<system>/from-published/` (when the consumer specifically wanted the published-start baseline)

Consumers of "whatever the canonical default is" can continue to read `benchmarks/<system>/convergence/` and will automatically pick up the new QFUERZA-start data.
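For external consumers that pinned a specific starting point, the two renames can be applied mechanically. The following is a hedged sketch only: `migrate_path` is a hypothetical helper, and it assumes every old `convergence/` reference meant the published-start baseline, which is not true for consumers that simply want the canonical default.

```python
import re

# Old subdir name -> new subdir name, per the breaking-change note.
_RENAMES = {"from-qfuerza": "convergence", "convergence": "from-published"}

_PATTERN = re.compile(
    r"benchmarks/(?P<system>[^/\s]+)/(?P<subdir>from-qfuerza|convergence)(?=/|$)"
)


def _rename(match: "re.Match[str]") -> str:
    # ch3f/ is not a TS system and keeps its single convergence/ directory.
    if match.group("system") == "ch3f":
        return match.group(0)
    new_subdir = _RENAMES[match.group("subdir")]
    return f"benchmarks/{match.group('system')}/{new_subdir}"


def migrate_path(old_path: str) -> str:
    """Map a pre-flip artifact path to its post-flip location.

    re.sub makes a single pass, so a path rewritten from from-qfuerza/ to
    convergence/ is not rewritten a second time to from-published/.
    """
    return _PATTERN.sub(_rename, old_path)
```

The single-pass substitution matters here because the new name of one directory is the old name of the other; two sequential string replacements would send every path to `from-published/`.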
Related
(10.1021/acs.jctc.5c01751)