
feat(benchmarks)!: flip canonical subdirs to QFUERZA-start default #11

Open
ericchansen wants to merge 3 commits into main from feat/qfuerza-recovery-results


@ericchansen ericchansen commented May 28, 2026

QFUERZA-recovery benchmark artifacts (canonical default flip)

Companion data PR to q2mm#290,
which flipped load_system(starting_point=...) default from "published" to
"qfuerza". The canonical artifact path follows.

Merge order: this PR depends on q2mm#290 merging first.
The audit-orphans CI job here checks q2mm master for path references; the new
from-published/ subdirs are only referenced from q2mm#290's updated docs (docs/benchmarks/qfuerza-recovery.md,
benchmarks/CONVERGENCE_README.md). Once q2mm#290 lands on master, re-run this PR's CI.

Layout change

Per TS system in benchmarks/<system>/:

| Subdir | Before | After |
| --- | --- | --- |
| convergence/ | published-start (canonical) | QFUERZA-start (canonical) |
| from-published/ |  | published-start (opt-out baseline) |
| from-qfuerza/ | QFUERZA-start (opt-in) | removed |

5 TS systems affected: rh-enamide, heck-relay, pd-allyl-amination,
pd-1,4-conjugate-addition, rh-1,4-conjugate-addition. ch3f/ is unchanged.

What each subdir contains

  • convergence/ — canonical default produced by
    scripts/regenerate_convergence_results.py (no extra flags;
    starting_point="qfuerza" since q2mm#290): QFUERZA-derived
    bond/angle scalars on the chemist-provided OPT topology, then
    L-BFGS-B optimization against the full multi-target ReferenceData.
    Files: validation_results.json, paper_metrics.json,
    <system>_optimized.fld, per_param_comparison.md.
  • from-published/ — opt-out baseline produced with
    --starting-point published. Same file layout (no
    per_param_comparison.md, since the comparison is QFUERZA-only).
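
A minimal sketch of how a consumer might check that a subdir has the files listed above. The file names come from this description; the helper names (`expected_files`, `missing_files`) are hypothetical and not part of q2mm or q2mm-data, and the check is only meant for the five TS systems.

```python
from pathlib import Path

# File names are taken from the PR description; "{system}" is filled in per system.
COMMON_FILES = (
    "validation_results.json",
    "paper_metrics.json",
    "{system}_optimized.fld",
)
# Only the QFUERZA-start (canonical) subdir carries the per-parameter comparison.
QFUERZA_ONLY_FILES = ("per_param_comparison.md",)


def expected_files(system: str, subdir: str) -> list[str]:
    """Return the file names expected in benchmarks/<system>/<subdir>/."""
    if subdir not in ("convergence", "from-published"):
        raise ValueError(f"unknown subdir: {subdir!r}")
    names = [name.format(system=system) for name in COMMON_FILES]
    if subdir == "convergence":
        names.extend(QFUERZA_ONLY_FILES)
    return names


def missing_files(repo_root: str, system: str, subdir: str) -> list[str]:
    """Return the expected file names that are absent on disk."""
    base = Path(repo_root) / "benchmarks" / system / subdir
    return [n for n in expected_files(system, subdir) if not (base / n).is_file()]
```

Calling `missing_files(".", "rh-enamide", "convergence")` from a checkout of this repository should return an empty list if the layout matches the description.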

Results (carried forward from prior commits)

| System | Pub. final | QFUERZA final | Ratio | Notes |
| --- | --- | --- | --- | --- |
| rh-enamide | 2.70 × 10⁵ | 2.78 × 10⁵ | 1.03× | ✅ same basin |
| pd-allyl | 7.99 × 10⁶ | 7.98 × 10⁶ | 1.00× | ✅ same basin |
| pd-conjugate | 7.24 × 10⁶ | 8.25 × 10⁶ | 1.14× | ⚠ nearby basin |
| rh-conjugate | 5.10 × 10⁶ | 1.78 × 10⁷ | 3.49× | ❌ different basin |
| heck-relay | 1.45 × 10⁶ | 1.45 × 10⁸ | 100× | ❌ JaxLoss diverged |

Honest mixed result; the 3 divergent systems are documented as known
limitations with workarounds in the
q2mm qfuerza-recovery doc.
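
For reference, the Ratio column is the QFUERZA-start final objective divided by the published-start final objective. The sketch below recomputes it from the rounded values in the table; the dictionary and function names are illustrative only, and the basin labels are not derived here because the description does not state the thresholds behind them.

```python
# (published-start final, QFUERZA-start final) objective values, copied from the table.
FINAL_OBJECTIVES = {
    "rh-enamide": (2.70e5, 2.78e5),
    "pd-allyl": (7.99e6, 7.98e6),
    "pd-conjugate": (7.24e6, 8.25e6),
    "rh-conjugate": (5.10e6, 1.78e7),
    "heck-relay": (1.45e6, 1.45e8),
}


def objective_ratio(published_final: float, qfuerza_final: float) -> float:
    """Ratio > 1 means the QFUERZA-start run ended at a higher objective."""
    return qfuerza_final / published_final


if __name__ == "__main__":
    for system, (published, qfuerza) in FINAL_OBJECTIVES.items():
        print(f"{system:<13} {objective_ratio(published, qfuerza):.2f}x")
```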

Documentation updates

  • README.md — directory tree now shows both subdirs per TS system;
    canonical/opt-out semantics documented.
  • benchmarks/CONVERGENCE_README.md — leads with the QFUERZA-default
    framing; both subdirs explained; per_param_comparison.md added to
    the file table; regeneration examples updated.

No-code-change rationale

scripts/audit-orphans.sh walks subdirs dynamically. No hardcoded path
references in the q2mm-data scripts/ or workflows/ directories.

Breaking change

External scripts or notebooks hardcoding either of the old path
conventions must be updated:

  • benchmarks/<system>/from-qfuerza/ → benchmarks/<system>/convergence/
  • benchmarks/<system>/convergence/ → benchmarks/<system>/from-published/
    (when the consumer specifically wanted the published-start baseline)

Consumers of "whatever the canonical default is" can continue to read
benchmarks/<system>/convergence/ and will automatically pick up the
new QFUERZA-start data.
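
The two renames above could be applied to hardcoded paths with a small helper like the following. This is a sketch under assumptions: `migrate_path` and its `wants_published_baseline` flag are hypothetical names, not something shipped in q2mm or q2mm-data, and the system list is copied from this description.

```python
from pathlib import PurePosixPath

# The five TS systems named in this PR; ch3f is deliberately absent (unchanged).
TS_SYSTEMS = frozenset({
    "rh-enamide",
    "heck-relay",
    "pd-allyl-amination",
    "pd-1,4-conjugate-addition",
    "rh-1,4-conjugate-addition",
})


def migrate_path(old_path: str, wants_published_baseline: bool = False) -> str:
    """Rewrite an old-layout benchmark path to the new layout.

    from-qfuerza/ always becomes convergence/. The old convergence/ becomes
    from-published/ only when the caller explicitly wanted published-start data.
    """
    parts = list(PurePosixPath(old_path).parts)
    for i in range(len(parts) - 2):
        if parts[i] == "benchmarks" and parts[i + 1] in TS_SYSTEMS:
            if parts[i + 2] == "from-qfuerza":
                parts[i + 2] = "convergence"
            elif parts[i + 2] == "convergence" and wants_published_baseline:
                parts[i + 2] = "from-published"
            break
    return str(PurePosixPath(*parts))
```

Applying the helper twice to the same path is not safe: a path already migrated to convergence/ would be moved to from-published/ if the flag is set, so it should run once against old-layout paths only.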

Related

  • q2mm PR #290 — loader/CLI default flip (merge first)
  • Farrugia 2025, J. Chem. Theory Comput. 22, 469
    (10.1021/acs.jctc.5c01751)

ericchansen and others added 2 commits May 28, 2026 15:36
Adds benchmarks/<system>/from-qfuerza/ artifacts for rh-enamide,
heck-relay, pd-allyl-amination, pd-1,4-conjugate-addition, and
rh-1,4-conjugate-addition. Each directory contains:

- validation_results.json — full run record with provenance, audit, R²
- paper_metrics.json — published-paper-comparable metrics
- <system>_optimized.fld — final optimized force field

These runs start from QFUERZA Hessian-derived bond/angle values
(overwriting the published OPT scalars) and run the standard SciPy
L-BFGS-B + JaxLoss pipeline. See ericchansen/q2mm#290 for the loader
and CLI code, and docs/benchmarks/qfuerza-recovery.md (in that PR) for
the methodology and interpretation.

Summary (QFUERZA vs published-start final OF ratio):
- rh-enamide: 1.03x (same basin)
- pd-allyl: 1.00x (same basin)
- pd-conjugate: 1.14x (nearby basin)
- rh-conjugate: 3.49x (different basin)
- heck-relay: 100x (JaxLoss surrogate diverged)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…diffs

Re-runs all 5 TS systems with the Phase 0 protocol fix from
ericchansen/q2mm PR #290 (ftol=1e-12, fractional bounds 20% fc / 5% eq,
heck-relay overridden to 5% fc). Real OF improvement now achieved on
4 of 5 systems (was 1/5 with the previous loose protocol).
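
As an illustration of the fractional bounds mentioned above, the sketch below builds per-parameter bounds of ±20% for force constants and ±5% for equilibrium values. It assumes the percentages are symmetric windows around the starting value, which this message does not spell out; the function name and the `"fc"`/`"eq"` labels are hypothetical and not q2mm's API.

```python
def fractional_bounds(values, kinds, fc_fraction=0.20, eq_fraction=0.05):
    """Return (lower, upper) bounds around each starting value.

    values: starting parameter values.
    kinds:  "fc" (force constant) or "eq" (equilibrium value) per parameter.
    Assumption: the window is symmetric, +/- fraction * |value|.
    """
    fractions = {"fc": fc_fraction, "eq": eq_fraction}
    bounds = []
    for value, kind in zip(values, kinds):
        # abs() keeps lower <= upper even if a starting value is negative.
        half_width = fractions[kind] * abs(value)
        bounds.append((value - half_width, value + half_width))
    return bounds
```

A list built this way can be passed as `bounds=` to `scipy.optimize.minimize(..., method="L-BFGS-B", options={"ftol": 1e-12})`; the heck-relay override described above would correspond to `fc_fraction=0.05`.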

New artifacts per system:
- per_param_comparison.md — row-by-row diff vs published OPT params,
  grouped by parameter category and atom motif

New cross-system artifacts:
- CROSS_SYSTEM_R2_RMSD.md — R²/RMSD comparison table:
  paper vs q2mm@published vs q2mm@QFUERZA
- paper_r2_reference.json — extracted published-paper R²/RMSD
  reference values for the recovery-doc comparison

Final QFUERZA-vs-published objective ratios (a ratio below 1 means the
q2mm objective favors QFUERZA-optimized over published-optimized):
- rh-enamide:   1.09× (same basin, paper-quality RMSD)
- pd-allyl:     0.77× (q2mm engine favors QFUERZA over published)
- pd-conjugate: 0.86× (q2mm engine favors QFUERZA over published)
- rh-conjugate: 1.50× (nearby basin; NEGATIVE fc bug present)
- heck-relay:   91×   (JaxLoss exploded, 0 L-BFGS-B iters)

The "q2mm engine favors QFUERZA" finding for pd-allyl/pd-conjugate is
analyzed in the recovery doc — it points to a q2mm/MM3 vs MacroModel/MM3*
backend parity gap that limits what parameter-recovery experiments can
conclude about Wahlers systems.

See ericchansen/q2mm PR #290 for the full analysis page and
methodology rationale.
@ericchansen

Heads-up: Audit orphaned benchmark data failure is structural, not a real orphan

The audit-orphans.sh workflow checks out ericchansen/q2mm at master and greps for references to each benchmarks/<system>/<subdir>/. The new from-qfuerza/ directories in this PR are referenced from q2mm — in docs/benchmarks/qfuerza-recovery.md and scripts/regenerate_convergence_results.py — but those references are on the feat/qfuerza-from-scratch branch, not on master yet.

So this red CI is expected for any paired q2mm + q2mm-data PR until the q2mm PR merges. Two options:

  1. Merge order: merge q2mm#290 to master first, then re-run this audit on the same 1f11ee7 SHA — it will pass.
  2. Workflow improvement (future): update .github/workflows/audit-orphans.yml to also check out the head branch of any linked q2mm PR; out of scope for this PR.

No action needed from me unless you'd like option 2 split out as a separate PR against q2mm-data master.
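
To make the failure mode concrete, here is a rough Python sketch of the kind of check described above. The real check is the shell script audit-orphans.sh, whose exact matching rules are not shown in this thread, so the function names, the file suffixes searched, and the literal-substring matching are all assumptions.

```python
import tempfile
from pathlib import Path


def benchmark_subdirs(data_root):
    """List every benchmarks/<system>/<subdir>/ found in the data repo."""
    bench = Path(data_root) / "benchmarks"
    return sorted(
        f"benchmarks/{system.name}/{sub.name}/"
        for system in bench.iterdir() if system.is_dir()
        for sub in system.iterdir() if sub.is_dir()
    )


def find_orphans(data_root, code_root, suffixes=(".md", ".py")):
    """Return subdirs that no file in the code checkout mentions literally."""
    texts = [
        p.read_text(encoding="utf-8", errors="ignore")
        for p in Path(code_root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    ]
    return [ref for ref in benchmark_subdirs(data_root)
            if not any(ref in text for text in texts)]


def demo():
    """Data repo has a new subdir that the code checkout does not mention yet."""
    with tempfile.TemporaryDirectory() as tmp:
        data, code = Path(tmp) / "q2mm-data", Path(tmp) / "q2mm"
        (data / "benchmarks" / "rh-enamide" / "convergence").mkdir(parents=True)
        (data / "benchmarks" / "rh-enamide" / "from-published").mkdir(parents=True)
        (code / "docs").mkdir(parents=True)
        (code / "docs" / "notes.md").write_text(
            "See benchmarks/rh-enamide/convergence/ for results.\n", encoding="utf-8"
        )
        return find_orphans(data, code)
```

In `demo()`, the checkout standing in for q2mm master only mentions convergence/, so the new subdir is reported as an orphan until the referencing docs land on the branch being checked.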

…efault

Aligns with q2mm PR #290, which flipped `load_system(starting_point=...)`
default from `"published"` to `"qfuerza"`. The canonical artifact path
must follow the canonical default.

Layout change (per TS system in benchmarks/<system>/):

- `convergence/`     <- previously `from-qfuerza/` (canonical: QFUERZA-start)
- `from-published/`  <- previously `convergence/`  (opt-out baseline)

ch3f/ is unchanged (not a TS system; only has a single convergence/).

Docs:
- README.md: directory tree updated to show both subdirs per system and
  document canonical (QFUERZA) vs opt-out (published) framing.
- benchmarks/CONVERGENCE_README.md: lead with the QFUERZA-default
  framing, link to the q2mm qfuerza-recovery doc, document both
  subdir semantics, add per_param_comparison.md to the file table,
  update regeneration examples.

No code changes — audit-orphans.sh walks subdirs dynamically, so no
hardcoded path updates needed.

BREAKING CHANGE: any external script or notebook that hardcodes
`benchmarks/<system>/from-qfuerza/` must switch to
`benchmarks/<system>/convergence/`. Scripts that hardcoded the old
`benchmarks/<system>/convergence/` (published-start) must switch to
`benchmarks/<system>/from-published/`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ericchansen ericchansen changed the title feat(benchmarks): add QFUERZA-recovery results for all 5 TS systems feat(benchmarks)!: flip canonical subdirs to QFUERZA-start default Jun 1, 2026