feat(systems)!: default starting_point to qfuerza by ericchansen · Pull Request #290 · ericchansen/q2mm

ericchansen · 2026-05-28T20:36:18Z

TL;DR

Make QFUERZA the canonical default starting point for TS-system optimizations. starting_point="qfuerza" is now the default on load_system(); --starting-point qfuerza is the default on regenerate_convergence_results.py. Output subdir naming follows: convergence/ for the canonical default, from-published/ for the opt-in publication-baseline path.

A .fld skeleton (atom types, OPT-row topology, frozen/active partition, vdW, stretch-bend) must still be provided — those are chemistry decisions a tool cannot automate. QFUERZA fills in 100% of the bond/angle scalars it is defined to estimate, given that skeleton. For our 5 TS systems we use the literature .fld files (Donoghue 2008, Rosales 2020, Wahlers 2022) as the skeleton.

What this PR changes

API / CLI

q2mm.diagnostics.systems.load_system(starting_point=…): default flipped from "published" to "qfuerza". Accepts "published" to retain the literature OPT values verbatim (publication-baseline reproduction).
scripts/regenerate_convergence_results.py --starting-point: default flipped to "qfuerza". Output subdir is convergence/ for the canonical default, from-published/ for --starting-point published.
q2mm-benchmark user CLI: new --starting-point flag, mirroring the regeneration script (default "qfuerza").
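A minimal sketch of how the starting-point switch and the output-subdirectory routing fit together. `StartingPoint`, `resolve_output_subdir`, and `_OUTPUT_SUBDIR` are hypothetical names for illustration, not the q2mm implementation. The runtime check mirrors the `ValueError` guard added later in this PR, since a `Literal` annotation alone is not enforced at runtime.

```python
from typing import Literal, get_args

# Hypothetical alias mirroring the StartingPoint Literal mentioned in this PR.
StartingPoint = Literal["published", "qfuerza"]

# Canonical default writes to convergence/; the opt-in publication
# baseline writes to from-published/.
_OUTPUT_SUBDIR = {
    "qfuerza": "convergence",
    "published": "from-published",
}


def resolve_output_subdir(starting_point: str = "qfuerza") -> str:
    """Validate ``starting_point`` at runtime and return its output subdir."""
    allowed = get_args(StartingPoint)
    # A Literal annotation is only a static hint, so check explicitly;
    # otherwise a typo would silently fall through to some default branch.
    if starting_point not in allowed:
        raise ValueError(
            f"starting_point must be one of {allowed}, got {starting_point!r}"
        )
    return _OUTPUT_SUBDIR[starting_point]
```

Calling `resolve_output_subdir()` with no argument returns `"convergence"`, `resolve_output_subdir("published")` returns `"from-published"`, and a typo such as `"qferza"` raises instead of producing a run that looks valid.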

Guardrails

ForceField.get_fractional_bounds(): builds sign-aware [v − f·|v|, v + f·|v|] bounds intersected with DEFAULT_BOUNDS. Falls back to DEFAULT_BOUNDS when |v| is effectively zero or when the intersection is empty (the degenerate-bounds guard).
scripts/regenerate_convergence_results.py: --fc-fraction / --eq-fraction / --ftol / --ratio-tol flags. Batch-level ERROR + non-zero exit when no system shows real progress.
q2mm/optimizers/scipy_opt.py::_run_minimize: WARNING when n_iterations<=2 and real-OF delta < 1%.
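The bound construction described above can be sketched as a standalone function. This is an illustration under assumptions, not `ForceField.get_fractional_bounds()` itself: the name `fractional_bounds`, the explicit `sanity` tuple, and the `1e-6` zero tolerance (taken from the guardrails commit message) are stand-ins, and the real per-type `DEFAULT_BOUNDS` values are not reproduced here.

```python
def fractional_bounds(
    val: float,
    frac: float,
    sanity: tuple[float, float],
    zero_tol: float = 1e-6,
) -> tuple[float, float]:
    """Sign-aware [val - frac*|val|, val + frac*|val|] box clipped to a sanity envelope."""
    sanity_lo, sanity_hi = sanity
    # Frozen-at-zero parameters have no meaningful relative box.
    if abs(val) < zero_tol:
        return sanity_lo, sanity_hi
    # Using |val| keeps lo <= hi even for negative (TSFF) force constants.
    lo = max(sanity_lo, val - frac * abs(val))
    hi = min(sanity_hi, val + frac * abs(val))
    # Degenerate-bounds guard: if val lies outside the envelope the
    # intersection is empty (lo >= hi), which SciPy would reject; fall back
    # to the envelope so the optimizer can pull the value back into range.
    if lo >= hi:
        return sanity_lo, sanity_hi
    return lo, hi
```

With a bond_k of 5000 and an envelope of ±3600 (the case cited in the review follow-up), the naive intersection would be [4000, 3600]; the guard returns the full envelope instead.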

Documentation

docs/benchmarks/qfuerza-recovery.md (454 lines) — QFUERZA-recovery validation. New framing: "QFUERZA workflow (canonical default)" with explicit skeleton-vs-scalars decomposition; per-system R²/RMSD vs paper, q2mm-from-published, q2mm-from-QFUERZA; per-parameter abs deviation tables; physical-chemistry walkthrough of top deviations; "Known limitations" section listing the 3/5 systems that diverge from QFUERZA default and the workarounds.
AGENTS.md: new "Starting Point" subsection under §6; §11 pre-flight checklist updated for the new default.
.copilot/skills/q2mm-benchmark/SKILL.md + q2mm-analysis-design/SKILL.md: process skills that walk through the canonical workflow before launching any batch >30 min or writing any comparison doc.

Tests

test/test_systems.py::TestStartingPoint (11 tests, all passing): asserts default is "qfuerza"; pinning "published" yields zero QFUERZA overwrite; QFUERZA touches OPT scalars only (frozen MM3 backbone bit-identical); reference data is independent of starting_point; audit classification is consistent with _PARAM_SLOTS.
test/test_models.py::TestGetFractionalBounds: verifies sign-aware fractional-bound construction including the degenerate-bounds guard.
test/integration/test_heck_validation.py: pinned to starting_point="published" (the test asserts bit-identical literature OPT values).

Results

Per-system convergence from the canonical QFUERZA default versus the published start (full discussion in docs/benchmarks/qfuerza-recovery.md):

System	Pub. final OF	QFUERZA final OF	Ratio (QFUERZA / Pub.)	Verdict
rh-enamide	2.70 × 10⁵	2.94 × 10⁵	1.09×	✅ same basin
pd-allyl	7.99 × 10⁶	6.14 × 10⁶	0.77×	🌟 lower (engine-vs-MacroModel artifact, not chemical win)
pd-conjugate	7.24 × 10⁶	6.22 × 10⁶	0.86×	🌟 lower (engine artifact)
rh-conjugate	5.10 × 10⁶	7.67 × 10⁶	1.50×	⚠ nearby basin; negative `fc` (unphysical, see §3.4)
heck-relay	1.45 × 10⁶	1.32 × 10⁸	91×	❌ JaxLoss diverged; L-BFGS-B exits in 0 iters
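The Ratio column is the QFUERZA-start final OF divided by the published-start final OF, so values below 1 mean the QFUERZA start reached a lower objective. A short check that recomputes it from the rounded values in the table:

```python
# Final objective-function values (published start, QFUERZA start),
# copied from the table above.
FINAL_OF = {
    "rh-enamide": (2.70e5, 2.94e5),
    "pd-allyl": (7.99e6, 6.14e6),
    "pd-conjugate": (7.24e6, 6.22e6),
    "rh-conjugate": (5.10e6, 7.67e6),
    "heck-relay": (1.45e6, 1.32e8),
}


def of_ratio(published_of: float, qfuerza_of: float) -> float:
    """QFUERZA-start final OF divided by published-start final OF."""
    return qfuerza_of / published_of


for system, (pub, qf) in FINAL_OF.items():
    # Below 1.0: QFUERZA start ended lower; above 1.0: it ended higher.
    print(f"{system:13s} {of_ratio(pub, qf):7.2f}x")
```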

Known limitations are documented honestly in the new "Known limitations" section of qfuerza-recovery.md with concrete workarounds (--fc-fraction 0.05 or --starting-point published). Follow-up work tracked: positive-fc sign constraint in the optimizer, q2mm/MacroModel MM3 backend parity audit, heck-relay JaxLoss pre-conditioning.

Breaking change

starting_point now defaults to "qfuerza" on load_system() and --starting-point qfuerza on regenerate_convergence_results.py. Callers that depended on the publication-baseline path must pass starting_point="published" (loader) or --starting-point published (CLI). The output subdirectory naming also flips: canonical default now writes to convergence/; opt-in publication-baseline writes to from-published/.

Related

Companion data PR: feat(benchmarks)!: flip canonical subdirs to QFUERZA-start default (q2mm-data#11). It renames from-qfuerza/ → convergence/ and convergence/ → from-published/, and updates the audit-orphans workflow paths and CONVERGENCE_README.
Methodology of record: Farrugia, Helquist, Norrby & Wiest, J. Chem. Theory Comput. 2025, 22, 469. 10.1021/acs.jctc.5c01751

Adds a starting_point parameter to load_system that, when set to "qfuerza", overwrites the published OPT bond/angle scalars with QFUERZA (Farrugia 2025) Hessian-derived values while leaving the published OPT topology, frozen MM3 backbone, vdW, stretch-bend, Urey-Bradley, and torsion rows untouched. Torsions are zeroed by qfuerza_into per Farrugia 2025; for systems where published torsions are already zero (e.g. rh-enamide Donoghue 2008), this is a no-op. The default "published" path is unchanged; existing baselines and behavior are preserved.

A per-param-type audit is attached to SystemData.metadata under starting_point_audit, classifying every scalar as qfuerza_overwritten, retained_published, or frozen. This makes leakage (vdW, unmatched bond/angle rows, etc.) honest and visible rather than hidden.

The qfuerza_fresh strategy (CH3F) treats "qfuerza" as a no-op since its starting FF is already QFUERZA-derived.

scripts/regenerate_convergence_results.py grows a --starting-point {published,qfuerza} flag; the output subdirectory becomes from-qfuerza/ instead of convergence/ when qfuerza is selected, preserving baseline artifacts.

Anchors: Farrugia 2025 (DOI 10.1021/acs.jctc.5c01751), AGENTS.md §6.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
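A minimal sketch of the three-way audit classification described in this commit, assuming per-scalar inputs are already available. `audit_starting_point` and its arguments are hypothetical; the real audit derives its type labels from `ForceField._PARAM_SLOTS` and may differ in detail.

```python
from collections import Counter


def audit_starting_point(labels, frozen, qfuerza_matched):
    """Count scalars per (param type, audit class).

    Hypothetical inputs, one entry per scalar in the parameter vector:
      labels          - parameter type label, e.g. "bond_k", "vdw_epsilon"
      frozen          - True for scalars in the frozen MM3 backbone
      qfuerza_matched - True when QFUERZA produced a value for this scalar
    """
    if not (len(labels) == len(frozen) == len(qfuerza_matched)):
        raise ValueError("inputs must have one entry per scalar")
    audit = Counter()
    for label, is_frozen, matched in zip(labels, frozen, qfuerza_matched):
        if is_frozen:
            cls = "frozen"  # never touched by the starting-point switch
        elif matched:
            cls = "qfuerza_overwritten"
        else:
            # Active but not estimated by QFUERZA (vdW, SB, unmatched rows):
            # the leakage the audit is meant to make visible.
            cls = "retained_published"
        audit[(label, cls)] += 1
    return audit
```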

Documents results of starting q2mm optimization from QFUERZA Hessian-derived bond/angle values (instead of published OPT values) on all 5 TS systems. Headline result: mixed. rh-enamide and pd-allyl converge to the same basin as published-start runs. pd-conjugate, rh-conjugate, and heck-relay land in different (worse) basins, with heck-relay failing entirely due to JaxLoss surrogate divergence at the poor starting FF.

The page is explicit that this is NOT a from-scratch FF generation: QFUERZA only overwrites bond/angle scalars on top of the published OPT topology, frozen/active partition, vdW, SB, and atom-type rows. Per-row audit numbers are reported per system. Data lives in ericchansen/q2mm-data/benchmarks/<system>/from-qfuerza/.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds a new starting_point="qfuerza" option to the benchmark system loader so optimizations can start from QFUERZA Hessian-derived bond/angle values (while preserving published topology/frozen partition), and wires this through the convergence regeneration script, tests, and documentation.

Changes:

Extend load_system() with a starting_point parameter plus a per-parameter “starting point audit” recorded in system metadata.
Add --starting-point {published,qfuerza} to scripts/regenerate_convergence_results.py, including provenance/output subdirectory routing.
Add regression tests and a new benchmark documentation page for QFUERZA-recovery runs.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


File	Description
`q2mm/diagnostics/systems.py`	Adds `starting_point` support, QFUERZA overwrite hook, and audit metadata for scalar provenance.
`scripts/regenerate_convergence_results.py`	Adds CLI flag, provenance fields, logging, and output subdir selection for starting point runs.
`test/test_systems.py`	Adds `TestStartingPoint` coverage for backward-compatibility, overwrite behavior, and audit consistency.
`docs/benchmarks/qfuerza-recovery.md`	Documents methodology and results for QFUERZA-recovery validation across TS systems.
`properdocs.yml`	Adds navigation entry for the new benchmark page.


PR #290 first iteration shipped 5 systems with a silent-failure optimizer protocol: L-BFGS-B exited at n_iterations<=2 with negligible OF change for every system, scipy reported success=True, and the misleading results made it into the comparison doc. This commit adds the source-code, CLI, agent-skill and documentation guardrails that prevent that pattern from recurring.

Source code
- q2mm/models/forcefield.py: add ForceField.get_fractional_bounds(fc_fraction, eq_fraction) returning a (val +/- frac*|val|) box per parameter, intersected with the DEFAULT_BOUNDS sanity envelope. Sign-aware for TSFF negative force constants; falls back to sanity bounds when |val| < 1e-6 (frozen-at-zero torsions).
- q2mm/optimizers/scipy_opt.py: ScipyOptimizer accepts fc_fraction and eq_fraction and uses them for the bounds list when set. Adds a WARNING in _run_minimize when n_iterations<=2 and |delta|/init<1% (the silent-failure fingerprint).
- scripts/regenerate_convergence_results.py: add --ftol, --fc-fraction, --eq-fraction CLI flags (defaults preserve backward-compatible behavior for the existing convergence/ baselines). Emit a batch-level ERROR + non-zero exit when every optimized system fails the no-progress check.

Tests
- test/test_models.py: 5 new tests covering fractional bounds for positive/negative FCs, sanity-envelope intersection, zero-value fallback, and the None-passthrough.

Agent skills
- .copilot/skills/q2mm-benchmark/SKILL.md: forces the agent through the audit gate (run FIRST system, inspect n_iterations & improvement_pct, STOP if either fails) before launching any q2mm batch >30 min.
- .copilot/skills/q2mm-analysis-design/SKILL.md: forces the agent to restate the user's question and mock comparison tables BEFORE writing a benchmark analysis doc.

Documentation
- AGENTS.md: new section 11 "Benchmark Pre-Flight Checklist". Three new Common Pitfalls rows (n_iterations<=2 silent exit, default sanity bounds for from-poor-start runs, batch never optimized).

Recommended invocation for from-QFUERZA runs:

    python scripts/regenerate_convergence_results.py \
        --starting-point qfuerza --ratio-tol none \
        --ftol 1e-12 --fc-fraction 0.20 --eq-fraction 0.05

Heck-relay specifically: tighter bounds (--fc-fraction 0.05 per AGENTS.md memory) due to the fragile TS landscape with large negative force constants.

Refs: #290 (iterating)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
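The silent-failure fingerprint and the batch-level exit rule can be sketched as two small predicates. The thresholds (at most 2 iterations, under 1% relative OF change) come from this commit message; the function names and signatures are hypothetical, and the sketch assumes a positive initial objective, as for a weighted sum of squares.

```python
import logging

logger = logging.getLogger("silent_exit_sketch")


def looks_like_silent_exit(
    n_iterations: int,
    initial_of: float,
    final_of: float,
    max_iterations: int = 2,
    min_rel_change: float = 0.01,
) -> bool:
    """True when the optimizer barely iterated AND barely moved the objective."""
    if initial_of == 0:
        return False  # nothing left to improve; not treated as a failure here
    rel_change = abs(final_of - initial_of) / abs(initial_of)
    return n_iterations <= max_iterations and rel_change < min_rel_change


def batch_exit_code(results: list[tuple[int, float, float]]) -> int:
    """Return 1 (and log an ERROR) when every system shows the fingerprint."""
    flags = [looks_like_silent_exit(n, of0, of1) for n, of0, of1 in results]
    if flags and all(flags):
        logger.error("no system made real progress; check ftol and bounds")
        return 1
    if any(flags):
        logger.warning("%d system(s) exited without real progress", sum(flags))
    return 0
```

A run that exits in 0 iterations with an unchanged objective (the heck-relay pattern) trips the per-system check, while the batch only fails outright when no system moved.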

Replaces the headline-objective-only results table with the analysis the user actually asked for: published-paper vs q2mm@published vs q2mm@QFUERZA R²/RMSD per system, plus per-parameter motif-grouped comparison tables and a physical-chemistry walkthrough. Re-ran all 5 systems with the Phase 0 protocol fix (ftol=1e-12, fractional bounds: fc_fraction=0.20, eq_fraction=0.05, heck-relay overridden to fc_fraction=0.05). Real OF improvement now achieved on 4/5 systems (was 1/5 with the previous loose-factr protocol).

Key new findings documented:
- pd-allyl/pd-conjugate: q2mm objective at QFUERZA-optimized is LOWER than at published-optimized. R²/RMSD tables explain why: q2mm's MM3 implementation lacks MacroModel/MM3* cross-terms, so the FF that fits q2mm best is NOT the one that fit MacroModel best. Backend parity is the limiting factor for parameter-recovery validation.
- rh-conjugate & heck-relay: optimizer produces NEGATIVE force constants. Optimizer bounds enforce magnitude but not sign; a ±20% bound around a near-zero positive fc can cross the sign boundary. Documented as a known issue requiring a positive-fc constraint.
- heck-relay: JaxLoss surrogate explodes at QFUERZA start due to bond R² = −6228; L-BFGS-B exits in 0 iterations. Documented as the same JaxLoss-fragility pattern from AGENTS.md §9.

New analysis scripts:
- scripts/compare_opt_rows.py: loader-bypassing per-param diff that auto-roundtrips the published FF through ForceField.to_mm3_fld() so atom tokens and row ordering match the optimizer-saved file.
- scripts/build_qfuerza_recovery_tables.py: cross-system R²/RMSD rollup builder for the recovery doc.
- scripts/compare_qfuerza_to_published.py: earlier loader-based diff, superseded by compare_opt_rows.py but kept for the FF-object motif breakdown path.

Drive-by: fix Wahlers 2021 DOI in validation/published_ffs/README.md (was pointing to 10.1021/acs.joc.0c02918; correct is .1c00136).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Five fixes from copilot-pull-request-reviewer on the QFUERZA-recovery PR plus the test-core (3.10) CI failure:

1. test/test_systems.py: mark `test_qfuerza_is_noop_for_qfuerza_fresh_strategy` with `@pytest.mark.jax` so the core suite (which excludes JAX) skips it rather than failing the unconditional `from JaxEngine` import. Fixes the test-core CI failure introduced when JaxEngine was added to the test in PR #290.
2. test/test_systems.py: use `np.testing.assert_array_equal` for the frozen-scalar bit-identical check; the previous `np.allclose(..., atol=0.0)` still applied the default `rtol=1e-05`, so the test would have admitted a relative drift of up to 0.001%. The intent is true bit-identity for backbone params; the new assertion enforces it.
3. q2mm/diagnostics/systems.py: `load_system` now raises `ValueError` when `starting_point` is anything other than `"published"` or `"qfuerza"`. The `StartingPoint = Literal[...]` annotation is a static hint only; without a runtime guard a typo like `"qferza"` silently went through the `published` branch and produced misleading results.
4. q2mm/diagnostics/systems.py: refactor `_audit_starting_point` to derive per-scalar type labels from `ForceField._PARAM_SLOTS` (via a new `_build_param_type_labels` helper) instead of hand-rolling the collection/attr enumeration. Any future change to the parameter vector layout is now reflected automatically; unknown attrs raise `KeyError` rather than silently producing wrong labels. The Urey-Bradley tail still has to be appended explicitly because `_ub_angles` is a property, not in `_PARAM_SLOTS`.
5. scripts/regenerate_convergence_results.py: reflow the docstring so the `` `--starting-point published` `` and `` `--starting-point qfuerza` `` inline-code spans no longer break across newlines.

Plus a new `test_unknown_starting_point_raises` test for #3.

Validation:
- `ruff check q2mm/ test/ scripts/`: clean
- `ruff format --check q2mm test scripts examples`: clean
- `pytest test/test_systems.py -x -q -m "not (openmm or tinker or jax or jax_md or psi4)"`: 10 passed, 1 deselected (the jax-marked test)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
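The `rtol` pitfall in item 2 can be reproduced with plain NumPy. The arrays are made-up stand-ins for frozen backbone scalars, with a 0.0005% drift on one entry, which is below the 0.001% that the default `rtol=1e-05` admits.

```python
import numpy as np

backbone_before = np.array([100.0, 4.5, 1.25])
# Made-up 0.0005% relative drift on the first scalar: NOT bit-identical.
backbone_after = backbone_before * np.array([1.0 + 5e-6, 1.0, 1.0])

# atol=0.0 does not disable the default rtol=1e-05, so this still passes.
loose_check_passes = np.allclose(backbone_after, backbone_before, atol=0.0)

# assert_array_equal requires exact equality and catches the drift.
try:
    np.testing.assert_array_equal(backbone_after, backbone_before)
except AssertionError:
    drift_detected = True
else:
    drift_detected = False

print(loose_check_passes, drift_detected)
```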

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Five additional findings from copilot-pull-request-reviewer on the guardrails commit e81cf00.

1. q2mm/models/forcefield.py: guard `get_fractional_bounds()` against degenerate `lo >= hi` returns when the current param value lies outside the `DEFAULT_BOUNDS` sanity envelope. The naive `max(sanity_lo, val − f·|val|), min(sanity_hi, val + f·|val|)` formula can flip `lo > hi` for vdw_epsilon/ub_k/bond_k values outside the envelope (SciPy would then reject the bounds). Fall back to sanity bounds when the intersection is empty so L-BFGS-B can pull the value back into a physical region. New test `test_fractional_bounds_value_outside_sanity_falls_back` covers the bond_k = 5000 case (sanity envelope ±3600).
2-4. .copilot/skills/q2mm-benchmark/SKILL.md: the skill referenced CLI flags `--bounds-fraction-fc/--bounds-fraction-eq/--factr`, but the script added in this PR exposes `--fc-fraction/--eq-fraction/--ftol`. Following the skill would have failed with "unrecognized arguments" for any future agent. Updated all four occurrences (Step 2 prose, pre-flight checklist, Step 4 example command, quick-reference Q&A) to match the real CLI.
5. docs/benchmarks/qfuerza-recovery.md: the fractional-bounds formula was shown as `fc ∈ [fc₀·(1−f), fc₀·(1+f)]`, which is only correct for positive `fc₀`. The implementation is sign-aware (`val ± f·|val|`), which matters for TSFF negative force constants. Updated the doc to describe the actual sign-aware bound.

Validation:
- `ruff check q2mm/ test/ scripts/`: clean
- `ruff format --check q2mm test scripts examples`: clean
- `pytest test/test_models.py -k fractional_bounds`: 6/6 passed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
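A sketch of an `argparse` parser using the flag names the script exposes (`--fc-fraction`, `--eq-fraction`, `--ftol`, `--starting-point`) shows why the stale skill instructions failed: argparse rejects unknown options with an "unrecognized arguments" error and exits with status 2. The `None` defaults are assumptions for illustration (the PR only says the defaults preserve backward-compatible behavior), and `--ratio-tol` is omitted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the flag names this PR settles on."""
    p = argparse.ArgumentParser(prog="regenerate_convergence_results.py")
    p.add_argument(
        "--starting-point",
        choices=("published", "qfuerza"),
        default="qfuerza",  # canonical default after this PR
    )
    # Assumed default None: no fractional box, use the sanity envelope.
    p.add_argument("--fc-fraction", type=float, default=None)
    p.add_argument("--eq-fraction", type=float, default=None)
    p.add_argument("--ftol", type=float, default=None)
    return p
```

Parsing `["--fc-fraction", "0.05", "--ftol", "1e-12"]` succeeds, while the stale `--bounds-fraction-fc` spelling makes `parse_args` print the usage error and raise `SystemExit`.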

Flip the canonical default for `load_system()` and `regenerate_convergence_results.py --starting-point` from `"published"` to `"qfuerza"`, and rename the canonical output subdirectory accordingly (`convergence/` now holds QFUERZA-start runs; `from-published/` holds the opt-in publication-baseline runs).

Reframe the QFUERZA-recovery doc, AGENTS.md §6 / §11, and the q2mm-benchmark / q2mm-analysis-design skills around the QFUERZA-as-defined story: QFUERZA fills in 100% of the bond/angle scalars it is defined to estimate, given a force-field skeleton (atom types, OPT topology, frozen/active partition, vdW/SB defaults) that the chemist provides via a `.fld` file. This is true for every QFUERZA workflow on every system; there is no `.fld`-free path because those decisions are chemistry calls a tool can't make.

Document known limitations honestly: 3 of the 5 TS systems (rh-conjugate, pd-conjugate, heck-relay) don't converge cleanly from QFUERZA with default bounds yet; the doc lists the per-system workarounds (`--fc-fraction 0.05` or `--starting-point published`).

Also:
- Add a `--starting-point` flag to the `q2mm-benchmark` CLI so the user CLI exposes the same lever as the regeneration script.
- Pin `scripts/compare_opt_rows.py` to `starting_point="published"` (the script's whole purpose is to diff the publication baseline against an optimizer's output, so it must load literature values).
- Pin `test/integration/test_heck_validation.py` to `starting_point="published"` (the test asserts bit-identical OPT values against the literature `.fld`).
- Replace `test_default_starting_point_is_published` with `test_default_starting_point_is_qfuerza`; add a separate test that pins `starting_point="published"` and verifies zero QFUERZA overwrite.

BREAKING CHANGE: `starting_point` now defaults to `"qfuerza"` on `load_system()` and `--starting-point qfuerza` on `regenerate_convergence_results.py`. Callers that depended on the publication-baseline path must pass `starting_point="published"` (loader) or `--starting-point published` (CLI). The output subdirectory naming also flips: canonical default now writes to `convergence/`; opt-in publication-baseline writes to `from-published/`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

+        import sys as _sys
+        import tempfile as _tempfile
+
+        _sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+        from q2mm.diagnostics.systems import load_system
+
+        # ``compare_opt_rows.py`` exists to diff the published baseline FF
+        # against an optimizer's output, so we must load the literature OPT
+        # values verbatim — not the QFUERZA-derived default.
+        sd = load_system(args.system, starting_point="published")
+        published_path = Path(_tempfile.mkstemp(prefix=f"pub-{args.system}-", suffix=".fld")[1])
+        sd.forcefield.to_mm3_fld(str(published_path))


+def _category(row: Row) -> str:
+    return f"{row.kind}_eq" if row.kind == "bond" else f"{row.kind}_eq"
+


ericchansen and others added 2 commits May 28, 2026 12:40

Copilot AI review requested due to automatic review settings May 28, 2026 20:36

Copilot started reviewing on behalf of ericchansen May 28, 2026 20:36

ericchansen mentioned this pull request May 28, 2026

feat(benchmarks)!: flip canonical subdirs to QFUERZA-start default ericchansen/q2mm-data#11

Open

Copilot AI reviewed May 28, 2026


Comment thread test/test_systems.py Outdated

Comment thread q2mm/diagnostics/systems.py

Comment thread q2mm/diagnostics/systems.py Outdated

Comment thread scripts/regenerate_convergence_results.py Outdated

ericchansen and others added 3 commits May 31, 2026 10:23

Copilot AI review requested due to automatic review settings May 31, 2026 19:35

Copilot started reviewing on behalf of ericchansen May 31, 2026 19:35

Copilot AI reviewed May 31, 2026


Comment thread q2mm/models/forcefield.py Outdated

Comment thread .copilot/skills/q2mm-benchmark/SKILL.md Outdated

Comment thread .copilot/skills/q2mm-benchmark/SKILL.md

Comment thread .copilot/skills/q2mm-benchmark/SKILL.md Outdated

Comment thread docs/benchmarks/qfuerza-recovery.md Outdated

ericchansen and others added 2 commits May 31, 2026 14:43

Copilot AI review requested due to automatic review settings June 1, 2026 00:09

Copilot started reviewing on behalf of ericchansen June 1, 2026 00:09

ericchansen changed the title ~~feat(systems): add starting_point=qfuerza + QFUERZA-recovery validation~~ feat(systems)!: default starting_point to qfuerza Jun 1, 2026

Copilot AI reviewed Jun 1, 2026


ericchansen wants to merge 7 commits into master from feat/qfuerza-from-scratch


2 participants
