Skip to content

feat(systems)!: default starting_point to qfuerza#290

Open
ericchansen wants to merge 7 commits into
masterfrom
feat/qfuerza-from-scratch
Open

feat(systems)!: default starting_point to qfuerza#290
ericchansen wants to merge 7 commits into
masterfrom
feat/qfuerza-from-scratch

Conversation

@ericchansen
Copy link
Copy Markdown
Owner

@ericchansen ericchansen commented May 28, 2026

TL;DR

Make QFUERZA the canonical default starting point for TS-system optimizations. starting_point="qfuerza" is now the default on load_system(); --starting-point qfuerza is the default on regenerate_convergence_results.py. Output subdir naming follows: convergence/ for the canonical default, from-published/ for the opt-in publication-baseline path.

A .fld skeleton (atom types, OPT-row topology, frozen/active partition, vdW, stretch-bend) must still be provided — those are chemistry decisions a tool cannot automate. QFUERZA fills in 100% of the bond/angle scalars it is defined to estimate, given that skeleton. For our 5 TS systems we use the literature .fld files (Donoghue 2008, Rosales 2020, Wahlers 2022) as the skeleton.

What this PR changes

API / CLI

  • q2mm.diagnostics.systems.load_system(starting_point=…): default flipped from "published" to "qfuerza". Accepts "published" to retain the literature OPT values verbatim (publication-baseline reproduction).
  • scripts/regenerate_convergence_results.py --starting-point: default flipped to "qfuerza". Output subdir is convergence/ for the canonical default, from-published/ for --starting-point published.
  • q2mm-benchmark user CLI: new --starting-point flag for symmetry (default "qfuerza").

Guardrails

  • ForceField.get_fractional_bounds(): builds sign-aware [v − f·|v|, v + f·|v|] bounds intersected with DEFAULT_BOUNDS. Documented degenerate-bounds guard.
  • scripts/regenerate_convergence_results.py: --fc-fraction / --eq-fraction / --ftol / --ratio-tol flags. Batch-level ERROR + non-zero exit when no system shows real progress.
  • q2mm/optimizers/scipy_opt.py::_run_minimize: WARNING when n_iterations<=2 and real-OF delta < 1%.

Documentation

  • docs/benchmarks/qfuerza-recovery.md (454 lines) — QFUERZA-recovery validation. New framing: "QFUERZA workflow (canonical default)" with explicit skeleton-vs-scalars decomposition; per-system R²/RMSD vs paper, q2mm-from-published, q2mm-from-QFUERZA; per-parameter abs deviation tables; physical-chemistry walkthrough of top deviations; "Known limitations" section listing the 3/5 systems that diverge from QFUERZA default and the workarounds.
  • AGENTS.md: new "Starting Point" subsection under §6; §11 pre-flight checklist updated to flip default framing.
  • .copilot/skills/q2mm-benchmark/SKILL.md + q2mm-analysis-design/SKILL.md: process skills that walk through the canonical workflow before launching any batch >30 min or writing any comparison doc.

Tests

  • test/test_systems.py::TestStartingPoint (11 tests, all passing): asserts default is "qfuerza"; pinning "published" yields zero QFUERZA overwrite; QFUERZA touches OPT scalars only (frozen MM3 backbone bit-identical); reference data is independent of starting_point; audit classification is consistent with _PARAM_SLOTS.
  • test/test_models.py::TestGetFractionalBounds: verifies sign-aware fractional-bound construction including the degenerate-bounds guard.
  • test/integration/test_heck_validation.py: pinned to starting_point="published" (the test asserts bit-identical literature OPT values).

Results

Per-system QFUERZA-from-canonical-default convergence (full discussion in docs/benchmarks/qfuerza-recovery.md):

System Pub. final OF QFUERZA final OF Ratio Verdict
rh-enamide 2.70 × 10⁵ 2.94 × 10⁵ 1.09× ✅ same basin
pd-allyl 7.99 × 10⁶ 6.14 × 10⁶ 0.77× 🌟 lower (engine-vs-MacroModel artifact, not chemical win)
pd-conjugate 7.24 × 10⁶ 6.22 × 10⁶ 0.86× 🌟 lower (engine artifact)
rh-conjugate 5.10 × 10⁶ 7.67 × 10⁶ 1.50× ⚠ nearby basin; negative fc (unphysical, see §3.4)
heck-relay 1.45 × 10⁶ 1.32 × 10⁸ 91× ❌ JaxLoss diverged; L-BFGS-B exits in 0 iters

Known limitations are documented honestly in the new "Known limitations" section of qfuerza-recovery.md with concrete workarounds (--fc-fraction 0.05 or --starting-point published). Follow-up work tracked: positive-fc sign constraint in the optimizer, q2mm/MacroModel MM3 backend parity audit, heck-relay JaxLoss pre-conditioning.

Breaking change

starting_point now defaults to "qfuerza" on load_system() and --starting-point qfuerza on regenerate_convergence_results.py. Callers that depended on the publication-baseline path must pass starting_point="published" (loader) or --starting-point published (CLI). The output subdirectory naming also flips: canonical default now writes to convergence/; opt-in publication-baseline writes to from-published/.

Related

ericchansen and others added 2 commits May 28, 2026 12:40
Adds a starting_point parameter to load_system that, when set to
"qfuerza", overwrites the published OPT bond/angle scalars with
QFUERZA (Farrugia 2025) Hessian-derived values while leaving the
published OPT topology, frozen MM3 backbone, vdW, stretch-bend,
Urey-Bradley, and torsion rows untouched. Torsions are zeroed by
qfuerza_into per Farrugia 2025; for systems where published torsions
are already zero (e.g. rh-enamide Donoghue 2008), this is a no-op.

The default "published" path is unchanged — existing baselines and
behavior are preserved.

A per-param-type audit is attached to SystemData.metadata under
starting_point_audit, classifying every scalar as qfuerza_overwritten,
retained_published, or frozen. This makes leakage (vdW, unmatched
bond/angle rows, etc.) honest and visible rather than hidden.

The qfuerza_fresh strategy (CH3F) treats "qfuerza" as a no-op since
its starting FF is already QFUERZA-derived.

scripts/regenerate_convergence_results.py grows a --starting-point
{published,qfuerza} flag; output subdirectory becomes from-qfuerza/
instead of convergence/ when qfuerza is selected, preserving baseline
artifacts.

Anchors: Farrugia 2025 (DOI 10.1021/acs.jctc.5c01751), AGENTS.md §6.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Documents results of starting q2mm optimization from QFUERZA Hessian-derived
bond/angle values (instead of published OPT values) on all 5 TS systems.

Headline result: mixed. rh-enamide and pd-allyl converge to the same basin
as published-start runs. pd-conjugate, rh-conjugate, and heck-relay land in
different (worse) basins, with heck-relay failing entirely due to JaxLoss
surrogate divergence at the poor starting FF.

The page is explicit that this is NOT a from-scratch FF generation:
QFUERZA only overwrites bond/angle scalars on top of the published OPT
topology, frozen/active partition, vdW, SB, and atom-type rows. Per-row
audit numbers are reported per system.

Data lives in ericchansen/q2mm-data/benchmarks/<system>/from-qfuerza/.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new starting_point="qfuerza" option to the benchmark system loader so optimizations can start from QFUERZA Hessian-derived bond/angle values (while preserving published topology/frozen partition), and wires this through the convergence regeneration script, tests, and documentation.

Changes:

  • Extend load_system() with a starting_point parameter plus a per-parameter “starting point audit” recorded in system metadata.
  • Add --starting-point {published,qfuerza} to scripts/regenerate_convergence_results.py, including provenance/output subdirectory routing.
  • Add regression tests and a new benchmark documentation page for QFUERZA-recovery runs.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
q2mm/diagnostics/systems.py Adds starting_point support, QFUERZA overwrite hook, and audit metadata for scalar provenance.
scripts/regenerate_convergence_results.py Adds CLI flag, provenance fields, logging, and output subdir selection for starting point runs.
test/test_systems.py Adds TestStartingPoint coverage for backward-compatibility, overwrite behavior, and audit consistency.
docs/benchmarks/qfuerza-recovery.md Documents methodology and results for QFUERZA-recovery validation across TS systems.
properdocs.yml Adds navigation entry for the new benchmark page.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/test_systems.py Outdated
Comment thread q2mm/diagnostics/systems.py
Comment thread q2mm/diagnostics/systems.py Outdated
Comment thread scripts/regenerate_convergence_results.py Outdated
ericchansen and others added 3 commits May 31, 2026 10:23
PR #290 first iteration shipped 5 systems with a silent-failure
optimizer protocol: L-BFGS-B exited at n_iterations<=2 with
negligible OF change for every system, scipy reported success=True,
and the misleading results made it into the comparison doc. This
commit adds the source-code, CLI, agent-skill and documentation
guardrails that prevent that pattern from recurring.

Source code
- q2mm/models/forcefield.py: add ForceField.get_fractional_bounds
  (fc_fraction, eq_fraction) returning a (val +/- frac*|val|) box
  per parameter, intersected with DEFAULT_BOUNDS sanity envelope.
  Sign-aware for TSFF negative force constants; falls back to sanity
  bounds when |val| < 1e-6 (frozen-at-zero torsions).
- q2mm/optimizers/scipy_opt.py: ScipyOptimizer accepts fc_fraction
  and eq_fraction; uses them for the bounds list when set. Adds a
  WARNING in _run_minimize when n_iterations<=2 and |delta|/init<1%
  — the silent-failure fingerprint.
- scripts/regenerate_convergence_results.py: add --ftol,
  --fc-fraction, --eq-fraction CLI flags (defaults preserve
  backward-compatible behavior for the existing convergence/
  baselines). Emit batch-level ERROR + non-zero exit when every
  optimized system fails the no-progress check.

Tests
- test/test_models.py: 5 new tests covering fractional bounds for
  positive/negative FCs, sanity-envelope intersection, zero-value
  fallback, and the None-passthrough.

Agent skills
- .copilot/skills/q2mm-benchmark/SKILL.md: forces the agent through
  the audit gate (run FIRST system, inspect n_iterations &
  improvement_pct, STOP if either fails) before launching any q2mm
  batch >30 min.
- .copilot/skills/q2mm-analysis-design/SKILL.md: forces the agent
  to restate the user's question and mock comparison tables BEFORE
  writing a benchmark analysis doc.

Documentation
- AGENTS.md: new section 11 "Benchmark Pre-Flight Checklist".
  Three new Common Pitfalls rows (n_iterations<=2 silent exit,
  default sanity bounds for from-poor-start runs, batch never
  optimized).

Recommended invocation for from-QFUERZA runs:
  python scripts/regenerate_convergence_results.py \
    --starting-point qfuerza --ratio-tol none \
    --ftol 1e-12 --fc-fraction 0.20 --eq-fraction 0.05

Heck-relay specifically: tighter bounds (--fc-fraction 0.05 per
AGENTS.md memory) due to the fragile TS landscape with large
negative force constants.

Refs: #290 (iterating)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the headline-objective-only results table with the analysis the
user actually asked for: published-paper vs q2mm@published vs q2mm@QFUERZA
R²/RMSD per system, plus per-parameter motif-grouped comparison tables
and a physical-chemistry walkthrough.

Re-ran all 5 systems with the Phase 0 protocol fix (ftol=1e-12,
fractional bounds: fc_fraction=0.20, eq_fraction=0.05, heck-relay
overridden to fc_fraction=0.05). Real OF improvement now achieved on
4/5 systems (was 1/5 with the previous loose-factr protocol).

Key new findings documented:

- pd-allyl/pd-conjugate: q2mm objective at QFUERZA-optimized is LOWER
  than at published-optimized. R²/RMSD tables explain why: q2mm's
  MM3 implementation lacks MacroModel/MM3* cross-terms, so the FF that
  fits q2mm best is NOT the one that fit MacroModel best. Backend
  parity is the limiting factor for parameter-recovery validation.
- rh-conjugate & heck-relay: optimizer produces NEGATIVE force
  constants. Optimizer bounds enforce magnitude but not sign — a
  ±20% bound around a near-zero positive fc can cross sign boundary.
  Documented as a known issue requiring a positive-fc constraint.
- heck-relay: JaxLoss surrogate explodes at QFUERZA start due to
  bond R² = −6228; L-BFGS-B exits in 0 iterations. Documented as
  the same JaxLoss-fragility pattern from AGENTS.md §9.

New analysis scripts:
- scripts/compare_opt_rows.py — loader-bypassing per-param diff that
  auto-roundtrips published FF through ForceField.to_mm3_fld() so atom
  tokens and row ordering match the optimizer-saved file
- scripts/build_qfuerza_recovery_tables.py — cross-system R²/RMSD
  rollup builder for the recovery doc
- scripts/compare_qfuerza_to_published.py — earlier loader-based diff,
  superseded by compare_opt_rows.py but kept for the FF-object motif
  breakdown path

Drive-by: fix Wahlers 2021 DOI in validation/published_ffs/README.md
(was pointing to 10.1021/acs.joc.0c02918; correct is .1c00136).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Five fixes from copilot-pull-request-reviewer on the QFUERZA-recovery
PR plus the test-core (3.10) CI failure:

1. test/test_systems.py — mark
   `test_qfuerza_is_noop_for_qfuerza_fresh_strategy` with
   `@pytest.mark.jax` so the core suite (which excludes JAX) skips it
   rather than failing the unconditional `from JaxEngine` import. Fixes
   the test-core CI failure introduced when JaxEngine was added to the
   test in PR #290.

2. test/test_systems.py — use `np.testing.assert_array_equal` for the
   frozen-scalar bit-identical check; the previous `np.allclose(...,
   atol=0.0)` still applied the default `rtol=1e-05`, so the test would
   have admitted a relative drift of up to 0.001%. The intent is true
   bit-identity for backbone params; the new assertion enforces it.

3. q2mm/diagnostics/systems.py — `load_system` now raises
   `ValueError` when `starting_point` is anything other than
   `"published"` or `"qfuerza"`. The `StartingPoint = Literal[...]`
   annotation is a static hint only; without a runtime guard a typo
   like `"qferza"` silently went through the `published` branch and
   produced misleading results.

4. q2mm/diagnostics/systems.py — refactor `_audit_starting_point` to
   derive per-scalar type labels from `ForceField._PARAM_SLOTS` (via a
   new `_build_param_type_labels` helper) instead of hand-rolling the
   collection/attr enumeration. Any future change to the parameter
   vector layout is now reflected automatically; unknown attrs raise
   `KeyError` rather than silently producing wrong labels. The
   Urey-Bradley tail still has to be appended explicitly because
   `_ub_angles` is a property, not in `_PARAM_SLOTS`.

5. scripts/regenerate_convergence_results.py — reflow the docstring so
   the `` `--starting-point published` `` and `` `--starting-point
   qfuerza` `` inline-code spans no longer break across newlines.

Plus a new `test_unknown_starting_point_raises` test for #3.

Validation:
- `ruff check q2mm/ test/ scripts/` — clean
- `ruff format --check q2mm test scripts examples` — clean
- `pytest test/test_systems.py -x -q -m "not (openmm or tinker or jax or jax_md or psi4)"`
  — 10 passed, 1 deselected (the jax-marked test)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 31, 2026 19:35
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Comment thread q2mm/models/forcefield.py Outdated
Comment thread .copilot/skills/q2mm-benchmark/SKILL.md Outdated
Comment thread .copilot/skills/q2mm-benchmark/SKILL.md
Comment thread .copilot/skills/q2mm-benchmark/SKILL.md Outdated
Comment thread docs/benchmarks/qfuerza-recovery.md Outdated
ericchansen and others added 2 commits May 31, 2026 14:43
Five additional findings from copilot-pull-request-reviewer on the
guardrails commit e81cf00.

1. q2mm/models/forcefield.py — guard `get_fractional_bounds()` against
   degenerate `lo >= hi` returns when the current param value lies
   outside the `DEFAULT_BOUNDS` sanity envelope. The naive
   `max(sanity_lo, val − f·|val|), min(sanity_hi, val + f·|val|)`
   formula can flip `lo > hi` for vdw_epsilon/ub_k/bond_k values
   outside the envelope (SciPy would then reject the bounds). Fall
   back to sanity bounds when the intersection is empty so L-BFGS-B
   can pull the value back into a physical region.

   New test `test_fractional_bounds_value_outside_sanity_falls_back`
   covers the bond_k = 5000 case (sanity envelope ±3600).

2-4. .copilot/skills/q2mm-benchmark/SKILL.md — the skill referenced
   CLI flags `--bounds-fraction-fc/--bounds-fraction-eq/--factr`, but
   the script added in this PR exposes `--fc-fraction/--eq-fraction/--ftol`.
   Following the skill would have failed with "unrecognized arguments"
   for any future agent. Updated all four occurrences (Step 2 prose,
   pre-flight checklist, Step 4 example command, quick-reference Q&A)
   to match the real CLI.

5. docs/benchmarks/qfuerza-recovery.md — the fractional-bounds formula
   was shown as `fc ∈ [fc₀·(1−f), fc₀·(1+f)]`, which is only correct
   for positive `fc₀`. The implementation is sign-aware (`val ± f·|val|`),
   which matters for TSFF negative force constants. Updated the doc to
   describe the actual sign-aware bound.

Validation:
- `ruff check q2mm/ test/ scripts/` — clean
- `ruff format --check q2mm test scripts examples` — clean
- `pytest test/test_models.py -k fractional_bounds` — 6/6 passed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Flip the canonical default for `load_system()` and
`regenerate_convergence_results.py --starting-point` from `"published"`
to `"qfuerza"`, and rename the canonical output subdirectory
accordingly (`convergence/` now holds QFUERZA-start runs;
`from-published/` holds the opt-in publication-baseline runs).

Reframe the QFUERZA-recovery doc, AGENTS.md §6 / §11, and the
q2mm-benchmark / q2mm-analysis-design skills around the
QFUERZA-as-defined story: QFUERZA fills in 100% of the bond/angle
scalars it is defined to estimate, given a force-field skeleton
(atom types, OPT topology, frozen/active partition, vdW/SB defaults)
that the chemist provides via a `.fld` file. This is true for every
QFUERZA workflow on every system — there is no `.fld`-free path
because those decisions are chemistry calls a tool can't make.

Document known limitations honestly: 3 of the 5 TS systems
(rh-conjugate, pd-conjugate, heck-relay) don't converge cleanly from
QFUERZA with default bounds yet; the doc lists the per-system
workarounds (`--fc-fraction 0.05` or `--starting-point published`).

Also:
- Add a `--starting-point` flag to `q2mm-benchmark` CLI so the user
  CLI exposes the same lever as the regeneration script.
- Pin `scripts/compare_opt_rows.py` to `starting_point="published"`
  (the script's whole purpose is to diff the publication baseline
  against an optimizer's output, so it must load literature values).
- Pin `test/integration/test_heck_validation.py` to
  `starting_point="published"` (the test asserts bit-identical OPT
  values against the literature `.fld`).
- Replace `test_default_starting_point_is_published` with
  `test_default_starting_point_is_qfuerza`; add a separate test that
  pins `starting_point="published"` and verifies zero QFUERZA
  overwrite.

BREAKING CHANGE: `starting_point` now defaults to `"qfuerza"` on
`load_system()` and `--starting-point qfuerza` on
`regenerate_convergence_results.py`. Callers that depended on the
publication-baseline path must pass `starting_point="published"`
(loader) or `--starting-point published` (CLI). The output
subdirectory naming also flips: canonical default now writes to
`convergence/`; opt-in publication-baseline writes to
`from-published/`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 1, 2026 00:09
@ericchansen ericchansen changed the title feat(systems): add starting_point=qfuerza + QFUERZA-recovery validation feat(systems)!: default starting_point to qfuerza Jun 1, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Comment on lines +252 to +263
import sys as _sys
import tempfile as _tempfile

_sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from q2mm.diagnostics.systems import load_system

# ``compare_opt_rows.py`` exists to diff the published baseline FF
# against an optimizer's output, so we must load the literature OPT
# values verbatim — not the QFUERZA-derived default.
sd = load_system(args.system, starting_point="published")
published_path = Path(_tempfile.mkstemp(prefix=f"pub-{args.system}-", suffix=".fld")[1])
sd.forcefield.to_mm3_fld(str(published_path))
Comment on lines +106 to +108
def _category(row: Row) -> str:
return f"{row.kind}_eq" if row.kind == "bond" else f"{row.kind}_eq"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants