Fix ligand res_id offset to match AF3 convention by Ubiquinone-dot · Pull Request #261 · RosettaCommons/foundry

Ubiquinone-dot · 2026-04-05T04:20:25Z

Summary

Reset ligand res_id to start from 1 per chain using dense rank-based renumbering, matching AF3's output convention
Add validation error when a ligand is on chain A (the protein chain), with allow_ligand_on_chain_a override option
Includes changes from "Fix HBond conflict and stabilize tempdir usage" (#231): hbond_fix, covering the partial ligand fix and HBond cleanup

Before: RFD3 offset ligand res_id values from the protein's maximum res_id (e.g. res_id=51 for a 50-residue protein), so (chain_id, res_id, atom_name) pairing failed against AF3 predictions, which always number ligands from res_id=1.

After: Ligand res_id values are renumbered sequentially starting from 1 per chain (e.g. two ligands on chain B → res_id=[1, 2]).
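A minimal sketch of dense rank-based per-chain renumbering, assuming per-atom NumPy arrays that stand in for the ligand atoms' chain_id and res_id annotations. The helper name renumber_ligand_res_id is hypothetical and is not the function used in input_parsing.py.

```python
import numpy as np


def renumber_ligand_res_id(chain_id: np.ndarray, res_id: np.ndarray) -> np.ndarray:
    """Densely renumber ligand res_id from 1 within each chain (hypothetical helper).

    Both arrays hold one entry per ligand atom. Atoms that shared an original
    res_id keep sharing the new one, and gaps such as 850, 900 become 1, 2.
    """
    new_res_id = np.empty_like(res_id)
    for chain in np.unique(chain_id):
        mask = chain_id == chain
        # return_inverse maps each atom's res_id to its 0-based index among
        # the sorted distinct res_ids on this chain, i.e. its dense rank.
        _, dense_rank = np.unique(res_id[mask], return_inverse=True)
        new_res_id[mask] = dense_rank + 1
    return new_res_id
```

For example, chain B atoms with res_id [850, 850, 900] plus a chain C atom with res_id 51 come out as [1, 1, 2, 1].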

See docs/issues/rfd3_ligand_resid_offset.md in the pipelines repo for full writeup.

Test plan

Ran inference with M0255_1mg5 (two ligands: NAI, ACT): output shows Chain B: res_id=[1, 2] (was [1, 51] before the rank-based fix and [51, 52] with the original offset)
Verify RMSD pairing no longer falls back to (res_name, atom_name) in pipeline runs

🤖 Generated with Claude Code

RFD3 was offsetting ligand res_id values from the protein max, causing (chain_id, res_id, atom_name) pairing to fail against AF3 predictions which always start ligand res_id at 1. Replace the offset with dense rank-based per-chain renumbering (1, 2, ...) and add a chain A validation with an override option (allow_ligand_on_chain_a). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR aligns RFD3’s ligand residue indexing with AF3 conventions by renumbering ligand res_id to start at 1 per ligand chain, adds validation around ligands appearing on the protein chain, and extends the pipeline/sampler to better support “partial ligand fix” behavior (keeping unfixed ligand atoms near their original coordinates).

Changes:

Renumber ligand res_id densely from 1 per ligand chain (instead of offsetting from protein max) and add an overrideable validation error for ligands on the protein chain.
Add an atom-level ligand mask feature (is_ligand_atom) and use it in the inference sampler to avoid noising/centering-induced ligand fragmentation for partial-ligand-fix cases.
Consolidate/continue HBond handling around HBPLUS and add a regression test for partial ligand fixed-atom selection.
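One way an is_ligand_atom mask could be used, sketched under assumptions: coordinates are an (N, 3) NumPy array, noise is Gaussian with scale sigma, and centering subtracts the mean of the non-ligand atoms. The function name and this exact noising/centering scheme are hypothetical, not the RFD3 sampler's code. What it illustrates is that zeroing ligand noise and applying one rigid translation to every atom leaves intra-ligand distances unchanged.

```python
import numpy as np


def noise_and_center(
    coords: np.ndarray,
    is_ligand_atom: np.ndarray,
    sigma: float,
    rng: np.random.Generator,
) -> np.ndarray:
    """Hypothetical sketch: add noise and center without distorting ligands.

    coords: (N, 3) float array; is_ligand_atom: (N,) boolean mask.
    """
    # No noise on ligand atoms, so their internal geometry is preserved.
    eps = rng.standard_normal(coords.shape)
    eps[is_ligand_atom] = 0.0
    noisy = coords + sigma * eps

    # Compute the centering offset from non-ligand atoms only.
    non_ligand = ~is_ligand_atom
    if non_ligand.any():
        center = noisy[non_ligand].mean(axis=0)
    else:
        # Ligand-only input: fall back to centering on all atoms.
        center = noisy.mean(axis=0)

    # One shared translation for all atoms moves each ligand as a rigid body.
    return noisy - center
```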

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.


File	Description
models/rfd3/src/rfd3/inference/input_parsing.py	Implements ligand chain validation + dense per-chain ligand `res_id` renumbering; avoids zeroing unfixed ligand coords at init.
models/rfd3/src/rfd3/model/inference_sampler.py	Uses `is_ligand_atom` to suppress noise injection and exclude ligands from certain centering operations to maintain ligand connectivity.
models/rfd3/src/rfd3/transforms/design_transforms.py	Adds `is_ligand_atom` feature into `feats` for downstream sampler logic.
models/rfd3/src/rfd3/transforms/pipelines.py	Wires `is_ligand_atom` into the base pipeline feature list.
models/rfd3/src/rfd3/transforms/hbonds_hbplus.py	Adjusts HBPLUS tempdir handling and hb2 reading/parsing logic.
models/rfd3/src/rfd3/transforms/hbonds.py	Removes legacy biotite/hydride HBond implementation.
models/rfd3/src/rfd3/metrics/hbonds_metrics.py	Removes legacy HBond metrics implementation tied to the deleted transform.
models/rfd3/src/rfd3/inference/parsing.py	Removes an obsolete TODO related to partial ligand fixing.
models/rfd3/tests/test_ligand_partial_fix.py	Adds a unit test ensuring selected fixed ligand atoms are respected in parsing/build output.


Copilot · 2026-04-05T04:26:35Z

models/rfd3/src/rfd3/transforms/hbonds_hbplus.py

hbonds is appended twice per HBPLUS line due to a duplicated items = {...}; hbonds.append(items) block. This will double-count hbonds and inflate donor/acceptor masks and logged statistics. Remove the duplicate append (keep a single items dict per parsed line).

Suggested change

Copilot · 2026-04-05T04:26:35Z

models/rfd3/src/rfd3/transforms/hbonds_hbplus.py

HB = open(hb2_path, "r").readlines() leaves the file handle unclosed. Use a context manager (with open(...) as f:) to avoid leaking descriptors (especially in long-running training/inference jobs).

Suggested change

with open(hb2_path, "r") as f:
    HB = f.readlines()
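A sketch combining the first two review points: the file is read inside a context manager, and exactly one record is appended per parsed line. The header length (assumed to be 8 lines) and the whitespace-split donor/acceptor fields are simplifying assumptions, not the actual .hb2 column layout; read_hb2_records is a hypothetical name.

```python
def read_hb2_records(hb2_path: str, n_header: int = 8) -> list[dict]:
    """Parse H-bond records from an HBPLUS-style output file (simplified).

    Assumes n_header header lines followed by lines whose first two
    whitespace-separated tokens are a donor and an acceptor label.
    """
    hbonds = []
    # The context manager closes the handle even if reading raises.
    with open(hb2_path, "r") as f:
        lines = f.readlines()
    for line in lines[n_header:]:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        items = {"donor": fields[0], "acceptor": fields[1]}
        hbonds.append(items)  # exactly one append per parsed line
    return hbonds
```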

Copilot · 2026-04-05T04:26:36Z

models/rfd3/src/rfd3/transforms/hbonds_hbplus.py

The previous implementation ran HBPLUS with cwd=tmpdir. Removing cwd can cause HBPLUS to emit its outputs (and/or auxiliary files) into the process working directory or to fail to find the expected .hb2 next to pdb_path, depending on HBPLUS behavior. Consider restoring cwd=tmpdir (or otherwise guaranteeing all HBPLUS outputs land in tmpdir).

Suggested change

stderr=subprocess.DEVNULL,
cwd=tmpdir,
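A minimal sketch of why cwd=tmpdir matters: with the working directory pinned to a TemporaryDirectory, any relative-path output the tool writes stays inside that directory and is deleted with it. A short Python one-liner stands in for the HBPLUS executable, since its real command-line arguments are not shown here; run_in_tmpdir is a hypothetical name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_tmpdir(cmd: list[str]) -> list[str]:
    """Run cmd inside a fresh temp dir and return the files it created there."""
    with tempfile.TemporaryDirectory() as tmpdir:
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            cwd=tmpdir,  # relative output paths resolve inside tmpdir
        )
        # List outputs before the directory is cleaned up on exit.
        return sorted(p.name for p in Path(tmpdir).iterdir())


if __name__ == "__main__":
    # Stand-in for HBPLUS: writes a relative-path output file.
    fake_tool = [sys.executable, "-c", "open('out.hb2', 'w').write('x')"]
    print(run_in_tmpdir(fake_tool))
```

Without cwd, the same one-liner would drop out.hb2 into whatever directory the calling process happens to be running in.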

Copilot · 2026-04-05T04:26:36Z

models/rfd3/src/rfd3/inference/input_parsing.py

+                    f"Ligand found on chain A, which is reserved for the protein. "
+                    f"Ligand chain(s): {ligand_chains.tolist()}. "
+                    f"Place ligands on separate chains (B, C, D, ...) or set "
+                    f"'allow_ligand_on_chain_a: true' to override this check."
+                )
+            # Reset ligand res_id to start from 1 per chain, matching the
+            # convention AF3 uses in its output CIF files.  Use dense
+            # rank-based renumbering so gaps in the original numbering
+            # (e.g. res_id 850, 900) become sequential (1, 2).


The ligand-on-chain check is hard-coded to chain ID "A", but in partial-diffusion mode the protein chain ID is taken from the input (start_chain = atom_array_in.chain_id[0]) and may not be A. This can raise (or miss) the error incorrectly. Consider checking whether the ligand shares a chain with any protein atoms in atom_array (e.g., np.intersect1d(np.unique(atom_array.chain_id[atom_array.is_protein]), ligand_chains)), rather than assuming the protein chain is always A.

Suggested change

-                    f"Ligand found on chain A, which is reserved for the protein. "
-                    f"Ligand chain(s): {ligand_chains.tolist()}. "
-                    f"Place ligands on separate chains (B, C, D, ...) or set "
-                    f"'allow_ligand_on_chain_a: true' to override this check."
-                )
-            # Reset ligand res_id to start from 1 per chain, matching the
-            # convention AF3 uses in its output CIF files.  Use dense
-            # rank-based renumbering so gaps in the original numbering
-            # (e.g. res_id 850, 900) become sequential (1, 2).
+            # Error if any ligand shares a chain with protein atoms unless
+            # explicitly overridden — chain ID is leaked to the model so this
+            # is a significant difference from the expected convention.
+            ligand_chains = np.unique(ligand_array.chain_id)
+            protein_chains = np.unique(atom_array.chain_id[atom_array.is_protein])
+            overlapping_chains = np.intersect1d(protein_chains, ligand_chains)
+            if overlapping_chains.size > 0 and not self.allow_ligand_on_chain_a:
+                raise ValueError(
+                    f"Ligand found on protein chain(s): {overlapping_chains.tolist()}. "
+                    f"Protein chain(s): {protein_chains.tolist()}. "
+                    f"Ligand chain(s): {ligand_chains.tolist()}. "
+                    f"Place ligands on separate chains or set "
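A self-contained sketch of the check Copilot proposes, with plain NumPy arrays standing in for atom_array.chain_id, atom_array.is_protein and ligand_array.chain_id. The function name and the allow_override parameter (standing in for allow_ligand_on_chain_a) are hypothetical.

```python
import numpy as np


def check_ligand_chains(
    chain_id: np.ndarray,
    is_protein: np.ndarray,
    ligand_chain_id: np.ndarray,
    allow_override: bool = False,
) -> None:
    """Raise if any ligand chain also holds protein atoms (hypothetical helper).

    Protein chains are derived from the data rather than assumed to be "A",
    so an input whose protein sits on another chain ID is handled too.
    """
    ligand_chains = np.unique(ligand_chain_id)
    protein_chains = np.unique(chain_id[is_protein])
    overlapping = np.intersect1d(protein_chains, ligand_chains)
    if overlapping.size > 0 and not allow_override:
        raise ValueError(
            f"Ligand found on protein chain(s): {overlapping.tolist()}. "
            f"Protein chain(s): {protein_chains.tolist()}. "
            f"Ligand chain(s): {ligand_chains.tolist()}."
        )
```

With the protein on chain X, as can happen in partial-diffusion mode, a ligand on chain B passes, while a ligand on chain X raises unless allow_override=True.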


Copilot · 2026-04-05T04:26:36Z

models/rfd3/src/rfd3/inference/input_parsing.py

This changes ligand res_id semantics (dense re-numbering per chain) and introduces a new validation error for ligands on the protein chain. Please add a unit test that asserts (1) ligands on the same chain get res_id=[1..N] even if the input res_id has gaps/offsets, and (2) the chain-conflict error (and allow_ligand_on_chain_a override) behaves as expected.

Rename allow_ligand_on_chain_a → allow_ligand_on_existing_chain and check against all chains already present in the built atom array, not just chain A. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When allow_ligand_on_existing_chain is False, raise an error if multiple ligand residues share the same chain. Reset res_id min to 1 per chain, preserving relative gaps when ligands share a chain (override mode). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The override path now matches the old behaviour (offset from protein max res_id). The default path (separate chains) sets each ligand chain's res_id to 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
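The commit messages above describe the final default path: a ligand may not reuse a chain already present in the built atom array, each ligand chain must hold a single ligand residue, and that residue gets res_id=1. Below is a sketch of that path under stated assumptions (per-atom NumPy arrays for the ligand atoms, a separate array of already-built chains, hypothetical function name). The override path, which per the commit message restores the old offset from the protein max res_id, is not modeled because its exact arithmetic is not shown here.

```python
import numpy as np


def assign_ligand_res_id_default(
    existing_chains: np.ndarray,
    ligand_chain_id: np.ndarray,
    ligand_res_id: np.ndarray,
) -> np.ndarray:
    """Hypothetical sketch of the allow_ligand_on_existing_chain=False path."""
    ligand_chains = np.unique(ligand_chain_id)

    # Ligands must not land on any chain already in the built atom array.
    clashing = np.intersect1d(ligand_chains, existing_chains)
    if clashing.size > 0:
        raise ValueError(f"Ligand placed on existing chain(s): {clashing.tolist()}")

    new_res_id = np.empty_like(ligand_res_id)
    for chain in ligand_chains:
        mask = ligand_chain_id == chain
        # More than one distinct res_id means several ligand residues share a chain.
        if np.unique(ligand_res_id[mask]).size > 1:
            raise ValueError(f"Multiple ligand residues share chain {chain}")
        # One ligand residue per chain, so every atom on it gets res_id 1.
        new_res_id[mask] = 1
    return new_res_id
```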

Copilot AI review requested due to automatic review settings April 5, 2026 04:20

Copilot started reviewing on behalf of Ubiquinone-dot April 5, 2026 04:21

Ubiquinone-dot force-pushed the fix/rfd3-ligand-resid-offset branch from acb5e17 to 94922e2 April 5, 2026 04:26

Copilot AI reviewed Apr 5, 2026


Ubiquinone-dot and others added 4 commits April 4, 2026 21:29

Generalize chain validation to all existing chains

96b2e5f


Note in errors that override restores old behaviour

860b8b8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ubiquinone-dot merged commit b5395e4 into production Apr 5, 2026
3 checks passed

Ubiquinone-dot deleted the fix/rfd3-ligand-resid-offset branch April 5, 2026 04:49


Ubiquinone-dot merged 5 commits into production from fix/rfd3-ligand-resid-offset
