Skip to content

Fix/rfd3 duplicate cif atom ids#248

Open
DeltaDesign wants to merge 2 commits intoRosettaCommons:productionfrom
DeltaDesign:fix/rfd3-duplicate-cif-atom-ids
Open

Fix/rfd3 duplicate cif atom ids#248
DeltaDesign wants to merge 2 commits intoRosettaCommons:productionfrom
DeltaDesign:fix/rfd3-duplicate-cif-atom-ids

Conversation

@DeltaDesign
Copy link
Copy Markdown

Summary

Fixes #148.

What changed and why

PadTokensWithVirtualAtoms copies the central atom (CB) to create virtual atom positions, including CB's atom_id annotation. After cleanup removes the virtual atoms, sidechain atoms retain CB's duplicated atom_id. The CIF writer uses these values for _atom_site.id, producing non-conformant output with duplicate IDs.

_atom_site.id uniqueness is required by the mmCIF dictionary specification — downstream parsers (PyMOL, ChimeraX, Phenix) may silently drop or misassign atoms when IDs collide.

The fix strips atom_id before CIF serialization so the writer auto-generates unique sequential 1..N IDs. Two sites:

  • _cleanup_virtual_atoms_and_assign_atom_name_elements — strips after filtering virtual atoms (covers Python API consumers)
  • RFD3Output._strip_atom_id — strips in dump() before every to_cif_file() call (covers trajectories and cleanup_virtual_atoms=False path)

How tested

Two new tests in models/rfd3/tests/test_cif_duplicate_ids.py:

  • test_cleanup_strips_atom_id — verifies cleanup removes the duplicated annotation
  • test_cif_output_has_unique_ids — verifies CIF output has unique sequential _atom_site.id values

ruff check passes on all 3 changed files.

DeltaDesign and others added 2 commits March 23, 2026 23:15
…#148)

PadTokensWithVirtualAtoms copies CB's atom_id to virtual atom slots.
After cleanup removes the virtual atoms, sidechain atoms retain the
duplicated atom_id. The CIF writer uses these values for _atom_site.id,
producing non-conformant output with duplicate IDs.

Strip atom_id before CIF serialization so the writer auto-generates
unique sequential 1..N IDs. Two sites:

- _cleanup_virtual_atoms_and_assign_atom_name_elements: strips after
  filtering virtual atoms (covers Python API consumers)
- RFD3Output._strip_atom_id: strips in dump() before every
  to_cif_file() call (covers trajectories and cleanup_virtual_atoms=False)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFD3 output cif contains duplicates in id coulmn

1 participant