GitHub - hmblair/ciffy: Load .cif files in a jiffy

Overview

ciffy is a fast CIF file parser for molecular structures, with a C backend and Python interface. It supports both NumPy and PyTorch backends for array operations.

Performance

ciffy is 40-55x faster than BioPython and Biotite for parsing CIF files. Even loading via ciffy and converting to a biotite AtomArray is ~10x faster than biotite's own loader:

Structure	Atoms	ciffy	BioPython	Biotite	ciffy→biotite
3SKW	2,826	0.30 ms	16 ms (53x)	12 ms (39x)	1 ms (10x)
9GCM	4,466	0.45 ms	20 ms (45x)	17 ms (38x)	2 ms (10x)
9MDS	102,216	8.91 ms	494 ms (55x)	411 ms (46x)	41 ms (10x)

_{Run python tests/profiling/profile_io.py to reproduce.}

Installation

From PyPI

pip install ciffy

From Source

git clone https://github.com/hmblair/ciffy.git
cd ciffy
pip install -e .

Backends

ciffy supports two array backends:

NumPy: Lightweight, no additional dependencies required
PyTorch: For GPU support (CUDA/MPS) and integration with deep learning workflows

Specify the backend when loading structures:

import ciffy

# Load with NumPy backend (default)
polymer = ciffy.load("structure.cif", backend="numpy")

# Load with PyTorch backend
polymer = ciffy.load("structure.cif", backend="torch")

Convert between backends:

torch_polymer = polymer.torch()
numpy_polymer = polymer.numpy()

Move tensors to GPU (PyTorch only):

polymer_gpu = polymer.torch().to("cuda")
polymer_mps = polymer.torch().to("mps")

Loading Structures

import ciffy

# Load from CIF file
polymer = ciffy.load("structure.cif")

# Load specific chains
polymer = ciffy.load("structure.cif", chains=["A", "B"])

# Load specific molecule types
polymer = ciffy.load("structure.cif", molecule_types=ciffy.RNA)

# Print summary
print(polymer)

Example output:

Polymer 9GCM [2024-08-02]
─────────────────────────
   Type     Res  Atoms
─────────────────────────
A  RNA      135   1413
B  PROTEIN  132   1032
C  PROTEIN  246   1261
D  PROTEIN  485    760
─────────────────────────
Σ  4        998   4466
─────────────────────────

Working with Polymers

Properties

polymer.coordinates       # (N, 3) atom positions
polymer.atoms             # (N,) atom type indices
polymer.elements          # (N,) element indices
polymer.sequence          # (R,) residue type indices
polymer.bonds             # (B, 2) covalent bond pairs
polymer.molecule_types    # (C,) molecule type per chain
polymer.names             # Chain names ["A", "B", ...]
polymer.lengths           # (C,) residues per chain

polymer.size()                      # Total atoms
polymer.size(ciffy.RESIDUE)         # Total residues
polymer.size(ciffy.CHAIN)           # Total chains
polymer.sequence_str()              # "acgu..." sequence string

Selection

# Select by chain
chain_a = polymer.chain(0)
chains_ab = polymer.chain([0, 1])

# Select by residue
first_residue = polymer.residue(0)
some_residues = polymer.residue([0, 5, 10])

# Select by molecule type
rna_only = polymer.molecule_type(ciffy.RNA)
protein_only = polymer.molecule_type(ciffy.PROTEIN)

# Select by residue type
adenines = polymer.residue_type(ciffy.Residue.A)

# Structural selections
backbone = polymer.backbone()          # Backbone atoms only
bases = polymer.nucleobase()           # Nucleobase atoms (RNA/DNA)
sidechains = polymer.sidechain()       # Sidechain atoms
heavy = polymer.heavy()                # Heavy atoms (no hydrogens)

# Remove unresolved residues
resolved = polymer.strip()

Iteration

# Iterate over all chains
for chain in polymer.chains():
    print(chain.sequence_str())

# Iterate over RNA chains only
for chain in polymer.molecule_type(ciffy.RNA).chains():
    print(chain.pdb_id, chain.sequence_str())

Hierarchy Operations

# Counts at different scales
atoms_per_residue = polymer.counts(ciffy.RESIDUE)    # (R,)
residues_per_chain = polymer.counts(ciffy.CHAIN)     # (C,)

# Membership indices
chain_per_atom = polymer.membership(ciffy.CHAIN)     # (N,) chain index per atom
residue_per_atom = polymer.membership(ciffy.RESIDUE) # (N,) residue index per atom

# Reduce atom features to residue level (mean pooling)
residue_coords = polymer.reduce(polymer.coordinates, ciffy.RESIDUE)  # (R, 3)

# Expand residue features to atom level
atom_features = polymer.expand(residue_features, ciffy.RESIDUE)  # (N, ...)

Geometry

# Center coordinates
centered, centroids = polymer.center(ciffy.MOLECULE)
centered, centroids = polymer.center(ciffy.CHAIN)

# PCA alignment
aligned, rotations = polymer.pca(ciffy.CHAIN)

# Pairwise distances
distances = polymer.pairwise_distances()                    # Atom-atom
distances = polymer.pairwise_distances(ciffy.RESIDUE)       # Residue centroids

# K-nearest neighbors
neighbors = polymer.knn(k=16)

Saving

polymer.write("output.cif")

Building Polymers

From Sequence

Create template polymers (no coordinates) from sequence strings:

import ciffy

# RNA (lowercase with u)
rna = ciffy.template("acguacgu")

# DNA (lowercase with t)
dna = ciffy.template("acgtacgt")

# Protein (uppercase)
protein = ciffy.template("MGKLF")

# Multi-chain
multi = ciffy.template(["acgu", "MGKLF"])

Building Chains Residue-by-Residue

Build polymers incrementally using append():

from ciffy import Polymer, Residue

# Build a template (no coordinates)
p = Polymer()
for res in [Residue.A, Residue.C, Residue.G, Residue.U]:
    p = p.append(res)

# Build with coordinates
p = Polymer()
p = p.append(Residue.A, coords)  # First residue with absolute coords

For autoregressive generation with relative positioning:

from ciffy import Polymer, Residue
from ciffy.geometry import LocalCoordinates

p = Polymer()
p = p.append(Residue.A, first_coords)  # Absolute coordinates

# Subsequent residues use LocalCoordinates(coords, transform)
# where transform is an SE(3) transform [axis-angle (3), translation (3)]
p = p.append(Residue.C, LocalCoordinates(coords, transform))
p = p.append(Residue.G, LocalCoordinates(coords, transform))

Structural Metrics

import ciffy

# RMSD (Kabsch-aligned)
rmsd = ciffy.rmsd(polymer1, polymer2)
rmsd = ciffy.rmsd(polymer1, polymer2, scale=ciffy.CHAIN)  # Per-chain

# Also works on raw coordinates
rmsd = ciffy.rmsd(coords1, coords2)

# TM-score
tm = ciffy.tm_score(pred, ref)

# lDDT
lddt = ciffy.lddt(pred, ref)

# Radius of gyration
rg = ciffy.rg(polymer)

# Clash detection
clashes = ciffy.clashes(polymer)

Command Line Interface

# View structure summary
ciffy info structure.cif

# Show sequences
ciffy info structure.cif --sequence

# Show entity descriptions
ciffy info structure.cif --desc

# Multiple files
ciffy info *.cif

Testing

pytest tests/

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Name		Name	Last commit message	Last commit date
Latest commit History 863 Commits
.github/workflows		.github/workflows
archive		archive
ciffy		ciffy
codegen		codegen
docs		docs
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE.md		LICENSE.md
MANIFEST.in		MANIFEST.in
README.md		README.md
TODO.md		TODO.md
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
setup.py		setup.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Performance

Installation

From PyPI

From Source

Backends

Loading Structures

Working with Polymers

Properties

Selection

Iteration

Hierarchy Operations

Geometry

Saving

Building Polymers

From Sequence

Building Chains Residue-by-Residue

Structural Metrics

Command Line Interface

Testing

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Overview

Performance

Installation

From PyPI

From Source

Backends

Loading Structures

Working with Polymers

Properties

Selection

Iteration

Hierarchy Operations

Geometry

Saving

Building Polymers

From Sequence

Building Chains Residue-by-Residue

Structural Metrics

Command Line Interface

Testing

Contributing

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages