A lightweight mmCIF (Macromolecular Crystallographic Information File) parser built from scratch in Python, with no external dependencies. Designed as both an educational resource and a practical tool for structural bioinformatics.
See Features.md for a detailed feature list.
- No external dependencies (no Biopython required)
- Clean, object-oriented Python code
- Extracts atomic-level information from CIF files
- Supports `.cif` files from the RCSB PDB
- Useful for understanding how structural biology file formats work internally
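To illustrate what extracting atom-level information without dependencies involves, here is a minimal sketch that reads the `_atom_site` loop using only the standard library. It is not the code in `mmcif_parser.py`. The names `tokenize` and `parse_atom_site` are hypothetical, and the sketch assumes each row's values sit on ordinary lines (semicolon-delimited multi-line text fields are not handled).

```python
import re

# A CIF token is single-quoted, double-quoted, or bare. Under the CIF 1.1
# rules a quote only closes a value when followed by whitespace or end of
# line, so "C1'" keeps its inner apostrophe.
_TOKEN = re.compile(r"'(.*?)'(?=\s|$)|\"(.*?)\"(?=\s|$)|(\S+)")


def tokenize(line):
    """Split one line of loop values into strings (quotes removed)."""
    # Exactly one group participates in each match; keep that one.
    return [next(g for g in m.groups() if g is not None)
            for m in _TOKEN.finditer(line)]


def parse_atom_site(cif_text):
    """Return one dict per row of the first _atom_site loop (sketch only).

    Values stay strings, including the CIF placeholders '?' and '.'.
    """
    lines = cif_text.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].strip() != "loop_":
            i += 1
            continue
        # Column tags follow loop_ one per line, e.g. _atom_site.Cartn_x.
        j = i + 1
        tags = []
        while j < len(lines) and lines[j].strip().startswith("_"):
            tags.append(lines[j].strip())
            j += 1
        if not tags or not tags[0].startswith("_atom_site."):
            i = j  # some other category's loop; keep scanning
            continue
        columns = [t.split(".", 1)[1] for t in tags]
        # Values run until the next loop, tag, data block, or '#' comment.
        values = []
        while j < len(lines):
            s = lines[j].strip()
            if s.startswith(("loop_", "_", "data_", "#")):
                break
            values.extend(tokenize(s))
            j += 1
        n = len(columns)
        # Regroup the flat value stream into rows; a trailing partial row is dropped.
        return [dict(zip(columns, values[k:k + n]))
                for k in range(0, len(values) - n + 1, n)]
    return []


if __name__ == "__main__":
    sample = """data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
ATOM 1 N 10.500
ATOM 2 "C1'" 11.200
#
"""
    atoms = parse_atom_site(sample)
    print(atoms[1]["label_atom_id"], float(atoms[1]["Cartn_x"]))  # C1' 11.2
```

Treating the loop body as one flat token stream, then regrouping it by column count, mirrors how CIF itself defines loops: rows are not tied to physical lines.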
In addition to parsing, the project defines a low-level binary data structure for storing mmCIF entries and sections compactly in memory, simulating a compiled or serialized format.
Each mmCIF data item (atom site, loop header, value) is represented in the following binary format:
| Segment | Description | Example |
|---|---|---|
| RECORD_TYPE (1B) | `0x01` = header, `0x02` = loop, `0x03` = data value | `0x02` |
| RECORD_ID (2B) | Unique ID for the entry (16-bit short integer) | `0x00FA` |
| FIELD_NAME_LENGTH (1B) | Length of the field name in bytes (0 to 255) | `0x0A` |
| FIELD_NAME (variable) | Field name, UTF-8 encoded | `_atom_site` |
| VALUE_LENGTH (1B) | Length of the value in bytes (0 to 255) | `0x03` |
| VALUE (variable) | Value, UTF-8 encoded | `C1'` |
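The record layout in the table above maps directly onto Python's `struct` module. The following is a minimal sketch, not the contents of `data_structures.py`: it assumes big-endian byte order for RECORD_ID and unsigned integers for every numeric field (the format does not specify either), and `encode_record`/`decode_record` are hypothetical names. Because both length fields are a single byte, field names and values are capped at 255 bytes of UTF-8.

```python
import struct

# Record type codes from the RECORD_TYPE column.
HEADER, LOOP, DATA_VALUE = 0x01, 0x02, 0x03

# Assumed fixed prefix: ">" = big-endian, no padding.
# B = RECORD_TYPE (1 byte), H = RECORD_ID (2 bytes), B = FIELD_NAME_LENGTH (1 byte).
_PREFIX = struct.Struct(">BHB")


def encode_record(record_type, record_id, field_name, value):
    """Serialize one record to bytes (hypothetical helper)."""
    name_b = field_name.encode("utf-8")
    value_b = value.encode("utf-8")
    # One-byte length fields cannot describe more than 255 bytes.
    if len(name_b) > 255 or len(value_b) > 255:
        raise ValueError("field name and value must each fit in 255 bytes")
    return (_PREFIX.pack(record_type, record_id, len(name_b)) + name_b
            + bytes([len(value_b)]) + value_b)


def decode_record(buf, offset=0):
    """Decode the record at offset; return ((type, id, name, value), next_offset)."""
    record_type, record_id, name_len = _PREFIX.unpack_from(buf, offset)
    pos = offset + _PREFIX.size
    name = buf[pos:pos + name_len].decode("utf-8")
    pos += name_len
    value_len = buf[pos]  # indexing bytes yields an int
    pos += 1
    value = buf[pos:pos + value_len].decode("utf-8")
    return (record_type, record_id, name, value), pos + value_len


rec = encode_record(LOOP, 0x00FA, "_atom_site", "C1'")
print(rec.hex())           # 0200fa0a5f61746f6d5f7369746503433127
print(decode_record(rec))  # ((2, 250, '_atom_site', "C1'"), 18)
```

Here `_atom_site` encodes to 10 bytes (`0x0A`) and `C1'` to 3 bytes (`0x03`), giving an 18-byte record. Returning the next offset from `decode_record` lets a caller step through consecutive records in one buffer.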
Each mmCIF block is stored as a binary sequence of record units. A block header can optionally store metadata such as the loop count and atom count.
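Because every record carries its own lengths, a block can be read by walking the buffer record by record. In this sketch the optional block header is assumed to be two big-endian unsigned shorts holding the loop count and atom count; that header layout, the sample record contents, and the function names (`pack_block`, `read_header`, `iter_records`) are illustrative assumptions rather than the project's actual format or API.

```python
import struct

# Hypothetical block header: loop_count and atom_count as big-endian
# unsigned shorts. The README only says such metadata is optional.
_BLOCK_HEADER = struct.Struct(">HH")
# Record prefix: RECORD_TYPE (1B), RECORD_ID (2B), FIELD_NAME_LENGTH (1B).
_PREFIX = struct.Struct(">BHB")


def _pack_record(record_type, record_id, name, value):
    n, v = name.encode("utf-8"), value.encode("utf-8")
    # struct.pack and bytes() raise if either length exceeds 255.
    return _PREFIX.pack(record_type, record_id, len(n)) + n + bytes([len(v)]) + v


def pack_block(records, loop_count=0, atom_count=0):
    """Header followed by the records laid out back to back."""
    header = _BLOCK_HEADER.pack(loop_count, atom_count)
    return header + b"".join(_pack_record(*r) for r in records)


def read_header(buf):
    """Return (loop_count, atom_count) from the start of the block."""
    return _BLOCK_HEADER.unpack_from(buf, 0)


def iter_records(buf):
    """Yield (type, id, name, value) for every record after the header."""
    pos = _BLOCK_HEADER.size
    while pos < len(buf):
        record_type, record_id, name_len = _PREFIX.unpack_from(buf, pos)
        pos += _PREFIX.size
        name = buf[pos:pos + name_len].decode("utf-8")
        pos += name_len
        value_len = buf[pos]
        pos += 1
        value = buf[pos:pos + value_len].decode("utf-8")
        pos += value_len
        yield record_type, record_id, name, value


block = pack_block(
    [
        (0x02, 1, "_atom_site.label_atom_id", ""),  # loop header record
        (0x03, 2, "_atom_site.label_atom_id", "N"),
        (0x03, 3, "_atom_site.label_atom_id", "C1'"),
    ],
    loop_count=1,
    atom_count=2,
)
print(read_header(block))                                      # (1, 2)
print([v for t, _, _, v in iter_records(block) if t == 0x03])  # ['N', "C1'"]
```

Using a generator keeps memory flat: records are decoded lazily as the caller consumes them, which suits large structures.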
- Enables serialization and memory-efficient storage
- Fast search and indexing for compiled applications
- Suitable for integration with compiled languages (C, Rust)
- Can be exported as `.bin` files for direct loading into visualization tools or bioinformatics engines
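The fast search and indexing benefit rests on records being length-prefixed: one linear pass can note the byte offset of every RECORD_ID, after which any record is reachable in constant time without re-parsing text. Below is a minimal sketch assuming big-endian, unsigned fields laid out as in the record table; `pack_record`, `build_index`, and `value_at` are hypothetical names, not part of this project.

```python
import struct

# Assumed record prefix (big-endian): RECORD_TYPE (1B), RECORD_ID (2B),
# FIELD_NAME_LENGTH (1B); VALUE_LENGTH (1B) follows the field name.
_PREFIX = struct.Struct(">BHB")


def pack_record(record_type, record_id, name, value):
    """Hypothetical serializer used only to build demo data."""
    n, v = name.encode("utf-8"), value.encode("utf-8")
    return _PREFIX.pack(record_type, record_id, len(n)) + n + bytes([len(v)]) + v


def build_index(buf, start=0):
    """One linear pass: map each RECORD_ID to the byte offset of its record."""
    index = {}
    pos = start
    while pos < len(buf):
        _, record_id, name_len = _PREFIX.unpack_from(buf, pos)
        index[record_id] = pos  # IDs assumed unique, as the table states
        value_len = buf[pos + _PREFIX.size + name_len]
        # Skip prefix + name + 1-byte VALUE_LENGTH + value.
        pos += _PREFIX.size + name_len + 1 + value_len
    return index


def value_at(buf, offset):
    """Decode only the VALUE of the record at offset (no full parse)."""
    _, _, name_len = _PREFIX.unpack_from(buf, offset)
    vpos = offset + _PREFIX.size + name_len
    return buf[vpos + 1:vpos + 1 + buf[vpos]].decode("utf-8")


buf = b"".join(pack_record(0x03, rid, "_atom_site.id", str(rid)) for rid in (10, 20, 30))
index = build_index(buf)
print(index)                     # {10: 0, 20: 20, 30: 40}
print(value_at(buf, index[20]))  # 20
```

The index skips over field names and values without decoding them, so building it costs one pass over the lengths only; a compiled reader in C or Rust could apply the same offset arithmetic directly to a memory-mapped `.bin` file.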
```text
mmCIF_Parser_Project/
├── main.py              # Run this to see the parser in action
├── mmcif_parser.py      # Core parsing logic
├── data_structures.py   # Binary data structure model
├── enzyme_parser.py     # Enzyme-specific parsing utilities
├── example.cif          # Sample mmCIF file
├── data/                # Additional CIF files
└── README.md
```
```bash
python main.py data/example.cif
python test_mmcif_parser.py
```

Additional analyses demonstrating real-world applications of mmCIF parsing:
See LICENSE for details.