mmCIF Validator Testing Suite

This directory contains the regression test suite for the mmCIF validator: test CIF files, the runner script, and generated output for comparison.

Directory layout

testing/
├── README.md                    # This file
├── run_validation_suite.py      # Run validator on all CIFs, save/compare output
├── validation_baseline.txt      # Saved reference output (generate with --generate-baseline)
├── validation_output.txt        # Latest run output (compare to baseline)
└── cif_files/                   # All test .cif files
    ├── 6ijw.cif ... 8q6j.cif    # Real PDB entries (method + metadata completeness)
    └── test_*.cif               # Synthetic tests (validation cases)

How to use the suite

Run from the repository root:

# Run validation on all CIFs; write results to testing/validation_output.txt
python testing/run_validation_suite.py

# Generate or refresh the baseline (do this once before code changes, or to accept new behaviour)
python testing/run_validation_suite.py --generate-baseline

Or from the testing/ directory:

cd testing
python run_validation_suite.py
python run_validation_suite.py --generate-baseline

Regression workflow

1. Before changing validator code: run with --generate-baseline to create validation_baseline.txt.
2. Make your code changes.
3. After the changes: run without --generate-baseline (writes to validation_output.txt).
4. Compare: diff the two files to see what changed.
   - Windows: fc testing\validation_baseline.txt testing\validation_output.txt
   - Linux/macOS: diff testing/validation_baseline.txt testing/validation_output.txt
5. Review the diff: each new or removed error is either an expected change (e.g. a fix) or a regression.
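
If you prefer a comparison that behaves the same on every platform, the diff step can be sketched in Python with the standard-library difflib module. This is a minimal sketch and not part of the suite; the function name diff_outputs is hypothetical.

```python
import difflib
from pathlib import Path


def diff_outputs(baseline: Path, latest: Path) -> list[str]:
    """Return unified-diff lines between the baseline and the latest output.

    An empty list means the two files are identical.
    """
    old = baseline.read_text(encoding="utf-8").splitlines(keepends=True)
    new = latest.read_text(encoding="utf-8").splitlines(keepends=True)
    return list(
        difflib.unified_diff(old, new, fromfile=str(baseline), tofile=str(latest))
    )
```

Call it with the paths to validation_baseline.txt and validation_output.txt and print the joined result; lines starting with "-" exist only in the baseline, lines starting with "+" only in the latest run.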

Custom paths

python testing/run_validation_suite.py --dict path/to/mmcif.dic --tests path/to/cif_folder
python testing/run_validation_suite.py --generate-baseline -o path/to/my_baseline.txt

By default the suite uses the PDBx/mmCIF dictionary from http://mmcif.pdb.org/dictionaries/ascii/mmcif_pdbx.dic; you can override this with --dict.
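
For orientation, the flags documented in this README could be declared with argparse roughly as follows. This is an illustrative sketch, not the runner's actual code: the destination names (dict_path, tests_dir, output), the helper resolve_output, and the exact default paths are assumptions inferred from the descriptions above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line flags described in this README."""
    parser = argparse.ArgumentParser(
        description="Run the mmCIF validator over a folder of CIF files."
    )
    parser.add_argument("--generate-baseline", action="store_true",
                        help="write the reference output instead of the latest-run output")
    parser.add_argument("--dict", dest="dict_path", default=None,
                        help="dictionary to use instead of the default PDBx/mmCIF one")
    parser.add_argument("--tests", dest="tests_dir", default=None,
                        help="folder containing the .cif files to validate")
    parser.add_argument("-o", dest="output", default=None,
                        help="explicit output file path")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="log each file as it is validated")
    return parser


def resolve_output(args: argparse.Namespace) -> str:
    """Pick the output file: an explicit -o wins, otherwise baseline or latest-run file."""
    if args.output:
        return args.output
    if args.generate_baseline:
        return "testing/validation_baseline.txt"  # assumed default location
    return "testing/validation_output.txt"  # assumed default location
```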

Logging

The suite logs to stderr so you can redirect or diff the main output file without mixing in log lines. It logs:

- Dictionary source, test directory, and output file path
- Number of CIF files to process
- With --verbose / -v: each file as it is validated and its exit code (passed/failed)
- Final line: where output was written and how many files had validation issues

Example with verbose:

python testing/run_validation_suite.py -v
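
The separation between log lines (stderr) and results (output file) can be sketched with the standard logging module. This is an assumption about how such a setup might look, not the suite's actual code; the logger name validation_suite and the function configure_logging are hypothetical.

```python
import logging
import sys


def configure_logging(verbose: bool) -> logging.Logger:
    """Send log lines to stderr so the results file stays free of them."""
    logger = logging.getLogger("validation_suite")  # hypothetical logger name
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    # Per-file messages are emitted at DEBUG and only appear with --verbose / -v.
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.propagate = False
    return logger
```

With this arrangement the summary messages would be logged at INFO and the per-file messages at DEBUG, so a normal run stays quiet and a verbose run shows every file.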

Test files overview

Real PDB entries: metadata completeness and method detection

These files are full (or substantial) PDB mmCIF entries. They are used to check:

- Metadata completeness: The validator’s JSON output includes a metadata_completeness object (percentage, filled/total counts, missing categories, missing items). These files exercise that logic with realistic method-specific mandatory categories.
- Method recognition: The validator infers experimental method from which categories are present in the file (method-specific mandatory category lists). The reported method_detected in the JSON should match the method implied by the entry.

File	_exptl.method	Expected method_detected	Purpose
6ijw.cif	SOLUTION NMR	`nmr`	NMR method + completeness
6qvt.cif	X-RAY DIFFRACTION	`xray`	X-ray method + completeness
6ssp.cif	X-RAY DIFFRACTION	`xray`	X-ray method + completeness
7q5a.cif	ELECTRON MICROSCOPY	`em`	EM method + completeness
8ozl.cif	ELECTRON MICROSCOPY	`em`	EM method + completeness
8pps.cif	X-RAY DIFFRACTION	`xray`	X-ray method + completeness
8pwh.cif	ELECTRON MICROSCOPY	`em`	EM method + completeness
8q6j.cif	ELECTRON MICROSCOPY	`em`	EM method + completeness

Method detection is based on which categories exist in the file (from the completeness lists), not on the literal value of _exptl.method; the table above documents how these entries are expected to be classified.
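
Category-based detection of this kind can be illustrated with a small scoring function. The sketch below is not the validator's implementation: the category sets in SIGNATURE_CATEGORIES are hypothetical stand-ins for the real method-specific lists, and the scoring rule (most matching categories wins) is an assumption.

```python
# Hypothetical signature categories per method; the validator's real
# method-specific mandatory lists may differ.
SIGNATURE_CATEGORIES = {
    "xray": {"diffrn", "reflns", "exptl_crystal"},
    "em": {"em_experiment", "em_3d_reconstruction", "em_imaging"},
    "nmr": {"pdbx_nmr_ensemble", "pdbx_nmr_spectrometer", "pdbx_nmr_exptl"},
}


def detect_method(categories_present):
    """Return the method whose signature categories overlap most with the file.

    Returns None when no signature category is present. Ties resolve to the
    first method in SIGNATURE_CATEGORIES.
    """
    present = set(categories_present)
    scores = {
        method: len(cats & present)
        for method, cats in SIGNATURE_CATEGORIES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Note that _exptl.method never enters the decision; only the set of category names found in the file does.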

Synthetic test files: validation cases

Each test_*.cif file is a small CIF chosen to trigger one or more specific validator behaviours (errors or warnings). Use these to confirm that the validator reports the right issue for each scenario.

File	Case(s) covered
test_duplicate_item.cif	Same item appears twice in one block (e.g. `_entry.id` twice). Expect: duplicate item error.
test_duplicate_category.cif	Same category given in two separate blocks (e.g. two `entity` blocks). Expect: duplicate category error.
test_format_error_entity_poly.cif	Malformed `entity_poly`: loop with one data row followed by key–value pairs of the same category. Expect: duplicate category or format error once the parser records the loop block.
test_loop_row_mismatch.cif	Loop with wrong number of values in a row (e.g. two columns, second row has one value). Exercises loop parsing and may surface row-length or parsing errors.
test_multiple_data_blocks.cif	File contains two `data_` blocks. Expect: only the first block is validated (parser stops at second `data_`).
test_value_out_of_range.cif	Item with type `positive_int` (e.g. `_em_image_scans.dimension_height`) set to `0`. Expect: type/range error.
test_type_checks_pdb_id_and_date.cif	Invalid `pdb_id`-like value and invalid date format. Expect: type errors for the offending values.
test_enum_invalid_em_software.cif	`_em_software.name` value not in the dictionary enumeration (e.g. `phaser_voyager.em_placement`). Expect: enumeration error (once `_pdbx_item_enumeration` is parsed).
test_asym_id_valid_invalid.cif	`_atom_site.label_asym_id` / `auth_asym_id` with valid (e.g. `A`) and invalid (e.g. `B:Axp`) values. Expect: asym_id format errors for the invalid values when enforced.
test_mandatory_missing_item.cif	Category present but a mandatory item missing (e.g. `entity` without `_entity.id`). Expect: missing mandatory item error.
test_fk_missing_parent.cif	Child references non-existent parent (e.g. `atom_site.label_asym_id` = `Z` with no `struct_asym.id` = `Z`). Expect: foreign-key / parent-missing error.
test_composite_fk_mismatch.cif	Rows in `atom_site` that may violate composite key or parent–child consistency (e.g. label_asym_id + label_comp_id + label_seq_id). Exercises composite-FK logic.
test_undefined_items.cif	Item names not in the dictionary (e.g. `_my_local_category.foo`, `_not_defined_item`). Expect: undefined-item warnings/errors as implemented.
test_advisory_range_warning.cif	Value outside the advisory range (e.g. `_exptl_crystal.density_Matthews` = 10.0, above the recommended range). Expect: advisory-range warning, not a hard error.
test_multiline_and_quoted_values.cif	Loop containing multi-line text (semicolon-delimited) and quoted values with spaces. Exercises parsing of multi-line and quoted loop values.
test_cross_check_dictionary_enum.cif	Cross-item dictionary enumeration compatibility check (e.g. `_diffrn_detector.type` incompatible with `_diffrn_detector.detector`). Expect: cross-check error from dictionary detail mapping.
test_cross_check_conditional_refine_mr_starting_model_skipped_when_initial_refinement_present.cif	Conditional required: `_refine.pdbx_method_to_determine_struct` is molecular replacement and `_pdbx_initial_refinement_model` is present; `_refine.pdbx_starting_model` is `?`. Expect: no cross-check error for `pdbx_starting_model` (superseded by `pdbx_initial_refinement_model`); file is otherwise minimal so validation can pass end-to-end.
test_cross_check_conditional_refine_mr_starting_model_required_without_initial_refinement.cif	Conditional required: molecular replacement without `pdbx_initial_refinement_model`. Expect: cross-check error requiring `pdbx_starting_model` when it is missing.
test_cross_check_date_order_invalid_coords_before_deposition.cif	Pairwise date order: `_pdbx_database_status.recvd_initial_deposition_date` must not be after `date_coordinates`. Expect: cross-check error.
test_cross_check_date_order_valid_deposition_coords.cif	Positive case: initial deposition on or before coordinates date. Expect: no date-order error from this rule pair.
test_cross_check_date_order_valid_same_day.cif	Edge case: same calendar day for both dates (`<=`). Expect: no date-order error.
test_cross_check_date_order_edge_missing_coords.cif	Edge case: `date_coordinates` missing (`?`). Expect: no date-order error when the secondary date is absent.
test_cross_check_date_order_invalid_begin_after_end.cif	Date order: `date_begin_deposition` must not be after `date_end_processing`. Expect: cross-check error.
test_cross_check_date_order_invalid_form_after_initial.cif	Date order: `date_deposition_form` must not be after `recvd_initial_deposition_date`. Expect: cross-check error.
test_cross_check_uniqueness_invalid_entity_id.cif	Uniqueness: two `entity` rows share the same `_entity.id`. Expect: duplicate-key error on each duplicate row (same message).
test_cross_check_uniqueness_valid_entity_ids.cif	Uniqueness positive case: two distinct `_entity.id` values. Expect: no duplicate-entity-id error.
test_cross_check_uniqueness_invalid_struct_asym_id.cif	Uniqueness: two `struct_asym` rows share the same `_struct_asym.id`. Expect: duplicate-key errors.
test_cross_check_uniqueness_valid_struct_asym_ids.cif	Uniqueness positive case: distinct asym ids. Expect: no duplicate-asym-id error.
test_cross_check_uniqueness_invalid_entity_poly_entity_id.cif	Uniqueness: two `entity_poly` rows with the same `entity_id`. Expect: duplicate-key errors.
test_cross_check_uniqueness_valid_entity_poly_entity_id.cif	Uniqueness positive case: one `entity_poly` row per entity. Expect: no duplicate-entity_poly error.
test_cross_check_make_mandatory_subtypes.cif	Subtype-gated required-item check for `makeMandatorySubtypes` (`em_3d_reconstruction` missing `resolution_method`). Expect: no subtype-specific error when subtype context is absent; error appears when subtype context includes `EM-single_part` or related subtype.
test_cross_check_cross_reference_selectors.cif	Selector-gated cross-reference check for `cross_reference_full` (`expt: coded`, `code: PDB`). Expect: selector rule skipped when code context is absent; cross-reference error appears when runtime context includes `requested_codes=['PDB']`.
test_procedural_diffrn_wavelength_invalid_single_for_laue.cif	Procedural validator migration: `diffrn_source.pdbx_wavelength_list` against `diffrn_radiation.pdbx_diffrn_protocol`. Expect: error when protocol is `LAUE` but wavelength list is a single value.
test_procedural_diffrn_wavelength_valid_single.cif	Procedural validator positive case. Expect: no procedural wavelength-list error when protocol is `SINGLE WAVELENGTH` and list has one value.
test_procedural_diffrn_wavelength_edge_missing.cif	Procedural validator edge case. Expect: no procedural wavelength-list error when wavelength value is missing (`?`).
test_procedural_diffrn_wavelength_invalid_empty_list_laue.cif	Procedural validator: `pdbx_wavelength_list` is an empty quoted value (`''`) while `pdbx_diffrn_protocol` is `LAUE` for the same `diffrn_id`. Expect: procedural error that the wavelength list must not be empty (may appear together with parent-category checks if `diffrn` is absent).
test_procedural_diffrn_wavelength_invalid_empty_list_single.cif	Same as above for protocol `SINGLE WAVELENGTH`. Expect: procedural empty-list error.
test_procedural_diffrn_wavelength_edge_empty_mismatched_diffrn_id.cif	Edge case: empty wavelength on `diffrn_id` 1 but LAUE protocol only on a different `diffrn_id`. Expect: no procedural empty-list error (no matching radiation row for that id).
test_procedural_database_related_invalid_pdb_id.cif	Procedural validator migration: `pdbx_database_related.db_id` format check for `db_name=PDB`. Expect: error for invalid PDB/deposition accession format.
test_procedural_database_related_valid_pdb_id.cif	Procedural validator positive case for `pdbx_database_related.db_id`. Expect: no procedural accession-format error for valid PDB ID.
test_procedural_database_related_edge_non_target_db.cif	Procedural validator edge case for non-target `db_name` values. Expect: no procedural accession-format error when `db_name` is not one of the configured procedural checks.
test_procedural_struct_ref_seq_invalid_genbank_accession.cif	Procedural validator migration: `pdbx_struct_ref_seq_depositor_info.db_accession` format for `db_name=GB`. Expect: error for invalid GenBank accession format.
test_procedural_struct_ref_seq_valid_genbank_accession.cif	Procedural validator positive case for `pdbx_struct_ref_seq_depositor_info.db_accession`. Expect: no procedural accession-format error for valid GenBank accession format.
test_procedural_struct_ref_seq_edge_empty_accession.cif	Procedural validator edge case for optional `db_accession`. Expect: no procedural accession-format error when accession is missing (`?`).
test_procedural_struct_ref_seq_invalid_uniprot_accession.cif	Procedural validator migration: `pdbx_struct_ref_seq_depositor_info.db_accession` format for `db_name=UNP`. Expect: error for invalid UniProt accession format.
test_procedural_struct_ref_seq_valid_uniprot_accession.cif	Procedural validator positive case for `db_name=UNP`. Expect: no procedural accession-format error for valid UniProt accession format.
test_procedural_initial_refinement_invalid_pdb_accession.cif	Procedural validator migration: conditional accession format for `pdbx_initial_refinement_model` when type is `experimental model` and source is `PDB`. Expect: error for invalid PDB accession format.
test_procedural_initial_refinement_valid_pdb_accession.cif	Procedural validator positive case for `pdbx_initial_refinement_model` (`experimental model` + `PDB`). Expect: no procedural accession-format error for valid PDB accession format.
test_procedural_initial_refinement_edge_non_matching_condition.cif	Procedural validator edge case for condition-gated rule. Expect: no procedural accession-format error when row does not match configured condition (e.g. source `Other`).
test_procedural_initial_refinement_invalid_pdbdev_accession.cif	Procedural validator migration: conditional accession format for `pdbx_initial_refinement_model` when type is `integrative model` and source is `PDB-Dev`. Expect: error for invalid PDB-Dev accession format.
test_procedural_initial_refinement_valid_pdbdev_accession.cif	Procedural validator positive case for `integrative model` + `PDB-Dev`. Expect: no procedural accession-format error for valid PDB-Dev accession format.
test_procedural_initial_refinement_invalid_alphafold_accession.cif	Procedural validator migration for `in silico model` + `AlphaFold`. Expect: error for invalid AlphaFold accession format.
test_procedural_initial_refinement_valid_alphafold_accession.cif	Procedural validator positive case for `in silico model` + `AlphaFold`. Expect: no procedural accession-format error for valid AlphaFold accession format.
test_procedural_initial_refinement_invalid_modelarchive_accession.cif	Procedural validator migration for `in silico model` + `ModelArchive`. Expect: error for invalid ModelArchive accession format.
test_procedural_initial_refinement_valid_modelarchive_accession.cif	Procedural validator positive case for `in silico model` + `ModelArchive`. Expect: no procedural accession-format error for valid ModelArchive accession format.
test_procedural_initial_refinement_invalid_integrative_source_name.cif	Procedural validator migration: for `pdbx_initial_refinement_model` with `type=integrative model`, `source_name` must be `PDB-Dev`. Expect: error when source name is not `PDB-Dev`.
test_procedural_initial_refinement_valid_integrative_source_name.cif	Procedural validator positive case for integrative source-name rule. Expect: no procedural source-name error when `source_name` is `PDB-Dev`.
test_procedural_initial_refinement_edge_non_integrative_source_name.cif	Procedural validator edge case for condition-gated source-name rule. Expect: no procedural source-name error when `type` is not `integrative model`.
test_procedural_entity_poly_warning_homopolymer_ala.cif	Procedural validator migration: `entity_poly.pdbx_seq_one_letter_code` all ALA (homopolymer). Expect: warning (poly-ALA homopolymer guidance).
test_procedural_entity_poly_warning_stretch_ala.cif	Procedural validator migration: sequence contains ten consecutive `A` (poly-ALA stretch) but is not all-ALA. Expect: warning (stretch guidance).
test_procedural_entity_poly_edge_normal_sequence.cif	Procedural validator edge case: ordinary one-letter sequence with no poly-ALA homopolymer or 10+ `A` stretch. Expect: no procedural entity_poly sequence warnings.
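
As an illustration of one rule family from the table, the date-order cases (invalid order, same day, missing secondary date) can be expressed as a single comparison. This is a minimal sketch, not the validator's code; the function name date_order_violation is hypothetical, and treating `.` as missing alongside `?` is an assumption.

```python
from datetime import date

# mmCIF placeholders for unknown / inapplicable values.
MISSING = {"?", "."}


def date_order_violation(earlier: str, later: str) -> bool:
    """Return True when both dates are present and `earlier` falls after `later`.

    Dates are expected as yyyy-mm-dd. A missing value on either side skips the
    check, and equal dates pass (the rule is `earlier <= later`).
    """
    if earlier in MISSING or later in MISSING:
        return False
    return date.fromisoformat(earlier) > date.fromisoformat(later)
```

For example, passing `recvd_initial_deposition_date` as `earlier` and `date_coordinates` as `later` reproduces the three outcomes described for the deposition/coordinates rule pair.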

Output files

File	Purpose
`validation_baseline.txt`	Reference output; generate with `--generate-baseline`.
`validation_output.txt`	Output of the latest run; compare to baseline after code changes.

Paths in the output are normalized to <REPO> so that diffs are portable across machines.
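
The normalization amounts to replacing the absolute repository root with the literal token before writing the file. The sketch below shows one way to do it; the function name normalize_paths is hypothetical and the suite's own implementation may differ.

```python
from pathlib import Path


def normalize_paths(text: str, repo_root: Path) -> str:
    """Replace the absolute repository root with the <REPO> placeholder."""
    # Cover both the native spelling and the forward-slash spelling of the root,
    # since tools on Windows may print either.
    for variant in (str(repo_root), repo_root.as_posix()):
        text = text.replace(variant, "<REPO>")
    return text
```

This only rewrites the root prefix; any path separators after the placeholder are left as the platform produced them.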
