This directory contains the regression test suite for the mmCIF validator: test CIF files, the runner script, and generated output for comparison.

    testing/
    ├── README.md                   # This file
    ├── run_validation_suite.py     # Run validator on all CIFs, save/compare output
    ├── validation_baseline.txt     # Saved reference output (generate with --generate-baseline)
    ├── validation_output.txt       # Latest run output (compare to baseline)
    └── cif_files/                  # All test .cif files
        ├── 6ijw.cif ... 8q6j.cif   # Real PDB entries (method + metadata completeness)
        └── test_*.cif              # Synthetic tests (validation cases)

Run from the repository root:

    # Run validation on all CIFs; write results to testing/validation_output.txt
    python testing/run_validation_suite.py

    # Generate or refresh the baseline (do this once before code changes, or to accept new behaviour)
    python testing/run_validation_suite.py --generate-baseline

Or from the `testing/` directory:

    cd testing
    python run_validation_suite.py
    python run_validation_suite.py --generate-baseline

- Before changing validator code: Run with `--generate-baseline` to create `validation_baseline.txt`.
- Make your code changes.
- After changes: Run without `--generate-baseline` (writes to `validation_output.txt`).
- Compare: Diff the two files to see what changed.
  - Windows: `fc testing\validation_baseline.txt testing\validation_output.txt`
  - Linux/macOS: `diff testing/validation_baseline.txt testing/validation_output.txt`
- Review the diff: new or removed errors may be expected (e.g. fixes) or regressions.
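
As a cross-platform alternative to `fc`/`diff`, the comparison step can also be sketched in Python with the standard-library `difflib` module. This is only an illustrative sketch and not part of the suite: the helper name `diff_outputs` is hypothetical, and the paths in the usage comment assume the current directory is the repository root.

```python
import difflib
from pathlib import Path


def diff_outputs(baseline_path, output_path):
    """Return unified-diff lines between two text files (hypothetical helper)."""
    baseline = Path(baseline_path).read_text(encoding="utf-8").splitlines(keepends=True)
    latest = Path(output_path).read_text(encoding="utf-8").splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            baseline,
            latest,
            fromfile=str(baseline_path),
            tofile=str(output_path),
        )
    )


# Usage from the repository root (assumes both files already exist):
#   lines = diff_outputs("testing/validation_baseline.txt",
#                        "testing/validation_output.txt")
#   print("".join(lines) if lines else "No differences from baseline.")
```

An empty result means the latest run matches the baseline; lines starting with `-` were only in the baseline and lines starting with `+` are new in the latest run.
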
Optional arguments select a different dictionary, test folder, or output file:

    python testing/run_validation_suite.py --dict path/to/mmcif.dic --tests path/to/cif_folder
    python testing/run_validation_suite.py --generate-baseline -o path/to/my_baseline.txt

By default the suite uses the PDBx/mmCIF dictionary from http://mmcif.pdb.org/dictionaries/ascii/mmcif_pdbx.dic; you can override this with `--dict`.

The suite logs to stderr so you can redirect or diff the main output file without mixing in log lines. It logs:
- Dictionary source, test directory, and output file path
- Number of CIF files to process
- With `--verbose`/`-v`: each file as it is validated and its exit code (passed/failed)
- Final line: where output was written and how many files had validation issues

Example with verbose:

    python testing/run_validation_suite.py -v

The real PDB entries (`6ijw.cif` ... `8q6j.cif`) are full (or substantial) PDB mmCIF entries. They are used to check:
- Metadata completeness: The validator’s JSON output includes a `metadata_completeness` object (percentage, filled/total counts, missing categories, missing items). These files exercise that logic with realistic method-specific mandatory categories.
- Method recognition: The validator infers experimental method from which categories are present in the file (method-specific mandatory category lists). The reported `method_detected` in the JSON should match the method implied by the entry.

| File | _exptl.method | Expected method_detected | Purpose |
|---|---|---|---|
| 6ijw.cif | SOLUTION NMR | nmr | NMR method + completeness |
| 6qvt.cif | X-RAY DIFFRACTION | xray | X-ray method + completeness |
| 6ssp.cif | X-RAY DIFFRACTION | xray | X-ray method + completeness |
| 7q5a.cif | ELECTRON MICROSCOPY | em | EM method + completeness |
| 8ozl.cif | ELECTRON MICROSCOPY | em | EM method + completeness |
| 8pps.cif | X-RAY DIFFRACTION | xray | X-ray method + completeness |
| 8pwh.cif | ELECTRON MICROSCOPY | em | EM method + completeness |
| 8q6j.cif | ELECTRON MICROSCOPY | em | EM method + completeness |

Method detection is based on which categories exist in the file (from the completeness lists), not on the literal value of _exptl.method; the table above documents how these entries are expected to be classified.
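
The idea of category-based detection can be illustrated with a small sketch. It is not the validator's implementation: the function `detect_method` and the marker sets in `METHOD_MARKERS` are hypothetical placeholders, and the real method-specific lists come from the validator's completeness configuration.

```python
# Hypothetical marker categories per method. The validator's real lists are
# its method-specific mandatory category (completeness) lists.
METHOD_MARKERS = {
    "em": {"em_3d_reconstruction", "em_imaging"},
    "nmr": {"pdbx_nmr_ensemble", "pdbx_nmr_spectrometer"},
    "xray": {"diffrn", "reflns"},
}


def detect_method(categories_present):
    """Return the method whose marker categories overlap most, or None."""
    present = set(categories_present)
    best_method, best_hits = None, 0
    for method, markers in METHOD_MARKERS.items():
        hits = len(markers & present)
        if hits > best_hits:
            best_method, best_hits = method, hits
    return best_method


print(detect_method(["entry", "exptl", "em_imaging", "em_3d_reconstruction"]))  # em
print(detect_method(["entry", "exptl"]))  # None
```

Note that the sketch never reads `_exptl.method`; only the set of category names present in the file influences the result.
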
Each test_*.cif file is a small CIF chosen to trigger one or more specific validator behaviours (errors or warnings). Use these to confirm that the validator reports the right issue for each scenario.
| File | Case(s) covered |
|---|---|
| test_duplicate_item.cif | Same item appears twice in one block (e.g. _entry.id twice). Expect: duplicate item error. |
| test_duplicate_category.cif | Same category given in two separate blocks (e.g. two entity blocks). Expect: duplicate category error. |
| test_format_error_entity_poly.cif | Malformed entity_poly: loop with one data row followed by key–value pairs of the same category. Expect: duplicate category or format error once the parser records the loop block. |
| test_loop_row_mismatch.cif | Loop with wrong number of values in a row (e.g. two columns, second row has one value). Exercises loop parsing and may surface row-length or parsing errors. |
| test_multiple_data_blocks.cif | File contains two data_ blocks. Expect: only the first block is validated (parser stops at second data_). |
| test_value_out_of_range.cif | Item with type positive_int (e.g. _em_image_scans.dimension_height) set to 0. Expect: type/range error. |
| test_type_checks_pdb_id_and_date.cif | Invalid pdb_id-like value and invalid date format. Expect: type errors for the offending values. |
| test_enum_invalid_em_software.cif | _em_software.name value not in the dictionary enumeration (e.g. phaser_voyager.em_placement). Expect: enumeration error (once _pdbx_item_enumeration is parsed). |
| test_asym_id_valid_invalid.cif | _atom_site.label_asym_id / auth_asym_id with valid (e.g. A) and invalid (e.g. B:Axp) values. Expect: asym_id format errors for the invalid values when enforced. |
| test_mandatory_missing_item.cif | Category present but a mandatory item missing (e.g. entity without _entity.id). Expect: missing mandatory item error. |
| test_fk_missing_parent.cif | Child references non-existent parent (e.g. atom_site.label_asym_id = Z with no struct_asym.id = Z). Expect: foreign-key / parent-missing error. |
| test_composite_fk_mismatch.cif | Rows in atom_site that may violate composite key or parent–child consistency (e.g. label_asym_id + label_comp_id + label_seq_id). Exercises composite-FK logic. |
| test_undefined_items.cif | Item names not in the dictionary (e.g. _my_local_category.foo, _not_defined_item). Expect: undefined-item warnings/errors as implemented. |
| test_advisory_range_warning.cif | Value outside the advisory range (e.g. _exptl_crystal.density_Matthews = 10.0 vs the recommended range). Expect: advisory-range warning, not hard error. |
| test_multiline_and_quoted_values.cif | Loop containing multi-line text (semicolon-delimited) and quoted values with spaces. Exercises parsing of multi-line and quoted loop values. |
| test_cross_check_dictionary_enum.cif | Cross-item dictionary enumeration compatibility check (e.g. _diffrn_detector.type incompatible with _diffrn_detector.detector). Expect: cross-check error from dictionary detail mapping. |
| test_cross_check_conditional_refine_mr_starting_model_skipped_when_initial_refinement_present.cif | Conditional required: _refine.pdbx_method_to_determine_struct is molecular replacement and _pdbx_initial_refinement_model is present; _refine.pdbx_starting_model is ?. Expect: no cross-check error for pdbx_starting_model (superseded by pdbx_initial_refinement_model); file is otherwise minimal so validation can pass end-to-end. |
| test_cross_check_conditional_refine_mr_starting_model_required_without_initial_refinement.cif | Conditional required: molecular replacement without pdbx_initial_refinement_model. Expect: cross-check error requiring pdbx_starting_model when it is missing. |
| test_cross_check_date_order_invalid_coords_before_deposition.cif | Pairwise date order: _pdbx_database_status.recvd_initial_deposition_date must not be after date_coordinates. Expect: cross-check error. |
| test_cross_check_date_order_valid_deposition_coords.cif | Positive case: initial deposition on or before coordinates date. Expect: no date-order error from this rule pair. |
| test_cross_check_date_order_valid_same_day.cif | Edge case: same calendar day for both dates (<=). Expect: no date-order error. |
| test_cross_check_date_order_edge_missing_coords.cif | Edge case: date_coordinates missing (?). Expect: no date-order error when the secondary date is absent. |
| test_cross_check_date_order_invalid_begin_after_end.cif | Date order: date_begin_deposition must not be after date_end_processing. Expect: cross-check error. |
| test_cross_check_date_order_invalid_form_after_initial.cif | Date order: date_deposition_form must not be after recvd_initial_deposition_date. Expect: cross-check error. |
| test_cross_check_uniqueness_invalid_entity_id.cif | Uniqueness: two entity rows share the same _entity.id. Expect: duplicate-key error on each duplicate row (same message). |
| test_cross_check_uniqueness_valid_entity_ids.cif | Uniqueness positive case: two distinct _entity.id values. Expect: no duplicate-entity-id error. |
| test_cross_check_uniqueness_invalid_struct_asym_id.cif | Uniqueness: two _struct_asym rows share the same _struct_asym.id. Expect: duplicate-key errors. |
| test_cross_check_uniqueness_valid_struct_asym_ids.cif | Uniqueness positive case: distinct asym ids. Expect: no duplicate-asym-id error. |
| test_cross_check_uniqueness_invalid_entity_poly_entity_id.cif | Uniqueness: two _entity_poly rows with the same entity_id. Expect: duplicate-key errors. |
| test_cross_check_uniqueness_valid_entity_poly_entity_id.cif | Uniqueness positive case: one entity_poly row per entity. Expect: no duplicate-entity_poly error. |
| test_cross_check_make_mandatory_subtypes.cif | Subtype-gated required-item check for makeMandatorySubtypes (em_3d_reconstruction missing resolution_method). Expect: no subtype-specific error when subtype context is absent; error appears when subtype context includes EM-single_part or related subtype. |
| test_cross_check_cross_reference_selectors.cif | Selector-gated cross-reference check for cross_reference_full (expt: coded, code: PDB). Expect: selector rule skipped when code context is absent; cross-reference error appears when runtime context includes requested_codes=['PDB']. |
| test_procedural_diffrn_wavelength_invalid_single_for_laue.cif | Procedural validator migration: diffrn_source.pdbx_wavelength_list against diffrn_radiation.pdbx_diffrn_protocol. Expect: error when protocol is LAUE but wavelength list is a single value. |
| test_procedural_diffrn_wavelength_valid_single.cif | Procedural validator positive case. Expect: no procedural wavelength-list error when protocol is SINGLE WAVELENGTH and list has one value. |
| test_procedural_diffrn_wavelength_edge_missing.cif | Procedural validator edge case. Expect: no procedural wavelength-list error when wavelength value is missing (?). |
| test_procedural_diffrn_wavelength_invalid_empty_list_laue.cif | Procedural validator: pdbx_wavelength_list is an empty quoted value ('') while pdbx_diffrn_protocol is LAUE for the same diffrn_id. Expect: procedural error that the wavelength list must not be empty (may appear together with parent-category checks if diffrn is absent). |
| test_procedural_diffrn_wavelength_invalid_empty_list_single.cif | Same as above for protocol SINGLE WAVELENGTH. Expect: procedural empty-list error. |
| test_procedural_diffrn_wavelength_edge_empty_mismatched_diffrn_id.cif | Edge case: empty wavelength on diffrn_id 1 but LAUE protocol only on a different diffrn_id. Expect: no procedural empty-list error (no matching radiation row for that id). |
| test_procedural_database_related_invalid_pdb_id.cif | Procedural validator migration: pdbx_database_related.db_id format check for db_name=PDB. Expect: error for invalid PDB/deposition accession format. |
| test_procedural_database_related_valid_pdb_id.cif | Procedural validator positive case for pdbx_database_related.db_id. Expect: no procedural accession-format error for valid PDB ID. |
| test_procedural_database_related_edge_non_target_db.cif | Procedural validator edge case for non-target db_name values. Expect: no procedural accession-format error when db_name is not one of the configured procedural checks. |
| test_procedural_struct_ref_seq_invalid_genbank_accession.cif | Procedural validator migration: pdbx_struct_ref_seq_depositor_info.db_accession format for db_name=GB. Expect: error for invalid GenBank accession format. |
| test_procedural_struct_ref_seq_valid_genbank_accession.cif | Procedural validator positive case for pdbx_struct_ref_seq_depositor_info.db_accession. Expect: no procedural accession-format error for valid GenBank accession format. |
| test_procedural_struct_ref_seq_edge_empty_accession.cif | Procedural validator edge case for optional db_accession. Expect: no procedural accession-format error when accession is missing (?). |
| test_procedural_struct_ref_seq_invalid_uniprot_accession.cif | Procedural validator migration: pdbx_struct_ref_seq_depositor_info.db_accession format for db_name=UNP. Expect: error for invalid UniProt accession format. |
| test_procedural_struct_ref_seq_valid_uniprot_accession.cif | Procedural validator positive case for db_name=UNP. Expect: no procedural accession-format error for valid UniProt accession format. |
| test_procedural_initial_refinement_invalid_pdb_accession.cif | Procedural validator migration: conditional accession format for pdbx_initial_refinement_model when type is experimental model and source is PDB. Expect: error for invalid PDB accession format. |
| test_procedural_initial_refinement_valid_pdb_accession.cif | Procedural validator positive case for pdbx_initial_refinement_model (experimental model + PDB). Expect: no procedural accession-format error for valid PDB accession format. |
| test_procedural_initial_refinement_edge_non_matching_condition.cif | Procedural validator edge case for condition-gated rule. Expect: no procedural accession-format error when row does not match configured condition (e.g. source Other). |
| test_procedural_initial_refinement_invalid_pdbdev_accession.cif | Procedural validator migration: conditional accession format for pdbx_initial_refinement_model when type is integrative model and source is PDB-Dev. Expect: error for invalid PDB-Dev accession format. |
| test_procedural_initial_refinement_valid_pdbdev_accession.cif | Procedural validator positive case for integrative model + PDB-Dev. Expect: no procedural accession-format error for valid PDB-Dev accession format. |
| test_procedural_initial_refinement_invalid_alphafold_accession.cif | Procedural validator migration for in silico model + AlphaFold. Expect: error for invalid AlphaFold accession format. |
| test_procedural_initial_refinement_valid_alphafold_accession.cif | Procedural validator positive case for in silico model + AlphaFold. Expect: no procedural accession-format error for valid AlphaFold accession format. |
| test_procedural_initial_refinement_invalid_modelarchive_accession.cif | Procedural validator migration for in silico model + ModelArchive. Expect: error for invalid ModelArchive accession format. |
| test_procedural_initial_refinement_valid_modelarchive_accession.cif | Procedural validator positive case for in silico model + ModelArchive. Expect: no procedural accession-format error for valid ModelArchive accession format. |
| test_procedural_initial_refinement_invalid_integrative_source_name.cif | Procedural validator migration: for pdbx_initial_refinement_model with type=integrative model, source_name must be PDB-Dev. Expect: error when source name is not PDB-Dev. |
| test_procedural_initial_refinement_valid_integrative_source_name.cif | Procedural validator positive case for integrative source-name rule. Expect: no procedural source-name error when source_name is PDB-Dev. |
| test_procedural_initial_refinement_edge_non_integrative_source_name.cif | Procedural validator edge case for condition-gated source-name rule. Expect: no procedural source-name error when type is not integrative model. |
| test_procedural_entity_poly_warning_homopolymer_ala.cif | Procedural validator migration: entity_poly.pdbx_seq_one_letter_code all ALA (homopolymer). Expect: warning (poly-ALA homopolymer guidance). |
| test_procedural_entity_poly_warning_stretch_ala.cif | Procedural validator migration: sequence contains ten consecutive A (poly-ALA stretch) but is not all-ALA. Expect: warning (stretch guidance). |
| test_procedural_entity_poly_edge_normal_sequence.cif | Procedural validator edge case: ordinary one-letter sequence with no poly-ALA homopolymer or 10+ A stretch. Expect: no procedural entity_poly sequence warnings. |
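
The pairwise date-order cases in the table (invalid order, same day, missing date) can be summarized in a short sketch. This is an illustration under stated assumptions, not the validator's code: the name `date_order_ok` is hypothetical, dates are assumed to be well-formed `YYYY-MM-DD` strings (format problems belong to the type checks), and skipping the rule when either date is missing generalizes the documented missing-`date_coordinates` case.

```python
from datetime import date

# CIF placeholders for unknown (?) and inapplicable (.) values.
MISSING = {"?", "."}


def date_order_ok(earlier, later):
    """Return True unless both dates are present and `earlier` is after `later`."""
    if earlier in MISSING or later in MISSING:
        # Assumption: the rule is skipped when either date is absent.
        return True
    # Same-day values pass because the comparison is <=, not <.
    return date.fromisoformat(earlier) <= date.fromisoformat(later)


print(date_order_ok("2023-05-01", "2023-05-02"))  # True
print(date_order_ok("2023-05-01", "2023-05-01"))  # True
print(date_order_ok("2023-05-02", "2023-05-01"))  # False
print(date_order_ok("2023-05-02", "?"))           # True
```
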

| File | Purpose |
|---|---|
| validation_baseline.txt | Reference output; generate with --generate-baseline. |
| validation_output.txt | Output of the latest run; compare to baseline after code changes. |

Paths in the output are normalized to <REPO> so that diffs are portable across machines.
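
A minimal sketch of this kind of normalization, assuming a plain string replacement of the repository root, is shown here. The function name `normalize_paths` and the sample line are hypothetical; the suite's actual normalization may differ (for example in how it handles Windows path separators).

```python
def normalize_paths(text, repo_root):
    """Replace the absolute repository root in `text` with the <REPO> placeholder."""
    root = str(repo_root).rstrip("/\\")
    if not root:
        # Avoid replacing the empty string, which would match everywhere.
        return text
    return text.replace(root, "<REPO>")


# Hypothetical line containing an absolute path:
line = "Validating /home/alice/validator/testing/cif_files/6ijw.cif"
print(normalize_paths(line, "/home/alice/validator"))
# Validating <REPO>/testing/cif_files/6ijw.cif
```
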