Common workflows for MediaIngredientMech, including data import, curation, validation, and CultureMech integration.
just install
This installs the package in editable mode with dev dependencies.
just gen-schema
Generates Python dataclasses from the LinkML schema into src/mediaingredientmech/datamodel/.
just import-data
This runs scripts/import_from_culturemech.py, which:
- Reads mapped ingredients from CultureMech/output/mapped_ingredients.yaml (995 records)
- Reads unmapped ingredients from CultureMech/output/unmapped_ingredients.yaml (136 records)
- Converts each ingredient into an IngredientRecord with:
  - Mapped ingredients: mapping_status: MAPPED, populated ontology_mapping
  - Unmapped ingredients: mapping_status: UNMAPPED, no ontology_mapping
- Aggregates synonyms from raw text variants
- Populates occurrence statistics from media recipe counts
- Creates initial CurationEvent with action: IMPORTED
- Writes output to data/curated/ingredients.yaml
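The conversion step listed above could look roughly like the sketch below. It is not the actual import script: the function name `to_ingredient_record` and the input keys (`raw_variants`, `media_count`, `ontology_id`, `ontology_label`) are assumptions made for illustration, and the real CultureMech files may use a different layout.

```python
from datetime import datetime, timezone


def to_ingredient_record(raw: dict, mapped: bool) -> dict:
    """Build an IngredientRecord-like dict from one CultureMech entry.

    The input keys read here are hypothetical; adjust them to the real files.
    """
    status = "MAPPED" if mapped else "UNMAPPED"
    record = {
        "preferred_term": raw["preferred_term"],
        "mapping_status": status,
        # Aggregate synonyms from raw text variants, dropping duplicates
        # and the preferred term itself.
        "synonyms": [
            {"synonym_text": text, "source": "CultureMech"}
            for text in dict.fromkeys(raw.get("raw_variants", []))
            if text != raw["preferred_term"]
        ],
        "occurrence_statistics": {"media_count": raw.get("media_count", 0)},
        # Every imported record starts with a single IMPORTED event.
        "curation_history": [
            {
                "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
                "curator": "system",
                "action": "IMPORTED",
                "new_status": status,
            }
        ],
    }
    if mapped:
        # Only mapped ingredients carry an ontology_mapping block.
        record["identifier"] = raw["ontology_id"]
        record["ontology_mapping"] = {
            "ontology_id": raw["ontology_id"],
            "ontology_label": raw["ontology_label"],
        }
    return record
```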
After import, data/curated/ingredients.yaml contains an IngredientCollection:
generation_date: "2026-03-06T10:00:00"
total_count: 1131
mapped_count: 995
unmapped_count: 136
ingredients:
  - identifier: CHEBI:26710
    preferred_term: sodium chloride
    mapping_status: MAPPED
    ontology_mapping:
      ontology_id: CHEBI:26710
      ontology_label: sodium chloride
      ontology_source: CHEBI
      mapping_quality: EXACT_MATCH
    synonyms:
      - synonym_text: NaCl
        synonym_type: ABBREVIATION
        source: CultureMech
    occurrence_statistics:
      total_occurrences: 2341
      media_count: 2341
    curation_history:
      - timestamp: "2026-03-06T10:00:00"
        curator: system
        action: IMPORTED
        changes: "Imported from CultureMech mapped_ingredients.yaml"
        new_status: MAPPED
  # ... more records
Running just import-data again overwrites existing data. Always create a snapshot first:
just snapshot
just import-data
A typical curation session follows this pattern:
just report
Review the progress report to see how many ingredients remain unmapped and which categories need attention.
just snapshot
Output:
Snapshot created: data/snapshots/20260306_103000
just curate
The interactive CLI presents unmapped ingredients sorted by occurrence count. See the Curation Guide for detailed instructions.
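As a rough illustration of that ordering, the sketch below selects UNMAPPED records and sorts them so the most frequent ingredients come first. The function name `curation_queue` is hypothetical, and using `total_occurrences` as the sort key is an assumption based on the record layout shown earlier; the real CLI may order differently.

```python
def curation_queue(records: list[dict]) -> list[dict]:
    """Return UNMAPPED records, most frequently occurring first."""
    unmapped = [r for r in records if r.get("mapping_status") == "UNMAPPED"]
    return sorted(
        unmapped,
        # Records without statistics sort last (treated as zero occurrences).
        key=lambda r: r.get("occurrence_statistics", {}).get("total_occurrences", 0),
        reverse=True,
    )
```

Working through the most common ingredients first means each curated mapping covers the largest possible number of media recipes.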
just validate-all
This runs scripts/validate_all.py, which checks:
- Schema compliance (all required fields present, correct types)
- Ontology ID format validation (^[A-Z]+:[0-9]+$)
- Ontology term existence (via OAK/OLS if configured)
- Status consistency (MAPPED records have ontology_mapping, UNMAPPED do not)
Expected output:
Validating data/curated/ingredients.yaml...
Checked 1131 records.
Errors: 0
Warnings: 2
- WARN: CHEBI:99999 not found in CHEBI (record: CHEBI:99999)
- WARN: Missing occurrence_statistics for 3 records
just report
This runs scripts/generate_report.py and shows curation progress statistics.
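The exact contents of the report are not described here, but the headline numbers can be derived from the records alone. The sketch below is an assumption about what such a summary might compute, not the logic of scripts/generate_report.py; `progress_summary` and the `percent_mapped` field are hypothetical names.

```python
from collections import Counter


def progress_summary(records: list[dict]) -> dict:
    """Count records per mapping status and compute the mapped percentage."""
    counts = Counter(r.get("mapping_status", "UNKNOWN") for r in records)
    total = sum(counts.values())
    mapped = counts.get("MAPPED", 0)
    return {
        "total_count": total,
        "mapped_count": mapped,
        "unmapped_count": counts.get("UNMAPPED", 0),
        # Guard against division by zero for an empty collection.
        "percent_mapped": round(100 * mapped / total, 1) if total else 0.0,
    }
```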
If validation passes, commit the curated data:
git add data/curated/
git commit -m "Curate batch: mapped 15 ingredients"

just validate-all

just validate-schema
Checks that the LinkML schema itself is syntactically valid.
| Check | Description | Severity |
|---|---|---|
| Required fields | All required fields are present | Error |
| Type checking | Field values match declared types | Error |
| Enum values | Enum fields contain valid values | Error |
| Ontology ID format | IDs match ^[A-Z]+:[0-9]+$ | Error |
| Status consistency | MAPPED records have ontology_mapping | Error |
| Ontology term existence | Term exists in source ontology | Warning |
| Missing statistics | Records without occurrence_statistics | Warning |
| Orphan synonyms | Synonyms without occurrence counts | Warning |
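To make the error/warning split in the table concrete, here is a minimal sketch of three of those checks applied to a single record. It is an illustration under assumptions, not the code in scripts/validate_all.py: `validate_record` is a hypothetical name, and schema, enum, ontology-lookup, and synonym checks are left out.

```python
import re

# Same pattern as in the table above.
ONTOLOGY_ID_PATTERN = re.compile(r"^[A-Z]+:[0-9]+$")


def validate_record(record: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one ingredient record."""
    errors: list[str] = []
    warnings: list[str] = []
    status = record.get("mapping_status")
    mapping = record.get("ontology_mapping")

    if status == "MAPPED":
        if not mapping:
            # Status consistency: MAPPED requires an ontology_mapping.
            errors.append("MAPPED record has no ontology_mapping")
        else:
            ontology_id = mapping.get("ontology_id") or ""
            # fullmatch avoids accepting an ID followed by a trailing newline.
            if not ONTOLOGY_ID_PATTERN.fullmatch(ontology_id):
                errors.append(f"Malformed ontology ID: {ontology_id!r}")
    elif status == "UNMAPPED" and mapping:
        errors.append("UNMAPPED record has an ontology_mapping")

    if "occurrence_statistics" not in record:
        # Missing statistics are reported but do not fail validation.
        warnings.append("Missing occurrence_statistics")
    return errors, warnings
```

Keeping errors and warnings in separate lists lets a caller fail the run on errors only, which matches the severities in the table.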
just snapshot
Snapshots are stored in data/snapshots/<timestamp>/ and contain copies of all files in data/curated/. The snapshots directory is excluded from git.
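For readers who want to see what such a snapshot amounts to, the following sketch copies a curated directory into a timestamped folder. It is an assumption about the behaviour, not the project's snapshot recipe; `create_snapshot` is a hypothetical name, and the timestamp format is inferred from the directory names shown in this guide.

```python
import shutil
from datetime import datetime
from pathlib import Path


def create_snapshot(curated_dir: Path, snapshots_dir: Path) -> Path:
    """Copy everything in curated_dir into a new timestamped snapshot folder."""
    # Folder names such as 20260306_103000 suggest this format.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = snapshots_dir / stamp
    # copytree creates missing parent directories and raises FileExistsError
    # if the target already exists, so a snapshot is never overwritten.
    shutil.copytree(curated_dir, target)
    return target
```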
ls data/snapshots/
Output:
20260305_090000/
20260306_103000/
20260306_143000/
To restore data from a previous snapshot:
cp data/snapshots/20260306_103000/*.yaml data/curated/
just validate-all
Always validate after restoring to confirm data integrity.
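The restore step can also be scripted. The sketch below picks the most recent snapshot by name and copies its YAML files back, mirroring the cp command above. `restore_latest_snapshot` is a hypothetical helper, not part of the project, and validation still has to be run afterwards.

```python
import shutil
from pathlib import Path


def restore_latest_snapshot(snapshots_dir: Path, curated_dir: Path) -> Path:
    """Copy the YAML files from the most recent snapshot into curated_dir."""
    # YYYYMMDD_HHMMSS names sort chronologically as plain strings.
    snapshots = sorted(
        (p for p in snapshots_dir.iterdir() if p.is_dir()),
        key=lambda p: p.name,
    )
    if not snapshots:
        raise FileNotFoundError(f"No snapshots found in {snapshots_dir}")
    latest = snapshots[-1]
    for source in latest.glob("*.yaml"):
        # Existing files with the same name in curated_dir are overwritten.
        shutil.copy2(source, curated_dir / source.name)
    return latest
```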
- Create a snapshot before each curation session
- Create a snapshot before re-importing data
- Commit curated data to git regularly
- Git provides the primary version history; snapshots provide quick rollback within a session
CultureMech                                    MediaIngredientMech
-----------                                    -------------------
output/mapped_ingredients.yaml    --import-->  data/curated/ingredients.yaml
output/unmapped_ingredients.yaml  --import-->  (curate and validate)
                                               data/curated/ingredients.yaml
input/ingredient_mappings.yaml    <--export--  (validated mappings)
just import-data
Reads from the CultureMech output directory and creates IngredientRecords. See the Data Import section above for details.
After curating and validating ingredients, export the mappings back to CultureMech format:
python scripts/export_to_culturemech.py
This produces a file compatible with CultureMech's expected input format, containing:
- All MAPPED ingredients with their ontology IDs and labels
- Updated synonym lists
- Mapping quality metadata
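The selection behind that export can be sketched as follows. CultureMech's actual input format is not specified in this guide, so the output keys below are placeholders chosen for illustration, and `export_mappings` is a hypothetical function rather than the one in scripts/export_to_culturemech.py.

```python
def export_mappings(records: list[dict]) -> list[dict]:
    """Collect ontology mappings for MAPPED records only.

    The output layout is hypothetical; adapt it to CultureMech's real format.
    """
    exported = []
    for record in records:
        if record.get("mapping_status") != "MAPPED":
            continue  # UNMAPPED ingredients are not exported
        mapping = record["ontology_mapping"]
        exported.append(
            {
                "preferred_term": record["preferred_term"],
                "ontology_id": mapping["ontology_id"],
                "ontology_label": mapping["ontology_label"],
                "mapping_quality": mapping.get("mapping_quality"),
                "synonyms": [s["synonym_text"] for s in record.get("synonyms", [])],
            }
        )
    return exported
```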
When CultureMech data is updated (new media recipes added, new ingredients discovered):
- Create a snapshot of current curated data
- Re-run import: just import-data
- The import script preserves existing curation history for known ingredients
- New ingredients appear as UNMAPPED
- Validate and curate the new entries
- Export updated mappings back to CultureMech
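One way to picture the "preserve history for known ingredients" step is a merge keyed on the ingredient name. The sketch below is a guess at the idea, not the import script's algorithm: matching on `preferred_term`, refreshing only `occurrence_statistics`, and the name `merge_import` are all assumptions.

```python
def merge_import(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge freshly imported records into an already curated collection."""
    # Assumption: preferred_term identifies an ingredient across imports.
    known = {r["preferred_term"]: r for r in existing}
    merged = []
    for new in incoming:
        old = known.get(new["preferred_term"])
        if old is None:
            # Newly discovered ingredient: taken as imported.
            merged.append(new)
            continue
        # Known ingredient: keep the curated record, including its
        # curation_history and mapping, and refresh only the statistics.
        updated = dict(old)
        if "occurrence_statistics" in new:
            updated["occurrence_statistics"] = new["occurrence_statistics"]
        merged.append(updated)
    return merged
```

Note that this sketch drops curated records that no longer appear in the incoming data; whether the real import does the same is not stated here, which is one more reason to take a snapshot first.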
MediaIngredientMech expects CultureMech data at a sibling path:
KG-Microbe/
  CultureMech/
    output/
      mapped_ingredients.yaml
      unmapped_ingredients.yaml
  MediaIngredientMech/
    data/curated/
      ingredients.yaml
If CultureMech is in a different location, set the CULTUREMECH_DIR environment variable before importing.
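A lookup of this kind is typically a few lines. The sketch below shows one plausible resolution order, with the environment variable taking precedence over the sibling directory. The function name `culturemech_dir` and the exact fallback logic are assumptions, not taken from the import script.

```python
import os
from pathlib import Path


def culturemech_dir(project_root: Path) -> Path:
    """Locate the CultureMech checkout used as the import source."""
    override = os.environ.get("CULTUREMECH_DIR")
    if override:
        # An explicitly configured location wins over the default layout.
        return Path(override).expanduser()
    # Default: CultureMech sits next to the MediaIngredientMech checkout.
    return project_root.parent / "CultureMech"
```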
just test # Run all tests
just test-cov # Run tests with coverage report
just format # Format code with black
just lint # Lint code with ruff
just typecheck # Type check with mypy
just check # Run all quality checks (lint + typecheck + test)
just clean # Remove generated files, caches, and build artifacts
- Curation Guide - Detailed curation instructions and quality standards
- Schema Reference - Complete data model documentation