Workflows

Common workflows for MediaIngredientMech, including data import, curation, validation, and CultureMech integration.

Initial Setup

Install Dependencies

just install

This installs the package in editable mode with dev dependencies.

Generate Schema Dataclasses

just gen-schema

Generates Python dataclasses from the LinkML schema into src/mediaingredientmech/datamodel/.

Data Import

Import from CultureMech

just import-data

This runs scripts/import_from_culturemech.py, which:

Reads mapped ingredients from CultureMech/output/mapped_ingredients.yaml (995 records)
Reads unmapped ingredients from CultureMech/output/unmapped_ingredients.yaml (136 records)
Converts each ingredient into an IngredientRecord with:
- Mapped ingredients: mapping_status: MAPPED, populated ontology_mapping
- Unmapped ingredients: mapping_status: UNMAPPED, no ontology_mapping
Aggregates synonyms from raw text variants
Populates occurrence statistics from media recipe counts
Creates initial CurationEvent with action: IMPORTED
Writes output to data/curated/ingredients.yaml

Expected Output Structure

After import, data/curated/ingredients.yaml contains an IngredientCollection:

generation_date: "2026-03-06T10:00:00"
total_count: 1131
mapped_count: 995
unmapped_count: 136
ingredients:
  - identifier: CHEBI:26710
    preferred_term: sodium chloride
    mapping_status: MAPPED
    ontology_mapping:
      ontology_id: CHEBI:26710
      ontology_label: sodium chloride
      ontology_source: CHEBI
      mapping_quality: EXACT_MATCH
    synonyms:
      - synonym_text: NaCl
        synonym_type: ABBREVIATION
        source: CultureMech
    occurrence_statistics:
      total_occurrences: 2341
      media_count: 2341
    curation_history:
      - timestamp: "2026-03-06T10:00:00"
        curator: system
        action: IMPORTED
        changes: "Imported from CultureMech mapped_ingredients.yaml"
        new_status: MAPPED
  # ... more records

Re-importing

Running just import-data again overwrites existing data. Always create a snapshot first:

just snapshot
just import-data

Batch Curation Workflow

A typical curation session follows this pattern:

Step 1: Check Current Status

just report

Review the progress report to see how many ingredients remain unmapped and which categories need attention.

Step 2: Create a Snapshot

just snapshot

Output:

Snapshot created: data/snapshots/20260306_103000

Step 3: Curate Ingredients

just curate

The interactive CLI presents unmapped ingredients sorted by occurrence count. See the Curation Guide for detailed instructions.

Step 4: Validate Changes

just validate-all

This runs scripts/validate_all.py, which checks:

Schema compliance (all required fields present, correct types)
Ontology ID format validation (^[A-Z]+:[0-9]+$)
Ontology term existence (via OAK/OLS if configured)
Status consistency (MAPPED records have ontology_mapping, UNMAPPED do not)

Expected output:

Validating data/curated/ingredients.yaml...
Checked 1131 records.
Errors: 0
Warnings: 2
  - WARN: CHEBI:99999 not found in CHEBI (record: CHEBI:99999)
  - WARN: Missing occurrence_statistics for 3 records

Step 5: Generate Report

just report

This runs scripts/generate_report.py and shows curation progress statistics.

Step 6: Commit Changes

If validation passes, commit the curated data:

git add data/curated/
git commit -m "Curate batch: mapped 15 ingredients"

Validation

Validate All Data

just validate-all

Validate Schema Only

just validate-schema

Checks that the LinkML schema itself is syntactically valid.

What Gets Validated

Check	Description	Severity
Required fields	All required fields are present	Error
Type checking	Field values match declared types	Error
Enum values	Enum fields contain valid values	Error
Ontology ID format	IDs match `^[A-Z]+:[0-9]+$`	Error
Status consistency	MAPPED records have ontology_mapping	Error
Ontology term existence	Term exists in source ontology	Warning
Missing statistics	Records without occurrence_statistics	Warning
Orphan synonyms	Synonyms without occurrence counts	Warning

Backup and Restore

Creating Snapshots

just snapshot

Snapshots are stored in data/snapshots/<timestamp>/ and contain copies of all files in data/curated/. The snapshots directory is excluded from git.

Listing Snapshots

ls data/snapshots/

Output:

20260305_090000/
20260306_103000/
20260306_143000/

Restoring from a Snapshot

To restore data from a previous snapshot:

cp data/snapshots/20260306_103000/*.yaml data/curated/
just validate-all

Always validate after restoring to confirm data integrity.

Backup Strategy

Create a snapshot before each curation session
Create a snapshot before re-importing data
Commit curated data to git regularly
Git provides the primary version history; snapshots provide quick rollback within a session

CultureMech Integration

Data Flow

CultureMech                    MediaIngredientMech
-----------                    -------------------
output/mapped_ingredients.yaml   --import-->  data/curated/ingredients.yaml
output/unmapped_ingredients.yaml --import-->       (curate and validate)
                                              data/curated/ingredients.yaml
input/ingredient_mappings.yaml <--export--        (validated mappings)

Import from CultureMech

just import-data

Reads from the CultureMech output directory and creates IngredientRecords. See the Data Import section above for details.

Round-Trip Export to CultureMech

After curating and validating ingredients, export the mappings back to CultureMech format:

python scripts/export_to_culturemech.py

This produces a file compatible with CultureMech's expected input format, containing:

All MAPPED ingredients with their ontology IDs and labels
Updated synonym lists
Mapping quality metadata

Keeping Data in Sync

When CultureMech data is updated (new media recipes added, new ingredients discovered):

Create a snapshot of current curated data
Re-run import: just import-data
The import script preserves existing curation history for known ingredients
New ingredients appear as UNMAPPED
Validate and curate the new entries
Export updated mappings back to CultureMech

Directory Conventions

MediaIngredientMech expects CultureMech data at a sibling path:

KG-Microbe/
  CultureMech/
    output/
      mapped_ingredients.yaml
      unmapped_ingredients.yaml
  MediaIngredientMech/
    data/curated/
      ingredients.yaml

If CultureMech is in a different location, set the CULTUREMECH_DIR environment variable before importing.

Development Workflows

Running Tests

just test          # Run all tests
just test-cov      # Run tests with coverage report

Code Quality

just format        # Format code with black
just lint          # Lint code with ruff
just typecheck     # Type check with mypy
just check         # Run all quality checks (lint + typecheck + test)

Cleaning Up

just clean         # Remove generated files, caches, and build artifacts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workflows

Initial Setup

Install Dependencies

Generate Schema Dataclasses

Data Import

Import from CultureMech

Expected Output Structure

Re-importing

Batch Curation Workflow

Step 1: Check Current Status

Step 2: Create a Snapshot

Step 3: Curate Ingredients

Step 4: Validate Changes

Step 5: Generate Report

Step 6: Commit Changes

Validation

Validate All Data

Validate Schema Only

What Gets Validated

Backup and Restore

Creating Snapshots

Listing Snapshots

Restoring from a Snapshot

Backup Strategy

CultureMech Integration

Data Flow

Import from CultureMech

Round-Trip Export to CultureMech

Keeping Data in Sync

Directory Conventions

Development Workflows

Running Tests

Code Quality

Cleaning Up

Related Documentation

FilesExpand file tree

WORKFLOWS.md

Latest commit

History

WORKFLOWS.md

File metadata and controls

Workflows

Initial Setup

Install Dependencies

Generate Schema Dataclasses

Data Import

Import from CultureMech

Expected Output Structure

Re-importing

Batch Curation Workflow

Step 1: Check Current Status

Step 2: Create a Snapshot

Step 3: Curate Ingredients

Step 4: Validate Changes

Step 5: Generate Report

Step 6: Commit Changes

Validation

Validate All Data

Validate Schema Only

What Gets Validated

Backup and Restore

Creating Snapshots

Listing Snapshots

Restoring from a Snapshot

Backup Strategy

CultureMech Integration

Data Flow

Import from CultureMech

Round-Trip Export to CultureMech

Keeping Data in Sync

Directory Conventions

Development Workflows

Running Tests

Code Quality

Cleaning Up

Related Documentation