
Add environmental_context field to ingredient schema for cross-repo environmental linking #1

@realmarcin

Summary

Add an environmental_context field to MediaIngredientMech ingredient records to enable environmental linking with CommunityMech and CultureMech.

Background

Triggered by: CommunityMech issue #24 (SPRUCE Peatland Community addition)

Currently:

This creates a gap where ingredients cannot be linked to environmental contexts.

Proposed Schema Addition

Add an optional, multivalued environmental_context field to ingredient records:

# In mapped_ingredients_schema.yaml
slots:
  environmental_context:
    description: Environmental contexts where this ingredient is relevant
    range: EnvironmentContext
    multivalued: true
    inlined_as_list: true
    # optional; also list this slot on the ingredient record class

classes:
  EnvironmentContext:
    description: Environmental context for ingredient usage
    attributes:
      environment_term:
        description: ENVO term for the environment (CURIE format)
        range: string
        required: true
        pattern: "^ENVO:\\d{7,8}$"

      relevance:
        description: Why this ingredient is relevant to this environment
        range: EnvironmentRelevanceEnum
        required: true

      notes:
        description: Additional context
        range: string

enums:
  EnvironmentRelevanceEnum:
    permissible_values:
      NATURAL_SOURCE:
        description: Ingredient is naturally sourced from this environment
      REQUIRED_FOR_ORGANISM:
        description: Required for cultivating organisms from this environment
      SELECTIVE_AGENT:
        description: Selectively promotes organisms from this environment
      ENVIRONMENT_MIMIC:
        description: Mimics chemical conditions of this environment
      COMMONLY_USED:
        description: Commonly used for this environment (general association)
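To make the validation rules concrete, here is a minimal hand-written Python sketch of the EnvironmentContext class and EnvironmentRelevanceEnum. It is not the output of LinkML's Python generator; the names EnvironmentRelevance and ENVO_CURIE are illustrative assumptions. It enforces the same ENVO pattern and enum membership at construction time.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Same rule as the schema's pattern: "ENVO:" followed by 7 or 8 digits.
ENVO_CURIE = re.compile(r"ENVO:\d{7,8}")


class EnvironmentRelevance(str, Enum):
    """Mirrors the permissible values of EnvironmentRelevanceEnum."""

    NATURAL_SOURCE = "NATURAL_SOURCE"
    REQUIRED_FOR_ORGANISM = "REQUIRED_FOR_ORGANISM"
    SELECTIVE_AGENT = "SELECTIVE_AGENT"
    ENVIRONMENT_MIMIC = "ENVIRONMENT_MIMIC"
    COMMONLY_USED = "COMMONLY_USED"


@dataclass
class EnvironmentContext:
    environment_term: str
    relevance: EnvironmentRelevance
    notes: Optional[str] = None

    def __post_init__(self) -> None:
        # fullmatch rejects partial matches such as "ENVO:00000044x".
        if not ENVO_CURIE.fullmatch(self.environment_term):
            raise ValueError(f"not an ENVO CURIE: {self.environment_term!r}")
        # Coerce plain strings (e.g. read from YAML) to the enum;
        # unknown values raise ValueError.
        self.relevance = EnvironmentRelevance(self.relevance)


ctx = EnvironmentContext("ENVO:00000044", "NATURAL_SOURCE",
                         "Major component of peat organic matter")
print(ctx.relevance.value)  # NATURAL_SOURCE
```

Subclassing str for the enum keeps values comparable to the raw strings stored in YAML records.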

Example Usage

Example 1: Peatland-Specific Ingredient

ontology_id: CHEBI:27385
preferred_term: Humic acid
ontology_mapping:
  ontology_id: CHEBI:27385
  ontology_label: humic acid
  ontology_source: CHEBI

environmental_context:
  - environment_term: ENVO:00000044  # peatland
    relevance: NATURAL_SOURCE
    notes: "Major component of peat organic matter, provides carbon source for peat microbes"
  
  - environment_term: ENVO:00005773  # peat bog  
    relevance: REQUIRED_FOR_ORGANISM
    notes: "Required for cultivation of humic acid-degrading bacteria from peat bogs"

occurrence_statistics:
  total_occurrences: 45
  media_count: 12

Example 2: Multi-Environment Ingredient

ontology_id: CHEBI:26710
preferred_term: NaCl
ontology_mapping:
  ontology_id: CHEBI:26710
  ontology_label: sodium chloride

environmental_context:
  - environment_term: ENVO:00002149  # sea water
    relevance: ENVIRONMENT_MIMIC
    notes: "Provides salinity for marine organism cultivation"
  
  - environment_term: ENVO:00002044  # saline lake
    relevance: ENVIRONMENT_MIMIC
    notes: "High concentrations for halophilic organisms"
  
  - environment_term: ENVO:00000044  # peatland
    relevance: COMMONLY_USED
    notes: "Low concentrations as mineral source"

Example 3: Environment-Specific Extract

ontology_id: CHEBI:NNNNNN  # To be assigned
preferred_term: Sphagnum moss extract
ontology_mapping:
  ontology_id: CHEBI:NNNNNN
  ontology_label: plant extract

environmental_context:
  - environment_term: ENVO:00000044  # peatland
    relevance: NATURAL_SOURCE
    notes: "Extracted from Sphagnum moss, dominant peatland plant"
  
  - environment_term: ENVO:00005773  # peat bog
    relevance: REQUIRED_FOR_ORGANISM
    notes: "Essential for cultivating Sphagnum-associated bacteria"

Benefits

  1. Cross-repository discovery: Find ingredients relevant to specific environments
  2. Community curation: Auto-suggest ingredients when adding environment-specific communities
  3. Media formulation: Identify environment-appropriate ingredients when designing new media
  4. Environmental coverage: Track which environments have specialized ingredients
  5. Knowledge linking: Connect ingredient chemistry to ecological context

Use Cases

Use Case 1: Adding Peatland Community

When curating the SPRUCE peatland community:

# Search for peatland-relevant ingredients
ingredients = search_ingredients(environment="ENVO:00000044")
# Returns: humic acid, Sphagnum extract, organic acids, etc.
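One possible shape for the search helper, as a sketch that assumes ingredient records have already been loaded as plain dicts shaped like the examples above. search_ingredients is a proposed name from this use case, not an existing API, and the optional relevance filter is an added assumption.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def search_ingredients(records: List[Record], environment: str,
                       relevance: Optional[str] = None) -> List[str]:
    """Return preferred_term for each record with a matching context.

    environment is an ENVO CURIE; relevance optionally restricts matches
    to one EnvironmentRelevanceEnum value.
    """
    hits = []
    for record in records:
        # The field is optional, so records without it are skipped.
        for ctx in record.get("environmental_context", []):
            if ctx["environment_term"] != environment:
                continue
            if relevance is not None and ctx["relevance"] != relevance:
                continue
            hits.append(record["preferred_term"])
            break  # list each ingredient once even if several contexts match
    return hits


records = [
    {"preferred_term": "Humic acid", "environmental_context": [
        {"environment_term": "ENVO:00000044", "relevance": "NATURAL_SOURCE"},
        {"environment_term": "ENVO:00005773", "relevance": "REQUIRED_FOR_ORGANISM"}]},
    {"preferred_term": "NaCl", "environmental_context": [
        {"environment_term": "ENVO:00002149", "relevance": "ENVIRONMENT_MIMIC"},
        {"environment_term": "ENVO:00000044", "relevance": "COMMONLY_USED"}]},
    {"preferred_term": "Glucose"},  # no environmental_context (field is optional)
]
print(search_ingredients(records, "ENVO:00000044"))  # ['Humic acid', 'NaCl']
```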

Use Case 2: Formulating Environment-Specific Media

# Design a medium for deep-sea hydrothermal vent organisms
ingredients = get_ingredients_by_environment("ENVO:01000030")  # hydrothermal vent
# Could suggest: sulfur compounds, components suited to high-pressure cultivation, etc.
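For formulation, grouping candidates by relevance may be more useful than a flat list, since an ENVIRONMENT_MIMIC ingredient plays a different role than a COMMONLY_USED one. The sketch below assumes records are plain dicts; get_ingredients_by_environment is the proposed helper name from this use case, and the toy records reuse the peatland examples above because the issue gives no vent-specific records.

```python
from collections import defaultdict
from typing import Any, Dict, List


def get_ingredients_by_environment(records: List[Dict[str, Any]],
                                   environment: str) -> Dict[str, List[str]]:
    """Group ingredients matching one ENVO term by their relevance value."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        for ctx in record.get("environmental_context", []):
            if ctx["environment_term"] == environment:
                grouped[ctx["relevance"]].append(record["preferred_term"])
    return dict(grouped)


records = [
    {"preferred_term": "Humic acid", "environmental_context": [
        {"environment_term": "ENVO:00000044", "relevance": "NATURAL_SOURCE"}]},
    {"preferred_term": "NaCl", "environmental_context": [
        {"environment_term": "ENVO:00002149", "relevance": "ENVIRONMENT_MIMIC"},
        {"environment_term": "ENVO:00000044", "relevance": "COMMONLY_USED"}]},
    {"preferred_term": "Sphagnum moss extract", "environmental_context": [
        {"environment_term": "ENVO:00000044", "relevance": "NATURAL_SOURCE"}]},
]
print(get_ingredients_by_environment(records, "ENVO:00000044"))
# {'NATURAL_SOURCE': ['Humic acid', 'Sphagnum moss extract'], 'COMMONLY_USED': ['NaCl']}
```

An ingredient listed with two relevance values for the same environment would appear in both groups.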

Use Case 3: Environmental Coverage Analysis

Environment: Peatland (ENVO:00000044)
- Communities: 3
- Media: 15
- Specialized Ingredients: 8 ✅ (humic acid, Sphagnum extract, etc.)

Environment: Permafrost (ENVO:00000134)
- Communities: 2  
- Media: 1
- Specialized Ingredients: 0 ⚠️ GAP IDENTIFIED
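A report like the one above could be computed roughly as follows. This sketch assumes the community and media environment terms are available as flat lists (one entry per community or medium) and treats COMMONLY_USED as a general association that does not count toward "specialized" ingredients; both are assumptions, as is the toy data, which does not reproduce the counts above.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Assumption: COMMONLY_USED does not make an ingredient "specialized".
GENERAL_RELEVANCE = {"COMMONLY_USED"}


def coverage(environments: Iterable[str],
             community_envs: Iterable[str],
             media_envs: Iterable[str],
             ingredients: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    communities = Counter(community_envs)
    media = Counter(media_envs)
    specialized: Counter = Counter()
    for record in ingredients:
        # A set, so each ingredient counts once per environment.
        terms = {ctx["environment_term"]
                 for ctx in record.get("environmental_context", [])
                 if ctx["relevance"] not in GENERAL_RELEVANCE}
        specialized.update(terms)
    return {
        env: {
            "communities": communities[env],
            "media": media[env],
            "specialized_ingredients": specialized[env],
            "gap": specialized[env] == 0,
        }
        for env in environments
    }


ingredients = [
    {"preferred_term": "Humic acid", "environmental_context": [
        {"environment_term": "ENVO:00000044", "relevance": "NATURAL_SOURCE"}]},
    {"preferred_term": "NaCl", "environmental_context": [
        {"environment_term": "ENVO:00000044", "relevance": "COMMONLY_USED"}]},
]
report = coverage(["ENVO:00000044", "ENVO:00000134"],
                  community_envs=["ENVO:00000044", "ENVO:00000134"],
                  media_envs=["ENVO:00000044"],
                  ingredients=ingredients)
for env, row in report.items():
    print(env, row)
```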

Use Case 4: Cross-Repository Query

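# Find complete environment profile
# (PREFIX declarations for communitymech:, culturemech:, and mediaingredientmech: omitted)
# UNION keeps the three result sets separate instead of returning their cross product,
# so an environment with no ingredients still lists its communities and media.
PREFIX ENVO: <http://purl.obolibrary.org/obo/ENVO_>
SELECT ?community ?media ?ingredient
WHERE {
  # Peatland communities
  { ?community communitymech:environment_term ENVO:00000044 . }
  UNION
  # Peatland media
  { ?media culturemech:source_environment ENVO:00000044 . }
  UNION
  # Peatland ingredients
  { ?ingredient mediaingredientmech:environmental_context ?ctx .
    ?ctx mediaingredientmech:environment_term ENVO:00000044 . }
}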

Implementation Plan

Phase 1: Schema Design

  • Review and refine EnvironmentContext class and EnvironmentRelevanceEnum
  • Define validation rules for ENVO terms
  • Decide on required vs. recommended fields

Phase 2: Schema Migration

  • Update mapped_ingredients_schema.yaml
  • Regenerate Python dataclasses
  • Update validation pipelines
  • Create migration documentation

Phase 3: Data Curation

Automated Inference

  • Analyze ingredient names for environment keywords ("marine", "soil", "gut")
  • Check occurrence in environment-tagged media (once CultureMech implements the source_environment field)
  • Infer from chemical properties (pH, salinity, etc.)
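The keyword step could start as simply as the sketch below. The keyword table is a hypothetical, deliberately tiny example that uses only ENVO terms already cited in this issue; assigning COMMONLY_USED to every keyword hit is also an assumption, since a name match alone cannot tell NATURAL_SOURCE from SELECTIVE_AGENT.

```python
import re
from typing import Dict, List

# Hypothetical keyword table; a real one would be curated and much larger.
KEYWORD_TO_ENVO: Dict[str, str] = {
    "marine": "ENVO:00002149",    # sea water
    "sea": "ENVO:00002149",       # sea water
    "peat": "ENVO:00000044",      # peatland
    "sphagnum": "ENVO:00000044",  # peatland
}


def infer_contexts(name: str) -> List[Dict[str, str]]:
    """Suggest environmental_context entries from whole-word keyword hits."""
    suggestions: List[Dict[str, str]] = []
    seen = set()
    for word in re.findall(r"[a-z]+", name.lower()):
        term = KEYWORD_TO_ENVO.get(word)
        if term is None or term in seen:
            continue
        seen.add(term)
        suggestions.append({
            "environment_term": term,
            # A keyword hit only shows a general association.
            "relevance": "COMMONLY_USED",
            "notes": f"Inferred from keyword '{word}' in name; needs curator review",
        })
    return suggestions


print(infer_contexts("Sphagnum moss extract"))
```

Whole-word matching avoids false hits inside longer words, but it also misses compounds such as "Seawater"; a curator-reviewed synonym list would be needed for those.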

LLM-Assisted Curation

# Use Claude to suggest environmental contexts
# (llm.suggest_environment_context is a proposed helper, not an existing API)
for ingredient in ingredients_without_context:
    suggestion = llm.suggest_environment_context(
        name=ingredient.preferred_term,
        chebi_term=ingredient.ontology_id,
        occurrences=ingredient.occurrence_statistics.media_count,
    )
    # Returns: {environment_term, relevance, notes, confidence}
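LLM output should be checked against the schema before it reaches curators. The sketch below triages one suggestion dict into "reject", "review", or "accept"; the function name, the assumption that confidence is a number in [0, 1], and the 0.8 threshold are all hypothetical choices.

```python
import re
from typing import Any, Dict, List, Tuple

ENVO_CURIE = re.compile(r"ENVO:\d{7,8}")
RELEVANCE_VALUES = {"NATURAL_SOURCE", "REQUIRED_FOR_ORGANISM", "SELECTIVE_AGENT",
                    "ENVIRONMENT_MIMIC", "COMMONLY_USED"}


def triage_suggestion(suggestion: Dict[str, Any],
                      min_confidence: float = 0.8) -> Tuple[str, List[str]]:
    """Classify one LLM suggestion as "reject", "review", or "accept"."""
    problems = []
    term = suggestion.get("environment_term")
    if not isinstance(term, str) or not ENVO_CURIE.fullmatch(term):
        problems.append(f"invalid environment_term: {term!r}")
    relevance = suggestion.get("relevance")
    if not isinstance(relevance, str) or relevance not in RELEVANCE_VALUES:
        problems.append(f"invalid relevance: {relevance!r}")
    confidence = suggestion.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append(f"invalid confidence: {confidence!r}")
    if problems:
        return "reject", problems
    # "accept" only means well-formed and confident; curators can still spot-check.
    return ("accept" if confidence >= min_confidence else "review"), []


print(triage_suggestion({"environment_term": "ENVO:00000044",
                         "relevance": "NATURAL_SOURCE",
                         "notes": "Component of peat organic matter",
                         "confidence": 0.92}))
# ('accept', [])
```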

Prioritization

  1. High Priority:

    • Specialized extracts (Sphagnum, seaweed, etc.)
    • Environment-specific compounds (humic acid → peatland)
    • Selective agents (high salt → marine)
  2. Medium Priority:

    • Common ingredients used differently across environments
    • pH-specific buffers and compounds
  3. Low Priority (may be skipped):

    • Ubiquitous ingredients (water, glucose), unless they have a specific environmental relevance

Migration Strategy

Backward Compatibility

  • Field is optional - existing records remain valid
  • No breaking changes to API or validation
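The backward-compatibility rule can be expressed as a record-level check in which a missing environmental_context is simply valid. This is a sketch over plain dict records; validate_record is a hypothetical name, not part of the existing validation pipeline.

```python
import re
from typing import Any, Dict, List

ENVO_CURIE = re.compile(r"ENVO:\d{7,8}")
RELEVANCE_VALUES = {"NATURAL_SOURCE", "REQUIRED_FOR_ORGANISM", "SELECTIVE_AGENT",
                    "ENVIRONMENT_MIMIC", "COMMONLY_USED"}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record is valid."""
    contexts = record.get("environmental_context")
    if contexts is None:
        # Optional field: records written before the schema change stay valid.
        return []
    if not isinstance(contexts, list):
        return ["environmental_context must be a list"]
    errors = []
    for i, ctx in enumerate(contexts):
        if not isinstance(ctx, dict):
            errors.append(f"environmental_context[{i}]: expected a mapping")
            continue
        term = ctx.get("environment_term")
        if not isinstance(term, str) or not ENVO_CURIE.fullmatch(term):
            errors.append(f"environmental_context[{i}]: bad environment_term {term!r}")
        rel = ctx.get("relevance")
        if not isinstance(rel, str) or rel not in RELEVANCE_VALUES:
            errors.append(f"environmental_context[{i}]: bad relevance {rel!r}")
    return errors


legacy = {"ontology_id": "CHEBI:26710", "preferred_term": "NaCl"}
print(validate_record(legacy))  # []
```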

Gradual Rollout

  1. Schema update (non-breaking)
  2. Add to new ingredient records
  3. Backfill high-priority ingredients
  4. Full coverage over 6-12 months

Notes

  • ENVO prefix already exists in schema: ENVO: http://purl.obolibrary.org/obo/ENVO_
  • Field is multivalued (ingredients can be relevant to multiple environments)
  • Field is optional (many ubiquitous ingredients may not need environmental context)
  • Coordinated with CultureMech issue #2 for source_environment field
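Because the ENVO prefix maps to the OBO PURL base, a stored CURIE can be expanded to the full IRI needed for RDF or SPARQL linking. A minimal sketch; expand_curie is a hypothetical helper, and only the ENVO expansion stated in these notes is included.

```python
from typing import Dict

# The ENVO expansion stated above; other prefixes would be added the same way.
PREFIXES: Dict[str, str] = {"ENVO": "http://purl.obolibrary.org/obo/ENVO_"}


def expand_curie(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand a CURIE such as "ENVO:00000044" to a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or not local or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}")
    return prefixes[prefix] + local


print(expand_curie("ENVO:00000044"))
# http://purl.obolibrary.org/obo/ENVO_00000044
```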

Related Issues

References
