Enhance cross-repository environmental linking with CultureMech and MediaIngredientMech #30

@realmarcin

Summary

Enhance CommunityMech's environmental linking capabilities to support bidirectional references with CultureMech media and MediaIngredientMech ingredients based on shared ENVO terms.

Background

Triggered by: Issue #24 (SPRUCE Peatland Community addition)

CommunityMech currently has robust environmental metadata:

environment_term:
  preferred_term: peatland
  term:
    id: ENVO:00000044
    label: peatland

However, there's no mechanism to:

  1. Link communities to relevant CultureMech media for the same environment
  2. Link communities to relevant MediaIngredientMech ingredients
  3. Auto-suggest media/ingredients when curating new communities

Related Schema Enhancements

In Progress:

These will enable environment-based cross-repository queries.

Proposed CommunityMech Enhancements

Option 1: Enhance growth_media Field (Recommended)

The growth_media field already exists, but its entries may not link to CultureMech IDs:

# Current (if implemented)
growth_media:
  - name: Some Medium
    description: "Medium description"

# Proposed Enhancement
growth_media:
  - preferred_term: Acidic Peatland Medium
    culturemech_id: CultureMech:010001  # Direct link
    environment_match: ENVO:00000044    # Matched via environment
    notes: "Used for cultivating methanogenic archaea from SPRUCE"
    
  - preferred_term: Generic Anaerobic Medium  
    culturemech_id: CultureMech:005432
    environment_match: null  # Not environment-specific
    notes: "General purpose medium"

Option 2: Add related_media Field

If growth_media is reserved for media actually used with the community, add a separate field for related media:

related_media:
  description: CultureMech media relevant to this community's environment
  range: RelatedMedia
  multivalued: true
  inlined_as_list: true

RelatedMedia:
  attributes:
    preferred_term:
      description: Media name
      required: true
    culturemech_id:
      description: CultureMech identifier
      range: string
      pattern: "^CultureMech:\\d{6}$"
    relationship_type:
      description: How media relates to community
      range: MediaRelationshipEnum
      # VALUES: CULTIVATION_MEDIUM, ISOLATION_MEDIUM, 
      #         ENVIRONMENT_ANALOG, REFERENCED_IN_STUDY
    evidence:
      description: Evidence for this relationship
      range: EvidenceItem
      multivalued: true

Option 3: Add related_ingredients Field

Similarly for ingredients:

related_ingredients:
  description: MediaIngredientMech ingredients relevant to this community
  range: RelatedIngredient
  multivalued: true
  inlined_as_list: true

RelatedIngredient:
  attributes:
    preferred_term:
      description: Ingredient name
      required: true
    mediaingredientmech_id:
      description: MediaIngredientMech identifier  
      range: string
      pattern: "^MediaIngredientMech:\\d{6}$"
    relevance:
      description: Why ingredient is relevant
      range: string
    evidence:
      description: Evidence for relevance
      range: EvidenceItem
      multivalued: true
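
The two `pattern` constraints above can be checked outside the schema as well. The following is a minimal sketch, assuming the patterns stay exactly as proposed; the function names are hypothetical and not part of any existing CommunityMech tooling.

```python
import re

# Patterns copied from the proposed RelatedMedia and RelatedIngredient attributes.
CULTUREMECH_ID = re.compile(r"^CultureMech:\d{6}$")
MEDIAINGREDIENTMECH_ID = re.compile(r"^MediaIngredientMech:\d{6}$")


def is_valid_culturemech_id(value: str) -> bool:
    """Return True if value has the form CultureMech:NNNNNN (six digits)."""
    return CULTUREMECH_ID.fullmatch(value) is not None


def is_valid_mediaingredientmech_id(value: str) -> bool:
    """Return True if value has the form MediaIngredientMech:NNNNNN (six digits)."""
    return MEDIAINGREDIENTMECH_ID.fullmatch(value) is not None
```

This only checks the shape of an identifier; whether the ID actually exists in the other repository is a separate check.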

Use Cases

Use Case 1: Adding a New Peatland Community

Current Workflow:

  1. User provides PMIDs, environmental info
  2. Create community record with environment_term: ENVO:00000044
  3. ❌ No way to find relevant media
  4. ❌ No way to find relevant ingredients

Enhanced Workflow:

  1. User provides PMIDs, environmental info
  2. Create community record with environment_term: ENVO:00000044
  3. Auto-query CultureMech: "Find media with source_environment: ENVO:00000044"
  4. Auto-query MediaIngredientMech: "Find ingredients with environmental_context: ENVO:00000044"
  5. Auto-suggest: "15 peatland media found, 8 peatland ingredients found"
  6. User selects relevant media/ingredients to link
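
Steps 3 to 5 of the enhanced workflow could be sketched as follows. This is an illustration over in-memory records, not the real CultureMech or MediaIngredientMech query interface; the `Record` class and both function names are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical minimal view of a media or ingredient record."""
    record_id: str
    name: str
    envo_id: Optional[str]  # None when the record is not environment-specific


def find_by_environment(records: List[Record], envo_id: str) -> List[Record]:
    """Return the records whose ENVO term equals the community's ENVO term."""
    return [r for r in records if r.envo_id == envo_id]


def summarize_suggestions(label: str, media: List[Record],
                          ingredients: List[Record]) -> str:
    """Build the auto-suggest message shown to the curator (step 5)."""
    return (f"{len(media)} {label} media found, "
            f"{len(ingredients)} {label} ingredients found")
```

The curator would then choose which of the returned records to link (step 6). Exact matching on the ENVO ID is the simplest rule; matching on parent or child terms would need the ENVO hierarchy and is not covered here.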

Use Case 2: Environmental Coverage Dashboard

Environment: Peatland (ENVO:00000044)
┌─────────────────────┬───────┬────────────────────────┐
│ Repository          │ Count │ Status                 │
├─────────────────────┼───────┼────────────────────────┤
│ Communities         │ 3     │ ✅ SPRUCE, ...         │
│ Media (CultureMech) │ 15    │ ✅ Good coverage       │
│ Ingredients (MIM)   │ 8     │ ✅ Specialized items   │
└─────────────────────┴───────┴────────────────────────┘

Environment: Deep-sea hydrothermal vent (ENVO:01000030)
┌─────────────────────┬───────┬────────────────────────┐
│ Repository          │ Count │ Status                 │
├─────────────────────┼───────┼────────────────────────┤
│ Communities         │ 5     │ ✅ Well studied        │
│ Media (CultureMech) │ 2     │ ⚠️ Need more media     │
│ Ingredients (MIM)   │ 1     │ ⚠️ Need more items     │
└─────────────────────┴───────┴────────────────────────┘
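
The status column could be derived from the counts with a simple threshold. The sketch below assumes a cutoff of 3 records, chosen only because it is consistent with the two example tables above; the real rule and the status labels are open design decisions.

```python
from typing import Dict, List, Tuple


def coverage_status(count: int, min_count: int = 3) -> str:
    """Classify a count as adequate or lacking (threshold is an assumption)."""
    return "OK" if count >= min_count else "NEEDS MORE"


def coverage_rows(counts: Dict[str, int],
                  min_count: int = 3) -> List[Tuple[str, int, str]]:
    """Return (repository, count, status) rows for one environment."""
    return [(repo, n, coverage_status(n, min_count)) for repo, n in counts.items()]
```

For the hydrothermal vent example, `coverage_rows({"Communities": 5, "Media (CultureMech)": 2, "Ingredients (MIM)": 1})` would flag the media and ingredient rows as needing more entries.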

Use Case 3: Cross-Repository SPARQL Query

# Find the complete environmental profile for peatland.
# PREFIX declarations for communitymech:, culturemech:, and
# mediaingredientmech: are required but omitted here.
# The three graph patterns share no variables, so the result is the cross
# product of matching communities, media, and ingredients.
SELECT ?community ?community_name ?media ?media_name ?ingredient ?ingredient_name
WHERE {
  # Communities in peatland (the ENVO id is nested under environment_term.term)
  ?community a communitymech:MicrobialCommunity ;
             communitymech:environment_term/communitymech:term/communitymech:id "ENVO:00000044" ;
             communitymech:name ?community_name .

  # Media for peatland organisms
  ?media a culturemech:CultureMedia ;
         culturemech:source_environment/culturemech:id "ENVO:00000044" ;
         culturemech:name ?media_name .

  # Ingredients relevant to peatland
  ?ingredient a mediaingredientmech:MappedIngredient ;
              mediaingredientmech:environmental_context/mediaingredientmech:environment_term "ENVO:00000044" ;
              mediaingredientmech:preferred_term ?ingredient_name .
}

Use Case 4: add_community Skill Enhancement

Update the orchestration skill to automatically suggest cross-repo links:

# In the add_community skill workflow.
# culturemech_api, mediaingredient_api, calculate_relevance_score, and
# format_ingredient_suggestions are assumed to be provided elsewhere.
def match_culturemech_media(community_environment: str) -> list[dict]:
    """Find CultureMech media matching the community's ENVO term."""

    # Query CultureMech for media with a matching source environment
    media_results = culturemech_api.search(
        source_environment=community_environment
    )

    # Return one suggestion per matching medium
    return [
        {
            "culturemech_id": media.id,
            "name": media.name,
            # Score against the environment passed in; no community object
            # is available inside this function.
            "confidence": calculate_relevance_score(media, community_environment),
            "evidence": f"Environment match: {community_environment}",
        }
        for media in media_results
    ]

# Similarly for ingredients
def match_mediaingredient_items(community_environment: str):
    """Find MediaIngredientMech ingredients matching the community's ENVO term."""

    ingredient_results = mediaingredient_api.search(
        environmental_context=community_environment
    )

    return format_ingredient_suggestions(ingredient_results)

Implementation Plan

Phase 1: Schema Review (Weeks 1-2)

  • Review current growth_media field implementation
  • Decide: enhance growth_media vs. add related_media/related_ingredients
  • Define validation rules for cross-repo IDs
  • Coordinate with CultureMech and MediaIngredientMech schema changes

Phase 2: Schema Enhancement (Weeks 3-4)

  • Update communitymech.yaml schema
  • Add CultureMech ID and MediaIngredientMech ID linking
  • Regenerate Python dataclasses
  • Update validation pipelines

Phase 3: Tooling Enhancement (Weeks 5-6)

  • Update add_community skill to query CultureMech by environment
  • Update add_community skill to query MediaIngredientMech by environment
  • Create auto-suggestion interface for media/ingredient linking
  • Add cross-repo validation (check IDs exist)
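
The cross-repo validation item (check that IDs exist) might look like the sketch below. It assumes community records are loaded as dictionaries shaped like the proposed related_media and related_ingredients fields, and that the set of known IDs has already been fetched from the other repositories; both function names are hypothetical.

```python
from typing import Iterable, List


def collect_linked_ids(community: dict) -> List[str]:
    """Gather cross-repo IDs from a community record shaped like the proposed YAML."""
    ids = []
    for media in community.get("related_media", []):
        if media.get("culturemech_id"):
            ids.append(media["culturemech_id"])
    for ingredient in community.get("related_ingredients", []):
        if ingredient.get("mediaingredientmech_id"):
            ids.append(ingredient["mediaingredientmech_id"])
    return ids


def find_missing_ids(community: dict, known_ids: Iterable[str]) -> List[str]:
    """Return linked IDs that are absent from the set of known IDs."""
    known = set(known_ids)
    return [i for i in collect_linked_ids(community) if i not in known]
```

An empty result means every link resolves; a non-empty result lists the dangling references to report in the validation pipeline.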

Phase 4: Documentation (Week 7)

  • Document cross-repo linking patterns
  • Create tutorial for adding environment-linked communities
  • Add SPARQL query examples

Phase 5: Backfill (Ongoing)

  • For existing communities with environment terms, suggest media/ingredient links
  • Prioritize well-characterized environments (peatland, marine, gut, soil)

Example: SPRUCE Community Enhancement

Current State (Issue #24)

id: CommunityMech:000024
name: SPRUCE Peatland Warming Microbial Community
environment_term:
  preferred_term: peatland
  term:
    id: ENVO:00000044
    label: peatland

Enhanced State (After Implementation)

id: CommunityMech:000024
name: SPRUCE Peatland Warming Microbial Community

environment_term:
  preferred_term: peatland
  term:
    id: ENVO:00000044
    label: peatland

# Auto-discovered from CultureMech
related_media:
  - preferred_term: Acidic Peatland Medium
    culturemech_id: CultureMech:010001
    relationship_type: ENVIRONMENT_ANALOG
    evidence:
      - reference: PMID:38515239
        supports: SUPPORT
        evidence_source: IN_VIVO
        snippet: "Peat microbial communities characterized in situ"

  - preferred_term: Methanogen Enrichment Medium
    culturemech_id: CultureMech:010045
    relationship_type: CULTIVATION_MEDIUM
    evidence:
      - reference: PMID:34836550
        supports: SUPPORT
        evidence_source: IN_VIVO
        snippet: "Methanogenic archaea detected in anoxic peat"

# Auto-discovered from MediaIngredientMech  
related_ingredients:
  - preferred_term: Humic acid
    mediaingredientmech_id: MediaIngredientMech:000523
    relevance: "Major peat organic matter component, provides carbon source"
    
  - preferred_term: Sphagnum moss extract
    mediaingredientmech_id: MediaIngredientMech:001234
    relevance: "Extracted from dominant peatland plant species"

Benefits

  1. Automated discovery: Find media/ingredients when adding communities
  2. Coverage tracking: Identify environments needing more resources
  3. Knowledge graph: Rich cross-repository linking
  4. User experience: Guided curation with suggestions
  5. FAIR data: Improved findability and interoperability

Success Metrics

  • Linking Coverage: % of communities with media/ingredient links
  • Query Success: Cross-repo queries return complete results
  • Curation Time: Time to add new community (should decrease)
  • Coverage Gaps: Identified environments needing resources
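
The Linking Coverage metric could be computed as below. This sketch assumes a community counts as linked when it has at least one related_media or related_ingredients entry; that definition, and the function name, are assumptions rather than an agreed rule.

```python
from typing import List


def linking_coverage(communities: List[dict]) -> float:
    """Percentage of communities with at least one media or ingredient link."""
    if not communities:
        return 0.0
    linked = sum(
        1 for c in communities
        if c.get("related_media") or c.get("related_ingredients")
    )
    return 100.0 * linked / len(communities)
```

Tracking this value before and after the backfill phase would show how much of the existing catalogue gained links.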

Related Issues

References
