MicroGrowAgents interprets genome function using Bakta GFF3 annotations from 57 bacterial genomes (667,502 features).
The genome function system supports organism-specific media formulation through:
- Auxotrophy Detection: Identifying missing biosynthetic pathways
- Enzyme Analysis: Querying EC numbers with wildcard support
- Cofactor Requirements: Determining essential cofactors that cannot be biosynthesized
- Transporter Analysis: Finding nutrient uptake genes for concentration refinement

The underlying dataset:
- 57 Bakta-annotated genomes from diverse bacteria
- 667,502 total features (623,586 CDS, 7,236 tRNA, 660 rRNA)
- 100% organism linkage to NCBI Taxonomy
- Annotation coverage: 14.8% EC, 18.6% GO, 10.7% KEGG, 24.2% COG
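
The coverage figures above depend on which database cross-references each annotated feature carries. As a rough illustration only (this is not the project's GFF3Parser), the sketch below computes the share of CDS features that have an EC cross-reference. It assumes cross-references sit in the GFF3 `Dbxref` attribute as comma-separated `PREFIX:value` entries; the sample records are invented, and the README does not say which denominator its percentages use.

```python
from urllib.parse import unquote

# Invented sample records in GFF3's nine-column, tab-separated layout.
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "contig_1\tBakta\tCDS\t1\t900\t.\t+\t0\t"
    "ID=G_00001;product=alcohol dehydrogenase;Dbxref=EC:1.1.1.1,COG:COG1064",
    "contig_1\tBakta\tCDS\t950\t1800\t.\t-\t0\t"
    "ID=G_00002;product=hypothetical protein",
    "contig_1\tBakta\ttRNA\t1900\t1975\t.\t+\t.\t"
    "ID=G_00003;product=tRNA-Ala",
])


def parse_attributes(column: str) -> dict[str, str]:
    """Split a GFF3 attributes column (key=value pairs joined by ';')."""
    attributes = {}
    for pair in column.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value  # kept raw; values are decoded on use
    return attributes


def dbxref_prefixes(attributes: dict[str, str]) -> set[str]:
    """Return the database prefixes (e.g. 'EC', 'COG') found in Dbxref."""
    raw = attributes.get("Dbxref", "")
    return {unquote(entry).split(":", 1)[0] for entry in raw.split(",") if entry}


def ec_coverage(gff3_text: str) -> float:
    """Fraction of CDS features that carry at least one EC cross-reference."""
    cds_total = 0
    cds_with_ec = 0
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        columns = line.split("\t")
        if len(columns) != 9 or columns[2] != "CDS":
            continue
        cds_total += 1
        if "EC" in dbxref_prefixes(parse_attributes(columns[8])):
            cds_with_ec += 1
    return cds_with_ec / cds_total if cds_total else 0.0


print(f"EC coverage: {ec_coverage(SAMPLE_GFF3):.1%}")
```

The same loop could count GO, KEGG, or COG prefixes by changing the prefix that is tested.
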
Find enzymes by EC number with wildcard support:
# Find oxidoreductases in EC subclass 1.1 (a subset of EC 1.*.*.*)
uv run python -c "
from microgrowagents.agents.kg_reasoning_agent import KGReasoningAgent
from pathlib import Path
agent = KGReasoningAgent(Path('data/processed/microgrow.duckdb'))
result = agent.run('genome_enzymes SAMN00114986 1.1.*')
print(f\"Found {result['data']['count']} enzymes\")
for enzyme in result['data']['enzymes'][:3]:
    print(f\" {enzyme['gene_symbol']}: {enzyme['product']}\")
"

Identify metabolic auxotrophies from genome analysis:
uv run python -c "
from microgrowagents.agents.genome_function_agent import GenomeFunctionAgent
from pathlib import Path
agent = GenomeFunctionAgent(Path('data/processed/microgrow.duckdb'))
result = agent.detect_auxotrophies(
    query='detect auxotrophies',
    organism='SAMN00114986'
)
print(f\"Detected {result['data']['summary']['auxotrophies_detected']} auxotrophies\")
for aux in result['data']['auxotrophies']:
    print(f\" {aux['pathway_name']}: {', '.join(aux['nutrients'])}\")
    print(f\" Confidence: {aux['confidence']}, Completeness: {aux['completeness']:.1%}\")
"

Search for nutrient transporter genes:
uv run python -c "
from microgrowagents.agents.genome_function_agent import GenomeFunctionAgent
from pathlib import Path
agent = GenomeFunctionAgent(Path('data/processed/microgrow.duckdb'))
result = agent.find_transporters(
    query='find glucose transporters',
    organism='SAMN00114986',
    substrate='glucose'
)
print(f\"Found {len(result['data']['transporters'])} transporters\")
for transporter in result['data']['transporters'][:3]:
    print(f\" {transporter['gene_symbol']}: {transporter['product']}\")
    print(f\" Family: {transporter['family']}, Affinity: {transporter['affinity']}\")
"

Prompt to Claude Code:
Analyze the metabolic capabilities of SAMN00114986 genome:
1. Find all oxidoreductase enzymes (EC 1.*.*.*)
2. Detect any biosynthetic auxotrophies
3. Identify transporter genes for iron and glucose
4. Summarize the organism's metabolic profile
Use the GenomeFunctionAgent and provide a detailed report.
What Claude Code will do:
- Import and initialize GenomeFunctionAgent
- Call find_enzymes() with EC wildcard pattern
- Call detect_auxotrophies() for pathway analysis
- Call find_transporters() for iron and glucose
- Generate a structured markdown report with findings
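
To make the EC wildcard step concrete, here is a minimal sketch of how a pattern such as `1.*.*.*` or `2.7.*` could be compared with a full EC number. The function name `ec_matches` is hypothetical, and the README does not describe how the agents implement matching, so treat this as one plausible interpretation rather than the project's behavior.

```python
def ec_matches(pattern: str, ec_number: str) -> bool:
    """Return True if ec_number falls under pattern.

    '*' matches any single field, and a pattern with fewer than four
    fields acts as a prefix, so '1.1.*' matches '1.1.1.1'.
    """
    pattern_fields = pattern.split(".")
    ec_fields = ec_number.split(".")
    # Trailing wildcards add no constraint: '1.*.*.*' behaves like '1'.
    while pattern_fields and pattern_fields[-1] == "*":
        pattern_fields.pop()
    if len(pattern_fields) > len(ec_fields):
        return False
    # Compare field by field so that '1.1' never matches '1.10.x.x'.
    return all(p == "*" or p == e for p, e in zip(pattern_fields, ec_fields))


# Invented EC annotations for illustration.
annotated = ["1.1.1.1", "1.10.3.2", "2.7.1.2", "3.5.1.4"]
print([ec for ec in annotated if ec_matches("1.*.*.*", ec)])
print([ec for ec in annotated if ec_matches("1.1.*", ec)])
```

Comparing dot-separated fields rather than raw string prefixes is the main design choice here: a plain `startswith("1.1")` check would wrongly accept `1.10.3.2`.
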
Prompt to Claude Code:
Compare the metabolic capabilities of SAMN00114986 and SAMN00766392:
1. Compare their auxotrophy profiles
2. Compare their enzyme counts by EC class
3. Identify unique transporters in each organism
4. Generate a comparison table
Show which organism is more metabolically versatile.
What Claude Code will do:
- Run genome queries for both organisms in parallel
- Compare auxotrophies, enzymes, and transporters
- Calculate similarity metrics
- Generate side-by-side comparison table
- Provide recommendation on metabolic versatility
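
The README does not say which similarity metric is used for this comparison. As one possibility, the sketch below applies Jaccard similarity to two sets of transporter substrates and lists what is shared and what is unique to each organism. The function names and the substrate sets are invented for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_profiles(a: set[str], b: set[str]) -> dict:
    """Summarize the overlap between two organisms' feature sets."""
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": jaccard(a, b),
    }


# Invented transporter substrate sets for two organisms.
substrates_a = {"glucose", "iron", "phosphate"}
substrates_b = {"glucose", "iron", "sulfate", "zinc"}

comparison = compare_profiles(substrates_a, substrates_b)
print("| Category | Substrates |")
print("|---|---|")
print(f"| Shared | {', '.join(comparison['shared'])} |")
print(f"| Only organism A | {', '.join(comparison['only_a'])} |")
print(f"| Only organism B | {', '.join(comparison['only_b'])} |")
print(f"Jaccard similarity: {comparison['jaccard']:.2f}")
```

The same helper works for auxotrophy pathway names or EC numbers, since it only needs two sets of strings.
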
Prompt to Claude Code:
Design a defined medium for SAMN02194963 using genome analysis:
1. Detect all auxotrophies from the genome
2. Identify essential cofactors that cannot be biosynthesized
3. Find transporter genes for key nutrients
4. Use MediaFormulationAgent to create formulation
5. Use GenMediaConcAgent to predict concentrations with transporter adjustments
Create a complete medium recipe with rationale.
What Claude Code will do:
- Run comprehensive genome analysis
- Detect auxotrophies automatically
- Identify required cofactors
- Query for transporter genes
- Integrate with MediaFormulationAgent to add auxotrophy supplements
- Refine concentrations based on transporter presence/affinity
- Generate complete medium formulation with evidence
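
The concentration refinement step can be sketched with the adjustment factors this README lists for GenMediaConcAgent (+50% when no transporter is found, -25% for a high-affinity transporter, default otherwise). The function names, affinity labels, and baseline concentrations below are assumptions for illustration; how the agent actually applies the factors (for example to a range rather than a single value) is not stated here.

```python
from typing import Optional

# Multipliers derived from the adjustment rules listed in this README.
NO_TRANSPORTER_FACTOR = 1.50  # +50%: no transporter found
HIGH_AFFINITY_FACTOR = 0.75   # -25%: high-affinity transporter present
DEFAULT_FACTOR = 1.00         # low/medium affinity: keep the default


def adjustment_factor(affinity: Optional[str]) -> float:
    """Map the best transporter affinity found (None = no transporter) to a factor."""
    if affinity is None:
        return NO_TRANSPORTER_FACTOR
    if affinity == "high":
        return HIGH_AFFINITY_FACTOR
    if affinity in ("medium", "low"):
        return DEFAULT_FACTOR
    raise ValueError(f"unknown affinity label: {affinity!r}")


def adjust_concentration(base: float, affinity: Optional[str]) -> float:
    """Scale a baseline concentration by the transporter-based factor."""
    return base * adjustment_factor(affinity)


# Invented baseline concentrations (mM) and transporter findings.
baseline = {"glucose": 10.0, "iron_sulfate": 0.02, "thiamine": 0.001}
best_affinity = {"glucose": "high", "iron_sulfate": "medium", "thiamine": None}

for ingredient, base in baseline.items():
    adjusted = adjust_concentration(base, best_affinity[ingredient])
    print(f"{ingredient}: {base:g} mM -> {adjusted:g} mM")
```
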
Prompt to Claude Code:
I have a bacterial strain (SAMN05421681) that grows poorly on minimal medium.
Help me optimize the medium:
1. Detect auxotrophies from its genome
2. For each detected auxotrophy, recommend specific supplements
3. Find any missing cofactors
4. Check for transporter limitations
5. Generate an optimized medium formulation
Explain the reasoning for each recommendation.
What Claude Code will do:
- Analyze genome for biosynthetic pathway completeness
- Identify high-confidence auxotrophies
- Map auxotrophies to required nutrients
- Check cofactor biosynthesis capabilities
- Analyze transporter genes for uptake efficiency
- Create optimized formulation with detailed rationale
- Explain each supplement addition with genomic evidence
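
Mapping auxotrophies to supplements might look like the sketch below. It reuses the field names from the `detect_auxotrophies` output shown earlier (`pathway_name`, `nutrients`, `confidence`, `completeness`) and assumes that `confidence` is a string such as `high`, `medium`, or `low`. The helper name and the sample records are hypothetical.

```python
def supplements_from_auxotrophies(auxotrophies, accepted=("high", "medium")):
    """Collect nutrients to add, keeping the genomic evidence for each one.

    Auxotrophies whose confidence is not in `accepted` are skipped.
    """
    supplements = {}
    for aux in auxotrophies:
        if aux["confidence"] not in accepted:
            continue
        evidence = (
            f"{aux['pathway_name']} "
            f"({aux['completeness']:.0%} complete, {aux['confidence']} confidence)"
        )
        for nutrient in aux["nutrients"]:
            supplements.setdefault(nutrient, []).append(evidence)
    return supplements


# Invented records in the shape of result['data']['auxotrophies'].
detected = [
    {"pathway_name": "Methionine biosynthesis", "nutrients": ["L-methionine"],
     "confidence": "high", "completeness": 0.25},
    {"pathway_name": "Thiamine biosynthesis", "nutrients": ["thiamine"],
     "confidence": "low", "completeness": 0.80},
]

for nutrient, reasons in supplements_from_auxotrophies(detected).items():
    print(f"Add {nutrient}: {'; '.join(reasons)}")
```

Keeping the evidence string next to each nutrient makes it straightforward to explain every supplement in the final recipe.
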
Prompt to Claude Code:
I want to engineer SAMN00114986 for increased production of L-methionine.
Analyze the genome to understand:
1. Current methionine biosynthesis pathway completeness
2. All enzymes involved (with EC numbers and locations)
3. Transporters for methionine precursors
4. Potential bottleneck enzymes
5. Required cofactors and their biosynthesis status
Suggest which genes might need overexpression.
What Claude Code will do:
- Check methionine pathway completeness
- List all pathway enzymes with genomic coordinates
- Identify precursor transporters
- Calculate pathway completeness scores
- Identify missing or low-copy enzymes
- Check cofactor dependencies
- Generate engineering recommendations with rationale
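
A pathway completeness score of the kind mentioned above can be approximated as the fraction of required EC numbers found in the genome. The sketch below is a simplification under stated assumptions: the required list is a placeholder rather than a curated methionine pathway definition, the genome annotations are invented, and the project's real scoring may differ.

```python
from collections import Counter


def pathway_completeness(required_ecs: list[str], genome_ecs: list[str]) -> dict:
    """Score a pathway by the share of required EC numbers present in a genome."""
    if not required_ecs:
        raise ValueError("required_ecs must not be empty")
    counts = Counter(genome_ecs)  # copies of each EC number in the genome
    present = [ec for ec in required_ecs if counts[ec] > 0]
    return {
        "completeness": len(present) / len(required_ecs),
        "missing": [ec for ec in required_ecs if counts[ec] == 0],
        "single_copy": [ec for ec in present if counts[ec] == 1],
        "copy_numbers": {ec: counts[ec] for ec in required_ecs},
    }


# Placeholder pathway definition: illustrative EC numbers only.
required = ["2.3.1.46", "2.5.1.48", "2.1.1.13", "2.1.1.14"]
# Invented EC annotations, one entry per annotated gene.
genome = ["2.3.1.46", "2.1.1.13", "2.1.1.13", "1.1.1.1"]

report = pathway_completeness(required, genome)
print(f"Completeness: {report['completeness']:.0%}")
print(f"Missing steps: {report['missing']}")
print(f"Single-copy steps: {report['single_copy']}")
```

Missing steps point to possible auxotrophies, while single-copy steps are one simple proxy for the "low-copy" enzymes a bottleneck analysis might examine first.
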
When MediaFormulationAgent is called with an organism parameter, nutrients for detected auxotrophies are added to the formulation automatically:
from microgrowagents.agents.media_formulation_agent import MediaFormulationAgent
from pathlib import Path
agent = MediaFormulationAgent(Path('data/processed/microgrow.duckdb'))
result = agent.run(
    query="Design minimal defined medium",
    organism="SAMN02194963",
    growth_conditions={"temperature": 37, "pH": 7.0},
    formulation_goals=["minimal", "defined"]
)
# result["data"]["formulation"] will automatically include:
# - Nutrients for detected auxotrophies (high/medium confidence)
# - Essential cofactors that cannot be biosynthesized
# - Evidence from genome analysis in rationale

GenMediaConcAgent automatically refines concentrations based on transporter presence:
from microgrowagents.agents.gen_media_conc_agent import GenMediaConcAgent
from pathlib import Path
agent = GenMediaConcAgent(Path('data/processed/microgrow.duckdb'))
result = agent.run(
    query="glucose,iron_sulfate,thiamine",
    mode="ingredients",
    organism="NCBITaxon:562"  # E. coli
)
# Concentrations are adjusted:
# - No transporter: +50% (passive diffusion needs higher concentration)
# - High-affinity transporter: -25% (efficient uptake)
# - Low/medium affinity: default range

Genome queries are integrated into KGReasoningAgent:
from microgrowagents.agents.kg_reasoning_agent import KGReasoningAgent
from pathlib import Path
agent = KGReasoningAgent(Path('data/processed/microgrow.duckdb'))
# Query enzymes
result = agent.run("genome_enzymes SAMN00114986 2.7.*")
# Detect auxotrophies
result = agent.run("genome_auxotrophies SAMN02194963")
# Find transporters
result = agent.run("genome_transporters SAMN00114986 iron")

Run comprehensive genome integration tests:
# Run all genome tests
uv run pytest tests/test_genome_integration.py -v
# Run specific test class
uv run pytest tests/test_genome_integration.py::TestGenomeFunctionAgent -v
# Run with coverage
uv run pytest tests/test_genome_integration.py --cov=microgrowagents.agents.genome_function_agent

To reload genome data or add new genomes:
# Initialize genome schema
uv run python scripts/init_genome_schema.py
# Load all GFF3 files
uv run python scripts/load_genomes.py
# Test loading
uv run python scripts/test_genome_agent.py

The genome function system consists of:
- GFF3Parser (parsers/gff3_parser.py): Parses Bakta GFF3 files
- NCBILookupService (services/ncbi_lookup.py): Maps SAMN IDs to organisms
- GenomeDataLoader (database/genome_loader.py): Batch loads genomes into DuckDB
- GenomeFunctionAgent (agents/genome_function_agent.py): Core query interface
- Integration: Automatic integration with media formulation and concentration agents

Performance and test status:
- Database: 667,502 features with 9 indexes
- Query time: <100ms for most queries
- Loading: ~45 seconds for all 57 genomes (cached NCBI lookups)
- Tests: 17 comprehensive tests, all passing

Current limitations:
- Pathway completeness is scored against simplified KEGG pathway definitions (currently 3 pathways)
- Cofactor mapping is heuristic-based (EC class → cofactor)
- Transporter affinity is inferred from product descriptions
- A full implementation would integrate the BRENDA, KEGG, and BioCyc databases

Planned enhancements:
- Expand pathway database to 35+ biosynthetic pathways
- Integrate BRENDA for comprehensive enzyme-cofactor mapping
- Add BioCyc pathway definitions for improved completeness scoring
- Implement transporter Km prediction from homology
- Support custom GFF3 file uploads
- Add comparative genomics visualizations
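
Because transporter affinity is currently inferred from product descriptions, a keyword heuristic along the following lines is one way such inference could work. The patterns, the function name `infer_affinity`, and the `unknown` fallback are all assumptions made for this sketch; the project's actual rules are not documented in this README.

```python
import re

# Hypothetical keyword patterns, checked in order.
AFFINITY_PATTERNS = [
    ("high", re.compile(r"\bhigh[- ]affinity\b", re.IGNORECASE)),
    ("low", re.compile(r"\blow[- ]affinity\b", re.IGNORECASE)),
]


def infer_affinity(product: str) -> str:
    """Guess transporter affinity from a free-text product description."""
    for label, pattern in AFFINITY_PATTERNS:
        if pattern.search(product):
            return label
    return "unknown"  # the description gives no affinity hint


# Invented product descriptions for illustration.
for product in [
    "High-affinity zinc uptake system protein",
    "low affinity potassium transporter",
    "PTS glucose transporter subunit",
]:
    print(f"{product}: {infer_affinity(product)}")
```

A text heuristic like this only recovers what annotators wrote into the description, which is why homology-based Km prediction is listed as a later improvement.
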