ChemSearch MCP Server

The ChemSearch MCP (Model Context Protocol) Server provides AI agents with access to chemical compound search and genome analysis capabilities through a standardized interface.

Overview

The MCP server exposes the core ChemSearch functionality, allowing AI agents to:

  • Search for chemical compounds using SMILES strings or ChEBI identifiers
  • Identify genomes with synthetic potential for target compounds
  • Manage database configurations
  • Access compound and genome database information

Architecture

The server is implemented in src/mcp_server_full.py and provides:

  • Resources: Database information endpoint
  • Tools: Seven main tools for compound search and configuration
  • Configuration: Flexible database path management

Available Tools

1. search_compounds_by_smiles

Search for compounds using SMILES chemical structure representations.

Parameters:

  • smiles: Array of SMILES strings (required)
  • threshold: Similarity threshold 0-1 (default: 0.7)
  • only_best_hit: Return only best match per compound (default: true)
  • include_genome_hits: Include genome analysis (default: true)
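
As a rough illustration of how a client might invoke this tool, the sketch below builds an MCP `tools/call` request (JSON-RPC 2.0) using the parameter names and defaults listed above. The helper names `build_tool_call` and `smiles_search_request` are hypothetical and not part of ChemSearch; the client-side validation is an assumption about what the server would reject.

```python
import json
from itertools import count

# JSON-RPC ids only need to be unique per in-flight request.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in an MCP `tools/call` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def smiles_search_request(smiles, threshold=0.7, only_best_hit=True,
                          include_genome_hits=True) -> dict:
    """Hypothetical helper: validate inputs, then build the request."""
    if not smiles:
        raise ValueError("smiles must contain at least one SMILES string")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return build_tool_call("search_compounds_by_smiles", {
        "smiles": list(smiles),
        "threshold": threshold,
        "only_best_hit": only_best_hit,
        "include_genome_hits": include_genome_hits,
    })


# Ethanol and benzene, with a stricter threshold than the default.
request = smiles_search_request(["CCO", "c1ccccc1"], threshold=0.8)
print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally construct and send this message; the sketch only shows the shape of the payload.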

2. search_compounds_by_chebi_ids

Search for compounds using ChEBI database identifiers.

Parameters:

  • chebi_ids: Array of ChEBI IDs (e.g., 'CHEBI:15377' or '15377')
  • threshold: Similarity threshold 0-1 (default: 0.7)
  • include_exact_matches: Include exact ChEBI matches (default: true)
  • include_similar_structures: Include similar structures (default: true)
  • only_best_hit: Return only best match per compound (default: true)
  • include_genome_hits: Include genome analysis (default: true)
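
Because both 'CHEBI:15377' and '15377' are accepted, a client may want to normalize identifiers before sending them. The following is a minimal sketch of one way to do that; `normalize_chebi_id` is a hypothetical helper, and the server's own normalization rules may differ.

```python
import re

# Accepts an optional, case-insensitive "CHEBI:" prefix followed by digits.
_CHEBI_PATTERN = re.compile(r"^(?:CHEBI:)?(\d+)$", re.IGNORECASE)


def normalize_chebi_id(raw: str) -> str:
    """Return the identifier in canonical 'CHEBI:<digits>' form."""
    match = _CHEBI_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not a ChEBI identifier: {raw!r}")
    return f"CHEBI:{match.group(1)}"


chebi_ids = [normalize_chebi_id(x) for x in ["CHEBI:15377", "15377", " chebi:16236 "]]
print(chebi_ids)
```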

3. get_database_info

Get statistics and information about loaded databases.

Returns:

  • Compound database statistics (total compounds, unique EC numbers, SMILES coverage)
  • Genome database statistics (total genomes, sequences, sample IDs)
  • Fingerprint database status

4. get_current_config

Get current database configuration paths and their existence status.

Returns:

  • All configured database paths
  • File/directory existence status for each path
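
The kind of report described above could be produced along the following lines. This is a sketch only: `describe_paths` and the keys `path`, `exists`, and `is_dir` are assumptions for illustration, not the actual response schema of get_current_config.

```python
from pathlib import Path


def describe_paths(paths: dict[str, str]) -> dict[str, dict]:
    """Map each configured label to its path and on-disk status."""
    report = {}
    for label, raw in paths.items():
        path = Path(raw)
        report[label] = {
            "path": raw,
            "exists": path.exists(),
            "is_dir": path.is_dir(),  # False when the path is missing
        }
    return report


status = describe_paths({
    "compounds_db": "tests/data/rhea-compounds.tsv",
    "genome_db": "tests/data/genomes_ec/",
})
for label, info in status.items():
    print(label, info)
```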

5. set_compounds_database

Configure the compounds database file path.

Parameters:

  • compounds_db_path: Path to compounds TSV file (required)
  • fingerprint_db_path: Optional fingerprint database path

6. set_genome_database

Configure the genome database directory path.

Parameters:

  • genome_db_path: Path to genome database directory (required)

7. set_taxonomy_file

Configure the taxonomy mapping file path.

Parameters:

  • taxonomy_path: Path to taxonomy TSV file (required)

Database Configuration

Database paths can be set with command-line arguments at startup or changed at runtime with the set_compounds_database, set_genome_database, and set_taxonomy_file tools.

Command Line Arguments

python src/mcp_server_full.py \
  --compounds-db path/to/compounds.tsv \
  --genome-db path/to/genome_dir \
  --taxonomy path/to/taxonomy.tsv

Default Paths

If no paths are provided, the server falls back to the following defaults, most of which point to bundled test data:

  • Compounds: tests/data/rhea-compounds.tsv
  • Genomes: tests/data/genomes_ec/
  • Taxonomy: data/refseq/refseq_taxonomy_mapping.tsv
  • Fingerprints: tests/data/rhea_fingerprints.h5
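
The interplay between the flags and the defaults can be sketched with argparse as shown below. This is an assumption about how src/mcp_server_full.py might parse its arguments, not a copy of it. No command-line flag for the fingerprint database is documented here, so the sketch leaves it out.

```python
import argparse

# Defaults taken from the list above.
DEFAULT_COMPOUNDS_DB = "tests/data/rhea-compounds.tsv"
DEFAULT_GENOME_DB = "tests/data/genomes_ec/"
DEFAULT_TAXONOMY = "data/refseq/refseq_taxonomy_mapping.tsv"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="ChemSearch MCP server")
    parser.add_argument("--compounds-db", default=DEFAULT_COMPOUNDS_DB,
                        help="path to the compounds TSV file")
    parser.add_argument("--genome-db", default=DEFAULT_GENOME_DB,
                        help="path to the genome database directory")
    parser.add_argument("--taxonomy", default=DEFAULT_TAXONOMY,
                        help="path to the taxonomy TSV file")
    return parser


# Override one path; the others fall back to their defaults.
args = build_parser().parse_args(["--genome-db", "/data/genomes"])
print(args.compounds_db, args.genome_db, args.taxonomy)
```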

Usage

Starting the Server

# With default test data
python src/mcp_server_full.py

# With custom databases
python src/mcp_server_full.py \
  --compounds-db /path/to/your/compounds.tsv \
  --genome-db /path/to/your/genomes/ \
  --taxonomy /path/to/your/taxonomy.tsv

Example Search Results

Compound Results include:

  • ChEBI ID and compound name
  • SMILES structure
  • Associated EC numbers
  • Similarity scores

Genome Results include:

  • Genome identifier
  • Taxonomic information (if available)
  • Matching EC numbers
  • Compound associations

Error Handling

Error handling in the server covers:

  • Database file validation
  • Graceful fallbacks for missing data
  • Detailed error messages in tool responses
  • Logging for debugging
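
To make the combination of file validation and error messages in tool responses concrete, here is a minimal sketch. The `content` and `isError` fields follow the general MCP tool-result shape; the function names and message wording are hypothetical, and the server's real checks may be stricter.

```python
import logging
from pathlib import Path

logger = logging.getLogger("chemsearch.sketch")


def tool_result(message: str, is_error: bool) -> dict:
    """Build an MCP-style tool result carrying a single text item."""
    return {"content": [{"type": "text", "text": message}], "isError": is_error}


def check_compounds_db(raw_path: str) -> dict:
    """Validate a compounds database path and report the outcome."""
    path = Path(raw_path)
    if not path.exists():
        logger.warning("compounds database missing: %s", raw_path)
        return tool_result(f"Compounds database not found: {raw_path}", True)
    if not path.is_file():
        logger.warning("compounds database is not a file: %s", raw_path)
        return tool_result(f"Compounds database is not a file: {raw_path}", True)
    return tool_result(f"Compounds database set to {raw_path}", False)


print(check_compounds_db("/no/such/compounds.tsv"))
```

Returning the failure inside the tool result, rather than raising, lets the calling agent read the message and retry with a corrected path.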

Resources

The server provides a single resource endpoint:

  • chemsearch://database-info: JSON-formatted database information
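
Reading this resource over MCP uses a `resources/read` request. The sketch below builds that request and decodes a text result as JSON. The helper names are hypothetical, and the sample response body is a placeholder: the actual keys returned by the server are not documented here.

```python
import json


def build_resource_read(uri: str, request_id: int = 1) -> dict:
    """Build an MCP `resources/read` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


def extract_json_text(response: dict):
    """Decode the first text content item of a resources/read result."""
    contents = response["result"]["contents"]
    return json.loads(contents[0]["text"])


request = build_resource_read("chemsearch://database-info")

# Placeholder response for illustration only; real payload keys will differ.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"contents": [{
        "uri": "chemsearch://database-info",
        "mimeType": "application/json",
        "text": json.dumps({"placeholder": True}),
    }]},
}
info = extract_json_text(sample_response)
print(request["params"]["uri"], info)
```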

Integration

The MCP server can be integrated with any MCP-compatible AI system to provide chemical search capabilities. It follows the Model Context Protocol specification and returns structured JSON responses for all operations.