The ChemSearch MCP (Model Context Protocol) Server provides AI agents with access to chemical compound search and genome analysis capabilities through a standardized interface.
The MCP server exposes the core ChemSearch functionality, allowing AI agents to:
- Search for chemical compounds using SMILES strings or ChEBI identifiers
- Identify genomes with synthetic potential for target compounds
- Manage database configurations
- Access compound and genome database information
The server is implemented in src/mcp_server_full.py and provides:
- Resources: Database information endpoint
- Tools: Seven main tools for compound search and configuration
- Configuration: Flexible database path management
Search for compounds using SMILES chemical structure representations.
Parameters:
- smiles: Array of SMILES strings (required)
- threshold: Similarity threshold 0-1 (default: 0.7)
- only_best_hit: Return only best match per compound (default: true)
- include_genome_hits: Include genome analysis (default: true)
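As a rough illustration of how an agent might pass these parameters, the sketch below builds an MCP `tools/call` request. The tool name `search_by_smiles` and the helper `build_tool_call` are hypothetical; the server's actual tool name is not given here, so substitute whatever its tool listing reports.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message (plain dict)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_by_smiles" is an assumed tool name, used only for illustration.
request = build_tool_call(
    1,
    "search_by_smiles",
    {
        "smiles": ["CCO", "O"],       # ethanol and water
        "threshold": 0.7,             # documented default
        "only_best_hit": True,        # documented default
        "include_genome_hits": True,  # documented default
    },
)
print(json.dumps(request, indent=2))
```

Only `smiles` is required; the other three arguments are spelled out here with their documented defaults and could be omitted.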
Search for compounds using ChEBI database identifiers.
Parameters:
- chebi_ids: Array of ChEBI IDs (e.g., 'CHEBI:15377' or '15377')
- threshold: Similarity threshold 0-1 (default: 0.7)
- include_exact_matches: Include exact ChEBI matches (default: true)
- include_similar_structures: Include similar structures (default: true)
- only_best_hit: Return only best match per compound (default: true)
- include_genome_hits: Include genome analysis (default: true)
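Since both 'CHEBI:15377' and '15377' are accepted, a client may want to normalize identifiers before sending them. The following is a minimal sketch of one way to do that; `normalize_chebi_id` is a hypothetical helper, and the server's own handling may differ.

```python
import re

# Optional "CHEBI:" prefix (any case) followed by digits only.
_CHEBI_PATTERN = re.compile(r"(?:CHEBI:)?(\d+)", re.IGNORECASE)


def normalize_chebi_id(raw):
    """Return the identifier in canonical 'CHEBI:<digits>' form."""
    match = _CHEBI_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"Not a ChEBI identifier: {raw!r}")
    return f"CHEBI:{match.group(1)}"


ids = [normalize_chebi_id(value) for value in ["CHEBI:15377", "15377", " chebi:15377 "]]
print(ids)
```

Rejecting malformed input on the client side keeps error messages close to where the bad value was entered.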
Get statistics and information about loaded databases.
Returns:
- Compound database statistics (total compounds, unique EC numbers, SMILES coverage)
- Genome database statistics (total genomes, sequences, sample IDs)
- Fingerprint database status
Get current database configuration paths and their existence status.
Returns:
- All configured database paths
- File/directory existence status for each path
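The path-plus-existence report described above could be produced along these lines. This is a sketch only: `describe_paths` and the key names `path` and `exists` are assumptions, not the server's actual response schema.

```python
import tempfile
from pathlib import Path


def describe_paths(paths):
    """Map each configured name to its path and whether it exists on disk."""
    return {
        name: {"path": str(path), "exists": Path(path).exists()}
        for name, path in paths.items()
    }


# Demonstrate with one directory that exists and one file that does not.
with tempfile.TemporaryDirectory() as tmp:
    status = describe_paths(
        {
            "genome_db": tmp,
            "compounds_db": str(Path(tmp) / "missing.tsv"),
        }
    )
print(status)
```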
Configure the compounds database file path.
Parameters:
- compounds_db_path: Path to compounds TSV file (required)
- fingerprint_db_path: Optional fingerprint database path
Configure the genome database directory path.
Parameters:
- genome_db_path: Path to genome database directory (required)
Configure the taxonomy mapping file path.
Parameters:
- taxonomy_path: Path to taxonomy TSV file (required)
The server supports flexible database configuration through command-line arguments or runtime tool calls:
python src/mcp_server_full.py \
--compounds-db path/to/compounds.tsv \
--genome-db path/to/genome_dir \
--taxonomy path/to/taxonomy.tsv

If no paths are provided, the server uses test data:
- Compounds: tests/data/rhea-compounds.tsv
- Genomes: tests/data/genomes_ec/
- Taxonomy: data/refseq/refseq_taxonomy_mapping.tsv
- Fingerprints: tests/data/rhea_fingerprints.h5
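The fallback to test data can be pictured as argument defaults. The sketch below uses the three flags and the default paths listed above; the `parse_args` function and `DEFAULTS` table are illustrative and not taken from src/mcp_server_full.py. No fingerprint flag is shown in this document, so none is assumed here.

```python
import argparse

# Default test-data paths as listed in this document.
DEFAULTS = {
    "compounds_db": "tests/data/rhea-compounds.tsv",
    "genome_db": "tests/data/genomes_ec/",
    "taxonomy": "data/refseq/refseq_taxonomy_mapping.tsv",
}


def parse_args(argv=None):
    """Parse database paths, falling back to the test data when a flag is absent."""
    parser = argparse.ArgumentParser(description="ChemSearch MCP server (sketch)")
    parser.add_argument("--compounds-db", default=DEFAULTS["compounds_db"])
    parser.add_argument("--genome-db", default=DEFAULTS["genome_db"])
    parser.add_argument("--taxonomy", default=DEFAULTS["taxonomy"])
    return parser.parse_args(argv)


# Override only the genome directory; the other two keep their defaults.
args = parse_args(["--genome-db", "/path/to/your/genomes/"])
print(args.compounds_db, args.genome_db, args.taxonomy)
```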
# With default test data
python src/mcp_server_full.py
# With custom databases
python src/mcp_server_full.py \
--compounds-db /path/to/your/compounds.tsv \
--genome-db /path/to/your/genomes/ \
--taxonomy /path/to/your/taxonomy.tsv

Compound Results include:
- ChEBI ID and compound name
- SMILES structure
- Associated EC numbers
- Similarity scores
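To make the role of the similarity score concrete, here is one plausible reading of how `threshold` and `only_best_hit` could shape compound results. It is an assumption, not the server's documented algorithm: in particular, treating the threshold as inclusive and the record keys `query`, `hit`, and `similarity` are choices made for this sketch, and the scores are made-up values.

```python
def select_hits(hits, threshold=0.7, only_best_hit=True):
    """Keep hits at or above the threshold; optionally keep one best hit per query."""
    kept = [h for h in hits if h["similarity"] >= threshold]
    if not only_best_hit:
        return sorted(kept, key=lambda h: h["similarity"], reverse=True)
    best = {}
    for h in kept:
        current = best.get(h["query"])
        if current is None or h["similarity"] > current["similarity"]:
            best[h["query"]] = h
    return list(best.values())


# Made-up scores for two query compounds.
hits = [
    {"query": "q1", "hit": "hit-a", "similarity": 0.95},
    {"query": "q1", "hit": "hit-b", "similarity": 0.80},
    {"query": "q2", "hit": "hit-c", "similarity": 0.60},
]
best_only = select_hits(hits)
all_above = select_hits(hits, only_best_hit=False)
print(best_only)
print(all_above)
```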
Genome Results include:
- Genome identifier
- Taxonomic information (if available)
- Matching EC numbers
- Compound associations
The server includes comprehensive error handling:
- Database file validation
- Graceful fallbacks for missing data
- Detailed error messages in tool responses
- Logging for debugging
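A validation step that reports problems in the tool response rather than raising might look like the sketch below. The function name, logger name, and the `ok`/`error` response keys are all hypothetical.

```python
import logging
from pathlib import Path

# Hypothetical logger name, chosen for this sketch.
logger = logging.getLogger("chemsearch.mcp")


def validate_compounds_db(path):
    """Check that the compounds file exists; return a structured result either way."""
    candidate = Path(path)
    if not candidate.is_file():
        message = f"Compounds database not found: {candidate}"
        logger.warning(message)  # logged for debugging
        return {"ok": False, "error": message}  # surfaced to the caller
    return {"ok": True, "path": str(candidate)}


result = validate_compounds_db("no/such/compounds.tsv")
print(result)
```

Returning the error as data lets the calling agent read the message and retry with a corrected path.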
The server provides a single resource endpoint:
- chemsearch://database-info: JSON-formatted database information
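A client could read this resource with an MCP `resources/read` request and then parse the returned text as JSON. The sketch below shows both steps; the helper names are hypothetical, and the sample result is a made-up placeholder, since the actual fields returned by the server are not listed here.

```python
import json


def build_resource_read(request_id, uri):
    """Build an MCP `resources/read` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


def decode_text_contents(result):
    """Parse the JSON text of every text item in a resources/read result."""
    return [json.loads(item["text"]) for item in result["contents"] if "text" in item]


request = build_resource_read(2, "chemsearch://database-info")

# Placeholder result with the general shape of a resources/read response.
sample_result = {
    "contents": [
        {
            "uri": "chemsearch://database-info",
            "mimeType": "application/json",
            "text": json.dumps({"placeholder": "real fields depend on the server"}),
        }
    ]
}
info = decode_text_contents(sample_result)[0]
print(request["params"]["uri"], info)
```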
The MCP server can be integrated with any MCP-compatible AI system to provide chemical search capabilities. It maintains compatibility with the Model Context Protocol specification and provides structured JSON responses for all operations.