A confidence-scoring framework for untargeted LC–MS metabolite annotation.
COMA sits above existing annotation tools. It accepts their outputs, standardises them, compares agreement at multiple levels of chemical evidence, scores confidence, and produces reproducible consensus tables.
Bring your annotation results. COMA tells you where they agree — and how much that agreement means.
Independent metabolite annotation tools produce divergent outputs from identical input data. A single LC–MS feature may match hundreds of candidate compounds — and two tools using the same probabilistic framework can return different top candidates for the same mass.
Currently, researchers either:
- Pick one tool and ignore the others
- Compare outputs manually in spreadsheets (not reproducible)
- Apply a simple name-matching intersection (losing the graded nature of partial agreement)
COMA provides a principled, reproducible, tool-agnostic framework for comparing annotation outputs and deriving a confidence-weighted consensus.
COMA assesses agreement at five levels, in descending order of evidential strength. Level 0 records that no agreement was found:
| Level | Name | Criterion | Biological meaning |
|---|---|---|---|
| 5 | EXACT_ID | Same HMDB ID, KEGG ID, or full InChIKey | Tools agree on the specific compound |
| 4 | STRUCTURAL_SKELETON | Same InChIKey first block (14-character connectivity hash) | Same molecular skeleton; stereochemistry may differ |
| 3 | FORMULA_MASS | Same formula within ppm tolerance | Same elemental composition |
| 2 | PATHWAY | Shared KEGG pathway | Same metabolic context |
| 1 | NAME_SIMILARITY | Similar compound names | Possible synonyms |
| 0 | NO_CONSENSUS | No agreement | Tool-specific annotation |
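To make the hierarchy concrete, here is a minimal sketch of how one pair of candidate annotations could be assigned the highest level it reaches. It is not COMA's implementation. The dict record shape, the `mass` key, the function names `agreement_level` and `ppm_difference`, the use of `difflib` for name similarity, and the 0.8 name threshold are all assumptions made for illustration. "Same formula within ppm tolerance" is interpreted here as identical formula strings whose masses, when both are present, differ by no more than the tolerance.

```python
from difflib import SequenceMatcher


def ppm_difference(mass_a: float, mass_b: float) -> float:
    """Relative mass difference in parts per million, relative to mass_a."""
    return abs(mass_a - mass_b) / mass_a * 1e6


def agreement_level(a: dict, b: dict, mass_tolerance_ppm: float = 5.0,
                    name_threshold: float = 0.8) -> int:
    """Return the highest agreement level (5..0) reached by two candidates.

    Hypothetical record shape: plain dicts keyed like COMA's input columns,
    plus an assumed "mass" key (both tools must use the same mass convention).
    """
    # Level 5: any shared database identifier or identical full InChIKey
    for key in ("hmdb_id", "kegg_id", "inchikey"):
        if a.get(key) and a.get(key) == b.get(key):
            return 5

    # Level 4: same InChIKey first block (connectivity), other blocks may differ
    ik_a, ik_b = a.get("inchikey"), b.get("inchikey")
    if ik_a and ik_b and ik_a.split("-")[0] == ik_b.split("-")[0]:
        return 4

    # Level 3: identical formula; if both masses are known, they must also
    # fall inside the ppm window (assumed interpretation)
    if a.get("formula") and a.get("formula") == b.get("formula"):
        ma, mb = a.get("mass"), b.get("mass")
        if ma is None or mb is None or ppm_difference(ma, mb) <= mass_tolerance_ppm:
            return 3

    # Level 2: at least one shared KEGG pathway ID
    if set(a.get("pathway_ids") or []) & set(b.get("pathway_ids") or []):
        return 2

    # Level 1: case-insensitive fuzzy name match (difflib ratio, assumed metric)
    na, nb = a.get("name"), b.get("name")
    if na and nb and SequenceMatcher(None, na.lower(), nb.lower()).ratio() >= name_threshold:
        return 1

    # Level 0: no consensus
    return 0
```

Because the checks run from strongest to weakest and return early, each pair is labelled only by its best evidence. For example, `{"name": "L-Serine"}` and `{"name": "Serine"}` share no identifiers, formula, or pathways, so they land at level 1 (ratio ≈ 0.86), matching the name-similarity row in the output example further down.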
```python
import csv

from coma import read_ipa, read_ipapy2, compare_annotations
from coma import score_consensus, score_summary

# Load annotation outputs
ipa = read_ipa("ipa_results.csv", ionisation_mode="positive")
ipapy2 = read_ipapy2("ipapy_results.csv", ionisation_mode="positive")

# Compare
results = compare_annotations(ipa, ipapy2, mass_tolerance_ppm=5.0)

# Score
ranked = score_consensus(results)

# Summary
print(score_summary(ranked))

# Export
with open("coma_consensus.csv", "w", newline="") as f:
    if ranked:
        writer = csv.DictWriter(f, fieldnames=ranked[0].to_row().keys())
        writer.writeheader()
        writer.writerows(r.to_row() for r in ranked)
```

```bash
# From source (recommended during alpha)
git clone https://github.com/LDolanLDolan/coma-metabolomics.git
cd coma-metabolomics
pip install -e .
```

PyPI release planned for v0.2.
COMA readers accept CSV files from annotation tools. Column names are resolved flexibly — COMA recognises common variants automatically.
Minimum required columns:
- `feature_id`: unique identifier linking back to your feature table
- At least one of: `name`, `formula`, `hmdb_id`, `kegg_id`, `inchikey`
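The minimum-column rule above can be checked before a file is handed to a reader. The sketch below is a hypothetical pre-flight helper (`check_minimum_columns` is not part of COMA's API). It assumes case- and whitespace-insensitive header matching and does not attempt COMA's alias resolution.

```python
import csv
import io

# Identifier columns from the README; at least one must be present
IDENTIFIER_COLUMNS = ("name", "formula", "hmdb_id", "kegg_id", "inchikey")


def check_minimum_columns(header):
    """Return a list of problems; an empty list means the header is usable."""
    cols = {c.strip().lower() for c in header}
    problems = []
    if "feature_id" not in cols:
        problems.append("missing required column: feature_id")
    if not cols.intersection(IDENTIFIER_COLUMNS):
        problems.append("need at least one of: " + ", ".join(IDENTIFIER_COLUMNS))
    return problems


# Example: read only the header row of a small CSV
sample = io.StringIO("feature_id,name\nF001,L-Glutamine\n")
header = next(csv.reader(sample))
print(check_minimum_columns(header))  # []
```

Returning a list of problems, rather than raising on the first one, lets a user see every missing requirement in a single pass.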
Recommended additional columns:
- `mz`: observed m/z
- `ppm_error` or `mass_error_ppm`: mass error (ppm)
- `rank`: the tool's candidate ranking
- `posterior` or `probability`: the tool's confidence score
- `pathway_ids`: pipe-separated KEGG pathway IDs
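Flexible column resolution is essentially an alias lookup. The sketch below shows one way to do it, using only the variants listed above. The canonical keys (`ppm_error`, `tool_score`, and so on), the helper names, and the lower-casing rule are assumptions. COMA's own alias table may recognise more variants.

```python
# Hypothetical alias table built only from the variants named in the README
ALIASES = {
    "mz": ("mz",),
    "ppm_error": ("ppm_error", "mass_error_ppm"),
    "rank": ("rank",),
    "tool_score": ("posterior", "probability"),
    "pathway_ids": ("pathway_ids",),
}


def resolve_columns(header):
    """Map each canonical name to the column actually present in the header."""
    # Normalise header names so "Mass_Error_PPM" matches "mass_error_ppm"
    lookup = {h.strip().lower(): h for h in header}
    resolved = {}
    for canonical, variants in ALIASES.items():
        for variant in variants:
            if variant in lookup:
                resolved[canonical] = lookup[variant]
                break  # first matching variant wins
    return resolved


def parse_pathways(cell):
    """Split a pipe-separated KEGG pathway field, dropping empty entries."""
    if not cell:
        return []
    return [p.strip() for p in cell.split("|") if p.strip()]
```

For example, a header of `Feature_ID, Mass_Error_PPM, Posterior` resolves `ppm_error` and `tool_score` while leaving absent optional columns unmapped. That keeps the recommended columns optional instead of turning a missing one into an error.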
COMA produces a flat consensus table with one row per candidate pair comparison:
| feature_id | tool_a_annotation | tool_b_annotation | agreement_label | confidence_score | confidence_class |
|---|---|---|---|---|---|
| F001 | L-Glutamine | L-Glutamine | Exact database ID match | 9.0 | High |
| F002 | L-Serine | Serine | Name similarity | 3.0 | Low |
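Because the consensus table is a flat CSV, downstream filtering needs only the standard library. This sketch assumes the exported file uses the column headers shown above. That depends on what `to_row()` actually emits, so treat the header names as an assumption. The function name `filter_consensus` is hypothetical.

```python
import csv
import io


def filter_consensus(handle, keep_classes=("High",), min_score=None):
    """Return rows whose confidence_class is in keep_classes
    and, if min_score is given, whose confidence_score is at least min_score."""
    kept = []
    for row in csv.DictReader(handle):
        if row["confidence_class"] not in keep_classes:
            continue
        if min_score is not None and float(row["confidence_score"]) < min_score:
            continue
        kept.append(row)
    return kept


# The two example rows from the table above, as an in-memory CSV
example = io.StringIO(
    "feature_id,tool_a_annotation,tool_b_annotation,agreement_label,"
    "confidence_score,confidence_class\n"
    "F001,L-Glutamine,L-Glutamine,Exact database ID match,9.0,High\n"
    "F002,L-Serine,Serine,Name similarity,3.0,Low\n"
)
high = filter_consensus(example)
print([r["feature_id"] for r in high])  # ['F001']
```

With a real export you would pass an open file handle, e.g. `open("coma_consensus.csv", newline="")`, instead of the `StringIO` object.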
COMA implements the convergence-based confidence filtering framework introduced in:
Doolan, L. (2026). A Dual-Method Convergence Framework for High-Confidence Metabolite Annotation in Untargeted LC–MS Metabolomics: Method Development and Translational Potential. Current Analytical Chemistry. [under review]
The *E. coli* dataset from that paper is included in `data/example_dataset/` as the worked example.
v0.1 (current)
- R IPA and ipaPy2 readers
- Five-level agreement hierarchy
- Additive confidence scoring model
- Consensus table export
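An additive confidence model sums a fixed weight for each piece of supporting evidence and then bins the total into classes. The sketch below illustrates that idea only. The weights, the evidence keys, the cut-offs, and the "Medium" class are hypothetical placeholders, not COMA v0.1's actual values.

```python
# Illustrative weights and cut-offs only; not COMA's v0.1 parameters.
WEIGHTS = {
    "exact_id": 4.0,
    "skeleton": 2.0,
    "formula_mass": 1.5,
    "pathway": 1.0,
    "name_similarity": 0.5,
}
# Checked from highest to lowest; "Medium" is a hypothetical class name
CLASS_CUTOFFS = ((6.0, "High"), (3.0, "Medium"), (0.0, "Low"))


def additive_score(evidence: dict) -> float:
    """Sum the weight of every evidence type flagged True."""
    return sum(w for key, w in WEIGHTS.items() if evidence.get(key, False))


def confidence_class(score: float) -> str:
    """Map a score to the first class whose cut-off it meets."""
    for cutoff, label in CLASS_CUTOFFS:
        if score >= cutoff:
            return label
    return "Low"
```

Under these placeholder weights, a pair with an exact ID, matching skeleton, and matching formula scores 7.5 ("High"), while name similarity alone scores 0.5 ("Low"). The main appeal of an additive model is transparency: every point in the score traces back to a named piece of evidence.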
v0.2 (planned)
- SIRIUS, GNPS, MetFrag readers
- Probabilistic log-odds scoring model (weights learned from benchmark data)
- Visualisation module
- HTML report generation
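A log-odds model replaces fixed additive points with evidence weights expressed as likelihood ratios, which can be estimated from benchmark data. The sketch below shows the general technique (a naive-Bayes-style update): start from prior log-odds, add the log of each observed likelihood ratio, and convert back to a probability. The likelihood ratios, the prior, and the function names are hypothetical placeholders. How COMA v0.2 will parameterise its model is not specified here.

```python
import math

# Placeholder likelihood ratios P(evidence | correct) / P(evidence | incorrect).
# A real model would estimate these from benchmark data.
LIKELIHOOD_RATIOS = {
    "exact_id": 20.0,
    "skeleton": 8.0,
    "formula_mass": 3.0,
    "pathway": 1.5,
    "name_similarity": 1.2,
}


def log_odds_score(evidence: dict, prior_probability: float = 0.1) -> float:
    """Prior log-odds plus the log likelihood ratio of each observed evidence type."""
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must be strictly between 0 and 1")
    log_odds = math.log(prior_probability / (1.0 - prior_probability))
    for key, ratio in LIKELIHOOD_RATIOS.items():
        if evidence.get(key, False):
            log_odds += math.log(ratio)
    return log_odds


def to_probability(log_odds: float) -> float:
    """Logistic transform from log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With these placeholders, an exact ID match alone moves a 10% prior to 20/29 ≈ 0.69. Summing log-ratios assumes the evidence types are conditionally independent. The hierarchy violates that assumption, because an exact ID match usually implies a matching skeleton and formula. A fitted model would therefore need to handle those correlated levels, for instance by scoring only the highest level reached.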
MIT — see LICENSE
Lita Doolan
MSc Medical Genomics, City St George's / King's College London
ORCID: 0009-0005-2615-0052
lita.doolan@kcl.ac.uk