COMA: Consensus Of Metabolite Annotations

A confidence-scoring framework for untargeted LC–MS metabolite annotation.

COMA sits above existing annotation tools. It accepts their outputs, standardises them, compares agreement at multiple levels of chemical evidence, scores confidence, and produces reproducible consensus tables.

Bring your annotation results. COMA tells you where they agree — and how much that agreement means.

Python 3.9+ | Licence: MIT | Status: Alpha


The problem COMA solves

Independent metabolite annotation tools produce divergent outputs from identical input data. A single LC–MS feature may match hundreds of candidate compounds — and two tools using the same probabilistic framework can return different top candidates for the same mass.

Currently, researchers either:

  • Pick one tool and ignore the others
  • Compare outputs manually in spreadsheets (not reproducible)
  • Apply a simple name-matching intersection (losing the graded nature of partial agreement)

COMA provides a principled, reproducible, tool-agnostic framework for comparing annotation outputs and deriving a confidence-weighted consensus.


The agreement hierarchy

COMA assesses agreement at five levels, listed in descending order of evidential strength. Level 0 marks candidate pairs that agree at none of them:

| Level | Name | Criterion | Biological meaning |
|-------|------|-----------|--------------------|
| 5 | EXACT_ID | Same HMDB / KEGG / InChIKey | Tools agree on the specific compound |
| 4 | STRUCTURAL_SKELETON | Same InChIKey first block | Same carbon connectivity, possibly different stereochemistry |
| 3 | FORMULA_MASS | Same molecular formula, mass within ppm tolerance | Same elemental composition |
| 2 | PATHWAY | Shared KEGG pathway | Same metabolic context |
| 1 | NAME_SIMILARITY | Similar compound names | Possible synonyms |
| 0 | NO_CONSENSUS | No agreement | Tool-specific annotation |
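
As an illustration of how the hierarchy can be applied, the sketch below assigns a pair of candidate annotations the highest level at which they agree. It is not COMA's internal implementation: the `Annotation` record, the `agreement_level` function, the 0.8 name-similarity threshold, and the use of `difflib` for name matching are all assumptions made for this example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Annotation:
    """Hypothetical record for one candidate annotation (not COMA's own class)."""
    name: Optional[str] = None
    formula: Optional[str] = None
    mass: Optional[float] = None        # neutral monoisotopic mass, if known
    hmdb_id: Optional[str] = None
    kegg_id: Optional[str] = None
    inchikey: Optional[str] = None
    pathway_ids: frozenset = frozenset()


def ppm_difference(mass_a: float, mass_b: float) -> float:
    """Absolute mass difference in parts per million, relative to mass_a."""
    return abs(mass_a - mass_b) / mass_a * 1e6


def _same(a, b) -> bool:
    """True only when both values are present and equal."""
    return a is not None and b is not None and a == b


def agreement_level(a: Annotation, b: Annotation,
                    mass_tolerance_ppm: float = 5.0,
                    name_threshold: float = 0.8) -> int:
    """Return the highest hierarchy level (5 down to 0) at which a and b agree."""
    # Level 5: any shared exact identifier.
    if (_same(a.hmdb_id, b.hmdb_id) or _same(a.kegg_id, b.kegg_id)
            or _same(a.inchikey, b.inchikey)):
        return 5
    # Level 4: same first InChIKey block (the part before the first hyphen).
    if a.inchikey and b.inchikey:
        if a.inchikey.split("-")[0] == b.inchikey.split("-")[0]:
            return 4
    # Level 3: same formula; if both masses are given they must also be
    # within the ppm tolerance.
    if _same(a.formula, b.formula):
        if (a.mass is None or b.mass is None
                or ppm_difference(a.mass, b.mass) <= mass_tolerance_ppm):
            return 3
    # Level 2: at least one pathway ID in common.
    if a.pathway_ids & b.pathway_ids:
        return 2
    # Level 1: case-insensitive name similarity above the assumed threshold.
    if a.name and b.name:
        ratio = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
        if ratio >= name_threshold:
            return 1
    # Level 0: nothing agrees.
    return 0


# Names only: "L-Serine" vs "Serine" agree at the name-similarity level.
print(agreement_level(Annotation(name="L-Serine"), Annotation(name="Serine")))  # 1
```

Checking the levels from strongest to weakest means a pair is always reported at its best-supported level, so two candidates that share both an identifier and a pathway are counted once, as an exact match.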

Quick start

import csv

from coma import read_ipa, read_ipapy2, compare_annotations
from coma import score_consensus, score_summary

# Load annotation outputs
ipa    = read_ipa("ipa_results.csv", ionisation_mode="positive")
ipapy2 = read_ipapy2("ipapy_results.csv", ionisation_mode="positive")

# Compare
results = compare_annotations(ipa, ipapy2, mass_tolerance_ppm=5.0)

# Score
ranked = score_consensus(results)

# Summary
print(score_summary(ranked))

# Export (the header row is taken from the first result)
with open("coma_consensus.csv", "w", newline="") as f:
    if ranked:
        writer = csv.DictWriter(f, fieldnames=ranked[0].to_row().keys())
        writer.writeheader()
        writer.writerows(r.to_row() for r in ranked)

Installation

# From source (recommended during alpha)
git clone https://github.com/LDolanLDolan/coma-metabolomics.git
cd coma-metabolomics
pip install -e .

PyPI release planned for v0.2.


Input format

COMA readers accept CSV files from annotation tools. Column names are resolved flexibly — COMA recognises common variants automatically.

Minimum required columns:

  • feature_id — unique identifier linking back to your feature table
  • At least one of: name, formula, hmdb_id, kegg_id, inchikey

Recommended additional columns:

  • mz — observed m/z
  • ppm_error or mass_error_ppm — mass error
  • rank — tool's candidate ranking
  • posterior or probability — tool's confidence score
  • pathway_ids — pipe-separated KEGG pathway IDs
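
The sketch below shows one way flexible column resolution of this kind could work. It is illustrative only and does not reproduce COMA's readers: `COLUMN_ALIASES`, `resolve_columns`, and `split_pathways` are hypothetical names, the alias table lists only the variants named above, and the header normalisation (case, spaces, hyphens) is an assumption.

```python
# Hypothetical alias table: canonical column name -> accepted header variants.
COLUMN_ALIASES = {
    "feature_id": ("feature_id",),
    "name": ("name",),
    "formula": ("formula",),
    "hmdb_id": ("hmdb_id",),
    "kegg_id": ("kegg_id",),
    "inchikey": ("inchikey",),
    "mz": ("mz",),
    "ppm_error": ("ppm_error", "mass_error_ppm"),
    "rank": ("rank",),
    "posterior": ("posterior", "probability"),
    "pathway_ids": ("pathway_ids",),
}

# At least one of these must be present alongside feature_id.
IDENTITY_COLUMNS = ("name", "formula", "hmdb_id", "kegg_id", "inchikey")


def normalise(header: str) -> str:
    """Lower-case a header and unify spaces and hyphens to underscores."""
    return header.strip().lower().replace(" ", "_").replace("-", "_")


def resolve_columns(headers):
    """Map each canonical column name to the matching header in the file."""
    lookup = {normalise(h): h for h in headers}
    resolved = {}
    for canonical, variants in COLUMN_ALIASES.items():
        for variant in variants:
            if variant in lookup:
                resolved[canonical] = lookup[variant]
                break
    if "feature_id" not in resolved:
        raise ValueError("missing required column: feature_id")
    if not any(col in resolved for col in IDENTITY_COLUMNS):
        raise ValueError(
            "need at least one of: " + ", ".join(IDENTITY_COLUMNS)
        )
    return resolved


def split_pathways(cell: str):
    """Split a pipe-separated pathway cell into a list of clean IDs."""
    return [p.strip() for p in cell.split("|") if p.strip()]


columns = resolve_columns(["Feature ID", "Name", "mass_error_ppm", "Probability"])
print(columns["ppm_error"])   # mass_error_ppm
print(columns["posterior"])   # Probability
```

Resolving headers once, up front, lets a missing required column fail with a clear message before any rows are read.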

Output

COMA produces a flat consensus table with one row per candidate pair comparison:

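| feature_id | tool_a_annotation | tool_b_annotation | agreement_label | confidence_score | confidence_class |
|------------|-------------------|-------------------|-----------------|------------------|------------------|
| F001 | L-Glutamine | L-Glutamine | Exact database ID match | 9.0 | High |
| F002 | L-Serine | Serine | Name similarity | 3.0 | Low |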
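
Because the table is flat, it can be filtered downstream with the standard library alone. The sketch below assumes the exported CSV uses the column names shown above as its header row; `load_consensus`, `filter_by_class`, and `class_counts` are hypothetical helpers for this example, not part of COMA.

```python
import csv
import io
from collections import Counter


def load_consensus(handle):
    """Read a consensus table from an open text handle into a list of dicts."""
    return list(csv.DictReader(handle))


def filter_by_class(rows, keep=("High",)):
    """Keep only rows whose confidence_class is in `keep`."""
    return [row for row in rows if row["confidence_class"] in keep]


def class_counts(rows):
    """Count rows per confidence class."""
    return Counter(row["confidence_class"] for row in rows)


# The two example rows above, inlined so the sketch runs without a file.
# With a real export, use: open("coma_consensus.csv", newline="")
example = io.StringIO(
    "feature_id,tool_a_annotation,tool_b_annotation,"
    "agreement_label,confidence_score,confidence_class\n"
    "F001,L-Glutamine,L-Glutamine,Exact database ID match,9.0,High\n"
    "F002,L-Serine,Serine,Name similarity,3.0,Low\n"
)

rows = load_consensus(example)
high = filter_by_class(rows)
print([row["feature_id"] for row in high])  # ['F001']
```

`csv.DictReader` returns every value as a string, so convert `confidence_score` with `float()` before sorting or thresholding on it.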

Scientific context

COMA implements the convergence-based confidence filtering framework introduced in:

Doolan, L. (2026). A Dual-Method Convergence Framework for High-Confidence Metabolite Annotation in Untargeted LC–MS Metabolomics: Method Development and Translational Potential. Current Analytical Chemistry. [under review]

The E. coli dataset from that paper is included in data/example_dataset/ as the worked example.


Roadmap

v0.1 (current)

  • R IPA and ipaPy2 readers
  • Five-level agreement hierarchy
  • Additive confidence scoring model
  • Consensus table export

v0.2 (planned)

  • SIRIUS, GNPS, MetFrag readers
  • Probabilistic log-odds scoring model (weights learned from benchmark data)
  • Visualisation module
  • HTML report generation

Licence

MIT — see LICENSE


Author

Lita Doolan
MSc Medical Genomics, City St George's / King's College London
ORCID: 0009-0005-2615-0052
lita.doolan@kcl.ac.uk
