COMA: Consensus Of Metabolite Annotations

A confidence-scoring framework for untargeted LC–MS metabolite annotation.

COMA sits above existing annotation tools. It accepts their outputs, standardises them, compares agreement at multiple levels of chemical evidence, scores confidence, and produces reproducible consensus tables.

Bring your annotation results. COMA tells you where they agree — and how much that agreement means.

The problem COMA solves

Independent metabolite annotation tools produce divergent outputs from identical input data. A single LC–MS feature may match hundreds of candidate compounds — and two tools using the same probabilistic framework can return different top candidates for the same mass.

Currently, researchers do one of three things:

Pick one tool and ignore the others
Compare outputs manually in spreadsheets (not reproducible)
Apply a simple name-matching intersection (losing the graded nature of partial agreement)

COMA provides a principled, reproducible, tool-agnostic framework for comparing annotation outputs and deriving a confidence-weighted consensus.

The agreement hierarchy

COMA assesses agreement at five levels, listed in descending order of evidential strength. Level 0 marks annotations for which no agreement was found:

Level	Name	Criterion	Biological meaning
5	EXACT_ID	Same HMDB / KEGG / InChIKey	Tools agree on the specific compound
4	STRUCTURAL_SKELETON	Same InChIKey first block	Same atom connectivity, possibly different stereochemistry
3	FORMULA_MASS	Same molecular formula, mass within ppm tolerance	Same elemental composition
2	PATHWAY	Shared KEGG pathway	Same metabolic context
1	NAME_SIMILARITY	Similar compound names	Possible synonyms
0	NO_CONSENSUS	No agreement	Tool-specific annotation
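
As a rough illustration of how the hierarchy could be applied to one pair of candidates, the sketch below checks each criterion from strongest to weakest and returns the first level that holds. It is not COMA's implementation: the `Annotation` dataclass, the `agreement_level` function, the 0.8 name-similarity cutoff and the use of `difflib` are all assumptions made for this example, as is the choice to apply the ppm check only when both candidates carry a mass.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical container for one candidate reported by one tool."""
    name: str = ""
    formula: str = ""
    mass: Optional[float] = None      # theoretical mass, if the tool reports one
    hmdb_id: str = ""
    kegg_id: str = ""
    inchikey: str = ""
    pathway_ids: FrozenSet[str] = frozenset()


def ppm_difference(mass_a: float, mass_b: float) -> float:
    """Relative mass difference in parts per million."""
    return abs(mass_a - mass_b) / mass_b * 1e6


def agreement_level(a: Annotation, b: Annotation,
                    mass_tolerance_ppm: float = 5.0,
                    name_cutoff: float = 0.8) -> Tuple[int, str]:
    """Return the highest hierarchy level at which a and b agree."""
    # Level 5: any shared database identifier or full InChIKey.
    for key in ("hmdb_id", "kegg_id", "inchikey"):
        value_a, value_b = getattr(a, key), getattr(b, key)
        if value_a and value_a == value_b:
            return 5, "EXACT_ID"
    # Level 4: the first InChIKey block is the 14 characters before the hyphen.
    if a.inchikey and b.inchikey and a.inchikey[:14] == b.inchikey[:14]:
        return 4, "STRUCTURAL_SKELETON"
    # Level 3: identical formula; masses, when both are known, within tolerance.
    if a.formula and a.formula == b.formula:
        masses_known = a.mass is not None and b.mass is not None
        if not masses_known or ppm_difference(a.mass, b.mass) <= mass_tolerance_ppm:
            return 3, "FORMULA_MASS"
    # Level 2: at least one pathway ID in common.
    if a.pathway_ids & b.pathway_ids:
        return 2, "PATHWAY"
    # Level 1: case-insensitive string similarity of the compound names.
    if a.name and b.name:
        ratio = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
        if ratio >= name_cutoff:
            return 1, "NAME_SIMILARITY"
    return 0, "NO_CONSENSUS"


print(agreement_level(Annotation(name="L-Serine"), Annotation(name="Serine")))
# (1, 'NAME_SIMILARITY')
```

Checking from the top down means a pair is always reported at its strongest level, so two candidates that share both a formula and a pathway are labelled FORMULA_MASS rather than PATHWAY.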

Quick start

from coma import read_ipa, read_ipapy2, compare_annotations
from coma import score_consensus, score_summary

# Load annotation outputs
ipa    = read_ipa("ipa_results.csv", ionisation_mode="positive")
ipapy2 = read_ipapy2("ipapy_results.csv", ionisation_mode="positive")

# Compare
results = compare_annotations(ipa, ipapy2, mass_tolerance_ppm=5.0)

# Score
ranked = score_consensus(results)

# Summary
print(score_summary(ranked))

# Export
import csv
with open("coma_consensus.csv", "w", newline="") as f:
    if ranked:
        writer = csv.DictWriter(f, fieldnames=ranked[0].to_row().keys())
        writer.writeheader()
        writer.writerows(r.to_row() for r in ranked)

Installation

# From source (recommended during alpha)
git clone https://github.com/LDolanLDolan/coma-metabolomics.git
cd coma-metabolomics
pip install -e .

PyPI release planned for v0.2.

Input format

COMA readers accept CSV files from annotation tools. Column names are resolved flexibly — COMA recognises common variants automatically.

Minimum required columns:

feature_id — unique identifier linking back to your feature table
At least one of: name, formula, hmdb_id, kegg_id, inchikey

Recommended additional columns:

mz — observed m/z
ppm_error or mass_error_ppm — mass error
rank — tool's candidate ranking
posterior or probability — tool's confidence score
pathway_ids — pipe-separated KEGG pathway IDs
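
To make the column requirements concrete, here is a minimal sketch of how flexible header resolution and the minimum-column check could work. The alias table contains only the variants named above, and `COLUMN_ALIASES`, `resolve_columns`, `check_minimum` and `read_annotations` are hypothetical names for this example, not COMA's actual reader API.

```python
import csv
import io

# Hypothetical alias table: canonical name -> accepted header variants.
COLUMN_ALIASES = {
    "feature_id": ("feature_id",),
    "name": ("name",),
    "formula": ("formula",),
    "hmdb_id": ("hmdb_id",),
    "kegg_id": ("kegg_id",),
    "inchikey": ("inchikey",),
    "mz": ("mz",),
    "ppm_error": ("ppm_error", "mass_error_ppm"),
    "rank": ("rank",),
    "posterior": ("posterior", "probability"),
    "pathway_ids": ("pathway_ids",),
}
IDENTITY_COLUMNS = {"name", "formula", "hmdb_id", "kegg_id", "inchikey"}


def resolve_columns(headers):
    """Map canonical column names to the headers actually present."""
    # Compare headers case-insensitively and ignore surrounding whitespace.
    lookup = {h.strip().lower(): h for h in headers}
    resolved = {}
    for canonical, variants in COLUMN_ALIASES.items():
        for variant in variants:
            if variant in lookup:
                resolved[canonical] = lookup[variant]
                break
    return resolved


def check_minimum(resolved):
    """Raise ValueError unless the minimum required columns are present."""
    if "feature_id" not in resolved:
        raise ValueError("missing required column: feature_id")
    if not IDENTITY_COLUMNS & resolved.keys():
        raise ValueError("need at least one of: " + ", ".join(sorted(IDENTITY_COLUMNS)))


def read_annotations(handle):
    """Read a CSV handle into dicts keyed by canonical column names."""
    reader = csv.DictReader(handle)
    columns = resolve_columns(reader.fieldnames or [])
    check_minimum(columns)
    rows = []
    for raw in reader:
        row = {canonical: raw[header] for canonical, header in columns.items()}
        if "pathway_ids" in row:
            # Pathway IDs arrive as one pipe-separated string.
            row["pathway_ids"] = [p for p in (row["pathway_ids"] or "").split("|") if p]
        rows.append(row)
    return rows


sample = "Feature_ID,Name,mass_error_ppm,Probability,pathway_ids\nF001,L-Glutamine,1.2,0.91,map00250|map00220\n"
print(read_annotations(io.StringIO(sample))[0])
```

Values are left as strings here; a real reader would presumably convert mz, ppm_error, rank and posterior to numbers after resolving the headers.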

Output

COMA produces a flat consensus table with one row per candidate pair comparison:

feature_id	tool_a_annotation	tool_b_annotation	agreement_label	confidence_score	confidence_class
F001	L-Glutamine	L-Glutamine	Exact database ID match	9.0	High
F002	L-Serine	Serine	Name similarity	3.0	Low
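
Because the table holds one row per candidate pair, a feature can appear several times. The sketch below shows one way to reduce an exported table to the highest-scoring row per feature. It assumes the exported CSV uses exactly the column names shown above, which may not match what `to_row()` actually emits; `best_row_per_feature` is a hypothetical helper, not part of COMA.

```python
import csv
import io


def best_row_per_feature(handle):
    """Keep the row with the highest confidence_score for each feature_id."""
    best = {}
    for row in csv.DictReader(handle):
        feature = row["feature_id"]
        score = float(row["confidence_score"])
        # Replace the stored row only when this one scores strictly higher.
        if feature not in best or score > float(best[feature]["confidence_score"]):
            best[feature] = row
    return best


# The two example rows from the table above, written as CSV.
table = (
    "feature_id,tool_a_annotation,tool_b_annotation,agreement_label,confidence_score,confidence_class\n"
    "F001,L-Glutamine,L-Glutamine,Exact database ID match,9.0,High\n"
    "F002,L-Serine,Serine,Name similarity,3.0,Low\n"
)
for feature, row in best_row_per_feature(io.StringIO(table)).items():
    print(feature, row["agreement_label"], row["confidence_class"])
```

With a file written by the Quick start export, the same function can be called on `open("coma_consensus.csv", newline="")`.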

Scientific context

COMA implements the convergence-based confidence filtering framework introduced in:

Doolan, L. (2026). A Dual-Method Convergence Framework for High-Confidence Metabolite Annotation in Untargeted LC–MS Metabolomics: Method Development and Translational Potential. Current Analytical Chemistry. [under review]

The E. coli dataset from that paper is included in data/example_dataset/ as the worked example.

Roadmap

v0.1 (current)

Readers for IPA (R implementation) and ipaPy2
Five-level agreement hierarchy
Additive confidence scoring model
Consensus table export

v0.2 (planned)

SIRIUS, GNPS, MetFrag readers
Probabilistic log-odds scoring model (weights learned from benchmark data)
Visualisation module
HTML report generation

Licence

MIT — see LICENSE

Author

Lita Doolan
MSc Medical Genomics, City St George's / King's College London
ORCID: 0009-0005-2615-0052
lita.doolan@kcl.ac.uk
