Propose 36 literature-backed ENVIRONMENT + METABOLISM traits #84

Merged
realmarcin merged 2 commits into main from propose-environment-metabolism-traits on Jun 1, 2026

@realmarcin
Contributor

Summary

Adds 39 PROPOSED candidate trait records (18 ENVIRONMENT + 21 METABOLISM) filling clear coverage gaps in two categories. Every PROPOSED record carries ≥2 distinct, verified literature citations (PMID/DOI), and that bar is enforced in CI.

Schema

  • Add PROPOSED value to MappingStatusEnum.

Validation

  • New scripts/audit_proposals.py: requires ≥2 distinct, well-formed (PMID/DOI/URL) citations per PROPOSED record, counted across definition_source + evidence[].reference; wired into the qc justfile target and CI; emits reports/proposal_citation_audit.tsv.
  • tests/test_audit_proposals.py locks the rule; tests/test_seed.py relaxed to allow traitmech: IDs and PROPOSED.
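The citation bar described above can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/audit_proposals.py: the regular expressions, the `mapping_status` field name, and all function names are assumptions made for the example; only `definition_source`, `evidence[].reference`, the PROPOSED status, and the ≥2 threshold come from this PR.

```python
import re

# Hypothetical well-formedness patterns (assumed; the real script's rules are not shown here).
_CITATION_PATTERNS = (
    re.compile(r"^PMID:\d+$", re.IGNORECASE),
    re.compile(r"^doi:10\.\d{4,9}/\S+$", re.IGNORECASE),
    re.compile(r"^https?://\S+$", re.IGNORECASE),
)

MIN_CITATIONS = 2  # the ">=2 distinct citations" bar stated in the PR


def is_well_formed(ref: str) -> bool:
    """True if the reference looks like a PMID, DOI, or URL."""
    return any(p.match(ref) for p in _CITATION_PATTERNS)


def distinct_citations(record: dict) -> set[str]:
    """Collect well-formed citations from definition_source and evidence[].reference."""
    refs = []
    source = record.get("definition_source")
    if source:
        refs.append(source)
    for item in record.get("evidence") or []:
        ref = item.get("reference")
        if ref:
            refs.append(ref)
    # Normalise case and whitespace so the same citation is counted once.
    return {r.strip().lower() for r in refs if is_well_formed(r.strip())}


def passes_citation_bar(record: dict) -> bool:
    """Non-PROPOSED records are skipped; PROPOSED ones need >= MIN_CITATIONS."""
    if record.get("mapping_status") != "PROPOSED":  # field name is an assumption
        return True
    return len(distinct_citations(record)) >= MIN_CITATIONS
```

Under these assumptions, a PROPOSED record that cites the same PMID in both `definition_source` and `evidence` counts as one citation and fails the bar, while placeholder strings such as `TODO` are ignored as malformed.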

ENVIRONMENT (traitmech:000001–000018)

Pressure/piezophily, radiation (ionizing/UV), desiccation/xerophily, heavy-metal/metalloid tolerance (Cd/Zn/Co/Hg/As/Cu).

METABOLISM (traitmech:000019–000039)

Six autotrophic carbon-fixation pathways (+carbon_fixation head), product-specific fermentations, DNRA, dissimilatory iron reduction, manganese oxidation, anaerobic oxidation of methane, oxygenic/anoxygenic photosynthesis, proteorhodopsin phototrophy, plus intermediate axis classes (phototrophy, photosynthesis, dissimilatory_metal_reduction). DNRA/AOM/metal-reduction parent to the existing METPO:1000802 anaerobic respiration class.

Reports

reports/environment_trait_proposals.md, reports/metabolism_trait_proposals.md.

Verification

  • just validate-strict → 0 errors over 396 files
  • just qc / audit-proposals → 39/39 PROPOSED passing
  • pytest → 70 passed
  • minted IDs contiguous 000001–000039; all traitmech: parent refs resolve
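The contiguity check in the last bullet could be reproduced along these lines. This is a sketch under stated assumptions: the six-digit `traitmech:` format is inferred from the IDs quoted in this PR, and `find_gaps` is a hypothetical helper, not something the repository is known to contain.

```python
import re

# Assumed ID shape, based on the identifiers quoted above (traitmech:000001 etc.).
_ID_RE = re.compile(r"^traitmech:(\d{6})$")


def find_gaps(ids):
    """Return the numeric IDs missing between the lowest and highest minted ID."""
    numbers = set()
    for identifier in ids:
        match = _ID_RE.match(identifier)
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    expected = set(range(min(numbers), max(numbers) + 1))
    return sorted(expected - numbers)
```

An empty result over the 39 minted identifiers corresponds to the "contiguous 000001–000039" claim; any returned number marks a skipped ID.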

🤖 Generated with Claude Code

Adds candidate trait records for coverage gaps in two categories, each backed
by >=2 distinct verified literature citations and enforced in CI.

Schema:
- Add PROPOSED value to MappingStatusEnum (candidate traits from literature
  research; must carry >=2 distinct citations).

Validation:
- New scripts/audit_proposals.py enforces >=2 distinct, well-formed
  (PMID/DOI/URL) citations per PROPOSED record, counted across
  definition_source + evidence[].reference. Wired into the `qc` justfile
  target (and thus CI). Emits reports/proposal_citation_audit.tsv.
- tests/test_audit_proposals.py locks the rule.
- Relax tests/test_seed.py to allow traitmech: identifiers and PROPOSED status.

ENVIRONMENT proposals (18, traitmech:000001-000018): pressure/piezophily,
radiation (ionizing/UV), desiccation/xerophily, and heavy-metal/metalloid
tolerance (Cd/Zn/Co/Hg/As/Cu) families.

METABOLISM proposals (21, traitmech:000019-000039): six autotrophic
carbon-fixation pathways (+carbon_fixation head), product-specific
fermentations, DNRA, dissimilatory iron reduction, manganese oxidation,
anaerobic oxidation of methane, oxygenic/anoxygenic photosynthesis,
proteorhodopsin phototrophy, plus intermediate axis classes
(phototrophy, photosynthesis, dissimilatory_metal_reduction). DNRA/AOM/
metal-reduction parent to the existing METPO:1000802 anaerobic respiration.

Reports: reports/environment_trait_proposals.md,
reports/metabolism_trait_proposals.md.

Verification: validate-strict 0 errors over 396 files; audit-proposals 39/39
PROPOSED passing; pytest 70 passed; minted IDs contiguous 000001-000039.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 1, 2026 18:26

Copilot AI left a comment


Pull request overview

Adds 39 PROPOSED candidate TraitRecord YAMLs (18 ENVIRONMENT + 21 METABOLISM) filling clear coverage gaps (pressure/piezophily, radiation, desiccation, heavy-metal tolerance, carbon-fixation pathways, product-specific fermentations, DNRA, AOM, metal redox, phototrophy/photosynthesis). Each record carries ≥2 distinct verified citations, with a new PROPOSED enum value and a new scripts/audit_proposals.py audit (wired into just qc / CI) enforcing the citation bar.

Changes:

  • Schema: add PROPOSED value to MappingStatusEnum; relax tests/test_seed.py to accept traitmech: IDs and the new status.
  • New scripts/audit_proposals.py + unit tests; audit-proposals justfile recipe added to the qc composite target.
  • 39 candidate trait YAMLs under data/traits/{environment,metabolism}/ minted traitmech:000001–traitmech:000039, plus two narrative reports and a generated TSV audit report.

Reviewed changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/traitmech/schema/traitmech.yaml | Add PROPOSED mapping status with description referencing the citation audit. |
| scripts/audit_proposals.py | New citation-bar audit for PROPOSED records (≥2 distinct, well-formed citations). |
| tests/test_audit_proposals.py | Locks the audit rules (skip non-PROPOSED, dedupe, placeholders, malformed). |
| tests/test_seed.py | Accept traitmech: identifiers and PROPOSED status. |
| justfile | Add audit-proposals recipe and chain it into qc. |
| reports/proposal_citation_audit.tsv | Generated audit output (uses absolute filesystem paths; see comment). |
| reports/environment_trait_proposals.md, reports/metabolism_trait_proposals.md | Narrative justification + citation indexes for the two proposal cohorts. |
| data/traits/environment/*.yaml (18 files) | New PROPOSED ENVIRONMENT traits (pressure, radiation, desiccation, metal tolerance). |
| data/traits/metabolism/*.yaml (21 files) | New PROPOSED METABOLISM traits (carbon fixation pathways, fermentations, DNRA, AOM, metal redox, phototrophy). |


Comment thread scripts/audit_proposals.py
Comment thread reports/proposal_citation_audit.tsv Outdated
The proposal citation audit wrote str(path) where path is anchored at
_REPO_ROOT (absolute), so the committed reports/proposal_citation_audit.tsv
contained one contributor's absolute filesystem paths and would produce a
spurious diff on every CI/contributor run. Relativize to _REPO_ROOT before
writing, and regenerate the TSV.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
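The relativization described in this commit might look something like the sketch below. It is illustrative only: `report_path` and its `repo_root` parameter are hypothetical names, and the fallback for paths outside the repository is an assumption rather than documented behaviour of the audit script.

```python
from pathlib import Path


def report_path(path: Path, repo_root: Path) -> str:
    """Return path relative to repo_root in POSIX form, for stable report output."""
    try:
        return path.resolve().relative_to(repo_root.resolve()).as_posix()
    except ValueError:
        # Path lies outside the repository; fall back to the path as given (assumed policy).
        return path.as_posix()
```

Writing the relative form means the committed TSV no longer depends on where a contributor or CI runner checked out the repository, so regenerating it produces no diff unless the audit results themselves change.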
@realmarcin realmarcin merged commit c86faec into main Jun 1, 2026
3 checks passed
@realmarcin realmarcin deleted the propose-environment-metabolism-traits branch June 1, 2026 18:40
2 participants