Propose 36 literature-backed ENVIRONMENT + METABOLISM traits #84
Merged
Adds candidate trait records for coverage gaps in two categories, each backed by >=2 distinct verified literature citations and enforced in CI.

Schema:
- Add PROPOSED value to MappingStatusEnum (candidate traits from literature research; must carry >=2 distinct citations).

Validation:
- New scripts/audit_proposals.py enforces >=2 distinct, well-formed (PMID/DOI/URL) citations per PROPOSED record, counted across definition_source + evidence[].reference. Wired into the `qc` justfile target (and thus CI). Emits reports/proposal_citation_audit.tsv.
- tests/test_audit_proposals.py locks the rule.
- Relax tests/test_seed.py to allow traitmech: identifiers and PROPOSED status.

ENVIRONMENT proposals (18, traitmech:000001-000018): pressure/piezophily, radiation (ionizing/UV), desiccation/xerophily, and heavy-metal/metalloid tolerance (Cd/Zn/Co/Hg/As/Cu) families.

METABOLISM proposals (21, traitmech:000019-000039): six autotrophic carbon-fixation pathways (+carbon_fixation head), product-specific fermentations, DNRA, dissimilatory iron reduction, manganese oxidation, anaerobic oxidation of methane, oxygenic/anoxygenic photosynthesis, proteorhodopsin phototrophy, plus intermediate axis classes (phototrophy, photosynthesis, dissimilatory_metal_reduction). DNRA/AOM/metal-reduction parent to the existing METPO:1000802 anaerobic respiration.

Reports: reports/environment_trait_proposals.md, reports/metabolism_trait_proposals.md.

Verification: validate-strict 0 errors over 396 files; audit-proposals 39/39 PROPOSED passing; pytest 70 passed; minted IDs contiguous 000001-000039.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
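The "minted IDs contiguous" claim in the verification line can be checked mechanically. The following is a minimal sketch, not the repository's code: the helper name `missing_ids`, the six-digit ID pattern, and the assumption that numbering starts at 1 are all illustrative.

```python
import re

# Assumed ID shape: "traitmech:" followed by exactly six digits.
_ID_RE = re.compile(r"^traitmech:(\d{6})$")


def missing_ids(ids: list[str]) -> list[str]:
    """Return traitmech IDs absent from the range 1..max(seen).

    Identifiers that do not match the assumed pattern (e.g. METPO terms)
    are ignored. An empty result means the minted IDs are contiguous.
    """
    numbers = sorted(int(m.group(1)) for i in ids if (m := _ID_RE.match(i)))
    if not numbers:
        return []
    present = set(numbers)
    # Hypothetical convention: numbering is expected to start at 000001.
    return [
        f"traitmech:{n:06d}"
        for n in range(1, numbers[-1] + 1)
        if n not in present
    ]
```

For the 39 records in this PR, the expected result would be an empty list.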
Pull request overview
Adds 39 PROPOSED candidate TraitRecord YAMLs (18 ENVIRONMENT + 21 METABOLISM) filling clear coverage gaps (pressure/piezophily, radiation, desiccation, heavy-metal tolerance, carbon-fixation pathways, product-specific fermentations, DNRA, AOM, metal redox, phototrophy/photosynthesis). Each record carries ≥2 distinct verified citations, with a new PROPOSED enum value and a new scripts/audit_proposals.py audit (wired into just qc / CI) enforcing the citation bar.
Changes:
- Schema: add `PROPOSED` value to `MappingStatusEnum`; relax `tests/test_seed.py` to accept `traitmech:` IDs and the new status.
- New `scripts/audit_proposals.py` + unit tests; `audit-proposals` justfile recipe added to the `qc` composite target.
- 39 candidate trait YAMLs under `data/traits/{environment,metabolism}/` minted `traitmech:000001`–`traitmech:000039`, plus two narrative reports and a generated TSV audit report.
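The citation bar described above (at least two distinct, well-formed citations per PROPOSED record, counted across `definition_source` and `evidence[].reference`) could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `scripts/audit_proposals.py`: the function names, the `mapping_status` field name, the regex shapes, and the treatment of `definition_source` as a single string are all hypothetical.

```python
import re

# Assumed citation shapes; the real audit's patterns are not shown in this PR.
_CITATION_PATTERNS = (
    re.compile(r"^PMID:\d+$", re.IGNORECASE),
    re.compile(r"^doi:10\.\d{4,9}/\S+$", re.IGNORECASE),
    re.compile(r"^https?://\S+$", re.IGNORECASE),
)


def is_well_formed(ref: str) -> bool:
    """True if the reference matches one of the assumed PMID/DOI/URL shapes."""
    return any(p.match(ref) for p in _CITATION_PATTERNS)


def distinct_citations(record: dict) -> set[str]:
    """Collect well-formed citations from definition_source and evidence[].reference."""
    refs = []
    source = record.get("definition_source")  # assumed to be a single string
    if source:
        refs.append(source)
    for item in record.get("evidence") or []:
        ref = item.get("reference")
        if ref:
            refs.append(ref)
    # Deduplicate on the stripped string; placeholders and malformed
    # references are dropped because they match no pattern.
    return {r.strip() for r in refs if is_well_formed(r.strip())}


def passes_citation_bar(record: dict, minimum: int = 2) -> bool:
    """Non-PROPOSED records are skipped; PROPOSED ones need >= minimum citations."""
    if record.get("mapping_status") != "PROPOSED":  # hypothetical field name
        return True
    return len(distinct_citations(record)) >= minimum
```

A record that cites the same PMID in both `definition_source` and an evidence item counts it once, which is the "distinct" part of the rule.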
Reviewed changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/traitmech/schema/traitmech.yaml | Add PROPOSED mapping status with description referencing the citation audit. |
| scripts/audit_proposals.py | New citation-bar audit for PROPOSED records (≥2 distinct, well-formed citations). |
| tests/test_audit_proposals.py | Locks the audit rules (skip non-PROPOSED, dedupe, placeholders, malformed). |
| tests/test_seed.py | Accept traitmech: identifiers and PROPOSED status. |
| justfile | Add audit-proposals recipe and chain it into qc. |
| reports/proposal_citation_audit.tsv | Generated audit output (uses absolute filesystem paths — see comment). |
| reports/environment_trait_proposals.md, reports/metabolism_trait_proposals.md | Narrative justification + citation indexes for the two proposal cohorts. |
| data/traits/environment/*.yaml (18 files) | New PROPOSED ENVIRONMENT traits (pressure, radiation, desiccation, metal tolerance). |
| data/traits/metabolism/*.yaml (21 files) | New PROPOSED METABOLISM traits (carbon fixation pathways, fermentations, DNRA, AOM, metal redox, phototrophy). |
The proposal citation audit wrote str(path) where path is anchored at _REPO_ROOT (absolute), so the committed reports/proposal_citation_audit.tsv contained one contributor's absolute filesystem paths and would produce a spurious diff on every CI/contributor run. Relativize to _REPO_ROOT before writing, and regenerate the TSV. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
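The fix described in this commit amounts to making each path relative to the repository root before it is written to the TSV. A minimal sketch, assuming both paths are absolute and using a hypothetical helper name `display_path` (the commit only names `_REPO_ROOT`, passed here as `repo_root`):

```python
from pathlib import Path


def display_path(path: Path, repo_root: Path) -> str:
    """Return path relative to repo_root in POSIX form.

    Falls back to the unchanged path when it lies outside the repository,
    so the report never fails on an unexpected location.
    """
    try:
        return path.relative_to(repo_root).as_posix()
    except ValueError:
        # Path.relative_to raises ValueError for paths outside repo_root.
        return path.as_posix()
```

Emitting POSIX-style relative paths keeps the generated report identical across contributors and CI runners, which avoids the spurious diff the commit mentions.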
Summary
Adds 39 PROPOSED candidate trait records (18 ENVIRONMENT + 21 METABOLISM) filling clear coverage gaps in two categories. Every PROPOSED record carries ≥2 distinct, verified literature citations (PMID/DOI), and that bar is enforced in CI.
Schema
- Add `PROPOSED` value to `MappingStatusEnum`.
Validation
- `scripts/audit_proposals.py`: requires ≥2 distinct, well-formed citations per PROPOSED record (counted across `definition_source` ∪ `evidence[].reference`); wired into the `qc` justfile target and CI; emits `reports/proposal_citation_audit.tsv`.
- `tests/test_audit_proposals.py` locks the rule; `tests/test_seed.py` relaxed to allow `traitmech:` IDs and `PROPOSED`.
ENVIRONMENT (traitmech:000001–000018)
Pressure/piezophily, radiation (ionizing/UV), desiccation/xerophily, heavy-metal/metalloid tolerance (Cd/Zn/Co/Hg/As/Cu).
METABOLISM (traitmech:000019–000039)
Six autotrophic carbon-fixation pathways (+ `carbon_fixation` head), product-specific fermentations, DNRA, dissimilatory iron reduction, manganese oxidation, anaerobic oxidation of methane, oxygenic/anoxygenic photosynthesis, proteorhodopsin phototrophy, plus intermediate axis classes (`phototrophy`, `photosynthesis`, `dissimilatory_metal_reduction`). DNRA/AOM/metal-reduction parent to the existing `METPO:1000802` anaerobic respiration class.
Reports
`reports/environment_trait_proposals.md`, `reports/metabolism_trait_proposals.md`.
Verification
- `just validate-strict` → 0 errors over 396 files
- `just qc` / `audit-proposals` → 39/39 PROPOSED passing
- `pytest` → 70 passed
- `traitmech:` parent refs resolve
🤖 Generated with Claude Code