Predicate residual v5: rename 10 paraphrases + re-ground (−56 edges) #82
Merged
Pull request overview
Normalizes additional residual free-text causal-edge predicate labels to existing canonical predicate labels already mapped in mappings/predicate_grounding.tsv, then re-runs grounding to populate predicate_id for the affected edges (reducing residual edges by 56).
Changes:
- Expanded `scripts/rename_predicate_labels.py` with 10 additional label renames for the v5 residual cohort.
- Re-grounded affected trait YAML causal edges by updating `predicate:` and filling `predicate_id` for the 56 impacted edges.
- Updated the residual report to reflect the reduced set of ungrounded predicate labels.
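
The rename-then-ground flow in the list above can be pictured with a minimal sketch. This is not the content of `scripts/rename_predicate_labels.py` or of the grounding step: the `RENAMES` and `GROUNDING` dictionaries, the `normalize_edge` helper, and the edge-as-dict shape are all assumptions made for illustration. Only the label pairs and predicate IDs are taken from this pull request.

```python
# Illustrative sketch only: a hypothetical in-memory version of the
# rename + ground steps. The real script and the TSV mapping file are
# not reproduced here.

# Old free-text label -> canonical label (pairs listed in this PR).
RENAMES = {
    "influences": "regulates",
    "determines": "causes",
    "sets": "defines",
    "constrains": "regulates",
    "input to": "participates in",
    "powers": "enables",
    "organizes": "enables",
    "occurs under": "occurs in",
    "requires": "depends on",
    "triggers": "causes",
}

# Canonical label -> predicate_id (IDs as listed in this PR).
GROUNDING = {
    "regulates": "RO:0002211",
    "causes": "biolink:causes",
    "defines": "METPO:2007500",
    "participates in": "biolink:participates_in",
    "enables": "RO:0002327",
    "occurs in": "biolink:occurs_in",
    "depends on": "RO:0002502",
}


def normalize_edge(edge: dict) -> dict:
    """Return a copy of the edge with its predicate renamed, then grounded."""
    out = dict(edge)
    label = out.get("predicate")
    # Step 1: rename a paraphrased label to its canonical form.
    if label in RENAMES:
        out["predicate"] = RENAMES[label]
    # Step 2: fill predicate_id only when it is missing and the label is grounded.
    canonical = out.get("predicate")
    if "predicate_id" not in out and canonical in GROUNDING:
        out["predicate_id"] = GROUNDING[canonical]
    return out


# Hypothetical edge; the subject and object values are placeholders.
print(normalize_edge({"subject": "x", "predicate": "powers", "object": "y"}))
```

Labels with no rename rule (for example the skipped `engages`) pass through unchanged and stay ungrounded in this sketch, which mirrors the residual-report behaviour described above.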
Reviewed changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| scripts/rename_predicate_labels.py | Adds the next cohort of predicate-label renames (old → canonical label) used prior to grounding. |
| reports/predicate_grounding_residual.tsv | Updates the residual predicate report now that the renames and grounding have removed several labels from the residual set. |
| data/traits/physiology/trophic_type.yaml | Renames determines→causes and adds predicate_id: biolink:causes on affected edges. |
| data/traits/physiology/photolithotrophic.yaml | Renames powers→enables and adds predicate_id: RO:0002327. |
| data/traits/physiology/nutrient_adaptation.yaml | Renames triggers→causes and adds predicate_id: biolink:causes. |
| data/traits/physiology/methanotrophic.yaml | Renames input to→participates in and adds predicate_id: biolink:participates_in. |
| data/traits/physiology/carboxydotrophic.yaml | Renames requires→depends on and adds predicate_id: RO:0002502. |
| data/traits/morphology/spiral_shaped.yaml | Renames influences→regulates and adds predicate_id: RO:0002211. |
| data/traits/morphology/sphere_shaped.yaml | Renames organizes→enables and adds predicate_id: RO:0002327. |
| data/traits/morphology/rod_shaped.yaml | Renames organizes→enables and adds predicate_id: RO:0002327. |
| data/traits/morphology/motility.yaml | Renames powers→enables and adds predicate_id: RO:0002327. |
| data/traits/morphology/motile.yaml | Renames powers→enables and adds predicate_id: RO:0002327. |
| data/traits/morphology/gram_stain.yaml | Renames influences→regulates and determines→causes, adds corresponding predicate_ids. |
| data/traits/morphology/gliding.yaml | Renames powers→enables and adds predicate_id: RO:0002327. |
| data/traits/morphology/flagellated.yaml | Renames powers→enables and adds predicate_id: RO:0002327. |
| data/traits/morphology/filament_shaped.yaml | Renames organizes→enables and adds predicate_id: RO:0002327. |
| data/traits/morphology/crescent_shaped.yaml | Renames constrains→regulates and adds predicate_id: RO:0002211. |
| data/traits/morphology/cell_width.yaml | Renames organizes→enables and constrains→regulates, adds corresponding predicate_ids. |
| data/traits/morphology/cell_width_small.yaml | Renames organizes→enables and adds predicate_id: RO:0002327. |
| data/traits/morphology/cell_shape.yaml | Renames determines→causes and adds predicate_id: biolink:causes. |
| data/traits/morphology/cell_length.yaml | Renames triggers→causes and adds predicate_id: biolink:causes. |
| data/traits/morphology/bacillus_shaped.yaml | Renames organizes→enables and adds predicate_id: RO:0002327. |
| data/traits/metabolism/syntrophy.yaml | Renames input to→participates in and adds predicate_id: biolink:participates_in. |
| data/traits/metabolism/methanogenesis.yaml | Renames occurs under→occurs in and adds predicate_id: biolink:occurs_in. |
| data/traits/metabolism/metabolism.yaml | Renames powers→enables and adds predicate_id: RO:0002327. |
| data/traits/metabolism/homoacetogenesis.yaml | Renames occurs under→occurs in and input to→participates in, adds corresponding predicate_ids. |
| data/traits/metabolism/disproportionation.yaml | Renames occurs under→occurs in and adds predicate_id: biolink:occurs_in. |
| data/traits/metabolism/anaerobic_respiration.yaml | Renames occurs under→occurs in and adds predicate_id: biolink:occurs_in. |
| data/traits/metabolism/acetogenesis.yaml | Renames occurs under→occurs in and input to→participates in, adds corresponding predicate_ids. |
| data/traits/environment/thermotolerant.yaml | Renames triggers→causes and adds predicate_id: biolink:causes. |
| data/traits/environment/temperature_range.yaml | Renames sets→defines and adds predicate_id: METPO:2007500 on affected edges. |
| data/traits/environment/temperature_preference.yaml | Renames influences→regulates and constrains→regulates, adds predicate_id: RO:0002211. |
| data/traits/environment/temperature_optimum.yaml | Renames influences→regulates and adds predicate_id: RO:0002211. |
| data/traits/environment/stenohaline.yaml | Renames constrains→regulates and adds predicate_id: RO:0002211. |
| data/traits/environment/ph_range.yaml | Renames sets→defines and adds predicate_id: METPO:2007500 on affected edges. |
| data/traits/environment/ph_growth_preference.yaml | Renames influences→regulates and adds predicate_id: RO:0002211. |
| data/traits/environment/obligately_alkaphilic.yaml | Renames constrains→regulates and adds predicate_id: RO:0002211. |
| data/traits/environment/obligately_aerobic.yaml | Renames requires→depends on and adds predicate_id: RO:0002502. |
| data/traits/environment/nacl_range.yaml | Renames sets→defines and adds predicate_id: METPO:2007500 on affected edges. |
| data/traits/environment/nacl_optimum.yaml | Renames triggers→causes and adds predicate_id: biolink:causes. |
| data/traits/environment/halophily_preference.yaml | Renames influences→regulates and adds predicate_id: RO:0002211. |
| data/traits/environment/aerobic.yaml | Renames requires→depends on and adds predicate_id: RO:0002502 on affected edges. |
| data/traits/ecology/biosafety_level.yaml | Renames determines→causes and adds predicate_id: biolink:causes. |
realmarcin added a commit that referenced this pull request on May 26, 2026
Replace "PR_F" placeholder references in rename_predicate_labels.py
with descriptive cohort labels — the PR-cycle name isn't stable
once merged. Two occurrences:
"The next 10 rows land in PR_F (predicate residual v5)"
→ "The next 10 rows landed in the predicate residual v5 batch"
"# PR_F cohort"
→ "# predicate residual v5 cohort"
The reference to PR #67 (predicate residual v3, already merged) is
kept verbatim — that PR number is stable.
Verified:
- grep -r "PR_F" scripts mappings proposals → 0 hits
- just validate-strict → 0 ERROR rows
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
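
The `grep -r "PR_F"` check in the commit message above can be approximated in Python when a cross-platform variant is useful. This is a sketch under assumptions: `find_placeholder` is a hypothetical helper, not part of the repository, and the three directory names are simply those passed to `grep` in the commit message.

```python
from pathlib import Path


def find_placeholder(roots, needle="PR_F"):
    """Return (path, line number, line) for every line containing the needle."""
    hits = []
    for root in roots:
        base = Path(root)
        if not base.exists():
            continue  # tolerate missing directories
        for path in sorted(base.rglob("*")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            for lineno, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    found = find_placeholder(["scripts", "mappings", "proposals"])
    for path, lineno, line in found:
        print(f"{path}:{lineno}: {line}")
    print(f"{len(found)} hits")
```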
One-off migration following the PR #67 pattern: normalizes the next-tier of curator-paraphrased predicate labels to canonical synonyms already in the grounding mappings, then re-runs ground-predicates to fill predicate_id on the 56 affected edges.

Renames added to scripts/rename_predicate_labels.py (10 new rules):
  influences → regulates (7 edges → RO:0002211)
  determines → causes (6 edges → biolink:causes)
  sets → defines (6 edges → METPO:2007500)
  constrains → regulates (6 edges → RO:0002211)
  input to → participates in (6 edges → biolink:participates_in)
  powers → enables (6 edges → RO:0002327)
  organizes → enables (6 edges → RO:0002327)
  occurs under → occurs in (5 edges → biolink:occurs_in)
  requires → depends on (4 edges → RO:0002502)
  triggers → causes (4 edges → biolink:causes)

Skipped: engages (vague), incorporated into (wrong-direction), drives production of (too specific).

Per-corpus impact:
  Edges grounded: +56 (from this batch)
  Distinct labels: −7

Idempotency: re-running both rename and ground-predicates produces no further changes.

Verified:
- rename --apply → 56 edits, 0 invalid files
- ground-predicates --apply → 56 new groundings
- validate-strict → 0 ERROR rows / 357 files

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
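
The per-rule edge counts in the commit message can be cross-checked against the stated total of 56 with a few lines of arithmetic. The sketch below also tests one sufficient condition for the rename step being idempotent: no canonical label is itself an old label, so a second pass finds nothing to rewrite. The `RULES` list is transcribed from the commit message; its name and layout are assumptions for illustration, not the script's actual data structure.

```python
# (old label, canonical label, edge count) as stated in the commit message.
RULES = [
    ("influences", "regulates", 7),
    ("determines", "causes", 6),
    ("sets", "defines", 6),
    ("constrains", "regulates", 6),
    ("input to", "participates in", 6),
    ("powers", "enables", 6),
    ("organizes", "enables", 6),
    ("occurs under", "occurs in", 5),
    ("requires", "depends on", 4),
    ("triggers", "causes", 4),
]

total_edges = sum(count for _, _, count in RULES)
old_labels = {old for old, _, _ in RULES}
canonical_labels = {new for _, new, _ in RULES}

# If no rename target is also a rename source, applying the rules twice
# gives the same result as applying them once.
rename_is_idempotent = old_labels.isdisjoint(canonical_labels)

print(total_edges, len(old_labels), len(canonical_labels), rename_is_idempotent)
# prints: 56 10 7 True
```

The ten old labels collapse onto seven canonical targets, and the counts sum to the 56 edits reported by `rename --apply`.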
Summary
Follow-on to PR #67 (predicate residual v3). Normalizes the next-tier of curator-paraphrased predicate labels to canonical synonyms already grounded in `mappings/predicate_grounding.tsv`, then re-runs `ground-predicates` to fill `predicate_id` on the 56 affected edges. No new METPO terms minted.
Renames
- `influences` → `regulates` (`RO:0002211`)
- `determines` → `causes` (`biolink:causes`)
- `sets` → `defines` (`METPO:2007500`)
- `constrains` → `regulates` (`RO:0002211`)
- `input to` → `participates in` (`biolink:participates_in`)
- `powers` → `enables` (`RO:0002327`)
- `organizes` → `enables` (`RO:0002327`)
- `occurs under` → `occurs in` (`biolink:occurs_in`)
- `requires` → `depends on` (`RO:0002502`)
- `triggers` → `causes` (`biolink:causes`)
Skipped (semantically vague or wrong-direction)
- `engages` (7 edges): vague; no clean fit
- `incorporated into` (5 edges): `biolink:part_of` is static structural; "incorporated into" is dynamic
- `drives production of` (5 edges): too specific; would lose information mapping to bare `causes`
Corpus impact
Verified locally
Test plan
🤖 Generated with Claude Code