
Predicate residual v5: rename 10 paraphrases + re-ground (−56 edges)#82

Merged
realmarcin merged 1 commit into main from predicate-residual-v5-renames on May 26, 2026


@realmarcin
Contributor

Summary

Follow-on to PR #67 (predicate residual v3). Normalizes the next tier of curator-paraphrased predicate labels to canonical synonyms already grounded in mappings/predicate_grounding.tsv, then re-runs ground-predicates to fill predicate_id on the 56 affected edges. No new METPO terms are minted.

Renames

old label      new label         CURIE                     edges  corpus pattern
influences     regulates         RO:0002211                7      env-axis → trait-pref
determines     causes            biolink:causes            6      X-utilization → trophic-type
sets           defines           METPO:2007500             6      tolerance → bounded-window (the new defines predicate from #73)
constrains     regulates         RO:0002211                6      X → Y where X bounds Y
input to       participates in   biolink:participates_in   6      substrate → pathway
powers         enables           RO:0002327                6      ATP/PMF → process
organizes      enables           RO:0002327                6      cytoskeleton → biosynthesis
occurs under   occurs in         biolink:occurs_in         5      metabolic process → env condition
requires       depends on        RO:0002502                4      process → cofactor/substrate
triggers       causes            biolink:causes            4      signal → response
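A minimal sketch of what a rename pass over these ten rows could look like, assuming each causal edge is a dict with a `predicate` key. The PR does not show the internals of scripts/rename_predicate_labels.py, so `V5_RENAMES` and `rename_edges` are hypothetical names, not the script's actual API.

```python
# Hypothetical sketch; the real scripts/rename_predicate_labels.py may differ.
# Maps curator paraphrases (old label) to canonical labels that are already
# grounded in mappings/predicate_grounding.tsv (the v5 rows of the Renames table).
V5_RENAMES: dict[str, str] = {
    "influences": "regulates",
    "determines": "causes",
    "sets": "defines",
    "constrains": "regulates",
    "input to": "participates in",
    "powers": "enables",
    "organizes": "enables",
    "occurs under": "occurs in",
    "requires": "depends on",
    "triggers": "causes",
}


def rename_edges(edges: list[dict], renames: dict[str, str]) -> int:
    """Rewrite edge["predicate"] in place; return the number of edges changed."""
    changed = 0
    for edge in edges:
        new_label = renames.get(edge.get("predicate"))
        if new_label is not None and new_label != edge["predicate"]:
            edge["predicate"] = new_label
            changed += 1
    return changed
```

Because no canonical target (regulates, causes, defines, and so on) is itself a rename source, a second call on the same edges changes nothing. That is the idempotency the PR verifies with the second `rename_predicate_labels.py` run. Labels outside the table, such as the skipped `engages`, pass through untouched.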

Skipped (semantically vague or wrong-direction)

  • engages (7 edges) — vague; no clean fit
  • incorporated into (5 edges): biolink:part_of expresses a static structural relation, whereas "incorporated into" describes a dynamic process
  • drives production of (5 edges): too specific; mapping it to bare causes would lose information

Corpus impact

                          Before   After
Edges grounded            702      758   (+56, 68% → 74%)
Edges residual            317      261   (−56)
Distinct residual labels  173      166   (−7)
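The coverage percentages follow from the counts if the corpus total is taken to be grounded + residual edges. Under that assumption the total is 1,019 both before and after, so the batch moves edges between buckets without adding or dropping any. A quick arithmetic check:

```python
# Edge counts from the "Corpus impact" table; the total is assumed to be
# grounded + residual (the PR does not state the total explicitly).
before_grounded, before_residual = 702, 317
after_grounded, after_residual = 758, 261


def coverage(grounded: int, residual: int) -> float:
    """Percentage of edges that carry a predicate_id."""
    return 100 * grounded / (grounded + residual)


print(before_grounded + before_residual, after_grounded + after_residual)  # 1019 1019
print(f"{coverage(before_grounded, before_residual):.1f}% -> "
      f"{coverage(after_grounded, after_residual):.1f}%")  # 68.9% -> 74.4%
```

To one decimal place this gives 68.9% → 74.4%. The "68%" in the table therefore appears to be truncated rather than rounded.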

Verified locally

$ uv run python scripts/rename_predicate_labels.py --apply
  edges renamed: 56

$ uv run python scripts/ground_causal_predicates.py --apply
  edges grounded: 56
  by target CURIE:
    RO:0002211 ×13   RO:0002327 ×12   biolink:causes ×10
    METPO:2007500 ×6  biolink:participates_in ×6  biolink:occurs_in ×5
    RO:0002502 ×4

$ uv run python scripts/rename_predicate_labels.py   # idempotency
  edges renamed: 0

$ just validate-strict
  files with ERROR:   0
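The grounding pass in the log above fills `predicate_id` from the edge's label and tallies results by target CURIE. It can be sketched as follows, assuming a two-column `label`/`curie` TSV and dict-shaped edges. Neither ground_causal_predicates.py nor the real layout of mappings/predicate_grounding.tsv is shown in this PR, so `GROUNDING_TSV`, `load_grounding`, and `ground_edges` are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical two-column excerpt of mappings/predicate_grounding.tsv covering the
# seven canonical labels used in this batch; the real file's columns may differ.
GROUNDING_TSV = """label\tcurie
regulates\tRO:0002211
enables\tRO:0002327
causes\tbiolink:causes
defines\tMETPO:2007500
participates in\tbiolink:participates_in
occurs in\tbiolink:occurs_in
depends on\tRO:0002502
"""


def load_grounding(text: str) -> dict[str, str]:
    """Parse a tab-separated label -> CURIE table into a dict."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["label"]: row["curie"] for row in reader}


def ground_edges(edges: list[dict], grounding: dict[str, str]) -> Counter:
    """Fill predicate_id on ungrounded edges whose label is mapped; tally by CURIE."""
    tally: Counter = Counter()
    for edge in edges:
        if edge.get("predicate_id"):
            continue  # already grounded: skipping these keeps re-runs idempotent
        curie = grounding.get(edge.get("predicate"))
        if curie is not None:
            edge["predicate_id"] = curie
            tally[curie] += 1
    return tally
```

Running `ground_edges` twice on the same edges returns an empty `Counter` the second time, because every edge it could ground already carries a `predicate_id`. Unmapped labels such as `engages` stay ungrounded and remain in the residual report.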

Test plan

  • Both the rename and ground passes are idempotent (a second run makes no changes)
  • validate-strict clean
  • All 56 renames pick up groundings on the next pass
  • CI re-runs validate-strict + pytest
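The idempotency and "all renames pick up groundings" items can also be checked statically against the rename table itself, before any YAML is touched. A sketch follows. `check_rename_table` is a hypothetical helper, not part of the repo.

```python
# Hypothetical pre-merge check on a rename table (old label -> new label):
# (1) no target label is also a source label, so a second pass changes nothing;
# (2) every target label has a grounding, so each renamed edge can get a predicate_id.
def check_rename_table(renames: dict[str, str], grounded_labels: set[str]) -> list[str]:
    problems = []
    for old, new in renames.items():
        if new in renames:
            problems.append(f"{old!r} -> {new!r}: target is itself renamed (not idempotent)")
        if new not in grounded_labels:
            problems.append(f"{old!r} -> {new!r}: target has no grounding")
    return problems


ok = check_rename_table({"powers": "enables", "triggers": "causes"}, {"enables", "causes"})
print(ok)  # []
```

Applied to the ten v5 rows, with the seven canonical labels from the Renames table as `grounded_labels`, this returns an empty list. None of the targets (regulates, causes, defines, participates in, enables, occurs in, depends on) appears among the sources.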

🤖 Generated with Claude Code

Copilot AI review was requested due to automatic review settings on May 26, 2026 02:34

Copilot AI left a comment


Pull request overview

Normalizes additional residual free-text causal-edge predicate labels to existing canonical predicate labels already mapped in mappings/predicate_grounding.tsv, then re-runs grounding to populate predicate_id for the affected edges (reducing residual edges by 56).

Changes:

  • Expanded scripts/rename_predicate_labels.py with 10 additional label renames for the v5 residual cohort.
  • Re-grounded affected trait YAML causal edges by updating predicate: and filling predicate_id for the 56 impacted edges.
  • Updated the residual report to reflect the reduced set of ungrounded predicate labels.

Reviewed changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated 1 comment.

File Description
scripts/rename_predicate_labels.py Adds the next cohort of predicate-label renames (old → canonical label) used prior to grounding.
reports/predicate_grounding_residual.tsv Updates the residual predicate report; the renames plus re-grounding remove several labels from the residual set.
data/traits/physiology/trophic_type.yaml Renames determines→causes and adds predicate_id: biolink:causes on affected edges.
data/traits/physiology/photolithotrophic.yaml Renames powers→enables and adds predicate_id: RO:0002327.
data/traits/physiology/nutrient_adaptation.yaml Renames triggers→causes and adds predicate_id: biolink:causes.
data/traits/physiology/methanotrophic.yaml Renames input to→participates in and adds predicate_id: biolink:participates_in.
data/traits/physiology/carboxydotrophic.yaml Renames requires→depends on and adds predicate_id: RO:0002502.
data/traits/morphology/spiral_shaped.yaml Renames influences→regulates and adds predicate_id: RO:0002211.
data/traits/morphology/sphere_shaped.yaml Renames organizes→enables and adds predicate_id: RO:0002327.
data/traits/morphology/rod_shaped.yaml Renames organizes→enables and adds predicate_id: RO:0002327.
data/traits/morphology/motility.yaml Renames powers→enables and adds predicate_id: RO:0002327.
data/traits/morphology/motile.yaml Renames powers→enables and adds predicate_id: RO:0002327.
data/traits/morphology/gram_stain.yaml Renames influences→regulates and determines→causes, adds corresponding predicate_ids.
data/traits/morphology/gliding.yaml Renames powers→enables and adds predicate_id: RO:0002327.
data/traits/morphology/flagellated.yaml Renames powers→enables and adds predicate_id: RO:0002327.
data/traits/morphology/filament_shaped.yaml Renames organizes→enables and adds predicate_id: RO:0002327.
data/traits/morphology/crescent_shaped.yaml Renames constrains→regulates and adds predicate_id: RO:0002211.
data/traits/morphology/cell_width.yaml Renames organizes→enables and constrains→regulates, adds corresponding predicate_ids.
data/traits/morphology/cell_width_small.yaml Renames organizes→enables and adds predicate_id: RO:0002327.
data/traits/morphology/cell_shape.yaml Renames determines→causes and adds predicate_id: biolink:causes.
data/traits/morphology/cell_length.yaml Renames triggers→causes and adds predicate_id: biolink:causes.
data/traits/morphology/bacillus_shaped.yaml Renames organizes→enables and adds predicate_id: RO:0002327.
data/traits/metabolism/syntrophy.yaml Renames input to→participates in and adds predicate_id: biolink:participates_in.
data/traits/metabolism/methanogenesis.yaml Renames occurs under→occurs in and adds predicate_id: biolink:occurs_in.
data/traits/metabolism/metabolism.yaml Renames powers→enables and adds predicate_id: RO:0002327.
data/traits/metabolism/homoacetogenesis.yaml Renames occurs under→occurs in and input to→participates in, adds corresponding predicate_ids.
data/traits/metabolism/disproportionation.yaml Renames occurs under→occurs in and adds predicate_id: biolink:occurs_in.
data/traits/metabolism/anaerobic_respiration.yaml Renames occurs under→occurs in and adds predicate_id: biolink:occurs_in.
data/traits/metabolism/acetogenesis.yaml Renames occurs under→occurs in and input to→participates in, adds corresponding predicate_ids.
data/traits/environment/thermotolerant.yaml Renames triggers→causes and adds predicate_id: biolink:causes.
data/traits/environment/temperature_range.yaml Renames sets→defines and adds predicate_id: METPO:2007500 on affected edges.
data/traits/environment/temperature_preference.yaml Renames influences→regulates and constrains→regulates, adds predicate_id: RO:0002211.
data/traits/environment/temperature_optimum.yaml Renames influences→regulates and adds predicate_id: RO:0002211.
data/traits/environment/stenohaline.yaml Renames constrains→regulates and adds predicate_id: RO:0002211.
data/traits/environment/ph_range.yaml Renames sets→defines and adds predicate_id: METPO:2007500 on affected edges.
data/traits/environment/ph_growth_preference.yaml Renames influences→regulates and adds predicate_id: RO:0002211.
data/traits/environment/obligately_alkaphilic.yaml Renames constrains→regulates and adds predicate_id: RO:0002211.
data/traits/environment/obligately_aerobic.yaml Renames requires→depends on and adds predicate_id: RO:0002502.
data/traits/environment/nacl_range.yaml Renames sets→defines and adds predicate_id: METPO:2007500 on affected edges.
data/traits/environment/nacl_optimum.yaml Renames triggers→causes and adds predicate_id: biolink:causes.
data/traits/environment/halophily_preference.yaml Renames influences→regulates and adds predicate_id: RO:0002211.
data/traits/environment/aerobic.yaml Renames requires→depends on and adds predicate_id: RO:0002502 on affected edges.
data/traits/ecology/biosafety_level.yaml Renames determines→causes and adds predicate_id: biolink:causes.


Comment thread scripts/rename_predicate_labels.py
realmarcin added a commit that referenced this pull request May 26, 2026
Replace "PR_F" placeholder references in rename_predicate_labels.py
with descriptive cohort labels — the PR-cycle name isn't stable
once merged. Two occurrences:

  "The next 10 rows land in PR_F (predicate residual v5)"
    → "The next 10 rows landed in the predicate residual v5 batch"
  "# PR_F cohort"
    → "# predicate residual v5 cohort"

The reference to PR #67 (predicate residual v3, already merged) is
kept verbatim — that PR number is stable.

Verified:
  - grep -r "PR_F" scripts mappings proposals → 0 hits
  - just validate-strict                       → 0 ERROR rows

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

One-off migration following the PR #67 pattern: normalizes the
next tier of curator-paraphrased predicate labels to canonical
synonyms already in the grounding mappings, then re-runs
ground-predicates to fill predicate_id on the 56 affected edges.

Renames added to scripts/rename_predicate_labels.py (10 new rules):
  influences   → regulates          (7 edges → RO:0002211)
  determines   → causes             (6 edges → biolink:causes)
  sets         → defines            (6 edges → METPO:2007500)
  constrains   → regulates          (6 edges → RO:0002211)
  input to     → participates in    (6 edges → biolink:participates_in)
  powers       → enables            (6 edges → RO:0002327)
  organizes    → enables            (6 edges → RO:0002327)
  occurs under → occurs in          (5 edges → biolink:occurs_in)
  requires     → depends on         (4 edges → RO:0002502)
  triggers     → causes             (4 edges → biolink:causes)

Skipped: engages (vague), incorporated into (wrong-direction),
drives production of (too specific).

Per-corpus impact:
  Edges grounded:   +56 (from this batch)
  Distinct labels:  −7

Idempotency: re-running both rename and ground-predicates produces
no further changes.

Verified:
  - rename --apply → 56 edits, 0 invalid files
  - ground-predicates --apply → 56 new groundings
  - validate-strict → 0 ERROR rows / 357 files

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
realmarcin force-pushed the predicate-residual-v5-renames branch from 0b0c679 to 37448c9 on May 26, 2026 05:01
realmarcin merged commit 78dc1a8 into main on May 26, 2026
3 checks passed
realmarcin deleted the predicate-residual-v5-renames branch on May 26, 2026 05:01

Labels

None yet

Projects

None yet


2 participants