Add HRA uterus NTR batch outputs and review reports #3723

Open
zhengj2007 wants to merge 1 commit into master from add-hra-uterus-ntr

@zhengj2007
Collaborator

Generated via bulk_ntr_workflow from source_data/uterus.csv (BOM header normalized into source_data/uterus_only.csv for processing).

Pipeline completed through Stage 4:

  • 27 new UBERON leaf terms in final template
  • 0 group terms
  • 6 confirmed existing matches excluded
  • 0 possible matches identified
  • 0 out-of-scope terms
  • 1 manual-curation item

QC status:

  • 0 [PENDING] definitions
  • 0 INFER/UNRESOLVABLE/NEEDS_MAPPING markers

Includes:

  • Source data snapshots in bulk_ntr_workflow/source_data/
  • Template and groups TSVs in src/templates/
  • Review reports: candidates, errors, input, manual_curation

Signed-off-by: @dragon-ai-agent

@ai4c-agent

ai4c-agent Bot commented May 26, 2026

Claude finished @zhengj2007's task in 6m 25s


PR Review: Add HRA uterus NTR batch outputs and review reports

Tasks:

  • Gather context
  • Read changed files
  • Check parent terms in ontology
  • Verify definitions, xrefs, PMIDs
  • Check OBO/ODK conventions
  • Post review findings

Review submitted: REQUEST_CHANGES

Review checklist:

  • Hierarchy and parents are consistent — PARTIAL: most parent IDs valid; 6 terms have UNKNOWN (see critical issue below)
  • Definitions and supporting references look adequate — PARTIAL: PMID:42152375 and PMID:42116181 need verification
  • Obsolete or replacement handling is safe — no obsoletions in this PR
  • OBO and ODK conventions appear preserved — template format correct; ID range 8930000–8931000 is properly allocated to Jie Zheng
  • No obvious CI, auth, or workflow regression

🔴 CRITICAL — Must fix before merge

1. Six terms have UNKNOWN in both is_a and part_of columns (src/templates/hra-uterus.template.tsv)

ROBOT will fail or produce invalid OWL when it encounters the literal string UNKNOWN as a class IRI. The affected terms are all left/right/central anterior/posterior subdivisions of the lower uterine segment (rows 5, 8, 16, 19, 25, 28 — UBERON_8930202, 8930205, 8930220, 8930223, 8930229, 8930232).

The pipeline could not find a UBERON ID for the parent "lower uterine segment" at processing time — but lower uterine segment is itself defined as UBERON:8930224 in this same batch. The fix is to set:

  • is_a: UBERON:0034944 (zone of organ)
  • part_of: http://purl.obolibrary.org/obo/UBERON_8930224
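
A check like the following could catch unresolved parents before ROBOT runs. It is a minimal sketch, not part of bulk_ntr_workflow: the function name and the column headers `ID`, `is_a`, and `part_of` are assumptions and should be adjusted to the actual header row of the template.

```python
import csv
import io


def find_unknown_parents(tsv_text, parent_columns=("is_a", "part_of"), id_column="ID"):
    """Return (row_number, term_id, column) for every parent cell equal to UNKNOWN.

    Column names are hypothetical defaults, not taken from the real template.
    """
    # QUOTE_NONE keeps quote characters in definitions from being interpreted.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    hits = []
    # The header is counted as row 1, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column in parent_columns:
            if (row.get(column) or "").strip() == "UNKNOWN":
                hits.append((row_number, row.get(id_column, ""), column))
    return hits
```

An empty result means no parent cell contains the literal string UNKNOWN. The row numbers here count the header as row 1, which may not match the numbering used in this review.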

🟡 IMPORTANT — Should fix before merge

2. Verify PMID:42152375 and PMID:42116181 — used for connective tissue of the uterine serosa and mesothelium of the uterine serosa. High-range PMIDs (possibly 2025 publications) should be confirmed as indexed and relevant before merge.

3. lower uterine segment parent too broad — UBERON_8930224 has part_of: UBERON:0000995 (uterus). Anatomically this region is a sub-part of the body of uterus (UBERON:0009853); using that as the part_of target would be more precise.


🔵 SUGGESTIONS — Optional

  • Inner/outer uterine myometrium use is_a: UBERON:0000064 (organ part); UBERON:0034944 (zone of organ) would be more specific and consistent with the other positional terms in this batch.
  • outer uterine myometrium cites a bioRxiv preprint DOI; peer-reviewed PMIDs present as well so not blocking.
  • Stage 5 (register_templates.py) not yet run — if intentionally deferred, please note in PR description.

What looks good ✅

  • ID range UBERON:8930xxx correctly allocated to Jie Zheng in uberon-idranges.owl
  • FMA:17561 → UBERON:1200003 (uterine fundus) resolved correctly for all 8 fundus sub-terms; UBERON:1200003 confirmed to exist in ontology
  • All other parent IDs confirmed present in ontology
  • columnar cell of endocervix correctly excluded from template (routed to manual_curation.tsv — cell type belongs in CL)
  • 6 confirmed matches correctly excluded via candidates.tsv
  • Template column directives, subset IRI, ORCID, NCBITaxon, and date format all correct

@ai4c-agent ai4c-agent Bot left a comment

Review checklist

  • Hierarchy and parents are consistent — PARTIAL: most parent IDs valid; 6 terms have UNKNOWN (see below)
  • Definitions and supporting references look adequate — PARTIAL: PMID:42152375, PMID:42116181 need verification
  • Obsolete or replacement handling is safe — no obsoletions in this PR
  • OBO and ODK conventions appear preserved — template format correct; ID range 8930000–8931000 is properly allocated to Jie Zheng
  • No obvious CI, auth, or workflow regression — no workflow files touched

Summary

This batch adds 27 new UBERON leaf terms for uterus anatomy via the HRA NTR pipeline. The overall structure is well-organised, the FMA:17561 → UBERON:1200003 parent mapping for fundus terms was resolved correctly, and 6 confirmed existing terms were appropriately excluded. However, there are blocking issues that need to be resolved before the template can be processed by ROBOT.


🔴 CRITICAL — Must fix before merge

1. Six terms have UNKNOWN in both is_a and part_of columns (src/templates/hra-uterus.template.tsv)

ROBOT will fail or produce invalid OWL when it encounters the literal string UNKNOWN as a class IRI. The six affected terms are the left/right/central anterior/posterior subdivisions of the lower uterine segment:

| Term | UBERON ID | Row |
| --- | --- | --- |
| central anterior lower uterine segment | UBERON_8930202 | 5 |
| central posterior lower uterine segment | UBERON_8930205 | 8 |
| left anterior lower uterine segment | UBERON_8930220 | 16 |
| left posterior lower uterine segment | UBERON_8930223 | 19 |
| right anterior lower uterine segment | UBERON_8930229 | 25 |
| right posterior lower uterine segment | UBERON_8930232 | 28 |

The pipeline flagged these in errors.tsv ("unknown_parent": no UBERON ID for parent label "lower uterine segment") but left the parent as UNKNOWN rather than resolving it. The parent term lower uterine segment (UBERON:8930224) is defined in this same template batch. The fix is to set:

  • is_a → UBERON:0034944 (zone of organ)
  • part_of → http://purl.obolibrary.org/obo/UBERON_8930224 (lower uterine segment, the new term also in this template)

🟡 IMPORTANT — Should fix before merge

2. Verify PMID:42152375 and PMID:42116181

These are used as the sole publication references (beyond the ASCTB-TEMP placeholder) for:

  • connective tissue of the uterine serosa (UBERON_8930210)
  • mesothelium of the uterine serosa (UBERON_8930225)

PMIDs in the 42-million range are plausible for 2025 publications, but they should be explicitly verified as real indexed papers with content relevant to the uterine serosa. If they cannot be confirmed, alternative references should be found.

3. Parent of lower uterine segment is too broad

lower uterine segment (UBERON_8930224, row 20) has part_of: UBERON:0000995 (uterus). Anatomically the lower uterine segment is a sub-region of the body of uterus (UBERON:0009853), which is itself part of the uterus. Using part_of: UBERON:0009853 would be more precise and consistent with how the body-of-uterus terms in this same batch are handled.


🔵 SUGGESTIONS — Optional improvements

4. Genus specificity for myometrium layers

inner uterine myometrium (UBERON_8930216) and outer uterine myometrium (UBERON_8930226) use is_a: UBERON:0000064 (organ part). A more specific genus, UBERON:0034944 (zone of organ), would be consistent with how the positional uterus-body/fundus/lower-segment terms are typed in this same batch.

5. Preprint reference for outer uterine myometrium

outer uterine myometrium (UBERON_8930226) includes a bioRxiv preprint DOI (https://doi.org/10.1101/2021.01.02.425073) as a def_xref. This is not critical since the term also cites PMID:20551073 and PMID:32427986, but published references are preferred where possible.

6. Stage 5 (template registration) not yet run

The register_templates.py script (Stage 5 of the workflow) has not been run to register the new templates in uberon-odk.yaml. If this is intentional (e.g., registration is a follow-up step), please note it explicitly in the PR description.


What is fine ✅

  • ID range UBERON:8930xxx is correctly allocated to Jie Zheng in uberon-idranges.owl
  • FMA:17561 → UBERON:1200003 (uterine fundus) mapping resolved correctly for all 8 fundus sub-terms; UBERON:1200003 confirmed to exist in the ontology
  • All other parent IDs confirmed present: UBERON:0034944, UBERON:0009853, UBERON:1200003, UBERON:0002384, UBERON:0001136, UBERON:0003891, UBERON:0012249, UBERON:0000458, UBERON:0001638, UBERON:0000995, UBERON:0000064, UBERON:0001296, UBERON:0001297
  • columnar cell of endocervix correctly routed to manual_curation.tsv and excluded from the ROBOT template (it belongs in CL, not UBERON)
  • 6 confirmed existing terms correctly excluded in candidates.tsv
  • Template column directives, subset IRI, ORCID, NCBITaxon, and date format all look correct
  • hra-uterus-groups.template.tsv is empty (no group terms in this batch), which is consistent with the pipeline report

@ai4c-agent ai4c-agent Bot left a comment

Review checklist

  • Hierarchy and parents are consistent — PARTIAL: most parent IDs valid; 6 terms have UNKNOWN (see below)
  • Definitions and supporting references look adequate — PARTIAL: PMID:42152375, PMID:42116181 need verification
  • Obsolete or replacement handling is safe — no obsoletions in this PR
  • OBO and ODK conventions appear preserved — template format correct; ID range 8930000-8931000 is properly allocated to Jie Zheng
  • No obvious CI, auth, or workflow regression — no workflow files touched

Summary

This batch adds 27 new UBERON leaf terms for uterus anatomy via the HRA NTR pipeline. The overall structure is well-organised, the FMA:17561 to UBERON:1200003 parent mapping for fundus terms was resolved correctly, and 6 confirmed existing terms were appropriately excluded. However, there are blocking issues that need to be resolved before the template can be processed by ROBOT.


CRITICAL — Must fix before merge

1. Six terms have UNKNOWN in both is_a and part_of columns

File: src/templates/hra-uterus.template.tsv

ROBOT will fail or produce invalid OWL when it encounters the literal string UNKNOWN as a class IRI. The six affected terms are the left/right/central anterior/posterior subdivisions of the lower uterine segment:

| Term | UBERON ID | Row |
| --- | --- | --- |
| central anterior lower uterine segment | UBERON_8930202 | 5 |
| central posterior lower uterine segment | UBERON_8930205 | 8 |
| left anterior lower uterine segment | UBERON_8930220 | 16 |
| left posterior lower uterine segment | UBERON_8930223 | 19 |
| right anterior lower uterine segment | UBERON_8930229 | 25 |
| right posterior lower uterine segment | UBERON_8930232 | 28 |

The pipeline flagged these in errors.tsv ("unknown_parent" — no UBERON ID found for parent label "lower uterine segment") but left the parent as UNKNOWN rather than resolving it. The parent term lower uterine segment (UBERON:8930224) is defined in this same template batch. The fix is to set:

  • is_a column: UBERON:0034944 (zone of organ)
  • part_of column: http://purl.obolibrary.org/obo/UBERON_8930224 (lower uterine segment — the new term also in this template)
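
The two substitutions above could be applied mechanically along these lines. This is a sketch under assumptions: the function name and the column headers `is_a` and `part_of` are hypothetical and need to match the template's real header row. It rewrites only rows where both cells are exactly UNKNOWN and leaves every other cell untouched.

```python
ZONE_OF_ORGAN = "UBERON:0034944"
LOWER_UTERINE_SEGMENT = "http://purl.obolibrary.org/obo/UBERON_8930224"


def resolve_unknown_parents(tsv_text, is_a_column="is_a", part_of_column="part_of"):
    """Replace UNKNOWN/UNKNOWN parent pairs with the values proposed in this review.

    Column names are assumed, not read from the real template.
    """
    lines = tsv_text.split("\n")
    header = lines[0].split("\t")
    is_a_index = header.index(is_a_column)
    part_of_index = header.index(part_of_column)
    fixed = [lines[0]]
    for line in lines[1:]:
        cells = line.split("\t")
        has_both = len(cells) > max(is_a_index, part_of_index)
        if has_both and cells[is_a_index] == "UNKNOWN" and cells[part_of_index] == "UNKNOWN":
            cells[is_a_index] = ZONE_OF_ORGAN
            cells[part_of_index] = LOWER_UTERINE_SEGMENT
        fixed.append("\t".join(cells))
    return "\n".join(fixed)
```

Splitting on tabs by hand rather than going through a CSV writer avoids re-quoting definition text, so the diff stays limited to the six affected rows. Fixing the source mapping in the pipeline and regenerating the template may still be preferable to patching the output.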

IMPORTANT — Should fix before merge

2. Verify PMID:42152375 and PMID:42116181

These are used as the sole publication references (beyond the ASCTB-TEMP placeholder) for:

  • connective tissue of the uterine serosa (UBERON_8930210)
  • mesothelium of the uterine serosa (UBERON_8930225)

PMIDs in the 42-million range are plausible for 2025 publications, but they should be explicitly verified as real indexed papers with content relevant to the uterine serosa. If they cannot be confirmed, alternative references should be found.
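
As a starting point for that verification, the cited PMIDs can be collected from the template text and turned into a single NCBI E-utilities ESummary request. The helper names below are hypothetical, and the sketch only builds the URL; fetching it and reading the returned records is left to the reviewer, and a record existing does not by itself show the paper is relevant to the uterine serosa.

```python
import re
from urllib.parse import urlencode

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
PMID_PATTERN = re.compile(r"PMID:(\d+)")


def extract_pmids(text):
    """Return the unique PMIDs cited in text, in order of first appearance."""
    seen = []
    for pmid in PMID_PATTERN.findall(text):
        if pmid not in seen:
            seen.append(pmid)
    return seen


def esummary_url(pmids):
    """Build an ESummary URL that requests JSON summaries for the given PMIDs."""
    query = urlencode({"db": "pubmed", "id": ",".join(pmids), "retmode": "json"})
    return f"{ESUMMARY_URL}?{query}"
```

For example, `esummary_url(extract_pmids("PMID:42152375|PMID:42116181"))` yields one URL covering both references flagged here.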

3. Parent of lower uterine segment is too broad

lower uterine segment (UBERON_8930224, row 20) has part_of: UBERON:0000995 (uterus). Anatomically the lower uterine segment is a sub-region of the body of uterus (UBERON:0009853), which is itself part of the uterus. Using part_of: UBERON:0009853 would be more precise and consistent with how the body-of-uterus terms in this same batch are handled.


SUGGESTIONS — Optional improvements

4. Genus specificity for myometrium layers

inner uterine myometrium (UBERON_8930216) and outer uterine myometrium (UBERON_8930226) use is_a: UBERON:0000064 (organ part). A more specific genus — UBERON:0034944 (zone of organ) — would be consistent with how the positional uterus-body/fundus/lower-segment terms are typed in this same batch.

5. Preprint reference for outer uterine myometrium

outer uterine myometrium (UBERON_8930226) includes a bioRxiv preprint DOI (https://doi.org/10.1101/2021.01.02.425073) as a def_xref. This is not critical since the term also cites PMID:20551073 and PMID:32427986, but published references are preferred.

6. Stage 5 (template registration) not yet run

The register_templates.py script (Stage 5 of the workflow) has not been run to register the new templates in uberon-odk.yaml. If this is intentional (e.g., registration is a follow-up step), please note it explicitly in the PR description.


What looks good

  • ID range UBERON:8930xxx is correctly allocated to Jie Zheng in uberon-idranges.owl
  • FMA:17561 to UBERON:1200003 (uterine fundus) mapping resolved correctly for all 8 fundus sub-terms; UBERON:1200003 confirmed to exist in the ontology
  • All other parent IDs confirmed present: UBERON:0034944, UBERON:0009853, UBERON:1200003, UBERON:0002384, UBERON:0001136, UBERON:0003891, UBERON:0012249, UBERON:0000458, UBERON:0001638, UBERON:0000995, UBERON:0000064, UBERON:0001296, UBERON:0001297
  • columnar cell of endocervix correctly routed to manual_curation.tsv and excluded from the ROBOT template (belongs in CL, not UBERON)
  • 6 confirmed existing terms correctly excluded in candidates.tsv
  • Template column directives, subset IRI, ORCID, NCBITaxon, and date format all look correct
  • hra-uterus-groups.template.tsv is empty (no group terms), consistent with the pipeline report
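
The ID-range item in the list above can be re-checked with a few lines. This is a sketch under assumptions: the function name is hypothetical, the bounds 8930000 and 8931000 are taken from this review rather than read from uberon-idranges.owl, and treating both bounds as inclusive is a guess. It should be given only the new term IDs, not parent IDs.

```python
import re

RANGE_START, RANGE_END = 8930000, 8931000
# Accept both the CURIE form (UBERON:8930202) and the IRI fragment form (UBERON_8930202).
UBERON_ID = re.compile(r"UBERON[:_](\d{7})")


def ids_outside_range(term_ids, start=RANGE_START, end=RANGE_END):
    """Return the IDs that are malformed or fall outside the allocated range."""
    outside = []
    for term_id in term_ids:
        match = UBERON_ID.fullmatch(term_id)
        if match is None or not (start <= int(match.group(1)) <= end):
            outside.append(term_id)
    return outside
```

An empty list for the 27 new term IDs would agree with the finding above.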
