Add HRA uterus NTR batch outputs and review reports #3723
Generated via bulk_ntr_workflow from source_data/uterus.csv (BOM header normalized into source_data/uterus_only.csv for processing).

Pipeline completed through Stage 4:
- 27 new UBERON leaf terms in final template
- 0 group terms
- 6 confirmed existing matches excluded
- 0 possible matches identified
- 0 out-of-scope terms
- 1 manual-curation item

QC status:
- 0 [PENDING] definitions
- 0 INFER/UNRESOLVABLE/NEEDS_MAPPING markers

Includes:
- Source data snapshots in bulk_ntr_workflow/source_data/
- Template and groups TSVs in src/templates/
- Review reports: candidates, errors, input, manual_curation

Signed-off-by: @dragon-ai-agent
Claude finished @zhengj2007's task in 6m 25s.

PR Review: Add HRA uterus NTR batch outputs and review reports

Review submitted: REQUEST_CHANGES
Review checklist
- Hierarchy and parents are consistent: PARTIAL. Most parent IDs are valid, but 6 terms have `UNKNOWN` parents (see below)
- Definitions and supporting references look adequate: PARTIAL. PMID:42152375 and PMID:42116181 need verification
- Obsolete or replacement handling is safe: no obsoletions in this PR
- OBO and ODK conventions appear preserved: template format is correct; ID range 8930000-8931000 is properly allocated to Jie Zheng
- No obvious CI, auth, or workflow regression: no workflow files touched
Summary
This batch adds 27 new UBERON leaf terms for uterus anatomy via the HRA NTR pipeline. The overall structure is well-organised, the FMA:17561 to UBERON:1200003 parent mapping for fundus terms was resolved correctly, and 6 confirmed existing terms were appropriately excluded. However, there are blocking issues that need to be resolved before the template can be processed by ROBOT.
CRITICAL — Must fix before merge
1. Six terms have UNKNOWN in both is_a and part_of columns
File: src/templates/hra-uterus.template.tsv
ROBOT will fail or produce invalid OWL when it encounters the literal string UNKNOWN as a class IRI. The six affected terms are the left/right/central anterior/posterior subdivisions of the lower uterine segment:
| Term | UBERON ID | Row |
|---|---|---|
| central anterior lower uterine segment | UBERON_8930202 | row 5 |
| central posterior lower uterine segment | UBERON_8930205 | row 8 |
| left anterior lower uterine segment | UBERON_8930220 | row 16 |
| left posterior lower uterine segment | UBERON_8930223 | row 19 |
| right anterior lower uterine segment | UBERON_8930229 | row 25 |
| right posterior lower uterine segment | UBERON_8930232 | row 28 |
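The PR's QC summary reports zero [PENDING], INFER, UNRESOLVABLE and NEEDS_MAPPING markers, but that check does not appear to cover `UNKNOWN`. A minimal sketch of a pre-ROBOT scan that adds `UNKNOWN` to the marker list is below. It assumes a ROBOT-style template (header on line 1, template strings on line 2, data from line 3), plain tab splitting with no quoting, and case-sensitive substring matching; the function name `find_placeholders` is hypothetical and not part of bulk_ntr_workflow.

```python
# Placeholder strings to flag. All but UNKNOWN come from the PR's QC summary;
# UNKNOWN is added because the review found it in the is_a/part_of columns.
MARKERS = ("UNKNOWN", "[PENDING]", "INFER", "UNRESOLVABLE", "NEEDS_MAPPING")


def find_placeholders(tsv_text):
    """Return (file_line, column, value) for each cell containing a marker.

    Assumes line 1 is the header, line 2 holds the ROBOT template strings,
    and data rows start on line 3.
    """
    rows = [line.split("\t") for line in tsv_text.splitlines()]
    header = rows[0]
    hits = []
    for file_line, row in enumerate(rows[2:], start=3):
        for column, value in zip(header, row):
            if any(marker in value for marker in MARKERS):
                hits.append((file_line, column, value))
    return hits
```

Run over the text of src/templates/hra-uterus.template.tsv, a non-empty result would stop the batch before it reaches ROBOT.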
The pipeline flagged these in errors.tsv (`unknown_parent`: no UBERON ID found for parent label "lower uterine segment") but left the parent as `UNKNOWN` instead of resolving it. That parent, lower uterine segment (UBERON:8930224), is defined in this same template batch, so the fix is to set:
- `is_a` column: `UBERON:0034944` (zone of organ)
- `part_of` column: `http://purl.obolibrary.org/obo/UBERON_8930224` (lower uterine segment, the new term defined in this same template)
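One way to apply this fix mechanically is sketched below. It assumes the template's ID column is headed `ID` and holds CURIEs such as `UBERON:8930202`, and that the parent columns are headed `is_a` and `part_of`; those header names and the function name `patch_unknown_parents` are assumptions, not taken from the actual file. It only overwrites cells that still read `UNKNOWN`, so re-running it is harmless.

```python
# The six lower-uterine-segment subdivisions listed in the table above.
AFFECTED_IDS = {
    "UBERON:8930202", "UBERON:8930205", "UBERON:8930220",
    "UBERON:8930223", "UBERON:8930229", "UBERON:8930232",
}

# Replacement values proposed in this review.
FIX = {
    "is_a": "UBERON:0034944",  # zone of organ
    "part_of": "http://purl.obolibrary.org/obo/UBERON_8930224",  # lower uterine segment
}


def patch_unknown_parents(tsv_text, id_column="ID"):
    """Return the template text with UNKNOWN parents replaced for AFFECTED_IDS."""
    rows = [line.split("\t") for line in tsv_text.splitlines()]
    header = rows[0]
    id_idx = header.index(id_column)
    col_idx = {name: header.index(name) for name in FIX}
    for row in rows[2:]:  # skip the header and ROBOT template-string rows
        if row[id_idx] not in AFFECTED_IDS:
            continue
        for name, value in FIX.items():
            if row[col_idx[name]] == "UNKNOWN":
                row[col_idx[name]] = value
    return "\n".join("\t".join(row) for row in rows) + "\n"
```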
IMPORTANT — Should fix before merge
2. Verify PMID:42152375 and PMID:42116181
These are used as the sole publication references (beyond the ASCTB-TEMP placeholder) for:
- connective tissue of the uterine serosa (UBERON_8930210)
- mesothelium of the uterine serosa (UBERON_8930225)
PMIDs in the 42-million range are plausible for very recent publications, but they should be explicitly verified as real indexed papers with content relevant to the uterine serosa. If they cannot be confirmed, alternative references should be found.
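A quick first pass at this verification could use NCBI's E-utilities `esummary` endpoint, which returns JSON document summaries for PubMed IDs. The sketch below keeps URL construction and response parsing separate from the network call so the parsing can be tested offline. It assumes that a PMID with no record comes back either absent or with an `error` field instead of a title; confirm that against a live response. A returned title only shows that the record exists; whether the paper actually concerns the uterine serosa still needs a human read. Function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pmids):
    """Build an esummary request for one or more PubMed IDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}
    return f"{ESUMMARY}?{urllib.parse.urlencode(params)}"


def titles_by_pmid(payload, pmids):
    """Map each PMID to its title, or None if no usable record was returned."""
    result = payload.get("result", {})
    titles = {}
    for pmid in pmids:
        record = result.get(pmid, {})
        titles[pmid] = None if "error" in record else record.get("title") or None
    return titles


def check_pmids(pmids):
    """Network call: fetch summaries from NCBI and map PMIDs to titles."""
    with urllib.request.urlopen(esummary_url(pmids), timeout=30) as response:
        return titles_by_pmid(json.load(response), pmids)
```

Calling `check_pmids(["42152375", "42116181"])` sends a single request for both IDs; any `None` in the result marks a PMID that needs a replacement reference.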
3. Parent of lower uterine segment is too broad
lower uterine segment (UBERON_8930224, row 20) has part_of: UBERON:0000995 (uterus). Anatomically the lower uterine segment is a sub-region of the body of uterus (UBERON:0009853), which is itself part of the uterus. Using part_of: UBERON:0009853 would be more precise and consistent with how the body-of-uterus terms in this same batch are handled.
SUGGESTIONS — Optional improvements
4. Genus specificity for myometrium layers
inner uterine myometrium (UBERON_8930216) and outer uterine myometrium (UBERON_8930226) use is_a: UBERON:0000064 (organ part). A more specific genus — UBERON:0034944 (zone of organ) — would be consistent with how the positional uterus-body/fundus/lower-segment terms are typed in this same batch.
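To see the genus choices for the whole batch at once, the template rows can be grouped by their `is_a` value so that terms typed differently from their siblings stand out. A minimal sketch, assuming `LABEL` and `is_a` column headers (hypothetical, as is the name `group_labels_by`); passing `column="part_of"` gives the same view of the parent choices discussed in item 3.

```python
from collections import defaultdict


def group_labels_by(tsv_text, column="is_a", label_column="LABEL"):
    """Return {value of `column`: [labels of rows with that value]}."""
    rows = [line.split("\t") for line in tsv_text.splitlines()]
    header = rows[0]
    key_idx = header.index(column)
    label_idx = header.index(label_column)
    groups = defaultdict(list)
    for row in rows[2:]:  # data rows only; skip header and template strings
        groups[row[key_idx]].append(row[label_idx])
    return dict(groups)
```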
5. Preprint reference for outer uterine myometrium
outer uterine myometrium (UBERON_8930226) includes a bioRxiv preprint DOI (https://doi.org/10.1101/2021.01.02.425073) as a def_xref. This is not critical since the term also cites PMID:20551073 and PMID:32427986, but published references are preferred.
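Preprints like this one can be flagged automatically. bioRxiv and medRxiv DOIs share the 10.1101 prefix with Cold Spring Harbor journals such as Genome Research, so the sketch below is only a heuristic: it matches the date-stamped suffix used by current preprint DOIs (as in the DOI above) or an all-digit legacy suffix, with an optional version tag. It assumes multiple def_xref values in one cell are separated by `|`; the separator and the function name are assumptions.

```python
import re

# Heuristic: date-stamped (2021.01.02.425073) or all-digit (123456) suffixes
# under 10.1101, optionally followed by a version tag such as "v2".
PREPRINT_DOI = re.compile(r"10\.1101/(?:\d{4}\.\d{2}\.\d{2}\.\d+|\d+)(?:v\d+)?$")


def preprint_xrefs(def_xref_cell, sep="|"):
    """Return the def_xref entries in one cell that look like preprint DOIs."""
    entries = (entry.strip() for entry in def_xref_cell.split(sep))
    return [entry for entry in entries if PREPRINT_DOI.search(entry)]
```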
6. Stage 5 (template registration) not yet run
The register_templates.py script (Stage 5 of the workflow) has not been run to register the new templates in uberon-odk.yaml. If this is intentional (e.g., registration is a follow-up step), please note it explicitly in the PR description.
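Until register_templates.py has been run, a crude check is whether the new template filenames appear anywhere in uberon-odk.yaml. The sketch below does a plain substring search instead of parsing the YAML, so it only catches a missing registration, not a malformed one. The function names, and the assumption that registration mentions each filename verbatim, are hypothetical; the location of the config file is repo-specific and left to the caller.

```python
from pathlib import Path

# Template files named in this review.
NEW_TEMPLATES = ("hra-uterus.template.tsv", "hra-uterus-groups.template.tsv")


def unregistered_templates(odk_yaml_text, template_files=NEW_TEMPLATES):
    """Return the template filenames that never appear in the ODK config text."""
    return [name for name in template_files if name not in odk_yaml_text]


def check_config(path):
    """Read the ODK config at `path` and report unregistered templates."""
    return unregistered_templates(Path(path).read_text(encoding="utf-8"))
```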
What looks good
- ID range `UBERON:8930xxx` is correctly allocated to Jie Zheng in `uberon-idranges.owl`
- FMA:17561 to UBERON:1200003 (uterine fundus) mapping resolved correctly for all 8 fundus sub-terms; UBERON:1200003 confirmed to exist in the ontology
- All other parent IDs confirmed present: UBERON:0034944, UBERON:0009853, UBERON:1200003, UBERON:0002384, UBERON:0001136, UBERON:0003891, UBERON:0012249, UBERON:0000458, UBERON:0001638, UBERON:0000995, UBERON:0000064, UBERON:0001296, UBERON:0001297
- columnar cell of endocervix correctly routed to `manual_curation.tsv` and excluded from the ROBOT template (it belongs in CL, not UBERON)
- 6 confirmed existing terms correctly excluded in `candidates.tsv`
- Template column directives, subset IRI, ORCID, NCBITaxon, and date format all look correct
- `hra-uterus-groups.template.tsv` is empty (no group terms), consistent with the pipeline report
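The ID-range and parent-existence checks in the list above can be repeated locally with a short sketch. It treats the allocation as the half-open interval 8930000 to 8931000 (whether the upper bound is inclusive is an assumption to check against uberon-idranges.owl), accepts `UBERON:`, `UBERON_` and PURL forms, and expects the caller to supply the existing class IDs; all function names are hypothetical.

```python
import re

# Allocation cited in the review; the exclusive upper bound is an assumption.
RANGE_START, RANGE_END = 8930000, 8931000


def normalize(curie_or_iri):
    """'http://purl.obolibrary.org/obo/UBERON_8930224' -> 'UBERON:8930224'."""
    return curie_or_iri.rsplit("/", 1)[-1].replace("_", ":", 1)


def out_of_range(new_ids):
    """Return new term IDs that are not UBERON IDs inside the allocated range."""
    bad = []
    for term_id in new_ids:
        match = re.fullmatch(r"UBERON:(\d{7})", normalize(term_id))
        if match is None or not RANGE_START <= int(match.group(1)) < RANGE_END:
            bad.append(term_id)
    return bad


def missing_parents(parent_ids, existing_ids, new_ids):
    """Return parents found neither in the ontology nor among the new terms."""
    available = {normalize(i) for i in existing_ids} | {normalize(i) for i in new_ids}
    return sorted({normalize(p) for p in parent_ids} - available)
```

Counting new terms as available parents matters here: `UBERON:8930224` exists only in this batch, and any leftover `UNKNOWN` shows up as a missing parent.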