From 5e82dd32bdeecb61100fb89e1a3a65dcc564a915 Mon Sep 17 00:00:00 2001
From: zhengj2007
Date: Tue, 26 May 2026 11:02:05 -0400
Subject: [PATCH] Add HRA small-intestine NTR batch outputs and review reports

Generated via bulk_ntr_workflow from source_data/small-intestine.csv.

Pipeline completed through Stage 4:
- 13 input rows processed
- 0 new terms remained in final template after merge filtering
- 10 confirmed existing UBERON matches excluded
- 2 out-of-scope terms excluded
- 2 manual-curation entries flagged
- 4 name corrections reported

Adds the small-intestine source CSV, generated template stubs, and report TSVs
(candidates, errors, input, manual_curation, name_corrections, out_of_scope).

Signed-off-by: dragon-ai-agent
---
 bulk_ntr_workflow/source_data/small-intestine.csv | 14 ++++++++++++++
 .../hra-small-intestine-groups.template.tsv       |  2 ++
 .../hra-small-intestine-reports/candidates.tsv    | 12 ++++++++++++
 .../hra-small-intestine-reports/errors.tsv        |  2 ++
 .../hra-small-intestine-reports/input.tsv         | 14 ++++++++++++++
 .../manual_curation.tsv                           |  3 +++
 .../name_corrections.tsv                          |  5 +++++
 .../hra-small-intestine-reports/out_of_scope.tsv  |  3 +++
 src/templates/hra-small-intestine.template.tsv    |  2 ++
 9 files changed, 57 insertions(+)
 create mode 100644 bulk_ntr_workflow/source_data/small-intestine.csv
 create mode 100644 src/templates/hra-small-intestine-groups.template.tsv
 create mode 100644 src/templates/hra-small-intestine-reports/candidates.tsv
 create mode 100644 src/templates/hra-small-intestine-reports/errors.tsv
 create mode 100644 src/templates/hra-small-intestine-reports/input.tsv
 create mode 100644 src/templates/hra-small-intestine-reports/manual_curation.tsv
 create mode 100644 src/templates/hra-small-intestine-reports/name_corrections.tsv
 create mode 100644 src/templates/hra-small-intestine-reports/out_of_scope.tsv
 create mode 100644 src/templates/hra-small-intestine.template.tsv

diff --git a/bulk_ntr_workflow/source_data/small-intestine.csv b/bulk_ntr_workflow/source_data/small-intestine.csv
new file mode 100644
index 000000000..56e100ff8
--- /dev/null
+++ b/bulk_ntr_workflow/source_data/small-intestine.csv
@@ -0,0 +1,14 @@
+tables,as,as_label,UBERON ID,Pull Request/Issue,parents_as,parents_as_label,references
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_circular-longitudinal-muscle,circular/longitudinal muscle,,,https://purl.org/ccf/ASCTB-TEMP_muscularis-externa,muscularis externa,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_endoethlial,endoethlial,,,UBERON:0003332,submucosa of duodenum,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_epithelium,Epithelium,,,UBERON:0001213,intestinal villus,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_lamina-propria-gut-associated-lymphoid-tissue-galt-,Lamina propria/Gut associated lymphoid tissue (GALT),,,UBERON:0001213,intestinal villus,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_mucosa,mucosa,,,UBERON:0013644,duodenal ampulla,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_muscularis-externa,muscularis externa,,,UBERON:0002115,jejunum,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_muscularis-mucosa,muscularis mucosa,,,UBERON:0001213,intestinal villus,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_muscularis-propria,muscularis propria,,,UBERON:0013644,duodenal ampulla,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_myenteric-plexus-of-auerbach,myenteric plexus of Auerbach,,,UBERON:0012377,muscle layer of jejunum,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_serosa,serosa,,,UBERON:0013644,duodenal ampulla,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_submucosa,submucosa,,,UBERON:0013644,duodenal ampulla,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_submucosal-plexus-of-meissner,submucosal plexus of Meissner,,,UBERON:0003332,submucosa of duodenum,
+small-intestine,https://purl.org/ccf/ASCTB-TEMP_connective-tissue,connective tissue,,,UBERON:0003337,serosa of jejunum,"PMID: 20126587, PMID: 3310649"
\ No newline at end of file
diff --git a/src/templates/hra-small-intestine-groups.template.tsv b/src/templates/hra-small-intestine-groups.template.tsv
new file mode 100644
index 000000000..dff359c07
--- /dev/null
+++ b/src/templates/hra-small-intestine-groups.template.tsv
@@ -0,0 +1,2 @@
+ID LABEL Definition def_xref genus location In_subset Date Contributor Present_in_taxon Wikipedia_image xref
+ID LABEL A IAO:0000115 >A oboInOwl:hasDbXref SPLIT=| EC % EC BFO:0000050 some % AI oboInOwl:inSubset AT dcterms:date^^xsd:dateTime AI dcterms:contributor AI RO:0002175 A foaf:depiction A oboInOwl:hasDbXref SPLIT=|
diff --git a/src/templates/hra-small-intestine-reports/candidates.tsv b/src/templates/hra-small-intestine-reports/candidates.tsv
new file mode 100644
index 000000000..0da80bea0
--- /dev/null
+++ b/src/templates/hra-small-intestine-reports/candidates.tsv
@@ -0,0 +1,12 @@
+label as_iri uberon_id note
+Epithelium UBERON:0013636 confirmed_match (confidence: high)
+muscularis mucosa UBERON:0006676 confirmed_match (confidence: high)
+muscularis externa UBERON:0006660 confirmed_match (confidence: high)
+submucosal plexus of Meissner UBERON:0005304 confirmed_match (confidence: high)
+connective tissue UBERON:0002384 confirmed_match (confidence: high)
+myenteric plexus of Auerbach UBERON:0002439 confirmed_match (confidence: high)
+mucosa UBERON:0000344 confirmed_match (confidence: high)
+submucosa UBERON:0000009 confirmed_match (confidence: high)
+serosa UBERON:0000042 confirmed_match (confidence: high)
+muscularis propria UBERON:0006660 confirmed_match (confidence: high)
+circular/longitudinal muscle UBERON:0006660 possible_match (The term might be an informal reference to the muscularis externa/muscular coat, which typically contains both circular and longitudinal layers. However, the slash notation is non-standard and UBERON already has this parent term.)
diff --git a/src/templates/hra-small-intestine-reports/errors.tsv b/src/templates/hra-small-intestine-reports/errors.tsv
new file mode 100644
index 000000000..d0b256411
--- /dev/null
+++ b/src/templates/hra-small-intestine-reports/errors.tsv
@@ -0,0 +1,2 @@
+label as_iri issue_type parent_id parent_label detail
+circular/longitudinal muscle https://purl.org/ccf/ASCTB-TEMP_circular-longitudinal-muscle asctb_temp_parent https://purl.org/ccf/ASCTB-TEMP_muscularis-externa muscularis externa Parent not yet in UBERON; subagent should search OLS4 for correct parent
diff --git a/src/templates/hra-small-intestine-reports/input.tsv b/src/templates/hra-small-intestine-reports/input.tsv
new file mode 100644
index 000000000..3ba5fdc34
--- /dev/null
+++ b/src/templates/hra-small-intestine-reports/input.tsv
@@ -0,0 +1,14 @@
+table as_iri label uberon_id parent_id parent_label references term_type
+ https://purl.org/ccf/ASCTB-TEMP_circular-longitudinal-muscle circular/longitudinal muscle https://purl.org/ccf/ASCTB-TEMP_muscularis-externa muscularis externa leaf
+ https://purl.org/ccf/ASCTB-TEMP_endoethlial endoethlial UBERON:0003332 submucosa of duodenum leaf
+ https://purl.org/ccf/ASCTB-TEMP_epithelium Epithelium UBERON:0001213 intestinal villus leaf
+ https://purl.org/ccf/ASCTB-TEMP_lamina-propria-gut-associated-lymphoid-tissue-galt- Lamina propria/Gut associated lymphoid tissue (GALT) UBERON:0001213 intestinal villus leaf
+ https://purl.org/ccf/ASCTB-TEMP_mucosa mucosa UBERON:0013644 duodenal ampulla leaf
+ https://purl.org/ccf/ASCTB-TEMP_muscularis-externa muscularis externa UBERON:0002115 jejunum leaf
+ https://purl.org/ccf/ASCTB-TEMP_muscularis-mucosa muscularis mucosa UBERON:0001213 intestinal villus leaf
+ https://purl.org/ccf/ASCTB-TEMP_muscularis-propria muscularis propria UBERON:0013644 duodenal ampulla leaf
+ https://purl.org/ccf/ASCTB-TEMP_myenteric-plexus-of-auerbach myenteric plexus of Auerbach UBERON:0012377 muscle layer of jejunum leaf
+ https://purl.org/ccf/ASCTB-TEMP_serosa serosa UBERON:0013644 duodenal ampulla leaf
+ https://purl.org/ccf/ASCTB-TEMP_submucosa submucosa UBERON:0013644 duodenal ampulla leaf
+ https://purl.org/ccf/ASCTB-TEMP_submucosal-plexus-of-meissner submucosal plexus of Meissner UBERON:0003332 submucosa of duodenum leaf
+ https://purl.org/ccf/ASCTB-TEMP_connective-tissue connective tissue UBERON:0003337 serosa of jejunum PMID: 20126587, PMID: 3310649 leaf
diff --git a/src/templates/hra-small-intestine-reports/manual_curation.tsv b/src/templates/hra-small-intestine-reports/manual_curation.tsv
new file mode 100644
index 000000000..8a470f461
--- /dev/null
+++ b/src/templates/hra-small-intestine-reports/manual_curation.tsv
@@ -0,0 +1,3 @@
+label definition reason similar_terms suggestion
+Lamina propria/Gut associated lymphoid tissue (GALT) The connective tissue core of an intestinal villus containing blood vessels, a central lacteal, immune cells including lymphocytes, and in some regions gut-associated lymphoid tissue comprising organized lymphoid aggregates such as Peyer's patches and solitary lymphoid follicles. This is a problematic compound term that conflates two related but distinct structures. Lamina propria (UBERON:0000030) is the general connective tissue layer, while GALT (UBERON:0001962) refers to specific organized lymphoid tissue. A villus-specific 'lamina propria of intestinal villus' term does not currently exist in UBERON but would be a reasonable addition. However, the compound naming with slash is non-standard and should be resolved. UBERON:0000030=lamina propria; UBERON:0016606=lamina propria of small intestine; UBERON:0001962=gut-associated lymphoid tissue Curator should either: (1) Create a new term 'lamina propria of intestinal villus' with intersection_of: UBERON:0000030 ! lamina propria AND part_of UBERON:0001213 ! intestinal villus, OR (2) Split this into two separate NTR entries - one for villus-specific lamina propria and maintain existing GALT term. The slash notation should not be preserved in the final ontology term name. Note that GALT is not present in all intestinal villi - it is concentrated in specific regions (ileum, Peyer's patches), so conflating it with lamina propria is anatomically imprecise.
+circular/longitudinal muscle The combined circular and longitudinal smooth muscle layers of the muscularis externa. The inner circular layer narrows the lumen of the organ, while the outer longitudinal layer shortens its length, together enabling peristaltic movement. Term name is non-standard. In UBERON and standard anatomical nomenclature, the circular and longitudinal muscle layers are distinct anatomical structures, not combined with a slash. UBERON has separate terms: UBERON:0012368 'circular muscle layer of muscular coat' and UBERON:0012369 'longitudinal muscle layer of muscular coat'. This term needs curator decision on whether to: (1) split into two separate NTRs for the circular and longitudinal layers, (2) treat as referring to the parent muscularis externa (UBERON:0006660) which already encompasses both layers, or (3) reject as malformed. UBERON:0012368=circular muscle layer of muscular coat; UBERON:0012369=longitudinal muscle layer of muscular coat; UBERON:0006660=muscular coat; UBERON:0008857=stomach smooth muscle circular layer; UBERON:0008863=stomach smooth muscle outer longitudinal layer Curator should verify the source data (ASCTB-TEMP) intent. If this refers to both layers collectively, it's redundant with UBERON:0006660 (muscular coat/muscularis externa). If it should be two separate terms, create NTRs for specific organ circular/longitudinal layers following the pattern of stomach layers shown above. The slash notation is not used in UBERON or standard anatomical nomenclature.
diff --git a/src/templates/hra-small-intestine-reports/name_corrections.tsv b/src/templates/hra-small-intestine-reports/name_corrections.tsv
new file mode 100644
index 000000000..f956baca0
--- /dev/null
+++ b/src/templates/hra-small-intestine-reports/name_corrections.tsv
@@ -0,0 +1,5 @@
+source_label corrected_label reason
+Epithelium epithelium of intestinal villus The generic term 'Epithelium' (UBERON:0000483) is too broad. In the context of intestinal villus anatomy, this should be 'epithelium of intestinal villus' (UBERON:0013636), which is the specific epithelial layer covering the villus structure. Keep 'Epithelium' as an exact synonym if needed for ASCTB compatibility.
+Lamina propria/Gut associated lymphoid tissue (GALT) Split into two separate terms OR clarify as 'lamina propria of intestinal villus (including GALT)' This compound term with a slash represents TWO distinct but related anatomical structures: (1) lamina propria (UBERON:0000030) - the connective tissue core of the villus, and (2) gut-associated lymphoid tissue/GALT (UBERON:0001962) - organized lymphoid tissue that can be present within the lamina propria in certain intestinal regions. GALT is not universally present in all intestinal villi but is concentrated in specific areas like Peyer's patches. Standard anatomical nomenclature does not use slash notation for compound structures. Recommendation: either create separate NTR entries for 'lamina propria of intestinal villus' and maintain GALT as existing UBERON:0001962, OR clarify the term name to indicate GALT as a component that may be present within the villus lamina propria.
+endoethlial endothelial (or more specifically, 'endothelium of [vessel type]') The term 'endoethlial' is a non-standard spelling. The correct spelling would be 'endothelial' (adjective) or 'endothelium' (noun for the anatomical structure). However, beyond the spelling error, the term is too vague for UBERON - it needs to specify which vessel's endothelium is being described.
+circular/longitudinal muscle Split into 'circular muscle layer' and 'longitudinal muscle layer', or map to existing 'muscular coat' (UBERON:0006660) The slash notation combining two distinct anatomical layers is non-standard. In UBERON and TA98 (Terminologia Anatomica), the circular and longitudinal muscle layers of the muscularis externa are separate entities, each with specific structural and functional characteristics. The term should either be split into two separate NTR requests for the individual layers, or if referring to both collectively, should map to the existing 'muscular coat' term.
diff --git a/src/templates/hra-small-intestine-reports/out_of_scope.tsv b/src/templates/hra-small-intestine-reports/out_of_scope.tsv
new file mode 100644
index 000000000..fab2ea17f
--- /dev/null
+++ b/src/templates/hra-small-intestine-reports/out_of_scope.tsv
@@ -0,0 +1,3 @@
+label reason suggestion
+muscularis mucosa Anatomically incorrect placement. The muscularis mucosa is a smooth muscle layer located at the base of the intestinal mucosa, deep to the lamina propria and below the intestinal villi. It is NOT a component of the intestinal villus itself. Intestinal villi project upward from the mucosal surface, and the muscularis mucosa forms the deep boundary of the mucosa layer. This term should not be classified as 'part_of' intestinal villus. Remove from this NTR group. Muscularis mucosa exists as UBERON:0006676 and has appropriate relationships to mucosa and gastrointestinal tract, but should not be linked to intestinal villus.
+endoethlial This term is problematic for multiple reasons: (1) it appears to be a misspelling of 'endothelial', which is an adjective describing cells/tissue, not an anatomical structure; (2) even if corrected to 'endothelium', it is too vague without specifying which blood or lymphatic vessel in the submucosa; (3) UBERON requires specific anatomical structures with clear boundaries, not general cell-type descriptors; (4) if this refers to endothelial cells, those belong in the Cell Ontology (CL), not UBERON which covers anatomy. This term should be rejected or the submitter should clarify what specific anatomical structure is intended. If they mean 'endothelium of [specific vessel type] in duodenal submucosa', that could be a valid term following UBERON's pattern (e.g., 'endothelium of capillary'). Without clarification, this is unsuitable for UBERON.
diff --git a/src/templates/hra-small-intestine.template.tsv b/src/templates/hra-small-intestine.template.tsv
new file mode 100644
index 000000000..1ed53ab4b
--- /dev/null
+++ b/src/templates/hra-small-intestine.template.tsv
@@ -0,0 +1,2 @@
+ID LABEL Definition def_xref is_a part_of develops_from In_subset Date Contributor Present_in_taxon Wikipedia_image xref
+ID LABEL A IAO:0000115 >A oboInOwl:hasDbXref SPLIT=| SC % SC BFO:0000050 some % SC RO:0002202 some % AI oboInOwl:inSubset AT dcterms:date^^xsd:dateTime AI dcterms:contributor AI RO:0002175 A foaf:depiction A oboInOwl:hasDbXref SPLIT=|