Add environmental health, oncology, and EDoH value sets #35
New schema files for environmental health and medical domains:

Environmental Health:
- gb_edoh.yaml: GB-EDoH value sets for NIH HEW Data Accelerator
  - ExtremeWeatherEventEnum (19 values): heat waves, wildfires, floods, etc.
  - ExposureAgentCategoryEnum (17 values): air pollutants, radionuclides, etc.
  - TemporalAggregationEnum (18 values): daily mean, annual mean, etc.
  - SpatialResolutionEnum (18 values): census tract, grid cells, etc.
- carcinogenicity.yaml: IARC, EPA IRIS, NTP classification groups
- radionuclides.yaml: Radioactive elements with CHEBI/NCIT mappings

Medical/Oncology:
- icdo.yaml: ICD-O bi-axial classification system
  - TumorTopography: anatomic sites (C00-C80)
  - TumorMorphology: histological types
  - TumorBehavior: benign/in situ/malignant codes
  - TumorGrade: differentiation grades I-IV

Other additions:
- geospatial_formats.yaml: GeoTIFF, NetCDF, Shapefile, GeoJSON, etc.
- genetics.yaml: Mode of inheritance from HP ontology
- roles.yaml: Dynamic ChemicalRole/BiologicalRole from CHEBI
- evidence.yaml: Dynamic EvidenceType from ECO ontology

All schemas validated with `just validate-schema`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Claude finished @cmungall's task

Code Review for Environmental Health, Oncology, and EDoH Value Sets

I've completed a comprehensive review of this substantial PR adding environmental health, oncology, and EDoH value sets. Here's my detailed assessment:

✅ Overall Assessment: Excellent Work

This PR adds high-quality, well-structured LinkML schema definitions across multiple domains. The code follows repository conventions and demonstrates strong understanding of LinkML patterns and ontology integration.

Code Quality & Best Practices ⭐⭐⭐⭐⭐

Excellent Convention Adherence:
Outstanding Documentation:
Excellent Ontology Integration:

Domain-Specific Analysis

Environmental Health (
Pull request overview
This PR adds comprehensive value sets for environmental health, oncology, and related domains to support the NIH HEW Data Accelerator program and OMOP integration. The additions include 8 new schema files covering extreme weather events, exposure agents, radionuclides, cancer classification, carcinogenicity classifications, geospatial data formats, genetic inheritance patterns, and chemical/biological roles.
Key Changes:
- Environmental Health GB-EDoH value sets: 4 enums with 72 permissible values for extreme weather events, exposure agent categories, temporal aggregation methods, and spatial resolution units to support geospatial environmental determinants of health data
- Cancer classification systems: ICD-O bi-axial system with tumor topography (22 anatomic sites), morphology (25 histological types), behavior codes (6 types), and tumor grades (9 categories)
- Carcinogenicity and radionuclide classifications: Standard classifications from IARC, EPA IRIS, and NTP, plus 16 environmentally-relevant radionuclides with validated ontology mappings
- Dynamic ontology-based enums: 5 enums dynamically populated from ECO (evidence types), HP (modes of inheritance), and ChEBI (chemical/biological roles) using the `reachable_from` pattern
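
For readers unfamiliar with the pattern, a LinkML dynamic enum can be sketched roughly as follows. This is a minimal illustration, not the contents of evidence.yaml: the root term ECO:0000000 is the one named in this review's per-file summary, while the `source_ontology` handle, the `include_self` setting, the relationship type, and the description text are assumptions.

```yaml
# Illustrative sketch only; every value except the ECO root term is assumed.
enums:
  EvidenceType:
    description: Evidence types drawn from the Evidence and Conclusion Ontology.
    reachable_from:
      source_ontology: obo:eco       # assumed ontology handle
      source_nodes:
        - ECO:0000000                # root term named in the file summary
      include_self: false            # assumed: leave out the root term itself
      relationship_types:
        - rdfs:subClassOf            # assumed: follow the is-a hierarchy only
```

With a definition of this shape, the permissible values are not listed in the file. They are resolved from the ontology by whatever tooling expands the enum, which is why these files stay short compared with the static enums in this PR.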
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| `src/valuesets/schema/provenance/evidence.yaml` | Adds EvidenceType enum dynamically populated from Evidence and Conclusion Ontology (ECO:0000000) |
| `src/valuesets/schema/medical/oncology/icdo.yaml` | Implements ICD-O cancer classification with 4 enums covering topography, morphology, behavior, and grade |
| `src/valuesets/schema/environmental_health/radionuclides.yaml` | Defines RadionuclideEnum (16 isotopes) and NORMEnum (4 decay series) with CHEBI/NCIT mappings |
| `src/valuesets/schema/environmental_health/gb_edoh.yaml` | Provides 4 comprehensive enums for NIH HEW program supporting environmental epidemiology |
| `src/valuesets/schema/environmental_health/carcinogenicity.yaml` | Includes IARC, EPA IRIS, and NTP carcinogenicity classification systems |
| `src/valuesets/schema/computing/geospatial_formats.yaml` | Catalogs raster formats (11 types) and vector formats (13 types) with EDAM mappings where applicable |
| `src/valuesets/schema/clinical/genetics.yaml` | Adds ModeOfInheritance enum with 14 inheritance patterns from Human Phenotype Ontology |
| `src/valuesets/schema/chemistry/roles.yaml` | Defines ChemicalRole and BiologicalRole enums dynamically populated from ChEBI ontology |

          biological tissues.
        annotations:
          type: primordial
          ubiquity: present in all potassium
Copilot
AI
Dec 10, 2025
[nitpick] Missing `license: MIT` field at the end of the file. Other schema files in the repository include this field (e.g., demographics.yaml, gb_edoh.yaml). Consider adding it for consistency.

Suggested change:

          ubiquity: present in all potassium
license: MIT

        description: Lymphoma/leukemia of natural killer cell origin.
        annotations:
          icdo_grade: 8
          use: lymphomas/leukemias
Copilot
AI
Dec 10, 2025
[nitpick] Missing `license: MIT` and `see_also` fields at the end of the file. Other schema files in the repository include these fields (e.g., demographics.yaml, gb_edoh.yaml). Consider adding them for consistency.

Suggested change:

          use: lymphomas/leukemias
license: MIT
see_also:
  - https://www.who.int/standards/classifications/other-classifications/international-classification-of-diseases-for-oncology
  - https://icdo3.org/

        annotations:
          abbreviation: RAHC
          evidence: limited human or sufficient animal evidence
          examples: lead compounds, styrene, trichloroethylene
Copilot
AI
Dec 10, 2025
[nitpick] Missing `license: MIT` and `see_also` fields at the end of the file. Other schema files in the repository include these fields for consistency.

Suggested change:

          examples: lead compounds, styrene, trichloroethylene
license: MIT
see_also: ../AGENTS.md

          Binary equivalent of WKT for efficient storage and transfer.
        annotations:
          format: binary
          organization: OGC
Copilot
AI
Dec 10, 2025
[nitpick] Missing `license: MIT` and `see_also` fields at the end of the file. Other schema files in the repository include these fields for consistency.

Suggested change:

          organization: OGC
license: MIT
see_also:
  - ../AGENTS.md

        description: >-
          A type of pseudoautosomal inheritance that is recessive.
        meaning: HP:0034341
        is_a: PSEUDOAUTOSOMAL
Copilot
AI
Dec 10, 2025
[nitpick] Missing `license: MIT` and `see_also` fields at the end of the file. Other schema files in the repository include these fields for consistency.

Suggested change:

        is_a: PSEUDOAUTOSOMAL
license: MIT
see_also:
  - ../AGENTS.md

      SEVERE_THUNDERSTORM:
        title: convective storm
        description: >-
          Thunderstorm producing hail >= 1 inch, wind gusts >= 58 mph, or a
          tornado. Source of lightning, flash flooding, and wind damage.
        meaning: ENVO:01001294
        annotations:
          health_impacts: lightning injuries, wind injuries, flash flooding
Copilot
AI
Dec 10, 2025
The enum key `SEVERE_THUNDERSTORM` doesn't match the title "convective storm". While a severe thunderstorm is a type of convective storm, the naming inconsistency could be confusing. Consider either:
- Renaming the key to `CONVECTIVE_STORM` to match the title, or
- Changing the title to "Severe thunderstorm" to match the key.

The description mentions "Thunderstorm producing hail >= 1 inch, wind gusts >= 58 mph", which is the NWS definition of a severe thunderstorm, so the key may be more accurate.
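
For concreteness, the second option (keep the key, change only the title) might look roughly like the fragment below. It is a sketch built from the hunk quoted above, shown without the enclosing enum nesting. Whether the new title passes validation is an assumption: if titles are checked against ontology labels, as the later radionuclides commit in this PR suggests for CHEBI, the ENVO label "convective storm" may need to stay and the key rename would be the safer choice.

```yaml
# Sketch of option 2: key unchanged, title aligned with the key.
# Assumes the title is free text and not validated against the ENVO label.
SEVERE_THUNDERSTORM:
  title: Severe thunderstorm        # previously "convective storm"
  description: >-
    Thunderstorm producing hail >= 1 inch, wind gusts >= 58 mph, or a
    tornado. Source of lightning, flash flooding, and wind damage.
  meaning: ENVO:01001294            # mapping unchanged from the original hunk
  annotations:
    health_impacts: lightning injuries, wind injuries, flash flooding
```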
@copilot open a new pull request to apply changes based on this feedback
        title: Cervix uteri
        description: Malignant neoplasm of cervix uteri (C53)
        annotations:
          icdo_code: C53
Copilot
AI
Dec 10, 2025
[nitpick] Consider adding a UBERON mapping for consistency. Other anatomical sites have UBERON mappings (e.g., OVARY has `uberon: UBERON:0000992`, PROSTATE has `uberon: UBERON:0002367`). The cervix uteri could map to `UBERON:0000002` (uterine cervix).

Suggested change:

          icdo_code: C53
          uberon: UBERON:0000002

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Claude finished @cmungall's task

🔍 Comprehensive PR Review: Environmental Health, Oncology, and EDoH Value Sets

I've completed a thorough code review of this substantial PR adding LinkML value sets across multiple domains. Here's my detailed assessment:

✅ Overall Assessment: Excellent Work

This PR demonstrates outstanding LinkML schema development with comprehensive domain coverage, excellent documentation, and strong adherence to best practices. The code quality is exemplary across all 8 new files.

🏆 Code Quality & Standards ⭐⭐⭐⭐⭐

Perfect Convention Adherence:
Outstanding Documentation Quality:

🔬 Domain-Specific Excellence

Environmental Health (
Updated radionuclides.yaml to prefer CHEBI isotope-specific terms over NCIT:
- CESIUM_137: NCIT:C68812 → CHEBI:196959 (caesium-137)
- CARBON_14: NCIT:C1638 → CHEBI:36927 (carbon-14 atom)
- POLONIUM_210: Added meaning CHEBI:37340 (polonium-210 atom)

Titles updated to match CHEBI labels for validation.

Related: chemkg/chemrof#49

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
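
As a rough illustration of what these remappings look like in the schema, the fragment below assumes each permissible value carries a `title` and a `meaning` field, as in the hunks quoted elsewhere in this PR. Only the keys, CHEBI IDs, and labels come from the commit message; descriptions, annotations, and the enclosing enum nesting are omitted and would differ in the actual file.

```yaml
# Sketch only: keys, CHEBI IDs, and labels are taken from the commit message;
# all other fields of these entries are left out.
CESIUM_137:
  title: caesium-137          # CHEBI label
  meaning: CHEBI:196959       # previously NCIT:C68812
CARBON_14:
  title: carbon-14 atom       # CHEBI label
  meaning: CHEBI:36927        # previously NCIT:C1638
POLONIUM_210:
  title: polonium-210 atom    # CHEBI label
  meaning: CHEBI:37340        # newly added meaning
```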
Claude finished @cmungall's task

🔍 Comprehensive Code Review: Environmental Health, Oncology, and EDoH Value Sets

Todo List:

✅ Overall Assessment: Exceptional Work

This PR demonstrates outstanding LinkML schema development with comprehensive domain coverage, excellent documentation, and strong adherence to best practices. The code quality is exemplary across all 8 new files, representing a significant enhancement to the valuesets project.

🏆 Code Quality & Standards ⭐⭐⭐⭐⭐

Perfect Convention Adherence:
Outstanding Documentation Quality:

🔬 Domain-Specific Excellence

Environmental Health (
Summary
- Environmental Health GB-EDoH value sets for NIH HEW Data Accelerator program supporting OMOP integration:
  - ExtremeWeatherEventEnum: 19 extreme weather event types (heat waves, wildfires, floods, hurricanes, etc.) with ENVO mappings
  - ExposureAgentCategoryEnum: 17 exposure agent categories (criteria pollutants, wildfire smoke, radionuclides, etc.)
  - TemporalAggregationEnum: 18 temporal aggregation methods (daily mean, annual mean, trimester mean, etc.)
  - SpatialResolutionEnum: 18 spatial resolution units (census tract, grid cells, watersheds, etc.)
- Carcinogenicity classifications: IARC groups (1, 2A, 2B, 3), EPA IRIS classifications (current + legacy), NTP classifications
- Radionuclides: 16 radioactive elements with validated CHEBI/NCIT mappings (radon-222, cesium-137, uranium, etc.)
- ICD-O cancer classification: Bi-axial system with topography (anatomic sites), morphology (histological types), behavior codes, and tumor grades
- Geospatial data formats: Raster formats (GeoTIFF, NetCDF, HDF5) and vector formats (Shapefile, GeoJSON, GeoPackage) with EDAM mappings
- Dynamic enums: ChemicalRole/BiologicalRole from CHEBI, EvidenceType from ECO, ModeOfInheritance from HP

Test plan
- `just validate-schema`
- `just site`

🤖 Generated with Claude Code