feat: implement PHIX validation for schools and daycares #152
base: main
eswarchandravidyasagar commented Jan 14, 2026
- Added PHIX validation module to validate school/daycare names against the official PHIX reference list.
- Integrated validation into the preprocessing step in orchestrator.py.
- Configurable options added to parameters.yaml for enabling validation and handling unmatched facilities.
- Created unit tests for the validation module covering various scenarios.
- Added documentation for the validation plan and updated the plans directory.
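A minimal sketch of how a config-gated validation step could sit at the start of preprocessing. The config keys (`phix_validation`, `enabled`, `unmatched_action`), the action values, and the `preprocess` function are all hypothetical. The PR description does not show the actual names used in parameters.yaml or orchestrator.py.

```python
import logging

logger = logging.getLogger(__name__)

VALID_ACTIONS = {"error", "warn", "drop"}  # hypothetical choices for unmatched facilities


def preprocess(records: list[dict], config: dict, reference_names: set[str]) -> list[dict]:
    """Hypothetical preprocessing hook: validate facility names before other processing."""
    settings = config.get("phix_validation", {})  # hypothetical config section
    if not settings.get("enabled", False):
        return records  # validation switched off: pass records through unchanged

    action = settings.get("unmatched_action", "error")
    if action not in VALID_ACTIONS:
        raise ValueError(f"Unknown unmatched_action: {action!r}")

    valid, unmatched = [], []
    for record in records:
        # Exact membership test against the reference list.
        (valid if record["facility"] in reference_names else unmatched).append(record)

    if not unmatched:
        return records
    names = sorted({r["facility"] for r in unmatched})
    if action == "error":
        raise ValueError(f"Facilities not found in PHIX reference list: {names}")
    if action == "warn":
        logger.warning("Unmatched facilities kept: %s", names)
        return records
    logger.warning("Unmatched facilities dropped: %s", names)  # action == "drop"
    return valid
```

Failing by default (`"error"`) keeps bad facility names from silently flowing into later steps; the other actions are there only to show how a configurable policy might branch.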
We don't have redistribution permission for the PHIX reference list file, so it will need to be removed and the commits squashed. Committing it would also bloat the size of this repository and its history. Users will have to bring their own (BYO) PHIX reference list.
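Because the reference list cannot be committed, the pipeline has to locate a user-supplied copy at run time and fail clearly when it is missing. A minimal sketch, assuming the configured value is the `reference_file` key seen in the outdated parameters.yaml excerpt and that relative paths resolve against the project root. The function name and error messages are hypothetical, and the sketch only checks that the file exists; it does not parse the Excel workbook.

```python
from pathlib import Path
from typing import Optional


def resolve_reference_file(project_root: Path, configured: Optional[str]) -> Path:
    """Locate the user-supplied (BYO) PHIX reference list, failing early if it is absent."""
    if not configured:
        raise FileNotFoundError(
            "No PHIX reference list configured. The list is not redistributed with this "
            "repository; obtain it yourself and set its path in parameters.yaml."
        )
    path = Path(configured)
    if not path.is_absolute():
        path = project_root / path  # relative paths are resolved against the project root
    if not path.is_file():
        raise FileNotFoundError(f"PHIX reference list not found at {path}")
    return path
```

Pairing this with a `.gitignore` entry for the workbook would help keep a local copy from being committed again.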
config/parameters.yaml (outdated)
    # Path to PHIX reference Excel file (relative to project root)
    reference_file: PHIX Reference Lists v5.2 - 2025Jun30.xlsx
    # Minimum fuzzy match score (0-100) to consider a match
    match_threshold: 85
Is this required? Matching should be exact. A fuzzy threshold could let through exactly the issues we want to protect against, such as a similarly named school being selected by accident when a Panorama user creates a forecast query.
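To illustrate the concern, this sketch uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy scorer the PR used, which isn't named in this thread. Two different, hypothetical school names that differ by one character score well above the outdated `match_threshold: 85`, so a fuzzy match would accept the wrong facility, while an exact comparison rejects it.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Similarity on a 0-100 scale, comparable in range to match_threshold."""
    return 100 * SequenceMatcher(None, a, b).ratio()


# Hypothetical facility names: two different schools, one character apart.
reference = "St. Mary Catholic Elementary School"
submitted = "St. Mark Catholic Elementary School"

score = fuzzy_score(reference, submitted)
print(f"fuzzy score: {score:.1f}")
print("fuzzy match:", score >= 85)             # the wrong school would be accepted
print("exact match:", reference == submitted)  # exact matching rejects it
```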
We likely need a mapping file that converts the PHU names in the PHIX reference document to standardized PHU acronyms (which should also be enforced for template folders, etc.). We may also need to allow this map to be many-to-one, for PHUs that have merged since the reference list was last updated.
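A sketch of the many-to-one mapping suggested above, assuming it is kept as a plain name-to-acronym table. In the repository it would presumably live in a YAML file loaded with something like PyYAML's `yaml.safe_load`. Every PHU name and acronym below is a hypothetical placeholder, not a real health unit. Two pre-merger names resolve to one merged acronym, and unknown names fail loudly instead of passing through.

```python
# Hypothetical alias map: PHU name as written in the PHIX list -> standardized acronym.
# Several names may share one acronym (units that have merged since the list was published).
PHU_ALIASES: dict[str, str] = {
    "Example North Health Unit": "ENWPH",
    "Example West Public Health": "ENWPH",  # merged with the unit above
    "Sample City Public Health": "SCPH",
}


def phu_acronym(phix_name: str) -> str:
    """Return the standardized acronym for a PHU name taken from the PHIX list."""
    try:
        return PHU_ALIASES[phix_name]  # exact lookup, consistent with exact facility matching
    except KeyError:
        raise KeyError(f"No PHU alias defined for {phix_name!r}; add it to the alias map") from None


def phix_names_for(acronym: str) -> set[str]:
    """All PHIX PHU names (including pre-merger names) that map to one acronym."""
    return {name for name, acr in PHU_ALIASES.items() if acr == acronym}
```

The reverse lookup is what validation scoping would need: given a target PHU acronym, it yields every PHIX name whose facilities belong to that unit.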
I know it's important in this case to run this early in the pipeline, before other processing, but could we also emit an entry in the per-PDF validation log confirming that a valid facility is being used for the target PHU?
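One way such a per-PDF entry could look, sketched under assumptions. The record fields (`pdf`, `check`, `facility`, `phu`, `facility_phu`, `status`) and the status values are hypothetical, since the existing per-PDF validation log format isn't shown in this thread. The reference is modelled as a mapping from facility name to the acronym of the PHU it belongs to.

```python
import json


def facility_log_entry(pdf_name: str, facility: str, target_phu: str,
                       facility_to_phu: dict[str, str]) -> dict:
    """Build a per-PDF log record stating whether the facility is valid for the target PHU."""
    owner = facility_to_phu.get(facility)  # exact lookup; None if not in the reference list
    if owner is None:
        status = "unknown_facility"
    elif owner != target_phu:
        status = "wrong_phu"
    else:
        status = "ok"
    return {"pdf": pdf_name, "check": "phix_facility", "facility": facility,
            "phu": target_phu, "facility_phu": owner, "status": status}


reference = {"Maple Public School": "SCPH"}  # hypothetical reference data
print(json.dumps(facility_log_entry("notice_0001.pdf", "Maple Public School", "SCPH", reference)))
```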
- Updated `validate_phix.py` to remove fuzzy matching and implement strict exact matching of facility names against the PHIX reference list.
- Introduced a PHU alias mapping, configured in a YAML file, to restrict validation to specific Public Health Units (PHUs).
- Enhanced the `validate_facilities` function to support PHU scoping and improved error handling for unmatched facilities.
- Updated tests to reflect the new matching strategy and added tests for PHU alias mapping and validation behavior.
- Updated documentation to clarify the new validation process and configuration options.
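The exact-match, PHU-scoped behaviour described in this update can be sketched as follows. This is not the PR's `validate_facilities`. Its real signature isn't shown here, so the function name `check_facilities`, its parameters, and the result type are hypothetical. The reference is modelled as (facility name, PHU acronym) pairs.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class ValidationResult:
    matched: list[str] = field(default_factory=list)
    unmatched: list[str] = field(default_factory=list)


def check_facilities(names: Iterable[str], reference: Iterable[tuple[str, str]],
                     phu: Optional[str] = None) -> ValidationResult:
    """Exact-match facility names against the reference, optionally restricted to one PHU."""
    allowed = {facility for facility, owner in reference if phu is None or owner == phu}
    result = ValidationResult()
    for name in names:
        # Plain string equality: no fuzzy scoring, so near-duplicates are rejected.
        (result.matched if name in allowed else result.unmatched).append(name)
    return result
```

With PHU scoping, a facility that exists in the reference list but belongs to a different health unit is still reported as unmatched, which is the cross-PHU mistake the review was concerned about.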