feat(analyzer): Add Philippine PhilHealth recognizer #2047
Open
jacobArquiza wants to merge 4 commits into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new predefined recognizer for the Philippine PhilHealth Identification Number (PH_HEALTH_INSURANCE), with regex patterns, context words, and a modulus-11 checksum validator. The recognizer is registered (disabled by default) and documented.
Changes:
- New `PhPhilhealthRecognizer` class with formatted/unformatted regex patterns and a checksum-based `invalidate_result`.
- Registration via `country_specific/philippines/__init__.py`, the country-specific package, the top-level `predefined_recognizers` package, and `default_recognizers.yaml` (disabled).
- Docs (`supported_entities.md`) and `CHANGELOG.md` updated; new pytest module with parametrized recognition, checksum, and performance tests.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/philippines/ph_philhealth_recognizer.py` | New recognizer implementation with patterns, context, and mod-11 check. |
| `presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/philippines/__init__.py` | Exposes the new recognizer from the Philippines subpackage. |
| `presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/__init__.py` | Adds the recognizer to the country-specific package exports. |
| `presidio-analyzer/presidio_analyzer/predefined_recognizers/__init__.py` | Exports `PhPhilhealthRecognizer` from the top-level recognizers package. |
| `presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml` | Registers the recognizer (English, country `ph`) as disabled by default. |
| `docs/supported_entities.md` | Documents the new `PH_HEALTH_INSURANCE` entity. |
| `CHANGELOG.md` | Notes the addition under Analyzer/Added. |
| `presidio-analyzer/tests/test_ph_philhealth_recognizer.py` | Adds parametrized recognition, checksum, and performance tests. |
Comment on lines +103 to +106

```python
weights = range(2, 13)
total = sum(int(digit) * weight for digit, weight in zip(digits[:11], weights))
check_digit = (11 - (total % 11)) % 10
return check_digit == int(digits[-1])
```
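The module docstring (lines +24 to +27) describes the check digit as `11 - (total mod 11)`, with 11 mapped to 0 and 10 rejected. The expression `(11 - (total % 11)) % 10` in this snippet does not implement exactly that: a remainder of 0 yields check digit 1 instead of 0, and a remainder of 1 yields 0 instead of being rejected. A minimal sketch of the documented convention, assuming the same left-to-right weights 2 through 12 over the first 11 digits; `philhealth_checksum_ok` is a hypothetical helper name, not part of the PR:

```python
def philhealth_checksum_ok(pin: str) -> bool:
    """Return True if a 12-digit PIN passes the documented mod-11 rule.

    Hypothetical helper (not the PR's code): weights 2..12 over the first
    11 digits, check digit = 11 - (total mod 11), 11 -> 0, 10 -> reject.
    """
    # Accept the dashed form (e.g. "12-000015726-6") by dropping separators.
    digits = pin.replace("-", "")
    # Require exactly 12 ASCII digits; isascii() excludes characters such as
    # superscripts that str.isdigit() accepts but int() cannot parse.
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        return False

    weights = range(2, 13)
    total = sum(int(d) * w for d, w in zip(digits[:11], weights))
    check = 11 - (total % 11)
    if check == 11:
        check = 0  # remainder 0 maps to check digit 0
    elif check == 10:
        return False  # remainder 1 would need "10", not a single digit
    return check == int(digits[-1])
```

The example PIN used in the tests still validates under this rule; the two formulas only disagree when the weighted total leaves a remainder of 0 or 1, so a parametrized case for each of those remainders would pin down which convention is intended.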
Comment on lines +81 to +88

```python
def test_performance(recognizer, entities):
    text = "PhilHealth number 12-000015726-6 was verified. " * 4
    start = time.time()

    recognizer.analyze(text, entities)

    elapsed = (time.time() - start) * 1000
    assert elapsed < 100, f"Too slow: {elapsed:.1f}ms"
```
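A single wall-clock measurement against a fixed 100 ms threshold can be flaky on shared CI runners. One option, sketched here under the assumption that a best-of-N measurement is acceptable for this test, is `timeit.repeat`, which accepts a callable and uses `time.perf_counter` (a monotonic, high-resolution clock) by default. `best_elapsed_ms` is a hypothetical helper; in the real test the callable would wrap `recognizer.analyze(text, entities)`.

```python
import timeit
from typing import Callable


def best_elapsed_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest of `repeats` single calls to `fn`, in milliseconds.

    Hypothetical helper: the minimum is less sensitive to scheduler noise
    than one measurement, since noise can only add time, never remove it.
    """
    # number=1 runs fn once per sample; repeat collects `repeats` samples.
    samples = timeit.repeat(fn, number=1, repeat=repeats)
    return min(samples) * 1000


# Usage inside the existing test (recognizer/entities come from fixtures):
# elapsed = best_elapsed_ms(lambda: recognizer.analyze(text, entities))
# assert elapsed < 100, f"Too slow: {elapsed:.1f}ms"
```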
Comment on lines +46 to +59

```python
CONTEXT = [
    "philhealth",
    "philhealth number",
    "philhealth no",
    "philhealth no.",
    "philhealth id",
    "philhealth identification number",
    "pin",
    "health insurance",
    "member data record",
    "mdr",
    "kalusugan",
    "seguro",
]
```
Comment on lines +24 to +27

```text
- Check digit convention: compute 11 - (weighted total mod 11), map 11 to
  0, and reject 10 because it cannot be represented as a single decimal
  digit. This follows the standard single-digit modulus 11 handling also
  used by ISO/IEC 7064 Mod 11,10 identifiers.
- PhilHealth Electronic Claims Implementation Guide v3.1 states that
  pPIN is a 12-digit PhilHealth Identification Number and that the last
  character is a modulus 11 check digit. The guide is published under
  philhealth.gov.ph/downloads/eclaims/.
```
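The 12-digit format described in the guide corresponds to the two input forms listed in the PR description, `12-000015726-6` (2-9-1 dashed) and `120000157266` (bare). The PR's actual regex patterns are not shown in this review, so the following is a hypothetical sketch of one way to match both forms without matching inside longer digit runs, normalizing each hit to 12 digits before the checksum step:

```python
import re
from typing import List

# Hypothetical patterns (not the PR's): dashed 2-9-1 form and bare 12 digits.
# (?<!\d) and (?!\d) stop a match from starting or ending inside a longer
# run of digits, e.g. a 13-digit number.
FORMATTED = re.compile(r"(?<!\d)\d{2}-\d{9}-\d(?!\d)")
UNFORMATTED = re.compile(r"(?<!\d)\d{12}(?!\d)")


def find_pins(text: str) -> List[str]:
    """Return candidate PINs normalized to 12 digits, in order of appearance."""
    hits = []
    for pattern in (FORMATTED, UNFORMATTED):
        hits.extend((m.start(), m.group()) for m in pattern.finditer(text))
    # Sort by start offset, then strip dashes so both forms compare equal.
    return [raw.replace("-", "") for _, raw in sorted(hits)]
```

Because the dashes break the bare 12-digit pattern, the two patterns never overlap on the same span, so no de-duplication step is needed in this sketch.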
Change Description
Adds a new predefined recognizer for Philippine PhilHealth Identification Numbers (`PH_HEALTH_INSURANCE`).

Supported formats:
- `12-000015726-6`
- `120000157266`

The recognizer includes context terms, checksum validation, public supported-entities documentation, and default registry wiring, and is disabled by default with `country_code: ph`.

Issue reference
Part of #2015
Checklist
Notes
Official PhilHealth documentation confirms the 12-digit PIN format and the modulus 11 check digit. The exact weighting scheme is not published in the official guide I located, so this implementation uses the common weighted modulus 11 convention and validates it against a publicly available valid PhilHealth PIN.
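As a worked check of the weighted modulus 11 convention against the example PIN `12-000015726-6`, using weights 2 through 12 on the first 11 digits as in the implementation snippet:

```math
\begin{aligned}
S &= 1\cdot 2 + 2\cdot 3 + 0\cdot 4 + 0\cdot 5 + 0\cdot 6 + 0\cdot 7 + 1\cdot 8 + 5\cdot 9 + 7\cdot 10 + 2\cdot 11 + 6\cdot 12 \\
  &= 2 + 6 + 8 + 45 + 70 + 22 + 72 = 225, \\
S \bmod 11 &= 225 - 11\cdot 20 = 5, \\
c &= 11 - 5 = 6.
\end{aligned}
```

The computed check digit matches the final digit 6. Since this example leaves a remainder of 5, it validates the weights but does not exercise the remainder 0 and remainder 1 cases, where the choice between mapping and rejecting matters.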