
feat(analyzer): Add Philippine PhilHealth recognizer #2047

Open

jacobArquiza wants to merge 4 commits into data-privacy-stack:main from jacobArquiza:feature/ph-philhealth-recognizer

@jacobArquiza


Change Description

Adds a new predefined recognizer for Philippine PhilHealth Identification Numbers (PH_HEALTH_INSURANCE).

Supported formats:

  • 12-000015726-6
  • 120000157266
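A minimal sketch of how the two supported formats could be matched and normalized to bare digits before checksum validation. The regexes and the helper `find_philhealth_candidates` are hypothetical illustrations, not the patterns used in this PR.

```python
import re

# Hypothetical patterns for the two formats above (not the PR's actual regexes).
# Formatted: 2 digits, hyphen, 9 digits, hyphen, 1 check digit (e.g. 12-000015726-6).
FORMATTED = re.compile(r"\b\d{2}-\d{9}-\d\b")
# Unformatted: exactly 12 consecutive digits (e.g. 120000157266).
UNFORMATTED = re.compile(r"\b\d{12}\b")


def find_philhealth_candidates(text: str) -> list[str]:
    """Return candidate PINs found in text, normalized to 12 bare digits."""
    candidates = []
    for pattern in (FORMATTED, UNFORMATTED):
        for match in pattern.finditer(text):
            # Strip hyphens so both formats can go through the same checksum.
            candidates.append(match.group().replace("-", ""))
    return candidates


print(find_philhealth_candidates("PIN 12-000015726-6 or 120000157266"))
# → ['120000157266', '120000157266']
```

The word boundaries keep a 13-digit run from yielding a 12-digit match, so only exact-length numbers reach validation.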

The PR adds context terms, checksum validation, documentation in the public supported-entities list, and default registry wiring. The recognizer is registered with country_code: ph and is disabled by default.

Issue reference

Part of #2015

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

Notes

Official PhilHealth documentation confirms the 12-digit PIN format and the modulus 11 check digit. The exact weighting algorithm is not published in the official guide I found, so this implementation uses a common weighted modulus 11 convention and validates it against a publicly available valid PhilHealth PIN example.
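The note does not state the weights. Assuming weights 2 through 12 applied left to right to the first 11 digits (the weighting used by the checksum snippet on lines +103 to +106), the check for the example 12-000015726-6 works out as follows, with $d_i$ the $i$-th digit and $c$ the computed check digit:

$$
\begin{aligned}
S &= \sum_{i=1}^{11} (i+1)\,d_i = 1\cdot 2 + 2\cdot 3 + 0\cdot 4 + 0\cdot 5 + 0\cdot 6 + 0\cdot 7 + 1\cdot 8 + 5\cdot 9 + 7\cdot 10 + 2\cdot 11 + 6\cdot 12 = 225,\\
S \bmod 11 &= 5,\\
c &= 11 - 5 = 6 = d_{12}.
\end{aligned}
$$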

Copilot AI review requested due to automatic review settings May 30, 2026 08:27

Copilot AI (Contributor) left a comment

Pull request overview

Note: Copilot was unable to run its full agentic suite in this review.

Adds a new predefined recognizer for the Philippine PhilHealth Identification Number (PH_HEALTH_INSURANCE), with regex patterns, context words, and a modulus-11 checksum validator. The recognizer is registered (disabled by default) and documented.

Changes:

  • New PhPhilhealthRecognizer class with formatted/unformatted regex patterns and a checksum-based invalidate_result.
  • Registration via country_specific/philippines/__init__.py, the country-specific package, the top-level predefined_recognizers package, and default_recognizers.yaml (disabled).
  • Docs (supported_entities.md) and CHANGELOG.md updated; new pytest module with parametrized recognition, checksum, and performance tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Files changed:

  • presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/philippines/ph_philhealth_recognizer.py: new recognizer implementation with patterns, context, and mod-11 check.
  • presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/philippines/__init__.py: exposes the new recognizer from the Philippines subpackage.
  • presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/__init__.py: adds the recognizer to the country-specific package exports.
  • presidio-analyzer/presidio_analyzer/predefined_recognizers/__init__.py: exports PhPhilhealthRecognizer from the top-level recognizers package.
  • presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml: registers the recognizer (English, country ph) as disabled by default.
  • docs/supported_entities.md: documents the new PH_HEALTH_INSURANCE entity.
  • CHANGELOG.md: notes the addition under Analyzer/Added.
  • presidio-analyzer/tests/test_ph_philhealth_recognizer.py: adds parametrized recognition, checksum, and performance tests.

Comment on lines +103 to +106
weights = range(2, 13)
total = sum(int(digit) * weight for digit, weight in zip(digits[:11], weights))
check_digit = (11 - (total % 11)) % 10
return check_digit == int(digits[-1])
Comment on lines +81 to +88
def test_performance(recognizer, entities):
    text = "PhilHealth number 12-000015726-6 was verified. " * 4
    start = time.time()

    recognizer.analyze(text, entities)

    elapsed = (time.time() - start) * 1000
    assert elapsed < 100, f"Too slow: {elapsed:.1f}ms"
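Wall-clock timing assertions like the one above are sensitive to the clock used and to machine load. A minimal standard-library sketch of an alternative measurement: it uses `time.perf_counter()`, which Python documents as the clock for measuring short durations, instead of `time.time()`, which follows the adjustable system clock, and keeps the best of several runs. The helper `elapsed_ms` and the string-counting stand-in for `recognizer.analyze(text, entities)` are hypothetical.

```python
import time
from typing import Callable


def elapsed_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the best-of-`repeats` duration of fn() in milliseconds.

    time.perf_counter() is a high-resolution clock intended for short
    intervals; taking the minimum over several runs reduces noise from a
    busy CI machine.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


# Hypothetical stand-in workload for recognizer.analyze(text, entities).
text = "PhilHealth number 12-000015726-6 was verified. " * 4
duration = elapsed_ms(lambda: text.count("PhilHealth"))
assert duration < 100, f"Too slow: {duration:.1f}ms"
```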
Comment on lines +46 to +59
CONTEXT = [
    "philhealth",
    "philhealth number",
    "philhealth no",
    "philhealth no.",
    "philhealth id",
    "philhealth identification number",
    "pin",
    "health insurance",
    "member data record",
    "mdr",
    "kalusugan",
    "seguro",
]
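In Presidio, context words raise the confidence of an existing pattern match rather than producing matches on their own. The simplified sketch below illustrates the idea by looking for a context term among the few words preceding a match. The window size, the tokenization, and the helper `has_context` are hypothetical and do not reproduce Presidio's context enhancer.

```python
import re

# Hypothetical subset of the CONTEXT list above.
CONTEXT = ["philhealth", "philhealth no", "pin", "health insurance", "kalusugan"]


def has_context(text: str, match_start: int, window: int = 5) -> bool:
    """Return True if a context term occurs in the `window` words before match_start.

    Simplified illustration: lowercases, splits on non-word characters, and
    requires whole-word (or whole-phrase) containment, so "pin" does not
    match "spin".
    """
    preceding = re.findall(r"\w+", text[:match_start].lower())[-window:]
    padded = " " + " ".join(preceding) + " "
    return any(f" {term} " in padded for term in CONTEXT)


print(has_context("Her PhilHealth no. is 12-000015726-6", len("Her PhilHealth no. is ")))  # True
print(has_context("Order 12-000015726-6 shipped", len("Order ")))  # False
```

A short, generic term such as "pin" would also fire in unrelated text like "ATM pin", so broad entries boost scores more often than specific ones like "philhealth no".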
Copilot AI review requested due to automatic review settings June 7, 2026 08:59

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment on lines +24 to +27
- Check digit convention: compute 11 - (weighted total mod 11), map 11 to
0, and reject 10 because it cannot be represented as a single decimal
digit. This follows the standard single-digit modulus 11 handling also
used by ISO/IEC 7064 Mod 11,10 identifiers.
- PhilHealth Electronic Claims Implementation Guide v3.1 states that
pPIN is a 12-digit PhilHealth Identification Number and that the last
character is a modulus 11 check digit. The guide is published under
philhealth.gov.ph/downloads/eclaims/.
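The convention in this docstring can be sketched as a standalone function. Assumptions: weights 2 through 12 applied left to right to the first 11 digits (as in the snippet on lines +103 to +106), and the name `is_valid_philhealth_pin` is hypothetical. Note that the earlier expression `(11 - (total % 11)) % 10` does not implement this rule: a remainder of 0 gives 11 % 10 = 1 instead of 0, and a remainder of 1 gives 10 % 10 = 0 instead of a rejection. Also, ISO/IEC 7064 MOD 11,10 is a hybrid (iterative) system rather than a weighted sum, so it is at most a loose analogy for this scheme.

```python
def is_valid_philhealth_pin(pin: str) -> bool:
    """Check a PhilHealth PIN against the weighted modulus 11 rule described above.

    Assumed rule (mirrors the docstring, not an official PhilHealth spec):
    weights 2..12 on the first 11 digits, check = 11 - (total mod 11),
    11 maps to 0, and 10 is rejected because it is not a single digit.
    """
    digits = pin.replace("-", "")
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(digits[:11], range(2, 13)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    elif check == 10:
        return False  # No single-digit check digit exists for this prefix.
    return check == int(digits[-1])


print(is_valid_philhealth_pin("12-000015726-6"))  # True
print(is_valid_philhealth_pin("12-000015726-5"))  # False
```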