
feat(zip): add offline US ZIP enrichment to form-fill pipeline #490

Open
CrepuscularIRIS wants to merge 1 commit into fireform-core:main from CrepuscularIRIS:feat/offline-zip-resolver

Conversation

@CrepuscularIRIS

What

Adds a ZipResolver class that infers missing US ZIP codes from already-extracted form fields (city/state, embedded address text, or county strings) using an offline SQLite database — no network calls at fill time.

Why

Fixes #435

Cal Fire responders often remember a location (town, district, address) but not the ZIP code during field reporting. FireForm can now auto-fill missing ZIP fields from context already present in the transcript, reducing manual correction and speeding up report completion in low-connectivity environments.

How

A new src/zip_resolver.py module provides ZipResolver.enrich(extracted_data: dict) -> dict. It is called in src/filler.py immediately after the LLM extracts field values and before they are written positionally to the PDF.

Three inference strategies run in priority order:

  1. Extract an embedded 5-digit ZIP from any address field text ("123 Main St, Sacramento, CA 95814" → 95814)
  2. City + state lookup via uszipcode (handles both abbreviations "CA" and full names "California")
  3. Parse a "City, ST" pattern from address or county fields ("Pine Valley, CA" → 91962)
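The priority order above can be sketched roughly as follows. This is an illustration only, not the code in src/zip_resolver.py: the field names (`address`, `city`, `state`, `county`), the regular expressions, the small state-name table, and the `toy_lookup` stand-in for the uszipcode query are all assumptions made for the example.

```python
import re
from typing import Callable, Optional

# Hypothetical field names; the real form schema may differ.
PATTERN_KEYS = ("address", "county")

ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")
CITY_ST_RE = re.compile(r"([A-Za-z][A-Za-z .'-]*),\s*([A-Z]{2})\b")

# Tiny illustrative subset; a full table would cover every state.
STATE_ABBR = {"california": "CA", "nevada": "NV", "oregon": "OR"}

Lookup = Callable[[str, str], Optional[str]]


def normalize_state(state: str) -> str:
    """Map a full state name to its abbreviation, else upper-case the input."""
    cleaned = state.strip()
    return STATE_ABBR.get(cleaned.lower(), cleaned.upper())


def infer_zip(fields: dict, lookup: Lookup) -> Optional[str]:
    """Try the three strategies in order; return None if all of them fail."""
    address = str(fields.get("address") or "")

    # Strategy 1: embedded ZIP. Taking the last 5-digit group avoids picking
    # a 5-digit street number when a ZIP follows it; a 5-digit street number
    # with no ZIP after it would still match, so this is only a heuristic.
    found = ZIP_RE.findall(address)
    if found:
        return found[-1]

    # Strategy 2: explicit city and state fields.
    city = str(fields.get("city") or "").strip()
    state = str(fields.get("state") or "").strip()
    if city and state:
        hit = lookup(city, normalize_state(state))
        if hit:
            return hit

    # Strategy 3: "City, ST" pattern inside address or county text.
    for key in PATTERN_KEYS:
        match = CITY_ST_RE.search(str(fields.get(key) or ""))
        if match:
            hit = lookup(match.group(1).strip(), match.group(2))
            if hit:
                return hit
    return None


# Stand-in for the offline database query, using the two examples above.
TOY_TABLE = {("sacramento", "CA"): "95814", ("pine valley", "CA"): "91962"}


def toy_lookup(city: str, state: str) -> Optional[str]:
    return TOY_TABLE.get((city.lower(), state))
```

Passing the lookup in as a callable keeps the strategy logic independent of the database, which is also what makes it easy to exercise without uszipcode installed.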

Key design decisions:

  • Non-breaking: existing non-empty ZIPs are never overwritten
  • Conservative fallback: leaves field unchanged if ZIP cannot be inferred — a missing ZIP on an official report is better than a wrong one
  • Graceful degradation: if uszipcode is not installed, enrichment is silently skipped via a try/except ImportError
  • Immutable: returns a copy of the input dict, never mutates LLM state
  • uszipcode was chosen over pgeocode for this use case: it is US-specific, bundles its data as a local SQLite DB (no GeoNames download step), and has a purpose-built city/state search API
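A minimal sketch of how the first four guarantees could fit together is shown below. It is not the PR's implementation: the `zip` field name, the `load_backend` helper, the use of a deep copy, and the `infer` callable are assumptions chosen to keep the example self-contained.

```python
import copy
import importlib
from typing import Callable, Optional

ZIP_KEY = "zip"  # hypothetical field name


def load_backend(module_name: str = "uszipcode"):
    """Return the optional backend module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None  # graceful degradation: the caller skips enrichment


def enrich(
    extracted_data: dict,
    infer: Optional[Callable[[dict], Optional[str]]],
) -> dict:
    """Return a copy of the data with the ZIP filled in when it can be inferred."""
    result = copy.deepcopy(extracted_data)  # never mutate the caller's dict

    if infer is None:
        return result  # no backend available: enrichment is skipped

    if str(result.get(ZIP_KEY) or "").strip():
        return result  # non-breaking: an existing ZIP is never overwritten

    guess = infer(result)
    if guess:
        result[ZIP_KEY] = guess
    # Conservative fallback: if nothing was inferred the field stays as it was.
    return result
```

For example, `enrich({"city": "Pine Valley", "zip": ""}, lambda d: "91962")` returns a new dict with the ZIP filled in and leaves the input dict untouched, while `enrich({"zip": "95814"}, ...)` returns the ZIP unchanged regardless of what the inference callable would have produced.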

Scope

  • Included: ZipResolver class, integration into filler.py, 20 unit tests, uszipcode dependency
  • NOT included: non-US postal codes, integration tests requiring a live Ollama instance, UI changes

Verification

  • PYTHONPATH=src python -m pytest src/test/test_zip_resolver.py -v → 20 passed, 0 failed
  • Tests placed in src/test/ to match the CI workflow (tests.yml runs src/test/)
  • No changes to existing test files; no regressions
  • src/filler.py change is 2 lines (import + one enrichment call); all existing behaviour preserved

Test configuration:

  • Python 3.11.15
  • OS: Linux (Ubuntu)
  • uszipcode mocked in all tests — zero network dependency in the test suite
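For readers unfamiliar with the mocking approach, a test in this style might look like the sketch below. The `lookup_zip` helper and the test names are hypothetical and are not taken from src/test/test_zip_resolver.py. The mocked method mirrors uszipcode's `SearchEngine.by_city_and_state`, which returns a list of result objects carrying a `zipcode` attribute.

```python
from types import SimpleNamespace
from typing import Optional
from unittest.mock import MagicMock


def lookup_zip(engine, city: str, state: str) -> Optional[str]:
    """Hypothetical helper: first ZIP the engine returns for city/state, else None."""
    results = engine.by_city_and_state(city, state)
    return results[0].zipcode if results else None


def test_city_state_hit():
    # The engine is a mock, so no database file or network access is needed.
    engine = MagicMock()
    engine.by_city_and_state.return_value = [SimpleNamespace(zipcode="91962")]

    assert lookup_zip(engine, "Pine Valley", "CA") == "91962"
    engine.by_city_and_state.assert_called_once_with("Pine Valley", "CA")


def test_city_state_miss():
    # An empty result list should leave the ZIP undetermined.
    engine = MagicMock()
    engine.by_city_and_state.return_value = []

    assert lookup_zip(engine, "Nowhere", "CA") is None
```

Because the engine is injected rather than constructed inside the helper, the same tests run identically whether or not uszipcode is installed.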

AI Disclosure

AI-assisted implementation, manually reviewed, tested, and verified line by line.

Adds ZipResolver class that infers missing ZIP codes from extracted
form data (city/state, address text, or county strings) using the
uszipcode offline SQLite database — no network calls at fill time.

Integrates in filler.py after LLM extraction, before positional fill.
Existing valid ZIPs are never overwritten; leaves field unchanged when
ZIP cannot be inferred to avoid wrong ZIP on official reports.

Three inference strategies in priority order:
1. Extract embedded 5-digit ZIP from address field text
2. City + state lookup (handles full state names and abbreviations)
3. Parse City, ST pattern from address/county fields

Adds 20 unit tests in src/test/test_zip_resolver.py (all mocked,
no network dependency). Tests run under PYTHONPATH=src matching CI.

Closes fireform-core#435

Development

Successfully merging this pull request may close these issues.

Offline ZIP/postal derivation
