feat(zip): add offline US ZIP enrichment to form-fill pipeline #490
Open
CrepuscularIRIS wants to merge 1 commit into fireform-core:main from
Adds ZipResolver class that infers missing ZIP codes from extracted form data (city/state, address text, or county strings) using the uszipcode offline SQLite database, with no network calls at fill time. Integrates in filler.py after LLM extraction, before positional fill. Existing valid ZIPs are never overwritten; leaves field unchanged when ZIP cannot be inferred to avoid wrong ZIP on official reports.
Three inference strategies in priority order:
1. Extract embedded 5-digit ZIP from address field text
2. City + state lookup (handles full state names and abbreviations)
3. Parse City, ST pattern from address/county fields
Adds 20 unit tests in src/test/test_zip_resolver.py (all mocked, no network dependency). Tests run under PYTHONPATH=src matching CI.
Closes fireform-core#435
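For readers who want a concrete picture of the priority order, the three strategies could be sketched roughly as below. This is not the code from the PR: the field names (`address`, `city`, `state`, `county`), the function `infer_zip`, and the injected `lookup` callable are all hypothetical, and `demo_lookup` is a tiny stand-in for the offline database that holds only the two examples quoted in this PR. The sketch also assumes a stricter rule than "embedded" for strategy 1, trusting only a ZIP at the end of the address.

```python
import re
from typing import Callable, Optional

# Hypothetical lookup signature: (city, state) -> 5-digit ZIP or None.
CityStateLookup = Callable[[str, str], Optional[str]]

# Assumption: only a ZIP at the very end of the address is trusted, so a
# 5-digit house number at the start of a longer address is not mistaken for one.
TRAILING_ZIP = re.compile(r"\b(\d{5})(?:-\d{4})?\s*$")
# "City, ST" where ST is a two-letter uppercase state abbreviation.
CITY_ST = re.compile(r"([A-Za-z][A-Za-z .'-]*?),\s*([A-Z]{2})\b")


def infer_zip(fields: dict, lookup: CityStateLookup) -> Optional[str]:
    """Try the three strategies in priority order; return None if all fail."""
    # Strategy 1: ZIP already present in the address text.
    match = TRAILING_ZIP.search(fields.get("address") or "")
    if match:
        return match.group(1)

    # Strategy 2: explicit city and state fields.
    city, state = fields.get("city"), fields.get("state")
    if city and state:
        found = lookup(city.strip(), state.strip())
        if found:
            return found

    # Strategy 3: a "City, ST" pattern inside the address or county text.
    for key in ("address", "county"):
        match = CITY_ST.search(fields.get(key) or "")
        if match:
            found = lookup(match.group(1).strip(), match.group(2))
            if found:
                return found
    return None


def demo_lookup(city: str, state: str) -> Optional[str]:
    """Stand-in for the offline database (two entries taken from this PR)."""
    table = {("sacramento", "CA"): "95814", ("pine valley", "CA"): "91962"}
    return table.get((city.lower(), state.upper()))
```

Passing the lookup in as a callable keeps the ordering logic testable without any database, which is in the same spirit as the mocked test suite described in this PR.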
What
Adds a `ZipResolver` class that infers missing US ZIP codes from already-extracted form fields (city/state, embedded address text, or county strings) using an offline SQLite database, with no network calls at fill time.
Why
Fixes #435
Cal Fire responders often remember a location (town, district, address) but not the ZIP code during field reporting. FireForm can now auto-fill missing ZIP fields from context already present in the transcript, reducing manual correction and speeding up report completion in low-connectivity environments.
How
A new `src/zip_resolver.py` module provides `ZipResolver.enrich(extracted_data: dict) -> dict`. It is called in `src/filler.py` immediately after the LLM extracts field values and before they are written positionally to the PDF.
Three inference strategies run in priority order:
1. Extract an embedded 5-digit ZIP from address field text ("123 Main St, Sacramento, CA 95814" → 95814)
2. City + state lookup via `uszipcode` (handles both abbreviations "CA" and full names "California")
3. Parse a "City, ST" pattern from address or county fields ("Pine Valley, CA" → 91962)
Key design decisions:
- If `uszipcode` is not installed, enrichment is silently skipped via a `try/except ImportError`
- `uszipcode` was chosen over `pgeocode` for this use case: it is US-specific, bundles its data as a local SQLite DB (no GeoNames download step), and has a purpose-built city/state search API
Scope
- `ZipResolver` class, integration into `filler.py`, 20 unit tests, `uszipcode` dependency
Verification
- `PYTHONPATH=src python -m pytest src/test/test_zip_resolver.py -v` → 20 passed, 0 failed
- Tests are placed in `src/test/` to match the CI workflow (`tests.yml` runs `src/test/`)
- `src/filler.py` change is 2 lines (import + one enrichment call); all existing behaviour preserved
Test configuration:
- `uszipcode` mocked in all tests; zero network dependency in the test suite
AI Disclosure
AI-assisted implementation, manually reviewed, tested, and verified line by line.
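As a reviewer aid, two of the behaviours described in this PR (skipping enrichment when `uszipcode` is missing, and never overwriting a valid ZIP) might look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: the free function `enrich`, its `infer`, `zip_key`, and `backend_available` parameters, and the `"zip"` field name are hypothetical, whereas the PR exposes `ZipResolver.enrich(extracted_data: dict) -> dict`.

```python
import re
from typing import Callable, Optional

# Optional dependency: if the import fails, enrichment becomes a no-op.
try:
    import uszipcode  # noqa: F401
    ZIP_BACKEND_AVAILABLE = True
except ImportError:
    ZIP_BACKEND_AVAILABLE = False

# A value counts as a valid ZIP if it is 5 digits, optionally ZIP+4.
VALID_ZIP = re.compile(r"^\d{5}(?:-\d{4})?$")


def enrich(
    extracted_data: dict,
    infer: Callable[[dict], Optional[str]],  # hypothetical inference hook
    zip_key: str = "zip",                    # hypothetical field name
    backend_available: bool = ZIP_BACKEND_AVAILABLE,
) -> dict:
    """Return the data with a ZIP filled in only when it is safe to do so."""
    if not backend_available:
        return extracted_data  # dependency missing: skip silently

    current = str(extracted_data.get(zip_key) or "").strip()
    if VALID_ZIP.match(current):
        return extracted_data  # never overwrite an existing valid ZIP

    guess = infer(extracted_data)
    if guess is None:
        return extracted_data  # nothing inferred: leave the field unchanged

    # Copy rather than mutate, so the caller's dict stays untouched.
    return {**extracted_data, zip_key: guess}
```

Every early return hands back the input unchanged, which matches the stated goal of preferring an empty ZIP to a wrong one on an official report.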