Fix user profile address view and improve historical order logic #1

Open
jacobmpeters wants to merge 5 commits into initial_development from feature/update_uphx_handling
@jacobmpeters
Contributor

This pull request updates the user profile address processing pipeline, focusing on how address history is ordered and filled ahead of our second delivery to NORC. It corrects the ordering of historical address entries, implements a carry-forward mechanism for NULL address fields, and documents the downstream impact. It also adds a new dry_run.py script that validates the generated SQL before it is executed in BigQuery.

User Profile Address History Logic:

  • Corrected the historical_order field in user_profile_address_view.sql to number history entries newest-first (ROW_NUMBER() ... ORDER BY element_position DESC), ensuring historical_order = 1 is always the most recent history snapshot. Previously, entries were numbered oldest-first.
  • Implemented a carry-forward mechanism using LAST_VALUE(... IGNORE NULLS) OVER w for address fields (address_line_1, address_line_2, city, state, zip_code) to fill true NULLs from the nearest newer non-null entry per participant and address type. Empty strings are preserved and not filled.
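
For reviewers who want to reason about the carry-forward without opening BigQuery, the behavior described above can be approximated in plain Python. This is a minimal sketch, not the view's SQL: the function name `carry_forward`, the sample rows, and the assumption that the window walks each (Connect_ID, address_nickname) partition from newest to oldest up to the current row are all illustrative.

```python
ADDRESS_FIELDS = ("address_line_1", "address_line_2", "city", "state", "zip_code")


def carry_forward(rows):
    """Fill None fields from the nearest newer non-None entry.

    Hypothetical helper: emulates LAST_VALUE(... IGNORE NULLS) over a window
    partitioned by (Connect_ID, address_nickname). The frame (newest entry up
    to the current row) is an assumption, not taken from the view itself.
    """
    partitions = {}
    for row in rows:
        key = (row["Connect_ID"], row["address_nickname"])
        partitions.setdefault(key, []).append(row)

    filled = []
    for group in partitions.values():
        last_seen = {field: None for field in ADDRESS_FIELDS}
        # historical_order = 1 is the newest entry, so ascending order
        # walks the partition from newest to oldest.
        for row in sorted(group, key=lambda r: r["historical_order"]):
            new_row = dict(row)
            for field in ADDRESS_FIELDS:
                value = row.get(field)
                if value is not None:
                    # An empty string is a real value: kept as-is, never filled.
                    last_seen[field] = value
                else:
                    new_row[field] = last_seen[field]
            filled.append(new_row)
    return filled


def _row(order, line_1, line_2, city, state, zip_code):
    # Illustrative sample data only.
    return {
        "Connect_ID": "P1", "address_nickname": "physical",
        "historical_order": order, "address_line_1": line_1,
        "address_line_2": line_2, "city": city, "state": state,
        "zip_code": zip_code,
    }


sample = [
    _row(1, "1 Main St", "Apt 2", "Chicago", "IL", "60601"),
    _row(2, None, "", None, "IL", None),
    _row(3, "9 Oak Ave", "Unit 5", None, None, ""),
]
result = {r["historical_order"]: r for r in carry_forward(sample)}
```

In this sample, entry 2 picks up its street, city, and ZIP from entry 1 while its empty address_line_2 stays empty, and entry 3 keeps its empty zip_code but inherits city and state from the newer entries.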

Documentation and Impact Analysis:

  • Added a detailed internal note (delivery_impact_of_change_to_up_hx.md) explaining the behavioral changes, their effect on address_hash computation, delivery state, and guidelines for merging returned geocoded data.

Tooling and Validation:

  • Introduced core/dry_run.py, a script that renders the combined address view SQL, saves it for inspection, and performs a BigQuery dry run to validate the SQL without executing it. This helps catch errors before deployment.
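
The render, save, and dry-run flow described above might look roughly like the following. This is a sketch under assumptions rather than the contents of core/dry_run.py: the function names are hypothetical, and the client is passed in so the logic can be exercised without credentials. The `google.cloud.bigquery` calls used (`Client`, `QueryJobConfig(dry_run=True, use_query_cache=False)`, `client.query`, and `total_bytes_processed`) are part of the official Python client.

```python
from pathlib import Path


def save_rendered_sql(sql: str, out_path: Path) -> Path:
    """Write the rendered SQL to disk so it can be inspected by hand."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(sql, encoding="utf-8")
    return out_path


def dry_run(sql: str, client, job_config) -> int:
    """Submit the SQL with the given config and return the bytes estimate.

    With a dry-run config, BigQuery validates the query and reports how
    many bytes it would process, without running it.
    """
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed


def dry_run_in_bigquery(sql: str) -> int:
    """Hypothetical entry point; needs google-cloud-bigquery and credentials."""
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    return dry_run(sql, client, config)
```

Invalid SQL surfaces as an exception from `client.query`, so a failed dry run stops the script before anything is deployed.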

Peters added 3 commits May 5, 2026 12:35
…y-forward for NULL fields

- Replace (element_position + 1) with ROW_NUMBER() OVER (PARTITION BY
  Connect_ID ORDER BY element_position DESC) in Queries 3, 4, and 6
  so that historical_order = 1 is the most recent history entry,
  consistent with the reverse_chron_order convention in the demo

- Wrap all six UNION ALL queries in an outer SELECT with a named WINDOW
  that applies LAST_VALUE(... IGNORE NULLS) to fill true NULLs in
  address_line_1, address_line_2, city, state, and zip_code using the
  most recent non-null value per (Connect_ID, address_nickname)

- Empty strings are intentionally preserved as-is (Step 5 behavior);
  NULLIF/TRIM normalization continues to happen downstream in the
  standardized_addresses CTE in address_processing.py
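
The numbering change in the first bullet of this commit message can be illustrated with a small Python comparison of the old and new schemes. It is a sketch only: `number_history` and the sample rows are hypothetical, and it assumes that a higher element_position means a newer entry, which is what ordering by element_position DESC to get newest-first implies.

```python
from collections import defaultdict


def number_history(rows):
    """Attach both numbering schemes to each history row.

    old_order        : element_position + 1 (previous behavior, oldest-first)
    historical_order : rank by element_position descending within Connect_ID,
                       mirroring ROW_NUMBER() OVER (PARTITION BY Connect_ID
                       ORDER BY element_position DESC)
    """
    by_participant = defaultdict(list)
    for row in rows:
        by_participant[row["Connect_ID"]].append(row)

    numbered = []
    for group in by_participant.values():
        newest_first = sorted(
            group, key=lambda r: r["element_position"], reverse=True
        )
        for rank, row in enumerate(newest_first, start=1):
            numbered.append({
                **row,
                "old_order": row["element_position"] + 1,
                "historical_order": rank,
            })
    return numbered


# Illustrative data: participant P1 has three history entries, P2 has one.
history = [{"Connect_ID": "P1", "element_position": p} for p in (0, 1, 2)]
history.append({"Connect_ID": "P2", "element_position": 0})

lookup = {
    (r["Connect_ID"], r["element_position"]): r
    for r in number_history(history)
}
```

For P1, the entry at element_position 2 was numbered 3 under the old scheme and is numbered 1 under the new one, so historical_order = 1 now identifies the most recent snapshot regardless of how many entries a participant has.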