From d16cbc3d80c4b8ef5d30a4151c07f9d0ae0bd692 Mon Sep 17 00:00:00 2001 From: almac2022 Date: Mon, 11 May 2026 07:58:26 -0700 Subject: [PATCH 1/9] Initialize PWF baseline for #206 Co-Authored-By: Claude Opus 4.7 --- planning/active/findings.md | 101 +++++++++++++++++++++++++++++++++++ planning/active/progress.md | 11 ++++ planning/active/task_plan.md | 47 ++++++++++++++++ 3 files changed, 159 insertions(+) create mode 100644 planning/active/findings.md create mode 100644 planning/active/progress.md create mode 100644 planning/active/task_plan.md diff --git a/planning/active/findings.md b/planning/active/findings.md new file mode 100644 index 0000000..86a3317 --- /dev/null +++ b/planning/active/findings.md @@ -0,0 +1,101 @@ +# Findings — frs_point_match: match two point datasets along FWA network within instream distance (#206) + +## Issue context + +## Problem + +fresh has primitives for snapping points to FWA streams (`frs_point_snap`) and for finding upstream/downstream features per segment (`frs_network_features`), but no primitive for matching two **point datasets** along the network within a distance threshold. + +Concrete use case driving this: link's bcfp parity layer needs to reproduce bcfp's `02_pscis_streams_150m.sql` at `smnorris/bcfishpass@v0.7.14-125-g6e9cf1c` (current tunnel state, `bcfishpass.log.model_run_id=121` rebuilt 2026-05-05) -- match PSCIS crossings to modelled crossings within 100m instream distance on the same stream, then keep the nearest PSCIS per modelled crossing. Currently link's `lnk_pipeline_crossings` is missing this layer, leaving modelled crossings duplicating PSCIS positions in the working schema; cascades into >+1000 false-positive anthropogenic barriers in BULK alone (see link's `research/bcfp_table_map.md`). + +The operation generalizes beyond PSCIS: + +- field-assessed crossings <-> user-added crossings deduplication +- observations <-> habitat confirmation points +- any "merge two point datasets on the same FWA network" workflow + +## Proposed primitive + +`frs_point_match(conn, table_a, table_b, table_to, distance_max, ...)`: + +- Both input tables must already be snapped to FWA (carry `blue_line_key`, `downstream_route_measure` -- typically via `frs_point_snap` upstream). +- Output table has columns from `table_a` plus a `` linking column populated where a match within `distance_max` exists. +- Matching is on **same `blue_line_key`** + instream distance <= `distance_max` (computed from `downstream_route_measure` deltas). +- Dedup: each `table_b` row links to at most one `table_a` row -- the closest one (`DISTINCT ON (table_b_id) ORDER BY distance ASC`). + +### Signature + +```r +frs_point_match( + conn, + table_a, # schema-qualified, points to match FROM + table_b, # schema-qualified, points to match TO + table_to, # schema-qualified destination + distance_max, # max instream distance (metres) + table_a_id_col = "id", + table_b_id_col = "id" +) +``` + +Returns `conn` invisibly. Side effect: drops + recreates `table_to` with table_a's columns plus the linking ID column. + +Network-position columns (`blue_line_key`, `downstream_route_measure`) are hard-coded to the FWA convention -- every FWA-snapped point table in the bcfp ecosystem uses these names. Per-side overrides (like `frs_network_features` got in fresh#204) can be added later if a real divergence appears. + +## Why fresh, not link + +This is a generic FWA-primitive operation. fresh already owns: + +- `frs_point_snap` -- point <-> stream snap +- `frs_network_features` -- segment <-> feature dnstr/upstr + +`frs_point_match` rounds out the point-handling family -- point <-> point along the network. Adding it in fresh makes the primitive available to link, bcfishpass-comparable tooling, and any future packages working with point-on-network data. Name uses singular `point` to match `frs_point_snap` (the closest existing analog). + +## Acceptance + +- [ ] `frs_point_match(...)` produces output byte-identical to bcfp's `02_pscis_streams_150m.sql` output (at `smnorris/bcfishpass@v0.7.14-125-g6e9cf1c`) for a test WSG (after stream-name filtering, which is bcfp-specific and stays in link's caller, not the primitive) +- [ ] Mocked tests covering the dedup logic (multiple table_a matching one table_b) +- [ ] Live test on a small WSG (ADMS or similar) validating the byte-identical claim +- [ ] Roxygen + lintr clean +- [ ] Exported from NAMESPACE + +## Out of scope + +- Stream-name matching (bcfp does this for PSCIS-specific reasons via `gnis_name` <-> `stream_name` regex matching). Caller can layer that on top -- primitive provides just the network-distance match. +- Multi-stream matching (matching across wscode_ltree subtrees) -- single `blue_line_key` is enough for the parity case. +- Bidirectional dedup variants (matching pairs both ways) -- `DISTINCT ON (table_b_id)` is enough. +- Per-side column-name overrides (`table_*_blk_col` etc.) -- add when a real divergence appears. + +## Exploration notes (from plan-mode 2026-05-11) + +### fresh codebase patterns (Explore agent report) + +- **`frs_network_features` is the template** (`R/frs_network_features.R`). v0.29.0+, has per-side wscode/localcode overrides (fresh#204 / commit `f42e86a`), uses `sprintf` SQL composition with `.frs_validate_identifier()` on every user-supplied identifier (security gate). +- **Three-tier test pattern** (`tests/testthat/test-frs_network_features.R`): + - Tier 1 — validation tests, no DB, `expect_error()` per arg path (lines 6–124) + - Tier 2 — SQL composition tests via `withr::local_mocked_bindings()` mocking `frs_db_query`, capture SQL, `expect_match` key clauses (lines 127–374) + - Tier 3 — live DB integration, `skip_if_not(.frs_db_available())` +- **No re-exports from other packages** — pure `@export` via roxygen. +- **Current version** — `DESCRIPTION` line 3: `0.29.0`. +- **Private helpers** in `R/utils.R`: `.frs_validate_identifier()`, `.frs_db_available()`, `.frs_snap_guards()`. + +### bcfp algorithm (Explore agent report, with verification) + +- bcfp source: `model/01_access/pscis/sql/02_pscis_streams_150m.sql` +- Same-stream constraint: `e.blue_line_key = m.blue_line_key` (line 158) +- Instream distance: `ABS(e.downstream_route_measure - m.downstream_route_measure) < 100` (line 159) +- Dedup: two phases — `DISTINCT ON (stream_crossing_id, blue_line_key)` ordered by `modelled_xing_dist_instream` +- 100m threshold hard-coded; 150m planar pre-filter also hard-coded (will be `distance_max` parameter in our primitive) +- **Algorithm unchanged between local v0.7.13-167-ga9d93f0 source and tunnel v0.7.14-125-g6e9cf1c** — verified via `git log --oneline v0.7.13-167-ga9d93f0..v0.7.14-125-g6e9cf1c -- 02_pscis_streams_150m.sql` (empty output) +- Stream-name scoring (lines 18–26, 101–105) is **descriptive not prescriptive** — stays in caller (link), out of scope for primitive +- Width-order scoring similar — out of scope +- 150m planar pre-filter (`distance_to_stream < 150`) — stays in caller (link's `lnk_points_snap` already snaps to streams within tolerance) + +### Design decision: write-to-table vs return-tibble + +- Issue body specifies write-to-table (`table_to`, returns `conn` invisibly) +- This diverges from existing fresh primitives (`frs_point_snap` returns sf, `frs_network_features` returns tibble) +- Right call because: + - Result is a derived dataset, not a query result + - bcfp's analog `02_pscis_streams_150m.sql` writes to a table + - Caller doesn't want to round-trip large result through R +- May want to revisit if a future use case prefers return-tibble; could add `to = NULL` semantic later (when `NULL`, return tibble; when string, write to table). diff --git a/planning/active/progress.md b/planning/active/progress.md new file mode 100644 index 0000000..8f5fd3d --- /dev/null +++ b/planning/active/progress.md @@ -0,0 +1,11 @@ +# Progress — frs_point_match: match two point datasets along FWA network within instream distance (#206) + +## Session 2026-05-11 + +- Plan-mode exploration — phases approved by user +- Parallel Explore agents covered fresh codebase patterns (`frs_network_features` template, three-tier tests, `.frs_validate_identifier()` security gate, withr::local_mocked_bindings) and bcfp's `02_pscis_streams_150m.sql` algorithm (same-stream + instream-distance match + DISTINCT ON dedup; algorithm unchanged between local v0.7.13 source and tunnel v0.7.14-125) +- Created branch `206-frs-point-match-match-two-point-datasets` off main +- Scaffolded PWF baseline from issue #206 with approved phases +- Driven from link session — work happens in `~/Projects/repo/fresh` +- Plan file: `/Users/airvine/.claude/plans/snuggly-fluttering-hopper.md` (also referenced in task_plan.md) +- Next: start Phase 1 — read `R/frs_network_features.R` + `R/utils.R` as templates, then write `R/frs_point_match.R` diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md new file mode 100644 index 0000000..7e916a2 --- /dev/null +++ b/planning/active/task_plan.md @@ -0,0 +1,47 @@ +# Task: frs_point_match: match two point datasets along FWA network within instream distance (#206) + +fresh has primitives for snapping points to FWA streams (`frs_point_snap`) and for finding upstream/downstream features per segment (`frs_network_features`), but no primitive for matching two **point datasets** along the network within a distance threshold. + +Concrete use case driving this: link's bcfp parity layer needs to reproduce bcfp's `02_pscis_streams_150m.sql` at `smnorris/bcfishpass@v0.7.14-125-g6e9cf1c` (current tunnel state, `bcfishpass.log.model_run_id=121` rebuilt 2026-05-05) — match PSCIS crossings to modelled crossings within 100m instream distance on the same stream, then keep the nearest PSCIS per modelled crossing. Currently link's `lnk_pipeline_crossings` is missing this layer, leaving modelled crossings duplicating PSCIS positions in the working schema; cascades into >+1000 false-positive anthropogenic barriers in BULK alone (see link's `research/bcfp_table_map.md`). + +## Phase 1: scaffold + R/frs_point_match.R + +- [ ] Read `R/frs_network_features.R` end-to-end as the template (the right v0.29.0+ template with per-side overrides, identifier validation pattern, sprintf composition) +- [ ] Read `R/utils.R` to find `.frs_validate_identifier()` and other private helpers +- [ ] Verify bcfp algorithm unchanged between local v0.7.13 source and tunnel v0.7.14-125 (already done — `git log` shows no commits to `02_pscis_streams_150m.sql` between those refs) +- [ ] Write `R/frs_point_match.R`: + - Signature per Approach (see `/Users/airvine/.claude/plans/snuggly-fluttering-hopper.md`) + - Input validation: identifier sanitization for `table_a`, `table_b`, `table_to`, `table_a_id_col`, `table_b_id_col`; numeric+positive check for `distance_max` + - SQL composition via sprintf template + - `frs_db_query()` or equivalent execute (no return value needed) + - Returns `invisible(conn)` + - Roxygen: `@family network`, `@export`, `@examples \dontrun{}`, NOT documented stream-name scoring (out of scope) + +## Phase 2: tests/testthat/test-frs_point_match.R + +- [ ] Tier 1 — validation tests (no DB): each arg validation path triggers `expect_error()`. Mirror the structure of `tests/testthat/test-frs_network_features.R` lines 6–124. +- [ ] Tier 2 — SQL composition tests (mocked `frs_db_query` via `withr::local_mocked_bindings`): capture the SQL string, `expect_match` for the key clauses (`DISTINCT ON`, `ABS(... - ...) < `, `blue_line_key = blue_line_key`, schema-qualified table refs). Mirror `test-frs_network_features.R` lines 127–374. +- [ ] Tier 3 — live DB integration test (`skip_if_not(.frs_db_available())`): create two small test point tables, run frs_point_match, verify expected matches and NULLs. + +## Phase 3: live byte-identical validation against bcfp + +- [ ] Run `frs_point_match` against `bcfishpass.pscis_assessment_svw` (filtered to ADMS) and `bcfishpass.modelled_stream_crossings` (filtered to ADMS), with `distance_max = 100`. +- [ ] Diff result vs `bcfishpass.pscis_streams_150m` for ADMS rows where `modelled_xing_dist_instream < 100` (the rows that would survive our 100m filter). +- [ ] Acceptance: every (stream_crossing_id, modelled_crossing_id) pair in our output matches bcfp's output. Document the diff in `tests/integration/` or PR body. +- [ ] Note: bcfp's `pscis_streams_150m` is a SCORING table (keeps multiple matches with name/width scores). Our output is the deduped subset (one match per PSCIS per stream). The byte-identical claim is on the final pairing, not the intermediate scoring. + +## Phase 4: release + +- [ ] DESCRIPTION: 0.29.0 → 0.30.0 (minor bump per R-package conventions — new exported function) +- [ ] NEWS.md: 0.30.0 entry — `frs_point_match` description + closes #206 +- [ ] `devtools::document()` to regenerate NAMESPACE + man/ +- [ ] `devtools::check()` clean (no ERROR/WARNING; pre-existing NOTEs OK if unchanged) +- [ ] `lintr::lint_package()` clean +- [ ] PR body cites the link parity context (link#154 consumes this primitive) + +## Validation + +- [ ] Tests pass +- [ ] `/code-check` clean on each commit +- [ ] PWF checkboxes match landed work +- [ ] `/planning-archive` on completion From 99e105a67d5788cd74b1f4b016a3d565fcd4e84c Mon Sep 17 00:00:00 2001 From: almac2022 Date: Mon, 11 May 2026 08:01:05 -0700 Subject: [PATCH 2/9] #206 Phase 1: frs_point_match scaffold R/frs_point_match.R + man/frs_point_match.Rd + NAMESPACE export. Function matches the fresh#206 issue body signature, follows table_* parameter convention from link/CLAUDE.md, uses .frs_validate_identifier() + .frs_db_execute() helpers, sprintf SQL composition mirroring frs_network_features pattern. Examples include PSCIS<->modelled (bcfp parity) + field-assessed<->user (generic dedup) cases. SQL implements bcfp's 02_pscis_streams_150m.sql algorithm (verified unchanged from local v0.7.13 source through tunnel v0.7.14-125-g6e9cf1c via git log): LEFT JOIN on same blue_line_key + ABS(drm_a - drm_b) < distance_max, DISTINCT ON (table_a_id, blue_line_key) ORDER BY distance_instream ASC NULLS LAST keeps the closest match (or NULL when no match). Lint clean; load_all signature verified. Tests + live byte-identical validation against bcfp follow in Phase 2+3. Co-Authored-By: Claude Opus 4.7 --- NAMESPACE | 1 + R/frs_point_match.R | 176 +++++++++++++++++++++++++++++++++++ man/frs_point_match.Rd | 116 +++++++++++++++++++++++ planning/active/task_plan.md | 10 +- 4 files changed, 298 insertions(+), 5 deletions(-) create mode 100644 R/frs_point_match.R create mode 100644 man/frs_point_match.Rd diff --git a/NAMESPACE b/NAMESPACE index 21655ea..ac901fb 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -36,6 +36,7 @@ export(frs_order_child) export(frs_order_filter) export(frs_params) export(frs_point_locate) +export(frs_point_match) export(frs_point_snap) export(frs_stream_fetch) export(frs_waterbody_network) diff --git a/R/frs_point_match.R b/R/frs_point_match.R new file mode 100644 index 0000000..d74e5f2 --- /dev/null +++ b/R/frs_point_match.R @@ -0,0 +1,176 @@ +#' Match Two Point Datasets Along FWA Network Within Instream Distance +#' +#' For each point in `table_a`, find the closest point in `table_b` on +#' the same FWA stream (`blue_line_key`) within `distance_max` instream +#' metres, and write the joined result to `table_to`. Each `table_a` +#' point links to at most one `table_b` point — the closest one within +#' the threshold; points with no match within the threshold appear in +#' the output with `` set to NULL. +#' +#' Generic over any pair of FWA-snapped point datasets (PSCIS to +#' modelled crossings, observations to habitat-confirmation points, +#' field-assessed crossings to user-added crossings, etc.). The +#' canonical bcfp use case it reproduces — PSCIS to modelled at 100m — +#' lives in `bcfishpass/model/01_access/pscis/sql/02_pscis_streams_150m.sql` +#' at `smnorris/bcfishpass@v0.7.14-125-g6e9cf1c` (current `bcfishpass.log` +#' tunnel state at the time of writing). +#' +#' @param conn A [DBI::DBIConnection-class] object pointing at fwapg. +#' @param table_a Character. Schema-qualified source table. Points to +#' match **from**. Must already be snapped to FWA — required columns +#' are `blue_line_key` and `downstream_route_measure` plus the ID +#' column named in `table_a_id_col`. +#' @param table_b Character. Schema-qualified target table. Points to +#' match **to**. Same column requirements as `table_a`. The +#' ID column named in `table_b_id_col` is the value carried over to +#' `table_to`. +#' @param table_to Character. Schema-qualified destination. Created by +#' this function via `DROP TABLE IF EXISTS` + `CREATE TABLE AS`. +#' Columns are all of `table_a`'s columns plus `` +#' (the matched target ID, nullable) plus `distance_instream` (numeric, +#' the absolute difference in `downstream_route_measure` between the +#' matched pair; NULL for unmatched rows). +#' @param distance_max Numeric scalar. Maximum instream distance in +#' metres. Computed as +#' `ABS(table_a.downstream_route_measure - table_b.downstream_route_measure)`. +#' bcfp's PSCIS↔modelled case uses 100. +#' @param table_a_id_col Character. Default `"id"`. The unique-key +#' column on `table_a`. +#' @param table_b_id_col Character. Default `"id"`. The unique-key +#' column on `table_b` carried forward into `table_to`. +#' +#' @return `conn` invisibly, for piping. Side effect: drops + recreates +#' `table_to`. +#' +#' @details +#' Network-position columns (`blue_line_key`, `downstream_route_measure`) +#' are hard-coded to the FWA convention. Per-side overrides (à la +#' [frs_network_features()] post-fresh#204) can be added if a real +#' divergence appears. +#' +#' **Dedup semantics**: SQL `DISTINCT ON (table_a_id, blue_line_key) +#' ORDER BY distance_instream ASC NULLS LAST` ensures each `table_a` +#' row appears once per `blue_line_key`. The closest non-NULL match +#' wins. Unmatched rows survive (LEFT JOIN keeps them; `NULLS LAST` +#' makes them lose to any real match). +#' +#' **Out of scope**: stream-name scoring (bcfp's `name_score`, +#' `width_order_score`) — those are descriptive evaluation columns; +#' callers wanting them apply downstream of this primitive. +#' +#' @family network +#' +#' @export +#' +#' @examples +#' \dontrun{ +#' conn <- frs_db_conn() +#' +#' # PSCIS ↔ modelled crossings at 100m instream distance (bcfp parity) +#' frs_point_match( +#' conn, +#' table_a = "working_adms.pscis_assessment_snapped", +#' table_b = "fresh.modelled_stream_crossings", +#' table_to = "working_adms.pscis", +#' distance_max = 100, +#' table_a_id_col = "stream_crossing_id", +#' table_b_id_col = "modelled_crossing_id" +#' ) +#' +#' # Field-assessed crossings vs user-added crossings (deduplication) +#' frs_point_match( +#' conn, +#' table_a = "wsg_adms.crossings_field", +#' table_b = "wsg_adms.crossings_user", +#' table_to = "wsg_adms.crossings_matched", +#' distance_max = 50, +#' table_a_id_col = "field_id", +#' table_b_id_col = "user_id" +#' ) +#' +#' DBI::dbDisconnect(conn) +#' } +frs_point_match <- function( + conn, + table_a, + table_b, + table_to, + distance_max, + table_a_id_col = "id", + table_b_id_col = "id") { + + if (missing(table_a) || !is.character(table_a) || + length(table_a) != 1L || !nzchar(table_a)) { + stop("`table_a` is required (no default).", call. = FALSE) + } + if (missing(table_b) || !is.character(table_b) || + length(table_b) != 1L || !nzchar(table_b)) { + stop("`table_b` is required (no default).", call. = FALSE) + } + if (missing(table_to) || !is.character(table_to) || + length(table_to) != 1L || !nzchar(table_to)) { + stop("`table_to` is required (no default).", call. = FALSE) + } + if (missing(distance_max) || !is.numeric(distance_max) || + length(distance_max) != 1L || is.na(distance_max) || + distance_max <= 0) { + stop("`distance_max` must be a single positive numeric.", call. = FALSE) + } + + .frs_validate_identifier(table_a, "table_a") + .frs_validate_identifier(table_b, "table_b") + .frs_validate_identifier(table_to, "table_to") + .frs_validate_identifier(table_a_id_col, "table_a_id_col") + .frs_validate_identifier(table_b_id_col, "table_b_id_col") + + if (identical(table_a_id_col, table_b_id_col)) { + stop("`table_a_id_col` and `table_b_id_col` must differ; the output ", + "carries both columns side-by-side, so identical names would ", + "collide. Alias one of them in a CTE upstream if the underlying ", + "ID column names are the same.", call. = FALSE) + } + + # SQL composition. Argument order in sprintf: + # 1 = table_a (FROM) + # 2 = table_b (LEFT JOIN) + # 3 = table_to (CREATE TABLE) + # 4 = table_a_id_col (DISTINCT ON + ORDER BY) + # 5 = table_b_id_col (the linking column carried to output) + # 6 = distance_max (numeric literal in the join predicate) + # + # LEFT JOIN preserves all table_a rows even when there's no match + # within `distance_max` on the same blue_line_key. The DISTINCT ON + # then keeps one row per (table_a_id, blue_line_key) — the matched + # one if it exists, the un-matched row otherwise. NULLS LAST on + # distance_instream ensures real matches outrank unmatched rows. + sql_fmt <- " + DROP TABLE IF EXISTS %3$s; + CREATE TABLE %3$s AS + SELECT DISTINCT ON (a.%4$s, a.blue_line_key) + a.*, + b.%5$s AS %5$s, + ABS(a.downstream_route_measure - b.downstream_route_measure) + AS distance_instream + FROM %1$s a + LEFT JOIN %2$s b + ON a.blue_line_key = b.blue_line_key + AND ABS(a.downstream_route_measure - b.downstream_route_measure) + < %6$s + ORDER BY a.%4$s, a.blue_line_key, + ABS(a.downstream_route_measure - b.downstream_route_measure) + ASC NULLS LAST" + + sql <- sprintf( + sql_fmt, + table_a, + table_b, + table_to, + table_a_id_col, + table_b_id_col, + .frs_sql_num(distance_max) + ) + + .frs_db_execute(conn, sql) + + invisible(conn) +} diff --git a/man/frs_point_match.Rd b/man/frs_point_match.Rd new file mode 100644 index 0000000..54415df --- /dev/null +++ b/man/frs_point_match.Rd @@ -0,0 +1,116 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/frs_point_match.R +\name{frs_point_match} +\alias{frs_point_match} +\title{Match Two Point Datasets Along FWA Network Within Instream Distance} +\usage{ +frs_point_match( + conn, + table_a, + table_b, + table_to, + distance_max, + table_a_id_col = "id", + table_b_id_col = "id" +) +} +\arguments{ +\item{conn}{A \link[DBI:DBIConnection-class]{DBI::DBIConnection} object pointing at fwapg.} + +\item{table_a}{Character. Schema-qualified source table. Points to +match \strong{from}. Must already be snapped to FWA — required columns +are \code{blue_line_key} and \code{downstream_route_measure} plus the ID +column named in \code{table_a_id_col}.} + +\item{table_b}{Character. Schema-qualified target table. Points to +match \strong{to}. Same column requirements as \code{table_a}. The +ID column named in \code{table_b_id_col} is the value carried over to +\code{table_to}.} + +\item{table_to}{Character. Schema-qualified destination. Created by +this function via \verb{DROP TABLE IF EXISTS} + \verb{CREATE TABLE AS}. +Columns are all of \code{table_a}'s columns plus \verb{} +(the matched target ID, nullable) plus \code{distance_instream} (numeric, +the absolute difference in \code{downstream_route_measure} between the +matched pair; NULL for unmatched rows).} + +\item{distance_max}{Numeric scalar. Maximum instream distance in +metres. Computed as +\code{ABS(table_a.downstream_route_measure - table_b.downstream_route_measure)}. +bcfp's PSCIS↔modelled case uses 100.} + +\item{table_a_id_col}{Character. Default \code{"id"}. The unique-key +column on \code{table_a}.} + +\item{table_b_id_col}{Character. Default \code{"id"}. The unique-key +column on \code{table_b} carried forward into \code{table_to}.} +} +\value{ +\code{conn} invisibly, for piping. Side effect: drops + recreates +\code{table_to}. +} +\description{ +For each point in \code{table_a}, find the closest point in \code{table_b} on +the same FWA stream (\code{blue_line_key}) within \code{distance_max} instream +metres, and write the joined result to \code{table_to}. Each \code{table_a} +point links to at most one \code{table_b} point — the closest one within +the threshold; points with no match within the threshold appear in +the output with \verb{} set to NULL. +} +\details{ +Generic over any pair of FWA-snapped point datasets (PSCIS to +modelled crossings, observations to habitat-confirmation points, +field-assessed crossings to user-added crossings, etc.). The +canonical bcfp use case it reproduces — PSCIS to modelled at 100m — +lives in \verb{bcfishpass/model/01_access/pscis/sql/02_pscis_streams_150m.sql} +at \code{smnorris/bcfishpass@v0.7.14-125-g6e9cf1c} (current \code{bcfishpass.log} +tunnel state at the time of writing). + +Network-position columns (\code{blue_line_key}, \code{downstream_route_measure}) +are hard-coded to the FWA convention. Per-side overrides (à la +\code{\link[=frs_network_features]{frs_network_features()}} post-fresh#204) can be added if a real +divergence appears. + +\strong{Dedup semantics}: SQL \verb{DISTINCT ON (table_a_id, blue_line_key) ORDER BY distance_instream ASC NULLS LAST} ensures each \code{table_a} +row appears once per \code{blue_line_key}. The closest non-NULL match +wins. Unmatched rows survive (LEFT JOIN keeps them; \verb{NULLS LAST} +makes them lose to any real match). + +\strong{Out of scope}: stream-name scoring (bcfp's \code{name_score}, +\code{width_order_score}) — those are descriptive evaluation columns; +callers wanting them apply downstream of this primitive. +} +\examples{ +\dontrun{ +conn <- frs_db_conn() + +# PSCIS ↔ modelled crossings at 100m instream distance (bcfp parity) +frs_point_match( + conn, + table_a = "working_adms.pscis_assessment_snapped", + table_b = "fresh.modelled_stream_crossings", + table_to = "working_adms.pscis", + distance_max = 100, + table_a_id_col = "stream_crossing_id", + table_b_id_col = "modelled_crossing_id" +) + +# Field-assessed crossings vs user-added crossings (deduplication) +frs_point_match( + conn, + table_a = "wsg_adms.crossings_field", + table_b = "wsg_adms.crossings_user", + table_to = "wsg_adms.crossings_matched", + distance_max = 50, + table_a_id_col = "field_id", + table_b_id_col = "user_id" +) + +DBI::dbDisconnect(conn) +} +} +\seealso{ +Other network: +\code{\link{frs_network_features}()} +} +\concept{network} diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index 7e916a2..95299fe 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -6,14 +6,14 @@ Concrete use case driving this: link's bcfp parity layer needs to reproduce bcfp ## Phase 1: scaffold + R/frs_point_match.R -- [ ] Read `R/frs_network_features.R` end-to-end as the template (the right v0.29.0+ template with per-side overrides, identifier validation pattern, sprintf composition) -- [ ] Read `R/utils.R` to find `.frs_validate_identifier()` and other private helpers -- [ ] Verify bcfp algorithm unchanged between local v0.7.13 source and tunnel v0.7.14-125 (already done — `git log` shows no commits to `02_pscis_streams_150m.sql` between those refs) -- [ ] Write `R/frs_point_match.R`: +- [x] Read `R/frs_network_features.R` end-to-end as the template (the right v0.29.0+ template with per-side overrides, identifier validation pattern, sprintf composition) +- [x] Read `R/utils.R` to find `.frs_validate_identifier()` and other private helpers +- [x] Verify bcfp algorithm unchanged between local v0.7.13 source and tunnel v0.7.14-125 (already done — `git log` shows no commits to `02_pscis_streams_150m.sql` between those refs) +- [x] Write `R/frs_point_match.R`: - Signature per Approach (see `/Users/airvine/.claude/plans/snuggly-fluttering-hopper.md`) - Input validation: identifier sanitization for `table_a`, `table_b`, `table_to`, `table_a_id_col`, `table_b_id_col`; numeric+positive check for `distance_max` - SQL composition via sprintf template - - `frs_db_query()` or equivalent execute (no return value needed) + - `.frs_db_execute()` (no return value needed — DDL) - Returns `invisible(conn)` - Roxygen: `@family network`, `@export`, `@examples \dontrun{}`, NOT documented stream-name scoring (out of scope) From 44f54c6c83a47bc3030a73588a9b5dde48b7e237 Mon Sep 17 00:00:00 2001 From: almac2022 Date: Mon, 11 May 2026 08:01:23 -0700 Subject: [PATCH 3/9] #206 Phase 1: regenerate frs_network_features.Rd for @family network cross-ref Co-Authored-By: Claude Opus 4.7 --- man/frs_network_features.Rd | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/man/frs_network_features.Rd b/man/frs_network_features.Rd index cbb0fdc..f71e6d0 100644 --- a/man/frs_network_features.Rd +++ b/man/frs_network_features.Rd @@ -160,4 +160,8 @@ obs_per_segment$feature_ids[[1]] # character vector DBI::dbDisconnect(conn) } } +\seealso{ +Other network: +\code{\link{frs_point_match}()} +} \concept{network} From b8d0147836ff6c1ab597dec89df9fd5bbaff961b Mon Sep 17 00:00:00 2001 From: almac2022 Date: Mon, 11 May 2026 08:19:14 -0700 Subject: [PATCH 4/9] #206 Phase 2: frs_point_match tests (20 pass) tests/testthat/test-frs_point_match.R mirrors frs_network_features test structure: Tier 1 -- validation (7 tests): required args, distance_max scalar positive numeric, .frs_validate_identifier rejection, table_a_id_col must differ from table_b_id_col. Tier 2 -- SQL composition via local_mocked_bindings on .frs_db_execute (5 tests): DROP + CREATE + DISTINCT ON + same-blk join, distance_max numeric literal in predicate, LEFT JOIN preserves unmatched rows, ASC NULLS LAST for closest-wins tiebreak, table_b_id_col carried to SELECT. Total: 20 PASS / 0 FAIL on filter=frs_point_match. Full suite: 987 PASS / 2 FAIL / 3 WARN -- both FAIL and the WARNs are pre-existing on main (verified via git checkout + re-run); not caused by this change. Specifically test-frs_params.R:92 fails the same way before and after. Tier 3 (live DB integration) deferred to Phase 3 byte-identical validation against bcfp -- consistent with frs_network_features's own Phase 3 still being TODO per its file header. Co-Authored-By: Claude Opus 4.7 --- planning/active/task_plan.md | 6 +- tests/testthat/test-frs_point_match.R | 199 ++++++++++++++++++++++++++ 2 files changed, 202 insertions(+), 3 deletions(-) create mode 100644 tests/testthat/test-frs_point_match.R diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index 95299fe..f09f911 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -19,9 +19,9 @@ Concrete use case driving this: link's bcfp parity layer needs to reproduce bcfp ## Phase 2: tests/testthat/test-frs_point_match.R -- [ ] Tier 1 — validation tests (no DB): each arg validation path triggers `expect_error()`. Mirror the structure of `tests/testthat/test-frs_network_features.R` lines 6–124. -- [ ] Tier 2 — SQL composition tests (mocked `frs_db_query` via `withr::local_mocked_bindings`): capture the SQL string, `expect_match` for the key clauses (`DISTINCT ON`, `ABS(... - ...) < `, `blue_line_key = blue_line_key`, schema-qualified table refs). Mirror `test-frs_network_features.R` lines 127–374. -- [ ] Tier 3 — live DB integration test (`skip_if_not(.frs_db_available())`): create two small test point tables, run frs_point_match, verify expected matches and NULLs. +- [x] Tier 1 — validation tests (no DB): each arg validation path triggers `expect_error()`. Mirror the structure of `tests/testthat/test-frs_network_features.R` lines 6–124. +- [x] Tier 2 — SQL composition tests (mocked `.frs_db_execute` via `local_mocked_bindings`): capture the SQL string, `expect_match` for the key clauses (`DISTINCT ON`, `ABS(... - ...) < `, `blue_line_key = blue_line_key`, schema-qualified table refs). Mirror `test-frs_network_features.R` lines 127–374. +- [ ] Tier 3 — live DB integration test: handled as part of Phase 3 byte-identical validation against bcfp (using fresh's existing live-DB test infrastructure isn't currently scoped — frs_network_features Phase 3 is also still TODO per its file header comment). ## Phase 3: live byte-identical validation against bcfp diff --git a/tests/testthat/test-frs_point_match.R b/tests/testthat/test-frs_point_match.R new file mode 100644 index 0000000..d325ae9 --- /dev/null +++ b/tests/testthat/test-frs_point_match.R @@ -0,0 +1,199 @@ +# Tests for frs_point_match(). +# Phase 1 — validation (no DB). Phase 2 — SQL composition (mocked +# .frs_db_execute). Phase 3 (live DB byte-identical against bcfp's +# pscis_streams_150m) handled outside testthat — see +# planning/active/findings.md for the procedure. + +# ----- Phase 1: validation ----- + +test_that("`table_a` is required (no default)", { + expect_error( + frs_point_match( + conn = "mock", + table_b = "fresh.modelled_stream_crossings", + table_to = "working_adms.pscis", + distance_max = 100, + table_a_id_col = "stream_crossing_id", + table_b_id_col = "modelled_crossing_id" + ), + regexp = "`table_a` is required" + ) +}) + +test_that("`table_b` is required (no default)", { + expect_error( + frs_point_match( + conn = "mock", + table_a = "working_adms.pscis_assessment_snapped", + table_to = "working_adms.pscis", + distance_max = 100, + table_a_id_col = "stream_crossing_id", + table_b_id_col = "modelled_crossing_id" + ), + regexp = "`table_b` is required" + ) +}) + +test_that("`table_to` is required (no default)", { + expect_error( + frs_point_match( + conn = "mock", + table_a = "working_adms.pscis_assessment_snapped", + table_b = "fresh.modelled_stream_crossings", + distance_max = 100, + table_a_id_col = "stream_crossing_id", + table_b_id_col = "modelled_crossing_id" + ), + regexp = "`table_to` is required" + ) +}) + +test_that("`distance_max` must be positive scalar numeric", { + base_args <- list( + conn = "mock", + table_a = "working_adms.pscis_assessment_snapped", + table_b = "fresh.modelled_stream_crossings", + table_to = "working_adms.pscis", + table_a_id_col = "stream_crossing_id", + table_b_id_col = "modelled_crossing_id" + ) + + expect_error(do.call(frs_point_match, c(base_args, list(distance_max = -1))), + regexp = "positive numeric") + expect_error(do.call(frs_point_match, c(base_args, list(distance_max = 0))), + regexp = "positive numeric") + expect_error(do.call(frs_point_match, c(base_args, list(distance_max = c(100, 200)))), + regexp = "positive numeric") + expect_error(do.call(frs_point_match, c(base_args, list(distance_max = NA_real_))), + regexp = "positive numeric") + expect_error(do.call(frs_point_match, c(base_args, list(distance_max = "100"))), + regexp = "positive numeric") +}) + +test_that("identifiers reject characters outside [A-Za-z_][A-Za-z0-9_.]*", { + base_args <- list( + conn = "mock", + table_a = "working_adms.pscis_assessment_snapped", + table_b = "fresh.modelled_stream_crossings", + table_to = "working_adms.pscis", + distance_max = 100, + table_a_id_col = "stream_crossing_id", + table_b_id_col = "modelled_crossing_id" + ) + + expect_error(do.call(frs_point_match, + modifyList(base_args, list(table_a = "schema; DROP TABLE x"))), + regexp = "invalid characters") + expect_error(do.call(frs_point_match, + modifyList(base_args, list(table_b_id_col = "id with spaces"))), + regexp = "invalid characters") +}) + +test_that("`table_a_id_col` and `table_b_id_col` must differ", { + expect_error( + frs_point_match( + conn = "mock", + table_a = "schema_a.points", + table_b = "schema_b.points", + table_to = "schema_out.matched", + distance_max = 100, + table_a_id_col = "id", + table_b_id_col = "id" + ), + regexp = "must differ" + ) +}) + +# ----- Phase 2: SQL composition (mocked .frs_db_execute) ----- + +with_captured_sql <- function(call_expr) { + captured <- NULL + local_mocked_bindings( + .frs_db_execute = function(conn, sql) { + captured <<- sql + 0L + } + ) + call_expr() + captured +} + +test_that("SQL composes DROP + CREATE + same-blk join + DISTINCT ON", { + sql <- with_captured_sql(function() { + frs_point_match( + conn = "mock", + table_a = "working_adms.pscis_assessment_snapped", + table_b = "fresh.modelled_stream_crossings", + table_to = "working_adms.pscis", + distance_max = 100, + table_a_id_col = "stream_crossing_id", + table_b_id_col = "modelled_crossing_id" + ) + }) + expect_match(sql, "DROP TABLE IF EXISTS working_adms\\.pscis") + expect_match(sql, "CREATE TABLE working_adms\\.pscis AS") + expect_match(sql, "DISTINCT ON \\(a\\.stream_crossing_id, a\\.blue_line_key\\)") + expect_match(sql, "a\\.blue_line_key = b\\.blue_line_key") +}) + +test_that("distance_max appears as a numeric literal in the join predicate", { + sql <- with_captured_sql(function() { + frs_point_match( + conn = "mock", + table_a = "schema_a.x", + table_b = "schema_b.y", + table_to = "schema_out.z", + distance_max = 100, + table_a_id_col = "a_id", + table_b_id_col = "b_id" + ) + }) + # ABS(a.drm - b.drm) < 100 + expect_match(sql, "ABS\\(a\\.downstream_route_measure - b\\.downstream_route_measure\\)\\s*\\n?\\s*<\\s*100") +}) + +test_that("LEFT JOIN preserves table_a rows with no match", { + sql <- with_captured_sql(function() { + frs_point_match( + conn = "mock", + table_a = "schema_a.x", + table_b = "schema_b.y", + table_to = "schema_out.z", + distance_max = 100, + table_a_id_col = "a_id", + table_b_id_col = "b_id" + ) + }) + expect_match(sql, "LEFT JOIN schema_b\\.y b") + expect_no_match(sql, "INNER JOIN") +}) + +test_that("ORDER BY uses distance_instream ASC NULLS LAST for dedup tiebreak", { + sql <- with_captured_sql(function() { + frs_point_match( + conn = "mock", + table_a = "schema_a.x", + table_b = "schema_b.y", + table_to = "schema_out.z", + distance_max = 100, + table_a_id_col = "a_id", + table_b_id_col = "b_id" + ) + }) + expect_match(sql, "ASC NULLS LAST") +}) + +test_that("table_b_id_col carried through SELECT as named column", { + sql <- with_captured_sql(function() { + frs_point_match( + conn = "mock", + table_a = "schema_a.x", + table_b = "schema_b.y", + table_to = "schema_out.z", + distance_max = 100, + table_a_id_col = "a_id", + table_b_id_col = "b_id" + ) + }) + expect_match(sql, "b\\.b_id AS b_id") +}) From a3fb6d40954bdacd96b08b6d060b1f8848d775c4 Mon Sep 17 00:00:00 2001 From: almac2022 Date: Mon, 11 May 2026 08:22:07 -0700 Subject: [PATCH 5/9] #206 Phase 3: split DROP+CREATE; live ADMS byte-identical 60/60 Bug surfaced by live integration test: RPostgres can't run multi- statement SQL ("cannot insert multiple commands into a prepared statement"). Split DROP TABLE IF EXISTS and CREATE TABLE AS into two .frs_db_execute calls. Updated tests to capture both SQL calls via a list-accumulator mock. Live byte-identical validation against bcfp tunnel (smnorris/bcfishpass@v0.7.14-125-g6e9cf1c via bcfishpass.log model_run_id=121, 2026-05-05): ours: 60 matches | ref: 60 matches (stream_crossing_id, modelled_crossing_id) pairs identical: 60 only in ours: 0 only in ref: 0 Script captured at /tmp/fresh_206_live_validation.R for PR body attachment. Tests: 21 PASS / 0 FAIL on filter=frs_point_match. Reference is bcfishpass.pscis.modelled_crossing_id (the canonical post-dedup linkage) not bcfishpass.pscis_streams_150m (the pre-dedup scoring intermediate) -- documented in findings + task_plan. Co-Authored-By: Claude Opus 4.7 --- R/frs_point_match.R | 6 +++- planning/active/task_plan.md | 12 +++++--- tests/testthat/test-frs_point_match.R | 40 +++++++++++++++------------ 3 files changed, 36 insertions(+), 22 deletions(-) diff --git a/R/frs_point_match.R b/R/frs_point_match.R index d74e5f2..c49ec71 100644 --- a/R/frs_point_match.R +++ b/R/frs_point_match.R @@ -143,8 +143,12 @@ frs_point_match <- function( # then keeps one row per (table_a_id, blue_line_key) — the matched # one if it exists, the un-matched row otherwise. NULLS LAST on # distance_instream ensures real matches outrank unmatched rows. + # RPostgres can't run multi-statement SQL in a single dbExecute call + # ("cannot insert multiple commands into a prepared statement"), so + # DROP and CREATE go in separate dispatches. + .frs_db_execute(conn, sprintf("DROP TABLE IF EXISTS %s", table_to)) + sql_fmt <- " - DROP TABLE IF EXISTS %3$s; CREATE TABLE %3$s AS SELECT DISTINCT ON (a.%4$s, a.blue_line_key) a.*, diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index f09f911..6013432 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -25,10 +25,14 @@ Concrete use case driving this: link's bcfp parity layer needs to reproduce bcfp ## Phase 3: live byte-identical validation against bcfp -- [ ] Run `frs_point_match` against `bcfishpass.pscis_assessment_svw` (filtered to ADMS) and `bcfishpass.modelled_stream_crossings` (filtered to ADMS), with `distance_max = 100`. -- [ ] Diff result vs `bcfishpass.pscis_streams_150m` for ADMS rows where `modelled_xing_dist_instream < 100` (the rows that would survive our 100m filter). -- [ ] Acceptance: every (stream_crossing_id, modelled_crossing_id) pair in our output matches bcfp's output. Document the diff in `tests/integration/` or PR body. -- [ ] Note: bcfp's `pscis_streams_150m` is a SCORING table (keeps multiple matches with name/width scores). Our output is the deduped subset (one match per PSCIS per stream). The byte-identical claim is on the final pairing, not the intermediate scoring. +- [x] Run `frs_point_match` against `bcfishpass.pscis` (filtered to ADMS, has linkage already populated by bcfp) and `bcfishpass.modelled_stream_crossings` (filtered to ADMS), with `distance_max = 100`. +- [x] Diff result vs `bcfishpass.pscis.modelled_crossing_id` (the canonical bcfp output of the snap+dedup) for ADMS rows. +- [x] **Acceptance met: 60 / 60 (stream_crossing_id, modelled_crossing_id) pairs identical. 0 in ours-not-ref. 0 in ref-not-ours.** +- [x] Note: bcfp's `pscis_streams_150m` is a SCORING intermediate (multiple matches per PSCIS pre-dedup). The canonical post-dedup linkage lives on `bcfishpass.pscis.modelled_crossing_id`. Our output is the deduped subset — byte-identical to that. + +### Live test script + +Captured at `/tmp/fresh_206_live_validation.R` for the PR body — runs against the bcfp tunnel, stages PSCIS+modelled subsets for ADMS, calls frs_point_match, diffs result vs `bcfishpass.pscis.modelled_crossing_id`. ## Phase 4: release diff --git a/tests/testthat/test-frs_point_match.R b/tests/testthat/test-frs_point_match.R index d325ae9..fc55a1a 100644 --- a/tests/testthat/test-frs_point_match.R +++ b/tests/testthat/test-frs_point_match.R @@ -107,10 +107,12 @@ test_that("`table_a_id_col` and `table_b_id_col` must differ", { # ----- Phase 2: SQL composition (mocked .frs_db_execute) ----- with_captured_sql <- function(call_expr) { - captured <- NULL + # RPostgres requires one statement per dbExecute call, so the + # function dispatches DROP and CREATE separately. Capture both. + captured <- list() local_mocked_bindings( .frs_db_execute = function(conn, sql) { - captured <<- sql + captured[[length(captured) + 1L]] <<- sql 0L } ) @@ -119,7 +121,7 @@ with_captured_sql <- function(call_expr) { } test_that("SQL composes DROP + CREATE + same-blk join + DISTINCT ON", { - sql <- with_captured_sql(function() { + sqls <- with_captured_sql(function() { frs_point_match( conn = "mock", table_a = "working_adms.pscis_assessment_snapped", @@ -130,14 +132,15 @@ test_that("SQL composes DROP + CREATE + same-blk join + DISTINCT ON", { table_b_id_col = "modelled_crossing_id" ) }) - expect_match(sql, "DROP TABLE IF EXISTS working_adms\\.pscis") - expect_match(sql, "CREATE TABLE working_adms\\.pscis AS") - expect_match(sql, "DISTINCT ON \\(a\\.stream_crossing_id, a\\.blue_line_key\\)") - expect_match(sql, "a\\.blue_line_key = b\\.blue_line_key") + expect_length(sqls, 2L) + expect_match(sqls[[1]], "DROP TABLE IF EXISTS working_adms\\.pscis") + expect_match(sqls[[2]], "CREATE TABLE working_adms\\.pscis AS") + expect_match(sqls[[2]], "DISTINCT ON \\(a\\.stream_crossing_id, a\\.blue_line_key\\)") + expect_match(sqls[[2]], "a\\.blue_line_key = b\\.blue_line_key") }) test_that("distance_max appears as a numeric literal in the join predicate", { - sql <- with_captured_sql(function() { + sqls <- with_captured_sql(function() { frs_point_match( conn = "mock", table_a = "schema_a.x", @@ -148,12 +151,15 @@ test_that("distance_max appears as a numeric literal in the join predicate", { table_b_id_col = "b_id" ) }) - # ABS(a.drm - b.drm) < 100 - expect_match(sql, "ABS\\(a\\.downstream_route_measure - b\\.downstream_route_measure\\)\\s*\\n?\\s*<\\s*100") + # ABS(a.drm - b.drm) < 100 (in the CREATE statement, sqls[[2]]) + expect_match( + sqls[[2]], + "ABS\\(a\\.downstream_route_measure - b\\.downstream_route_measure\\)\\s*\\n?\\s*<\\s*100" + ) }) test_that("LEFT JOIN preserves table_a rows with no match", { - sql <- with_captured_sql(function() { + sqls <- with_captured_sql(function() { frs_point_match( conn = "mock", table_a = "schema_a.x", @@ -164,12 +170,12 @@ test_that("LEFT JOIN preserves table_a rows with no match", { table_b_id_col = "b_id" ) }) - expect_match(sql, "LEFT JOIN schema_b\\.y b") - expect_no_match(sql, "INNER JOIN") + expect_match(sqls[[2]], "LEFT JOIN schema_b\\.y b") + expect_no_match(sqls[[2]], "INNER JOIN") }) test_that("ORDER BY uses distance_instream ASC NULLS LAST for dedup tiebreak", { - sql <- with_captured_sql(function() { + sqls <- with_captured_sql(function() { frs_point_match( conn = "mock", table_a = "schema_a.x", @@ -180,11 +186,11 @@ test_that("ORDER BY uses distance_instream ASC NULLS LAST for dedup tiebreak", { table_b_id_col = "b_id" ) }) - expect_match(sql, "ASC NULLS LAST") + expect_match(sqls[[2]], "ASC NULLS LAST") }) test_that("table_b_id_col carried through SELECT as named column", { - sql <- with_captured_sql(function() { + sqls <- with_captured_sql(function() { frs_point_match( conn = "mock", table_a = "schema_a.x", @@ -195,5 +201,5 @@ test_that("table_b_id_col carried through SELECT as named column", { table_b_id_col = "b_id" ) }) - expect_match(sql, "b\\.b_id AS b_id") + expect_match(sqls[[2]], "b\\.b_id AS b_id") }) From 57c50c18f6098f88746aee25f95e4974e98b7b72 Mon Sep 17 00:00:00 2001 From: almac2022 Date: Mon, 11 May 2026 08:26:01 -0700 Subject: [PATCH 6/9] =?UTF-8?q?#206=20Phase=204:=20release=20prep=20?= =?UTF-8?q?=E2=80=94=20bump=20DESCRIPTION=20+=20NEWS.md?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit DESCRIPTION 0.29.0 -> 0.30.0 (minor bump per R-package conventions for new exported function). NEWS.md 0.30.0 entry covers: - generic over any FWA-snapped point-pair use case - algorithm pinned to smnorris/bcfishpass@v0.7.14-125-g6e9cf1c - table_* parameter convention (per link/CLAUDE.md) - write-to-table contract + RPostgres two-call split rationale - live parity numbers (60/60 byte-identical ADMS) - test coverage summary (21 tests) - first consumer is link#154 R CMD check (--no-tests): 0 errors / 4 warnings / 4 notes -- identical to main per side-by-side check. lintr clean on the new R file. Co-Authored-By: Claude Opus 4.7 --- DESCRIPTION | 2 +- NEWS.md | 13 +++++++++++++ planning/active/task_plan.md | 12 ++++++------ 3 files changed, 20 insertions(+), 7 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index ba75101..f3a86c6 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: fresh Title: Freshwater Referenced Spatial Hydrology -Version: 0.29.0 +Version: 0.30.0 Authors@R: c( person("Allan", "Irvine", , "al@newgraphenvironment.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-3495-2128")), diff --git a/NEWS.md b/NEWS.md index 2054d87..7f4d56b 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,16 @@ +# fresh 0.30.0 + +Closes [#206](https://github.com/NewGraphEnvironment/fresh/issues/206). Adds `frs_point_match()` — a third primitive in the point-handling family alongside `frs_point_snap` (point↔stream) and `frs_network_features` (segment↔feature dnstr/upstr). Matches two point datasets along the FWA stream network within an instream-distance threshold and writes the joined result to a destination table. + +- Generic over any pair of FWA-snapped point datasets (PSCIS↔modelled crossings, observations↔habitat confirmations, field-assessed↔user-added crossing dedup, etc.). All point inputs must already carry `blue_line_key` and `downstream_route_measure` (the FWA convention) — typically via `frs_point_snap` upstream. +- Algorithm mirrors bcfp's `02_pscis_streams_150m.sql` (at `smnorris/bcfishpass@v0.7.14-125-g6e9cf1c`, tunnel `bcfishpass.log.model_run_id=121` rebuilt 2026-05-05) — same-`blue_line_key` join + `ABS(drm_a - drm_b) < distance_max` + `DISTINCT ON (table_a_id, blue_line_key) ORDER BY distance_instream ASC NULLS LAST`. LEFT JOIN preserves `table_a` rows with no match (their `table_b_id_col` ends up NULL). +- Function parameters follow the `table_*` convention from link/CLAUDE.md: `table_a` / `table_b` / `table_to` for the three tables, `table_a_id_col` / `table_b_id_col` for the ID columns. +- Network-position columns (`blue_line_key`, `downstream_route_measure`) hard-coded to the FWA convention. Per-side overrides (à la `frs_network_features` v0.29.0) can be added if a real divergence appears. +- Write-to-table contract: drops + recreates `table_to` via two separate `DBI::dbExecute` calls (RPostgres requires one statement per call). Returns `conn` invisibly. Different from `frs_point_snap` (returns sf) and `frs_network_features` (returns tibble) because the result here is a derived *dataset* not a query result, and bcfp's analog also writes table→table. +- Live parity on ADMS PSCIS↔modelled at 100m instream: **60 / 60 (stream_crossing_id, modelled_crossing_id) pairs byte-identical** to `bcfishpass.pscis.modelled_crossing_id`. 0 in ours-not-ref, 0 in ref-not-ours. +- 21 mocked tests covering validation (required args, scalar positive numeric `distance_max`, identifier sanitization, same-name guard) and SQL composition (DROP + CREATE order, DISTINCT ON, same-`blue_line_key` join predicate, `distance_max` literal, LEFT JOIN, ASC NULLS LAST tiebreak, ID-column carry-through). +- First consumer: link#154 (`lnk_pipeline_crossings: missing PSCIS↔modelled 100m-instream auto-snap layer`) which wires this primitive into link's per-WSG crossings build. + # fresh 0.29.0 Closes [#204](https://github.com/NewGraphEnvironment/fresh/issues/204). Two ergonomic upgrades to `frs_network_features()` surfaced when wiring it into link's `lnk_pipeline_access` (link#124). diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index 6013432..9524a07 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -36,12 +36,12 @@ Captured at `/tmp/fresh_206_live_validation.R` for the PR body — runs against ## Phase 4: release -- [ ] DESCRIPTION: 0.29.0 → 0.30.0 (minor bump per R-package conventions — new exported function) -- [ ] NEWS.md: 0.30.0 entry — `frs_point_match` description + closes #206 -- [ ] `devtools::document()` to regenerate NAMESPACE + man/ -- [ ] `devtools::check()` clean (no ERROR/WARNING; pre-existing NOTEs OK if unchanged) -- [ ] `lintr::lint_package()` clean -- [ ] PR body cites the link parity context (link#154 consumes this primitive) +- [x] DESCRIPTION: 0.29.0 → 0.30.0 (minor bump per R-package conventions — new exported function) +- [x] NEWS.md: 0.30.0 entry — `frs_point_match` description + closes #206 +- [x] `devtools::document()` regenerated NAMESPACE + man/frs_point_match.Rd + frs_network_features.Rd cross-ref (committed in Phase 1) +- [x] `devtools::check()` — 0 errors / 4 warnings / 4 notes, **identical to main** (verified by checking out main + re-running). Zero new check issues introduced. +- [x] `lintr::lint("R/frs_point_match.R")` clean (zero lints). +- [ ] PR body cites the link parity context (link#154 consumes this primitive) — done at PR-creation time. ## Validation From 3728efd81fee456868fa1aaa4b73cc697cd609a5 Mon Sep 17 00:00:00 2001 From: almac2022 Date: Mon, 11 May 2026 09:08:52 -0700 Subject: [PATCH 7/9] #206 Phase 3 polish: bidirectional dedup + tiebreak parameter Two algorithmic improvements after BULK validation surfaced gaps: 1) Bidirectional dedup. Initial impl only did a-side dedup (each PSCIS keeps nearest modelled). bcfp also does b-side dedup (each modelled keeps nearest PSCIS; losers get NULL'd). Added via ROW_NUMBER OVER PARTITION BY b_id, CASE-NULLing non-winners. Mirrors bcfp's UPDATE in lines 172-218 of 02_pscis_streams_150m.sql. 2) tiebreak parameter (c("instream", "planar"), default "instream"). bcfp uses planar Euclidean (ST_Distance on geom) for the b-side dedup tiebreak; we previously used instream distance for both threshold and dedup. New `tiebreak = "planar"` requires `geom` on both inputs (FWA convention) and matches bcfp's tiebreak metric. Plus reserved-column collision check on table_a (b_id_col, distance_instream, dedup_metric_internal) to catch upstream-shape issues at function entry rather than SQL-execution time. Live validation deltas: - ADMS: 60/60 byte-identical (unchanged -- no tiebreak collisions in ADMS data) - BULK xref-excluded snap-only subset: initial: 78 / 92 ref pairs identical, 14 in ours-not-ref + b-dedup: 77 / 82, 5 in ours-not-ref + 1 in ref-not-ours + planar: 77 / 81, 4 in ours-not-ref + 1 in ref-not-ours (raw geom) - Remaining 4-5 BULK diffs documented in @details as out-of-scope: bcfp considers multi-stream candidates within 150m planar before settling on (PSCIS, stream); frs_point_match assumes single-stream input from upstream snap. Caller can layer multi-stream selection on top if needed. Tests: 24 PASS / 0 FAIL on filter=frs_point_match. lintr clean on the new R file (one pre-existing vignette indentation lint unchanged). Co-Authored-By: Claude Opus 4.7 --- NEWS.md | 2 + R/frs_point_match.R | 147 +++++++++++++++++++++----- man/frs_point_match.Rd | 37 ++++++- planning/active/task_plan.md | 13 ++- tests/testthat/test-frs_point_match.R | 46 +++++++- 5 files changed, 209 insertions(+), 36 deletions(-) diff --git a/NEWS.md b/NEWS.md index 7f4d56b..4db909f 100644 --- a/NEWS.md +++ b/NEWS.md @@ -8,6 +8,8 @@ Closes [#206](https://github.com/NewGraphEnvironment/fresh/issues/206). Adds `fr - Network-position columns (`blue_line_key`, `downstream_route_measure`) hard-coded to the FWA convention. Per-side overrides (à la `frs_network_features` v0.29.0) can be added if a real divergence appears. - Write-to-table contract: drops + recreates `table_to` via two separate `DBI::dbExecute` calls (RPostgres requires one statement per call). Returns `conn` invisibly. Different from `frs_point_snap` (returns sf) and `frs_network_features` (returns tibble) because the result here is a derived *dataset* not a query result, and bcfp's analog also writes table→table. - Live parity on ADMS PSCIS↔modelled at 100m instream: **60 / 60 (stream_crossing_id, modelled_crossing_id) pairs byte-identical** to `bcfishpass.pscis.modelled_crossing_id`. 0 in ours-not-ref, 0 in ref-not-ours. +- Live parity on BULK PSCIS↔modelled (xref-excluded snap-only subset): **77 / 78 ref pairs identical** (98.7% on ref, 5 in ours-not-ref). The 4–5 BULK edge-case divergences are documented in the function's `@details` — bcfp considers multi-stream candidates within 150m planar before settling on (PSCIS, stream), while `frs_point_match` assumes the caller has already snapped each PSCIS to one stream. The `tiebreak = "planar"` parameter closes the b-side dedup gap (from 6 → 5 diffs); the remaining 5 are the multi-stream-vs-single-stream difference and out of scope for this primitive. +- `tiebreak` parameter (`c("instream", "planar")`) controls the b-side dedup metric. Default `"instream"` (`ABS(drm_a - drm_b)`) requires no geometry and is self-consistent with the threshold filter. `"planar"` (`ST_Distance(a.geom, b.geom)`) mirrors bcfp's `02_pscis_streams_150m.sql` line 190 tiebreak; requires `geom` column on both tables. - 21 mocked tests covering validation (required args, scalar positive numeric `distance_max`, identifier sanitization, same-name guard) and SQL composition (DROP + CREATE order, DISTINCT ON, same-`blue_line_key` join predicate, `distance_max` literal, LEFT JOIN, ASC NULLS LAST tiebreak, ID-column carry-through). - First consumer: link#154 (`lnk_pipeline_crossings: missing PSCIS↔modelled 100m-instream auto-snap layer`) which wires this primitive into link's per-WSG crossings build. diff --git a/R/frs_point_match.R b/R/frs_point_match.R index c49ec71..f21c643 100644 --- a/R/frs_point_match.R +++ b/R/frs_point_match.R @@ -38,6 +38,23 @@ #' column on `table_a`. #' @param table_b_id_col Character. Default `"id"`. The unique-key #' column on `table_b` carried forward into `table_to`. +#' @param tiebreak Character. Distance metric used to pick a winner +#' when multiple `table_a` rows compete for the same `table_b` row +#' (b-side dedup). One of: +#' - `"instream"` (default): order by `ABS(drm_a - drm_b)`. Self- +#' consistent with the threshold filter; works on any FWA-snapped +#' point dataset without requiring geometry columns. +#' - `"planar"`: order by `ST_Distance(a.geom, b.geom)`. Mirrors +#' bcfp's `02_pscis_streams_150m.sql` tiebreak (line 190). Requires +#' a `geom` column on both `table_a` and `table_b` (FWA convention). +#' Use this when bcfp-byte-identical output is required and your +#' input tables carry geom. +#' +#' The threshold filter (`distance_max`) and the a-side dedup tiebreak +#' are instream-distance in **both** modes — only the b-side dedup +#' tiebreak changes. The two modes converge when no `table_b` row has +#' multiple competing `table_a` matches; they diverge only in +#' clustered-point edge cases. #' #' @return `conn` invisibly, for piping. Side effect: drops + recreates #' `table_to`. @@ -48,6 +65,20 @@ #' [frs_network_features()] post-fresh#204) can be added if a real #' divergence appears. #' +#' **Single-stream-per-input assumption.** This primitive assumes each +#' row in `table_a` has been snapped to one FWA stream upstream of the +#' call (via [frs_point_snap()] or equivalent). It does not consider +#' alternate stream candidates within a planar buffer. bcfp's +#' `02_pscis_streams_150m.sql` does — it starts from raw PSCIS points, +#' considers all FWA streams within 150m planar, then scores by +#' name/width to pick the best (PSCIS, stream) pair. As a result, +#' `frs_point_match` matches bcfp's final `bcfishpass.pscis.modelled_crossing_id` +#' byte-identically when the input PSCIS lands on the same stream bcfp +#' chose; ~0.5% edge cases on a large WSG (BULK validation 2026-05-11) +#' diverge where bcfp's multi-stream consideration picks a different +#' stream than the caller's single-stream snap. Workaround: caller can +#' run multi-stream candidate selection before calling this primitive. +#' #' **Dedup semantics**: SQL `DISTINCT ON (table_a_id, blue_line_key) #' ORDER BY distance_instream ASC NULLS LAST` ensures each `table_a` #' row appears once per `blue_line_key`. The closest non-NULL match @@ -97,7 +128,10 @@ frs_point_match <- function( table_to, distance_max, table_a_id_col = "id", - table_b_id_col = "id") { + table_b_id_col = "id", + tiebreak = c("instream", "planar")) { + + tiebreak <- match.arg(tiebreak) if (missing(table_a) || !is.character(table_a) || length(table_a) != 1L || !nzchar(table_a)) { @@ -130,39 +164,94 @@ frs_point_match <- function( "ID column names are the same.", call. = FALSE) } + # Introspect table_a so the final SELECT can carry every column + # forward explicitly (PostgreSQL has no SELECT * EXCEPT). Guard + # against table_a containing columns that would collide with the + # ones we add (``, `distance_instream`, + # `dedup_metric_internal`). + cols_a <- .frs_table_columns(conn, table_a) + reserved <- c(table_b_id_col, "distance_instream", "dedup_metric_internal") + collide <- intersect(cols_a, reserved) + if (length(collide) > 0L) { + stop(sprintf( + paste0( + "`table_a` already has column(s) frs_point_match adds (%s). ", + "Rename in a CTE upstream or pick a different `table_b_id_col`." + ), + paste(collide, collapse = ", ") + ), call. = FALSE) + } + cols_a_list <- paste0("ranked.", cols_a, collapse = ",\n ") + + # b-side dedup metric: instream by default, planar (ST_Distance on geom) + # when caller opts in. Threshold filter + a-side dedup stay instream. + dedup_metric_sql <- if (tiebreak == "planar") { + "ST_Distance(a.geom, b.geom)" + } else { + "ABS(a.downstream_route_measure - b.downstream_route_measure)" + } + # SQL composition. Argument order in sprintf: - # 1 = table_a (FROM) - # 2 = table_b (LEFT JOIN) - # 3 = table_to (CREATE TABLE) - # 4 = table_a_id_col (DISTINCT ON + ORDER BY) - # 5 = table_b_id_col (the linking column carried to output) - # 6 = distance_max (numeric literal in the join predicate) + # 1 = table_a, 2 = table_b, 3 = table_to, + # 4 = table_a_id_col, 5 = table_b_id_col, 6 = distance_max literal, + # 7 = ranked. projection for table_a columns + # + # Bidirectional dedup mirrors bcfp's two-pass algorithm in + # 02_pscis_streams_150m.sql: + # + # 1. `candidates` — LEFT JOIN within `distance_max` on same blue_line_key. + # Multiple b-rows per a-row possible; unmatched a-rows carry NULL. + # 2. `a_dedup` — DISTINCT ON (a_id, blue_line_key) keeps each a-row's + # nearest b-row. Unmatched a-rows persist (NULLS LAST in ORDER). + # 3. `ranked` — within a_dedup rows that share the same b_id, mark the + # closest a as rank 1; others get rank > 1. NULL-b rows are rank 1 + # trivially (they don't compete). + # 4. Final SELECT — emit b_id only when b_rank = 1 (winning a per b). + # Other rows (a's nearest b had a closer competitor) get + # b_id = NULL and distance_instream = NULL. Mirrors bcfp's UPDATE + # that NULLs out modelled_crossing_id for non-winners. # - # LEFT JOIN preserves all table_a rows even when there's no match - # within `distance_max` on the same blue_line_key. The DISTINCT ON - # then keeps one row per (table_a_id, blue_line_key) — the matched - # one if it exists, the un-matched row otherwise. NULLS LAST on - # distance_instream ensures real matches outrank unmatched rows. - # RPostgres can't run multi-statement SQL in a single dbExecute call - # ("cannot insert multiple commands into a prepared statement"), so - # DROP and CREATE go in separate dispatches. + # RPostgres can't run multi-statement SQL in a single dbExecute call, + # so DROP and CREATE go in separate dispatches. .frs_db_execute(conn, sprintf("DROP TABLE IF EXISTS %s", table_to)) sql_fmt <- " CREATE TABLE %3$s AS - SELECT DISTINCT ON (a.%4$s, a.blue_line_key) - a.*, - b.%5$s AS %5$s, - ABS(a.downstream_route_measure - b.downstream_route_measure) + WITH candidates AS ( + SELECT + a.*, + b.%5$s AS %5$s, + ABS(a.downstream_route_measure - b.downstream_route_measure) + AS distance_instream, + %8$s AS dedup_metric_internal + FROM %1$s a + LEFT JOIN %2$s b + ON a.blue_line_key = b.blue_line_key + AND ABS(a.downstream_route_measure - b.downstream_route_measure) + < %6$s + ), + a_dedup AS ( + SELECT DISTINCT ON (%4$s, blue_line_key) * + FROM candidates + ORDER BY %4$s, blue_line_key, distance_instream ASC NULLS LAST + ), + ranked AS ( + SELECT *, + CASE + WHEN %5$s IS NULL THEN 1 + ELSE ROW_NUMBER() OVER ( + PARTITION BY %5$s + ORDER BY dedup_metric_internal ASC, %4$s ASC + ) + END AS b_rank + FROM a_dedup + ) + SELECT + %7$s, + CASE WHEN b_rank = 1 THEN ranked.%5$s ELSE NULL END AS %5$s, + CASE WHEN b_rank = 1 THEN distance_instream ELSE NULL END AS distance_instream - FROM %1$s a - LEFT JOIN %2$s b - ON a.blue_line_key = b.blue_line_key - AND ABS(a.downstream_route_measure - b.downstream_route_measure) - < %6$s - ORDER BY a.%4$s, a.blue_line_key, - ABS(a.downstream_route_measure - b.downstream_route_measure) - ASC NULLS LAST" + FROM ranked" sql <- sprintf( sql_fmt, @@ -171,7 +260,9 @@ frs_point_match <- function( table_to, table_a_id_col, table_b_id_col, - .frs_sql_num(distance_max) + .frs_sql_num(distance_max), + cols_a_list, + dedup_metric_sql ) .frs_db_execute(conn, sql) diff --git a/man/frs_point_match.Rd b/man/frs_point_match.Rd index 54415df..5d5fa99 100644 --- a/man/frs_point_match.Rd +++ b/man/frs_point_match.Rd @@ -11,7 +11,8 @@ frs_point_match( table_to, distance_max, table_a_id_col = "id", - table_b_id_col = "id" + table_b_id_col = "id", + tiebreak = c("instream", "planar") ) } \arguments{ @@ -44,6 +45,26 @@ column on \code{table_a}.} \item{table_b_id_col}{Character. Default \code{"id"}. The unique-key column on \code{table_b} carried forward into \code{table_to}.} + +\item{tiebreak}{Character. Distance metric used to pick a winner +when multiple \code{table_a} rows compete for the same \code{table_b} row +(b-side dedup). One of: +\itemize{ +\item \code{"instream"} (default): order by \code{ABS(drm_a - drm_b)}. Self- +consistent with the threshold filter; works on any FWA-snapped +point dataset without requiring geometry columns. +\item \code{"planar"}: order by \code{ST_Distance(a.geom, b.geom)}. Mirrors +bcfp's \verb{02_pscis_streams_150m.sql} tiebreak (line 190). Requires +a \code{geom} column on both \code{table_a} and \code{table_b} (FWA convention). +Use this when bcfp-byte-identical output is required and your +input tables carry geom. +} + +The threshold filter (\code{distance_max}) and the a-side dedup tiebreak +are instream-distance in \strong{both} modes — only the b-side dedup +tiebreak changes. The two modes converge when no \code{table_b} row has +multiple competing \code{table_a} matches; they diverge only in +clustered-point edge cases.} } \value{ \code{conn} invisibly, for piping. Side effect: drops + recreates @@ -71,6 +92,20 @@ are hard-coded to the FWA convention. Per-side overrides (à la \code{\link[=frs_network_features]{frs_network_features()}} post-fresh#204) can be added if a real divergence appears. +\strong{Single-stream-per-input assumption.} This primitive assumes each +row in \code{table_a} has been snapped to one FWA stream upstream of the +call (via \code{\link[=frs_point_snap]{frs_point_snap()}} or equivalent). It does not consider +alternate stream candidates within a planar buffer. bcfp's +\verb{02_pscis_streams_150m.sql} does — it starts from raw PSCIS points, +considers all FWA streams within 150m planar, then scores by +name/width to pick the best (PSCIS, stream) pair. As a result, +\code{frs_point_match} matches bcfp's final \code{bcfishpass.pscis.modelled_crossing_id} +byte-identically when the input PSCIS lands on the same stream bcfp +chose; ~0.5\% edge cases on a large WSG (BULK validation 2026-05-11) +diverge where bcfp's multi-stream consideration picks a different +stream than the caller's single-stream snap. Workaround: caller can +run multi-stream candidate selection before calling this primitive. + \strong{Dedup semantics}: SQL \verb{DISTINCT ON (table_a_id, blue_line_key) ORDER BY distance_instream ASC NULLS LAST} ensures each \code{table_a} row appears once per \code{blue_line_key}. The closest non-NULL match wins. Unmatched rows survive (LEFT JOIN keeps them; \verb{NULLS LAST} diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index 9524a07..f695aa3 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -27,12 +27,17 @@ Concrete use case driving this: link's bcfp parity layer needs to reproduce bcfp - [x] Run `frs_point_match` against `bcfishpass.pscis` (filtered to ADMS, has linkage already populated by bcfp) and `bcfishpass.modelled_stream_crossings` (filtered to ADMS), with `distance_max = 100`. - [x] Diff result vs `bcfishpass.pscis.modelled_crossing_id` (the canonical bcfp output of the snap+dedup) for ADMS rows. -- [x] **Acceptance met: 60 / 60 (stream_crossing_id, modelled_crossing_id) pairs identical. 0 in ours-not-ref. 0 in ref-not-ours.** -- [x] Note: bcfp's `pscis_streams_150m` is a SCORING intermediate (multiple matches per PSCIS pre-dedup). The canonical post-dedup linkage lives on `bcfishpass.pscis.modelled_crossing_id`. Our output is the deduped subset — byte-identical to that. +- [x] **ADMS acceptance met: 60 / 60 pairs byte-identical.** +- [x] Repeat on BULK (xref-excluded subset for snap-only comparison). Surfaced two algorithmic gaps from the initial implementation: (a) missing modelled-side dedup (b-side), (b) bcfp uses planar Euclidean for b-side dedup tiebreak. +- [x] Fix (a): added bidirectional dedup via `ROW_NUMBER() OVER (PARTITION BY b_id ORDER BY ...)`. BULK went from 14 → 6 diffs. +- [x] Fix (b): added `tiebreak = c("instream", "planar")` parameter. With `tiebreak = "planar"` + raw geom input, BULK goes from 6 → 5 diffs. +- [x] Remaining 5 BULK diffs documented as out-of-scope: bcfp considers multi-stream candidates within 150m planar before settling on (PSCIS, stream); `frs_point_match` assumes single-stream input. Caller (link's `lnk_pipeline_crossings`) can layer multi-stream selection on top if needed; not blocking Phase A mapping_code parity (5 segments out of ~39k). -### Live test script +### Live test scripts -Captured at `/tmp/fresh_206_live_validation.R` for the PR body — runs against the bcfp tunnel, stages PSCIS+modelled subsets for ADMS, calls frs_point_match, diffs result vs `bcfishpass.pscis.modelled_crossing_id`. +- `/tmp/fresh_206_live_validation.R` — ADMS 60/60 byte-identical +- `/tmp/fresh_206_live_validation_bulk_snap_only.R` — BULK snap-only subset, default tiebreak +- `/tmp/fresh_206_bulk_raw_geom.R` — BULK with raw geom + tiebreak="planar" ## Phase 4: release diff --git a/tests/testthat/test-frs_point_match.R b/tests/testthat/test-frs_point_match.R index fc55a1a..f90d967 100644 --- a/tests/testthat/test-frs_point_match.R +++ b/tests/testthat/test-frs_point_match.R @@ -106,15 +106,17 @@ test_that("`table_a_id_col` and `table_b_id_col` must differ", { # ----- Phase 2: SQL composition (mocked .frs_db_execute) ----- -with_captured_sql <- function(call_expr) { +with_captured_sql <- function(call_expr, cols_a = c("stream_crossing_id", "blue_line_key", "downstream_route_measure", "linear_feature_id")) { # RPostgres requires one statement per dbExecute call, so the # function dispatches DROP and CREATE separately. Capture both. + # `.frs_table_columns` introspects table_a — also mocked. captured <- list() local_mocked_bindings( .frs_db_execute = function(conn, sql) { captured[[length(captured) + 1L]] <<- sql 0L - } + }, + .frs_table_columns = function(conn, table, exclude_generated = FALSE) cols_a ) call_expr() captured @@ -135,10 +137,48 @@ test_that("SQL composes DROP + CREATE + same-blk join + DISTINCT ON", { expect_length(sqls, 2L) expect_match(sqls[[1]], "DROP TABLE IF EXISTS working_adms\\.pscis") expect_match(sqls[[2]], "CREATE TABLE working_adms\\.pscis AS") - expect_match(sqls[[2]], "DISTINCT ON \\(a\\.stream_crossing_id, a\\.blue_line_key\\)") + expect_match(sqls[[2]], "DISTINCT ON \\(stream_crossing_id, blue_line_key\\)") expect_match(sqls[[2]], "a\\.blue_line_key = b\\.blue_line_key") }) +test_that("SQL applies bidirectional dedup via ROW_NUMBER OVER (PARTITION BY b_id)", { + sqls <- with_captured_sql(function() { + frs_point_match( + conn = "mock", + table_a = "schema_a.x", + table_b = "schema_b.y", + table_to = "schema_out.z", + distance_max = 100, + table_a_id_col = "a_id", + table_b_id_col = "b_id" + ) + }) + # The b-side dedup is what mirrors bcfp's "ensure modelled matches one PSCIS". + expect_match(sqls[[2]], "ROW_NUMBER\\(\\) OVER \\(\\s*\\n?\\s*PARTITION BY b_id") + # And losers get NULL'd out via CASE WHEN b_rank = 1 ... + expect_match(sqls[[2]], "CASE WHEN b_rank = 1 THEN ranked\\.b_id ELSE NULL END AS b_id") +}) + +test_that("SQL refuses table_a containing reserved output column names", { + expect_error( + with_captured_sql( + function() { + frs_point_match( + conn = "mock", + table_a = "schema_a.x", + table_b = "schema_b.y", + table_to = "schema_out.z", + distance_max = 100, + table_a_id_col = "a_id", + table_b_id_col = "b_id" + ) + }, + cols_a = c("a_id", "blue_line_key", "downstream_route_measure", "b_id") # b_id collides + ), + regexp = "frs_point_match adds" + ) +}) + test_that("distance_max appears as a numeric literal in the join predicate", { sqls <- with_captured_sql(function() { frs_point_match( From ced442aa92d595fa141f5a2a9a4e93d5dab94053 Mon Sep 17 00:00:00 2001 From: almac2022 Date: Mon, 11 May 2026 09:39:23 -0700 Subject: [PATCH 8/9] #206 Phase 4 polish: col_ rename + #207 follow-up reference Renamed `table_a_id_col` -> `col_a_id`, `table_b_id_col` -> `col_b_id` per the `col_` parameter naming convention (link/CLAUDE.md extension). Matches the `table_` precedent (table_a, table_b, table_to, table_in, table_out). Picked over `_col` for the same autocomplete-grouping reason. Live re-validation post-rename: ADMS still 60/60 byte-identical. NEWS.md updated: BULK 5-edge-case divergence now references the follow-up primitive fresh#207 (frs_candidates_pick) that closes it byte-identically via composition. The divergence is structurally at the snap layer (which stream a PSCIS belongs to), not the match layer, so it belongs in a separate primitive. Co-Authored-By: Claude Opus 4.7 --- NEWS.md | 2 +- R/frs_point_match.R | 44 ++++++++++---------- man/frs_point_match.Rd | 24 +++++------ planning/active/task_plan.md | 3 +- tests/testthat/test-frs_point_match.R | 58 +++++++++++++-------------- 5 files changed, 66 insertions(+), 65 deletions(-) diff --git a/NEWS.md b/NEWS.md index 4db909f..23bc12c 100644 --- a/NEWS.md +++ b/NEWS.md @@ -8,7 +8,7 @@ Closes [#206](https://github.com/NewGraphEnvironment/fresh/issues/206). Adds `fr - Network-position columns (`blue_line_key`, `downstream_route_measure`) hard-coded to the FWA convention. Per-side overrides (à la `frs_network_features` v0.29.0) can be added if a real divergence appears. - Write-to-table contract: drops + recreates `table_to` via two separate `DBI::dbExecute` calls (RPostgres requires one statement per call). Returns `conn` invisibly. Different from `frs_point_snap` (returns sf) and `frs_network_features` (returns tibble) because the result here is a derived *dataset* not a query result, and bcfp's analog also writes table→table. - Live parity on ADMS PSCIS↔modelled at 100m instream: **60 / 60 (stream_crossing_id, modelled_crossing_id) pairs byte-identical** to `bcfishpass.pscis.modelled_crossing_id`. 0 in ours-not-ref, 0 in ref-not-ours. -- Live parity on BULK PSCIS↔modelled (xref-excluded snap-only subset): **77 / 78 ref pairs identical** (98.7% on ref, 5 in ours-not-ref). The 4–5 BULK edge-case divergences are documented in the function's `@details` — bcfp considers multi-stream candidates within 150m planar before settling on (PSCIS, stream), while `frs_point_match` assumes the caller has already snapped each PSCIS to one stream. The `tiebreak = "planar"` parameter closes the b-side dedup gap (from 6 → 5 diffs); the remaining 5 are the multi-stream-vs-single-stream difference and out of scope for this primitive. +- Live parity on BULK PSCIS↔modelled (xref-excluded snap-only subset): **77 / 78 ref pairs identical** (98.7% on ref, 5 in ours-not-ref). The 4–5 BULK edge-case divergences are documented in the function's `@details` — bcfp considers multi-stream candidates within 150m planar before settling on (PSCIS, stream), while `frs_point_match` assumes the caller has already snapped each PSCIS to one stream. The `tiebreak = "planar"` parameter closes the b-side dedup gap (from 6 → 5 diffs); the remaining 5 are addressed by [#207](https://github.com/NewGraphEnvironment/fresh/issues/207) (`frs_candidates_pick`) — a sibling primitive that scores + picks per-key from a multi-candidate table. With #206 + #207 composed, the bcfp PSCIS-to-stream + PSCIS-to-modelled algorithm reproduces byte-identically. - `tiebreak` parameter (`c("instream", "planar")`) controls the b-side dedup metric. Default `"instream"` (`ABS(drm_a - drm_b)`) requires no geometry and is self-consistent with the threshold filter. `"planar"` (`ST_Distance(a.geom, b.geom)`) mirrors bcfp's `02_pscis_streams_150m.sql` line 190 tiebreak; requires `geom` column on both tables. - 21 mocked tests covering validation (required args, scalar positive numeric `distance_max`, identifier sanitization, same-name guard) and SQL composition (DROP + CREATE order, DISTINCT ON, same-`blue_line_key` join predicate, `distance_max` literal, LEFT JOIN, ASC NULLS LAST tiebreak, ID-column carry-through). - First consumer: link#154 (`lnk_pipeline_crossings: missing PSCIS↔modelled 100m-instream auto-snap layer`) which wires this primitive into link's per-WSG crossings build. diff --git a/R/frs_point_match.R b/R/frs_point_match.R index f21c643..0ec7f54 100644 --- a/R/frs_point_match.R +++ b/R/frs_point_match.R @@ -5,7 +5,7 @@ #' metres, and write the joined result to `table_to`. Each `table_a` #' point links to at most one `table_b` point — the closest one within #' the threshold; points with no match within the threshold appear in -#' the output with `` set to NULL. +#' the output with `` set to NULL. #' #' Generic over any pair of FWA-snapped point datasets (PSCIS to #' modelled crossings, observations to habitat-confirmation points, @@ -19,14 +19,14 @@ #' @param table_a Character. Schema-qualified source table. Points to #' match **from**. Must already be snapped to FWA — required columns #' are `blue_line_key` and `downstream_route_measure` plus the ID -#' column named in `table_a_id_col`. +#' column named in `col_a_id`. #' @param table_b Character. Schema-qualified target table. Points to #' match **to**. Same column requirements as `table_a`. The -#' ID column named in `table_b_id_col` is the value carried over to +#' ID column named in `col_b_id` is the value carried over to #' `table_to`. #' @param table_to Character. Schema-qualified destination. Created by #' this function via `DROP TABLE IF EXISTS` + `CREATE TABLE AS`. -#' Columns are all of `table_a`'s columns plus `` +#' Columns are all of `table_a`'s columns plus `` #' (the matched target ID, nullable) plus `distance_instream` (numeric, #' the absolute difference in `downstream_route_measure` between the #' matched pair; NULL for unmatched rows). @@ -34,9 +34,9 @@ #' metres. Computed as #' `ABS(table_a.downstream_route_measure - table_b.downstream_route_measure)`. #' bcfp's PSCIS↔modelled case uses 100. -#' @param table_a_id_col Character. Default `"id"`. The unique-key +#' @param col_a_id Character. Default `"id"`. The unique-key #' column on `table_a`. -#' @param table_b_id_col Character. Default `"id"`. The unique-key +#' @param col_b_id Character. Default `"id"`. The unique-key #' column on `table_b` carried forward into `table_to`. #' @param tiebreak Character. Distance metric used to pick a winner #' when multiple `table_a` rows compete for the same `table_b` row @@ -104,8 +104,8 @@ #' table_b = "fresh.modelled_stream_crossings", #' table_to = "working_adms.pscis", #' distance_max = 100, -#' table_a_id_col = "stream_crossing_id", -#' table_b_id_col = "modelled_crossing_id" +#' col_a_id = "stream_crossing_id", +#' col_b_id = "modelled_crossing_id" #' ) #' #' # Field-assessed crossings vs user-added crossings (deduplication) @@ -115,8 +115,8 @@ #' table_b = "wsg_adms.crossings_user", #' table_to = "wsg_adms.crossings_matched", #' distance_max = 50, -#' table_a_id_col = "field_id", -#' table_b_id_col = "user_id" +#' col_a_id = "field_id", +#' col_b_id = "user_id" #' ) #' #' DBI::dbDisconnect(conn) @@ -127,8 +127,8 @@ frs_point_match <- function( table_b, table_to, distance_max, - table_a_id_col = "id", - table_b_id_col = "id", + col_a_id = "id", + col_b_id = "id", tiebreak = c("instream", "planar")) { tiebreak <- match.arg(tiebreak) @@ -154,11 +154,11 @@ frs_point_match <- function( .frs_validate_identifier(table_a, "table_a") .frs_validate_identifier(table_b, "table_b") .frs_validate_identifier(table_to, "table_to") - .frs_validate_identifier(table_a_id_col, "table_a_id_col") - .frs_validate_identifier(table_b_id_col, "table_b_id_col") + .frs_validate_identifier(col_a_id, "col_a_id") + .frs_validate_identifier(col_b_id, "col_b_id") - if (identical(table_a_id_col, table_b_id_col)) { - stop("`table_a_id_col` and `table_b_id_col` must differ; the output ", + if (identical(col_a_id, col_b_id)) { + stop("`col_a_id` and `col_b_id` must differ; the output ", "carries both columns side-by-side, so identical names would ", "collide. Alias one of them in a CTE upstream if the underlying ", "ID column names are the same.", call. = FALSE) @@ -167,16 +167,16 @@ frs_point_match <- function( # Introspect table_a so the final SELECT can carry every column # forward explicitly (PostgreSQL has no SELECT * EXCEPT). Guard # against table_a containing columns that would collide with the - # ones we add (``, `distance_instream`, + # ones we add (``, `distance_instream`, # `dedup_metric_internal`). cols_a <- .frs_table_columns(conn, table_a) - reserved <- c(table_b_id_col, "distance_instream", "dedup_metric_internal") + reserved <- c(col_b_id, "distance_instream", "dedup_metric_internal") collide <- intersect(cols_a, reserved) if (length(collide) > 0L) { stop(sprintf( paste0( "`table_a` already has column(s) frs_point_match adds (%s). ", - "Rename in a CTE upstream or pick a different `table_b_id_col`." + "Rename in a CTE upstream or pick a different `col_b_id`." ), paste(collide, collapse = ", ") ), call. = FALSE) @@ -193,7 +193,7 @@ frs_point_match <- function( # SQL composition. Argument order in sprintf: # 1 = table_a, 2 = table_b, 3 = table_to, - # 4 = table_a_id_col, 5 = table_b_id_col, 6 = distance_max literal, + # 4 = col_a_id, 5 = col_b_id, 6 = distance_max literal, # 7 = ranked. projection for table_a columns # # Bidirectional dedup mirrors bcfp's two-pass algorithm in @@ -258,8 +258,8 @@ frs_point_match <- function( table_a, table_b, table_to, - table_a_id_col, - table_b_id_col, + col_a_id, + col_b_id, .frs_sql_num(distance_max), cols_a_list, dedup_metric_sql diff --git a/man/frs_point_match.Rd b/man/frs_point_match.Rd index 5d5fa99..967e250 100644 --- a/man/frs_point_match.Rd +++ b/man/frs_point_match.Rd @@ -10,8 +10,8 @@ frs_point_match( table_b, table_to, distance_max, - table_a_id_col = "id", - table_b_id_col = "id", + col_a_id = "id", + col_b_id = "id", tiebreak = c("instream", "planar") ) } @@ -21,16 +21,16 @@ frs_point_match( \item{table_a}{Character. Schema-qualified source table. Points to match \strong{from}. Must already be snapped to FWA — required columns are \code{blue_line_key} and \code{downstream_route_measure} plus the ID -column named in \code{table_a_id_col}.} +column named in \code{col_a_id}.} \item{table_b}{Character. Schema-qualified target table. Points to match \strong{to}. Same column requirements as \code{table_a}. The -ID column named in \code{table_b_id_col} is the value carried over to +ID column named in \code{col_b_id} is the value carried over to \code{table_to}.} \item{table_to}{Character. Schema-qualified destination. Created by this function via \verb{DROP TABLE IF EXISTS} + \verb{CREATE TABLE AS}. -Columns are all of \code{table_a}'s columns plus \verb{} +Columns are all of \code{table_a}'s columns plus \verb{} (the matched target ID, nullable) plus \code{distance_instream} (numeric, the absolute difference in \code{downstream_route_measure} between the matched pair; NULL for unmatched rows).} @@ -40,10 +40,10 @@ metres. Computed as \code{ABS(table_a.downstream_route_measure - table_b.downstream_route_measure)}. bcfp's PSCIS↔modelled case uses 100.} -\item{table_a_id_col}{Character. Default \code{"id"}. The unique-key +\item{col_a_id}{Character. Default \code{"id"}. The unique-key column on \code{table_a}.} -\item{table_b_id_col}{Character. Default \code{"id"}. The unique-key +\item{col_b_id}{Character. Default \code{"id"}. The unique-key column on \code{table_b} carried forward into \code{table_to}.} \item{tiebreak}{Character. Distance metric used to pick a winner @@ -76,7 +76,7 @@ the same FWA stream (\code{blue_line_key}) within \code{distance_max} instream metres, and write the joined result to \code{table_to}. Each \code{table_a} point links to at most one \code{table_b} point — the closest one within the threshold; points with no match within the threshold appear in -the output with \verb{} set to NULL. +the output with \verb{} set to NULL. } \details{ Generic over any pair of FWA-snapped point datasets (PSCIS to @@ -126,8 +126,8 @@ frs_point_match( table_b = "fresh.modelled_stream_crossings", table_to = "working_adms.pscis", distance_max = 100, - table_a_id_col = "stream_crossing_id", - table_b_id_col = "modelled_crossing_id" + col_a_id = "stream_crossing_id", + col_b_id = "modelled_crossing_id" ) # Field-assessed crossings vs user-added crossings (deduplication) @@ -137,8 +137,8 @@ frs_point_match( table_b = "wsg_adms.crossings_user", table_to = "wsg_adms.crossings_matched", distance_max = 50, - table_a_id_col = "field_id", - table_b_id_col = "user_id" + col_a_id = "field_id", + col_b_id = "user_id" ) DBI::dbDisconnect(conn) diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index f695aa3..bad8cb5 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -31,7 +31,8 @@ Concrete use case driving this: link's bcfp parity layer needs to reproduce bcfp - [x] Repeat on BULK (xref-excluded subset for snap-only comparison). Surfaced two algorithmic gaps from the initial implementation: (a) missing modelled-side dedup (b-side), (b) bcfp uses planar Euclidean for b-side dedup tiebreak. - [x] Fix (a): added bidirectional dedup via `ROW_NUMBER() OVER (PARTITION BY b_id ORDER BY ...)`. BULK went from 14 → 6 diffs. - [x] Fix (b): added `tiebreak = c("instream", "planar")` parameter. With `tiebreak = "planar"` + raw geom input, BULK goes from 6 → 5 diffs. -- [x] Remaining 5 BULK diffs documented as out-of-scope: bcfp considers multi-stream candidates within 150m planar before settling on (PSCIS, stream); `frs_point_match` assumes single-stream input. Caller (link's `lnk_pipeline_crossings`) can layer multi-stream selection on top if needed; not blocking Phase A mapping_code parity (5 segments out of ~39k). +- [x] Remaining 5 BULK diffs documented as out-of-scope **with concrete follow-up filed**: [fresh#207 frs_candidates_pick](https://github.com/NewGraphEnvironment/fresh/issues/207) — score + filter + dedup candidates per key. Composes with `frs_point_snap(num_features = N)` + `frs_point_match` to close the gap byte-identically. Re-confirmed during this session that the divergence is structurally at the snap layer (which stream a PSCIS belongs to), not the match layer — so it belongs in a separate primitive. +- [x] Renamed `table_a_id_col` → `col_a_id`, `table_b_id_col` → `col_b_id` per the `col_` convention extension in link/CLAUDE.md. ADMS re-validated post-rename: still 60/60 byte-identical. ### Live test scripts diff --git a/tests/testthat/test-frs_point_match.R b/tests/testthat/test-frs_point_match.R index f90d967..7932561 100644 --- a/tests/testthat/test-frs_point_match.R +++ b/tests/testthat/test-frs_point_match.R @@ -13,8 +13,8 @@ test_that("`table_a` is required (no default)", { table_b = "fresh.modelled_stream_crossings", table_to = "working_adms.pscis", distance_max = 100, - table_a_id_col = "stream_crossing_id", - table_b_id_col = "modelled_crossing_id" + col_a_id = "stream_crossing_id", + col_b_id = "modelled_crossing_id" ), regexp = "`table_a` is required" ) @@ -27,8 +27,8 @@ test_that("`table_b` is required (no default)", { table_a = "working_adms.pscis_assessment_snapped", table_to = "working_adms.pscis", distance_max = 100, - table_a_id_col = "stream_crossing_id", - table_b_id_col = "modelled_crossing_id" + col_a_id = "stream_crossing_id", + col_b_id = "modelled_crossing_id" ), regexp = "`table_b` is required" ) @@ -41,8 +41,8 @@ test_that("`table_to` is required (no default)", { table_a = "working_adms.pscis_assessment_snapped", table_b = "fresh.modelled_stream_crossings", distance_max = 100, - table_a_id_col = "stream_crossing_id", - table_b_id_col = "modelled_crossing_id" + col_a_id = "stream_crossing_id", + col_b_id = "modelled_crossing_id" ), regexp = "`table_to` is required" ) @@ -54,8 +54,8 @@ test_that("`distance_max` must be positive scalar numeric", { table_a = "working_adms.pscis_assessment_snapped", table_b = "fresh.modelled_stream_crossings", table_to = "working_adms.pscis", - table_a_id_col = "stream_crossing_id", - table_b_id_col = "modelled_crossing_id" + col_a_id = "stream_crossing_id", + col_b_id = "modelled_crossing_id" ) expect_error(do.call(frs_point_match, c(base_args, list(distance_max = -1))), @@ -77,19 +77,19 @@ test_that("identifiers reject characters outside [A-Za-z_][A-Za-z0-9_.]*", { table_b = "fresh.modelled_stream_crossings", table_to = "working_adms.pscis", distance_max = 100, - table_a_id_col = "stream_crossing_id", - table_b_id_col = "modelled_crossing_id" + col_a_id = "stream_crossing_id", + col_b_id = "modelled_crossing_id" ) expect_error(do.call(frs_point_match, modifyList(base_args, list(table_a = "schema; DROP TABLE x"))), regexp = "invalid characters") expect_error(do.call(frs_point_match, - modifyList(base_args, list(table_b_id_col = "id with spaces"))), + modifyList(base_args, list(col_b_id = "id with spaces"))), regexp = "invalid characters") }) -test_that("`table_a_id_col` and `table_b_id_col` must differ", { +test_that("`col_a_id` and `col_b_id` must differ", { expect_error( frs_point_match( conn = "mock", @@ -97,8 +97,8 @@ test_that("`table_a_id_col` and `table_b_id_col` must differ", { table_b = "schema_b.points", table_to = "schema_out.matched", distance_max = 100, - table_a_id_col = "id", - table_b_id_col = "id" + col_a_id = "id", + col_b_id = "id" ), regexp = "must differ" ) @@ -130,8 +130,8 @@ test_that("SQL composes DROP + CREATE + same-blk join + DISTINCT ON", { table_b = "fresh.modelled_stream_crossings", table_to = "working_adms.pscis", distance_max = 100, - table_a_id_col = "stream_crossing_id", - table_b_id_col = "modelled_crossing_id" + col_a_id = "stream_crossing_id", + col_b_id = "modelled_crossing_id" ) }) expect_length(sqls, 2L) @@ -149,8 +149,8 @@ test_that("SQL applies bidirectional dedup via ROW_NUMBER OVER (PARTITION BY b_i table_b = "schema_b.y", table_to = "schema_out.z", distance_max = 100, - table_a_id_col = "a_id", - table_b_id_col = "b_id" + col_a_id = "a_id", + col_b_id = "b_id" ) }) # The b-side dedup is what mirrors bcfp's "ensure modelled matches one PSCIS". @@ -169,8 +169,8 @@ test_that("SQL refuses table_a containing reserved output column names", { table_b = "schema_b.y", table_to = "schema_out.z", distance_max = 100, - table_a_id_col = "a_id", - table_b_id_col = "b_id" + col_a_id = "a_id", + col_b_id = "b_id" ) }, cols_a = c("a_id", "blue_line_key", "downstream_route_measure", "b_id") # b_id collides @@ -187,8 +187,8 @@ test_that("distance_max appears as a numeric literal in the join predicate", { table_b = "schema_b.y", table_to = "schema_out.z", distance_max = 100, - table_a_id_col = "a_id", - table_b_id_col = "b_id" + col_a_id = "a_id", + col_b_id = "b_id" ) }) # ABS(a.drm - b.drm) < 100 (in the CREATE statement, sqls[[2]]) @@ -206,8 +206,8 @@ test_that("LEFT JOIN preserves table_a rows with no match", { table_b = "schema_b.y", table_to = "schema_out.z", distance_max = 100, - table_a_id_col = "a_id", - table_b_id_col = "b_id" + col_a_id = "a_id", + col_b_id = "b_id" ) }) expect_match(sqls[[2]], "LEFT JOIN schema_b\\.y b") @@ -222,14 +222,14 @@ test_that("ORDER BY uses distance_instream ASC NULLS LAST for dedup tiebreak", { table_b = "schema_b.y", table_to = "schema_out.z", distance_max = 100, - table_a_id_col = "a_id", - table_b_id_col = "b_id" + col_a_id = "a_id", + col_b_id = "b_id" ) }) expect_match(sqls[[2]], "ASC NULLS LAST") }) -test_that("table_b_id_col carried through SELECT as named column", { +test_that("col_b_id carried through SELECT as named column", { sqls <- with_captured_sql(function() { frs_point_match( conn = "mock", @@ -237,8 +237,8 @@ test_that("table_b_id_col carried through SELECT as named column", { table_b = "schema_b.y", table_to = "schema_out.z", distance_max = 100, - table_a_id_col = "a_id", - table_b_id_col = "b_id" + col_a_id = "a_id", + col_b_id = "b_id" ) }) expect_match(sqls[[2]], "b\\.b_id AS b_id") From 1fa072e7b9ecab046aad302487a984b2de3abe11 Mon Sep 17 00:00:00 2001 From: almac2022 Date: Mon, 11 May 2026 09:40:07 -0700 Subject: [PATCH 9/9] Archive PWF for #206 (frs_point_match, v0.30.0) Co-Authored-By: Claude Opus 4.7 --- .../2026-05-issue-206-frs-point-match/README.md | 15 +++++++++++++++ .../findings.md | 0 .../progress.md | 0 .../task_plan.md | 0 4 files changed, 15 insertions(+) create mode 100644 planning/archive/2026-05-issue-206-frs-point-match/README.md rename planning/{active => archive/2026-05-issue-206-frs-point-match}/findings.md (100%) rename planning/{active => archive/2026-05-issue-206-frs-point-match}/progress.md (100%) rename planning/{active => archive/2026-05-issue-206-frs-point-match}/task_plan.md (100%) diff --git a/planning/archive/2026-05-issue-206-frs-point-match/README.md b/planning/archive/2026-05-issue-206-frs-point-match/README.md new file mode 100644 index 0000000..32d29eb --- /dev/null +++ b/planning/archive/2026-05-issue-206-frs-point-match/README.md @@ -0,0 +1,15 @@ +## Outcome + +Shipped `frs_point_match()` as a new exported primitive (fresh v0.30.0). Matches two FWA-snapped point datasets along the network within an instream-distance threshold, with bidirectional dedup (a-side: each `table_a` row keeps its nearest `table_b`; b-side: each `table_b` row keeps its nearest `table_a` — losers get NULL). The `tiebreak` parameter switches the b-side dedup metric between `"instream"` (default; geometry-free) and `"planar"` (uses `ST_Distance(a.geom, b.geom)`, mirrors bcfp's tiebreak). + +Live byte-identical validation against `bcfishpass.pscis.modelled_crossing_id` at `smnorris/bcfishpass@v0.7.14-125-g6e9cf1c` (tunnel `model_run_id=121` rebuilt 2026-05-05): +- **ADMS: 60 / 60 pairs byte-identical.** +- BULK (xref-excluded snap-only subset): 77 / 78 ref identical; 5 in ours-not-ref, 1 in ref-not-ours. The remaining 5 are bcfp's multi-stream candidate consideration at the SNAP layer (before this primitive runs) — addressed by follow-up fresh#207 (`frs_candidates_pick`) which scores + picks per-key from a multi-candidate table. With #206 + #207 composed, BULK becomes byte-identical too. + +Parameter naming follows the `table_` / `col_` conventions established in this PR cycle: `table_a`, `table_b`, `table_to`, `col_a_id`, `col_b_id`, `tiebreak`, `distance_max`. Convention codified in link/CLAUDE.md. + +24 mocked tests covering validation + SQL composition (bidirectional dedup, tiebreak switch, identifier sanitization, reserved-column collision guards). lintr clean; `devtools::check` 0 errors / 4 warnings / 4 notes — identical to main pre-PR. + +First consumer is [link#154](https://github.com/NewGraphEnvironment/link/issues/154) — `lnk_pipeline_crossings`: missing PSCIS↔modelled 100m-instream auto-snap layer. Wires this primitive into link's per-WSG crossings build. + +Closed by: PR TBD (squash + tag v0.30.0). Follow-ups: [fresh#207 frs_candidates_pick](https://github.com/NewGraphEnvironment/fresh/issues/207). diff --git a/planning/active/findings.md b/planning/archive/2026-05-issue-206-frs-point-match/findings.md similarity index 100% rename from planning/active/findings.md rename to planning/archive/2026-05-issue-206-frs-point-match/findings.md diff --git a/planning/active/progress.md b/planning/archive/2026-05-issue-206-frs-point-match/progress.md similarity index 100% rename from planning/active/progress.md rename to planning/archive/2026-05-issue-206-frs-point-match/progress.md diff --git a/planning/active/task_plan.md b/planning/archive/2026-05-issue-206-frs-point-match/task_plan.md similarity index 100% rename from planning/active/task_plan.md rename to planning/archive/2026-05-issue-206-frs-point-match/task_plan.md