Problem
frs_habitat_classify path-1 (rearing-on-spawning) only triggers when h.spawning IS TRUE (modelled, rule-based). bcfishpass's habitat_linear_<sp> path-1 is broader:
WHERE (h.spawning IS TRUE or coalesce(hk.spawning_st, 0) = 1)
hk.spawning_st comes from bcfishpass.streams_habitat_known, populated from user_habitat_classification.csv. So bcfp's "rule-based" habitat_linear_<sp> already includes operator-known spawning as a path-1 trigger.
Link's apply_habitat_overlay: no config is meant to match bcfp's rule-based output. But because fresh's classify doesn't read user_habitat_classification, link misses the known-spawning-trigger rearing credits that bcfp produces.
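The difference between the two path-1 triggers can be sketched on a toy table. This is a minimal illustration in base R, not code from fresh or bcfishpass: the `segs` data frame and its values are made up, and `NA` in `spawning_st` stands in for "no matching row in streams_habitat_known".

```r
# Toy segments (hypothetical data): modelled flag and known flag.
segs <- data.frame(
  segment_id  = 1:4,
  spawning    = c(TRUE, FALSE, FALSE, NA),   # h.spawning (modelled)
  spawning_st = c(NA, 1L, 0L, 1L)            # hk.spawning_st (known), NA = no hk row
)

# fresh today: path-1 triggers on modelled spawning only (SQL "IS TRUE").
trigger_fresh <- segs$spawning %in% TRUE

# bcfp: h.spawning IS TRUE OR coalesce(hk.spawning_st, 0) = 1.
known <- ifelse(is.na(segs$spawning_st), 0L, segs$spawning_st) == 1L
trigger_bcfp <- trigger_fresh | known

# Segments that trigger rearing in bcfp but not in fresh.
missed <- segs$segment_id[trigger_bcfp & !trigger_fresh]
print(missed)
```

Segments 2 and 4 are the ones that only the known-spawning branch picks up, which is the class of segment behind the under-credit described here.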
Three logical modes (only two exist in fresh)
| Mode | Modelled rule | Known-spawning triggers rearing | Final overlay |
|---|---|---|---|
| bcfp habitat_linear_<sp> | yes | yes | no |
| bcfp streams_habitat_linear | yes | yes | yes |
| fresh apply_habitat_overlay: no | yes | no | no |
| fresh apply_habitat_overlay: yes | yes | no (only post-overlay) | yes |
Bcfp's rule-based table corresponds to a third mode that fresh doesn't currently produce.
Concrete case
link MORR ST gap, 2026-04-30. bcfp credits ~60 km of ST rearing across the top 10 streams alone via the hk-trigger path:
| blue_line_key | n rearing segs (with hk.spawning_st = 1) | rearing km |
|---|---|---|
| 360885316 (Morice River) | 179 | 35.24 |
| 360885021 (Gosnell Creek) | 59 | 12.14 |
| 360819468 | 24 | 5.44 |
| 360837468 | 16 | 3.71 |
| ... | | |
bcfishpass.streams_habitat_known on MORR has 353 ST segments with spawning_st = 1 (from 31 distinct rows in user_habitat_classification.csv expanded across DRM ranges). Link's classify path-1 doesn't see them, so the rearing-on-spawning credit on those streams doesn't fire.
Proposed solution — reuse frs_habitat_overlay, just call it earlier
frs_habitat_overlay already does exactly the OR-additive operation we need: source-table shape (blue_line_key, drm, urm, species_code, spawning, ...) matches user_habitat_classification.csv; bridge mode does a 3-way range join into streams_habitat; updates are FALSE → TRUE only, never reversed.
Today's pipeline order:
classify → cluster → connected_waterbody → overlay (apply_habitat_overlay=yes only)
Today's apply_habitat_overlay=yes mode mutates streams_habitat.spawning AFTER cluster + connected_waterbody have already run. By the time those phases query WHERE h.spawning IS TRUE, they only see modelled spawning — which is why fresh's apply_habitat_overlay=yes mode produces a different output than bcfp's habitat_linear_<sp> despite reaching the same final overlay state.
Fix: shift the overlay call to run BEFORE cluster + connected_waterbody when the user wants bcfp-style hk-trigger semantics. Once streams_habitat.spawning carries known-spawning at that point, every downstream phase reads it naturally — cluster's label_connect IS TRUE checks, .frs_connected_waterbody Phase 2's WHERE hs.spawning IS TRUE, classify's path-1 rearing-on-spawning. No code changes inside any of those phases.
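Why the order matters can be shown with two toy phases. The sketch below is an assumption-laden stand-in, not the real frs_cluster or frs_habitat_overlay: `toy_overlay` and `toy_cluster` are hypothetical, and the cluster rule (keep rearing only on streams that carry at least one spawning segment) is invented purely to make the ordering effect visible.

```r
# Hypothetical stand-ins for the real phases; names and logic are illustrative only.
toy_overlay <- function(habitat, known_ids) {
  # OR-additive: can flip FALSE -> TRUE, never TRUE -> FALSE.
  habitat$spawning <- habitat$spawning | (habitat$segment_id %in% known_ids)
  habitat
}

toy_cluster <- function(habitat) {
  # Keep rearing only on streams with at least one spawning segment.
  has_spawning <- ave(habitat$spawning, habitat$blue_line_key, FUN = any)
  habitat$rearing <- habitat$rearing & has_spawning
  habitat
}

habitat <- data.frame(
  segment_id    = 1:4,
  blue_line_key = c(10, 10, 20, 20),
  spawning      = c(TRUE, FALSE, FALSE, FALSE),
  rearing       = c(FALSE, TRUE, FALSE, TRUE)
)
known_ids <- 3L  # known spawning exists on stream 20 only

# Today's order: cluster first, overlay last.
post_cluster <- toy_overlay(toy_cluster(habitat), known_ids)
# Proposed order: overlay first, so cluster sees known spawning.
post_classify <- toy_cluster(toy_overlay(habitat, known_ids))

print(post_cluster$rearing)
print(post_classify$rearing)
```

Both orders end with the same `spawning` column, but only the overlay-first order keeps the rearing segment on stream 20, because the toy cluster phase ran after known spawning was already in place.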
One real gap in frs_habitat_overlay
bcfp's hk-trigger uses range overlap, not strict containment:
-- frs_habitat_overlay's current bridge predicate (CONTAINMENT)
s.downstream_route_measure >= k.downstream_route_measure
AND s.upstream_route_measure <= k.upstream_route_measure
-- bcfp hk-trigger semantics (OVERLAP)
s.upstream_route_measure >= k.downstream_route_measure
AND s.downstream_route_measure <= k.upstream_route_measure
Add a range_mode = c("contain", "overlap") arg to frs_habitat_overlay. Default "contain" preserves today's apply_habitat_overlay=yes behaviour exactly. New "overlap" mode matches bcfp.
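The two predicates can be compared side by side in base R. This is a sketch only: `range_match` is a hypothetical helper that mirrors the SQL comparisons on numeric vectors, not the actual frs_habitat_overlay implementation, and the measures are made up.

```r
# Hypothetical helper mirroring the two bridge predicates on closed ranges.
# s_* are segment route measures, k_* are known-habitat route measures.
range_match <- function(s_drm, s_urm, k_drm, k_urm,
                        range_mode = c("contain", "overlap")) {
  range_mode <- match.arg(range_mode)  # first choice, "contain", is the default
  switch(range_mode,
    contain = s_drm >= k_drm & s_urm <= k_urm,
    overlap = s_urm >= k_drm & s_drm <= k_urm
  )
}

# Known range 100-200; segments: fully inside, straddling the lower end, outside.
s_drm <- c(120, 50, 250)
s_urm <- c(180, 150, 300)

contain_hits <- range_match(s_drm, s_urm, 100, 200)
overlap_hits <- range_match(s_drm, s_urm, 100, 200, range_mode = "overlap")

print(contain_hits)  # TRUE FALSE FALSE
print(overlap_hits)  # TRUE  TRUE FALSE
```

The straddling segment is the case that separates the modes. Because both comparisons are inclusive, a segment that only touches the known range at an endpoint also counts as a match under "overlap".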
Orchestration — when to apply
Add a known_habitat_when = c("post-cluster", "post-classify", "both") option to frs_habitat (defaults to "post-cluster" = today's behaviour). When "post-classify" or "both", frs_habitat calls frs_habitat_overlay(known_habitat, range_mode = "overlap") between classify and cluster. With "both", it calls overlay twice — once post-classify, once post-cluster — and idempotent OR makes that safe (TRUE → TRUE is a no-op).
apply_habitat_overlay = no config keeps known_habitat_when = "post-cluster" and skips the overlay entirely (status quo).
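The dispatch described above can be sketched as a function that only reports which phases would run, in order. `phase_order` and its `apply_overlay` argument are hypothetical names for illustration, not the real frs_habitat signature, and the range mode of the late overlay call is left unlabelled because the proposal does not specify it.

```r
# Hypothetical sketch: returns the phase sequence for a given configuration.
phase_order <- function(apply_overlay = TRUE,
                        known_habitat_when = c("post-cluster", "post-classify", "both")) {
  known_habitat_when <- match.arg(known_habitat_when)
  early <- apply_overlay && known_habitat_when %in% c("post-classify", "both")
  late  <- apply_overlay && known_habitat_when %in% c("post-cluster", "both")
  c("classify",
    if (early) "overlay (range_mode = overlap)",  # NULL is dropped when FALSE
    "cluster",
    "connected_waterbody",
    if (late) "overlay")
}

print(phase_order())                                # today's order
print(phase_order(known_habitat_when = "both"))     # overlay runs twice
print(phase_order(apply_overlay = FALSE))           # overlay skipped entirely

# Idempotent OR: applying the same known flags twice equals applying them once.
spawning <- c(TRUE, FALSE, FALSE)
known    <- c(FALSE, TRUE, FALSE)
once  <- spawning | known
twice <- once | known
print(identical(once, twice))
```

The last lines illustrate why the double call under "both" is harmless: OR-ing in the same known flags a second time leaves the vector unchanged.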
Implementation surface
| Change | File | Approx lines |
|---|---|---|
| range_mode arg + the alternative SQL predicate | R/frs_habitat_overlay.R | ~10–15 |
| known_habitat_when option threaded through frs_habitat orchestrator (with conditional pre-cluster overlay call) | R/frs_habitat.R (around line 1153 .frs_run_connectivity) | ~30 |
| Tests | tests/testthat/test-frs_habitat_overlay.R (new range-mode cases) + tests/testthat/test-frs_habitat.R (timing-mode cases) | ~50–80 |
Zero changes to frs_habitat_classify, frs_cluster, .frs_connected_waterbody. They keep reading h.spawning IS TRUE as before; the OR-in is upstream of them when timing is "post-classify" or "both".
Safety property
frs_habitat_overlay already mutates streams_habitat.spawning (in apply_habitat_overlay=yes mode). The proposed change shifts WHEN that mutation fires, not WHETHER. With default known_habitat_when = "post-cluster", behaviour is bit-identical to today. With "post-classify", the mutation happens earlier in the pipeline, giving cluster + connected_waterbody the augmented spawning set: for a given range_mode, streams_habitat.spawning ends up with the same content, just visible to those phases. Note that range_mode = "overlap" can flag segments that "contain" would not, so the two timings are only directly comparable under the same range_mode.
OR-additive throughout: rows can flip FALSE → TRUE, never reverse. No risk of shrinking output.
Test plan (fresh-side, self-contained)
| Test | Bar |
|---|---|
| All existing tests pass with default known_habitat_when = "post-cluster" | Bit-identical fresh suite output |
| range_mode = "overlap" produces the larger expected row set on a synthetic fixture (segment range partially intersects known range) | Unit test in test-frs_habitat_overlay.R |
| range_mode = "contain" (default) produces existing behaviour byte-identical | Unit test |
| known_habitat_when = "post-classify" on a synthetic fixture: cluster sees augmented spawning → segments that would have been stripped get preserved | Integration test in test-frs_habitat.R |
| known_habitat_when = "both" is idempotent (output identical to "post-classify" on the same input) | Integration test |
| Defensive: empty known_habitat, missing species column, table doesn't exist | Standard error-path tests |
These prove the safety property + the range-mode SQL + the timing semantics without needing the live bcfp tunnel.
Parity verification (link-side, follow-up)
The bcfp parity claim itself — "MORR ST credits ~60 km of rule-based rearing matching bcfishpass.habitat_linear_st via the hk-trigger path" — requires link's compare_bcfishpass_wsg.R + the live tunnel. Tracked separately in link#132. After fresh ships the timing arg, link's lnk_pipeline_classify exposes known_habitat_when = "post-classify" to bundles that want bcfp parity (and "post-cluster" for bundles that want today's apply_habitat_overlay semantics). MORR ST runs through the compare apparatus to confirm.
Reproduction
Tunnel DB: bcfishpass on localhost:63333 (db_newgraph, rebuilt Mondays). WSG: MORR. Species: ST. bcfishpass.streams_habitat_known provides the trigger source. Link's bcfishpass-bundle ships inst/extdata/configs/bcfishpass/overrides/user_habitat_classification.csv, which holds the same data but is not consumed by classify until link's wiring lands in #132.
Related
This is the third mechanism in the same parity-investigation slice. fresh#186/#187 fixed link's over-credits; this fixes the under-credit. Together they should close most of the remaining MORR ST / BABL ST / MORR CO gap.
Related mechanisms: frs_cluster phase-1 + confluence-boost; frs_trace_downstream averaged-FWA gradient; .frs_connected_waterbody for SK lake-proximity.