#62 osm_shapes_match_routes improved to return partial matches when m… by gmatosferreira · Pull Request #63 · U-Shift/GTFShift

gmatosferreira · 2026-02-06T12:52:14Z

…ore shapes than osm routes

Copilot

Pull request overview

This PR updates osm_shapes_match_routes() to keep and return partial results when GTFS has more shapes than there are matching OSM routes (instead of failing/returning an empty placeholder for those routes), and expands the warning output to better describe what was kept vs ignored.

Changes:

Deduplicate repeated osm_id matches by selecting a “best” shape per osm_id and returning those rows.
Expand the repeated-OSM warning to include returned/ignored shape_ids.
Add a post-processing summary that detects and reports routes with partial shape matches.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-06T12:55:52Z

R/query_osm_shapes_match_routes.R

    w = sprintf(
-      "There were %d error(s) during the algorithm execution, which led to %d route(s) without a match (route(s) ignored), with the following `%s`:\n\n> %s\n",
+      "There were %d error(s) during the algorithm execution, which led to %d route(s) without a match (route(s) ignored), with the following `%s`:\n\n> %s\n\nAdditionally, %d routes had partial matches (only some of its shapes had a match):\n\n> %s\n",
      errors,


The consolidated warning text now states that the %d error(s) "led to %d route(s) without a match (route(s) ignored)", but one of the counted "errors" (warning_osm_repeated) no longer causes the route to be ignored—only some shapes are dropped and returned as a partial match. This message can be misleading; consider splitting the summary into (a) routes ignored (not_found) and (b) routes partially matched, and only attributing the ignored count to conditions that actually remove routes from result_success.

Copilot · 2026-02-06T12:55:52Z

R/query_osm_shapes_match_routes.R

+  partial_match = result_success |>
+    st_drop_geometry() |>
+    group_by(.data[[gtfs_match]]) |>
+    summarise(shapes_n = n()) |>
+    left_join(routes_shapes_n, by=gtfs_match) |>
+    rename(matched = shapes_n.x, gtfs = shapes_n.y) |>
+    filter(matched < gtfs)


partial_match is computed unconditionally from result_success via st_drop_geometry(). When there are no successful matches, result_success is the result of bind_rows(list()) (a plain tibble, not an sf), so st_drop_geometry(result_success) will error before the later if (nrow(result_success) == 0) early-return. Guard this block with if (inherits(result_success, "sf") && nrow(result_success) > 0) (or move the nrow(result_success)==0 return above this computation).

Suggested change

partial_match = result_success |>

st_drop_geometry() |>

group_by(.data[[gtfs_match]]) |>

summarise(shapes_n = n()) |>

left_join(routes_shapes_n, by=gtfs_match) |>

rename(matched = shapes_n.x, gtfs = shapes_n.y) |>

filter(matched < gtfs)

if (inherits(result_success, "sf") && nrow(result_success) > 0) {

partial_match = result_success |>

st_drop_geometry() |>

group_by(.data[[gtfs_match]]) |>

summarise(shapes_n = n()) |>

left_join(routes_shapes_n, by=gtfs_match) |>

rename(matched = shapes_n.x, gtfs = shapes_n.y) |>

filter(matched < gtfs)

} else {

# No successful matches or result_success is not an sf: define empty partial_match

partial_match <- data.frame(

matched = integer(),

gtfs = integer()

)

partial_match[[gtfs_match]] <- character()

partial_match <- partial_match[, c(gtfs_match, "matched", "gtfs")]

}

Copilot · 2026-02-06T12:55:52Z

R/query_osm_shapes_match_routes.R


  not_found <- bind_rows( result[lengths(result)<=1] )
+  routes_shapes_n =  gtfs$routes |>  # Start on routes.txt to match line number with route_name
+    select(route_id, !!gtfs_match) |>


select(route_id, !!gtfs_match) is likely incorrect here because gtfs_match is a character scalar (validated earlier), and select() generally requires all_of(gtfs_match) / any_of(gtfs_match) (or sym(gtfs_match) + !!) for programmatic selection. As written, this can fail at runtime and break the new partial_match reporting.

Suggested change

select(route_id, !!gtfs_match) |>

select(route_id, all_of(gtfs_match)) |>

Copilot · 2026-02-06T12:55:53Z

R/query_osm_shapes_match_routes.R

+    # When multiple osm_id, return those with min distance_diff + points_diff + then stops_diff
    if (length(unique(gtfs_route_name_result$osm_id)) < nrow(gtfs_route_name_result)) {
+      gtfs_route_name_result_unique = gtfs_route_name_result |>
+        group_by(osm_id) |>
+        slice_min(order_by = distance_diff + points_diff + stops_diff, with_ties = FALSE) |>


The comment says to choose the row with min distance_diff + points_diff + then stops_diff, but slice_min(order_by = distance_diff + points_diff + stops_diff) uses a simple sum (and mixes meters with a stop count). This can change which shape is kept for a duplicated osm_id in unintuitive ways. Consider implementing a lexicographic tie-break (e.g., order by distance_diff, then points_diff, then stops_diff) and/or explicitly weighting/normalizing stops_diff so the score units are comparable.

Suggested change

# When multiple osm_id, return those with min distance_diff + points_diff + then stops_diff

if (length(unique(gtfs_route_name_result$osm_id)) < nrow(gtfs_route_name_result)) {

gtfs_route_name_result_unique = gtfs_route_name_result |>

group_by(osm_id) |>

slice_min(order_by = distance_diff + points_diff + stops_diff, with_ties = FALSE) |>

# When multiple osm_id, return those with min distance_diff, then points_diff, then stops_diff

if (length(unique(gtfs_route_name_result$osm_id)) < nrow(gtfs_route_name_result)) {

gtfs_route_name_result_unique = gtfs_route_name_result |>

group_by(osm_id) |>

arrange(distance_diff, points_diff, stops_diff, .by_group = TRUE) |>

slice(1L) |>

#62 osm_shapes_match_routes improved to return partial matches when m…

76b9c98

…ore shapes than osm routes

gmatosferreira linked an issue Feb 6, 2026 that may be closed by this pull request

Improve osm_shapes_match_routes to return partial matches when more shapes than osm routes #62

Open

gmatosferreira requested a review from Copilot February 6, 2026 12:52

Copilot started reviewing on behalf of gmatosferreira February 6, 2026 12:52 View session

Copilot AI reviewed Feb 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#62 osm_shapes_match_routes improved to return partial matches when m…#63

#62 osm_shapes_match_routes improved to return partial matches when m…#63
gmatosferreira wants to merge 1 commit intomainfrom
62-improve-osm_shapes_match_routes-to-return-partial-matches-when-more-shapes-than-osm-routes

gmatosferreira commented Feb 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	select(route_id, !!gtfs_match) \|>
	select(route_id, all_of(gtfs_match)) \|>

Conversation

gmatosferreira commented Feb 6, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant