#62 osm_shapes_match_routes improved to return partial matches when m…#63

Open
gmatosferreira wants to merge 1 commit intomainfrom
62-improve-osm_shapes_match_routes-to-return-partial-matches-when-more-shapes-than-osm-routes


@gmatosferreira
Member

…ore shapes than osm routes

Contributor

Copilot AI left a comment


Pull request overview

This PR updates osm_shapes_match_routes() to keep and return partial results when GTFS has more shapes than there are matching OSM routes (instead of failing/returning an empty placeholder for those routes), and expands the warning output to better describe what was kept vs ignored.

Changes:

  • Deduplicate repeated osm_id matches by selecting a “best” shape per osm_id and returning those rows.
  • Expand the repeated-OSM warning to include returned/ignored shape_ids.
  • Add a post-processing summary that detects and reports routes with partial shape matches.
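A minimal sketch of the deduplication step described above, on made-up data: when several shapes of one route match the same osm_id, keep one row per osm_id and report which shape_ids were returned and which were ignored. The column names mirror the snippets later in this PR (osm_id, distance_diff, points_diff, stops_diff); the toy values and the message wording are assumptions, not the package's actual output.

```r
library(dplyr)

# Made-up matcher output for one route: shapes s1 and s2 both matched osm_id 101
route_result <- tibble(
  shape_id      = c("s1", "s2", "s3"),
  osm_id        = c(101, 101, 102),
  distance_diff = c(40, 10, 5),
  points_diff   = c(0, 0, 0),
  stops_diff    = c(1, 2, 0)
)

if (n_distinct(route_result$osm_id) < nrow(route_result)) {
  # Keep one shape per osm_id, scored as in the PR snippet (sum of the diffs)
  kept <- route_result |>
    group_by(osm_id) |>
    slice_min(order_by = distance_diff + points_diff + stops_diff, with_ties = FALSE) |>
    ungroup()
  # Shapes dropped because another shape won the same osm_id
  ignored_ids <- setdiff(route_result$shape_id, kept$shape_id)
  message(sprintf(
    "Returned shape_id(s): %s; ignored shape_id(s): %s",
    paste(kept$shape_id, collapse = ", "),
    paste(ignored_ids, collapse = ", ")
  ))
}
```

Here s2 (score 12) wins osm_id 101 over s1 (score 41), so s1 is the only ignored shape, and the route is returned with a partial set of shapes instead of being dropped.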


Comment on lines 451 to 453
w = sprintf(
"There were %d error(s) during the algorithm execution, which led to %d route(s) without a match (route(s) ignored), with the following `%s`:\n\n> %s\n",
"There were %d error(s) during the algorithm execution, which led to %d route(s) without a match (route(s) ignored), with the following `%s`:\n\n> %s\n\nAdditionally, %d routes had partial matches (only some of its shapes had a match):\n\n> %s\n",
errors,

Copilot AI Feb 6, 2026


The consolidated warning text now states that the %d error(s) "led to %d route(s) without a match (route(s) ignored)", but one of the counted "errors" (warning_osm_repeated) no longer causes the route to be ignored—only some shapes are dropped and returned as a partial match. This message can be misleading; consider splitting the summary into (a) routes ignored (not_found) and (b) routes partially matched, and only attributing the ignored count to conditions that actually remove routes from result_success.

Copilot uses AI. Check for mistakes.
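One possible shape for the split summary suggested above, written as a hypothetical helper (format_match_warning is not part of the package): ignored routes and partially matched routes are counted and listed separately, so the "ignored" count no longer includes routes that were kept with some shapes. The argument names and message wording are illustrative only.

```r
# Hypothetical helper: ignored routes and partial matches reported separately
format_match_warning <- function(gtfs_match, ignored, partial) {
  # ignored: identifiers of routes with no match at all (removed from the result)
  # partial: identifiers of routes kept with only some of their shapes matched
  parts <- character()
  if (length(ignored) > 0) {
    parts <- c(parts, sprintf(
      "%d route(s) had no match and were ignored, with the following `%s`:\n\n> %s\n",
      length(ignored), gtfs_match, paste(ignored, collapse = ", ")
    ))
  }
  if (length(partial) > 0) {
    parts <- c(parts, sprintf(
      "%d route(s) had partial matches (only some of their shapes had a match), with the following `%s`:\n\n> %s\n",
      length(partial), gtfs_match, paste(partial, collapse = ", ")
    ))
  }
  # Empty string when there is nothing to report
  paste(parts, collapse = "\n")
}

w <- format_match_warning("route_short_name", ignored = c("5"), partial = c("10", "20"))
cat(w)
```

Keeping the two counts in separate sentences means each number is attributed only to the condition that produced it.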
Comment on lines +440 to +446
partial_match = result_success |>
  st_drop_geometry() |>
  group_by(.data[[gtfs_match]]) |>
  summarise(shapes_n = n()) |>
  left_join(routes_shapes_n, by = gtfs_match) |>
  rename(matched = shapes_n.x, gtfs = shapes_n.y) |>
  filter(matched < gtfs)
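To see what this pipeline computes, here it is on toy stand-ins without geometry (hypothetical values; in the PR, result_success is an sf object and routes_shapes_n is derived from routes.txt). Because both tables carry a shapes_n column, left_join() adds the .x/.y suffixes that the rename() step relies on.

```r
library(dplyr)

gtfs_match <- "route_short_name"
# Hypothetical: number of shapes per route in the GTFS feed
routes_shapes_n <- tibble(route_short_name = c("10", "20"), shapes_n = c(3L, 2L))
# Hypothetical: matched shapes, already stripped of geometry
matched_shapes <- tibble(
  route_short_name = c("10", "10", "20", "20"),
  shape_id         = c("s1", "s2", "s4", "s5")
)

partial_match <- matched_shapes |>
  group_by(.data[[gtfs_match]]) |>
  summarise(shapes_n = n()) |>
  # Both sides have `shapes_n`, so the joined columns become shapes_n.x / shapes_n.y
  left_join(routes_shapes_n, by = gtfs_match) |>
  rename(matched = shapes_n.x, gtfs = shapes_n.y) |>
  # Keep only routes where fewer shapes matched than exist in the feed
  filter(matched < gtfs)

print(partial_match)
```

Route "10" has three shapes in the feed but only two matched, so it is the single row reported as a partial match.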

Copilot AI Feb 6, 2026


partial_match is computed unconditionally from result_success via st_drop_geometry(). When there are no successful matches, result_success is the result of bind_rows(list()): a plain tibble with no columns, not an sf object, and without the gtfs_match column that group_by() needs. The pipeline will therefore error before the later if (nrow(result_success) == 0) early return is reached. Guard this block with if (inherits(result_success, "sf") && nrow(result_success) > 0), or move the nrow(result_success) == 0 return above this computation.

Suggested change
if (inherits(result_success, "sf") && nrow(result_success) > 0) {
  partial_match = result_success |>
    st_drop_geometry() |>
    group_by(.data[[gtfs_match]]) |>
    summarise(shapes_n = n()) |>
    left_join(routes_shapes_n, by = gtfs_match) |>
    rename(matched = shapes_n.x, gtfs = shapes_n.y) |>
    filter(matched < gtfs)
} else {
  # No successful matches or result_success is not an sf: define empty partial_match
  partial_match <- data.frame(
    matched = integer(),
    gtfs = integer()
  )
  partial_match[[gtfs_match]] <- character()
  partial_match <- partial_match[, c(gtfs_match, "matched", "gtfs")]
}
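A sketch of the same guard as a standalone helper (compute_partial_match is a hypothetical name), assuming routes_shapes_n already has one row per route with the gtfs_match column and a `gtfs` shape count, and that route identifiers are character. Only the empty path is exercised here, which is the case the review is about; the sf branch needs the sf package and a real result.

```r
library(dplyr)

# With no successful matches there is nothing to bind: this is a tibble, not an sf
result_success <- bind_rows(list())

compute_partial_match <- function(result_success, routes_shapes_n, gtfs_match) {
  if (!inherits(result_success, "sf") || nrow(result_success) == 0) {
    # Empty result with the expected columns, so later reporting code can rely on them
    return(tibble(!!gtfs_match := character(), matched = integer(), gtfs = integer()))
  }
  result_success |>
    sf::st_drop_geometry() |>
    count(.data[[gtfs_match]], name = "matched") |>
    left_join(routes_shapes_n, by = gtfs_match) |>
    filter(matched < gtfs)
}

routes_shapes_n <- tibble(route_short_name = c("10", "20"), gtfs = c(3L, 2L))
partial_match <- compute_partial_match(result_success, routes_shapes_n, "route_short_name")
```

Returning a zero-row table with fixed columns (rather than NULL) lets the warning code call nrow(partial_match) unconditionally.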


not_found <- bind_rows(result[lengths(result) <= 1])
routes_shapes_n = gtfs$routes |> # Start on routes.txt to match line number with route_name
  select(route_id, !!gtfs_match) |>

Copilot AI Feb 6, 2026


select(route_id, !!gtfs_match) injects a character scalar (gtfs_match is validated earlier as a string) into a tidyselect context. The documented way to select columns from a character vector is all_of(gtfs_match), or any_of(gtfs_match) if the column may be absent, which makes the intent explicit and gives a clear error when the column is missing. Switching to it keeps the new partial_match reporting robust.

Suggested change
select(route_id, all_of(gtfs_match)) |>
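A small illustration of selecting by a column name held in a character scalar, on a made-up routes table (the column values are hypothetical): all_of() errors if the named column is missing, while any_of() silently skips it.

```r
library(dplyr)

routes <- tibble(
  route_id         = c("r1", "r2"),
  route_short_name = c("10", "20"),
  route_long_name  = c("A - B", "C - D")
)
gtfs_match <- "route_short_name"  # column name held in a character scalar

sel <- routes |> select(route_id, all_of(gtfs_match))
names(sel)  # "route_id" "route_short_name"

# any_of() skips names that are absent instead of erroring
routes |> select(route_id, any_of("route_color")) |> names()  # "route_id"
```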

Comment on lines +369 to +373
# When multiple osm_id, return those with min distance_diff + points_diff + then stops_diff
if (length(unique(gtfs_route_name_result$osm_id)) < nrow(gtfs_route_name_result)) {
  gtfs_route_name_result_unique = gtfs_route_name_result |>
    group_by(osm_id) |>
    slice_min(order_by = distance_diff + points_diff + stops_diff, with_ties = FALSE) |>

Copilot AI Feb 6, 2026


The comment says to choose the row with min distance_diff + points_diff + then stops_diff, but slice_min(order_by = distance_diff + points_diff + stops_diff) uses a simple sum (and mixes meters with a stop count). This can change which shape is kept for a duplicated osm_id in unintuitive ways. Consider implementing a lexicographic tie-break (e.g., order by distance_diff, then points_diff, then stops_diff) and/or explicitly weighting/normalizing stops_diff so the score units are comparable.

Suggested change
# When multiple osm_id, return those with min distance_diff, then points_diff, then stops_diff
if (length(unique(gtfs_route_name_result$osm_id)) < nrow(gtfs_route_name_result)) {
  gtfs_route_name_result_unique = gtfs_route_name_result |>
    group_by(osm_id) |>
    arrange(distance_diff, points_diff, stops_diff, .by_group = TRUE) |>
    slice(1L) |>
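A toy comparison of the two selection rules for a single duplicated osm_id (values are made up; distance_diff is taken to be in metres and stops_diff a stop count, as the review assumes). Shape "a" is closer in distance but has many stop mismatches, so the summed score and the lexicographic order keep different shapes.

```r
library(dplyr)

cands <- tibble(
  osm_id        = c(1, 1),
  shape_id      = c("a", "b"),
  distance_diff = c(100, 150),  # metres
  points_diff   = c(0, 0),
  stops_diff    = c(60, 0)      # number of stops
)

# Current rule: minimum of the summed score (a = 160, b = 150)
by_sum <- cands |>
  group_by(osm_id) |>
  slice_min(order_by = distance_diff + points_diff + stops_diff, with_ties = FALSE) |>
  ungroup()

# Suggested rule: distance_diff first, the other columns only break ties
by_lex <- cands |>
  group_by(osm_id) |>
  arrange(distance_diff, points_diff, stops_diff, .by_group = TRUE) |>
  slice(1L) |>
  ungroup()

by_sum$shape_id  # "b"
by_lex$shape_id  # "a"
```

Which result is "right" depends on how much a stop mismatch should weigh against distance; the lexicographic rule makes that priority explicit, whereas the sum lets 60 stops outweigh 50 m.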


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve osm_shapes_match_routes to return partial matches when more shapes than osm routes

1 participant