Potential label leakage issue due to tile stitching in SD map

Hello! I'm truly thankful for the insights presented in your paper.

While studying this outstanding work, I noticed the tiling process implemented in [lines](https://github.com/facebookresearch/OrienterNet/blob/c222d08c9c0a084d79a415cc5c43e71173d2ecc4/maploc/osm/tiling.py#L120) 108 to 125. When the tiled rasters are reassembled into a single image, the result can differ from the original image at the seams. A likely cause is that, when a straight line is split into segments at a tile boundary, the endpoint of each segment is rounded to a pixel prematurely, which produces a 1-pixel difference in the reassembled image.
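To make the suspected mechanism concrete, here is a minimal sketch in plain NumPy. It is not the repository's rasterizer: `draw_line` is a hypothetical toy that snaps endpoints to integer pixels, and the map is cut into two 128-pixel-wide strips instead of four tiles. Under those assumptions, cutting a line at x = 128 changes its rounded slope, so some pixels differ after stitching.

```python
import numpy as np


def draw_line(canvas, p0, p1):
    """Toy rasterizer (hypothetical, not the OrienterNet code).

    Snaps both endpoints to integer pixels first, then marks one pixel
    per step along the longer axis.
    """
    x0, y0 = (int(round(v)) for v in p0)
    x1, y1 = (int(round(v)) for v in p1)
    n = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(n + 1):
        t = i / n
        x = int(round(x0 + t * (x1 - x0)))
        y = int(round(y0 + t * (y1 - y0)))
        # Pixels that fall outside the canvas are simply skipped.
        if 0 <= y < canvas.shape[0] and 0 <= x < canvas.shape[1]:
            canvas[y, x] = 1


size, tile = 256, 128
p0, p1 = (0.0, 0.0), (255.0, 100.0)  # arbitrary example line

# Reference: rasterize the whole line on one 256x256 canvas.
full = np.zeros((size, size), dtype=np.uint8)
draw_line(full, p0, p1)

# Exact (sub-pixel) height at which the line crosses the seam x = 128.
y_cut = p0[1] + (tile - p0[0]) * (p1[1] - p0[1]) / (p1[0] - p0[0])

# Tiled: rasterize each half in its own strip, using tile-local x coordinates.
left = np.zeros((size, tile), dtype=np.uint8)
right = np.zeros((size, tile), dtype=np.uint8)
draw_line(left, p0, (tile, y_cut))
draw_line(right, (0.0, y_cut), (p1[0] - tile, p1[1]))

stitched = np.hstack([left, right])
diff = full != stitched
print("differing pixels:", int(diff.sum()))
print("columns affected:", np.unique(np.nonzero(diff)[1]))
```

In this toy setup the cut point (128, 50.196...) is snapped to (128, 50), so the left half is drawn with slope 50/128 instead of 100/255. A real rasterizer may round differently, so the exact pattern of differing pixels here should not be read as what the repository produces.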

The images below show the difference between the original 256x256 SD map and the image reassembled from four 128x128 sub-images that were split from it and then stitched back together.
<div style="display:inline-block">
 <img width="300" alt="image" src="https://github.com/facebookresearch/OrienterNet/assets/34845957/c5a5c509-4c6a-4124-a3fe-0121da889830">
<img width="300" alt="image" src="https://github.com/facebookresearch/OrienterNet/assets/34845957/07fc9173-7c16-434a-8b45-0cad11c0b4e7">
</div>

Of course, such discrepancies are usually negligible. However, there is an exception in the following scenario:
When I obtain the WGS84 ground truth for a 2D query image, I use it as the center to extract the SD map, setting the map dimensions to 256x256 while keeping `tile_size` at its default value of 128.

As a result, the [tile_manager](https://github.com/facebookresearch/OrienterNet/blob/c222d08c9c0a084d79a415cc5c43e71173d2ecc4/maploc/osm/tiling.py#L120) splits the map into four tiles exactly along the coordinates of the ground truth. Later, when we randomly select a 128x128 bounding box on this 256x256 SD map and call this [function](https://github.com/facebookresearch/OrienterNet/blob/c222d08c9c0a084d79a415cc5c43e71173d2ecc4/maploc/osm/tiling.py#L131) to obtain the `canvas.raster` for training, the model, interestingly, learns that the seams on these maps reveal the true position of the GT.
Consequently, our model experiences significant label leakage 🤣!
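The coincidence can be checked with a few lines of arithmetic. The sketch below assumes that the tiling grid starts at the first pixel of the extracted map, which is my reading of the behavior and not something I verified for every code path. `seam_positions` and `origin_px` are hypothetical names used only for illustration, and the offset of 37 is arbitrary.

```python
import numpy as np


def seam_positions(extent_px, tile_size, origin_px=0):
    """Interior tile boundaries along one axis, in map pixel coordinates.

    origin_px (hypothetical parameter) is where the tiling grid starts
    relative to the first pixel of the map.
    """
    first = origin_px % tile_size
    if first == 0:
        first = tile_size  # a boundary at pixel 0 is the map edge, not a seam
    return np.arange(first, extent_px, tile_size)


map_size, tile_size = 256, 128
gt_px = map_size // 2  # the map is centered on the ground truth

aligned = seam_positions(map_size, tile_size)
shifted = seam_positions(map_size, tile_size, origin_px=37)

print(aligned.tolist(), gt_px in aligned)  # [128] True
print(shifted.tolist(), gt_px in shifted)  # [37, 165] False
```

With a map centered on the GT and a size of exactly two tiles, the only interior seam lands on the GT pixel in both axes. Any grid origin that is not a multiple of `tile_size` moves the seam away from the GT in this model, although the seams themselves would still be present in the raster.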

Below is the visualization. Note the cross-shaped lines at the GT location on the neural map.

<img width="1096" alt="image" src="https://github.com/facebookresearch/OrienterNet/assets/34845957/143fa72d-c189-4c35-82ac-d3b45e2e6d3d">


Therefore, my conclusion is:
Splitting and then reassembling the SD map leaves scars on the map that are difficult to heal. Although they are minor, they are still features that the model can learn.
If these scars happen to coincide with the ground truth or the original GPS coordinates when the dataset is created, the model may be able to read the leaked labels directly from the raster, or its sensitivity to the GPS prior may be affected.

