https://github.com/IDEA-Research/MaskDINO/blob/main/maskdino/modeling/transformer_decoder/maskdino_decoder.py#L391
src_flatten = []
mask_flatten = []
spatial_shapes = []
for i in range(self.num_feature_levels):
idx=self.num_feature_levels-1-i
bs, c , h, w=x[idx].shape
size_list.append(x[i].shape[-2:])
spatial_shapes.append(x[idx].shape[-2:])
src_flatten.append(self.input_proj[idx](x[idx]).flatten(2).transpose(1, 2))
mask_flatten.append(masks[i].flatten(1))
src_flatten = torch.cat(src_flatten, 1) # bs, \sum{hxw}, c
mask_flatten = torch.cat(mask_flatten, 1) # bs, \sum{hxw}
mask_flatten.append(masks[i].flatten(1))  # possible bug: should this be masks[idx]?
If my understanding is correct, there is a logical inconsistency in the mask processing in the decoder: the loop walks the feature levels in reverse order via idx = self.num_feature_levels - 1 - i, and spatial_shapes and src_flatten are built from level idx, but the corresponding mask is taken from level i. The flattened masks are therefore concatenated in the opposite order from the flattened features. This does not appear to affect the final results (presumably because the masks are all zeros in the default path, so their ordering does not matter), but the logic still seems incorrect.
Please correct me if I've misinterpreted the code.
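For illustration, here is a minimal standalone sketch (not the actual MaskDINO code; the feature sizes, channel count, and zero masks are made up, and input_proj is omitted) showing how the i/idx mismatch reorders the flattened masks relative to the flattened features:

import torch

num_feature_levels = 3
bs = 2
# Hypothetical per-level feature maps and padding masks with distinct spatial sizes.
x = [torch.randn(bs, 4, s, s) for s in (32, 16, 8)]
masks = [torch.zeros(bs, s, s, dtype=torch.bool) for s in (32, 16, 8)]

src_flatten, mask_flatten = [], []
for i in range(num_feature_levels):
    idx = num_feature_levels - 1 - i                        # levels visited in reverse order
    src_flatten.append(x[idx].flatten(2).transpose(1, 2))   # features taken from level idx
    mask_flatten.append(masks[i].flatten(1))                # masks taken from level i  <-- mismatch

print([s.shape[1] for s in src_flatten])   # [64, 256, 1024] -> levels 2, 1, 0
print([m.shape[1] for m in mask_flatten])  # [1024, 256, 64] -> levels 0, 1, 2

Using masks[idx] instead would make the two lists line up level by level. Since the masks are all zeros here, torch.cat over dim 1 produces the same tensor either way, which would explain why the final results are unaffected.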