In my own experiments, SwinV2 worked much better than DinoV2, but your paper reports the opposite. The SwinDepth paper also shows that Swin is better than ViT-based backbones for depth estimation.
Could the authors share more details about the experiments they ran with SwinV2?