This repository was archived by the owner on Aug 1, 2024. It is now read-only.
Hello,
Conceptually, both DINOv2 and I-JEPA provide latent-space representations of images. DINOv2 relies heavily on augmentations and the generation of multiple data views, while I-JEPA does not. As far as I can see, the primary advantage of the pretrained DINOv2 weights is that they were trained on far more images. Why did Facebook choose to scale up the DINOv2 architecture rather than I-JEPA? Does the former have inherent advantages? Are there benchmarks comparing the two when trained on the same initial unsupervised dataset?
Thank you!
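For what a like-for-like benchmark could look like: a common protocol is to freeze each backbone, embed the same labeled image set with both, and run an identical cheap probe (e.g. cosine k-NN) on each set of embeddings. Below is a minimal, hypothetical sketch of that evaluation step only; the synthetic arrays stand in for embeddings you would obtain from the two frozen encoders, and all names are illustrative rather than part of either model's API.

```python
# Sketch of a like-for-like probe: given frozen embeddings from two
# self-supervised backbones (e.g. DINOv2 and I-JEPA) computed on the SAME
# images, fit the same cheap classifier on each and compare accuracy.
import numpy as np

def knn_probe_accuracy(train_emb, train_labels, test_emb, test_labels, k=5):
    """Cosine-similarity k-NN probe on frozen embeddings."""
    # L2-normalise so dot products are cosine similarities.
    tr = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    te = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    sims = te @ tr.T                        # (n_test, n_train) similarity matrix
    nn_idx = np.argsort(-sims, axis=1)[:, :k]  # indices of k nearest neighbours
    preds = []
    for row in nn_idx:
        votes = train_labels[row]
        preds.append(np.bincount(votes).argmax())  # majority vote over neighbours
    return float(np.mean(np.array(preds) == test_labels))

# Toy usage: synthetic "embeddings" standing in for the two models' outputs.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
emb_a = rng.normal(size=(200, 64)) + labels[:, None]  # model A: class-separable
emb_b = rng.normal(size=(200, 64))                    # model B: uninformative
acc_a = knn_probe_accuracy(emb_a[:150], labels[:150], emb_a[150:], labels[150:])
acc_b = knn_probe_accuracy(emb_b[:150], labels[:150], emb_b[150:], labels[150:])
```

The point of holding the probe, dataset, and split fixed is that any accuracy gap then reflects the quality of the learned representations rather than differences in the downstream classifier.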