Thanks for the impressive work. I found a discrepancy regarding the Linear Probing implementation in the paper versus the current codebase.
1. Feature Source: Bottleneck vs. Last Layer Tokens
The paper (Sec. 4.1) states: "we probe only the reduced-dimensionality features from the bottleneck... directly evaluating the inherent properties of the latent features." This implies LP is done on the 64-dim latent features.
However, the current implementation seems to use concatenated tokens:
- The README example suggests using the cls token + patch tokens.
- `tools/test_linear_probing_hf.py` (line 143) also appears to use the concatenated features instead of the bottleneck.
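To make the discrepancy concrete, here is a minimal sketch of the two candidate feature sources. The shapes and variable names (`tokens`, `z`) are assumptions for illustration only, not the repo's actual API:

```python
import torch

# Hypothetical shapes for a ViT-L backbone with a 64-dim bottleneck:
# `tokens` is the 768-dim transformer output, `z` the bottleneck latent.
B, N, D, d = 2, 197, 768, 64
tokens = torch.randn(B, N, D)          # [CLS] + patch tokens
z = torch.randn(B, d)                  # reduced-dimensionality bottleneck

# Option A (what the README / test script appear to do):
# concatenate the cls token with mean-pooled patch tokens.
cls_tok = tokens[:, 0]
patch_mean = tokens[:, 1:].mean(dim=1)
concat_feat = torch.cat([cls_tok, patch_mean], dim=-1)   # (B, 1536)

# Option B (what Sec. 4.1 of the paper describes):
# probe the 64-dim bottleneck feature directly.
bottleneck_feat = z                                      # (B, 64)
```

The linear head trained on Option A sees 24x more dimensions than on Option B, so the two protocols are not directly comparable.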
2. Performance Gap
I conducted LP experiments using the VTP-L-d64 model, probing the 64-dim bottleneck features:
- Paper Result: 85.7% Top-1 Acc.
- My Result: 73.896% (trained for 100 epochs).
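For reference, my probe setup follows the standard frozen-feature protocol sketched below (here with synthetic data; the optimizer and hyperparameters are my own choices, not taken from the paper, which is exactly why I'm asking for the official ones):

```python
import torch
from torch import nn

# Linear probe on frozen 64-dim bottleneck features.
# Synthetic data stands in for extracted features; lr / momentum /
# weight decay here are assumptions, not the paper's settings.
torch.manual_seed(0)
feats = torch.randn(512, 64)           # frozen bottleneck features
labels = torch.randint(0, 10, (512,))  # dummy 10-class labels

probe = nn.Linear(64, 10)
opt = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0)

for _ in range(20):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(feats), labels)
    loss.backward()
    opt.step()

acc = (probe(feats).argmax(dim=1) == labels).float().mean().item()
```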
Questions:
- Was the 85.7% accuracy reported in the paper achieved using the 64-dim bottleneck or the 768-dim transformer output?
- If it was the 64-dim bottleneck, could you please provide the specific LP training hyperparameters (e.g., learning rate, weight decay, optimizer) to help reproduce the result?
Looking forward to your clarification. Thanks!