
Clarification on Linear Probing: Discrepancy between Paper Description and Implementation/Results #3

@freezing-index

Thanks for the impressive work. I found a discrepancy between the Linear Probing setup described in the paper and the one in the current codebase.

1. Feature Source: Bottleneck vs. Last Layer Tokens
The paper (Sec. 4.1) states: "we probe only the reduced-dimensionality features from the bottleneck... directly evaluating the inherent properties of the latent features." This implies LP is done on the 64-dim latent features.

However, the current implementation appears to probe the concatenated transformer output tokens rather than the bottleneck features (see the sketch below).
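
To make the difference concrete, here is a minimal sketch of the two readings (all shapes and names here are illustrative assumptions on my part, not code from the repo):

```python
import torch
import torch.nn as nn

# Hypothetical shapes for VTP-L-d64; 256 tokens = 16 x 16 grid.
B, N, D_TOK, D_BNECK = 8, 256, 768, 64

tokens = torch.randn(B, N, D_TOK)        # last-layer transformer tokens
bottleneck = torch.randn(B, N, D_BNECK)  # reduced-dimensionality bottleneck features

# Reading of the paper (Sec. 4.1): probe the 64-dim bottleneck features.
probe_paper = nn.Linear(D_BNECK, 1000)
logits_paper = probe_paper(bottleneck.mean(dim=1))  # pool tokens, then classify

# Reading of the code: probe the concatenated last-layer tokens instead.
probe_code = nn.Linear(N * D_TOK, 1000)
logits_code = probe_code(tokens.flatten(1))         # 256 * 768 = 196,608-dim input
```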

2. Performance Gap
I conducted LP experiments using the VTP-L-d64 model, specifically mapping the $16 \times 16 \times 64$ bottleneck features to 1,000 classes (a sketch of the probe head follows the results below):

  • Paper Result: 85.7% Top-1 Acc.
  • My Result: 73.896% (trained for 100 epochs).
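
For reference, the probe head amounts to roughly the following (a minimal sketch; whether the grid is flattened or pooled is exactly the kind of detail I would like clarified):

```python
import torch
import torch.nn as nn

# VTP-L-d64 bottleneck grid: 16 x 16 spatial positions, 64 channels each.
feats = torch.randn(8, 16, 16, 64)

# Flattened variant: 16 * 16 * 64 = 16,384-dim input to the linear head.
probe_flat = nn.Linear(16 * 16 * 64, 1000)
logits_flat = probe_flat(feats.flatten(1))

# Pooled variant: average over the 16 x 16 grid -> 64-dim input.
probe_pooled = nn.Linear(64, 1000)
logits_pooled = probe_pooled(feats.mean(dim=(1, 2)))
```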

Questions:

  1. Was the 85.7% accuracy reported in the paper achieved using the 64-dim bottleneck or the 768-dim transformer output?
  2. If it was the 64-dim bottleneck, could you please provide the specific LP training hyperparameters (e.g., learning rate, weight decay, optimizer) to help reproduce the result?

Looking forward to your clarification. Thanks!
