LLM-assisted Entropy-based Adaptive Distillation for Self-Supervised Fine-Grained Visual Representation Learning
| Dataset | Download Link |
|---|---|
| CUB-200-2011 | https://data.caltech.edu/records/65de6-vp158 |
| Stanford Cars | https://www.kaggle.com/datasets/cyizhuo/stanford-cars-by-classes-folder/data |
| FGVC Aircraft | http://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/ |
Please download and organize the datasets in this structure. The aircraft dataset can be organized into the style we want using the following command:
python aircraft_organize.py --ds /path/to/fgvc-aircraft-2013b --out /path/to/aircraft --link none
LEAD
├── bird/
│   ├── images/
│   │   ├── 001.Black_footed_Albatross
│   │   ├── 002.Laysan_Albatross
│   │   ……
│   ├── images.txt
│   ├── train_test_split.txt
├── car/
│   ├── train/
│   │   ├── Acura Integra Type R 2001
│   │   ├── Acura RL Sedan 2012
│   │   ……
│   ├── test/
├── aircraft/
│   ├── train/
│   │   ├── 707-320
│   │   ├── 727-200
│   │   ……
│   ├── test/
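Before launching training, it can help to confirm that the folders match the tree above. The following is a minimal sketch, not part of the repository: `EXPECTED_LAYOUT` and `missing_entries` are hypothetical names, and the check only covers the top-level entries shown in the tree, not the per-class subfolders.

```python
from pathlib import Path

# Expected entries per dataset folder, copied from the directory tree above.
EXPECTED_LAYOUT = {
    "bird": ["images", "images.txt", "train_test_split.txt"],
    "car": ["train", "test"],
    "aircraft": ["train", "test"],
}


def missing_entries(root):
    """Return the expected paths (relative to root) that do not exist."""
    root = Path(root)
    missing = []
    for dataset, entries in EXPECTED_LAYOUT.items():
        for entry in entries:
            if not (root / dataset / entry).exists():
                missing.append(f"{dataset}/{entry}")
    return missing
```

Calling `missing_entries("/path/to/LEAD")` returns an empty list when every listed entry is present, and otherwise the relative paths that still need to be created or downloaded.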
- Ubuntu 22.04
- CUDA 12.4
Use the following instructions to create the corresponding conda environment. You should also download the ResNet50 pre-trained model and save it in this folder.
conda create --name LEAD python=3.9.1
conda activate LEAD
pip install -r requirements.txt
- For ease of use, we have pre-converted the text descriptions generated by the LLM into tensor format and placed them in the text_description_tensor folder. The original descriptions and the LLM-generated descriptions of random categories are all in the text_description folder.
- Run the following script for pre-training, downstream linear probing, and image retrieval.
./run_train_test.sh $task $dataset $llm_description $checkpoints_name $num_classes $cuda_device $linear_name
$task is the task name (bird or car or aircraft).
$dataset is the dataset path for unsupervised pre-training.
$llm_description is the path to the text descriptions generated by the LLM.
$checkpoints_name is the name of the folder where the checkpoints are saved.
$num_classes is the number of labels: 200 for bird, 196 for car, 100 for aircraft.
$cuda_device is the ID of the GPU(s) to use.
$linear_name is the name of the folder where the linear probing checkpoints are saved.
- An example of pre-training followed by linear probing and image retrieval on CUB_200_2011.
./run_train_test.sh bird bird/ text_description_tensor/bird_text_tensor.pt result_bird 200 0,1 linear_bird
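Because the script takes seven positional arguments, it is easy to swap two of them or pass the wrong label count. The sketch below assembles the argument list in the order documented above and fills in the label count from the task name. `NUM_CLASSES` and `build_train_test_command` are hypothetical helpers for illustration, not part of the repository.

```python
# Label counts per task, as listed in the argument descriptions above.
NUM_CLASSES = {"bird": 200, "car": 196, "aircraft": 100}


def build_train_test_command(task, dataset, llm_description,
                             checkpoints_name, cuda_device, linear_name):
    """Return the argument list for run_train_test.sh in positional order."""
    if task not in NUM_CLASSES:
        raise ValueError(f"task must be one of {sorted(NUM_CLASSES)}, got {task!r}")
    return [
        "./run_train_test.sh",
        task,
        dataset,
        llm_description,
        checkpoints_name,
        str(NUM_CLASSES[task]),  # number of labels is derived from the task
        cuda_device,
        linear_name,
    ]


cmd = build_train_test_command(
    "bird", "bird/", "text_description_tensor/bird_text_tensor.pt",
    "result_bird", "0,1", "linear_bird",
)
print(" ".join(cmd))
```

The printed line reproduces the CUB_200_2011 example above. The list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.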
- Run the following script for pre-training. It will save the checkpoints to ./checkpoints/$checkpoints_name/.
./run_train.sh $task $dataset $llm_description $checkpoints_name $num_classes $cuda_device
$task is the task name (bird or car or aircraft).
$dataset is the dataset path for unsupervised pre-training.
$llm_description is the path to the text descriptions generated by the LLM.
$checkpoints_name is the name of the folder where the checkpoints are saved.
$num_classes is the number of labels: 200 for bird, 196 for car, 100 for aircraft.
$cuda_device is the ID of the GPU(s) to use.
- An example of pretraining on CUB_200_2011.
./run_train.sh bird bird/ text_description_tensor/bird_text_tensor.pt result_bird 200 0,1
- Run the following script for linear probing. We use a single machine and a single GPU for linear probing. It will save the checkpoints to ./checkpoints_linear/$checkpoints_name/.
./run_linear.sh $task $dataset $pretrained $checkpoints_name $num_classes $cuda_device
$task is the task name (bird or car or aircraft).
$dataset is the dataset path for unsupervised pre-training.
$pretrained is the name of the folder where the pre-training checkpoints are saved.
$checkpoints_name is the name of the folder where the linear probing checkpoints are saved.
$num_classes is the number of labels: 200 for bird, 196 for car, 100 for aircraft.
$cuda_device is the ID of the GPU to use.
- An example of linear probing on CUB_200_2011.
./run_linear.sh bird bird/ result_bird linear_bird 200 0
- Run the following script for image retrieval. We use a single machine and a single GPU for image retrieval.
./run_retrieval.sh $task $dataset $pretrained $cuda_device
$task is the task name (bird or car or aircraft).
$dataset is the dataset path for unsupervised pre-training.
$pretrained is the name of the folder where the pre-training checkpoints are saved.
$cuda_device is the ID of the GPU to use.
- An example of image retrieval on CUB_200_2011.
./run_retrieval.sh bird bird/ result_bird 0
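The three stage scripts above are linked by one name: the $checkpoints_name given to pre-training is what the later stages receive as $pretrained. The sketch below makes that explicit by building all three argument lists from one set of values. `stage_commands` and `checkpoint_dirs` are hypothetical helpers, and the output folders are only those stated in the text above.

```python
from pathlib import Path


def stage_commands(task, dataset, llm_description, pretrain_name,
                   linear_name, num_classes, pretrain_gpus, eval_gpu):
    """Return argument lists for the three stages; pretrain_name is reused."""
    return {
        "pretrain": ["./run_train.sh", task, dataset, llm_description,
                     pretrain_name, str(num_classes), pretrain_gpus],
        "linear": ["./run_linear.sh", task, dataset, pretrain_name,
                   linear_name, str(num_classes), eval_gpu],
        "retrieval": ["./run_retrieval.sh", task, dataset,
                      pretrain_name, eval_gpu],
    }


def checkpoint_dirs(pretrain_name, linear_name):
    """Output folders documented for pre-training and linear probing."""
    return {
        "pretrain": Path("checkpoints") / pretrain_name,
        "linear": Path("checkpoints_linear") / linear_name,
    }


commands = stage_commands(
    "bird", "bird/", "text_description_tensor/bird_text_tensor.pt",
    "result_bird", "linear_bird", 200, "0,1", "0",
)
for stage in ("pretrain", "linear", "retrieval"):
    print(" ".join(commands[stage]))
```

With the values shown, the three printed lines match the CUB_200_2011 examples given for each stage.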
@inproceedings{dong2025iccv,
title={LLM-assisted Entropy-based Adaptive Distillation for Self-Supervised Fine-Grained Visual Representation Learning},
author={Jianfeng Dong and Danfeng Luo and Daizong Liu and Jie Sun and Xiaoye Qu and Xun Yang and Dongsheng Liu and Xun Wang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2025}
}
