ICCV'25 LLM-assisted Entropy-based Adaptive Distillation for Self-Supervised Fine-Grained Visual Representation Learning

HuiGuanLab/LEAD


This repository contains the implementation of LEAD (ICCV 2025), our work on self-supervised fine-grained visual representation learning.

Datasets

| Dataset | Download Link |
| --- | --- |
| CUB-200-2011 | https://data.caltech.edu/records/65de6-vp158 |
| Stanford Cars | https://www.kaggle.com/datasets/cyizhuo/stanford-cars-by-classes-folder/data |
| FGVC Aircraft | http://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/ |

Please download the datasets and organize them in the structure shown below. The FGVC Aircraft dataset can be converted into the expected layout with the following command:

python aircraft_organize.py --ds /path/to/fgvc-aircraft-2013b --out /path/to/aircraft --link none
LEAD
├── bird/
│   ├── images/
│   │   ├── 001.Black_footed_Albatross
│   │   ├── 002.Laysan_Albatross
│   │   └── ……
│   ├── images.txt
│   └── train_test_split.txt
├── car/
│   ├── train/
│   │   ├── Acura Integra Type R 2001
│   │   ├── Acura RL Sedan 2012
│   │   └── ……
│   └── test/
└── aircraft/
    ├── train/
    │   ├── 707-320
    │   ├── 727-200
    │   └── ……
    └── test/
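Before launching training, it can help to sanity-check that the data root matches the layout above. The sketch below is our own illustration (the `check_layout` helper and the `REQUIRED` mapping are not part of the repository); it only verifies the top-level entries shown in the tree:

```python
from pathlib import Path

# Expected top-level structure, transcribed from the tree above
# (hypothetical helper, not shipped with the repository).
REQUIRED = {
    "bird": ["images", "images.txt", "train_test_split.txt"],
    "car": ["train", "test"],
    "aircraft": ["train", "test"],
}

def check_layout(root):
    """Return a list of entries missing under the LEAD data root."""
    root = Path(root)
    missing = []
    for dataset, entries in REQUIRED.items():
        for entry in entries:
            if not (root / dataset / entry).exists():
                missing.append(f"{dataset}/{entry}")
    return missing
```

An empty return value means all expected top-level folders and files are present; it does not validate the per-class subfolders.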

Environments

  • Ubuntu 22.04
  • CUDA 12.4

Use the following commands to create the conda environment. In addition, download the ResNet50 pre-trained model and save it in this folder.

conda create --name LEAD python=3.9.1
conda activate LEAD
pip install -r requirements.txt

Direct Training and Downstream Testing

  • For convenience, we have pre-converted the LLM-generated text descriptions into tensor format and placed them in the text_description_tensor folder. The original descriptions, along with the LLM-generated descriptions of random categories, are in the text_description folder.
  • Run the following scripts for pre-training and downstream linear probing and image retrieval.
./run_train_test.sh $task $dataset $llm_description $checkpoints_name $num_classes $cuda_device $linear_name

$task is the task name (bird, car, or aircraft).

$dataset is the dataset path used for unsupervised pre-training.

$llm_description is the path to the LLM-generated text descriptions.

$checkpoints_name is the name of the folder where the pre-training checkpoints are saved.

$num_classes is the number of classes: 200 for bird, 196 for car, 100 for aircraft.

$cuda_device is the ID of the GPU(s) to use.

$linear_name is the name of the folder where the linear-probing checkpoints are saved.

  • An example of pre-training and downstream evaluation on CUB-200-2011:
./run_train_test.sh bird bird/ text_description_tensor/bird_text_tensor.pt result_bird 200 0,1 linear_bird
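Since the class count is fixed per task, the positional arguments can be assembled programmatically. This is a small sketch of our own (the `build_train_test_cmd` helper is hypothetical, not part of the repository); it only encodes the task-to-class-count mapping listed above:

```python
# Number of classes per task, as listed above.
NUM_CLASSES = {"bird": 200, "car": 196, "aircraft": 100}

def build_train_test_cmd(task, dataset, llm_description,
                         checkpoints_name, cuda_device, linear_name):
    """Assemble the positional argument list for run_train_test.sh."""
    return [
        "./run_train_test.sh",
        task,
        dataset,
        llm_description,
        checkpoints_name,
        str(NUM_CLASSES[task]),  # $num_classes follows from $task
        cuda_device,
        linear_name,
    ]
```

Passing the resulting list to something like `subprocess.run` reproduces the example command above.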

Single Unsupervised Training

  • For convenience, we have pre-converted the LLM-generated text descriptions into tensor format and placed them in the text_description_tensor folder. The original descriptions, along with the LLM-generated descriptions of random categories, are in the text_description folder.
  • Run the following script for pre-training. Checkpoints are saved to ./checkpoints/$checkpoints_name/.
./run_train.sh $task $dataset $llm_description $checkpoints_name $num_classes $cuda_device

$task is the task name (bird, car, or aircraft).

$dataset is the dataset path used for unsupervised pre-training.

$llm_description is the path to the LLM-generated text descriptions.

$checkpoints_name is the name of the folder where the checkpoints are saved.

$num_classes is the number of classes: 200 for bird, 196 for car, 100 for aircraft.

$cuda_device is the ID of the GPU(s) to use.

  • An example of pre-training on CUB-200-2011:
./run_train.sh bird bird/ text_description_tensor/bird_text_tensor.pt result_bird 200 0,1

Single Downstream Task Evaluation

Linear probing

  • Run the following script for linear probing. We train the linear probe on a single machine with a single GPU. Checkpoints are saved to ./checkpoints_linear/$checkpoints_name/.
./run_linear.sh $task $dataset $pretrained $checkpoints_name $num_classes $cuda_device

$task is the task name (bird, car, or aircraft).

$dataset is the dataset path used for unsupervised pre-training.

$pretrained is the name of the folder where the pre-training checkpoints are saved.

$checkpoints_name is the name of the folder where the linear-probing checkpoints are saved.

$num_classes is the number of classes: 200 for bird, 196 for car, 100 for aircraft.

$cuda_device is the ID of the GPU to use.

  • An example of linear probing on CUB-200-2011:
./run_linear.sh bird bird/ result_bird linear_bird 200 0
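Linear probing trains a classifier on frozen features and reports top-1 accuracy. As a rough illustration of that metric only (our own sketch in plain Python, not the repository's evaluation code):

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class matches the label.

    logits: per-sample lists of class scores (one inner list per sample).
    labels: ground-truth class indices, same length as logits.
    """
    correct = sum(
        max(range(len(row)), key=row.__getitem__) == y  # argmax over classes
        for row, y in zip(logits, labels)
    )
    return correct / len(labels)
```

For the bird task, `len(row)` would be 200, matching $num_classes above.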

Image Retrieval

  • Run the following script for image retrieval. It runs on a single machine with a single GPU.
./run_retrieval.sh $task $dataset $pretrained $cuda_device

$task is the task name (bird, car, or aircraft).

$dataset is the dataset path used for unsupervised pre-training.

$pretrained is the name of the folder where the pre-training checkpoints are saved.

$cuda_device is the ID of the GPU to use.

  • An example of image retrieval on CUB-200-2011:
./run_retrieval.sh bird bird/ result_bird 0
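Image retrieval ranks gallery images by the similarity of their learned features to a query. The minimal sketch below illustrates the common cosine-similarity formulation in plain Python (the `cosine_retrieve` helper is our own illustration, not the repository's retrieval code, which operates on model features):

```python
import math

def cosine_retrieve(query, gallery, k=1):
    """Return indices of the k gallery vectors most cosine-similar to query."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    sims = [cos(query, g) for g in gallery]
    # Rank gallery indices by descending similarity and keep the top k.
    return sorted(range(len(gallery)), key=lambda i: -sims[i])[:k]
```

In practice the query and gallery vectors would be features extracted by the pre-trained backbone; zero-length vectors are not handled in this sketch.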

Reference

@inproceedings{dong2025iccv,
  title={LLM-assisted Entropy-based Adaptive Distillation for Self-Supervised Fine-Grained Visual Representation Learning},
  author={Jianfeng Dong and Danfeng Luo and Daizong Liu and Jie Sun and Xiaoye Qu and Xun Yang and Dongsheng Liu and Xun Wang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
