LLM-assisted Entropy-based Adaptive Distillation for Self-Supervised Fine-Grained Visual Representation Learning

This repository contains the implementation of our research on unsupervised fine-grained image recognition.

Datasets

Dataset	Download Link
CUB-200-2011	https://data.caltech.edu/records/65de6-vp158
Stanford Cars	https://www.kaggle.com/datasets/cyizhuo/stanford-cars-by-classes-folder/data
FGVC Aircraft	http://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/

Please download the datasets and organize them in the directory structure shown below. The FGVC Aircraft dataset can be converted into this layout with the following command:

python aircraft_organize.py --ds /path/to/fgvc-aircraft-2013b --out /path/to/aircraft --link none

LEAD
├── bird/
│   ├── images/
│   │   ├── 001.Black_footed_Albatross
│   │   ├── 002.Laysan_Albatross
│   │   └── ……
│   ├── images.txt
│   └── train_test_split.txt
├── car/
│   ├── train/
│   │   ├── Acura Integra Type R 2001
│   │   ├── Acura RL Sedan 2012
│   │   └── ……
│   └── test/
└── aircraft/
    ├── train/
    │   ├── 707-320
    │   ├── 727-200
    │   └── ……
    └── test/
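
Before training, it can help to confirm that the datasets are laid out as in the tree above. The following is a minimal sketch of such a check, not part of the repository: the helper name `missing_entries` and the `EXPECTED` table are hypothetical, and only the top-level entries shown in the tree are verified (class subfolders are not inspected).

```python
from pathlib import Path

# Top-level entries expected under each task folder, copied from the tree above.
# This table is an illustration only; the repository does not define it.
EXPECTED = {
    "bird": ["images", "images.txt", "train_test_split.txt"],
    "car": ["train", "test"],
    "aircraft": ["train", "test"],
}


def missing_entries(root, task):
    """Return the expected entries that do not exist under root/task."""
    base = Path(root) / task
    return [name for name in EXPECTED[task] if not (base / name).exists()]


if __name__ == "__main__":
    # "LEAD" is the root folder name used in the tree above.
    for task in EXPECTED:
        missing = missing_entries("LEAD", task)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{task}: {status}")
```

An empty list for a task means every entry shown in the tree was found for that task.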

Environments

Ubuntu 22.04
CUDA 12.4

Use the following commands to create the conda environment. You also need to download the pre-trained ResNet50 model by clicking here and save it in this folder.

conda create --name LEAD python=3.9.1
conda activate LEAD
pip install -r requirements.txt

Direct Training and Downstream Testing

For ease of use, we have pre-converted the LLM-generated text descriptions into tensor format and placed them in the text_description_tensor folder. The original descriptions, as well as the LLM-generated descriptions of random categories, are in the text_description folder.
Run the following script for pre-training followed by downstream linear probing and image retrieval.

./run_train_test.sh $task $dataset $llm_description $checkpoints_name $num_classes $cuda_device $linear_name

$task is the task name (bird or car or aircraft).

$dataset is the dataset path for unsupervised pre-training.

$llm_description is the path to the LLM-generated text descriptions (in tensor format).

$checkpoints_name is the name of the folder in which the pre-training checkpoints are saved.

$num_classes is the number of classes: 200 for bird, 196 for car, 100 for aircraft.

$cuda_device is the ID of the GPU(s) to use.

$linear_name is the name of the folder in which the linear probing checkpoints are saved.

An example of pre-training and downstream testing on CUB-200-2011:

./run_train_test.sh bird bird/ text_description_tensor/bird_text_tensor.pt result_bird 200 0,1 linear_bird
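
The same invocation can also be assembled programmatically, which makes it harder to pass a class count that does not match the task. The sketch below is only an illustration: `build_train_test_command` and `NUM_CLASSES` are hypothetical names, not part of the repository, and the class counts are the ones listed in the parameter descriptions above.

```python
import shlex

# Class counts per task, as listed in the parameter descriptions above.
NUM_CLASSES = {"bird": 200, "car": 196, "aircraft": 100}


def build_train_test_command(task, dataset, llm_description,
                             checkpoints_name, cuda_device, linear_name):
    """Assemble the argument list for run_train_test.sh (hypothetical helper)."""
    if task not in NUM_CLASSES:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(NUM_CLASSES)}"
        )
    # Argument order follows the usage line of run_train_test.sh.
    return [
        "./run_train_test.sh", task, dataset, llm_description,
        checkpoints_name, str(NUM_CLASSES[task]), cuda_device, linear_name,
    ]


cmd = build_train_test_command(
    "bird", "bird/", "text_description_tensor/bird_text_tensor.pt",
    "result_bird", "0,1", "linear_bird",
)
# Print the command as a single shell-quoted line.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository folder if the script should be launched from Python.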

Single Unsupervised Training

For ease of use, we have pre-converted the LLM-generated text descriptions into tensor format and placed them in the text_description_tensor folder. The original descriptions, as well as the LLM-generated descriptions of random categories, are in the text_description folder.
Run the following script for pre-training. The checkpoints are saved to ./checkpoints/$checkpoints_name/.

./run_train.sh $task $dataset $llm_description $checkpoints_name $num_classes $cuda_device

$task is the task name (bird or car or aircraft).

$dataset is the dataset path for unsupervised pre-training.

$llm_description is the path to the LLM-generated text descriptions (in tensor format).

$checkpoints_name is the name of the folder in which the checkpoints are saved.

$num_classes is the number of classes: 200 for bird, 196 for car, 100 for aircraft.

$cuda_device is the ID of the GPU(s) to use.

An example of pre-training on CUB-200-2011:

./run_train.sh bird bird/ text_description_tensor/bird_text_tensor.pt result_bird 200 0,1

Single Downstream Task Evaluation

Linear probing

Run the following script for linear probing. Linear probing is trained on a single machine with a single GPU. The checkpoints are saved to ./checkpoints_linear/$checkpoints_name/.

./run_linear.sh $task $dataset $pretrained $checkpoints_name $num_classes $cuda_device

$task is the task name (bird or car or aircraft).

$dataset is the dataset path (the same one used for unsupervised pre-training).

$pretrained is the name of the folder in which the pre-training checkpoints are saved.

$checkpoints_name is the name of the folder in which the linear probing checkpoints are saved.

$num_classes is the number of classes: 200 for bird, 196 for car, 100 for aircraft.

$cuda_device is the ID of the GPU to use.

An example of linear probing on CUB-200-2011:

./run_linear.sh bird bird/ result_bird linear_bird 200 0

Image Retrieval

Run the following script for image retrieval. Image retrieval is run on a single machine with a single GPU.

./run_retrieval.sh $task $dataset $pretrained $cuda_device

$task is the task name (bird or car or aircraft).

$dataset is the dataset path (the same one used for unsupervised pre-training).

$pretrained is the name of the folder in which the pre-training checkpoints are saved.

$cuda_device is the ID of the GPU to use.

An example of image retrieval on CUB-200-2011:

./run_retrieval.sh bird bird/ result_bird 0
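
The three single-stage scripts share several arguments, and the examples above suggest that the folder name given as $checkpoints_name during pre-training (result_bird) is what the later stages expect as $pretrained. The sketch below shows how the stages could be chained by hand under that assumption. `build_pipeline` is a hypothetical helper, not part of the repository, and the source does not confirm that this sequence is exactly what run_train_test.sh does internally.

```python
def build_pipeline(task, dataset, llm_description, pretrain_name,
                   linear_name, num_classes, train_gpus, eval_gpu):
    """Return the three single-stage commands in the order they would run."""
    # Linear probing and retrieval are described above as single-GPU stages.
    if "," in eval_gpu:
        raise ValueError("eval_gpu must be a single GPU ID, e.g. '0'")
    n = str(num_classes)
    return [
        # Stage 1: unsupervised pre-training.
        ["./run_train.sh", task, dataset, llm_description,
         pretrain_name, n, train_gpus],
        # Stage 2: linear probing; pretrain_name is passed as $pretrained.
        ["./run_linear.sh", task, dataset, pretrain_name,
         linear_name, n, eval_gpu],
        # Stage 3: image retrieval with the same pre-trained checkpoints.
        ["./run_retrieval.sh", task, dataset, pretrain_name, eval_gpu],
    ]


steps = build_pipeline(
    "bird", "bird/", "text_description_tensor/bird_text_tensor.pt",
    "result_bird", "linear_bird", 200, "0,1", "0",
)
for step in steps:
    print(" ".join(step))
```

Each printed line matches the corresponding example command in the sections above, so the helper mainly serves to keep the shared folder names consistent across stages.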

Reference

@inproceedings{dong2025iccv,
  title={LLM-assisted Entropy-based Adaptive Distillation for Self-Supervised Fine-Grained Visual Representation Learning},
  author={Jianfeng Dong and Danfeng Luo and Daizong Liu and Jie Sun and Xiaoye Qu and Xun Yang and Dongsheng Liu and Xun Wang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
