Accepted by [Advanced Science](https://advanced.onlinelibrary.wiley.com/doi/10.1002/advs.202507730) in 2025
TODOs:
- Release the training and validation code.
- Release the Agri170K dataset (val).
- Release the Agri170K dataset (train-val).
- Release the checkpoints.
Vision deep neural networks (VDNNs) simulate only the attention-based significance-selection function of human visual perception, rather than the full spectrum of visual cognition, reflecting the divide between cognitive science (CS) and artificial intelligence (AI). To address this problem, we propose a cognitive modeling framework (CMF) comprising three stages: functional abstraction, operator structuring, and program agent. We then define the prior information of basic image features as the long-term memory content of VDNNs, and introduce a memory modeling method for VDNNs based on the fast Fourier transform (FFT) and statistical methods: the unbiased mapping algorithm (UMA). Finally, we develop visual cognitive neural units (VCNUs) and a baseline model (VCogM) based on the CMF and UMA, and conduct performance tests on datasets spanning natural scene recognition and agricultural image classification. The results show that VCogM and VCNU achieve state-of-the-art (SOTA) performance across various recognition tasks. The model's learning process is independent of data distribution and scale, demonstrating the soundness of the cognitive-inspired modeling principles. These findings provide new insights into the deep integration of CS and AI.
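The UMA itself is specified in the paper; purely as an illustration of the flavor of FFT-plus-statistics computation involved, the sketch below summarizes an image's frequency content as a scalar. The function name, the energy-ratio statistic, and the `radius` parameter are our assumptions, not the paper's algorithm:

```python
# Illustrative sketch only -- NOT the paper's UMA. It shows the kind of
# FFT-based statistic one could compute as a frequency-domain "prior".
import numpy as np

def low_frequency_ratio(image: np.ndarray, radius: int = 8) -> float:
    """Fraction of spectral energy inside a centered low-frequency window.

    `image` is a 2-D grayscale array; `radius` (a hypothetical parameter)
    is the half-width of the window around the DC component.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move DC to the center
    energy = np.abs(spectrum) ** 2
    h, w = energy.shape
    cy, cx = h // 2, w // 2
    low = energy[cy - radius:cy + radius, cx - radius:cx + radius].sum()
    return float(low / energy.sum())

# A smooth gradient concentrates its energy at low frequencies.
img = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
print(low_frequency_ratio(img))  # close to 1.0
```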
We provide detailed instructions for model training and testing, as well as experimental details.
- Clone this repo:

```bash
git clone https://github.com/CAU-COE-VEICLab/Vision-Cognitive-Neural-Networks.git
cd Vision-Cognitive-Neural-Networks
```

- Create a conda virtual environment and activate it:

```bash
conda create -n vdnn python=3.10 -y
conda activate vdnn
```

- Install CUDA>=10.2 with cudnn>=7 following the official installation instructions.
- Install PyTorch>=1.8.0 and torchvision>=0.9.0 with CUDA>=10.2:

```bash
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
```

- Install timm==0.4.12:

```bash
pip install timm==0.4.12
```

- Install other requirements:

```bash
pip install opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8 pyyaml scipy
```
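As a quick sanity check (our addition, not part of the repo's instructions), you can verify the installed versions and GPU visibility:

```python
# Quick environment check: verify package versions and GPU visibility.
import torch
import torchvision
import timm

print("torch:", torch.__version__)              # expect 1.10.0+cu111
print("torchvision:", torchvision.__version__)  # expect 0.11.0+cu111
print("timm:", timm.__version__)                # expect 0.4.12
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```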
We use the standard ImageNet dataset; you can download it from http://image-net.org/. For a standard folder dataset, move the validation images into labeled sub-folders. The file structure should look like:

```bash
$ tree data
imagenet
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── val
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...
```
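For reference, a directory laid out this way can be read directly with torchvision's standard ImageFolder dataset; the snippet below is a minimal loading example (the transform is a placeholder, not the repo's training recipe):

```python
# Minimal loading example for the layout above using torchvision's
# standard ImageFolder dataset; the transform is a placeholder only.
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

transform = T.Compose([T.Resize((224, 224)), T.ToTensor()])
train_set = ImageFolder("imagenet/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=8)

print("classes:", len(train_set.classes))  # one class per sub-folder
```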
To evaluate a pre-trained VCogM on ImageNet val, run:
```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main.py --eval \
--cfg <config-file, e.g., configs/sota_benchmark/vcnn/vcm_tiny_1k.yaml> --pretrained <checkpoint> --data-path <imagenet-path>
```

To evaluate a pre-trained VCogM on Agri170K val, run:
```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_diffusion_tuning.py --eval \
--cfg <config-file, e.g., configs/vcnu_agri17k/vcnn/pretrain/vcm_tiny_agri17k.yaml> --pretrained <checkpoint> --data-path <agri170k-path>
```

To train the VCogM-48M on ImageNet-1K, run:
```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main.py \
--cfg <config-file, e.g., configs/sota_benchmark/vcnn/vcm_small_1k.yaml> --data-path <imagenet-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```

To train the VCogM-25M on Agri170K, run:
```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main.py \
--cfg <config-file, e.g., configs/vcnu_agri17k/vcnn/pretrain/vcm_tiny_agri17k.yaml> --data-path <agri170k-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```

To train the VCNU-21M on Agri170K, run:
```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main.py \
--cfg <config-file, e.g., configs/vcnu_agri17k/vcnn/pretrain/vcnu_small_agri17k.yaml> --data-path <agri170k-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```

You can calculate the SSIM value of each image in your dataset by following these steps (an illustrative sketch follows the list):
- Use 'uma_tools/statistic_uma_strategy1.py' and 'uma_tools/statistic_uma_strategy2.py' to calculate the SSIM value of each image in the dataset. This produces an Excel file (named ssim_origin_excel_file) containing the SSIM value of each image in your dataset.
- Use 'uma_tools/count_frequency.py' to calculate the frequency distribution P over your dataset.
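The uma_tools scripts above define the actual procedure; the following is a minimal sketch only, assuming each image is compared against a Gaussian-blurred copy of itself and that P is a simple histogram over SSIM values — both are our assumptions, not necessarily what the scripts do:

```python
# Hedged sketch of the two steps above: (1) one SSIM value per image and
# (2) a frequency distribution P over those values. The reference image
# (a Gaussian-blurred copy) and the 10-bin histogram are assumptions.
import numpy as np
from skimage.filters import gaussian
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)
images = [rng.random((128, 128)) for _ in range(100)]  # stand-in dataset

# Step 1: SSIM of each image against a blurred copy of itself.
values = [ssim(im, gaussian(im, sigma=2.0), data_range=1.0) for im in images]

# Step 2: frequency distribution P over the SSIM values.
counts, _ = np.histogram(values, bins=10, range=(0.0, 1.0))
P = counts / counts.sum()
print(P)
```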
Pretrained models and results:
| name | pretrain | resolution | acc@1 | #params | FLOPs | ImageNet-1K model | Agri170K model |
|---|---|---|---|---|---|---|---|
| VCNU-T | ImageNet-1K | 224x224 | 78.2 | 13M | 2.3G | baidu | - |
| VCNU-S | ImageNet-1K | 224x224 | 80.8 | 21M | 4G | baidu | baidu |
| VCNU-B | ImageNet-1K | 224x224 | 81.8 | 37M | 6.8G | baidu | - |
| VCogM-T | ImageNet-1K | 224x224 | 82.5 | 25M | 4.3G | baidu | baidu |
| VCogM-S | ImageNet-1K | 224x224 | 83.9 | 48M | 8.7G | baidu | - |
| VCogM-B | ImageNet-1K | 224x224 | 84.4 | 92M | 17.1G | baidu | - |
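If you want to inspect a downloaded checkpoint before evaluation, a minimal sketch follows; the filename and the 'model' key are assumptions about how the checkpoints are packaged:

```python
# Hedged sketch: inspect a downloaded checkpoint. The filename and the
# 'model' key are assumptions about the file layout.
import torch

ckpt = torch.load("vcm_tiny_1k.pth", map_location="cpu")
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))
```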
We constructed a large-scale agricultural image dataset, Agri170K, comprising 96 categories and 173,691 high-quality annotated images.
These images cover various scenes, including fruits, animals, crops, and agricultural machinery.
You can click this link to download Agri170K (train-val).
Our implementation is partially inspired by Swin Transformer.
Thanks for their great work!



