The winning solution of the PBVS'26 HISR Challenge on both the PSNR and SAM evaluations.
Dawei Fan
Based on VolFormer, we developed two architectures, named v1 (16.64M parameters) and v2 (75.77M parameters), and trained them separately, yielding two pre-trained models. Since a single model may have limited stability, we further improved the results by fusing the two models' inference outputs with a weighted average. The v1 model achieved a PSNR of 24.5750 dB and a SAM of 0.048591 rad, while the v2 model delivered 25.0368 dB and 0.052980 rad. The fused result outperforms both models on both metrics, with a PSNR of 25.3089 dB and a SAM of 0.048415 rad.
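For reference, the two challenge metrics can be computed as below. This is a generic numpy sketch, not the official evaluation script; `data_range` and the reduction over pixels are assumptions.

```python
import numpy as np

def psnr(pred, target, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def sam(pred, target, eps=1e-8):
    """Mean spectral angle (radians) over all pixels of (H, W, C) arrays."""
    p = pred.reshape(-1, pred.shape[-1]).astype(np.float64)
    t = target.reshape(-1, target.shape[-1]).astype(np.float64)
    dot = np.sum(p * t, axis=1)
    denom = np.linalg.norm(p, axis=1) * np.linalg.norm(t, axis=1) + eps
    angles = np.arccos(np.clip(dot / denom, -1.0, 1.0))
    return float(np.mean(angles))
```

Higher PSNR and lower SAM are better; SAM measures spectral fidelity independently of per-pixel intensity scale.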
# Clone this repo
git clone https://github.com/NikonD850/PBVS_HISR26.git
cd PBVS_HISR26
# Conda Env setup
conda create -n PBVS_HISR26_v1 python=3.10 -y
conda activate PBVS_HISR26_v1
# gdal installation may take more than an hour
conda install -c conda-forge gdal=3.12.1
python -m pip install -r requirements.txt

The download link on the Codabench page has expired, so for convenience we have added the LR input HDF5 files to datasets/final_test.
You can download our pre-trained models (v1 & v2) from checkpoints; the corresponding outputs are available at final results. Re-running inference should reproduce the same results.
Put v1.pth into ./v1/checkpoints, v2.pth into ./v2/checkpoints, and the final_test dataset into ./datasets.
Run ./v1/test_h5_no_gdal.py for v1 inference:
cd ./v1
python test_h5_no_gdal.py \
--ckpt ./checkpoints/v1.pth \
--test_dir ../datasets/final_test \
--save_dir ./result/v1
cd ..

Run ./v2/inference.py for v2 inference:
cd ./v2
python inference.py \
--ckpt ./checkpoints/v2.pth \
--input_dir ../datasets/final_test \
--out_dir ./result/v2
cd ..

Run ./merge_h5_weighted.py to fuse the h5 files:
python merge_h5_weighted.py \
--input_a ./v1/result/v1 \
--input_b ./v2/result/v2 \
--weight_a 0.4 --weight_b 0.6 \
--out_path ./result/v1_v2 \
--strict_files 1 --strict_keys 1

The v1 model performs moderately in PSNR but achieves a better SAM, so it serves as an auxiliary model, while the v2 model is strong in both PSNR and SAM. The fusion weights are therefore set to 0.4 (v1) and 0.6 (v2).
ZIP file will be automatically generated under the ./result directory for submission.
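At its core, the fusion performed by merge_h5_weighted.py is a per-pixel weighted average of the two models' outputs. A minimal numpy sketch of that blend (the function name and uint8 handling are illustrative, not the script's internals):

```python
import numpy as np

def fuse_weighted(pred_a, pred_b, weight_a=0.4, weight_b=0.6):
    """Pixel-wise weighted average of two super-resolution outputs.

    Inputs are uint8 HR arrays of identical shape; the blend is done in
    float to avoid overflow, then rounded and clipped back to uint8.
    """
    assert pred_a.shape == pred_b.shape, "outputs to fuse must align"
    fused = weight_a * pred_a.astype(np.float64) + weight_b * pred_b.astype(np.float64)
    return np.clip(np.round(fused), 0, 255).astype(np.uint8)
```

With weights 0.4 and 0.6 the blend leans toward v2's stronger PSNR while v1 pulls the spectra toward its better SAM.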
We used scenes No. 7, No. 25 and No. 50 for validation and the others for training. The dataset can be found at dataset.
Place the datasets under v1/datasets as the following structure:
v1/
├── datasets/
│ ├── tiff/ # TIFF (Pre-train)
│ │ ├── train/
│ │ │ ├── Scene_1_LR.tif
│ │ │ ├── Scene_1_HR.tif
│ │ │ └── ...
│ │ └── test/
│ └── h5/ # HDF5 (Finetune)
│ ├── train/
│ │ └── Scene_1.h5
│ └── test/
......
Model v1 uses TIFF (.tif) files for pre-training and HDF5 (.h5) files for fine-tuning.
TIFF: {name}_LR.tif & {name}_HR.tif are paired, with LR × 4 = HR.
HDF5: Contains LR_uint8: (H, W, C) & HR_uint8: (H*4, W*4, C).
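A quick consistency check for the ×4 pairing described above (a sketch over numpy shapes; in practice the `LR_uint8`/`HR_uint8` arrays would be read from the .h5 files, e.g. with h5py):

```python
import numpy as np

def check_pair(lr, hr, scale=4):
    """Verify HR is exactly `scale`x the LR spatial size with the same bands."""
    h, w, c = lr.shape
    return hr.shape == (h * scale, w * scale, c)
```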
Use h5_to_tiff.py to convert from HDF5 to TIFF:
cd ./v1
python h5_to_tiff.py
cd ..

Place the original h5 files under v2/datasets_origin; the generated datasets and patch shards follow the structure below:
v2/
├── datasets_origin/
│ ├── train/ # Original h5 files
│ │ └── *.h5
│ └── test/ # Original h5 files
│ └── *.h5
├── datasets/ # Generated datasets
│ ├── train/ # h5 files for training
│ │ └── *.h5
│ ├── test/
│ │ └── *.h5
│ ├── test_crop50/ # h5 files for pretrained models validating
│ │ └── *.h5
│ └── patch_shards/ # Output of build_patch_shards.py
│ ├── train/ # h5 files for fine-tuning
│ │ ├── shard_*.h5
│ │ └── manifest.json
│ └── test/ # h5 files for fine-tuned models validating
│ ├── shard_*.h5
│ └── manifest.json
......
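build_patch_shards.py tiles each scene into fixed-size training patches before grouping them into shards. A hedged sketch of such tiling (the patch size and stride here are illustrative, not the script's actual values):

```python
import numpy as np

def extract_patches(img, patch=64, stride=64):
    """Slice an (H, W, C) array into a list of (patch, patch, C) tiles."""
    h, w, _ = img.shape
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            patches.append(img[top:top + patch, left:left + patch])
    return patches
```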
- build LR_4x data for pre-training
cd ./v2
python add_bicubic_h5.py --in_dir datasets_origin/train --out_dir datasets/train
python add_bicubic_h5.py --in_dir datasets_origin/test --out_dir datasets/test
python HISR_crop50.py --root_in datasets/test --root_out datasets/test_crop50
cd ..

- Build patch shards for fine-tuning using build_patch_shards.py:
cd ./v2
python build_patch_shards.py --in_dir datasets/train --out_dir datasets/patch_shards/train
python build_patch_shards.py --in_dir datasets/test --out_dir datasets/patch_shards/test
cd ..

Requires four 24 GB NVIDIA GPUs. We used an Ubuntu 20.04 LTS server with two Intel Xeon Gold 6354 CPUs, 512 GB RAM and four NVIDIA RTX 3090 GPUs.
cd ./v1
# Base model training
torchrun --nproc_per_node=4 mains.py train
# Fast fine-tuning with SAM optimization
python finetune_sam_h5_fast.py

Requires four 80 GB NVIDIA GPUs. We used Ubuntu 22.04 LTS servers with two Intel Xeon Gold 6348 CPUs, 480 GB RAM and four NVIDIA A100/A800 GPUs.
cd ./v2
# Build up Conda Env
conda create -n PBVS_HISR26_v2 python=3.10
conda activate PBVS_HISR26_v2
pip install -r v2-requirements.txt
# Training & fine-tuning
torchrun --nproc_per_node=4 train.py

This code is based on VolFormer. We gratefully acknowledge the authors for their outstanding contribution to the community.
If you have any questions, please email nikoncopd850@gmail.com to discuss with the author.