MSDFNet: Multi-Scale Detail Feature Fusion Encoder-Decoder Network for Self-Supervised Monocular Thermal Image Depth Estimation

This code was developed and tested with Python 3.7, PyTorch 1.5.1, and CUDA 10.2 on Ubuntu 16.04.
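To compare a local setup against the versions listed above, a small check along these lines may help. This is only a sketch: the helper name `report_environment` is hypothetical and not part of this repository, and it assumes PyTorch may or may not be installed.

```python
import platform


def report_environment():
    """Collect Python, PyTorch, and CUDA versions.

    The PyTorch and CUDA fields stay None when PyTorch is not installed.
    `report_environment` is a hypothetical helper, not part of this repository.
    """
    info = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    info["cuda"] = torch.version.cuda
    return info


print(report_environment())
```

Versions other than the ones listed above may still work, but they have not been tested here.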
For the ViViD Raw dataset, download the dataset from the official website.
For the post-processed ViViD++ dataset, please download the dataset from the link.
After downloading our post-processed dataset, unzip the files to form the structure below.
    KAIST_VIVID/
        calibration/
            cali_ther_to_rgb.yaml, ...
        indoor_aggressive_local/
            RGB/
                data/
                    000001.png, 000002.png, ...
                timestamps.txt
            Thermal/
                data/
                timestamps.txt
            Lidar/
                data/
                timestamps.txt
            Warped_Depth/
                data/
                timestamps.txt
            avg_velocity_thermal.txt
            poses_thermal.txt
            ...
        indoor_aggressive_global/
            ...
        outdoor_robust_day1/
            ...
        outdoor_robust_night1/
            ...
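
Before running the preparation script, it can be useful to confirm that the unzipped files match the layout above. The following is a minimal sketch, not part of this repository: the function names `missing_entries` and `check_dataset` are hypothetical, and it assumes each `timestamps.txt` sits next to its `data/` folder and that every folder under `KAIST_VIVID/` other than `calibration/` is a sequence. Only the entries listed above are checked; anything hidden behind `...` is ignored.

```python
from pathlib import Path

# Per-sequence entries taken from the directory layout listed above.
# Assumption: each timestamps.txt is a sibling of its data/ folder.
EXPECTED_DIRS = ["RGB/data", "Thermal/data", "Lidar/data", "Warped_Depth/data"]
EXPECTED_FILES = [
    "RGB/timestamps.txt",
    "Thermal/timestamps.txt",
    "Lidar/timestamps.txt",
    "Warped_Depth/timestamps.txt",
    "avg_velocity_thermal.txt",
    "poses_thermal.txt",
]


def missing_entries(sequence_dir):
    """Return the expected relative paths that are absent from one sequence folder."""
    root = Path(sequence_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


def check_dataset(dataset_root):
    """Map each sequence folder (everything except calibration/) to its missing entries."""
    root = Path(dataset_root)
    return {
        seq.name: missing_entries(seq)
        for seq in sorted(root.iterdir())
        if seq.is_dir() and seq.name != "calibration"
    }
```

Calling `check_dataset("KAIST_VIVID")` returns a dictionary keyed by sequence name; an empty list for a sequence means all listed entries were found.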
Given the dataset structure above, you can generate the training/testing datasets by running the following script.
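    sh scripts/prepare_vivid_data.sh

To train the model, run the indoor or outdoor training script.

    sh scripts/train_indoor.sh
    sh scripts/train_outdoor.sh

To evaluate a trained model, run the corresponding test script.

    bash scripts/test_indoor.sh
    bash scripts/test_outdoor.sh

References

Lee A J, Cho Y, Shin Y, et al. ViViD++: Vision for visibility dataset[J]. IEEE Robotics and Automation Letters, 2022, 7(3): 6282-6289.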
Shin U, Park J, Kweon I S. Deep depth estimation from thermal image[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 1043-1053.
Shin U, Lee K, Lee B U, et al. Maximizing self-supervision from thermal image for effective self-supervised learning of depth and ego-motion[J]. IEEE Robotics and Automation Letters, 2022, 7(3): 7771-7778.