This is a simpler and faster demo codebase built on distilled feature fields (DFFs) (Kobayashi et al., NeurIPS 2022).
- We tested three techniques in this project, each on its own branch: total_variation, bilateral_filtering, and sam_for_conv, corresponding to total-variation regularization, bilateral filtering, and SAM-guided smoothing. The master branch contains the baseline DFF code.
- The three branches are structured similarly. In each branch, train.py contains the code that adds regularization (TV and bilateral) or performs smoothing (SAM-guided). It lives in the feature_loss section of the training_step() function.
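To illustrate what the total_variation branch adds to feature_loss, here is a minimal NumPy sketch of an anisotropic total-variation penalty on a 2D feature map. It is a standalone illustration, not the repo's PyTorch code; the function name and shapes are our own choices.

```python
import numpy as np

def tv_loss(features: np.ndarray) -> float:
    """Anisotropic total-variation penalty on an (H, W, C) feature map.

    Sums absolute differences between vertically and horizontally adjacent
    feature vectors, so spatially smooth feature fields incur a low penalty.
    """
    dh = np.abs(features[1:, :, :] - features[:-1, :, :]).sum()
    dw = np.abs(features[:, 1:, :] - features[:, :-1, :]).sum()
    return float(dh + dw)

# A constant feature map has zero total variation.
flat = np.ones((4, 4, 8))
print(tv_loss(flat))  # 0.0
```

In the actual branch this kind of term is added to the feature loss with a small weight, trading distillation fidelity for spatial smoothness.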
Visualization of feature field before and after additional smoothing.
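The bilateral_filtering regularizer mentioned above can be sketched as edge-aware averaging: neighbors contribute more when they are both spatially close and similar in a guide image, which smooths features without blurring across edges. This is a generic NumPy sketch under our own parameter choices, not the branch's implementation.

```python
import numpy as np

def bilateral_smooth(features: np.ndarray, guide: np.ndarray,
                     radius: int = 1, sigma_s: float = 1.0,
                     sigma_r: float = 0.1) -> np.ndarray:
    """Edge-aware smoothing of an (H, W, C) feature map.

    guide is an (H, W) intensity image; each neighbor's weight is the
    product of a spatial Gaussian and a range Gaussian on guide values.
    """
    h, w, c = features.shape
    out = np.zeros((h, w, c), dtype=float)
    for y in range(h):
        for x in range(w):
            acc = np.zeros(c)
            norm = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        ws = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                        diff = guide[ny, nx] - guide[y, x]
                        wr = np.exp(-(diff * diff) / (2 * sigma_r ** 2))
                        acc += ws * wr * features[ny, nx]
                        norm += ws * wr
            out[y, x] = acc / norm
    return out
```

With a small sigma_r, feature vectors on opposite sides of a guide-image edge barely mix, which is the property that makes this filter useful for cleaning up noisy feature fields.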
Outputs from editing operations:
- video_sam.mp4
- pveg_sam.mp4
- pveg_sam_color.mp4
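The SAM-guided smoothing behind these outputs can be sketched, at its simplest, as pooling features within each segmentation mask: every pixel's feature is replaced by the mean feature of its segment. The function below is a hypothetical NumPy illustration of that idea, not the sam_for_conv branch's code, and it assumes precomputed integer segment ids (e.g., from SAM).

```python
import numpy as np

def mask_guided_smooth(features: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Replace each pixel's feature with the mean feature of its segment.

    features: (H, W, C) feature map.
    masks:    (H, W) integer segment ids, one id per segment.
    """
    out = features.astype(float).copy()
    for seg_id in np.unique(masks):
        region = masks == seg_id          # boolean mask for this segment
        out[region] = features[region].mean(axis=0)
    return out
```

This makes features piecewise constant per segment; the real branch applies smoothing during training rather than as a one-shot post-process.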
Setup
python -m pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
python -m pip install torch-scatter -f https://data.pyg.org/whl/torch-2.1.1+cu121.html
python -m pip install -r requirements.txt
python -m pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
git submodule update --init --recursive
cd apex && pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ && cd ..
python -m pip install models/csrc/
(Download a sample dataset)
Train
--root_dir is the dataset of images with poses. --feature_directory is the dataset of feature maps for distillation. --feature_dim must match the dimension of those feature maps.
python train.py --root_dir sample_dataset --dataset_name colmap --exp_name exp_v1 --downsample 0.25 --num_epochs 4 --batch_size 4096 --scale 4.0 --ray_sampling_strategy same_image --feature_dim 512 --random_bg --feature_directory sample_dataset/rgb_feature_langseg
Render with Edit
- Modify --edit_config or the codebase itself for other edits.
- Set --ckpt_path to the checkpoint saved above.
python render.py --root_dir sample_dataset --dataset_name colmap --downsample 0.25 --scale 4.0 --ray_sampling_strategy same_image --feature_dim 512 --ckpt_path ckpts/colmap/exp_v1_clip/epoch\=0_slim.ckpt --edit_config query.yaml
# ls ./rendered_*.png
# ffmpeg -framerate 30 -i ./rendered_%03d.png -vcodec libx264 -pix_fmt yuv420p -r 30 video.mp4
colmap
colmap feature_extractor --ImageReader.camera_model OPENCV --SiftExtraction.estimate_affine_shape=true --SiftExtraction.domain_size_pooling=true --ImageReader.single_camera 1 --database_path sample_dataset/database.db --image_path sample_dataset/images --SiftExtraction.use_gpu=false
colmap exhaustive_matcher --SiftMatching.guided_matching=true --database_path sample_dataset/database.db --SiftMatching.use_gpu=false
mkdir sample_dataset/sparse
colmap mapper --database_path sample_dataset/database.db --image_path sample_dataset/images --output_path sample_dataset/sparse
colmap bundle_adjuster --input_path sample_dataset/sparse/0 --output_path sample_dataset/sparse/0 --BundleAdjustment.refine_principal_point 1
colmap image_undistorter --image_path sample_dataset/images --input_path sample_dataset/sparse/0 --output_path sample_dataset_undis --output_type COLMAP
Setup LSeg
cd distilled_feature_field/encoders/lseg_encoder
pip install -r requirements.txt
pip install git+https://github.com/zhanghang1989/PyTorch-Encoding/
Download the LSeg model file demo_e200.ckpt from Google Drive.
Encode and save
python -u encode_images.py --backbone clip_vitl16_384 --weights demo_e200.ckpt --widehead --no-scaleinv --outdir ../../sample_dataset_undis/rgb_feature_langseg --test-rgb-dir ../../sample_dataset_undis/images
This may produce large feature map files in --outdir (100-200 MB per file).
Run train.py. If reconstruction fails, change --scale 4.0 to a smaller or larger value, e.g., --scale 1.0 or --scale 16.0.
- The codebase for this project is derived from DFFs.
- The NeRF codebase is derived from ngp_pl (6b2a669, Aug 30 2022).
- The codebase of encoders/lseg_encoder is derived from lang-seg.


