CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image (ICCV 2025)

Official PyTorch implementation of the ICCV 2025 paper CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image

[Paper], [Project Page]

Wonseok Roh*, Hwanhee Jung*, Jong Wook Kim, Seunggwan Lee, Innfarn Yoo, Andreas Lugmayr, Seunggeun Chi, Karthik Ramani, Sangpil Kim

Get Started

🛠 Environment

Create an Anaconda environment

conda create -y python=3.10 -n catsplat
conda activate catsplat

Install dependencies

pip install -r requirements-torch.txt --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

📦 Datasets Preparation

RealEstate10K

We generally follow the dataset preparation process described in the Flash3D repository.

To download the RealEstate10K dataset, we base our instructions on the Behind The Scenes scripts. First, download the video sequence metadata, including camera poses, from https://google.github.io/realestate10k/download.html and unpack it into data/ so that the folder layout is as follows:
data/RealEstate10K/train
data/RealEstate10K/test
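
Before starting the long download, it can help to confirm that the metadata was unpacked into the layout above. The following is a minimal sketch, not part of the repository: the helper name missing_split_dirs is hypothetical, and it only checks that the two folders exist, not that their contents are complete.

```python
from pathlib import Path

# The two metadata folders named in the layout above.
EXPECTED_SPLITS = ("train", "test")


def missing_split_dirs(root):
    """Return the expected split folders that do not exist under `root`."""
    root = Path(root)
    return [split for split in EXPECTED_SPLITS if not (root / split).is_dir()]


if __name__ == "__main__":
    missing = missing_split_dirs("data/RealEstate10K")
    if missing:
        print("Missing metadata folders:", ", ".join(missing))
    else:
        print("Folder layout looks complete.")
```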
Next, download the training and test sets of the dataset with the following commands:
python datasets/download_realestate10k.py -d data/RealEstate10K -o data/RealEstate10K -m train
python datasets/download_realestate10k.py -d data/RealEstate10K -o data/RealEstate10K -m test
This step will take several days to complete. Then download the additional data for the RealEstate10K dataset: we provide a pre-processed COLMAP cache containing sparse point clouds, which are used to estimate the scaling factor for depth predictions. The last two commands below remove any missing video sequences from the training and test sets.
sh datasets/dowload_realestate10k_colmap.sh
python -m datasets.preprocess_realestate10k -d data/RealEstate10K -s train
python -m datasets.preprocess_realestate10k -d data/RealEstate10K -s test
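
To illustrate how sparse COLMAP points can yield a scaling factor for depth predictions, here is a minimal sketch. It assumes the median of per-point depth ratios as the estimator, which is a common choice but is not confirmed to be what this repository uses; the function name estimate_depth_scale and its list-based inputs are hypothetical.

```python
from statistics import median


def estimate_depth_scale(sparse_depths, predicted_depths):
    """Estimate a scale s such that s * predicted_depth matches the sparse depth.

    Assumption: both inputs are equal-length sequences of depths sampled at
    the same pixels. The median ratio is used because it tolerates outliers.
    """
    ratios = [
        sparse / predicted
        for sparse, predicted in zip(sparse_depths, predicted_depths)
        if sparse > 0 and predicted > 0  # skip invalid or missing depths
    ]
    if not ratios:
        raise ValueError("no valid depth pairs to estimate a scale from")
    return median(ratios)
```

Multiplying a predicted depth map by the returned value brings it into the scale of the COLMAP reconstruction under these assumptions.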

We also use the LLaVA-1.5-13B model to obtain VLM text embeddings.
With the publicly available LLaVA model from the Transformers repository, you can modify the transformers/src/transformers/generation/utils.py file to extract decoder_hidden_states.
We precompute the LLaVA embeddings for all test datasets and use them during evaluation.
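
As a possible alternative to editing utils.py, recent Transformers releases can return hidden states directly from generate. The sketch below is an assumption-laden illustration, not the authors' procedure: the checkpoint name llava-hf/llava-1.5-13b-hf, the prompt format, the choice of the last decoder layer, and both function names are assumptions, and the embeddings used in the paper may have been selected or pooled differently.

```python
def last_layer_per_step(hidden_states):
    """Pick the last decoder layer from each generation step.

    `hidden_states` is expected to be a tuple over generation steps, where
    each element is a tuple over decoder layers (the structure returned by
    `generate` with output_hidden_states=True).
    """
    return [step[-1] for step in hidden_states]


def extract_hidden_states(image, prompt, model_id="llava-hf/llava-1.5-13b-hf",
                          device="cuda", max_new_tokens=32):
    """Run LLaVA on one image and return last-layer hidden states per step.

    Assumptions: `image` is a PIL image, `prompt` contains the "<image>"
    placeholder (e.g. "USER: <image>\\nDescribe the scene. ASSISTANT:"),
    and a GPU with enough memory for the 13B model is available.
    """
    # Heavy imports are kept local so the helper above has no dependencies.
    import torch
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to(device)
    model.eval()

    inputs = processor(images=image, text=prompt, return_tensors="pt")
    inputs = inputs.to(device, torch.float16)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            output_hidden_states=True,
            return_dict_in_generate=True,
        )
    return last_layer_per_step(output.hidden_states)
```

The returned tensors could then be saved to disk once per test image and loaded during evaluation, in line with the precomputation described above.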

🎾 Training

To train on the RealEstate10K dataset, simply run the following command:

python train.py \
  +experiment=layered_re10k \
  model.depth.version=v1 \
  train.logging=false

Alternatively, run train_single.sh:

bash train_single.sh

⛳ Evaluation

To evaluate on the RealEstate10K dataset, you can either run the evaluate.sh script or execute the following command:

python evaluate.py \
    hydra.run.dir=[PATH_TO_EXPERIMENT_DIRECTORY] \
    hydra.job.chdir=true \
    +experiment=layered_re10k \
    +dataset.crop_border=true \
    dataset.test_split_path=./splits/re10k_mine_filtered/test_files.txt \
    model.depth.version=v1 \
    ++eval.save_vis=false \
    run.checkpoint=[PATH_TO_CHECKPOINT]

You can find the CATSplat (Re10K) checkpoint here.
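
The evaluation command above mixes three Hydra override forms: key=value overrides an existing config entry, +key=value adds a new entry, and ++key=value adds the entry or overrides it if it already exists. The sketch below assembles the same command programmatically so the two placeholders can be filled in from a script. The helper name build_eval_command and the example paths are hypothetical.

```python
import shlex


def build_eval_command(experiment_dir, checkpoint, save_vis=False):
    """Return the evaluation command from this README as an argument list."""
    return [
        "python", "evaluate.py",
        f"hydra.run.dir={experiment_dir}",        # override: Hydra output directory
        "hydra.job.chdir=true",                   # override: run inside that directory
        "+experiment=layered_re10k",              # add: experiment config
        "+dataset.crop_border=true",              # add: new dataset option
        "dataset.test_split_path=./splits/re10k_mine_filtered/test_files.txt",
        "model.depth.version=v1",
        f"++eval.save_vis={str(save_vis).lower()}",  # add or override
        f"run.checkpoint={checkpoint}",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    command = build_eval_command("exp/re10k", "exp/re10k/model.pth")
    print(shlex.join(command))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting issues when paths contain spaces.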

Citation

If you find this project useful, please consider citing:

@article{roh2024catsplat,
  title={CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image},
  author={Roh, Wonseok and Jung, Hwanhee and Kim, Jong Wook and Lee, Seunggwan and Yoo, Innfarn and Lugmayr, Andreas and Chi, Seunggeun and Ramani, Karthik and Kim, Sangpil},
  journal={arXiv preprint arXiv:2412.12906},
  year={2024}
}
