
[CVPR 2026] UPLiFT: Universal Pixel-dense Lightweight Feature Transforms

This is the official code for "UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders", a lightweight method for upscaling the features of pretrained backbones into pixel-dense features. This repository includes sample code for running pretrained UPLiFT models with several backbones, as well as training code for creating UPLiFT models for new backbones.

Paper: https://arxiv.org/abs/2601.17950

Website: https://www.cs.umd.edu/~mwalmer/uplift/

Updates

  • 4/20/26: UPLiFT Fast Mode now released! We’ve added several performance optimizations to further accelerate our existing UPLiFT models while also reducing memory usage. See details below.
  • 2/21/26: We’re happy to announce that UPLiFT has been accepted to CVPR 2026!
  • 2/1/26: Extra running options added, see details below.
  • 1/25/26: Initial release of UPLiFT!

Installation

First, create and activate a conda environment:

conda create --name uplift python=3.12
conda activate uplift

Then install UPLiFT with the dependencies for your desired backbone:

Option 1: Clone and install

git clone https://github.com/mwalmer-umd/UPLiFT.git
cd UPLiFT
pip install -e '.[vit]'       # for DINOv2/DINOv3
# or: pip install -e '.[sd-vae]' for Stable Diffusion VAE
# or: pip install -e '.[all]'    for all backbones

Option 2: Install from GitHub

pip install 'uplift[vit] @ git+https://github.com/mwalmer-umd/UPLiFT.git'
# or: pip install 'uplift[sd-vae] @ git+https://github.com/mwalmer-umd/UPLiFT.git'
# or: pip install 'uplift[all] @ git+https://github.com/mwalmer-umd/UPLiFT.git'

Quick Start

import torch
from PIL import Image

# Load model (weights auto-download from HuggingFace)
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14')

# Run inference
image = Image.open('image.jpg')
features = model(image)
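The returned features form a dense per-pixel feature map. A common way to sanity-check dense features like these (this helper is not part of the repository, and the (C, H, W) layout is an assumption) is to project them to RGB with a 3-component PCA:

```python
import numpy as np

def pca_rgb(features):
    """Project a (C, H, W) feature map to an (H, W, 3) RGB visualization."""
    C, H, W = features.shape
    flat = features.reshape(C, -1).T            # (H*W, C), one row per pixel
    flat = flat - flat.mean(axis=0)             # center before PCA
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T                      # top-3 principal components
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    rgb = (proj - lo) / (hi - lo + 1e-8)        # normalize each channel to [0, 1]
    return rgb.reshape(H, W, 3)
```

Pixels with similar features map to similar colors, which makes a quick visual check of spatial coherence easy.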

Available Models

Model          Backbone    Load with
DINOv2-S/14    ViT         uplift_dinov2_s14
DINOv3-S+/16   ViT         uplift_dinov3_splus16
SD 1.5 VAE     Diffusion   uplift_sd15_vae

Fast Mode

Enable Fast Mode to activate several optimizations that increase UPLiFT's speed and reduce its memory usage. The outputs are nearly identical to those produced without Fast Mode, and we find that downstream-task performance is also nearly identical. Note that the first call in Fast Mode takes slightly longer due to compilation; subsequent runs use cached kernels. See FAST_MODE.md for more details.

PyTorch Hub usage:

model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14', fast=True)
features = model(image)

Command-line usage:

python sample_inference.py --pretrained uplift_dinov2-s14 --image img.png --fast

More Options

# Raw model only (no backbone)
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14', include_extractor=False)

# Custom iterations
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14', iters=2)

# Activate low-memory mode for the Local Attender (serial neighborhood pooling instead of parallel)
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14', iters=4, low_mem=True)

Inference with Pretrained Models

Weights are automatically downloaded from HuggingFace when using torch.hub.load() or the load_model() function in uplift/hub_loader.py. We also provide sample_inference.py, which can be used to quickly run pretrained models or new models you train. For example:

Extract pixel-dense features with a pretrained UPLiFT for DINOv3-S+/16:

python sample_inference.py --pretrained uplift_dinov3-splus16 --image imgs/Gigi_1_512.png --iters 4

Extract pixel-dense features with a pretrained UPLiFT for DINOv2-S/14, using a forced output size:

python sample_inference.py --pretrained uplift_dinov2-s14 --image imgs/Gigi_2_448.png --iters 4 --outsize 448

Upsample an image with a pretrained UPLiFT trained for the SD1.5 VAE backbone:

python sample_inference.py --pretrained uplift_sd1.5vae --image imgs/Gigi_3_512.png --iters 2

Try enabling low-memory mode, which trades some speed for lower peak memory usage; the outputs are equivalent.

python sample_inference.py --pretrained uplift_dinov3-splus16 --image imgs/Gigi_1_512.png --iters 4 --low_mem
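As a rough intuition for this tradeoff (a toy max-pooling illustration in numpy, not UPLiFT's actual Local Attender): a parallel implementation materializes every shifted copy of the feature map at once, while a serial one accumulates the same neighborhoods one at a time, producing identical results with lower peak memory.

```python
import numpy as np

def neighborhood_max_parallel(x, k=3):
    """Max over a k*k neighborhood; builds all k*k shifted maps at once
    (peak memory: k*k extra copies of x)."""
    H, W = x.shape
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    stack = np.stack([xp[i:i + H, j:j + W] for i in range(k) for j in range(k)])
    return stack.max(axis=0)

def neighborhood_max_serial(x, k=3):
    """Same result, accumulating one shifted map at a time
    (peak memory: one extra copy of x)."""
    H, W = x.shape
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.full((H, W), -np.inf)
    for i in range(k):
        for j in range(k):
            out = np.maximum(out, xp[i:i + H, j:j + W])
    return out
```

Both functions return the same values; only the peak memory footprint differs.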

If you train a new UPLiFT model for an existing supported backbone or a new one, you can manually specify the path to its config file and checkpoint and run inference as follows:

python sample_inference.py --config path/to/config.yaml --ckpt path/to/checkpoint.pth --image your_image.png --iters 4

Training an UPLiFT Model

Before training, update ./uplift/datasets/datasets_helper.py to specify the path(s) to your training dataset(s).

Config files specify the UPLiFT architecture, the feature-extracting backbone, and the training settings. Example config files can be found in ./uplift/configs/. To train UPLiFT for a new backbone, create a new config file or modify an existing one.
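As a rough sketch of what such a config might contain (the field names below are illustrative assumptions, not the repository's actual schema; consult the real files in ./uplift/configs/):

```yaml
# Hypothetical structure -- see ./uplift/configs/ for the real schema
model:
  upsample_factor: 14          # total upscaling to reach pixel density
backbone:
  type: vit                    # loaded via the timm-based vit_wrapper
  name: vit_small_patch14_dinov2
training:
  dataset: imagenet
  batch_size: 32
  lr: 1.0e-4
```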

This repository includes two built-in methods for loading backbones. The first, in ./uplift/extractors/vit_wrapper.py, uses timm for model loading. The second, in ./uplift/extractors/diff_extractor.py, can load Diffusers pipelines from Hugging Face. Note that some pipelines may not be compatible with this wrapper; in that case, the wrapper must be modified to appropriately run the VAE encoder and decoder of your specified pipeline. For other models, we recommend implementing an extractor wrapper similar to the two examples provided.
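The general shape of a custom wrapper might look like the following minimal sketch (the class name, the extract() method, and the (C, H/p, W/p) output layout are all illustrative assumptions, not the repository's actual interface; study vit_wrapper.py and diff_extractor.py for the real pattern):

```python
import numpy as np

class MyBackboneExtractor:
    """Hypothetical extractor wrapper, loosely following the pattern of
    vit_wrapper.py and diff_extractor.py (interface names are illustrative)."""

    def __init__(self, patch_size=16, feat_dim=384):
        self.patch_size = patch_size
        self.feat_dim = feat_dim

    def extract(self, image):
        """Map an (H, W, 3) image to a (feat_dim, H/p, W/p) feature grid.

        A real wrapper would run the pretrained backbone here; this
        placeholder only reproduces the expected spatial layout.
        """
        h = image.shape[0] // self.patch_size
        w = image.shape[1] // self.patch_size
        return np.zeros((self.feat_dim, h, w), dtype=np.float32)
```

The key point is that the wrapper hides backbone-specific details behind a uniform image-in, feature-grid-out interface.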

Once you have prepared the dataset, backbone, and config file, you can launch training with train_uplift.py. For example, the following command can be used to train an UPLiFT model from scratch with an existing sample config file:

python -m uplift.train_uplift --config uplift/configs/uplift_dinov2-s14.yaml

Evaluations

We follow the evaluation protocols of JAFAR and FM-Boost. Additional evaluation scripts will be provided in the near future.

Acknowledgements

This work was made possible thanks to code provided by the following sources:

License

Distributed under the MIT License.

Citation

If you find UPLiFT useful, please cite our paper:

@article{walmer2026uplift,
  title={UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders},
  author={Walmer, Matthew and Suri, Saksham and Aggarwal, Anirud and Shrivastava, Abhinav},
  journal={arXiv preprint arXiv:2601.17950},
  year={2026}
}
