This is the official code for "UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders", a lightweight method that upsamples the features of pretrained backbones into pixel-dense features. This repository includes sample code to run pretrained UPLiFT models for several backbones, as well as training code to create UPLiFT models for new backbones.
Paper: https://arxiv.org/abs/2601.17950
Website: https://www.cs.umd.edu/~mwalmer/uplift/
- 4/20/26: UPLiFT Fast Mode now released! We’ve added several performance optimizations to further accelerate our existing UPLiFT models while also reducing memory usage. See details below.
- 2/21/26: We’re happy to announce that UPLiFT has been accepted to CVPR 2026!
- 2/1/26: Extra running options added, see details below.
- 1/25/26: Initial release of UPLiFT!
First, create and activate a conda environment:

```
conda create --name uplift python=3.12
conda activate uplift
```

Then install UPLiFT with the dependencies for your desired backbone:

Option 1: Clone and install

```
git clone https://github.com/mwalmer-umd/UPLiFT.git
cd UPLiFT
pip install -e '.[vit]'     # for DINOv2/DINOv3
# or: pip install -e '.[sd-vae]' for Stable Diffusion VAE
# or: pip install -e '.[all]' for all backbones
```

Option 2: Install from GitHub
```
pip install 'uplift[vit] @ git+https://github.com/mwalmer-umd/UPLiFT.git'
# or: pip install 'uplift[sd-vae] @ git+https://github.com/mwalmer-umd/UPLiFT.git'
# or: pip install 'uplift[all] @ git+https://github.com/mwalmer-umd/UPLiFT.git'
```

Quick start with PyTorch Hub:

```python
import torch
from PIL import Image

# Load model (weights auto-download from HuggingFace)
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14')

# Run inference
image = Image.open('image.jpg')
features = model(image)
```

| Model | Backbone | Load with |
|---|---|---|
| DINOv2-S/14 | ViT | uplift_dinov2_s14 |
| DINOv3-S+/16 | ViT | uplift_dinov3_splus16 |
| SD 1.5 VAE | Diffusion | uplift_sd15_vae |
Enable Fast Mode to activate several optimizations that increase UPLiFT’s speed and reduce its memory usage. The outputs are nearly identical to those produced without Fast Mode, and we find that performance in downstream tasks is also nearly identical. Note that the first call with Fast Mode takes slightly longer due to compilation, but subsequent runs use cached kernels. See FAST_MODE.md for more details.
PyTorch Hub usage:

```python
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14', fast=True)
features = model(image)
```

Command-line usage:

```
python sample_inference.py --pretrained uplift_dinov2-s14 --image img.png --fast
```
```python
# Raw model only (no backbone)
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14', include_extractor=False)

# Custom iterations
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14', iters=2)

# Activate lower-memory mode for the Local Attender, using serial neighborhood pooling instead of parallel pooling
model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov2_s14', iters=4, low_mem=True)
```

Weights are automatically downloaded from HuggingFace when using torch.hub.load() or the load_model() function in uplift/hub_loader.py. In addition, we provide sample_inference.py, which can also be used to quickly run pretrained models or new models you train. For example:
Extract pixel-dense features with a pretrained UPLiFT for DINOv3-S+/16:

```
python sample_inference.py --pretrained uplift_dinov3-splus16 --image imgs/Gigi_1_512.png --iters 4
```

Extract pixel-dense features with a pretrained UPLiFT for DINOv2-S/14, using a forced output size:

```
python sample_inference.py --pretrained uplift_dinov2-s14 --image imgs/Gigi_2_448.png --iters 4 --outsize 448
```

Upsample an image with a pretrained UPLiFT trained for the SD 1.5 VAE backbone:

```
python sample_inference.py --pretrained uplift_sd1.5vae --image imgs/Gigi_3_512.png --iters 2
```
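The --iters flag sets the number of upsampling iterations. As a toy illustration only (assuming, for simplicity, that each iteration doubles spatial resolution; this is not UPLiFT's actual architecture, and the function name is hypothetical), the effect of the iteration count on output size can be sketched as:

```python
import torch
import torch.nn.functional as F

def toy_iterative_upsample(feats, iters):
    # Toy stand-in for iterative upsampling: each iteration doubles the
    # spatial resolution (an illustrative assumption, not UPLiFT's design).
    for _ in range(iters):
        feats = F.interpolate(feats, scale_factor=2, mode='bilinear', align_corners=False)
    return feats

coarse = torch.randn(1, 8, 16, 16)  # e.g. a 16x16 grid of backbone features
dense = toy_iterative_upsample(coarse, iters=2)
print(dense.shape)  # torch.Size([1, 8, 64, 64])
```

Under this assumption, more iterations mean a larger upsampling factor from the backbone's coarse grid, while --outsize forces a specific final resolution.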
Try enabling low-memory mode, which sacrifices some speed for lower peak memory usage. The model gives equivalent outputs.

```
python sample_inference.py --pretrained uplift_dinov3-splus16 --image imgs/Gigi_1_512.png --iters 4 --low_mem
```
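The speed/memory tradeoff behind low_mem can be illustrated with a toy sketch of serial vs. parallel neighborhood pooling (this shows the general idea only, not UPLiFT's actual Local Attender implementation; both function names are hypothetical):

```python
import torch
import torch.nn.functional as F

def parallel_neighborhood_mean(x, k=3):
    # Unfold all k x k neighborhoods at once: fast, but materializes a
    # (B, C*k*k, H*W) tensor, so peak memory grows with the window size.
    B, C, H, W = x.shape
    patches = F.unfold(x, kernel_size=k, padding=k // 2)  # (B, C*k*k, H*W)
    return patches.view(B, C, k * k, H * W).mean(dim=2).view(B, C, H, W)

def serial_neighborhood_mean(x, k=3):
    # Accumulate one shifted copy at a time: k*k sequential passes, but
    # only one extra feature map is ever held in memory.
    pad = k // 2
    xp = F.pad(x, (pad, pad, pad, pad))
    out = torch.zeros_like(x)
    H, W = x.shape[-2:]
    for dy in range(k):
        for dx in range(k):
            out += xp[..., dy:dy + H, dx:dx + W]
    return out / (k * k)
```

Both versions compute the same result; the serial one simply trades throughput for a smaller peak memory footprint, which is the same tradeoff the --low_mem flag makes.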
If you train a new UPLiFT model for an existing supported backbone or a new backbone, you can manually specify the path to the config and ckpt for it and run inference as follows:

```
python sample_inference.py --config path/to/config.yaml --ckpt path/to/checkpoint.pth --image your_image.png --iters 4
```
Before training, update ./uplift/datasets/datasets_helper.py to specify the path(s) to your training dataset(s).
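As an illustration, a dataset-path helper of this kind might look like the following (the variable and function names here are hypothetical; check the actual contents of datasets_helper.py for the real format):

```python
import os

# Hypothetical layout -- edit these paths to point at your local copies.
DATASET_ROOTS = {
    "imagenet": "/data/imagenet/train",
    "coco": "/data/coco/train2017",
}

def get_dataset_root(name):
    # Look up a registered dataset and fail early if the path is missing.
    root = DATASET_ROOTS[name]
    if not os.path.isdir(root):
        raise FileNotFoundError(f"Dataset path not found, edit DATASET_ROOTS: {root}")
    return root
```

Failing early with a clear message here saves a confusing crash deep inside the data loader later in training.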
Config files are used to specify the UPLiFT architecture, the feature extracting backbone, and the training settings. Example config files can be found in ./uplift/configs/. To train UPLiFT for a new model, create or modify an existing config file for the new backbone.
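A hypothetical sketch of the kinds of fields such a config might contain (the field names here are illustrative only; copy a real file from ./uplift/configs/ as your starting point):

```yaml
# Illustrative only -- see ./uplift/configs/ for the actual schema.
backbone: dinov2_s14     # which feature-extracting backbone to wrap
uplift:
  iters: 4               # number of upsampling iterations
training:
  dataset: imagenet
  batch_size: 32
  lr: 1.0e-4
```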
This repository includes two built-in methods for loading backbones. The first is in ./uplift/extractors/vit_wrapper.py which uses timm for model loading. The second is in ./uplift/extractors/diff_extractor.py which can load Diffusers pipelines from Hugging Face. Note that some pipelines may not be compatible with this wrapper. If so, the wrapper must be modified to appropriately run the VAE encoder and decoder elements of your specified pipeline. For other models, we recommend implementing an extractor wrapper similar to the two examples provided.
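As a minimal sketch of what a custom extractor wrapper could look like, assuming the wrapper only needs to map images to a coarse feature grid (the class name and interface here are hypothetical; mirror the two provided files for the real interface):

```python
import torch
import torch.nn as nn

class SimpleExtractorWrapper(nn.Module):
    """Hypothetical template for wrapping a new backbone.
    The actual required interface is defined by the files in
    ./uplift/extractors/ -- follow those, not this sketch."""

    def __init__(self, backbone, patch_size=16):
        super().__init__()
        self.backbone = backbone
        self.patch_size = patch_size
        # The backbone stays frozen: UPLiFT trains on top of its features.
        for p in self.backbone.parameters():
            p.requires_grad = False

    @torch.no_grad()
    def forward(self, images):
        # images: (B, 3, H, W) -> coarse features (B, C, H/ps, W/ps)
        return self.backbone(images)
```

Freezing the backbone and running it under no_grad keeps feature extraction cheap, which matches the setup described above where only the UPLiFT module itself is trained.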
Once you have prepared the dataset, backbone, and config file, you can launch training with train_uplift.py. For example, the following command can be used to train an UPLiFT model from scratch with an existing sample config file:

```
python -m uplift.train_uplift --config uplift/configs/uplift_dinov2-s14.yaml
```
We follow the evaluation protocols of JAFAR and FM-Boost. Additional evaluation scripts will be provided in the near future.
This work was made possible thanks to code provided by the following sources:
- https://github.com/Jiawei-Yang/Denoising-ViT for uplift/extractors/vit_wrapper.py
- https://gist.github.com/sayakpaul/3ae0f847001d342af27018a96f467e4e and https://github.com/huggingface/diffusers/ for resources used in uplift/extractors/diff_extractor.py
- https://github.com/PaulCouairon/JAFAR for evaluation and PCA visualization code
- https://github.com/CompVis/fm-boosting for evaluation code
- https://github.com/facebookresearch/ConvNeXt for LayerNorm
- https://gist.github.com/andrewjong/6b02ff237533b3b2c554701fb53d5c4d for data loading resources
Distributed under the MIT License.
If you found UPLiFT useful, please cite our paper with the following:
```
@article{walmer2026uplift,
    title={UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders},
    author={Walmer, Matthew and Suri, Saksham and Aggarwal, Anirud and Shrivastava, Abhinav},
    journal={arXiv preprint arXiv:2601.17950},
    year={2026}
}
```
