This is the official repository for the paper: Where to Attend: A Principled Vision-Centric Position Encoding with Parabolas.
See pape/nn/positions/{pape,pape_ri}.py if you are mainly interested in the code for PaPE and PaPE-RI. The rest of the code is related to the experiments in the paper.
Citation
@article{ohrstrom2026pape,
author = {Koo Øhrstrøm, Christoffer and I. Cabral Muchacho, Rafael and Dong, Yifei and Moumtzidellis, Filippos and Güldenring, Ronja and T. Pokorny, Florian and Nalpantidis, Lazaros},
doi = {10.48550/arXiv.2602.01418},
journal = {arXiv preprint arXiv:2602.01418},
month = feb,
title = {{Where to Attend: A Principled Vision-Centric Position Encoding with Parabolas}},
year = {2026}
}
Installation
You will need uv, Rust, Python, and FFmpeg to install the project.
We expect at least Python v3.12 and recommend installing it through uv. The project has been tested with rustc v1.90.0 and FFmpeg 6.1.1.
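As a quick sanity check before installing, you can confirm that the tools are available on your PATH (the exact version output will of course differ per machine):
uv --version       # any recent uv release
rustc --version    # tested with rustc 1.90.0
python3 --version  # needs Python 3.12 or newer
ffmpeg -version    # tested with FFmpeg 6.1.1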
Run the install script to install the project in a new virtual environment.
sh install.sh
You should also create these directories at the project root:
datasets/
experiments/
Datasets must be placed in datasets (see Download and Prepare Datasets below). Model checkpoints will automatically be placed in experiments.
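For example, from the project root:
mkdir -p datasets experiments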
We further log experiments (hyperparameters and learning curves) to Weights & Biases and expect that your user has access to a project called "parabolic-position-encoding". Please create this project. You can change this in pape/constants.py if you want a different project name.
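If this machine has not been authenticated with Weights & Biases yet, you can log in through the wandb CLI from the project environment (this assumes wandb is installed as a dependency by install.sh):
uv run wandb login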
Training
All datasets use the same entrypoint: train.py.
For example, you can train an ImageNet1K model like this:
uv run train.py imagenet --positional pape
This will create a new experiment with a random name. Use the --name flag to give the experiment a custom name. You will refer to this name later when evaluating the model.
uv run train.py imagenet --positional pape --name ImageNet1K-PaPE
You can add the --debug flag to avoid saving checkpoints and turn off logging to Weights & Biases.
uv run train.py imagenet --positional pape --debug
Use the --help flag to see all options and their defaults.
uv run train.py imagenet --help
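The other datasets follow the same pattern; for example, a GEN1 run could look like the command below (the dataset subcommand name here is an assumption, presumably matching the names used by preprocess.py, so check the --help output for the exact choices):
uv run train.py gen1 --positional pape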
Evaluation
Use evaluate.py to evaluate a model:
uv run evaluate.py [name]
[name] refers to the name assigned in training.
The results are logged to Weights & Biases.
Evaluation defaults to using the validation split and the checkpoint that obtained the highest validation score during training. Use the --split and --checkpoint options to change this:
uv run evaluate.py [name] --split test --checkpoint last
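Putting training and evaluation together, a full run using only the flags shown above could look like this:
# Train under a fixed experiment name...
uv run train.py imagenet --positional pape --name ImageNet1K-PaPE
# ...then evaluate that experiment on the test split with its last checkpoint.
uv run evaluate.py ImageNet1K-PaPE --split test --checkpoint last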
Download and Prepare Datasets
Dataset-specific instructions follow below. Generally, the datasets shall be placed in a datasets directory at the root of the project, like this:
datasets/
- coco/
- dvs_gesture/
- gen1/
- imagenet/
- ucf_101/
COCO
Download the COCO 2017 version from the official source, like this:
mkdir datasets/coco
cd datasets/coco
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget http://images.cocodataset.org/annotations/image_info_test2017.zip
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/zips/test2017.zip
The contents should look like this after unzipping:
datasets/
- coco/
- annotations/
- test2017/
- train2017/
- val2017/
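To reach this layout, you can unpack all of the archives in place, for example (deleting the zip files afterwards is optional):
cd datasets/coco
for f in *.zip; do unzip -q "$f"; done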
DVS Gesture
Download the dataset here.
The contents should look like this after unpacking:
datasets/
- dvs_gesture/
- DvsGesture/
Preprocess the dataset:
uv run preprocess.py dvsgesture
The preprocessed dataset is stored in datasets/dvs_gesture-preprocessed/.
GEN1
Download the dataset here.
The contents should look like this after unpacking:
datasets/
- gen1/
- test/
- train/
- val/
Preprocess the dataset:
uv run preprocess.py gen1
The preprocessed dataset is stored in datasets/gen1-preprocessed/.
ImageNet
We use ImageNet1K.
Follow this guide to download it from Kaggle.
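If you prefer the Kaggle CLI over the browser, the download roughly looks like this (assumptions on our part: the kaggle CLI is set up with your API token, and the competition slug below is the ImageNet object localization challenge that the guide points to):
mkdir -p datasets/imagenet
cd datasets/imagenet
kaggle competitions download -c imagenet-object-localization-challenge
unzip -q imagenet-object-localization-challenge.zip   # archive name may differ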
The contents should look like this afterwards:
datasets/
- imagenet/
- ILSVRC/
- ILSVRC2012_val_labels.json
- imagenet_class_index.json
- LOC_sample_submission.csv
- LOC_synset_mapping.txt
- LOC_train_solution.csv
- LOC_val_solution.csv
UCF-101
Download UCF-101 from here. Download both the dataset itself and the action recognition annotations. The downloads should be unpacked inside datasets/ucf_101, such that it looks like this:
datasets/
- ucf_101/
- UCF-101/
- ucfTrainTestlist
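To unpack the downloads into this layout, something like the following should work (assuming the usual UCF101.rar video archive and the recognition-task annotation zip from the UCF site; unrar needs to be installed):
cd datasets/ucf_101
unrar x UCF101.rar                               # should produce the UCF-101/ video folder
unzip UCF101TrainTestSplits-RecognitionTask.zip  # should produce ucfTrainTestlist with the split files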
Acknowledgements
This repository uses code from the following projects:
- Prophesee toolbox: For loading data from Prophesee datasets (GEN1).
- Ultralytics: For the YOLOv10 object detection head.
- Spiking Patches: For tokenization of events from event cameras.
- LookHere: For the LookHere implementation.
