Yepeng Liu1,2*, Hao Li2*, Liwen Yang1*, Fangzhen Li2†, Xudi Ge1, Yuliang Gu1, Kuang Gao2, Bing Wang2, Guang Chen2, Hangjun Ye2, Yongchao Xu1†✉
Keypoint-based matching is a fundamental component of modern 3D vision systems, such as Structure-from-Motion (SfM) and SLAM. Most existing learning-based methods are trained on image pairs, a paradigm that fails to explicitly optimize for the long-term trackability of keypoints across sequences under challenging viewpoint and illumination changes.
In this paper, we reframe keypoint detection as a sequential decision-making problem. We introduce TraqPoint, a novel, end-to-end Reinforcement Learning (RL) framework designed to optimize the Track-quality (Traq) of keypoints directly on image sequences. Our core innovation is a track-aware reward mechanism that jointly encourages the consistency and distinctiveness of keypoints across multiple views, guided by a policy gradient method. Extensive evaluations on sparse matching benchmarks, including relative pose estimation and 3D reconstruction, demonstrate that TraqPoint significantly outperforms some state-of-the-art (SOTA) keypoint detection and description methods.
1School of Computer Science, Wuhan University 2Xiaomi EV
(*) Equal contribution. (†) Project leader. (✉) Corresponding author.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
This repository now supports two-stage training:
- Stage-1 Descriptor Training: Generates descriptor weights for matching/tracking.
- Stage-2 Keypoint/Policy Training: Loads Stage-1 weights and uses reinforcement learning to optimize the keypoint sampling policy.
- Installation
- Third-party dependencies
- Quick Start
- Relative Pose Estimation
- Visual Odometry
- Structure from Motion
- Citation
- License
Create an environment (Python 3.10+ recommended), then install dependencies:
pip install -r requirements.txt
Notes:
- hloc and pycolmap are only required for the SfM demo (demo_sfm.py). If you only run training or benchmarks, you may omit them.
This repository does not redistribute third-party source code or model weights.
1. Install third-party source code into ./third_party/:
bash scripts/setup_third_party.sh
After running the setup script, the following subdirectories will be created in ./third_party/:
- facebookresearch_dinov3_main: DINOv3 model repository
- Hierarchical-Localization-master: Hierarchical Localization library
- LightGlue: LightGlue feature matching library
2. Place required weight files into ./third_party/ (see docs/THIRD_PARTY.md).
Important for DINOv3: After installing the DINOv3 repository, you must manually request the dinov3_convnext_base_pretrain_lvd1689m-801f2ba9.pth weight file and move it into the third_party folder.
If you see an error like "Missing required third-party repo/weights", it usually means step (1) or (2) was skipped.
Disclaimer: Datasets are not included in this repository. Please download from official sources and comply with their respective licenses. Some datasets (e.g., ScanNet, KITTI) are restricted to non-commercial research use only.
1. MegaDepth for Stage-1 (Descriptor Training)
The descriptor training stage uses the standard MegaDepth dataset.
- Download the MegaDepth dataset and the official megadepth_indices from the original LoFTR repository.
- Your MegaDepth root folder should be organized as follows:
/path/to/megadepth/
├── megadepth_indices    # indices
├── depth_undistorted    # depth maps
├── Undistorted_SfM      # images and poses
└── scene_info           # indices for training
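Before launching Stage-1, it can help to confirm that the dataset root matches the layout above. The following is a minimal sketch and not part of the repository: the helper name `missing_megadepth_dirs` is hypothetical, and only the four directory names are taken from the listing.

```python
from pathlib import Path

# Subdirectory names taken from the layout above.
EXPECTED_DIRS = ("megadepth_indices", "depth_undistorted", "Undistorted_SfM", "scene_info")


def missing_megadepth_dirs(root):
    """Return the expected subdirectories that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_megadepth_dirs("/path/to/megadepth")` returns an empty list when all four directories are present, and the names of the missing ones otherwise.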
2. Sequence Generation for Stage-2 (Keypoint Training)
The keypoint training stage requires sequence data, which is generated from the pair-wise scene_info files.
- The script uses the scene_info_0.1_0.7 directory (containing pair-wise data) as input.
- Run the sequence generation script:
# This script reads from .../scene_info_0.1_0.7 and generates sequence npz files
python -m traqpoint.dataset.megadepth.generate_mega_indice
- The output will be a new directory named sequence_indices_0.1_0.7_s5, containing the sequence .npz files required for Stage-2 training.
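To sanity-check the generated files without assuming anything about their contents, note that an .npz file is a zip archive of .npy arrays, so the standard library can list the array names it stores. This is a sketch with hypothetical helper names; the array names used by the repository are not documented here, so the code only lists them.

```python
import zipfile
from pathlib import Path


def list_npz_arrays(npz_path):
    """List the array names stored in an .npz file (hypothetical helper).

    Each array is stored as a '<name>.npy' member of the zip archive.
    """
    with zipfile.ZipFile(npz_path) as archive:
        return [m[: -len(".npy")] for m in archive.namelist() if m.endswith(".npy")]


def summarize_sequence_dir(directory):
    """Map each .npz file name in `directory` to the array names it contains."""
    return {p.name: list_npz_arrays(p) for p in sorted(Path(directory).glob("*.npz"))}
```

Pointing `summarize_sequence_dir` at the generated sequence_indices_0.1_0.7_s5 directory should show whether every file was written and which arrays each one holds.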
First, download the official DINOv3 weights (dinov3_convnext_base_pretrain_lvd1689m-801f2ba9.pth) and place them in the third_party directory, or update the model path in backbone_dinov3_conv.py. Then, run the training command.
The following command starts distributed training, automatically utilizing all available GPUs.
python -m training.train_descriptor \
--distributed \
--ckpt_save_path ./outputs/stage1-des \
--batch_size 32 \
--training_res 800 \
--lr 1e-4 \
--gamma_steplr 0.5 \
--save_ckpt_every 1000 \
--test_every_iter 3000 \
--epochs 12
- A convenience script is also available: train_descriptor.sh.
- For single-GPU training, remove the --distributed flag and adjust the batch size.
- You can override default dataset paths or load pre-trained weights with arguments like --megadepth_root_path, --test_data_root, and --weights.
- The output checkpoint is a state_dict, which can be directly loaded by Stage-2 using the --weights argument.
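The --lr and --gamma_steplr values suggest a step-decay learning-rate schedule. As a rough illustration only, the sketch below assumes that --gamma_steplr is the multiplicative factor applied at each decay step (as in PyTorch's StepLR). The decay interval `decay_every` is a hypothetical parameter, since it is not stated here; check the training script for the actual behavior.

```python
def step_decay_lr(base_lr, gamma, epoch, decay_every):
    """Learning rate under step decay: base_lr * gamma ** (epoch // decay_every)."""
    if decay_every <= 0:
        raise ValueError("decay_every must be positive")
    return base_lr * gamma ** (epoch // decay_every)


# base_lr, gamma, and the epoch count come from the command above;
# decay_every=3 is an assumed interval chosen purely for illustration.
schedule = [step_decay_lr(1e-4, 0.5, epoch, decay_every=3) for epoch in range(12)]
```

Under these assumptions the rate halves every three epochs, which gives a quick way to estimate how small the rate becomes by the end of a 12-epoch run.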
python -m training.train_key \
--train_detector --distributed \
--weights /path/to/stage1_descriptor.pth \
--ckpt_save_path ./outputs/key_run1 \
--sampling_strategy hybrid --num_global_samples 112 \
--grid_size 40 --ratio_thresh 0.5 \
--batch_size 120 --training_res 480
- Required: The --weights argument must point to the Stage-1 checkpoint.
- A convenience script is also available: train_key.sh.
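Stage-2 optimizes the keypoint sampling policy with a policy gradient method. For readers new to the idea, the following is a generic REINFORCE sketch over a toy categorical policy. It is not the repository's implementation: the four candidate locations, the reward, and the learning rate are made up, and the actual track-aware reward is the one defined in the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, lr=0.1, rng=random):
    """One REINFORCE update: sample an action, then move the logits
    along reward * grad log pi(action)."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    reward = reward_fn(action)
    # For a softmax policy, d log pi(a) / d logit_j = 1[j == a] - probs[j].
    return [
        logit + lr * reward * ((1.0 if j == action else 0.0) - probs[j])
        for j, logit in enumerate(logits)
    ]


# Toy setup: four candidate keypoint locations; only location 2 earns a (made-up) reward.
rng = random.Random(0)
logits = [0.0, 0.0, 0.0, 0.0]
for _ in range(500):
    logits = reinforce_step(logits, lambda a: 1.0 if a == 2 else 0.0, lr=0.5, rng=rng)
final_probs = softmax(logits)
```

After the loop, the policy concentrates its probability mass on the rewarded location. Sampling rewarded keypoints more often is the same mechanism at a much larger scale.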
1. Download Test Sets
The evaluation uses the MegaDepth-1500 and ScanNet-1500 test sets. You can download them from this Google Drive link (provided by the RDD project).
After downloading and extracting, your test data folder should look like this:
/path/to/test_data/
├── megadepth_test_1500/
└── scannet_test_1500/
Update the --data_root argument in the benchmark scripts to point to the correct path.
2. Run Evaluation
- MegaDepth 1500 (Outdoor)
python -m benchmarks.mega_1500 --weights ./weights/traqpoint_best.pth --method sparse --plot --data_root /path/to/test_data/megadepth_test_1500
- Metrics output: outputs/mega_1500/traqpoint_sparse.txt
- Visualization: PNG files in outputs/mega_1500/ (if --plot is enabled)
- ScanNet 1500 (Indoor)
python -m benchmarks.scannet_1500 --weights ./weights/traqpoint_best.pth --method sparse --plot --data_root /path/to/test_data/scannet_test_1500
- Metrics output: outputs/scannet/traqpoint_sparse.txt
- Visualization: PNG files in outputs/scannet/ (if --plot is enabled)
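When both benchmarks are run repeatedly, it may be convenient to assemble the two commands programmatically. The sketch below uses only the module names and flags shown above; the helper `benchmark_command` and the `BENCHMARKS` mapping are hypothetical and not part of the repository.

```python
import shlex

# Benchmark module -> test-set folder name, both taken from the commands above.
BENCHMARKS = {
    "benchmarks.mega_1500": "megadepth_test_1500",
    "benchmarks.scannet_1500": "scannet_test_1500",
}


def benchmark_command(module, weights, test_root, plot=True):
    """Build the argument list for one benchmark run (hypothetical helper)."""
    cmd = ["python", "-m", module, "--weights", weights, "--method", "sparse"]
    if plot:
        cmd.append("--plot")
    cmd += ["--data_root", f"{test_root.rstrip('/')}/{BENCHMARKS[module]}"]
    return cmd


commands = [
    benchmark_command(module, "./weights/traqpoint_best.pth", "/path/to/test_data")
    for module in BENCHMARKS
]
for cmd in commands:
    # Printing only; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell-quoting issues if the weight or data paths contain spaces.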
1. Download KITTI Dataset
Download the raw dataset from the official KITTI website. The evaluation script requires the color image sequences and the corresponding calibration files.
2. Run VO Evaluation
- VO evaluation script:
python -m benchmarks.visual_odometry.demo_vo_evaluator \
--path1 /path/to/kitti/dataset \
--path2 /path/to/kitti/dataset/sequences/01/image_0/ \
--id 01 \
--out_dir ./outputs/vo_kitty_results
- The script uses default values for other parameters, such as the detection threshold and tracking settings, for a quick test.
- Output files:
- Keypoint video: kitti_*_keypoints.avi
- Trajectory video: kitti_*_trajectory.avi
- Pose log: kitti_*.txt
- Evaluation results: kitti_results.json
- SfM reconstruction script:
python demo_sfm.py
- Dependency: Requires an hloc (Hierarchical Localization) environment.
- Madrid_Metropolis
- Gendarmenmarkt
- Tower_of_London
- Configuration:
- Image path: Modify the images_dir variable to point to the dataset image directory.
- Output path: Modify the outputs variable to specify the reconstruction output location.
- Feature config: Uses the traqpoint feature extractor.
- Matcher config: Uses the traqpoint+dual_softmax matcher.
- Processing Pipeline:
- Image Retrieval (NetVLAD) → Generate image pairs
- Feature Extraction (TraqPoint) → Extract keypoints and descriptors
- Feature Matching → Generate match graphs
- 3D Reconstruction → Generate sparse point cloud and camera poses
- Output:
- Sparse reconstruction model
- Camera pose estimations
- Depth visualization (color_by="depth")
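The matcher name traqpoint+dual_softmax points to dual-softmax matching, a common way to turn a descriptor similarity matrix into mutual matches. The sketch below illustrates the general technique in plain Python. It is not the repository's matcher, and the `temperature` and `threshold` values are illustrative assumptions.

```python
import math


def _softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax_matches(desc_a, desc_b, temperature=0.1, threshold=0.2):
    """Return (i, j, confidence) for mutual best matches between two descriptor sets.

    Confidence is the product of a softmax over each row and a softmax over
    each column of the temperature-scaled similarity matrix.
    """
    if not desc_a or not desc_b:
        return []
    n_a, n_b = len(desc_a), len(desc_b)
    sim = [[sum(x * y for x, y in zip(a, b)) / temperature for b in desc_b] for a in desc_a]
    row_sm = [_softmax(row) for row in sim]  # normalize over B for each A
    col_sm = [_softmax([sim[i][j] for i in range(n_a)]) for j in range(n_b)]  # over A for each B
    conf = [[row_sm[i][j] * col_sm[j][i] for j in range(n_b)] for i in range(n_a)]
    matches = []
    for i, row in enumerate(conf):
        j = max(range(n_b), key=row.__getitem__)
        best_i = max(range(n_a), key=lambda k: conf[k][j])
        if best_i == i and row[j] > threshold:  # mutual nearest neighbour plus confidence gate
            matches.append((i, j, row[j]))
    return matches


# Toy unit-length descriptors: A[0] resembles B[1], and A[1] resembles B[0].
desc_a = [[1.0, 0.0], [0.0, 1.0]]
desc_b = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
matches = dual_softmax_matches(desc_a, desc_b)
```

In the toy example, the unmatched descriptor B[2] is left out because no descriptor in A selects it as its best partner.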
Pretrained weights will be released soon.
@article{liu2026pairs,
title={From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection},
author={Liu, Yepeng and Li, Hao and Yang, Liwen and Li, Fangzhen and Ge, Xudi and Gu, Yuliang and Wang, Bing and Chen, Guang and Ye, Hangjun and Xu, Yongchao and others},
journal={arXiv preprint arXiv:2602.20630},
year={2026}
}
We thank these great repositories: ALIKE, LoFTR, DeDoDe, XFeat, LightGlue, Kornia, Deformable DETR, and RDD, along with many other inspiring works in the community.
This project is licensed under the Apache License 2.0.
