TraqPoint

From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection

Yepeng Liu¹,²*, Hao Li²*, Liwen Yang¹*, Fangzhen Li²†, Xudi Ge¹, Yuliang Gu¹, Kuang Gao², Bing Wang², Guang Chen², Hangjun Ye², Yongchao Xu¹†✉

Abstract

Keypoint-based matching is a fundamental component of modern 3D vision systems, such as Structure-from-Motion (SfM) and SLAM. Most existing learning-based methods are trained on image pairs, a paradigm that fails to explicitly optimize for the long-term trackability of keypoints across sequences under challenging viewpoint and illumination changes.

In this paper, we reframe keypoint detection as a sequential decision-making problem. We introduce TraqPoint, a novel, end-to-end Reinforcement Learning (RL) framework designed to optimize the Track-quality (Traq) of keypoints directly on image sequences. Our core innovation is a track-aware reward mechanism that jointly encourages the consistency and distinctiveness of keypoints across multiple views, guided by a policy gradient method. Extensive evaluations on sparse matching benchmarks, including relative pose estimation and 3D reconstruction, demonstrate that TraqPoint significantly outperforms some state-of-the-art (SOTA) keypoint detection and description methods.

¹School of Computer Science, Wuhan University ²Xiaomi EV

(*) Equal contribution. (†) Project leader. (✉) Corresponding author.

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

This repository now supports two-stage training:

Stage-1 Descriptor Training: Generates descriptor weights for matching/tracking.
Stage-2 Keypoint/Policy Training: Loads Stage-1 weights and uses reinforcement learning to optimize the keypoint sampling policy.

Installation

Create an environment (Python 3.10+ recommended), then install dependencies:

pip install -r requirements.txt

Notes:

hloc and pycolmap are only required for the SfM demo (demo_sfm.py). If you only run training/benchmarks, you may omit them.

Third-party dependencies

This repository does not redistribute third-party source code or model weights.

Install third-party source code into ./third_party/:

bash scripts/setup_third_party.sh

After running the setup script, the following subdirectories will be created in ./third_party/:

facebookresearch_dinov3_main - DINOv3 model repository
Hierarchical-Localization-master - Hierarchical Localization library
LightGlue - LightGlue feature matching library

Place required weight files into ./third_party/ (see docs/THIRD_PARTY.md).

Important for DINOv3: After installing the DINOv3 repository, you must manually request the dinov3_convnext_base_pretrain_lvd1689m-801f2ba9.pth weight file and move it into the third_party folder.

If you see an error like "Missing required third-party repo/weights", it usually means the setup script was not run or the required weight files were not placed in ./third_party/.
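
Before starting a long run, it can help to confirm that the third-party layout described above is in place. The following is a minimal sketch, not part of the repository: the directory names and the weight filename come from this section, while the helper name `missing_third_party` and the assumption that the weight file sits directly under third_party/ are illustrative.

```python
from pathlib import Path

# Directory names and the weight filename are taken from the text above.
# Placing the weight file directly under third_party/ is an assumption.
EXPECTED_DIRS = (
    "facebookresearch_dinov3_main",
    "Hierarchical-Localization-master",
    "LightGlue",
)
DINOV3_WEIGHTS = "dinov3_convnext_base_pretrain_lvd1689m-801f2ba9.pth"


def missing_third_party(root="third_party"):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    if not (root / DINOV3_WEIGHTS).is_file():
        missing.append(DINOV3_WEIGHTS)
    return missing


if __name__ == "__main__":
    for name in missing_third_party():
        print(f"missing: third_party/{name}")
```

An empty result suggests both the setup script and the manual weight download have been completed.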

Quick Start

Dataset Preparation

Disclaimer: Datasets are not included in this repository. Please download from official sources and comply with their respective licenses. Some datasets (e.g., ScanNet, KITTI) are restricted to non-commercial research use only.

1. MegaDepth for Stage-1 (Descriptor Training)

The descriptor training stage uses the standard MegaDepth dataset.

Download the MegaDepth dataset and the official megadepth_indices from the original LoFTR repository.

Your MegaDepth root folder should be organized as follows:

/path/to/megadepth/
├── megadepth_indices   # indices
├── depth_undistorted   # depth maps
├── Undistorted_SfM     # images and poses
└── scene_info          # indices for training

2. Sequence Generation for Stage-2 (Keypoint Training)

The keypoint training stage requires sequence data, which is generated from the pair-wise scene_info files.

The script uses the scene_info_0.1_0.7 directory (containing pair-wise data) as input.

Run the sequence generation script:

# This script reads from .../scene_info_0.1_0.7 and generates sequence npz files
python -m traqpoint.dataset.megadepth.generate_mega_indice

The output will be a new directory named sequence_indices_0.1_0.7_s5, containing the sequence .npz files required for Stage-2 training.
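
The algorithm used by generate_mega_indice is not described here. As a rough illustration of how pair-wise co-visibility data could be turned into fixed-length sequences, the sketch below greedily chains pairs that share an image. The function name `chain_pairs`, the greedy strategy, and the reading of the `_s5` suffix as a sequence length of 5 are all assumptions, not the repository's implementation.

```python
from collections import defaultdict


def chain_pairs(pairs, seq_len=5):
    """Greedily chain co-visible image pairs into sequences of `seq_len` images.

    `pairs` is an iterable of (i, j) image-index pairs. Each sequence starts
    at one image and repeatedly steps to the lowest-index neighbour that is
    not yet in the sequence. Sequences that cannot reach `seq_len` are dropped.
    """
    neighbours = defaultdict(set)
    for i, j in pairs:
        neighbours[i].add(j)
        neighbours[j].add(i)

    sequences = []
    for start in sorted(neighbours):
        seq = [start]
        while len(seq) < seq_len:
            candidates = sorted(neighbours[seq[-1]] - set(seq))
            if not candidates:
                break
            seq.append(candidates[0])
        if len(seq) == seq_len:
            sequences.append(seq)
    return sequences


if __name__ == "__main__":
    # Hypothetical pair list: five images connected in a chain.
    print(chain_pairs([(0, 1), (1, 2), (2, 3), (3, 4)]))
```

A real generator would likely also filter by overlap score and deduplicate reversed sequences; this sketch only shows the chaining idea.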

Stage-1 (Descriptor Training)

First, download the official DINOv3 weights (dinov3_convnext_base_pretrain_lvd1689m-801f2ba9.pth) and place them in the third_party directory, or update the model path in backbone_dinov3_conv.py. Then, run the training command.

The following command starts distributed training, automatically utilizing all available GPUs.

python -m training.train_descriptor \
  --distributed \
  --ckpt_save_path ./outputs/stage1-des \
  --batch_size 32 \
  --training_res 800 \
  --lr 1e-4 \
  --gamma_steplr 0.5 \
  --save_ckpt_every 1000 \
  --test_every_iter 3000 \
  --epochs 12

A convenience script is also available: train_descriptor.sh.
For single GPU training, remove the --distributed flag and adjust the batch size.
You can override default dataset paths or load pre-trained weights with arguments like --megadepth_root_path, --test_data_root, and --weights.
The output checkpoint is a state_dict, which can be directly loaded by Stage-2 using the --weights argument.
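
The --lr and --gamma_steplr flags suggest a step-decay learning-rate schedule. Assuming the standard step-decay rule (learning rate multiplied by gamma every `step_size` units), the schedule can be previewed as below. The `step_size` value and whether decay is applied per epoch or per iteration are assumptions; check the training script for the actual settings.

```python
def step_lr(base_lr, gamma, epoch, step_size=1):
    """Learning rate after `epoch` epochs under step decay.

    Follows lr = base_lr * gamma ** (epoch // step_size).
    """
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    # base_lr and gamma match the command above; step_size=3 is hypothetical.
    for epoch in range(0, 12, 3):
        print(epoch, step_lr(1e-4, 0.5, epoch, step_size=3))
```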

Stage-2 (Keypoints/RL Training)

python -m training.train_key \
  --train_detector --distributed \
  --weights /path/to/stage1_descriptor.pth \
  --ckpt_save_path ./outputs/key_run1 \
  --sampling_strategy hybrid --num_global_samples 112 \
  --grid_size 40 --ratio_thresh 0.5 \
  --batch_size 120 --training_res 480

Required: The --weights argument must point to the Stage-1 checkpoint.
A convenience script is also available: train_key.sh.
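
For readers new to policy-gradient training, the sketch below shows the simplest such estimator (REINFORCE with a mean-reward baseline) applied to sampling one keypoint from a set of candidates. It is an illustration only: the function names are hypothetical, the reward is a placeholder for whatever track-quality signal the training code computes, and the estimator used in train_key may differ.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_keypoint(logits, rng):
    """Sample one candidate index from softmax(logits); return (index, log-prob)."""
    probs = softmax(logits)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, math.log(probs[idx])


def reinforce_loss(log_probs, rewards):
    """REINFORCE surrogate loss with a mean-reward baseline.

    Minimising this loss raises the log-probability of samples whose reward
    is above the batch mean and lowers it for samples below the mean.
    """
    baseline = sum(rewards) / len(rewards)
    terms = [-(r - baseline) * lp for lp, r in zip(log_probs, rewards)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical detector scores for four candidate pixels in one cell.
    idx, log_prob = sample_keypoint([0.2, 1.5, -0.3, 0.9], rng)
    print(idx, log_prob)
```

In an autograd framework the log-probabilities would carry gradients back to the detector, while the rewards are treated as constants.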

Relative Pose Estimation

1. Download Test Sets

The evaluation uses the MegaDepth-1500 and ScanNet-1500 test sets. You can download them from the Google Drive link provided by the RDD project.

After downloading and extracting, your test data folder should look like this:

/path/to/test_data/
├── megadepth_test_1500/
└── scannet_test_1500/

Update the --data_root argument in the benchmark scripts to point to the correct path.

2. Run Evaluation

MegaDepth 1500 (Outdoor)

python -m benchmarks.mega_1500 --weights ./weights/traqpoint_best.pth --method sparse --plot --data_root /path/to/test_data/megadepth_test_1500

Metrics output: outputs/mega_1500/traqpoint_sparse.txt
Visualization: PNG files in outputs/mega_1500/ (if --plot is enabled)

ScanNet 1500 (Indoor)

python -m benchmarks.scannet_1500 --weights ./weights/traqpoint_best.pth --method sparse --plot --data_root /path/to/test_data/scannet_test_1500

Metrics output: outputs/scannet/traqpoint_sparse.txt
Visualization: PNG files in outputs/scannet/ (if --plot is enabled)
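
This section does not state which metrics the benchmark scripts write. Relative pose results on these two test sets are conventionally reported as the area under the pose-error recall curve (AUC) at 5, 10, and 20 degrees, so the sketch below shows how that number is typically computed from a list of per-pair pose errors. The function name and the pure-Python formulation are illustrative and are not taken from the benchmark code.

```python
from bisect import bisect_left


def pose_auc(errors, thresholds=(5.0, 10.0, 20.0)):
    """Area under the recall-vs-error curve, normalised by each threshold.

    `errors` is a non-empty list of per-pair pose errors in degrees.
    Returns one AUC value in [0, 1] per threshold.
    """
    errs = [0.0] + sorted(errors)
    n = len(errors)
    recall = [k / n for k in range(n + 1)]
    aucs = []
    for t in thresholds:
        last = bisect_left(errs, t)
        # Clip the curve at the threshold and extend recall flat up to it.
        e = errs[:last] + [t]
        r = recall[:last] + [recall[last - 1]]
        area = sum(
            (e[k + 1] - e[k]) * (r[k + 1] + r[k]) / 2 for k in range(len(e) - 1)
        )
        aucs.append(area / t)
    return aucs


if __name__ == "__main__":
    # Hypothetical pose errors for four image pairs.
    print(pose_auc([1.2, 3.4, 7.8, 25.0]))
```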

Visual Odometry (KITTI)

1. Download KITTI Dataset

Download the raw dataset from the official KITTI website. The evaluation script requires the color image sequences and the corresponding calibration files.

2. Run VO Evaluation

VO evaluation script:

python -m benchmarks.visual_odometry.demo_vo_evaluator \
  --path1 /path/to/kitti/dataset \
  --path2 /path/to/kitti/dataset/sequences/01/image_0/ \
  --id 01 \
  --out_dir ./outputs/vo_kitty_results

The script uses default values for other parameters like detection threshold and tracking settings for a quick test.
Output files:
- Keypoint video: kitti_*_keypoints.avi
- Trajectory video: kitti_*_trajectory.avi
- Pose log: kitti_*.txt
- Evaluation results: kitti_results.json
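
To evaluate several sequences, the command above can be generated per sequence ID. The sketch below only builds the argument list. It assumes that --path1 is the dataset root and that --path2 follows the sequences/<id>/image_0/ pattern shown in the example for every sequence; the helper name `build_vo_command` is hypothetical.

```python
import shlex


def build_vo_command(kitti_root, seq_id, out_dir="./outputs/vo_kitty_results"):
    """Build the VO evaluation command for one KITTI sequence.

    Flags mirror the example invocation above; the path pattern for other
    sequences is an assumption.
    """
    root = kitti_root.rstrip("/")
    return [
        "python", "-m", "benchmarks.visual_odometry.demo_vo_evaluator",
        "--path1", root,
        "--path2", f"{root}/sequences/{seq_id}/image_0/",
        "--id", seq_id,
        "--out_dir", out_dir,
    ]


if __name__ == "__main__":
    for seq_id in ("00", "01", "02"):
        print(shlex.join(build_vo_command("/path/to/kitti/dataset", seq_id)))
```

Each list can be passed to subprocess.run once the paths are confirmed to exist.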

Structure from Motion (SfM) Test

SfM reconstruction script:

python demo_sfm.py

Dependency: Requires an hloc (hierarchical localization) environment.
Supported Datasets:
- Madrid_Metropolis
- Gendarmenmarkt
- Tower_of_London
Configuration:
- Image path: Modify the images_dir variable to point to the dataset image directory.
- Output path: Modify the outputs variable to specify the reconstruction output location.
- Feature config: Uses the traqpoint feature extractor.
- Matcher config: Uses the traqpoint+dual_softmax matcher.
Processing Pipeline:
1. Image Retrieval (NetVLAD) → Generate image pairs
2. Feature Extraction (TraqPoint) → Extract keypoints and descriptors
3. Feature Matching → Generate match graphs
4. 3D Reconstruction → Generate sparse point cloud and camera poses
Output:
- Sparse reconstruction model
- Camera pose estimations
- Depth visualization (color_by="depth")
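
The matcher configuration above refers to dual-softmax matching. As a rough sketch of the general technique, assuming a descriptor similarity matrix is already available, the code below multiplies a row-wise and a column-wise softmax and keeps mutual best pairs above a threshold. The function name, temperature, and threshold are illustrative values, not the repository's settings.

```python
import math


def _softmax(values, temperature):
    """Numerically stable softmax with a temperature."""
    m = max(values)
    exps = [math.exp((x - m) / temperature) for x in values]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax_match(sim, temperature=0.1, threshold=0.2):
    """Match rows to columns of a similarity matrix `sim` (list of lists).

    The score of (i, j) is softmax over row i times softmax over column j.
    A pair is kept if it is the mutual best and its score exceeds `threshold`.
    """
    rows, cols = len(sim), len(sim[0])
    p_row = [_softmax(r, temperature) for r in sim]
    p_col = [
        _softmax([sim[i][j] for i in range(rows)], temperature)
        for j in range(cols)
    ]
    score = [[p_row[i][j] * p_col[j][i] for j in range(cols)] for i in range(rows)]

    matches = []
    for i in range(rows):
        j = max(range(cols), key=lambda c: score[i][c])
        best_i = max(range(rows), key=lambda r: score[r][j])
        if best_i == i and score[i][j] > threshold:
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    # Hypothetical cosine similarities between 2 and 3 descriptors.
    print(dual_softmax_match([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]))
```

Multiplying the two softmaxes penalises descriptors that are similar to many candidates in either image, which is why ambiguous pairs fall below the threshold.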

Pretrained Models

Pretrained weights will be released soon.

Citation

@article{liu2026pairs,
  title={From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection},
  author={Liu, Yepeng and Li, Hao and Yang, Liwen and Li, Fangzhen and Ge, Xudi and Gu, Yuliang and Wang, Bing and Chen, Guang and Ye, Hangjun and Xu, Yongchao and others},
  journal={arXiv preprint arXiv:2602.20630},
  year={2026}
}

Acknowledgements

We thank these great repositories: ALIKE, LoFTR, DeDoDe, XFeat, LightGlue, Kornia, Deformable DETR, RDD, and many other inspiring works in the community.

License

This project is licensed under the Apache License 2.0.
