Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos

Yicheng Feng1,3, Wanpeng Zhang1,3, Ye Wang2,3, Hao Luo1,3, Haoqi Yuan1,3,
Sipeng Zheng3, Zongqing Lu1,3†

1Peking University    2Renmin University of China    3BeingBeyond

Website arXiv License

VIPA-VLA learns 2D–to–3D visual–physical grounding from human videos with spatial-aware VLA pretraining, enabling robot policies with stronger spatial understanding and generalization.

News

  • [2025-12-15]: We release VIPA-VLA! Check out our paper here. Code is coming soon! 🔥🔥🔥

Citation

If you find our work useful, please consider citing us and giving our repository a star! 🌟🌟🌟

@article{feng2025vipa,
  title={Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos},
  author={Feng, Yicheng and Zhang, Wanpeng and Wang, Ye and Luo, Hao and Yuan, Haoqi and Zheng, Sipeng and Lu, Zongqing},
  journal={arXiv preprint arXiv:2512.13080},
  year={2025}
}
