An Official Code Implementation of MimicFunc: Imitating Tool Manipulation from a Single Human Video via Functional Correspondence (CoRL 2025)
Chao Tang, Anxing Xiao, Yuhong Deng, Tianrun Hu, Wenlong Dong, Hanbo Zhang, David Hsu, and Hong Zhang
If you find this work useful, please cite:
```
@article{tang2025functo,
  title={Functo: Function-centric one-shot imitation learning for tool manipulation},
  author={Tang, Chao and Xiao, Anxing and Deng, Yuhong and Hu, Tianrun and Dong, Wenlong and Zhang, Hanbo and Hsu, David and Zhang, Hong},
  journal={arXiv preprint arXiv:2502.11744},
  year={2025}
}
```
Please begin by installing the following dependency packages: Open3D, SciPy, PyTorch, and CasADi.
Our code also relies on OWL-ViT, Grounding-SAM, and SD+DINO (optional), which are currently deployed on our internal servers. To run these models locally instead, follow the installation instructions below.
- Install [OWL-ViT] or another object detector of your choice.
- Install [Grounding-SAM] or another segmentation model of your choice.
- Install [SD+DINO] (optional).
- After installing these models, replace the code blocks in the original code that query them on our internal servers with calls to your local installations.
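The replacement can follow a simple adapter pattern: keep the pipeline's expected output format and only swap the backend. The sketch below is illustrative; the function names and the dummy detection are not identifiers from this repository, and the commented-out OWL-ViT calls assume the Hugging Face `transformers` implementation.

```python
def detect_remote(image, prompt):
    """Original pattern: query the authors' internal server (unavailable locally)."""
    raise NotImplementedError("internal server not reachable")


def detect_local(image, prompt):
    """Replacement pattern: call a locally installed detector and return
    detections in the same format the rest of the pipeline expects.

    With a local OWL-ViT via Hugging Face transformers, this would look like:
        processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
        model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")
        inputs = processor(text=[prompt], images=image, return_tensors="pt")
        outputs = model(**inputs)
    Here we return a dummy detection so the sketch stays self-contained.
    """
    return [{"label": prompt, "box": (0, 0, 10, 10), "score": 1.0}]


if __name__ == "__main__":
    # Any call site previously using detect_remote(...) switches to detect_local(...).
    dets = detect_local(image=None, prompt="mug")
    print(dets[0]["label"])  # prints "mug"
```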
We provide a demo showcasing the task of pouring:

- Specify the parameters in the `config.yaml` file located under `utils_IL/config`. The `vp_flag` parameter controls whether a VLM is used for pose alignment refinement, while `sd_dino_flag` determines whether SD+DINO is used for functional keypoint transfer. Both parameters are set to `False` by default.
- Run the demo:
  ```
  python main.py
  ```
- Check the generated trajectory at `test_data/pour_test/tool_traj_transfer_output/test_tool_traj_pc.ply`.
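A quick sanity check on the output is to confirm that the `.ply` file declares a non-empty point cloud. The snippet below is a minimal stdlib-only sketch that reads the vertex count from a PLY header (it assumes a standard `element vertex N` declaration, which holds for PLY files written by common libraries such as Open3D); for full inspection you can instead load the file with `open3d.io.read_point_cloud`.

```python
import os
import tempfile


def ply_vertex_count(path):
    """Return the vertex count declared in a PLY header (ASCII or binary body)."""
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii", errors="ignore").strip()
            if line.startswith("element vertex"):
                return int(line.split()[-1])
            if line == "end_header":  # header ended without a vertex element
                break
    return 0


if __name__ == "__main__":
    # Demonstrate on a tiny 2-point ASCII PLY; in practice, point this at
    # test_data/pour_test/tool_traj_transfer_output/test_tool_traj_pc.ply.
    ply = ("ply\nformat ascii 1.0\nelement vertex 2\n"
           "property float x\nproperty float y\nproperty float z\n"
           "end_header\n0 0 0\n1 1 1\n")
    with tempfile.NamedTemporaryFile("w", suffix=".ply", delete=False) as f:
        f.write(ply)
        path = f.name
    print(ply_vertex_count(path))  # prints 2
    os.remove(path)
```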