Framework overview of RoboCapture. RoboCapture acquires images from a range of simulators and real-world scenes, then uses MLLMs to autonomously generate diverse multi-level tasks and plan them. It selects and integrates skills modularly and orchestrates task execution across hardware platforms through middleware. With limited human supervision, RoboCapture analyzes task status and failure causes in real time and collects multidimensional data, as illustrated in module 6.
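As a rough illustration of the flow described above (plan a task, run the selected skills in order, record status and failure cause), here is a minimal Python sketch. It is not the project's actual API: `Episode`, `run_task`, and the toy `skills` dictionary are hypothetical names introduced only for this example, and the plans are hard-coded where the real system would obtain them from an MLLM and execute them through middleware.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Episode:
    """One collected record: the task, its plan, and what happened (hypothetical)."""
    task: str
    plan: List[str]
    success: bool
    failure_reason: str = ""
    log: List[str] = field(default_factory=list)


def run_task(task: str, plan: List[str],
             skills: Dict[str, Callable[[], bool]]) -> Episode:
    """Execute each planned skill in order and stop at the first failure."""
    log: List[str] = []
    for step in plan:
        skill = skills.get(step)
        if skill is None:
            # The plan refers to a skill that is not in the library.
            return Episode(task, plan, False, f"unknown skill: {step}", log)
        ok = skill()
        log.append(f"{step}: {'ok' if ok else 'failed'}")
        if not ok:
            return Episode(task, plan, False, f"skill failed: {step}", log)
    return Episode(task, plan, True, "", log)


# Toy skill library; real skills would drive a simulator or a robot.
skills: Dict[str, Callable[[], bool]] = {
    "reach": lambda: True,
    "grasp": lambda: True,
    "place": lambda: False,  # simulated failure
}

# Hard-coded plans stand in for MLLM-generated ones (assumption).
episodes = [
    run_task("pick up the cup", ["reach", "grasp"], skills),
    run_task("put the cup on the shelf", ["reach", "grasp", "place"], skills),
]

for ep in episodes:
    print(ep.task, "->", "success" if ep.success else ep.failure_reason)
```

Keeping the failure reason alongside the step log in each record is one simple way to make failed episodes as usable as successful ones during later analysis.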
Our code is divided into five parts: task generation and task planning, instruction fine-tuning, simulator data collection, real-world data collection, and the RT-1 model. The code and usage tutorials are still being organized. Here are the usage tutorials for the different modules described in our paper.
This project is released under the Apache 2.0 license.
We are grateful for the open-source efforts of LLaVA, Yi, Reproducing rt-1 in pytorch, and LLaMA-Factory.
