TensorAuto/OpenTau

OpenTau - Train VLA models with state-of-the-art techniques by Tensor

At Tensor, we are pushing the frontier of large foundation models for physical AI. In robot learning, a vision-language-action (VLA) model is a multimodal foundation model that integrates vision, language, and action. Today, VLAs are the leading approach to embodied AI, spanning autonomous driving, robot manipulation, and navigation.

OpenTau is Tensor’s open-source training toolchain for frontier VLA models, designed to make training reproducible, accessible, and scalable. We believe in open research, and by open-sourcing this toolchain we aim to broaden knowledge sharing and accelerate progress that others in the robotics community can reproduce.

Whether you use the official OpenPi codebase or LeRobot’s reimplementation, you may still be missing key components. OpenTau implements them in one place:

  • Co-training on an adjustable mixture of heterogeneous datasets
  • Discrete actions for fast VLM convergence in $\pi_{0.5}$
  • Knowledge insulation between the VLM backbone and the action expert
  • Dropout in the VLM to reduce overfitting
  • Authentic $\pi_{0.6}$ policy with a Gemma 3 4B backbone, 448×448 vision, and 5-step flow matching
  • Hierarchical $\pi_{0.7}$ policy: high-level planner + low-level controller with a SpaceTime SigLIP video encoder on a Gemma 3 backbone
  • A reinforcement learning pipeline described in $\pi^*_{0.6}$
  • And more...

OpenTau ($\tau$) is a tool developed by Tensor to bridge this gap, and we also use it internally to train our proprietary in-house models. Our goal is to help you train VLAs on any dataset while fully leveraging state-of-the-art techniques. We plan to continuously upgrade this repository to keep pace with the state of the art in the robotics community.
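
As a rough illustration of co-training on an adjustable mixture of heterogeneous datasets, the sketch below draws each training sample by first picking a dataset according to a user-set weight and then picking an item from it. This is a minimal sketch in plain Python, not OpenTau's implementation: `mixture_sampler`, the dataset names, and the weights are all hypothetical.

```python
import random
from collections import Counter
from itertools import islice
from typing import Dict, Iterator, Sequence, Tuple


def mixture_sampler(
    datasets: Dict[str, Sequence],
    weights: Dict[str, float],
    seed: int = 0,
) -> Iterator[Tuple[str, object]]:
    """Yield (dataset_name, sample) pairs; datasets are chosen in proportion to their weights.

    Hypothetical helper for illustration only, not part of OpenTau.
    """
    rng = random.Random(seed)
    names = list(datasets)
    probs = [weights[name] for name in names]
    while True:
        # Step 1: choose which dataset this sample comes from.
        name = rng.choices(names, weights=probs, k=1)[0]
        # Step 2: choose a sample uniformly from that dataset.
        yield name, rng.choice(datasets[name])


# Toy stand-ins for two heterogeneous datasets (made-up contents).
toy_datasets = {
    "robot_demos": [f"episode_{i}" for i in range(100)],
    "vqa_pairs": [f"question_{i}" for i in range(1000)],
}
toy_weights = {"robot_demos": 0.7, "vqa_pairs": 0.3}

draws = list(islice(mixture_sampler(toy_datasets, toy_weights), 10_000))
counts = Counter(name for name, _ in draws)
fractions = {name: counts[name] / len(draws) for name in toy_datasets}
print(fractions)
```

Changing `toy_weights` shifts how often each source appears in a batch without touching the datasets themselves, which is the sense in which the mixture is adjustable.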

Features OpenPi LeRobot OpenTau
Co-training with Heterogeneous Datasets
Discrete Actions Training in $\pi_{0.5}$
Knowledge Insulation (KI) between VLM and Action Decoder
Dropout Layers in PaliGemma ✅ (Jax)
❌ (PyTorch)
Multi-Node and Multi-GPU Training
Fully Functioning $\pi_{0.5}$ Checkpoint
(Missing Text Embeddings)
Visualize dataset with URDF models
Simulation Environments for Evaluating Models
Create Validation Splits During Training
Drop-in Training Profiler & Unused-Param Auditor
$\pi^{*}_{0.6}$ style Reinforcement Learning Pipeline
Post-training on Human Data
Authentic $\pi_{0.6}$ Policy (Gemma 3 4B backbone, 448×448 vision)
Hierarchical $\pi_{0.7}$ Policy (HL Planner + LL Controller, SpaceTime SigLIP video encoder)
Framework Jax / PyTorch PyTorch PyTorch

Quick Start

If you are familiar with LeRobot, getting started with OpenTau is straightforward. OpenTau is a fork of the LeRobot repository, so any LeRobot-compliant policy or dataset can be used with it directly. The documentation includes a quick start guide.

To train and evaluate models in local notebooks, use notebooks/pi05_training.ipynb and notebooks/pi05_evaluation_only.ipynb.

To train and evaluate models on Google Colab instead, use the pi05_training and pi05_evaluation_only Colab notebooks, respectively.

To spin up a $\pi_{0.6}$ training run, start from configs/examples/pi06_training_config.json. The policy is selected by setting "type": "pi06" and differs from $\pi_{0.5}$ in its Gemma 3 4B backbone, 448×448 image resolution, ~860M-parameter action expert, and 5-step flow-matching default.
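
To give a feel for what a 5-step flow-matching default could mean at inference time, the sketch below integrates a velocity field from a noise sample to an action vector with five Euler steps. It assumes that the step count refers to the number of integration steps, and it replaces the learned action expert with a hand-written toy velocity field; `euler_flow_sample` and `toy_velocity` are hypothetical names, not OpenTau APIs.

```python
import random
from typing import Callable, List

Vector = List[float]


def euler_flow_sample(
    velocity: Callable[[Vector, float], Vector],
    noise: Vector,
    num_steps: int = 5,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 with fixed-size Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy target action; in a real policy the velocity would come from a trained network.
target_action = [0.5, -1.0, 2.0]


def toy_velocity(x: Vector, t: float) -> Vector:
    """Velocity of the straight-line path that reaches target_action at t=1."""
    return [(a - xi) / (1.0 - t) for a, xi in zip(target_action, x)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target_action]
action = euler_flow_sample(toy_velocity, noise, num_steps=5)
print(action)
```

Because the toy path is a straight line, Euler integration recovers the target up to floating-point error for any step count; with a learned velocity field, fewer steps trade accuracy for lower inference latency.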

$\pi_{0.7}$ splits the model into a high-level planner that proposes subgoals and a low-level controller that executes them. The current implementation pairs a Gemma 3 backbone with a SpaceTime SigLIP video encoder so the controller can attend over temporal context, and the two stages train independently. To train the low-level controller, start from configs/examples/pi07_low_level_libero.json (select via "type": "pi07_low_level"); the high-level planner is registered as "type": "pi07_high_level".
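
The planner/controller split described above can be pictured as a two-rate control loop: the high-level stage periodically proposes a subgoal, and the low-level stage acts toward the current subgoal at every step. The sketch below is a one-dimensional toy under that assumption; `run_hierarchical_episode`, `toy_planner`, `toy_controller`, and the replanning interval are hypothetical and say nothing about how the pi07 policies are actually wired together.

```python
from typing import Callable, List


def run_hierarchical_episode(
    plan_subgoal: Callable[[float, float], float],
    act: Callable[[float, float], float],
    state: float,
    goal: float,
    replan_every: int = 4,
    max_steps: int = 50,
) -> List[float]:
    """Roll out a toy episode and return the visited states."""
    trajectory = [state]
    subgoal = state
    for step in range(max_steps):
        if state == goal:
            break
        # High-level stage: refresh the subgoal at a slower rate.
        if step % replan_every == 0:
            subgoal = plan_subgoal(state, goal)
        # Low-level stage: take one bounded action toward the subgoal.
        state = state + act(state, subgoal)
        trajectory.append(state)
    return trajectory


def toy_planner(state: float, goal: float) -> float:
    """Propose a subgoal at most 1.0 away from the current state, toward the goal."""
    return state + max(-1.0, min(1.0, goal - state))


def toy_controller(state: float, subgoal: float) -> float:
    """Return an action (a displacement) of magnitude at most 0.25 toward the subgoal."""
    return max(-0.25, min(0.25, subgoal - state))


trajectory = run_hierarchical_episode(toy_planner, toy_controller, state=0.0, goal=3.0)
print(len(trajectory) - 1, "steps, final state", trajectory[-1])
```

Keeping the two stages behind separate callables mirrors the point that they can be trained independently: either one can be swapped out without changing the loop.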

Training Diagnostics

OpenTau ships three drop-in scripts under src/opentau/scripts/ to help you figure out where a training run is spending its time. Each reads the same TrainPipelineConfig as opentau-train, so they reproduce your exact model / dataset / batch size — no reconfiguration needed.

| Script | What it answers |
|---|---|
| profile_step.py | Where does each training step's wall-clock time go? (forward / backward / optimizer / sync phases, with mean / median / p95) |
| profile_dataloader.py | Is the dataloader keeping up with the GPUs? (pure input-pipeline ceiling, no model, no collective) |
| find_unused_params.py | Are any parameters dead? (lists params DDP would refuse to sync with find_unused_parameters=False) |
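
For readers unfamiliar with the mean / median / p95 summary mentioned for profile_step.py, the sketch below shows how such statistics could be computed from per-phase step timings. It is not taken from the script: `summarize_phases`, the nearest-rank `percentile` helper, and the timing values are made up for illustration.

```python
import statistics
from typing import Dict, List


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceil(pct * n / 100) in integer arithmetic
    return ordered[rank - 1]


def summarize_phases(timings_ms: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Reduce per-step timings (milliseconds) to mean, median, and p95 for each phase."""
    return {
        phase: {
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "p95": percentile(values, 95),
        }
        for phase, values in timings_ms.items()
    }


# Made-up timings: "backward" contains one slow outlier step.
timings_ms = {
    "forward": [float(ms) for ms in range(1, 101)],
    "backward": [20.0, 22.0, 21.0, 80.0],
}
summary = summarize_phases(timings_ms)
for phase, stats in summary.items():
    print(f"{phase:>8}: mean={stats['mean']:.2f} median={stats['median']:.2f} p95={stats['p95']:.2f}")
```

Reporting the median alongside the mean helps separate a uniformly slow phase from one dominated by occasional stalls, and p95 makes those stalls visible.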

A one-command example — see where your training time is going:

accelerate launch \
    --config_file configs/examples/accelerate_ddp_config.yaml \
    src/opentau/scripts/profile_step.py \
    --config_path=<your_training_config.json>

Full tutorial with annotated example output and env-var knobs: docs/tutorials/benchmarking. A worked example investigation that used these tools to find and fix a 2.9× throughput regression is tracked in issue #177.

Checkpoints

We provide fully functioning $\pi_{0.5}$ checkpoints, with their measured success rates listed below. We plan to release more models in the near future.

| Model Checkpoint | Description | Success Rate (%) |
|---|---|---|
| TensorAuto/Robocasa_navigatekitchen | A $\pi_{0.5}$ model checkpoint trained on the Navigate to Kitchen objects task on Robocasa. | 97% |
| TensorAuto/Robocasa_Closeupdown | A $\pi_{0.5}$ model checkpoint trained on Close Oven, Close Toaster, and Close Dishwasher on Robocasa. | Close Oven: 90%; Close Toaster: 70%; Close Dishwasher: 90% |
| TensorAuto/TensorAuto/robocasa_Closesideways | A $\pi_{0.5}$ model checkpoint trained on Close Microwave, Close Cabinet, and Close Fridge on Robocasa. | Close Microwave: 97%; Close Cabinet: 65%; Close Fridge: 80% |
| TensorAuto/pi05_libero_continuous_state | A $\pi_{0.5}$ model checkpoint trained on the LIBERO dataset with continuous states (raw proprioceptive states projected to the model's latent dimension). | 92% |
| TensorAuto/moka_pot_libero_sft / TensorAuto/moka_pot_RECAP_R0 / TensorAuto/moka_pot_RECAP_R1 | A $\pi_{0}$ RECAP model checkpoint trained on the moka pot task on LIBERO. | 83% / 89% / 90% |
| TensorAuto/tPi0.5-libero | A $\pi_{0.5}$ model checkpoint trained on the LIBERO dataset with discrete actions and knowledge insulation. | 98.4% (10); 97.6% (Goal); 100% (Object); 98% (Spatial) |
| TensorAuto/pi05_base | A $\pi_{0.5}$ model checkpoint converted from the official openpi checkpoint, with language embeddings added. | N/A |
| $\pi_{0.6}$ checkpoints | Coming soon. The pi06 policy is implemented and ready to train from scratch; the first TensorAuto-published checkpoint will appear here once released. | N/A |
| $\pi_{0.7}$ checkpoints | Coming soon. The hierarchical pi07_high_level + pi07_low_level policies are implemented and ready to train from scratch; the first TensorAuto-published checkpoint will appear here once released. | N/A |
| More coming soon... | | |

Acknowledgements

This project builds on the $\pi$ series of papers and many other open-source efforts—especially LeRobot—for re-implementing the $\pi$ models and helping standardize training infrastructure. OpenTau extends these foundations to provide a more accessible, comprehensive toolchain for training vision-language-action agents.

About

Tensor's VLA Training Infrastructure for Real-World Robotics in PyTorch
