Seun/nist gear insertion task example#566

Closed
seun-doherty wants to merge 184 commits into release/0.1.1 from
seun/nist_gear_insertion_task_example

Conversation

@seun-doherty
Collaborator

Summary

  • Add a complete NIST gear insertion RL workflow: a Franka Panda robot inserts a medium gear onto a peg on the NIST assembly board using operational-space control and RL Games PPO
  • Include step-by-step documentation (environment setup, policy training, evaluation) mirroring the existing Franka lift task pages
  • Add RL Games policy wrapper, training script, and YAML config for LSTM-based asymmetric actor-critic

What's Included

Core task (isaaclab_arena/tasks/): task definition, keypoint-squashing rewards, 24-D policy observations with wrist-force feedback, insertion success / gear-drop terminations, domain randomization events

Environment (isaaclab_arena_environments/): OSC action term (asset-relative, EMA smoothing, dead-zones), Franka mimic OSC robot config, environment definition wiring scene + embodiment + task + RL Games

RL infrastructure (isaaclab_arena/policy/, scripts/): RL Games action policy wrapper, training script, PPO hyperparameter config

Documentation (docs/pages/example_workflows/nist_gear_insertion/): 4-page workflow (overview, environment setup, training, evaluation) with GIFs

Asset registry (isaaclab_arena/assets/object_library.py): NIST board, gear, peg, and connector asset definitions

NOTE: Object library paths for added assets will be updated when assets are uploaded to nucleus

xyao-nv and others added 30 commits December 3, 2025 13:33
## Summary
Closed-loop GN1.5 showed a low success rate (SR) in multi-episode rollouts,
especially in parallel-env runs, where more contacts are introduced and more
episodes are observed.

## Detailed description
### Static manip
- Issue: At the beginning of each episode, the hands perform close-open
motions in the recorded trajectories. Because the microwave joint is not
stiff enough, small deviations during the first few inferences close the door
by mistake. The closed door is then hard for the static GR1 to pull open,
causing the task to fail.
[Screencast from 12-02-2025 03:16:36
PM.webm](https://github.com/user-attachments/assets/da06de60-8f01-47e7-ae26-a48e08cb523f)

- Fix:
a. Shorten `task_episode_length_s` to trigger more frequent resets once
the door is closed. The trade-off is introducing more episodes.
b. Also tried a shorter `action_horizon`, but it gave a worse SR. My
hypothesis is that it's hard for the VLA to tell from visuals/states
whether the door is closed to 0.2 (success) vs 0.21 (fail).

> 16 -- Metrics: {'success_rate': 0.605, 'door_moved_rate': 0.955, 'num_episodes': 200}
> 8 -- Metrics: {'success_rate': 0.225, 'door_moved_rate': 0.615, 'num_episodes': 200}
> 1 -- Metrics: {'success_rate': 0.0, 'door_moved_rate': 0.985, 'num_episodes': 200}

c. Switching to CPU PhysX does not solve the above issues, so keep it on GPU
for faster parallelization (in theory).
 
### Loco manip
- Issue: After each reset, the left arm tends to make fast motions and
the box ends up tilted. Significant interpenetration among the fingers was
also observed. Compare 00:15 vs. 00:30 in the 5-parallel-env closed-loop
video below.
[Screencast from 11-25-2025 03:42:36
PM.webm](https://github.com/user-attachments/assets/c4934817-65fa-412f-a88c-af143d25d7c2)

- Fix: switch to CPU PhysX and keep the policy on GPU.
The arms open first, G1 starts moving, and the box is placed with the
expected pose.
[Screencast from 12-02-2025 10:15:59
PM.webm](https://github.com/user-attachments/assets/4a02e6cd-7baf-441b-8c0f-7146051e5c9a)

### Minor fixes
Update the docs on commands & metrics.
## Summary
expose env spacing parameter
## Summary
Modify docs to show that this is manual annotation
## Summary
Add ground plane and light objects
## Summary
Users might want to modify env cfg components such as sim config. This
lets them do it.
## Summary
Move our CI infra to public runners

## Detailed description
- As part of our open-source release, we can no longer run on internal
infra.
- This MR moves our runners to public runners.
- I also took the chance to refactor and modularize the workflow file.
## Summary
Update link to the docs in README.md to the new public location.

## Detailed description
- Docs url has changed now that the repo is public.
## Summary
Fixes an issue that our tags requesting for public CI were incorrect.

## Detailed description
- Corrects the tag `[gpu]` -> `[self-hosted, gpu]`
## Summary
Revert to mapping the whole repo in the dev docker.

## Detailed description
- Previously we changed to mapping only specific folders in the repo.
- This was done for docker build speed (I believe?)
- The issue is that we want (even if occasionally) to work on all
folders in the repo, within the dev docker.
- This reverts to mapping the whole repo.
## Summary
Re-enables pre-commit in CI.

## Detailed description
- During the refactoring and switch to public CI, `pre-commit` was
broken.
- This MR fixes it.
## Summary
Language prompts are fetched from Task's data member, populated from
ArenaEnv creation.

## Detailed description
- `task_description` is automatically populated into the atomic task, and
users have the freedom to overwrite it when instantiating the task class
- `Policy` sets its `task_description` attribute through a setter function
- `policy_runner.py` connects the `task_description` from Task to the
`task_description` setter in `Policy`
- GR00T consumes the description either through the `task_description` data
member or `policy_config`
## Summary
All example `environments.py` files are repackaged into
`isaaclab_arena_environments`

## Reason
In prep for multi-task evaluation, as we may introduce more example
envs.
## Summary
Move to multi-versioned docs now that we have multiple versions of Isaac
Lab Arena.

## Detailed description
- This means users can read the version of the docs that shipped with the
release they're using.

<img width="1130" height="625" alt="version_sidebar"
src="https://github.com/user-attachments/assets/b06372cd-9bed-4a1d-99b8-9480c279ebb4"
/>
## Summary
The tear-down-simulation-app function could be useful in both core & tests.
This prepares it to be consumed.

## Detailed description
- Add USD `get_new_stage()` to the jupyter notebook example, resolving an
issue where USDs from a previous run are not removed
- Add an optional `suppress_exception` flag to let the exception be raised
by default, or ignored in tests
- Move this function to `isaaclab_utils`
## Summary
Fix git ownership issues in deployment pipeline.

## Detailed description
- Multi-version docs now require git during documentation build.
- This revealed git ownership issues in the page deployment pipeline
(previously seen in our pre-merge pipeline)
- This MR applies the same fix that's used in pre-merge.
## Summary
Update object library to use ISAAC_NUCLEUS_DIR prefix for YCB object usd
paths
## Summary
Refactor `mimic_env_cfg` building logic in `arena_env_builder`.


## Detailed description
- What was the reason for the change?
    - Originally we needed to maintain a list of embodiment names in each
task's MimicEnvCfg, depending on whether it is single_arm or dual_arm. That
is neither efficient nor scalable.
- What has been changed?
    - Creates a new enum class `MimicArmMode` to represent the arm mode for the mimic environment; one of
["single_arm", "dual_arm", "left", "right"]
    - Assigns a `mimic_arm_mode` property to the embodiment base
    - Changes the `task.get_mimic_env_cfg()` method's input from `embodiment_name`
to `mimic_arm_mode`
    - Refactors the SubTaskConfigs configuration logic in each MimicEnvCfg
based on `embodiment.mimic_arm_mode`
- What is the impact of this change?
    - All existing embodiments and tasks with a MimicEnvCfg.
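A minimal sketch of what the `MimicArmMode` enum described above could look like; the four values mirror the list in the description, while the class body itself is illustrative:

```python
from enum import Enum

class MimicArmMode(Enum):
    # Arm mode for the mimic environment, per the four options listed above.
    SINGLE_ARM = "single_arm"
    DUAL_ARM = "dual_arm"
    LEFT = "left"
    RIGHT = "right"
```

A task's `get_mimic_env_cfg(mimic_arm_mode)` could then branch on the enum instead of string-matching embodiment names.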
## Summary
`RigidObjectSet` inherits from `Object` to let users provide a list of
assets; the sim app spawns each `env_id` with one object from this set.

## Detailed description
- Introduced a `RigidObjectSet(Object)` class to handle rigid-body object
set construction
- The order in which objects from the set are loaded into each `env_id` can
follow the function-argument order or be random.
- Introduced `--object_set` in the `kitchen_pick_and_place.py` CLI to allow
spawning per `env_id`
- Added tests for empty/single/multi object sets & a check that each
`env_id`'s USD is referenced in the expected sequence

## TODO
- Pipe-clean & verify other task-centered object metrics/attribute access
(done in tests)
- Introduce this concept in other sample envs & multi-task eval

## Note
- The name `set` (instead of `collection`) differentiates this from what
[`RigidBodyCollection`](https://github.com/isaac-sim/IsaacLab/blob/main/source/isaaclab/isaaclab/assets/rigid_object_collection/rigid_object_collection_data.py)
from IsaacLab provides. In our use case, we need to spawn 1 object out of N,
whereas the `RigidBodyCollection` API spawns all N objects for each id.
- `MultiAssetSpawnerCfg` for articulated objects will be tricky (/buggy), as
PhysX APIs require the same joint prim path. That puts too many constraints
on what can be added to the set
-
[MultiAssetSpawnerCfg](https://github.com/isaac-sim/IsaacLab/blob/main/source/isaaclab/isaaclab/sim/spawners/wrappers/wrappers_cfg.py#L16)
for rigid objects requires the same type of collision meshes, as written in
Lab's docs.

<img width="720" height="295" alt="image"
src="https://github.com/user-attachments/assets/71983e83-d586-427b-a1dd-3eb047be817f"
/>
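To illustrate the 1-of-N spawning semantics above, here is a hypothetical sketch of the env_id-to-asset mapping (`assign_assets` is an invented helper, not the PR's actual API): each `env_id` receives exactly one USD from the set, either cycling in the configured order or at random.

```python
import random

def assign_assets(usd_paths: list[str], num_envs: int,
                  randomize: bool = False, seed: int = 0) -> list[str]:
    """Pick one USD from the set for each env_id (1-of-N, not all-N)."""
    if not usd_paths:
        raise ValueError("object set must not be empty")
    if randomize:
        rng = random.Random(seed)  # local RNG: don't pollute global state
        return [rng.choice(usd_paths) for _ in range(num_envs)]
    # Deterministic: cycle through the set in the given order.
    return [usd_paths[i % len(usd_paths)] for i in range(num_envs)]
```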
## Summary
This PR adds initial support for composite sequential tasks via the
SequentialTaskBase class. The SequentialTaskBase class takes a list of
atomic tasks (TaskBase) and automatically assembles them into a
composite task with unified termination/event configs.

Adds:

1. SequentialTaskBase class
2. Test case to validate class methods
3. Test case with example task (sequential open door task) to validate
unified success check and events
4. Two new functions in `isaac_arena/utils/configclass.py` to perform
config transformation and duplicate checking
## Summary
Fixes a typo which caused a misnaming of EEFs in the Mimic Env Configs
of tasks. The name of the eefs was being set as an Enum instead of the
value of the Enum. This caused data generation to fail using our
existing datasets.
## Summary
Implement `OpenDoorTask` and `CloseDoorTask` inherited from
`RotateRevoluteJointTask`

## Detailed description
- Generalize the task to common articulated objects with revolute joint.
E.g. Cabinet door; Window panel (rotate outward or inward relative to
the fixed frame using a hinge); Scissor blade at the pivot pin.
- The Open and Close Door tasks differ in terminations & reset events, but
share the same underlying logic for handling the revolute joint.
- Add an `is_closed` member function to the `Openable` affordance, using a
threshold to decide between open and closed. Basically, use it as a bi-state
object.

## TODO
- Test with articulated objects other than the overly-used microwave
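The thresholded bi-state check described above might look like the following sketch (the threshold default and function shape are assumptions, not the repository's code):

```python
def is_closed(joint_pos: float, closed_threshold: float = 0.1) -> bool:
    # Treat the revolute joint as bi-state: below the threshold (radians),
    # the door counts as closed; at or above it, open.
    return joint_pos < closed_threshold
```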
## Summary
Simplify device registry and add a retargeter registry
## Summary
Add warning to docs
## Summary
Adds a new embodiment of Agibot
Renames MimicArmMode to ArmMode, since arm_mode property is not for mimic_env_cfg building purpose only

## Detailed description
- What was the reason for the change?
     - Agibot is a widely applied mobile-base bimanual humanoid in China.
     - Renames MimicArmMode to ArmMode, since in AgibotCfg, 'arm_mode' property is not for mimic_env_cfg building purpose only, but for SceneCfg/ActionsCfg definition as well

- What has been changed?
    - Adds a new agibot.py
    - Renames MimicArmMode to ArmMode
## Summary
Add instruction to build multi docs
## Summary
Basic RL training workflow

---------

Co-authored-by: Alex Millane <amillane@nvidia.com>
Co-authored-by: peterd-NV <peterd@nvidia.com>
Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary
We allow adding `spawn_cfg_addon` and `asset_cfg_addon`. This exposes the possibility for a user to create any supported asset with various settings.
…#293)

## Summary
Adds a new affordance: **placeable**, and a new atomic task: **place
upright task**


## Detailed description
- What was the reason for the change?
    - The place upright task is a common in-house task for humanoids. We believe
      '**placeable**' should be a basic affordance to support, so we add the
      related atomic task and an example to showcase how to use this affordance.
- What has been changed?
    - adds a new atomic task: place upright task (and a test script for this task)
    - adds a new affordance: placeable (a placeable object can be placed upright, like a mug, bottle, etc.)
    - adds a new example: agibot left_arm place upright mug
    - adds a unit test for 'place_upright_task'
- What is the impact of this change?
    - fixes a bug in agibot.py, since SceneCfg and ActionsCfg cannot accept a new type of property like ArmMode in the manager-based workflow


## Test Pipeline
1. Zero Action Policy:
`python isaaclab_arena/examples/policy_runner.py --policy_type
zero_action tabletop_place_upright --object mug`

2. Record demos:
`python isaaclab_arena/scripts/imitation_learning/record_demos.py
--dataset_file datasets/dataset_agibot_left_arm_rel.hdf5 --num_demos 1
tabletop_place_upright --teleop_device keyboard`

3. Replay demos:
`python isaaclab_arena/scripts/imitation_learning/replay_demos.py
--dataset_file datasets/dataset_agibot_left_arm_rel.hdf5
tabletop_place_upright`

4. Test place_upright_mug task:
`pytest isaaclab_arena/tests/test_place_upright_mug.py`
alexmillane and others added 21 commits April 2, 2026 18:41
## Summary
Move dependencies from Dockerfile to setup.py

## Detailed description
- This simplifies the installation process for external users because
now `pip install ./isaaclab_arena` will install Arena's dependencies.
- Prior to this change, Arena made assumptions about the environment it
was being installed in. These assumptions were met through our
Dockerfile.
- For an external user, this makes the installation process much
easier.
## Summary
Upgrade IsaacLab interop for Isaac Lab 3.0

## Detailed description
- In IsaacLab 3.0, kit is optional, whereas in Isaac Lab Arena it's
compulsory.
- We therefore add a step to our interop callback to detect whether kit has
been started and, if not, start it.
- Update docs with new Lab 3.0 visualizer arguments
- Reactivate the RSL-RL test.
## Summary
The [MR](#531) introduced a torch conflict in the docker; this MR resolves it.

## Detailed description
- Deepspeed pulls in torch 2.11+cu13 as a dependency.
- After the fix, deepspeed's transitive deps (hjson, msgpack, psutil,
py-cpuinfo, ninja, etc.) are all resolved automatically by pip, and the
duplicate torch is cleaned up right after.
…#530)

## Summary
Prevent
[RslRlVecEnvWrapper](https://github.com/isaac-sim/IsaacLab/blob/main/source/isaaclab_rl/isaaclab_rl/rsl_rl/vecenv_wrapper.py#L66)
from calling a duplicate env.reset() during inference, which inflated
`num_episodes` by one and miscomputed success_rate.

## Detailed description
- What was wrong: `RslRlVecEnvWrapper.__init__` unconditionally calls
`env.reset()` (intended for training, where the runner never resets).
During eval, the rollout loop in `policy_runner.py` already calls
`env.reset()` before the first step. When the RSL-RL policy is lazily
loaded on the first get_action call, the wrapper's second reset records
a phantom failed episode via the `RecorderManager`, producing
`num_episodes = (N+1) * num_envs` and diluting `success_rate`.

- What was changed: Introduced `_RslRlInferenceEnvWrapper`, a subclass
of `RslRlVecEnvWrapper` that temporarily replaces `env.reset` with a no-op
during `__init__`, then restores it. `RslRlActionPolicy._load_policy`
now uses this wrapper instead of `RslRlVecEnvWrapper` directly. This avoids
modifying Lab core code.

- Add a TODO to add the test case once the test_rsl_rl is enabled by
@alexmillane.
## Summary
Update the lift object RL example to have a high success rate model

## Detailed description
- What was the reason for the change?
Existing lift object RL training model success rate is low (~30%), and
the arm motion is unnatural.
- What has been changed?
Add a franka joint control embodiment for RL training to avoid the weird
arm motion from IK version
Update the observation to include joint and target poses only
Fix a bug in the base RSL-RL policy so the target pose (task_obs, in
addition to policy obs) is passed to the actor/critic model
Fix a bug causing a ~0 success rate in parallel eval due to an incorrect
object/target frame in the success term
Update RL docs with latest models and commands
- What is the impact of this change?
The RL model now reaches a 70-80% success rate within 1.5 h

---------

Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary

- Add Fabio Ramos and Xuning Yang ([Robolab
authors](https://gitlab-master.nvidia.com/xuningy/robolab)) to the
Contributors section
- Fix alphabetical ordering of Hui Kang

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary
Fixes some issues that surfaced after upgrading to IsaacLab3:

- Add `--visualizer kit` to all `policy_runner.py` and `eval_runner.py`
commands in the docs

- Fix DROID robot spawning upside-down due to stale WXYZ quaternion
convention

- Fix the viewer camera position not being applied when using the `kit`
visualizer backend: the camera was stuck far away from the robot, any
`ViewerCfg` set in the environment had no effect, and the view could not
be overridden

---------

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary
Port the IsaacLab dexsuite_kuka_allegro_env_cfg.py example with Newton
to arena for evaluation

```bash
python isaaclab_arena/evaluation/policy_runner.py --policy_type rsl_rl \
  --checkpoint_path /models/isaaclab_arena/dexsuite_kuka_allegro/model_14999.pt \
  --visualizer newton --num_steps 1000 dexsuite_lift
```

![dexsuite_newton](https://github.com/user-attachments/assets/83abf2dd-1d2f-4de7-bf4f-9575ebadd09d)

---------

Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary

Previously, non-anchor objects were initialized anywhere in a large
fixed box. Objects with `On(table)` constraints would start outside the
table, get pushed to the edge by the solver, and cluster at the
boundary. With this change they start distributed across the whole
surface.

- Replace fixed initialization box with per-object initialization guided
by `On` relations
- Objects with `On(parent)` now start uniformly within the parent's X/Y
footprint at the correct Z height, giving the solver a valid warm start
and producing more even surface coverage
- Remove `init_bounds` and `init_bounds_size` from `ObjectPlacerParams`
- Use explicit `torch.Generator` for seeding instead of global
`torch.manual_seed`, so placement seed does not pollute Isaac Sim's
global RNG state

---------

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
## Summary
Add a "Publishing Your Own Benchmark" section to the README Ecosystem
page with a three-step workflow (own repo, cite in papers, PR a link
here) using RoboTwin as the reference example.

Following @sangeetas-nv's suggestion in slack
[thread](https://nvidia.slack.com/archives/C097CP8FG67/p1775070050814859?thread_ts=1775068933.390519&cid=C097CP8FG67).
## Summary

Fixes documentation on how to run the zero_action policy.

- Argument order: `--num_steps` & `--distributed --headless`
- Missing required arguments: `--object`
- Updated the path to be relative to the root of the repo, as we do in other
examples (e.g. `isaaclab_arena/evaluation/policy_runner.py`)

---------

Co-authored-by: Xinjie Yao <xyao@nvidia.com>
…518)

## Summary

- Adds a new **Getting Started** page (`arena_in_your_repo.rst`)
documenting the recommended pattern for consuming Arena as an unmodified
git submodule from an external project, inspired by and based on the
integration pattern used in
[nvblox_next](https://github.com/nvidia-isaac/nvblox_next/tree/e3d4fec646004956ac24ed3446dbb41c531d5908/datagen)

## Notes
- **Not yet tested** end-to-end — the examples reflect the observed
nvblox_next patterns but have not been validated in a fresh external
repo

---------

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: alex <amillane@nvidia.com>
…550)

## Summary
Adds documentation of more advanced external usage of arena - custom
tasks and embodiments.

## Detailed description
- Adds a new example external environment that introduces a custom task
and a custom environment.
- Documents this
- Adds a test that covers this new env.

Copy of #545

---------

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Co-authored-by: Clemens Volk <cvolk@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
#555)

## Summary
Remove outdated `--enable_pinnoccio` from our generation commands.

## Detailed description
- Addresses [6059248](https://nvbugspro.nvidia.com/bug/6059248)

Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary
The `--visualizer` arg is deprecated in Lab 3.0 in favour of `--viz`; this
PR replaces all occurrences of it.

Also fixes docs that instruct users to remove the `--headless` flag to see
the output but didn't specify that `--viz kit` needs to be added.
## Summary
Fix https://nvbugspro.nvidia.com/bug/6058876,
https://nvbugspro.nvidia.com/bug/6060494 on non-optimal camera viewing
angles

## Detailed description
- Root cause: `_reapply_viewer_cfg(env)` only runs inside
`make_registered()` / `make_registered_and_return_cfg()`. Any script
that calls `build_registered()` + `gym.make()` directly bypasses it, so
the viewer camera stays at Kit's default position instead of the
configured eye/lookat.
- Fix: In `record/replay/generate/annotate_dataset.py`, call
`reapply_viewer_cfg(env)` on the wrapped env (before `.unwrapped`) right
after `gym.make()`


<img width="864" height="529" alt="image"
src="https://github.com/user-attachments/assets/667a24df-e68f-4c1d-9668-adcd7746c443"
/>
from replay

Co-authored-by: peterd-NV <peterd@nvidia.com>
…r Task (#560)

## Summary
Fixes HF dataset version (v2.3 -> v3.0) in download command for step 1
of GR1 Open Microwave Door Task
## Summary
Switches `.. tabs::` for `.. tab-set::` in the docs to fix readability
issues in dark mode.

## Detailed description
- For some reason `tab-set` handles dark mode better.
- Addresses: [5727965](https://nvbugspro.nvidia.com/bug/5727965)

Before:
<img width="1350" height="1630" alt="image"
src="https://github.com/user-attachments/assets/1d25a745-2ef2-4105-8405-96e01e8b60c8"
/>

After:
<img width="830" height="799" alt="image"
src="https://github.com/user-attachments/assets/4dc6652d-b2cf-4b50-9fea-490b75b4498b"
/>
## Summary

The G1 whole-body controller currently only supports the Homie V2
lower-body
policy. This MR adds the WBC-AGILE end-to-end velocity policy as an
alternative
lower-body controller for the G1 robot, and wires it into the
environment so it
can be used via a new `g1_wbc_agile_joint` embodiment.

## Changes

### AGILE policy implementation
- Add `G1AgilePolicy` ONNX-based end-to-end velocity policy
(`g1_agile_policy.py`)
- Add `AgileConfig` dataclass and `g1_agile.yaml` joint ordering / model
I/O config
- Register `"agile"` variant in `wbc_policy_factory.py`

### Model download
- Add `docker/setup/download_wbc_models.sh` to download and verify the
AGILE ONNX
model at Docker build time (SHA256 checked), removing the need for
runtime download

### Environment integration
- Add `AgileConfig` branch in `G1DecoupledWBCJointAction` so
`wbc_version="agile"` is
  accepted by the action term
- Register `G1WBCAgileJointEmbodiment` (`g1_wbc_agile_joint`) — mirrors
  `g1_wbc_joint` but uses the AGILE lower-body policy

### Tests
- Add unit tests and stability tests for `G1AgilePolicy` in
  `test_g1_agile_policy.py`

## Results

Run the AGILE policy with the G1 robot:

```bash
/isaac-sim/python.sh isaaclab_arena/evaluation/policy_runner.py \
  --policy_type zero_action \
  --num_envs 1 \
  --num_steps 1000 \
  --enable_cameras \
  --visualizer kit \
  galileo_g1_locomanip_pick_and_place \
  --embodiment g1_wbc_agile_joint
```

## Test plan
- [x] `pytest
isaaclab_arena_g1/g1_whole_body_controller/wbc_policy/tests/test_g1_agile_policy.py`
— unit and stability tests pass
- [x] Run `policy_runner.py` with `--embodiment g1_wbc_agile_joint` —
environment loads and steps without errors
- [x] Run `policy_runner.py` with `--embodiment g1_wbc_joint` — existing
Homie V2 path is unaffected

---------

Signed-off-by: Lionel Gulich <lgulich@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Deletes `isaaclab_arena/reinforcement_learning/`: A module that
existed solely for an `RLFramework` enum whose only method produced
strings like `"rsl_rl_cfg_entry_point"`
- Replaces `rl_framework: RLFramework` with `rl_framework_entry_point:
str` on `IsaacLabArenaEnvironment` and lets the user pass in the correct
string directly

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
@seun-doherty seun-doherty requested a review from alexmillane April 9, 2026 23:24
NIST peg-insert gear assembly environment: custom OSC action term,
keypoint squashing rewards, IK gripper reset, and domain randomization.
Includes RL-Games PPO config, train/play scripts, and policy runner wrapper.
Isaac Lab 3.0 changed from wxyz to xyzw quaternion ordering. This
caused the robot to spawn upside down, leading to IK solver failure,
NaN observations, and training crashes.

Key fixes:
- Robot init rotation: (1,0,0,0) -> (0,0,0,1) in robot_configs.py
- Grasp rotation offset: wxyz -> xyzw in environment config
- Quaternion canonicalization: check w at index 3 (not 0) everywhere
- Replace torch_utils (wxyz) with math_utils (xyzw) in OSC action
- Wrap all warp arrays with wp.to_torch() for PyTorch compatibility
- Update deprecated IsaacLab API calls to _index variants
- Increase gpu_collision_stack_size to 4 GB for contact-heavy scenes
- Consolidate 3 observation files into single gear_insertion_observations.py
- Replace 4 custom obs functions with Isaac Lab built-ins (root_pos_w, root_quat_w)
- Slim NistGearInsertionTask constructor (40 → 17 params) via GraspConfig dataclass
- Hardcode reward weights in configclass instead of passing through constructor
- Delete bespoke play_rl_games.py; use generic policy_runner.py for evaluation
- Genericise RlGamesActionPolicy (remove NIST-specific defaults)
- Move RL-Games YAML config to isaaclab_arena_examples/policy/
- Clean up mdp/__init__.py re-exports
- Add merge readiness report and NIST vs Lift comparison doc
- Keep success term registered (returns all-False during training) so
  SuccessRateMetric can query it, matching Lift task pattern
- Loosen success_z_fraction from 0.05 to 0.15 (3mm depth threshold)
- Add new NIST asset definitions to object_library
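The wxyz -> xyzw changes above can be illustrated with a small sketch (the function names here are mine; Isaac Lab ships its own math utilities for this): pre-3.0 the scalar part came first, so identity was (1,0,0,0); in 3.0 it comes last, so identity is (0,0,0,1) and canonicalization must check index 3.

```python
import torch

def wxyz_to_xyzw(q: torch.Tensor) -> torch.Tensor:
    # Move the scalar component from index 0 to index 3.
    return q[..., [1, 2, 3, 0]]

def canonicalize_xyzw(q: torch.Tensor) -> torch.Tensor:
    # q and -q encode the same rotation; keep w (index 3, not 0) non-negative.
    sign = torch.where(q[..., 3:4] < 0.0, -1.0, 1.0)
    return q * sign
```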
Add step-by-step documentation for the NIST gear insertion RL workflow
(environment setup, policy training, evaluation) mirroring the existing
Franka lift task pages. Include task overview GIFs, register the new
workflow in the RL workflows index, and clamp NaN/inf values in
force-torque observations to prevent training instabilities.
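The force-torque clamping mentioned above could be a one-liner with `torch.nan_to_num`; the limit of 50 N is an illustrative assumption, not the value used in the PR:

```python
import torch

def clean_force_torque(ft: torch.Tensor, limit: float = 50.0) -> torch.Tensor:
    # NaN -> 0, +/-inf -> +/-limit, then bound contact spikes to [-limit, limit].
    return torch.nan_to_num(ft, nan=0.0, posinf=limit, neginf=-limit).clamp(-limit, limit)
```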
@seun-doherty force-pushed the seun/nist_gear_insertion_task_example branch from 0bd15bd to 93e071e on April 9, 2026 23:27
@isaaclab-review-bot

🤖 IsaacLab Review Bot — PR #566

Note: This PR was closed and reopened as #567. Posting review here for the record; the same findings apply to #567.

Overview

Large PR adding a complete NIST gear insertion RL workflow: task definition, keypoint-squashing rewards, 24-D policy observations with wrist-force feedback, OSC action term (asset-relative, EMA smoothing, dead-zones), Franka mimic OSC robot config, RL Games policy wrapper, training script, YAML config, documentation, and asset registry entries. 541 files changed — this includes many changes unrelated to the gear insertion feature (CI, docs restructuring, new embodiments, etc.), which makes isolated review of just the gear insertion feature difficult.


Findings

1. 🔴 Critical: _pred_scale is global state — breaks multi-env semantics

File: `isaaclab_arena/tasks/rewards/gear_insertion_rewards.py`, `success_prediction_error.__call__`

```python
if true_success.float().mean() >= self._delay_until_ratio:
    self._pred_scale = 1.0
```

_pred_scale is a scalar that applies to all environments but is flipped to 1.0 based on mean() across all envs. Once flipped, it never resets to 0.0 — not on episode reset, not on environment reset. This means:

  • Early episodes where delay_until_ratio hasn't been reached will correctly suppress the penalty
  • But once any batch of environments triggers it, all future episodes (even newly reset ones) will be penalized from step 0

Suggestion: Track per-env episode progress or make _pred_scale per-env and reset it in a reset() method.
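A sketch of the per-env suggestion (the class shape and signatures are illustrative, not the repository's actual term interface): keep one scale per environment and clear it on reset.

```python
import torch

class SuccessPredictionError:
    """Per-env variant: the penalty scale is a tensor, reset with episodes."""

    def __init__(self, num_envs: int):
        # One scale per environment, suppressed (0.0) at the start.
        self._pred_scale = torch.zeros(num_envs)

    def reset(self, env_ids: torch.Tensor) -> None:
        # Newly reset environments go back to the suppressed state.
        self._pred_scale[env_ids] = 0.0

    def __call__(self, true_success: torch.Tensor, pred_error: torch.Tensor) -> torch.Tensor:
        # Latch the scale to 1.0 per env once that env itself succeeds,
        # instead of flipping a scalar based on the batch-wide mean().
        self._pred_scale = torch.maximum(self._pred_scale, true_success.float())
        return self._pred_scale * pred_error
```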


2. 🟡 Warning: Repeated tensor allocation in hot paths

File: `isaaclab_arena/tasks/rewards/gear_insertion_rewards.py`, `_check_gear_position`

```python
held_off = (
    torch.tensor(held_gear_base_offset, device=env.device, ...).unsqueeze(0).expand(...)
)
offset = torch.tensor(peg_offset, device=env.device, ...).unsqueeze(0).expand(...)
```

_check_gear_position is called by gear_insertion_engagement_bonus, gear_insertion_success_bonus, and indirectly by success_prediction_error — all every step. Each call creates new tensors from Python lists. For thousands of envs at 60+ Hz this creates GC pressure.

Suggestion: Pre-compute these tensors once (e.g., in __init__ of the calling classes) and reuse them, similar to how gear_peg_keypoint_squashing already caches self.peg_offset and self.held_gear_base_offset.
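The caching suggestion could be sketched like this (the names follow the review text; the class shape and tolerance are illustrative): allocate the constant offsets once and reuse views of them every step.

```python
import torch

class GearPositionChecker:
    def __init__(self, held_gear_base_offset, peg_offset, num_envs, device="cpu"):
        # Allocated once; `.expand()` creates views, not copies, so the
        # per-step cost is negligible compared to rebuilding from Python lists.
        self._held_off = torch.tensor(held_gear_base_offset, device=device).unsqueeze(0).expand(num_envs, -1)
        self._peg_off = torch.tensor(peg_offset, device=device).unsqueeze(0).expand(num_envs, -1)

    def check(self, gear_pos, peg_pos, tol=5e-3):
        # Hot path: pure tensor arithmetic, no fresh allocations from lists.
        return torch.linalg.norm((gear_pos + self._held_off) - (peg_pos + self._peg_off), dim=-1) < tol
```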


3. 🟡 Warning: Observation class directly accesses env.action_manager._terms["arm_action"] private API

File: `isaaclab_arena/tasks/observations/gear_insertion_observations.py`, `NistGearInsertionPolicyObservations.__call__`

```python
arm_osc_action = env.action_manager._terms["arm_action"]
```

This accesses the private _terms dict of the action manager. If the action term name changes or the API evolves, this will silently break. The same pattern appears in multiple reward classes in gear_insertion_rewards.py.

Suggestion: Consider adding a public accessor on the action manager or passing the action term via config params rather than reaching into internals. At minimum, add a clear error message if the key isn't found.
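One way to harden the private `_terms` lookup the review flags is a small helper that fails loudly with the available term names; `get_action_term` is a hypothetical helper, not a real Isaac Lab API:

```python
def get_action_term(action_manager, name: str):
    # Reach into the (private) _terms dict, but fail with a clear message
    # listing what is available instead of a bare KeyError.
    terms = getattr(action_manager, "_terms", {})
    if name not in terms:
        raise KeyError(
            f"Action term '{name}' not found; available terms: {sorted(terms)}. "
            "Was the action term renamed in the environment config?"
        )
    return terms[name]
```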


4. 🟡 Warning: Quaternion canonicalization zeros out z/w components

File: `isaaclab_arena/tasks/observations/gear_insertion_observations.py`

```python
noisy_quat[:, [2, 3]] = 0.0
noisy_quat = noisy_quat * self._flip_quats.unsqueeze(-1)
```

Zeroing the z and w components of a quaternion after rotation noise application, then multiplying by a random sign, produces a non-unit quaternion that represents a projection rather than a valid rotation. If this is intentional (e.g., only tracking x,y for a 2-DOF orientation), it should be documented. Otherwise, this silently corrupts the rotation information fed to the policy.


5. 🟡 Warning: No tests for the new NIST gear insertion task

Despite this being a complete new task workflow, there are no test files specific to the gear insertion task (no test_nist_gear_insertion*.py). The PR adds many other tests (test_assembly_task, test_sorting_task, etc.) but the core new feature — the gear insertion task, its rewards, observations, and OSC action — has no test coverage.

Suggestion: Add at least:

  • Unit tests for reward functions (especially the squashing-fn keypoint rewards with known geometry)
  • Unit tests for _check_gear_position with controlled inputs
  • Integration test that the environment can be created and stepped

6. 💡 Nit: body_quat_canonical sign convention may produce discontinuities

File: isaaclab_arena/tasks/observations/gear_insertion_observations.py

```python
sign = torch.where(quat[:, 3:4] < 0, -1.0, 1.0)
return quat * sign
```

This canonicalizes by flipping so w >= 0. But for orientations near w ≈ 0, small noise flips the sign, creating observation discontinuities. This is a known issue with quaternion canonicalization and may hurt LSTM-based policies. Consider using the make_quat_unique utility from Isaac Lab which handles this more robustly.


7. 💡 Nit: gear_peg_height default unused in environment

The NistGearInsertionTask constructor defaults gear_peg_height=0.02, but the environment never passes this parameter — it uses the default. Meanwhile, success_z_fraction=0.20 is explicitly set. The actual success threshold is 0.02 * 0.20 = 0.004m (4mm Z tolerance). This should be documented since it's a critical tuning parameter that's easy to miss.


8. 💡 Nit: Duplicate copyright headers

File: isaaclab_arena/scripts/reinforcement_learning/train_rl_games.py

```python
# Copyright (c) 2026, The Isaac Lab Arena Project Developers ...
# Copyright (c) 2025-2026, The Isaac Lab Arena Project Developers.
```

Two copyright headers at the top — one with 2026 only and one with 2025-2026.


Summary

The gear insertion task is well-structured and follows established patterns (Factory-style keypoint rewards, OSC action space, assembly peg-insert benchmarks). The multi-file decomposition (task / observations / rewards / action / environment) is clean and consistent with the existing Arena architecture.

Main concerns: The _pred_scale global state bug (Finding #1) could cause training instabilities. The lack of tests for the core new feature (Finding #5) is a significant gap given the complexity of the reward/observation logic. The hot-path tensor allocations (Finding #2) will hurt training throughput at scale.

The PR's scope is very large (541 files) — consider breaking unrelated changes (CI, docs restructuring, new embodiments) into separate PRs for easier review.

