
Seun/nist gear insertion task example #567

Open

seun-doherty wants to merge 5 commits into main from
seun/nist_gear_insertion_task_example

Conversation

@seun-doherty (Collaborator) commented Apr 9, 2026

Summary

  • Add a complete NIST gear insertion RL workflow: a Franka Panda robot inserts a medium gear onto a peg on the NIST assembly board using operational-space control and RL Games PPO
  • Include step-by-step documentation (environment setup, policy training, evaluation) mirroring the existing Franka lift task pages
  • Add RL Games policy wrapper, training script, and YAML config for LSTM-based asymmetric actor-critic

What's Included

Core task (isaaclab_arena/tasks/): task definition, keypoint-squashing rewards, 24-D policy observations with wrist-force feedback, insertion success / gear-drop terminations, domain randomization events

Environment (isaaclab_arena_environments/): OSC action term (asset-relative, EMA smoothing, dead-zones), Franka mimic OSC robot config, environment definition wiring scene + embodiment + task + RL Games

RL infrastructure (isaaclab_arena/policy/, scripts/): RL Games action policy wrapper, training script, PPO hyperparameter config

Documentation (docs/pages/example_workflows/nist_gear_insertion/): 4-page workflow (overview, environment setup, training, evaluation) with GIFs

Asset registry (isaaclab_arena/assets/object_library.py): NIST board, gear, peg, and connector asset definitions

NOTE: Object library paths for added assets will be updated when assets are uploaded to Nucleus

NIST peg-insert gear assembly environment: custom OSC action term,
keypoint squashing rewards, IK gripper reset, and domain randomization.
Includes RL-Games PPO config, train/play scripts, and policy runner wrapper.

Isaac Lab 3.0 changed from wxyz to xyzw quaternion ordering. This
caused the robot to spawn upside down, leading to IK solver failure,
NaN observations, and training crashes.

Key fixes:
- Robot init rotation: (1,0,0,0) -> (0,0,0,1) in robot_configs.py
- Grasp rotation offset: wxyz -> xyzw in environment config
- Quaternion canonicalization: check w at index 3 (not 0) everywhere
- Replace torch_utils (wxyz) with math_utils (xyzw) in OSC action
- Wrap all warp arrays with wp.to_torch() for PyTorch compatibility
- Update deprecated IsaacLab API calls to _index variants
- Increase gpu_collision_stack_size to 4 GB for contact-heavy scenes
- Consolidate 3 observation files into single gear_insertion_observations.py
- Replace 4 custom obs functions with Isaac Lab built-ins (root_pos_w, root_quat_w)
- Slim NistGearInsertionTask constructor (40 → 17 params) via GraspConfig dataclass
- Hardcode reward weights in configclass instead of passing through constructor
- Delete bespoke play_rl_games.py; use generic policy_runner.py for evaluation
- Genericise RlGamesActionPolicy (remove NIST-specific defaults)
- Move RL-Games YAML config to isaaclab_arena_examples/policy/
- Clean up mdp/__init__.py re-exports
- Add merge readiness report and NIST vs Lift comparison doc
- Keep success term registered (returns all-False during training) so
  SuccessRateMetric can query it, matching Lift task pattern
- Loosen success_z_fraction from 0.05 to 0.15 (3mm depth threshold)
- Add new NIST asset definitions to object_library
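The "check w at index 3" canonicalization fix above can be sketched in isolation (a plain-Python helper for illustration; the helper name is not from the repo, and the real code operates on batched torch tensors):

```python
def canonicalize_xyzw(q):
    """Return the sign-canonical form of an xyzw quaternion.

    q and -q encode the same rotation; forcing the scalar part w
    (index 3 in xyzw layout, NOT index 0 as in the old wxyz layout)
    to be non-negative picks a unique representative, which keeps
    frame-to-frame quaternion differences continuous.
    """
    x, y, z, w = q
    return (-x, -y, -z, -w) if w < 0.0 else (x, y, z, w)
```

Checking w at index 0 after the xyzw migration would instead flip on the sign of the x component, silently corrupting every downstream quaternion difference.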

Add step-by-step documentation for the NIST gear insertion RL workflow
(environment setup, policy training, evaluation) mirroring the existing
Franka lift task pages. Include task overview GIFs, register the new
workflow in the RL workflows index, and clamp NaN/inf values in
force-torque observations to prevent training instabilities.
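The NaN/inf clamping mentioned above follows a standard sanitization pattern; a scalar plain-Python sketch (the real code presumably uses torch.nan_to_num and torch.clamp on tensors, and the clamp bound here is a hypothetical value, not the repo's):

```python
import math

FT_CLAMP = 50.0  # hypothetical force-torque bound in newtons

def sanitize_ft_reading(value):
    """Replace NaN/inf with 0 and clamp to a finite range so a single
    bad contact reading cannot blow up the policy observation."""
    if not math.isfinite(value):
        return 0.0
    return max(-FT_CLAMP, min(FT_CLAMP, value))
```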
@seun-doherty seun-doherty requested a review from alexmillane April 9, 2026 23:37
greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR adds a complete NIST gear insertion RL workflow: task definition, keypoint-squashing rewards, a 24-D OSC policy observation with wrist-force feedback, RL Games PPO training infrastructure, and 4-page documentation.

  • P1 — Corrupted ee_angvel in policy observation: NistGearInsertionPolicyObservations.__call__ zeroes indices [2, 3] of noisy_quat (z and w in xyzw format), producing a non-unit quaternion that is then passed to quat_mul / axis_angle_from_quat to estimate ee_angvel. Unit quaternions are required; the angular velocity component of the 24-D observation will be numerically wrong, and is identically zero at the identity pose.
  • P1 — Spurious angular velocity spike on reset: reset() stores the full unit quaternion in self._prev_noisy_quat, while __call__ then stores the zeroed non-unit version. The first step of every episode computes rot_diff from these mismatched representations, injecting a large fictitious angular velocity at episode boundaries.

Confidence Score: 4/5

The core architecture is solid but two P1 bugs in the 24-D policy observation class should be fixed before merging to ensure training produces correct signals.

Two confirmed P1 bugs exist in NistGearInsertionPolicyObservations: ee_angvel is computed from a non-unit (zeroed) quaternion, and a unit/non-unit mismatch on the first step after reset injects a spurious angular velocity spike every episode. These directly corrupt a component of the 24-D observation fed to the policy during training. The remaining findings are P2. The overall infrastructure — OSC action term, reward shaping, RL Games wrapper, environment wiring — is well-structured.

isaaclab_arena/tasks/observations/gear_insertion_observations.py (both P1 bugs are here, in call and reset methods)

Important Files Changed

Filename Overview
isaaclab_arena/tasks/observations/gear_insertion_observations.py Adds gear-peg observation primitives and a 24-D policy observation class; contains two bugs: non-unit quaternions from zeroing indices [2,3] corrupt the ee_angvel estimate, and a mismatch between the full quaternion stored on reset and the zeroed quaternion written each step causes spurious angular velocity spikes at episode boundaries.
isaaclab_arena/tasks/rewards/gear_insertion_rewards.py Implements keypoint-squashing rewards, engagement/success bonuses, OSC action penalties, and a success-prediction error term; the latter has a one-way curriculum latch (_pred_scale never resets) that could interfere with recovery from policy regression.
isaaclab_arena/tasks/nist_gear_insertion_task.py Well-structured task class composing scene, observation, reward, termination, and event configs; the domain-randomization event configs pass arm_joint regex in a list (joint_names=[arm_joints]) which may silently match zero joints.
isaaclab_arena/tasks/terminations.py Adds gear_mesh_insertion_success and gear_dropped_from_gripper terminations using correct geometry; gear_orientation_exceeded is effectively dead code because env._initial_gear_ee_relative_quat is never populated.
isaaclab_arena_environments/mdp/nist_gear_insertion_osc_action.py Comprehensive 7-D OSC action term with EMA smoothing, dead zones, per-episode gain randomization, and yaw wrapping to avoid Franka joint-limit discontinuities; logic appears sound.
isaaclab_arena/policy/rl_games_action_policy.py Clean RL-Games policy wrapper handling checkpoint loading, LSTM state management, and observation processing; integrates correctly with the Arena policy runner pattern.
isaaclab_arena/assets/object_library.py Registers seven new NIST asset classes; all USD paths point to local _REPO_ROOT/assets/*.usd files that are not committed, so the environment will fail at sim startup until assets are uploaded and paths updated.
isaaclab_arena_environments/nist_assembled_gearmesh_osc_environment.py Wires together scene, embodiment, OSC action, policy observations, and task configs; correctly uses peg_tip_offset vs peg_base_offset for obs vs success criteria; overall clean integration.
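The EMA smoothing and dead-zone behavior credited to the OSC action term can be sketched for a single action channel (constants are hypothetical, not the repo's tuned values):

```python
EMA_ALPHA = 0.2   # smoothing factor: lower = heavier low-pass filtering
DEAD_ZONE = 0.05  # commands below this magnitude are treated as zero

def filter_action(raw, prev_smoothed):
    """Apply a dead zone, then exponentially smooth one action channel.

    The dead zone lets the policy hold the end-effector still without
    jitter; the EMA suppresses high-frequency action chatter that would
    otherwise excite contact dynamics.
    """
    gated = 0.0 if abs(raw) < DEAD_ZONE else raw
    return EMA_ALPHA * gated + (1.0 - EMA_ALPHA) * prev_smoothed
```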

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Policy: 7-D action output] --> B[NistGearInsertionOscAction\nEMA smoothing, dead zones, yaw wrap]
    B --> C[OSC Controller\npose_abs, fixed impedance]
    C --> D[Franka Mimic Robot\npanda_fingertip_centered]
    D --> E[Physics Sim\n60Hz, 192 solver iters]
    E --> F[NistGearInsertionPolicyObservations\n24-D policy obs]
    F --> |fingertip_pos_rel 3| G[Policy Input]
    F --> |noisy_quat 4 - zeroed z,w| G
    F --> |ee_linvel 3| G
    F --> |ee_angvel 3 computed from non-unit quat| G
    F --> |ft_force 3| G
    F --> |force_threshold 1| G
    F --> |prev_actions 7| G
    G --> A
    E --> H[Reward Manager]
    H --> H1[kp_baseline/coarse/fine\nkeypoint squashing]
    H --> H2[engagement_bonus]
    H --> H3[success_bonus]
    H --> H4[osc_action_magnitude_penalty]
    H --> H5[success_prediction_error\none-way latch]
    E --> I[Termination Manager]
    I --> I1[gear_mesh_insertion_success]
    I --> I2[gear_dropped_from_gripper]
    I --> I3[time_out]
Reviews (1): Last reviewed commit: "Add NIST gear insertion docs, assets, an..."

Comment on lines +227 to +251
noisy_quat = math_utils.quat_mul(
ft_quat,
math_utils.quat_from_angle_axis(rot_noise_angle, rot_noise_axis),
)
noisy_quat[:, [2, 3]] = 0.0
noisy_quat = noisy_quat * self._flip_quats.unsqueeze(-1)

arm_osc_action = env.action_manager._terms["arm_action"]
board_pos = wp.to_torch(board.data.root_pos_w) - origins
board_quat = wp.to_torch(board.data.root_quat_w)
peg_offset_exp = self._peg_offset.unsqueeze(0).expand(n, 3)
peg_pos = board_pos + math_utils.quat_apply(board_quat, peg_offset_exp)
noisy_fixed_pos = peg_pos + arm_osc_action.fixed_pos_noise

fingertip_pos_rel = noisy_pos - noisy_fixed_pos

safe_dt = max(dt, 1e-6)
ee_linvel = (noisy_pos - self._prev_noisy_pos) / safe_dt
self._prev_noisy_pos[:] = noisy_pos

rot_diff = math_utils.quat_mul(noisy_quat, math_utils.quat_conjugate(self._prev_noisy_quat))
rot_diff = rot_diff * torch.sign(rot_diff[:, 3]).unsqueeze(-1)
ee_angvel = axis_angle_from_quat(rot_diff) / safe_dt
ee_angvel[:, 0:2] = 0.0
self._prev_noisy_quat[:] = noisy_quat

P1 Non-unit quaternion corrupts ee_angvel computation

noisy_quat[:, [2, 3]] = 0.0 zeroes the z-imaginary and w-real components of the xyzw quaternion, producing [x, y, 0, 0] whose norm is sqrt(x²+y²) ≠ 1 (and is exactly 0 at the identity rotation where x=y=0). This non-unit quaternion is then fed directly into quat_mul / axis_angle_from_quat to estimate ee_angvel, which requires unit quaternions. The angular velocity signal included in the 24-D policy observation will be numerically incorrect — and zero at the identity pose — which can silently degrade training.

The ee_angvel should be computed from the normalized quaternions before the zeroing step. The zeroed representation is only needed for the neural-network input. For example:

# 1. Compute angular velocity from unit quaternions first
rot_diff_full = math_utils.quat_mul(ft_quat, math_utils.quat_conjugate(self._prev_noisy_quat_full))
rot_diff_full = rot_diff_full * torch.sign(rot_diff_full[:, 3]).unsqueeze(-1)
ee_angvel = axis_angle_from_quat(rot_diff_full) / safe_dt

# 2. Then zero z/w for the policy observation representation
noisy_quat_for_obs = noisy_quat.clone()
noisy_quat_for_obs[:, [2, 3]] = 0.0
noisy_quat_for_obs = noisy_quat_for_obs * self._flip_quats.unsqueeze(-1)
self._prev_noisy_quat_full[:] = ft_quat  # track the full unit quaternion

Comment on lines +185 to +195
flip[torch.rand(n, device=dev) > 0.5] = -1.0
self._flip_quats[env_ids] = flip

self._ensure_body_indices()
robot: Articulation = self._env.scene[self._robot_name]
origins = self._env.scene.env_origins
self._prev_noisy_pos[env_ids] = (
wp.to_torch(robot.data.body_pos_w)[env_ids, self._fingertip_idx] - origins[env_ids]
)
self._prev_noisy_quat[env_ids] = wp.to_torch(robot.data.body_quat_w)[env_ids, self._fingertip_idx]


P1 First reset mismatches unit vs. zeroed quaternion

In reset(), self._prev_noisy_quat[env_ids] is stored as the raw body quaternion (full unit quaternion). But in __call__, noisy_quat is zeroed at indices [2, 3] and then saved to self._prev_noisy_quat. So the very first step after every episode reset computes rot_diff from a full unit quaternion (prev) vs. a zeroed non-unit quaternion (current), causing a large spurious angular velocity spike at step 0 of each episode. The reset path should store the same representation that __call__ will write, or the angular velocity should be zeroed at the reset step.
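A minimal sketch of the "store the same representation" option (toy plain-Python state holder; the zeroing helper mirrors what __call__ is described as doing, and all names here are illustrative):

```python
def zero_zw_xyzw(q):
    """Mirror the zeroing __call__ applies to the xyzw quaternion."""
    x, y, _z, _w = q
    return (x, y, 0.0, 0.0)

class PrevQuatState:
    """Toy state holder: reset() must write the SAME representation the
    per-step path writes, or the first quaternion difference of each
    episode mixes a unit quaternion with a zeroed one."""

    def __init__(self):
        self.prev_noisy_quat = (0.0, 0.0, 0.0, 0.0)

    def reset(self, body_quat_xyzw):
        # Apply the identical zeroing used in the step path.
        self.prev_noisy_quat = zero_zw_xyzw(body_quat_xyzw)

    def step(self, noisy_quat_xyzw):
        zeroed = zero_zw_xyzw(noisy_quat_xyzw)
        # ... rot_diff against self.prev_noisy_quat would go here ...
        self.prev_noisy_quat = zeroed
```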

Comment on lines +265 to +275
``gear_insertion_success_bonus`` / ``gear_mesh_insertion_success``), not the gear
rigid-body root, consistent with common assembly peg-insert benchmarks.
"""

def __init__(self, cfg: RewardTermCfg, env: ManagerBasedRLEnv):
super().__init__(cfg, env)
self._pred_scale = 0.0
self._delay_until_ratio: float = cfg.params.get("delay_until_ratio", 0.25)
self._gear_cfg: SceneEntityCfg = cfg.params["gear_cfg"]
self._board_cfg: SceneEntityCfg = cfg.params["board_cfg"]
hgo = cfg.params.get("held_gear_base_offset", [2.025e-2, 0.0, 0.0])

P2 _pred_scale is never reset — curriculum latches permanently

self._pred_scale starts at 0.0 and is set to 1.0 once true_success.float().mean() >= self._delay_until_ratio. Because it is an instance-level float with no reset path, once activated it stays at 1.0 for the remainder of training — even if the policy later regresses below the threshold. Whether this irreversibility is intentional (one-way curriculum gate) should be confirmed; if performance can degrade after the switch, the penalty will still apply and may prevent recovery.
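If the gate is meant to be reversible, a hysteresis variant avoids both permanent latching and rapid toggling near the threshold (sketch only; the class name and thresholds are illustrative, not from the PR):

```python
class SuccessPredictionGate:
    """Two-way curriculum gate with hysteresis.

    Unlike a one-way latch, the scale drops back to 0 if the success
    rate regresses well below the activation threshold, so the penalty
    does not block recovery; the lower off_ratio prevents flapping.
    """

    def __init__(self, on_ratio=0.25, off_ratio=0.10):
        self.on_ratio = on_ratio
        self.off_ratio = off_ratio  # strictly below on_ratio
        self.scale = 0.0

    def update(self, success_rate):
        if self.scale == 0.0 and success_rate >= self.on_ratio:
            self.scale = 1.0
        elif self.scale == 1.0 and success_rate < self.off_ratio:
            self.scale = 0.0
        return self.scale
```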

Comment on lines +253 to +264
"asset_cfg": SceneEntityCfg("robot", joint_names=[arm_joints]),
"stiffness_distribution_params": (0.75, 1.5),
"damping_distribution_params": (0.3, 3.0),
"operation": "scale",
"distribution": "log_uniform",
},
)
cfg.robot_joint_friction = EventTermCfg(
func=mdp_isaac_lab.randomize_joint_parameters,
mode="reset",
params={
"asset_cfg": SceneEntityCfg("robot", joint_names=[arm_joints]),

P2 joint_names=[arm_joints] — regex string wrapped in a list

arm_joints is a regex string ("panda_joint.*" or "panda_joint[1-7]"), but it is passed as joint_names=[arm_joints] — a list containing one element. Whether SceneEntityCfg interprets each list element as a regex pattern or as a literal name depends on the Isaac Lab version. If treated as a literal name, it will match zero joints and silently produce no-ops for actuator-gain and joint-friction randomization. The same pattern appears at line 264. Verify the expected API; if a single regex is correct, pass it directly:

"asset_cfg": SceneEntityCfg("robot", joint_names=arm_joints),

Comment on lines +266 to +290
held_gear_base_offset: list[float] | None = None,
gear_peg_height: float = 0.02,
success_z_fraction: float = 0.80,
xy_threshold: float = 0.0025,
rl_training: bool = False,
) -> torch.Tensor:
"""Terminate when the gear is inserted onto the peg to the required depth.

Checks that the held gear's base (root + held_gear_base_offset in gear frame)
is centered on the peg (XY) and lowered past a fraction of the peg height (Z).
Peg position is fixed_asset pose + gear_base_offset in the fixed asset's local frame.

When ``rl_training`` is True, always returns False (no early termination)
but the term stays registered so that ``SuccessRateMetric`` can query it.
"""
if rl_training:
return torch.zeros(env.num_envs, dtype=torch.bool, device=env.device)
held_object: RigidObject = env.scene[held_object_cfg.name]
fixed_object: RigidObject = env.scene[fixed_object_cfg.name]

held_root = wp.to_torch(held_object.data.root_pos_w) - env.scene.env_origins
held_quat = wp.to_torch(held_object.data.root_quat_w)
h_offset = held_gear_base_offset if held_gear_base_offset is not None else gear_base_offset
held_off = torch.tensor(h_offset, device=env.device, dtype=torch.float32).unsqueeze(0).expand(env.num_envs, 3)
held_base_pos = held_root + math_utils.quat_apply(held_quat, held_off)

P2 gear_orientation_exceeded reads an attribute that is never set

env._initial_gear_ee_relative_quat is accessed inside the function body, but no event, task, or reset logic in this PR ever sets this attribute on the env. The hasattr guard makes the function silently return all-False (never triggers), so it is effectively dead code. If this termination is intended for future use, a TODO comment would clarify that; if it should be active now, the attribute needs to be populated — e.g., in the place_gear_in_gripper event.


name = "gears_and_base"
tags = ["object"]
usd_path = str(_REPO_ROOT / "assets" / "gearbase_and_gears_gearbase_root.usd")

P2 Asset USD paths reference local files that don't exist in the repo

All new NIST asset classes point to _REPO_ROOT / "assets" / "*.usd" paths that are not committed. The PR description notes these will be updated when assets are uploaded to Nucleus, but loading any of these asset classes will raise a file-not-found error at sim startup. Consider adding a clear NOTE comment on each class to make the stub status explicit.

@kellyguo11 kellyguo11 left a comment

PR #567 Review: NIST Gear Insertion Task with RL-Games Training Pipeline

Overall Assessment

This is a substantial and well-structured PR that adds a complete RL workflow for contact-rich gear insertion on the NIST assembly board. The architecture follows existing Arena patterns (lift task), the documentation is thorough, and the commit history shows thoughtful iteration (quaternion migration, refactoring, observation hardening). Good work overall.

That said, there are several issues worth addressing before merge — ranging from correctness concerns to code quality and maintainability.


🔴 Issues (Should Fix)

1. Duplicate copyright headers in train_rl_games.py
Lines 1-5 and 7-8 both have copyright/SPDX headers. Remove the duplicate.

2. gpu_collision_stack_size=2**32 - 1 in env_callbacks.py is 4 GB — this affects ALL assembly environments, not just this task
This is set in the shared assembly_env_cfg_callback. Other tasks that import this callback will now request 4 GB collision stack. Consider:

  • Making it configurable per-environment, or
  • At minimum, adding a comment explaining why this value is needed (contact-heavy gear mesh scenes), so future maintainers know whether they can tune it down.

3. Mutable default arguments in function signatures
Multiple functions use mutable list defaults (peg_offset: list[float] = [0.0, 0.0, 0.0]). This is a well-known Python anti-pattern. Examples:

  • gear_insertion_rewards.py: gear_insertion_engagement_bonus, gear_insertion_success_bonus, _check_gear_position
  • gear_insertion_observations.py: peg_pos_in_env_frame, peg_delta_from_held_gear_base, held_gear_base_pos_in_env_frame
  • terminations.py: gear_mesh_insertion_success, gear_dropped_from_gripper

While these are typically called by the manager framework (which passes explicit args), it's still a latent bug. Use tuple or None with a default factory pattern.
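The standard fixes look like this (function names echo the ones listed above, but the bodies are illustrative, not the repo's):

```python
# Option 1: immutable tuple default
def peg_pos_in_env_frame(peg_offset: tuple[float, float, float] = (0.0, 0.0, 0.0)):
    return list(peg_offset)  # convert only when a mutable copy is needed

# Option 2: None sentinel + default factory
def held_gear_base_pos_in_env_frame(offset: "list[float] | None" = None):
    # A fresh list is built per call, so callers can never share state
    # through the default the way they can with `= [0.0, 0.0, 0.0]`.
    offset = [0.0, 0.0, 0.0] if offset is None else offset
    return offset
```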

4. NistAssembledBoard vs NistBoardAssembled in object_library.py
Two very similarly named asset classes with different name fields ("nist_assembled_board" vs "nist_board_assembled"). The environment only uses "nist_board_assembled". Is NistAssembledBoard actually needed? If not, remove it to avoid confusion. If both are needed, add docstrings explaining the distinction.

5. Missing __init__.py for observations and rewards subdirectories
isaaclab_arena/tasks/observations/gear_insertion_observations.py and isaaclab_arena/tasks/rewards/gear_insertion_rewards.py are new files in subdirectories. Are there __init__.py files already? The diff doesn't show them being created, but imports in nist_gear_insertion_task.py use direct module paths, so this may be fine if the parent __init__.py already exists. Worth verifying.

6. NistGearInsertionPolicyObservations accesses private action_manager._terms
In gear_insertion_observations.py line ~230: arm_osc_action = env.action_manager._terms["arm_action"]. Accessing _terms (private dict) is fragile. If there's a public API for this (like env.action_manager.get_term("arm_action")), use it. Same pattern appears in gear_insertion_rewards.py (osc_action_magnitude_penalty, osc_action_delta_penalty, wrist_contact_force_penalty, success_prediction_error).

7. force_torque_at_body uses robot.root_view.get_link_incoming_joint_force() — undocumented API
get_link_incoming_joint_force() is a PhysX-level call via root_view. This couples the code to specific Isaac Sim internals. Same pattern in NistGearInsertionPolicyObservations.__call__. Add a comment explaining this dependency.


🟡 Suggestions (Nice to Have)

8. Line length violations
Several lines exceed reasonable width (>120 chars), making diffs harder to review:

  • gear_insertion_observations.py line 60 (held_gear_base_pos_in_env_frame)
  • terminations.py line ~298 (tensor creation in gear_mesh_insertion_success)
  • gear_insertion_rewards.py line ~143

9. Duplicated geometry checks across rewards, terminations, and observations
_check_gear_position (rewards), gear_mesh_insertion_success (terminations), and the success prediction logic all compute the same peg-vs-held-gear-base geometry independently. Consider extracting a shared utility function to reduce drift risk.

10. grasp_rot_offset default [1.0, 0.0, 0.0, 0.0] in GraspConfig — this is xyzw format with w=0
[1.0, 0.0, 0.0, 0.0] — if this is xyzw convention, then w=0 which is not a valid quaternion. The environment overrides it with [1.0, 0.0, 0.0, 0.0] too. Given the Isaac Lab 3.0 migration to xyzw, this looks like it should be [0.0, 0.0, 0.0, 1.0] (identity in xyzw). The commit message says wxyz→xyzw migration was done, but the default in the dataclass may have been missed. Verify this is intentional — the environment always overrides it, but the default is misleading/incorrect.
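A quick check of why [1.0, 0.0, 0.0, 0.0] is suspect as an xyzw default: the identity rotation is [0, 0, 0, 1], while w = 0 encodes a 180-degree rotation (helper below is illustrative):

```python
import math

def xyzw_rotation_angle(q):
    """Rotation angle (radians) encoded by a unit xyzw quaternion,
    via theta = 2 * acos(w) with w clamped into acos's domain."""
    w = q[3]
    return 2.0 * math.acos(max(-1.0, min(1.0, w)))

assert xyzw_rotation_angle((0.0, 0.0, 0.0, 1.0)) == 0.0                  # identity
assert abs(xyzw_rotation_angle((1.0, 0.0, 0.0, 0.0)) - math.pi) < 1e-12  # 180 deg about x
```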

11. NistGearSmall and NistGearLarge are registered but never used
These asset classes exist in object_library.py but nothing in this PR references them. If they're for future use, add a brief comment. Otherwise, defer adding them until they're needed.

12. concate_obs (typo?) in RL Games YAML and wrapper
concate_obs_groups appears in the YAML config — is this intentionally concate (not concatenate)? This follows the existing RL Games wrapper naming, but worth noting.

13. RL Games YAML: player.deterministic: False
The default player config has deterministic: False. The RlGamesActionPolicy class defaults to deterministic: True. This inconsistency could cause confusion when someone evaluates using the raw RL Games player vs. the Arena policy runner.

14. Consider adding type hints to _KeypointDistanceComputer.compute
The class works well but return types and param types aren't annotated, which would help maintainability.

15. place_gear_in_gripper iterates gripper_joint_setter_func per-row
The loop for row_idx in range(n): self.gripper_joint_setter_func(...) is called twice (open then close). This is O(n) Python loops over environments. Since the setter just does joint_pos[row_indices, jid] = width / 2.0, this could be vectorized to a single tensor operation.
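The suggested vectorization, sketched with NumPy standing in for torch (array shapes and names are illustrative; the fancy-indexed write translates one-to-one to a torch tensor):

```python
import numpy as np

def set_width_looped(joint_pos, row_indices, jid, width):
    for row_idx in row_indices:                # O(n) Python-level loop
        joint_pos[row_idx, jid] = width / 2.0

def set_width_vectorized(joint_pos, row_indices, jid, width):
    joint_pos[row_indices, jid] = width / 2.0  # one fancy-indexed write

# Both produce the same result; the vectorized form stays on-device.
a = np.zeros((4, 9))
b = np.zeros((4, 9))
set_width_looped(a, [0, 2], 8, 0.08)
set_width_vectorized(b, [0, 2], 8, 0.08)
assert np.array_equal(a, b)
```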


✅ What's Good

  • Clean separation of task/environment/policy following Arena conventions
  • Documentation mirrors the existing Franka lift workflow — consistent UX
  • Observation hardening (nan_to_num, force clamping) is a smart addition for contact-rich tasks
  • Commit history is well-structured: each commit has a clear purpose
  • Domain randomization is comprehensive (friction, mass, actuator gains, joint friction, yaw variation)
  • The GraspConfig dataclass is a good refactor — keeps the task constructor focused

Summary

The PR is in good shape architecturally. The main items to address are:

  1. The gpu_collision_stack_size change scope (affects all assembly envs)
  2. The quaternion default in GraspConfig (verify xyzw correctness)
  3. Mutable default args (easy fix, use tuples)
  4. Clean up the duplicate asset class (NistAssembledBoard vs NistBoardAssembled)
  5. Duplicate copyright header in training script

The rest are quality-of-life improvements that can be addressed in follow-ups.
