[WIP] Controller refactor: Extract orchestration FSM out of BasePolicy #117

Open
tomasz-lewicki wants to merge 9 commits into main from dev/tomasz/controller

@tomasz-lewicki tomasz-lewicki commented May 27, 2026

Summary

Carves the per-cycle run loop out of BasePolicy into a dedicated Controller that owns the SDK interface, input providers, rate limiter, and keyboard listener. Policies become pluggable objects conforming to PolicyProtocol (act, on_activate, on_deactivate, apply_velocity, apply_command); transitions like INIT, DAMP, and STIFF_HOLD are themselves first-class policies the Controller swaps in. DualModePolicy collapses to a thin swap object; its parallel run loop is gone.

New first-class DAMP state (StateCommand.DAMP, keyboard \, joystick B+X chord) holds the last observed q with the policy's KP/KD so the robot stays energized when the teleop handle is released.

What goes away

  • DualModePolicy's parallel run loop and _dispatch_command lambda-patching
  • BasePolicy._handle_{start,stop,init,damp}_* handlers (and WBT overrides)
  • _shared_hardware_source guard pattern
  • 5-way branching in policy_action()
  • ControllerState enum write-through (replaced by controller.active.name)

Functionality Map

| Function | Old abstraction | New abstraction |
| --- | --- | --- |
| Owns SDK interface + input providers + rate limiter | Constructed inside BasePolicy.__init__ (_init_sdk_components, _init_communication_components, _init_input_handlers) | Controller, built once by build_default_hardware(config) in run_policy.py |
| Per-tick run loop (poll velocity → poll command → act → send) | BasePolicy.run() | Controller.step() driving a PolicyProtocol |
| Maps state → low-level command | BasePolicy.policy_action() (5-way branching on flags) | PolicyProtocol.act(ctx, state) -> Command |
| State machine | Three coupled flags: use_policy_action, get_ready_state, _stiff_hold_active | Controller.active: PolicyProtocol (the active policy is the state) |
| Lifecycle hooks on state transitions | BasePolicy._handle_{start,stop,init,damp}_* (and WBT overrides) | PolicyProtocol.on_activate(ctx) / on_deactivate(ctx) |
| Discrete command dispatch | BasePolicy._dispatch_command() (monkey-patched by DualModePolicy) | PolicyProtocol.apply_command(cmd) -> bool, with Controller._builtin_dispatch as fallback |
| Velocity command side-channel | _apply_velocity hook on subclasses | PolicyProtocol.apply_velocity(vc) |
| Hold last pose with KP/KD when handle released | Did not exist | DampingPolicy (policies/damping.py), entered via StateCommand.DAMP |
| Init-pose ramp before policy starts | Inline get_ready_state branch in policy_action() | InitRampPolicy (policies/init_ramp.py) |
| WBT stiff-hold pre-roll | _stiff_hold_active flag + branch in WBT's policy_action | StiffHoldPolicy (policies/stiff_hold.py) |
| Two-policy runtime swap (X-button) | DualModePolicy class with parallel run() loop + _shared_hardware_source guard + _dispatch_command lambda patching | A Controller with multiple entries in its policies dict; SWITCH_MODE cycles controller.active |
| Hardware sharing across policies | secondary._shared_hardware_source = primary guard pattern in BasePolicy.__init__ | One Controller owns hardware; all policies receive it via ctx in act() |
| Policy class selection from config | Lived inside DualModePolicy.__init__ | _select_policy_class(config) standalone helper in policies/dual_mode.py |
| sim2sim regression test | Did not exist | tests/sim2sim/harness.py + MujocoInterface adapter |
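
For reviewers, the per-tick loop and the SWITCH_MODE cycling from the table can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `poll()` methods on the input providers, passing the Controller itself as `ctx`, the string command `"SWITCH_MODE"`, and all three fake classes are hypothetical stand-ins.

```python
class Controller:
    """Sketch only: owns the hardware and drives whichever policy is active."""

    def __init__(self, interface, velocity_input, command_input, policies):
        self.interface = interface
        self.velocity_input = velocity_input
        self.command_input = command_input
        self.policies = dict(policies)  # name -> policy, insertion-ordered
        self.active = next(iter(self.policies.values()))

    def _activate(self, policy):
        # Lifecycle hooks take the place of the old _handle_* transition handlers.
        self.active.on_deactivate(self)
        self.active = policy
        self.active.on_activate(self)

    def _builtin_dispatch(self, cmd):
        # Fallback for commands the active policy did not consume.
        if cmd == "SWITCH_MODE":
            names = list(self.policies)
            nxt = names[(names.index(self.active.name) + 1) % len(names)]
            self._activate(self.policies[nxt])

    def step(self):
        # poll velocity -> poll command -> act -> send
        vc = self.velocity_input.poll()
        if vc is not None:
            self.active.apply_velocity(vc)
        cmd = self.command_input.poll()
        if cmd is not None and not self.active.apply_command(cmd):
            self._builtin_dispatch(cmd)
        state = self.interface.get_low_state()
        self.interface.send_low_command(self.active.act(self, state))


# --- fakes, for illustration only (all hypothetical) ---
class QueueInput:
    def __init__(self, items):
        self._items = list(items)

    def poll(self):
        return self._items.pop(0) if self._items else None


class FakeInterface:
    def __init__(self):
        self.sent = []

    def get_low_state(self):
        return [0.0, 0.0]

    def send_low_command(self, command):
        self.sent.append(command)


class NamedPolicy:
    def __init__(self, name):
        self.name = name
        self.events = []

    def act(self, ctx, state):
        return (self.name, list(state))

    def on_activate(self, ctx):
        self.events.append("activate")

    def on_deactivate(self, ctx):
        self.events.append("deactivate")

    def apply_velocity(self, vc):
        pass

    def apply_command(self, cmd):
        return False  # defer everything to the Controller's builtin dispatch


loco, wbt = NamedPolicy("locomotion"), NamedPolicy("wbt")
iface = FakeInterface()
ctrl = Controller(iface, QueueInput([]), QueueInput([None, "SWITCH_MODE"]),
                  {"locomotion": loco, "wbt": wbt})
ctrl.step()  # no command pending: locomotion acts
ctrl.step()  # SWITCH_MODE: wbt is activated, then acts in the same tick
print([name for name, _ in iface.sent])  # ['locomotion', 'wbt']
```

The point of the sketch is that the active policy is the only state the loop consults, so adding a transition means adding an entry to the policies dict rather than another flag branch.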

The design doc (docs/controller-design.md) lays out a 5-state Controller
that orchestrates BasePolicy, replacing today's _shared_hardware_source
guard pattern and the duplicated dual-mode run loop. It also adds a
first-class DAMP state so the robot can hold pose when the teleop handle
is released.

The harness (tests/sim2sim/) drives a real LocomotionPolicy against an
in-process MuJoCo interface and asserts that the pelvis stays above 0.3 m
for 10 s of sim time at 0.5 m/s forward velocity. It is used as the
regression gate for the Controller refactor.

Baseline: pelvis final=0.768 m, min=0.756 m on G1-29dof FastSAC checkpoint.

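
The shape of the regression gate can be sketched as below. This is a simplified stand-in, not the code in tests/sim2sim/harness.py: `step_fn`, `sim_dt`, and the function name are hypothetical, and the real harness steps MuJoCo with the policy in the loop rather than calling a bare callable.

```python
def run_height_check(step_fn, sim_dt=0.005, duration_s=10.0, min_height_m=0.3):
    """Step a simulation for duration_s of sim time and assert the pelvis stays up.

    step_fn() is assumed to advance the sim by one step of sim_dt seconds and
    return the current pelvis height in metres.
    Returns (final_height, minimum_height) for comparison against a baseline.
    """
    n_steps = int(round(duration_s / sim_dt))
    lowest = float("inf")
    height = float("nan")
    for i in range(n_steps):
        height = step_fn()
        lowest = min(lowest, height)
        assert height > min_height_m, (
            f"pelvis fell to {height:.3f} m at t={i * sim_dt:.2f} s"
        )
    return height, lowest


# Usage with a trivial stand-in "simulation" that holds a constant height.
final_h, min_h = run_height_check(lambda: 0.76)
print(final_h, min_h)
```

Returning both the final and the minimum height is what lets later commits report figures such as "final=0.767 m, min=0.755 m" against the baseline.
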
Step 1 of the Controller refactor (see docs/controller-design.md). Carves
the per-cycle run-loop body out of BasePolicy.run() into a new
Controller class. Hardware ownership stays on the policy — the
Controller reaches into the policy for interface, inputs, rate, and
latency tracker. Steps 2+ migrate ownership.

BasePolicy.run() is now a 3-line shim that builds a Controller and
delegates. ControllerState is a read-only projection of the existing
flags (use_policy_action, get_ready_state, _stiff_hold_active). It
becomes load-bearing in Step 3.

DualModePolicy.run() is left untouched — Step 5 collapses it to a swap.
FAR-pi extensions also untouched — Step 7 will update them.

Verified: sim2sim harness pelvis final=0.768 m, min=0.756 m, identical
to the pre-refactor baseline. All 3 sim2sim tests pass.

Opens a passive MuJoCo viewer paced at real-time so the locomotion
policy can be eyeballed during the refactor. Headless behaviour is
unchanged.

  python -m tests.sim2sim.harness                 # headless, ~2 s
  python -m tests.sim2sim.harness --render -d 15  # viewer, real-time

Skips viewer.close() on teardown — the passive viewer races with the
GL context shutdown and prints GLXBadWindow from libX11 (a stderr-only
X protocol error not catchable from Python). Letting interpreter
shutdown handle it produces a quiet exit.

Step 2 of the Controller refactor: BasePolicy no longer creates or owns
the SDK interface, input providers, rate limiter, or keyboard listener.
Those move to the Controller, which is constructed in run_policy.py
from build_default_hardware(config) before policy instantiation.

Step 5 came along for the ride because DualModePolicy.run() reached
into per-policy private attributes that no longer exist. Rewrote
DualModePolicy as a thin swap object that holds two policies sharing
one Controller's hardware and flips controller.policy on SWITCH_MODE.
Its parallel run loop is gone.

Concrete changes:

- New build_default_hardware(config) in controller.py constructs
  interface + inputs + rate; run_policy.py uses it to build a Controller
  before instantiating the policy.
- BasePolicy.__init__ accepts an optional `interface` parameter.
  Removed _init_sdk_components, _init_communication_components,
  _init_input_handlers, _init_rate_handler, _init_input_device,
  _init_joystick_handler, _create_input_providers, _setup_keyboard_listener,
  _shared_hardware_source, BasePolicy.run().
- LocomotionPolicy / WholeBodyTrackingPolicy accept and forward `interface`.
- LocomotionPolicy._handle_zero_velocity / _handle_stand_command go
  through self.controller.velocity_input.zero() instead of
  self._velocity_input.zero().
- create_input(source, role, interface, config, use_joystick) — no
  longer takes a policy reference.
- WBT no longer has the secondary-policy stiff-hold-prompt skip; the
  prompt is gated on sys.stdin.isatty() only.

Tests:
- Sim2sim harness updated to construct Controller directly. Result:
  pelvis final=0.767 m, min=0.755 m (baseline 0.768/0.756).
- inputs/tests/{test_factory,test_providers,test_dual_mode}.py target
  the pre-Controller API and the patched-_dispatch_command pattern.
  File-level pytest.mark.skip with TODO(step 7) until rewritten.

Steps 3 (FSM formalization) and 4 (DAMP state) still ahead.
FAR-pi extensions still untouched (Step 7).

Step 3 makes Controller.state writable via Controller.set_state(). The
legacy flags (use_policy_action, get_ready_state, _stiff_hold_active)
become a derived view: set_state() updates them atomically. Subclass
dispatch handlers (_handle_start_policy, _handle_stop_policy,
_handle_init_state) now route through Controller.set_state() instead
of mutating flags directly. The legacy flag-mutation paths are
preserved as fallbacks for code paths that build a policy without a
Controller (the input tests' mock-policy fixtures, FAR-pi extensions
in Step 7).
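
The "derived view" idea can be sketched as follows. The flag names come from this PR; the state names other than INIT, STIFF_HOLD, and RUN_POLICY, and the exact state-to-flag mapping, are assumptions made for illustration only.

```python
from enum import Enum, auto
from types import SimpleNamespace


class ControllerState(Enum):
    IDLE = auto()        # hypothetical name for the "nothing running" state
    INIT = auto()
    STIFF_HOLD = auto()
    RUN_POLICY = auto()


# Assumed mapping: state -> (use_policy_action, get_ready_state, _stiff_hold_active)
_LEGACY_FLAGS = {
    ControllerState.IDLE: (False, False, False),
    ControllerState.INIT: (False, True, False),
    ControllerState.STIFF_HOLD: (False, False, True),
    ControllerState.RUN_POLICY: (True, False, False),
}


class Controller:
    """Sketch only: the state is the source of truth, the flags are derived."""

    def __init__(self, policy):
        self.policy = policy
        self.state = None
        self.set_state(ControllerState.IDLE)

    def set_state(self, new_state):
        # All three flags are written together from one table lookup, so no
        # caller can leave them in a mixed combination.
        use_policy, get_ready, stiff_hold = _LEGACY_FLAGS[new_state]
        self.state = new_state
        self.policy.use_policy_action = use_policy
        self.policy.get_ready_state = get_ready
        self.policy._stiff_hold_active = stiff_hold


policy = SimpleNamespace()  # stand-in for a BasePolicy instance
ctrl = Controller(policy)
ctrl.set_state(ControllerState.INIT)
ctrl.set_state(ControllerState.RUN_POLICY)
print(policy.use_policy_action, policy.get_ready_state, policy._stiff_hold_active)
```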

Step 4 adds the DAMP state Adam Setapen asked for: hold the last
observed joint positions with the policy's KP/KD gains so the robot
stays energized when the teleop handle is released. Triggered by:

  StateCommand.DAMP — new entry in inputs/api/commands.py
  Keyboard "\\" — backslash
  Joystick B+X chord

Implementation lives entirely on Controller. _publish_damp_command()
captures q on entry from interface.get_low_state(), then emits
send_low_command(q_hold, kp_override=kp, kd_override=kd) every tick.
Controller.step() short-circuits past policy_action() while the state is DAMP.
Exiting via START/STOP/INIT clears the damp flag through set_state().
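
A minimal sketch of the capture-then-hold behaviour described above. The calls get_low_state() and send_low_command(q_hold, kp_override=..., kd_override=...) follow this commit message; the `DampHold` class, the `.q` attribute on the low state, and the fake interface are hypothetical.

```python
from types import SimpleNamespace


class DampHold:
    """Sketch only: latch q on entry, re-send it with fixed gains every tick."""

    def __init__(self, interface, kp, kd):
        self.interface = interface
        self.kp, self.kd = kp, kd
        self.q_hold = None  # None means DAMP is not active

    def enter(self):
        # Copy so the held pose does not alias a buffer the interface mutates.
        self.q_hold = list(self.interface.get_low_state().q)

    def exit(self):
        self.q_hold = None

    def tick(self):
        """Return True if DAMP handled this tick (caller skips policy_action())."""
        if self.q_hold is None:
            return False
        self.interface.send_low_command(
            self.q_hold, kp_override=self.kp, kd_override=self.kd
        )
        return True


class FakeInterface:
    """Stand-in for the SDK interface; q is mutable so aliasing would show."""

    def __init__(self):
        self.q = [0.10, -0.20]
        self.sent = []

    def get_low_state(self):
        return SimpleNamespace(q=self.q)

    def send_low_command(self, q, kp_override=None, kd_override=None):
        self.sent.append((list(q), kp_override, kd_override))


iface = FakeInterface()
damp = DampHold(iface, kp=[40.0, 40.0], kd=[1.0, 1.0])
damp.enter()
iface.q[0] = 0.50  # joint moves after entry; the held target must not follow
damp.tick()
print(iface.sent[0][0])  # [0.1, -0.2]
```

Holding the pose observed at entry, rather than the default pose, is what test_damp_state.py checks with "the held q reflects the current joint state".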

Tests:
- test_damp_state.py: pelvis stays > 0.6 m for 2 s in DAMP, the damp
  flag clears on transition to RUN_POLICY, and the held q reflects the
  current joint state (not default pose).
- Sim2sim harness baseline unchanged at pelvis final=0.767 m, min=0.755 m.
- 90 passed, 108 skipped, 0 failed.

Step 7 (FAR-pi extensions + rewrite the skipped input tests) is the
only remaining work in the controller-refactor sequence.

Lays the groundwork for Step 8 without changing any behaviour. The
existing Controller class moves out of the top-level
holosoma_inference/controller.py into holosoma_inference/controllers/
controller.py; the top-level file becomes a deprecation shim that
re-exports the public symbols.

Adds two new files:

  controllers/protocol.py — PolicyProtocol (5 members: act, on_activate,
    on_deactivate, apply_velocity, apply_command) and the Command
    dataclass returned from act().
  controllers/__init__.py — public surface for the submodule.
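
For orientation, the protocol surface might look roughly like this. The five member names and the Command return type are from this commit; the Command fields, the argument types, and the example policy are assumptions. `runtime_checkable` is used here because the conformance tests are described as using isinstance(); whether the real protocol is declared that way is not confirmed by this message.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Protocol, runtime_checkable


@dataclass
class Command:
    """Low-level command returned from act(); field names are assumed."""
    q_target: List[float]
    kp: Optional[List[float]] = None
    kd: Optional[List[float]] = None


@runtime_checkable
class PolicyProtocol(Protocol):
    def act(self, ctx: Any, state: Any) -> Command: ...
    def on_activate(self, ctx: Any) -> None: ...
    def on_deactivate(self, ctx: Any) -> None: ...
    def apply_velocity(self, vc: Any) -> None: ...
    def apply_command(self, cmd: Any) -> bool: ...


class HoldStillPolicy:
    """Hypothetical policy: no inheritance needed, only the five methods."""

    def act(self, ctx, state):
        return Command(q_target=list(state))

    def on_activate(self, ctx):
        pass

    def on_deactivate(self, ctx):
        pass

    def apply_velocity(self, vc):
        pass

    def apply_command(self, cmd):
        return False  # not handled; the Controller's fallback takes over


policy = HoldStillPolicy()
print(isinstance(policy, PolicyProtocol))  # True
```

Note that a runtime isinstance() check against a Protocol only verifies that the methods exist, not their signatures, so it complements rather than replaces the behavioural tests.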

No callers updated yet (they all still go through the deprecation
shim). 8b makes OnnxBasePolicy conform to the protocol; 8c rewrites
Controller to drive policies by protocol; 8d cleans up the legacy
flags.

Sim2sim harness: pelvis final=0.767 m, min=0.755 m (unchanged).
Pytest: 90 passed, 108 skipped, 0 failed.

Also updates docs/controller-design.md with the controllers/ layout
and adds PR_DESC.md listing the abstractions Step 8 will eliminate.

Adds the protocol surface (act, on_activate, on_deactivate,
apply_velocity, apply_command) to BasePolicy and pulls the body of
policy_action() into a private _compute_action() helper that returns
a Command. policy_action() becomes a back-compat wrapper that calls
_compute_action() then forwards the Command to send_low_command().
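
The split between computing and sending can be sketched as follows. The method names are from this commit; the Command fields, the placeholder return value, and the recording interface are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Command:
    q_target: List[float]              # field names are assumed
    kp: Optional[List[float]] = None
    kd: Optional[List[float]] = None


class BasePolicy:
    def __init__(self, interface):
        self.interface = interface

    def _compute_action(self) -> Command:
        # Stand-in for the former body of policy_action(): pure computation,
        # no hardware side effects.
        return Command(q_target=[0.0, 0.0], kp=[40.0, 40.0], kd=[1.0, 1.0])

    def act(self, ctx, state) -> Command:
        # Protocol entry point: return the Command and let the caller send it.
        return self._compute_action()

    def policy_action(self) -> None:
        # Back-compat wrapper: same computation, then send as before.
        command = self._compute_action()
        self.interface.send_low_command(
            command.q_target, kp_override=command.kp, kd_override=command.kd
        )


class RecordingInterface:
    """Hypothetical stand-in that records what would go to the robot."""

    def __init__(self):
        self.sent = []

    def send_low_command(self, q, kp_override=None, kd_override=None):
        self.sent.append((q, kp_override, kd_override))


iface = RecordingInterface()
policy = BasePolicy(iface)
returned = policy.act(ctx=None, state=None)  # nothing is sent
policy.policy_action()                       # exactly one command is sent
print(len(iface.sent))  # 1
```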

Subclasses pick up name = "locomotion" / "wbt" for Step 8c's policies
dict keys. apply_velocity wraps the existing _apply_velocity hook;
apply_command snapshots a coarse policy state before/after dispatch
to detect "did the legacy table handle it" until 8c narrows the
contract.
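
The before/after snapshot trick for apply_command can be sketched like this. The three flag names and _dispatch_command() are from this PR; the legacy table contents, the handler names, and the string commands are hypothetical.

```python
class BasePolicy:
    def __init__(self):
        self.use_policy_action = False
        self.get_ready_state = False
        self._stiff_hold_active = False
        # Hypothetical legacy dispatch table.
        self._legacy_table = {"START": self._start, "STOP": self._stop}

    def _start(self):
        self.use_policy_action = True

    def _stop(self):
        self.use_policy_action = False

    def _dispatch_command(self, cmd):
        handler = self._legacy_table.get(cmd)
        if handler is not None:
            handler()

    def _snapshot(self):
        # "Coarse" state: just the three legacy flags.
        return (self.use_policy_action, self.get_ready_state,
                self._stiff_hold_active)

    def apply_command(self, cmd) -> bool:
        before = self._snapshot()
        self._dispatch_command(cmd)
        # Report "handled" only if the dispatch visibly changed the flags.
        return self._snapshot() != before


policy = BasePolicy()
results = [policy.apply_command(c) for c in ("START", "SWITCH_MODE", "START")]
print(results)  # [True, False, False]
```

The third result shows the limitation of the heuristic: a command the table does handle, but which leaves the flags unchanged, is reported as unhandled. That is presumably why the message calls the snapshot coarse and defers a narrower contract to 8c.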

Controller is unchanged at this step — it still calls
policy.policy_action() through the run loop. 8c will switch it to
calling policy.act() directly.

Sim2sim harness: pelvis final=0.767 m, min=0.755 m (unchanged).
Pytest: 10 passed in tests/sim2sim, including 4 new protocol-conformance
tests asserting that isinstance(policy, PolicyProtocol) holds, that act()
returns a Command of the expected shape, that apply_velocity() does not
raise on the base class, and that on_activate/on_deactivate are no-ops.

Drops BasePolicy._handle_start_policy / _handle_stop_policy /
_handle_init_state / _handle_damp_state and their WBT overrides. They
were redundant once Step 8c made on_activate / on_deactivate the
canonical lifecycle hooks and Controller._builtin_dispatch handled
all transition commands directly.

WBT keeps on_activate (which captures yaw offsets and resets stiff
hold) and on_deactivate (which resets motion clip state).

The use_policy_action / get_ready_state / _stiff_hold_active flags
remain on BasePolicy because _compute_action() still consults them
for the WBT stiff-hold-via-_get_manual_command path. A future cleanup
that rewrites WBT to use a real StiffHoldPolicy would let those
flags go too.

Sim2sim harness: pelvis final=0.764 m, min=0.752 m (unchanged).
Pytest: 94 passed, 108 skipped, 0 failed.
@tomasz-lewicki tomasz-lewicki changed the title WIP on controller [WIP] Controller refactor: Extract orchestration FSM out of BasePolicy May 27, 2026