[WIP] Controller refactor: Extract orchestration FSM out of BasePolicy by tomasz-lewicki · Pull Request #117 · amazon-far/holosoma

tomasz-lewicki · 2026-05-27T19:17:23Z

Summary

Carves the per-cycle run loop out of BasePolicy into a dedicated Controller that owns the SDK interface, input providers, rate limiter, and keyboard listener. Policies become pluggable objects conforming to PolicyProtocol (act, on_activate, on_deactivate, apply_velocity, apply_command); transitions like INIT, DAMP, and STIFF_HOLD are themselves first-class policies the Controller swaps in. DualModePolicy collapses to a thin swap object; its parallel run loop is gone.

New first-class DAMP state (StateCommand.DAMP, keyboard \, joystick B+X chord) holds the last observed q with the policy's KP/KD so the robot stays energized when the teleop handle is released.

What goes away

DualModePolicy's parallel run loop and _dispatch_command lambda-patching
BasePolicy._handle_{start,stop,init,damp}_* handlers (and WBT overrides)
_shared_hardware_source guard pattern
5-way branching in policy_action()
ControllerState enum write-through (replaced by controller.active.name)

Functionality Map

Function	Old abstraction	New abstraction
Owns SDK interface + input providers + rate limiter	Constructed inside `BasePolicy.__init__` (`_init_sdk_components`, `_init_communication_components`, `_init_input_handlers`)	`Controller`, built once by `build_default_hardware(config)` in `run_policy.py`
Per-tick run loop (poll velocity → poll command → act → send)	`BasePolicy.run()`	`Controller.step()` driving a `PolicyProtocol`
Maps state → low-level command	`BasePolicy.policy_action()` (5-way branching on flags)	`PolicyProtocol.act(ctx, state) -> Command`
State machine	Three coupled flags: `use_policy_action`, `get_ready_state`, `_stiff_hold_active`	`Controller.active: PolicyProtocol` (the active policy is the state)
Lifecycle hooks on state transitions	`BasePolicy._handle_{start,stop,init,damp}_*` (and WBT overrides)	`PolicyProtocol.on_activate(ctx)` / `on_deactivate(ctx)`
Discrete command dispatch	`BasePolicy._dispatch_command()` (monkey-patched by `DualModePolicy`)	`PolicyProtocol.apply_command(cmd) -> bool`, with `Controller._builtin_dispatch` as fallback
Velocity command side-channel	`_apply_velocity` hook on subclasses	`PolicyProtocol.apply_velocity(vc)`
Hold last pose with KP/KD when handle released	Did not exist	`DampingPolicy` (`policies/damping.py`), entered via `StateCommand.DAMP`
Init-pose ramp before policy starts	Inline `get_ready_state` branch in `policy_action()`	`InitRampPolicy` (`policies/init_ramp.py`)
WBT stiff-hold pre-roll	`_stiff_hold_active` flag + branch in WBT's `policy_action`	`StiffHoldPolicy` (`policies/stiff_hold.py`)
Two-policy runtime swap (X-button)	`DualModePolicy` class with parallel `run()` loop + `_shared_hardware_source` guard + `_dispatch_command` lambda patching	A `Controller` with multiple entries in its `policies` dict; `SWITCH_MODE` cycles `controller.active`
Hardware sharing across policies	`secondary._shared_hardware_source = primary` guard pattern in `BasePolicy.__init__`	One `Controller` owns hardware; all policies receive it via `ctx` in `act()`
Policy class selection from config	Lived inside `DualModePolicy.__init__`	`_select_policy_class(config)` standalone helper in `policies/dual_mode.py`
sim2sim regression test	Did not exist	`tests/sim2sim/harness.py` + `MujocoInterface` adapter

The design doc (docs/controller-design.md) lays out a 5-state Controller that orchestrates BasePolicy, replacing today's _shared_hardware_source guard pattern and the duplicated dual-mode run loop. It also adds a first-class DAMP state so the robot can hold pose when the teleop handle is released. The harness (tests/sim2sim/) drives a real LocomotionPolicy against an in-process MuJoCo interface, asserts the pelvis stays above 0.3 m for 10 s of sim time at 0.5 m/s forward velocity. Used as the regression gate for the Controller refactor. Baseline: pelvis final=0.768 m, min=0.756 m on G1-29dof FastSAC checkpoint.

Step 1 of the Controller refactor (see docs/controller-design.md). Carves the per-cycle run-loop body out of BasePolicy.run() into a new Controller class. Hardware ownership stays on the policy — the Controller reaches into the policy for interface, inputs, rate, and latency tracker. Steps 2+ migrate ownership. BasePolicy.run() is now a 3-line shim that builds a Controller and delegates. ControllerState is a read-only projection of the existing flags (use_policy_action, get_ready_state, _stiff_hold_active). It becomes load-bearing in Step 3. DualModePolicy.run() is left untouched — Step 5 collapses it to a swap. FAR-pi extensions also untouched — Step 7 will update them. Verified: sim2sim harness pelvis final=0.768 m, min=0.756 m, identical to the pre-refactor baseline. All 3 sim2sim tests pass.

Opens a passive MuJoCo viewer paced at real-time so the locomotion policy can be eyeballed during the refactor. Headless behaviour is unchanged. python -m tests.sim2sim.harness # headless, ~2 s python -m tests.sim2sim.harness --render -d 15 # viewer, real-time Skips viewer.close() on teardown — the passive viewer races with the GL context shutdown and prints GLXBadWindow from libX11 (a stderr-only X protocol error not catchable from Python). Letting interpreter shutdown handle it produces a quiet exit.

Step 2 of the Controller refactor: BasePolicy no longer creates or owns the SDK interface, input providers, rate limiter, or keyboard listener. Those move to the Controller, which is constructed in run_policy.py from build_default_hardware(config) before policy instantiation. Step 5 came along for the ride because DualModePolicy.run() reached into per-policy private attributes that no longer exist. Rewrote DualModePolicy as a thin swap object that holds two policies sharing one Controller's hardware and flips controller.policy on SWITCH_MODE. Its parallel run loop is gone. Concrete changes: - New build_default_hardware(config) in controller.py constructs interface + inputs + rate; run_policy.py uses it to build a Controller before instantiating the policy. - BasePolicy.__init__ accepts an optional `interface` parameter. Removed _init_sdk_components, _init_communication_components, _init_input_handlers, _init_rate_handler, _init_input_device, _init_joystick_handler, _create_input_providers, _setup_keyboard_listener, _shared_hardware_source, BasePolicy.run(). - LocomotionPolicy / WholeBodyTrackingPolicy accept and forward `interface`. - LocomotionPolicy._handle_zero_velocity / _handle_stand_command go through self.controller.velocity_input.zero() instead of self._velocity_input.zero(). - create_input(source, role, interface, config, use_joystick) — no longer takes a policy reference. - WBT no longer has the secondary-policy stiff-hold-prompt skip; the prompt is gated on sys.stdin.isatty() only. Tests: - Sim2sim harness updated to construct Controller directly. Result: pelvis final=0.767 m, min=0.755 m (baseline 0.768/0.756). - inputs/tests/{test_factory,test_providers,test_dual_mode}.py target the pre-Controller API and the patched-_dispatch_command pattern. File-level pytest.mark.skip with TODO(step 7) until rewritten. Steps 3 (FSM formalization) and 4 (DAMP state) still ahead. FAR-pi extensions still untouched (Step 7).

Step 3 makes Controller.state writable via Controller.set_state(). The legacy flags (use_policy_action, get_ready_state, _stiff_hold_active) become a derived view: set_state() updates them atomically. Subclass dispatch handlers (_handle_start_policy, _handle_stop_policy, _handle_init_state) now route through Controller.set_state() instead of mutating flags directly. The legacy flag-mutation paths are preserved as fallbacks for code paths that build a policy without a Controller (the input tests' mock-policy fixtures, FAR-pi extensions in Step 7). Step 4 adds the DAMP state Adam Setapen asked for: hold the last observed joint positions with the policy's KP/KD gains so the robot stays energized when the teleop handle is released. Triggered by: StateCommand.DAMP — new entry in inputs/api/commands.py Keyboard "\\" — backslash Joystick B+X chord Implementation lives entirely on Controller. _publish_damp_command() captures q on entry from interface.get_low_state(), then emits send_low_command(q_hold, kp_override=kp, kd_override=kd) every tick. Step() short-circuits past policy_action() while the state is DAMP. Exiting via START/STOP/INIT clears the damp flag through set_state(). Tests: - test_damp_state.py: pelvis stays > 0.6 m for 2 s in DAMP, the damp flag clears on transition to RUN_POLICY, and the held q reflects the current joint state (not default pose). - Sim2sim harness baseline unchanged at pelvis final=0.767, min=0.755. - 90 passed, 108 skipped, 0 failed. Step 7 (FAR-pi extensions + rewrite the skipped input tests) is the only remaining work in the controller-refactor sequence.

Lays the groundwork for Step 8 without changing any behaviour. The existing Controller class moves out of the top-level holosoma_inference/controller.py into holosoma_inference/controllers/ controller.py; the top-level file becomes a deprecation shim that re-exports the public symbols. Adds two new files: controllers/protocol.py — PolicyProtocol (5 members: act, on_activate, on_deactivate, apply_velocity, apply_command) and the Command dataclass returned from act(). controllers/__init__.py — public surface for the submodule. No callers updated yet (they all still go through the deprecation shim). 8b makes OnnxBasePolicy conform to the protocol; 8c rewrites Controller to drive policies by protocol; 8d cleans up the legacy flags. Sim2sim harness: pelvis final=0.767 m, min=0.755 m (unchanged). Pytest: 90 passed, 108 skipped, 0 failed. Also updates docs/controller-design.md with the controllers/ layout and adds PR_DESC.md listing the abstractions Step 8 will eliminate.

Adds the protocol surface (act, on_activate, on_deactivate, apply_velocity, apply_command) to BasePolicy and pulls the body of policy_action() into a private _compute_action() helper that returns a Command. policy_action() becomes a back-compat wrapper that calls _compute_action() then forwards the Command to send_low_command(). Subclasses pick up name = "locomotion" / "wbt" for Step 8c's policies dict keys. apply_velocity wraps the existing _apply_velocity hook; apply_command snapshots a coarse policy state before/after dispatch to detect "did the legacy table handle it" until 8c narrows the contract. Controller is unchanged at this step — it still calls policy.policy_action() through the run loop. 8c will switch it to calling policy.act() directly. Sim2sim harness: pelvis final=0.767 m, min=0.755 m (unchanged). Pytest: 10 passed in tests/sim2sim, including 4 new protocol- conformance tests asserting isinstance(policy, PolicyProtocol), Command shape from act(), apply_velocity() lifts to no-raise on the base class, and on_activate/on_deactivate are no-ops.

…olicies

Drops BasePolicy._handle_start_policy / _handle_stop_policy / _handle_init_state / _handle_damp_state and their WBT overrides. They were redundant once Step 8c made on_activate / on_deactivate the canonical lifecycle hooks and Controller._builtin_dispatch handled all transition commands directly. WBT keeps on_activate (which captures yaw offsets and resets stiff hold) and on_deactivate (which resets motion clip state). The use_policy_action / get_ready_state / _stiff_hold_active flags remain on BasePolicy because _compute_action() still consults them for the WBT stiff-hold-via-_get_manual_command path. A future cleanup that rewrites WBT to use a real StiffHoldPolicy would let those flags go too. Sim2sim harness: pelvis final=0.764 m, min=0.752 m (unchanged). Pytest: 94 passed, 108 skipped, 0 failed.

tomasz-lewicki added 9 commits May 8, 2026 18:06

Step 8c: PolicyProtocol drives Controller; DAMP/INIT/STIFF_HOLD are p…

5c4dda7

…olicies

tomasz-lewicki changed the title ~~WIP on controller~~ [WIP] Controller refactor: Extract orchestration FSM out of BasePolicy May 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Controller refactor: Extract orchestration FSM out of BasePolicy#117

[WIP] Controller refactor: Extract orchestration FSM out of BasePolicy#117
tomasz-lewicki wants to merge 9 commits into
mainfrom
dev/tomasz/controller

tomasz-lewicki commented May 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tomasz-lewicki commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What goes away

Functionality Map

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tomasz-lewicki commented May 27, 2026 •

edited

Loading