[WIP] Controller refactor: Extract orchestration FSM out of BasePolicy#117
Open
tomasz-lewicki wants to merge 9 commits into
Open
[WIP] Controller refactor: Extract orchestration FSM out of BasePolicy#117tomasz-lewicki wants to merge 9 commits into
tomasz-lewicki wants to merge 9 commits into
Conversation
The design doc (docs/controller-design.md) lays out a 5-state Controller that orchestrates BasePolicy, replacing today's _shared_hardware_source guard pattern and the duplicated dual-mode run loop. It also adds a first-class DAMP state so the robot can hold pose when the teleop handle is released. The harness (tests/sim2sim/) drives a real LocomotionPolicy against an in-process MuJoCo interface, asserts the pelvis stays above 0.3 m for 10 s of sim time at 0.5 m/s forward velocity. Used as the regression gate for the Controller refactor. Baseline: pelvis final=0.768 m, min=0.756 m on G1-29dof FastSAC checkpoint.
Step 1 of the Controller refactor (see docs/controller-design.md). Carves the per-cycle run-loop body out of BasePolicy.run() into a new Controller class. Hardware ownership stays on the policy — the Controller reaches into the policy for interface, inputs, rate, and latency tracker. Steps 2+ migrate ownership. BasePolicy.run() is now a 3-line shim that builds a Controller and delegates. ControllerState is a read-only projection of the existing flags (use_policy_action, get_ready_state, _stiff_hold_active). It becomes load-bearing in Step 3. DualModePolicy.run() is left untouched — Step 5 collapses it to a swap. FAR-pi extensions also untouched — Step 7 will update them. Verified: sim2sim harness pelvis final=0.768 m, min=0.756 m, identical to the pre-refactor baseline. All 3 sim2sim tests pass.
Opens a passive MuJoCo viewer paced at real-time so the locomotion policy can be eyeballed during the refactor. Headless behaviour is unchanged. python -m tests.sim2sim.harness # headless, ~2 s python -m tests.sim2sim.harness --render -d 15 # viewer, real-time Skips viewer.close() on teardown — the passive viewer races with the GL context shutdown and prints GLXBadWindow from libX11 (a stderr-only X protocol error not catchable from Python). Letting interpreter shutdown handle it produces a quiet exit.
Step 2 of the Controller refactor: BasePolicy no longer creates or owns
the SDK interface, input providers, rate limiter, or keyboard listener.
Those move to the Controller, which is constructed in run_policy.py
from build_default_hardware(config) before policy instantiation.
Step 5 came along for the ride because DualModePolicy.run() reached
into per-policy private attributes that no longer exist. Rewrote
DualModePolicy as a thin swap object that holds two policies sharing
one Controller's hardware and flips controller.policy on SWITCH_MODE.
Its parallel run loop is gone.
Concrete changes:
- New build_default_hardware(config) in controller.py constructs
interface + inputs + rate; run_policy.py uses it to build a Controller
before instantiating the policy.
- BasePolicy.__init__ accepts an optional `interface` parameter.
Removed _init_sdk_components, _init_communication_components,
_init_input_handlers, _init_rate_handler, _init_input_device,
_init_joystick_handler, _create_input_providers, _setup_keyboard_listener,
_shared_hardware_source, BasePolicy.run().
- LocomotionPolicy / WholeBodyTrackingPolicy accept and forward `interface`.
- LocomotionPolicy._handle_zero_velocity / _handle_stand_command go
through self.controller.velocity_input.zero() instead of
self._velocity_input.zero().
- create_input(source, role, interface, config, use_joystick) — no
longer takes a policy reference.
- WBT no longer has the secondary-policy stiff-hold-prompt skip; the
prompt is gated on sys.stdin.isatty() only.
Tests:
- Sim2sim harness updated to construct Controller directly. Result:
pelvis final=0.767 m, min=0.755 m (baseline 0.768/0.756).
- inputs/tests/{test_factory,test_providers,test_dual_mode}.py target
the pre-Controller API and the patched-_dispatch_command pattern.
File-level pytest.mark.skip with TODO(step 7) until rewritten.
Steps 3 (FSM formalization) and 4 (DAMP state) still ahead.
FAR-pi extensions still untouched (Step 7).
Step 3 makes Controller.state writable via Controller.set_state(). The legacy flags (use_policy_action, get_ready_state, _stiff_hold_active) become a derived view: set_state() updates them atomically. Subclass dispatch handlers (_handle_start_policy, _handle_stop_policy, _handle_init_state) now route through Controller.set_state() instead of mutating flags directly. The legacy flag-mutation paths are preserved as fallbacks for code paths that build a policy without a Controller (the input tests' mock-policy fixtures, FAR-pi extensions in Step 7). Step 4 adds the DAMP state Adam Setapen asked for: hold the last observed joint positions with the policy's KP/KD gains so the robot stays energized when the teleop handle is released. Triggered by: StateCommand.DAMP — new entry in inputs/api/commands.py Keyboard "\\" — backslash Joystick B+X chord Implementation lives entirely on Controller. _publish_damp_command() captures q on entry from interface.get_low_state(), then emits send_low_command(q_hold, kp_override=kp, kd_override=kd) every tick. Step() short-circuits past policy_action() while the state is DAMP. Exiting via START/STOP/INIT clears the damp flag through set_state(). Tests: - test_damp_state.py: pelvis stays > 0.6 m for 2 s in DAMP, the damp flag clears on transition to RUN_POLICY, and the held q reflects the current joint state (not default pose). - Sim2sim harness baseline unchanged at pelvis final=0.767, min=0.755. - 90 passed, 108 skipped, 0 failed. Step 7 (FAR-pi extensions + rewrite the skipped input tests) is the only remaining work in the controller-refactor sequence.
Lays the groundwork for Step 8 without changing any behaviour. The
existing Controller class moves out of the top-level
holosoma_inference/controller.py into holosoma_inference/controllers/
controller.py; the top-level file becomes a deprecation shim that
re-exports the public symbols.
Adds two new files:
controllers/protocol.py — PolicyProtocol (5 members: act, on_activate,
on_deactivate, apply_velocity, apply_command) and the Command
dataclass returned from act().
controllers/__init__.py — public surface for the submodule.
No callers updated yet (they all still go through the deprecation
shim). 8b makes OnnxBasePolicy conform to the protocol; 8c rewrites
Controller to drive policies by protocol; 8d cleans up the legacy
flags.
Sim2sim harness: pelvis final=0.767 m, min=0.755 m (unchanged).
Pytest: 90 passed, 108 skipped, 0 failed.
Also updates docs/controller-design.md with the controllers/ layout
and adds PR_DESC.md listing the abstractions Step 8 will eliminate.
Adds the protocol surface (act, on_activate, on_deactivate, apply_velocity, apply_command) to BasePolicy and pulls the body of policy_action() into a private _compute_action() helper that returns a Command. policy_action() becomes a back-compat wrapper that calls _compute_action() then forwards the Command to send_low_command(). Subclasses pick up name = "locomotion" / "wbt" for Step 8c's policies dict keys. apply_velocity wraps the existing _apply_velocity hook; apply_command snapshots a coarse policy state before/after dispatch to detect "did the legacy table handle it" until 8c narrows the contract. Controller is unchanged at this step — it still calls policy.policy_action() through the run loop. 8c will switch it to calling policy.act() directly. Sim2sim harness: pelvis final=0.767 m, min=0.755 m (unchanged). Pytest: 10 passed in tests/sim2sim, including 4 new protocol- conformance tests asserting isinstance(policy, PolicyProtocol), Command shape from act(), apply_velocity() lifts to no-raise on the base class, and on_activate/on_deactivate are no-ops.
Drops BasePolicy._handle_start_policy / _handle_stop_policy / _handle_init_state / _handle_damp_state and their WBT overrides. They were redundant once Step 8c made on_activate / on_deactivate the canonical lifecycle hooks and Controller._builtin_dispatch handled all transition commands directly. WBT keeps on_activate (which captures yaw offsets and resets stiff hold) and on_deactivate (which resets motion clip state). The use_policy_action / get_ready_state / _stiff_hold_active flags remain on BasePolicy because _compute_action() still consults them for the WBT stiff-hold-via-_get_manual_command path. A future cleanup that rewrites WBT to use a real StiffHoldPolicy would let those flags go too. Sim2sim harness: pelvis final=0.764 m, min=0.752 m (unchanged). Pytest: 94 passed, 108 skipped, 0 failed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Carves the per-cycle run loop out of
BasePolicyinto a dedicatedControllerthat owns the SDK interface, input providers, rate limiter, and keyboard listener. Policies become pluggable objects conforming toPolicyProtocol(act,on_activate,on_deactivate,apply_velocity,apply_command); transitions like INIT, DAMP, and STIFF_HOLD are themselves first-class policies the Controller swaps in.DualModePolicycollapses to a thin swap object; its parallel run loop is gone.New first-class DAMP state (
StateCommand.DAMP, keyboard\, joystick B+X chord) holds the last observedqwith the policy's KP/KD so the robot stays energized when the teleop handle is released.What goes away
DualModePolicy's parallel run loop and_dispatch_commandlambda-patchingBasePolicy._handle_{start,stop,init,damp}_*handlers (and WBT overrides)_shared_hardware_sourceguard patternpolicy_action()ControllerStateenum write-through (replaced bycontroller.active.name)Functionality Map
BasePolicy.__init__(_init_sdk_components,_init_communication_components,_init_input_handlers)Controller, built once bybuild_default_hardware(config)inrun_policy.pyBasePolicy.run()Controller.step()driving aPolicyProtocolBasePolicy.policy_action()(5-way branching on flags)PolicyProtocol.act(ctx, state) -> Commanduse_policy_action,get_ready_state,_stiff_hold_activeController.active: PolicyProtocol(the active policy is the state)BasePolicy._handle_{start,stop,init,damp}_*(and WBT overrides)PolicyProtocol.on_activate(ctx)/on_deactivate(ctx)BasePolicy._dispatch_command()(monkey-patched byDualModePolicy)PolicyProtocol.apply_command(cmd) -> bool, withController._builtin_dispatchas fallback_apply_velocityhook on subclassesPolicyProtocol.apply_velocity(vc)DampingPolicy(policies/damping.py), entered viaStateCommand.DAMPget_ready_statebranch inpolicy_action()InitRampPolicy(policies/init_ramp.py)_stiff_hold_activeflag + branch in WBT'spolicy_actionStiffHoldPolicy(policies/stiff_hold.py)DualModePolicyclass with parallelrun()loop +_shared_hardware_sourceguard +_dispatch_commandlambda patchingControllerwith multiple entries in itspoliciesdict;SWITCH_MODEcyclescontroller.activesecondary._shared_hardware_source = primaryguard pattern inBasePolicy.__init__Controllerowns hardware; all policies receive it viactxinact()DualModePolicy.__init___select_policy_class(config)standalone helper inpolicies/dual_mode.pytests/sim2sim/harness.py+MujocoInterfaceadapter