Cartpole stabilisation from pixels alone, with no angle or cart position sensor at inference time. Implements the linear vision-based control framework of Lee et al., "From Pixels to Torques with Linear Feedback", adapted for simulation in Gym CartPole.
The key result: a student policy trained on 40 teacher demos achieves 100% survival across 100 random initial conditions under true plant mismatch, running as a single matrix multiply at 60 Hz.
See the accompanying report for full derivations and evaluation.
The approach separates the control problem into three phases.
Data collection. A privileged teacher with full state access runs an LQR policy and logs
Observer training. A hybrid Luenberger observer is fit to the collected data. The nominal dynamics matrices
Inference. At each step the observer fuses nominal dynamics with the pixel estimate using a fixed blend weight
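The observer fit and the inference step above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the repository's code: the function names, the ridge least-squares readout, and the choice to blend only the pole-angle state (index 2 of `[x, x_dot, theta, theta_dot]`) with a weight of 0.7 on the pixel estimate are all assumptions, the last one loosely suggested by the `theta_blend_0p7` observer filename.

```python
import numpy as np

def fit_pixel_to_angle(frames, thetas, ridge=1e-3):
    """Hypothetical linear readout: flattened binary frames -> pole angle."""
    X = frames.reshape(len(frames), -1).astype(float)
    X = np.hstack([X, np.ones((len(X), 1))])        # append a bias column
    gram = X.T @ X + ridge * np.eye(X.shape[1])     # ridge-regularised normal equations
    return np.linalg.solve(gram, X.T @ thetas)

def pixel_angle(frame, w):
    """Apply the fitted readout to a single frame."""
    return float(np.append(frame.ravel().astype(float), 1.0) @ w)

def observer_step(x_hat, u, theta_pixel, A, B, blend=0.7):
    """Predict with nominal dynamics, then blend the angle with the pixel estimate.

    Assumption: only state index 2 (theta) is corrected; `blend` is the
    weight placed on the pixel-derived angle.
    """
    x_pred = A @ x_hat + B * u                      # nominal one-step prediction
    x_new = x_pred.copy()
    x_new[2] = (1.0 - blend) * x_pred[2] + blend * theta_pixel
    return x_new
```

With `blend=0.0` the observer runs open loop on the nominal model; with `blend=1.0` it trusts the pixel angle entirely. A fixed weight in between keeps the whole update linear, which is what allows inference to reduce to matrix arithmetic.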
No angle sensor is used. The observer sees only binary pixel frames produced by Gaussian blur + Otsu thresholding of the Gym render.
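For readers unfamiliar with the preprocessing, here is a NumPy-only sketch of Gaussian blur followed by Otsu thresholding on a greyscale frame. The repository may well use a library routine for this; the function names, the default `sigma`, and the synthetic test frame below are assumptions for illustration only.

```python
import numpy as np

def gaussian_blur(gray, sigma=1.0):
    """Separable Gaussian blur with edge padding; returns a float array."""
    radius = max(1, int(3 * sigma))
    xs = np.arange(-radius, radius + 1)
    kernel = np.exp(-(xs ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(gray.astype(float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def otsu_threshold(gray_u8):
    """Return the threshold that maximises between-class variance."""
    hist = np.bincount(gray_u8.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                      # weight of the "dark" class
    mu = np.cumsum(prob * np.arange(256))        # cumulative mean
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu[-1] * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

def binarize(gray, sigma=1.0):
    """Blur, pick an Otsu threshold, and return (binary_frame, threshold)."""
    blurred = np.clip(np.rint(gaussian_blur(gray, sigma)), 0, 255).astype(np.uint8)
    t = otsu_threshold(blurred)
    return blurred > t, t

# Synthetic stand-in for a rendered frame: dark background, bright vertical bar.
frame = np.full((64, 64), 20, dtype=np.uint8)
frame[10:50, 27:37] = 200
binary, t = binarize(frame)
```

Otsu picks the threshold per frame from the intensity histogram, so no hand-tuned cutoff is needed as long as the pole and background remain clearly separated in brightness.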
Evaluated on 100 trials with
| Demos | Pixel→angle | Pixel→angle RMSE | Survival | Paper-success* |
|---|---|---|---|---|
| 20 | 0.980 | 0.181° | 96/100 | 75/100 |
| 40 | 0.987 | 0.123° | 100/100 | 72/100 |
* paper-success:
The jump from 20 to 40 demos eliminates all failures and collapses the 95th-percentile
Why 72% paper-success is a meaningful result. The original paper's observer corrects all 4 states (cart position, cart velocity, pole angle, and pole angular rate) directly from pixels. This implementation corrects only
Requirements: Python 3.9+
git clone https://github.com/<you>/pixelsToSteps.git
cd pixelsToSteps
python3 -m pip install -r simulation/python/requirements.txt
Run the live web UI (streams Gym renders, binary observations, and telemetry):
python3 -m uvicorn simulation.web.app:app --host 127.0.0.1 --port 8000
Then open http://127.0.0.1:8000 and click Reset → Start.
Reproduce the evaluation from scratch:
# 1. Collect 40 teacher demos
python3 simulation/python/collect_teacher_demos.py \
--steps 600 \
--theta0-deg-list $(seq -s' ' -12 0.5 -0.5) $(seq -s' ' 0.5 0.5 12) \
--collection-id my_40demos
# 2. Train the hybrid observer
python3 simulation/python/train_hybrid_observer.py \
--collection-dir simulation/captures/collections/my_40demos \
--output-json simulation/python/my_observer.json
# 3. Evaluate on 100 random initial conditions
python3 simulation/python/eval_initial_conditions.py \
--observer-json simulation/python/my_observer.json \
--n-trials 100 --steps 150
Or use the pre-trained observer directly:
python3 simulation/python/teacher_policy.py \
--steps 150 --theta0-deg 12 \
--true-masspole-scale 1.1 --true-half-pole-length-scale 0.9 \
--observer-json simulation/python/hybrid_pixels_to_cartpole_observer_theta_blend_0p7.json
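
To give a feel for what the mismatch flags in the command above imply, the sketch below designs a discrete LQR gain on a nominal linearised CartPole and checks that the same gain still stabilises a plant whose pole mass is scaled by 1.1 and half-length by 0.9. It assumes the flags scale those two parameters of the true plant; the linearisation uses Gym's default CartPole constants with Euler integration, while the `Q`/`R` weights and all function names are hypothetical and not taken from the repository.

```python
import numpy as np

def linear_cartpole(masspole=0.1, half_length=0.5, masscart=1.0, g=9.8, tau=0.02):
    """Euler-discretised linearisation of Gym CartPole about the upright pose.

    State is [x, x_dot, theta, theta_dot]; the input is the horizontal force.
    """
    total = masscart + masspole
    d = half_length * (4.0 / 3.0 - masspole / total)
    ml = masspole * half_length / total
    A_c = np.array([[0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, -ml * g / d, 0.0],
                    [0.0, 0.0, 0.0, 1.0],
                    [0.0, 0.0, g / d, 0.0]])
    B_c = np.array([[0.0], [1.0 / total + ml / (total * d)], [0.0], [-1.0 / (total * d)]])
    return np.eye(4) + tau * A_c, tau * B_c

def dlqr_gain(A, B, Q, R, iters=5000):
    """Discrete LQR gain via fixed-point iteration of the Riccati recursion."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Design on the nominal model (illustrative weights, not the repo's).
A_nom, B_nom = linear_cartpole()
K = dlqr_gain(A_nom, B_nom, Q=np.diag([1.0, 1.0, 10.0, 1.0]), R=np.array([[0.1]]))

# "True" plant: pole mass x1.1, half-length x0.9 (assumed meaning of the flags).
A_true, B_true = linear_cartpole(masspole=0.1 * 1.1, half_length=0.5 * 0.9)

rho_nom = max(abs(np.linalg.eigvals(A_nom - B_nom @ K)))
rho_true = max(abs(np.linalg.eigvals(A_true - B_true @ K)))
print(f"closed-loop spectral radius: nominal={rho_nom:.4f}, mismatched={rho_true:.4f}")
```

A spectral radius below 1 for both closed loops means the linear feedback `u = -K x` tolerates this level of mismatch in the linearised model. It says nothing about the nonlinear simulation or the pixel observer, which is what the evaluation scripts measure.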