
Motion planning: diff drive pose controller + dashboard UI #1

Open
JaredBaileyDuke wants to merge 7 commits into main from motion_planning

Collaborator

@JaredBaileyDuke JaredBaileyDuke commented May 11, 2026

Summary

  • Firmware (Pi): motion_control.py — diff drive pose controller with spin_move_spin and continuous modes; BLE service exposing goal, pose, and status characteristics
  • Firmware (ESP32): UUID registration for the new motion service
  • C++ control package: packages/control/ with DiffDrive, Pose2D, SpinMoveSpinController, and ContinuousController, plus Python bindings via pybind11
  • Dashboard UI: motion card with manual joypad (drives motors directly) and pose goal form (x/y/θ, controller mode, optional wheel config override)
  • CV integration: background 500ms polling loop auto-writes motionPoseChar whenever a fresh entry.cvPosition arrives from the overhead camera panel — keeps odometry aligned with camera ground truth with no user action required

Test plan

  • Pair a robot with the motion BLE service and verify the motion card appears
  • Manual tab: drag joypad, confirm wheels spin with correct diff drive direction
  • Pose tab: enter x/y/θ, press Go in both Pose · Move · Pose and Continuous modes, confirm robot reaches goal and status updates
  • Cancel button stops in-progress move
  • Stop button (card header) halts both manual and pose modes
  • CV integration: run overhead camera scan, confirm robot's odometry auto-corrects without pressing Go

🤖 Generated with Claude Code

JaredBaileyDuke and others added 2 commits May 11, 2026 19:04
Adds packages/control — a C++ library with two PoseController
implementations (continuous and spin-move-spin), diff drive kinematics,
and pybind11 bindings for future compiled use.

Adds firmware/pi_robot/motion_control.py — pure Python mirror of the
C++ interface that runs on the Pi today.

Wires into pi_robot.py: three new BLE characteristics (motion-goal,
motion-pose, motion-status), wheel config from pi-robot.conf, 20 Hz
tick loop, and automatic cancellation on manual joystick input.
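
For reviewers unfamiliar with the spin-move-spin mode, it can be pictured as a three-phase state machine that is ticked at a fixed rate. The sketch below is only an illustration under assumptions, not the code in motion_control.py: the class name `SpinMoveSpinSketch`, the speed limits, and the tolerances are all hypothetical, and the drive phase simply stops once the goal is no longer ahead of the robot.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Pose2D:
    x: float      # metres
    y: float      # metres
    theta: float  # radians


def wrap_angle(a: float) -> float:
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


class SpinMoveSpinSketch:
    """Hypothetical controller: spin to face the goal, drive, spin to final heading."""

    def __init__(self, goal: Pose2D, v_max: float = 0.2, w_max: float = 1.0,
                 pos_tol: float = 0.02, ang_tol: float = 0.05):
        self.goal = goal
        self.v_max, self.w_max = v_max, w_max      # assumed limits (m/s, rad/s)
        self.pos_tol, self.ang_tol = pos_tol, ang_tol
        self.phase = "spin_to_goal"

    def tick(self, pose: Pose2D) -> Tuple[float, float]:
        """Return (v, omega) for the current pose; call once per control tick."""
        dx, dy = self.goal.x - pose.x, self.goal.y - pose.y
        if self.phase == "spin_to_goal":
            if math.hypot(dx, dy) <= self.pos_tol:
                self.phase = "spin_to_heading"      # already at the goal position
            else:
                err = wrap_angle(math.atan2(dy, dx) - pose.theta)
                if abs(err) > self.ang_tol:
                    return 0.0, math.copysign(self.w_max, err)
                self.phase = "move"
        if self.phase == "move":
            # Remaining distance measured along the current heading, so the
            # phase ends even if a small heading error makes the robot miss.
            ahead = dx * math.cos(pose.theta) + dy * math.sin(pose.theta)
            if ahead > self.pos_tol:
                return self.v_max, 0.0
            self.phase = "spin_to_heading"
        if self.phase == "spin_to_heading":
            err = wrap_angle(self.goal.theta - pose.theta)
            if abs(err) > self.ang_tol:
                return 0.0, math.copysign(self.w_max, err)
            self.phase = "done"
        return 0.0, 0.0
```

At 20 Hz the caller would invoke `tick()` every 50 ms with the latest odometry pose and forward the returned (v, ω) to the diff drive layer.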

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Redefines the 2-byte motor protocol as (forward%, turn%) instead of
(left%, right%). The Pi converts to physical (v, ω) using wheel_separation
and max_wheel_speed from pi-robot.conf, then applies diff drive kinematics
to get correct wheel PWM values. Handles backward turns correctly without
a sign flip.
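
A minimal sketch of that conversion, under stated assumptions: the percent range of -100..100, the choice that 100% turn means spinning in place at full wheel speed, the sign convention (positive turn is counter-clockwise), and the function name `forward_turn_to_wheels` are all hypothetical and not taken from pi_robot.py. Only `wheel_separation` and `max_wheel_speed` come from the description above.

```python
def forward_turn_to_wheels(forward_pct, turn_pct, wheel_separation, max_wheel_speed):
    """Map (forward%, turn%) to normalised (left, right) wheel commands in [-1, 1]."""
    # Percent inputs to physical body velocities.
    v = forward_pct / 100.0 * max_wheel_speed             # m/s
    omega_max = 2.0 * max_wheel_speed / wheel_separation  # rad/s when spinning in place
    omega = turn_pct / 100.0 * omega_max                  # rad/s, positive = counter-clockwise

    # Differential drive inverse kinematics.
    half = wheel_separation / 2.0
    v_left = v - omega * half
    v_right = v + omega * half

    # Scale both wheels together so the requested curvature is preserved.
    peak = max(abs(v_left), abs(v_right))
    if peak > max_wheel_speed:
        scale = max_wheel_speed / peak
        v_left *= scale
        v_right *= scale

    return v_left / max_wheel_speed, v_right / max_wheel_speed
```

Because the turn input is interpreted as an angular velocity rather than a left/right mix, a positive turn rotates the robot the same way whether the forward input is positive or negative in this sketch.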

Dashboard senders updated to match:
- joypad: sends raw (forward, turn) instead of pre-mixed (left, right)
- keyboard: removes mix() call, sends (throttle, turn) directly
- gamepad: switches from dual-stick tank to left-Y forward + right-X turn

The 4-byte LLM pulse path is unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@JaredBaileyDuke JaredBaileyDuke self-assigned this May 11, 2026
JaredBaileyDuke and others added 2 commits May 11, 2026 19:35
Adds a Motion cap card (pose-control type) that appears on Pi robots
with motors enabled. Supports two modes selectable per robot:
- Manual: virtual joypad wired to motorsChar (same diff-drive kinematics
  as the Motors card)
- Pose Control: X/Y (m) + theta (°) inputs, controller toggle
  (Spin→Move→Spin / Continuous), Go/Cancel buttons, live status pill,
  collapsible wheel-config overrides (separation, radius, max speed)

Dashboard changes:
- public/capabilities/runtime/motion.js — new makeMotionCap
- public/ble.js — UUIDS_BY_CAP.motion + import 3 motion UUIDs
- public/capabilities/runtime/index.js — register "pose-control" type
- public/capabilities/runtime/cap-section.js — motion defaults open
- public/styles.css — motion card layout styles

Firmware changes:
- pi_robot.py _motion_start + _motion_goal_handle_write now accept
  optional wheel_sep / wheel_r / max_spd overrides from the goal
  message; fall back to pi-robot.conf values when absent
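
The override-with-fallback behaviour can be sketched as follows. This assumes the goal message has already been decoded into a dict; the wire format is not described here, and `WheelConfig` and `resolve_wheel_config` are hypothetical names. Only the keys wheel_sep, wheel_r, and max_spd come from the commit message.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WheelConfig:
    wheel_sep: float  # wheel separation, metres
    wheel_r: float    # wheel radius, metres
    max_spd: float    # maximum wheel speed


OVERRIDE_KEYS = ("wheel_sep", "wheel_r", "max_spd")


def resolve_wheel_config(goal: dict, defaults: WheelConfig) -> WheelConfig:
    """Use per-goal overrides where present, otherwise the configured defaults."""
    overrides = {
        key: float(goal[key])
        for key in OVERRIDE_KEYS
        if goal.get(key) is not None  # absent or null means "use the default"
    }
    return replace(defaults, **overrides)
```

For example, a goal carrying only `wheel_r` would change the radius for that move and leave the separation and maximum speed at the values loaded from pi-robot.conf.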

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@JaredBaileyDuke JaredBaileyDuke changed the title from "Motion planning: diff drive pose control + proper joystick kinematics" to "Motion planning: diff drive pose controller + dashboard UI" May 11, 2026
JaredBaileyDuke and others added 2 commits May 11, 2026 20:29
Before sending a pose goal, write entry.cvPosition to motionPoseChar so
the robot's odometry starts from the camera-measured position rather than
its dead-reckoned estimate. Fix is skipped if cvPosition is absent or stale
(> 10s old). The pose tab shows the last CV fix and dims it when stale.
No-ops until the cv branch is merged and entry.cvPosition is populated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace the Go-button-triggered CV fix with a 500ms polling loop
(startCvWatch/stopCvWatch) that automatically writes odometry resets
to motionPoseChar whenever a fresh, non-stale cvPosition arrives.
No user interaction required; no CV state visible in the motion card.
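
The decision made on each poll can be sketched like this. The dashboard code is JavaScript; the sketch uses Python for consistency with the firmware, and it is an illustration only. The field names `x`, `y`, `theta`, and `ts` on the CV position, the class name `CvWatchSketch`, and the de-duplication by timestamp are assumptions. The 10 s staleness threshold is the one quoted in the previous commit message.

```python
import time
from typing import Callable, Optional

STALE_AFTER_S = 10.0  # staleness threshold from the earlier commit message


class CvWatchSketch:
    """Hypothetical watcher: write an odometry reset once per fresh CV fix."""

    def __init__(self, write_pose: Callable[[float, float, float], None],
                 now: Callable[[], float] = time.monotonic):
        self._write_pose = write_pose  # stands in for a write to motionPoseChar
        self._now = now
        self._last_sent_ts: Optional[float] = None

    def poll(self, cv_position: Optional[dict]) -> bool:
        """Call on every polling interval; return True if a reset was written."""
        if cv_position is None:
            return False                      # no camera fix yet
        ts = cv_position["ts"]
        if self._now() - ts > STALE_AFTER_S:
            return False                      # fix is stale, keep dead reckoning
        if ts == self._last_sent_ts:
            return False                      # this fix was already sent
        self._write_pose(cv_position["x"], cv_position["y"], cv_position["theta"])
        self._last_sent_ts = ts
        return True
```

Calling `poll()` every 500 ms with the latest `entry.cvPosition` would then write at most one reset per camera fix.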

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jonasneves
Member

Oops, just seeing your PRs right now as I was disconnecting. I will take a look at them tomorrow, but ping me on Teams if you have anything blocking you.

@JaredBaileyDuke
Collaborator Author

There is no rush. Just feeling inspired today and stayed up late working. I'm bringing in my balance bot today so that we can test the code whenever you are free next.

Merge picks up motor calibration orientation, ESP32 camera lifecycle,
overhead ArUco helper relocation, and other main-line changes since the
branch's base.

Resolved conflict in firmware/pi_robot/pi_robot.py: kept both the new
motors_orientation flips from main (ORIENT_SWAP, ORIENT_INVERT_A/B) and
the wheel-geometry config additions from motion_planning (WHEEL_SEPARATION,
WHEEL_RADIUS, MAX_WHEEL_SPEED). Motion controller writes go through
_apply_motors so the orientation flips apply to pose-driven motion as well.
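
As a rough picture of why routing through one function matters, the sketch below applies orientation flips in a single place so that every caller gets them. The semantics are assumed: that ORIENT_SWAP exchanges the two motor channels, that ORIENT_INVERT_A/B negate one channel each, and that the swap happens before the inversion. The function name `apply_orientation` is hypothetical and the real _apply_motors may differ.

```python
def apply_orientation(left, right, swap=False, invert_a=False, invert_b=False):
    """Map logical (left, right) commands to physical motor channels (a, b)."""
    # Assumed order: swap channels first, then invert per physical channel.
    a, b = (right, left) if swap else (left, right)
    if invert_a:
        a = -a
    if invert_b:
        b = -b
    return a, b
```

With both the joystick path and the pose controller calling the same function, a robot with a reversed or cross-wired motor behaves the same in manual and pose modes.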

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts:
#	firmware/pi_robot/pi_robot.py
jonasneves added a commit that referenced this pull request May 20, 2026
Four latency optimizations layered onto the OpenAI TTS path. None
touch the Web Speech fallback or change behavior for users without an
API key configured.

1. MediaSource streaming. Old path: await res.blob() — wait for the
   FULL mp3 before any audio plays (~500-1000ms TTFA). New: pipe the
   response body chunk-by-chunk into a MediaSource via a backpressure
   queue, audio starts as soon as the decoder has enough header data
   (~50-100ms after first chunk lands). Saves 200-400ms on every
   single speak(). Falls back to buffer-then-play if MSE isn't
   supported (some Safari versions) or the codec mismatches.

2. gpt-4o-mini-tts. Replaces tts-1. Lower TTFB (~250ms vs ~400ms by
   OpenAI's published benchmarks) AND supports a free-form
   `instructions` parameter for voice character — no more picking
   from 6 fixed voices. Default instructions: "peppy, youthful,
   slightly excited energy. Sound like a young hero — quick,
   enthusiastic, friendly. American accent, slightly higher pitched."
   Direct response to the "voice too adult / slightly sensual"
   feedback on onyx. tts-1 ignores the field; we only send it on
   gpt-4o-* models.

3. Connection pre-warm. First speak() fires a parallel HEAD on
   /v1/models to pay the TLS handshake (~150-250ms) once instead of
   inside the first user-visible speak. Subsequent calls reuse the
   kept-alive HTTP/2 connection from the browser pool. Once per page
   load, fire-and-forget.

4. (Already done last commit) — speak() returns a Promise that
   resolves on actual audio.onended, tool dispatch awaits it.
   Together with #1, the next motion fires the moment audio truly
   ends — no more truncation, no over-padding.

currentVoice() debug helper now reports the active engine + model +
voice + whether streaming is supported, so DevTools `currentVoice()`
gives a complete picture without introspecting the module.

Backpressure queue inside playStreamingMSE: reader can outpace the
decoder (chunks arrive faster than MediaSource can append them);
queue absorbs the burst and drains on each updateend event. Cleanup
is idempotent (resolved flag) so error / cancel paths don't double-
resolve.
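
The queueing idea is independent of the browser APIs and can be sketched in Python. This is a model of the pattern only, not of playStreamingMSE: `AppendQueueSketch` and its method names are hypothetical, `sink_append` stands in for appending a chunk to the media buffer, and `on_update_end` stands in for the updateend event.

```python
from collections import deque
from typing import Callable, Deque


class AppendQueueSketch:
    """Feed chunks to a sink that can only accept one append at a time."""

    def __init__(self, sink_append: Callable[[bytes], None]):
        self._sink_append = sink_append
        self._pending: Deque[bytes] = deque()
        self._busy = False     # True while the sink is processing an append
        self._closed = False   # plays the role of the "resolved" flag

    def push(self, chunk: bytes) -> None:
        """Called by the reader whenever a chunk arrives, at any rate."""
        if self._closed:
            return
        self._pending.append(chunk)
        self._drain()

    def on_update_end(self) -> None:
        """Called when the sink finishes an append and can take another."""
        self._busy = False
        self._drain()

    def _drain(self) -> None:
        if self._busy or not self._pending:
            return
        self._busy = True
        self._sink_append(self._pending.popleft())

    def close(self) -> bool:
        """Idempotent cleanup; only the first call returns True."""
        if self._closed:
            return False
        self._closed = True
        self._pending.clear()
        return True
```

A burst of pushes results in exactly one in-flight append, with the rest released one per completion in arrival order.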
jonasneves added a commit that referenced this pull request May 20, 2026
Three fixes for the introduce demo:

1. TTS reading punctuation aloud. Expressive instructions ("peppy,
   excited") can make gpt-4o-mini-tts vocalize trailing periods as
   "dot" — and once the Cache API stores a bad render, it replays
   forever. Removed every trailing period and ellipsis from the
   introduce strings; the words carry their own prosody. Bumped
   CACHE_NAME from tts-v1 → tts-v2 to invalidate the accumulated bad
   renders (any phrase that got cached with a "dot" pronunciation gets
   re-rendered fresh on next play). Pattern: never end a TTS string
   with a period unless you specifically want the model to render
   a pause + close.

2. "Follow you around" ending. Previous version did a settle-spin
   after the line, which (a) repeated the spin gesture from two lines
   earlier and (b) didn't illustrate "following" at all. Replaced
   with a smooth S-curve forward (1.4s gentle right arc + 1.4s gentle
   left arc), both moving forward — reads as "I'd come this way to
   find you" instead of "I'd spin in place."

3. "Spin" line that wasn't audible. Same root cause as #1 — the
   "spin..." render (with ellipsis) got cached badly and replayed
   silent or glitched. Clean text "spin" + cache bump fixes it on
   next play.
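
The text cleanup behind fixes 1 and 3 was done by hand on the introduce strings. If it were automated, a helper along these lines would do it; `strip_trailing_stop` is a hypothetical name and the set of stripped characters is an assumption. It is shown in Python for consistency, although the dashboard itself is JavaScript.

```python
def strip_trailing_stop(text: str) -> str:
    """Remove trailing periods, ellipses, and whitespace from a TTS string."""
    # "\u2026" is the single-character ellipsis; "..." is covered by ".".
    return text.rstrip(" \t\n.\u2026")
```

Other punctuation such as "?" and "!" is left alone, as are periods inside the string.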

General principle worth keeping: tts-v2 cache namespace is the kill
switch for any future bad-render accumulation; bump it whenever model,
voice, instructions, or expressive style changes (or as a hammer
when you see weird audio).
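
One possible variation on the manual bump, which is not what this commit does: derive the cache name from the settings that affect the render, so the namespace changes on its own whenever model, voice, or instructions change. The function name `tts_cache_name` and the hashing scheme are assumptions for illustration, and a manual version suffix would still be needed as the "hammer" for unexplained bad audio.

```python
import hashlib
import json


def tts_cache_name(model: str, voice: str, instructions: str, prefix: str = "tts") -> str:
    """Build a cache namespace that changes whenever a render-affecting setting changes."""
    payload = json.dumps(
        {"model": model, "voice": voice, "instructions": instructions},
        sort_keys=True,  # stable key order so equal settings hash equally
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}-{digest}"
```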
@jonasneves jonasneves force-pushed the main branch 2 times, most recently from 9b7decb to 23b8179 May 24, 2026 22:45
jonasneves added a commit that referenced this pull request May 24, 2026
jonasneves added a commit that referenced this pull request May 24, 2026