Skip to content

Preview subsystem rework: non-blocking resumable send + graceful degradation#24

Merged
ewowi merged 10 commits into
mainfrom
next-iteration
Jun 24, 2026
Merged

Preview subsystem rework: non-blocking resumable send + graceful degradation#24
ewowi merged 10 commits into
mainfrom
next-iteration

Conversation

@ewowi

@ewowi ewowi commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Summary

This branch grew well beyond its original title. It started as a control-validation refactor and became a large 3D-preview subsystem rework plus a batch of surrounding improvements: the preview now streams full-resolution frames to the browser without ever stalling the LED render tick, degrades gracefully on a slow link, and previews a 128² grid smoothly on all three boards (classic ESP32, S3, P4). Alongside it: firmware-identity controls moved to their own card, an audio level-display fix, a mDNS crash fix, a two-chapter Getting Started guide, and a browser-faithful preview-health probe.

What landed

Preview subsystem (the bulk of the branch)

  • Non-blocking, resumable WebSocket send. The full-resolution colour frame streams from the producer buffer a memory-adaptive chunk per transport tick (drained on loop20ms, off the render hot path); the next frame starts only when the previous one drained, so the effective frame rate self-limits to what the link sustains — no loop stall, no connection drop. Drop-new backpressure (a frame offered while one is in flight is dropped, the in-flight one kept).
  • Graceful degradation under transport backpressure (architecture.md names it): shed frame rate first, then resolution via a closed-form spatial-lattice downsample, with a memory-derived point cap so a tighter board downsamples sooner. The render loop is charged only a bounded slice per tick; a wedged client is closed past a spin budget and reconnects.
  • Zero preview buffers — drivers read the producer buffer directly; the preview decoupled from core via the domain-neutral BinaryBroadcaster interface.
  • Responsive UI — docked split-pane preview / floating draggable PiP; the grid draws the instant the coordinate table arrives; effective fps shown in the status bar.

Core / domain

  • Control validation moved onto the control (ControlDescriptor::validate), removing the bespoke SET_DEVICE_MODEL Improv RPC; deviceModel is now set like any catalog default (APPLY_OP / POST /api/control).
  • Firmware-identity controls (version, build, firmware, firmwarePartition) moved SystemModule → FirmwareUpdateModule, where they belong next to the OTA status.
  • mDNS browse crash fixed — replaced the async-handle API (which raced the IDF expiry timer on a UI refresh) with a self-contained, throttled synchronous mdnsBrowse.
  • GridLayout serpentine option; Layer measures mapping identity (dense fast path vs LUT); LinesEffect off-by-one sweep fix; NetworkSendDriver light_count slice.
  • Audio level read-out shows the per-second peak of the RMS level (was reading 0 between beats); renamed to level RMS. Algorithm untouched.

Tooling / docs

  • scripts/diag/preview_health.py — browser-faithful preview-stream health probe (reads frames, sends the 25 s keepalive ping, auto-reconnects), wired into MoonDeck's Live tab. Measures what a user watching the preview actually sees.
  • Getting Started reworked into two chapters: install (chapter 1) + a UI tour for new users (chapter 2), with a note that projectMM's engine is reusable beyond lights (FastLED-MM as an example).
  • Module specs, decisions.md lessons, and docs/history/plans/ (shipped plans marked) updated.

Verification

  • Desktop build -Werror clean; 447 unit tests pass; all scenarios pass; spec + platform-boundary checks pass; host Python (4) + JS (6) suites pass.
  • ESP32 builds (esp32s3-n16r8 + classic esp32) clean.
  • Hardware: 128² preview verified streaming on classic / S3 / P4.
  • Pre-merge Reviewer agent over the whole branch diff: architecturally sound, no blockers (its one should-fix — a dead WriteChunk struct — applied).
  • CodeRabbit findings across the branch processed (fixed or accepted-with-reason).

🤖 Generated with Claude Code

deviceModel is now set like any other catalog default — an APPLY_OP `set System.deviceModel` op (installer over serial) or POST /api/control (MoonDeck) — instead of a bespoke SET_DEVICE_MODEL Improv RPC. Its printable-ASCII rule moves onto the control as a per-control validator on ControlDescriptor, so every write path (HTTP, serial APPLY_OP, persistence load) runs the same check in one backend place. This is architecturally cleaner — any control that needs input rules declares them on itself — and it removes a whole vendor RPC + its device-side handler, buffers, and JS frame builder. SET_TX_POWER stays a dedicated RPC because it genuinely must land before the radio associates; deviceModel has no such ordering constraint, so it rides the generic transport (like deviceName already does).

KPI: 16384lights | PC:365KB | tick:112/87/111/9/1/313/37/15/18/111/11us(FPS:8928/11494/9009/111111/1000000/3194/27027/66666/55555/9009/90909) | src:97(19660) | test:68(10232) | lizard:76w

(No ESP32 tick line: this change touches only Improv provisioning + the validator hook + boot wiring — no render-path code — so the device tick is unchanged from the last committed ESP32 value, tick:5155us(FPS:193). ESP32 esp32 + esp32p4-eth both build clean under -Werror.)

Core:
- Control: ControlDescriptor gains an optional per-control validator (`bool (*validate)(const char*)`, Text/Password only). applyControlValue runs it on the incoming value BEFORE the write and returns Malformed on reject (prior value preserved, no partial write), so the check covers HTTP, APPLY_OP-over-serial, and persistence load in one place. addText takes an optional validator arg.
- SystemModule: deviceModel's printable-ASCII rule (1..31, 0x20-0x7E, no NUL) is now that validator (validateDeviceModel), declared on the control via addText. Removed setDeviceModel().
- ImprovProvisioningModule: dropped the SET_DEVICE_MODEL pending buffer + per-tick poll + the deviceModelOut init args; deviceModel arrives via the APPLY_OP poll like any other control. systemModule wiring stays (GET_DEVICE_INFO device name).

Platform:
- platform_esp32_improv.cpp: removed the SET_DEVICE_MODEL (0xFE) handler, its dispatch case, and the deviceModelOut/deviceModelReady g_improv fields + init args.
- platform.h + platform_desktop.cpp: improvProvisioningInit drops the deviceModelOut/deviceModelOutLen/deviceModelReady params.
- main.cpp: comment-only — deviceModel injection now goes through the apply-core + validator, not a dedicated RPC.

UI:
- improv-frame.js: removed the IMPROV_CMD_SET_DEVICE_MODEL export.
- install-orchestrator.js: removed encodeSetDeviceModelPayload + sendSetBoardFrame; pushDefaultsOverSerial sends only APPLY_OP ops now (the deviceModel name is one of the catalog `set` ops). The TX-power frame still precedes provisioning.

Tests:
- unit_Control_apply_absent_key: a per-control validator accepts valid input, rejects a raw control byte / empty value (Malformed), preserves the prior value on reject; a no-validator Text control accepts anything that fits.
- unit_HttpServerModule_apply: a validated Text control (Tag) is enforced THROUGH the apply-core — valid value applies via both applySetControl (HTTP) and applyOp (APPLY_OP-over-serial), bad bytes / empty rejected with the prior value kept. Proves the serial path is guarded with no per-transport special-casing.

Docs / CI:
- CLAUDE.md: extended "Present tense only" to ban absence-narration ("no longer", "anymore", "formerly", "X was removed") — state what is, not what stopped being; decisions.md lessons exempt (the before/after contrast is the lesson).
- ImprovProvisioningModule.md + SystemModule.md: deviceModel arrives via APPLY_OP set + the per-control validator; removed the SET_DEVICE_MODEL RPC from the wire contract.
- install/README.md, index.html, landing/index.html: APPLY_OP-only wording; landing page links to the getting-started guide; installer credit line drops the inaccurate "secure handshake" phrasing.
- backlog: removed the per-control-validator-hook item (shipped here).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Removes the SET_DEVICE_MODEL vendor Improv RPC and routes deviceModel through APPLY_OP with a new per-control validate callback on ControlDescriptor. Replaces the scatter-gather WebSocket preview send with a non-blocking begin/push/end staged API and spatial-lattice downsampling. Switches mDNS discovery to a single synchronous mdnsBrowse call. Adds a responsive docked/PiP preview UI with two-pass WebGL rendering. Adds MoonDeck inject-defaults tooling, scenario runner optional-skip propagation, and driver light-count caps.

Changes

Per-control text validation and SET_DEVICE_MODEL removal

Layer / File(s) Summary
Control validator contract, apply path, and tests
src/core/Control.h, src/core/Control.cpp, test/unit/core/unit_Control_apply_absent_key.cpp, test/unit/core/unit_HttpServerModule_apply.cpp, docs/moonmodules/core/SystemModule.md, docs/backlog/backlog.md
ControlDescriptor gains an optional validate function pointer; addText accepts it; applyControlValue parses into a scratch buffer, calls the validator, returns Malformed on rejection, and only then commits via strncpy; unit tests cover valid, malformed, boundary, and unvalidated paths through both applyControlValue and applyOp.
SystemModule validator wiring and ImprovProvisioningModule cleanup
src/core/SystemModule.h, src/core/ImprovProvisioningModule.h, src/main.cpp
SystemModule replaces setDeviceModel with a static validateDeviceModel registered via addText; ImprovProvisioningModule removes pendingDeviceModel_ fields and the loop1s polling block; main comments describe the new APPLY_OP injection path.
Platform Improv API and ESP32/desktop SET_DEVICE_MODEL removal
src/platform/platform.h, src/platform/esp32/platform_esp32_improv.cpp, src/platform/desktop/platform_desktop.cpp
platform.h drops device-model extension output parameters from improvProvisioningInit; ESP32 Improv removes ImprovTaskState device-model fields, the SET_DEVICE_MODEL handler, and its dispatch path; the desktop stub signature is updated.
Installer flow switched to APPLY_OP-only defaults push
docs/install/improv-frame.js, docs/install/install-orchestrator.js, docs/install/README.md, docs/install/index.html, docs/moonmodules/core/ImprovProvisioningModule.md
IMPROV_CMD_SET_DEVICE_MODEL constant removed; install-orchestrator drops the SET_DEVICE_MODEL frame helpers, import, and UART settle wait, making pushDefaultsOverSerial call sendConfigOverSerial directly; install page copy and ImprovProvisioningModule.md wire-contract are updated.

Non-blocking WebSocket preview: writeSome, staged drain, spatial lattice, generation resend

Layer / File(s) Summary
TcpConnection writeSome API replacing writeChunks
src/platform/platform.h, src/platform/esp32/platform_esp32.cpp, src/platform/desktop/platform_desktop.cpp
WriteResult enum and writeChunks scatter-gather API are removed; writeSome(data, len) performs a single non-blocking write and returns byte count, 0 (would-block), or -1 (error) on both ESP32 (lwip) and desktop (POSIX).
BinaryBroadcaster staged frame API and HttpServerModule implementation
src/core/BinaryBroadcaster.h, src/core/HttpServerModule.h, src/core/HttpServerModule.cpp, docs/moonmodules/core/HttpServerModule.md, docs/history/plans/Plan-20260622 - Non-blocking preview send.md, docs/history/plans/Plan-20260623 - Stream preview from buffers, zero preview buffers.md
BinaryBroadcaster replaces broadcastBinary with bool-returning begin/push/end streaming frame methods and adds clientGeneration; HttpServerModule implements them using a sendAllOrClose helper with bounded non-blocking spin; WS upgrade increments wsClientGeneration_; plan docs record architecture decisions.
PreviewDriver spatial-lattice downsampling, generation resend, and coordPending retry
src/light/drivers/PreviewDriver.h, docs/moonmodules/light/drivers/PreviewDriver.md, test/unit/light/unit_PreviewDriver.cpp, docs/history/decisions.md
lastCoordTime_ replaced by lastClientGen_ for generation-driven coordinate table sends; buildAndSendCoordTable uses per-axis lattice sampling with a PSRAM-dependent cap; coordPending_ gates RGB frame sending; sendFrame returns bool; unit tests cover backpressure retry and no-timer-resend behaviors.

mDNS synchronous browse API

Layer / File(s) Summary
platform.h and platform implementations for synchronous mdnsBrowse
src/platform/platform.h, src/platform/esp32/platform_esp32.cpp, src/platform/desktop/platform_desktop.cpp, docs/history/decisions.md
mdnsBrowseStart/Poll/Stop are replaced by a single mdnsBrowse(service, proto, timeoutMs, cb, user) declaration; ESP32 performs one synchronous PTR query/iterate/free; desktop is a no-op; decisions doc captures the async-handle crash rationale.
DevicesModule throttled synchronous browse step
src/core/DevicesModule.h
DevicesModule replaces the mdnsQuerying_ async state machine with mdnsTick_-based throttling that calls mdnsBrowse once per cadence for the current service type.

Responsive docked split-pane and draggable PiP preview UI

Layer / File(s) Summary
Workspace HTML structure and docked/PiP CSS layout
src/ui/index.html, src/ui/style.css, docs/history/plans/Plan-20260622 - Responsive split-pane preview with draggable PiP.md
index.html introduces workspace/preview-pane/preview-bar with dock, close, show controls, and labels canvas overlay; CSS defines mode-docked two-column sticky/scroll layout and mode-pip fixed floating pane with drag states, grip, and dismissed/restore affordances.
preview3d setupLayout controller and WebGL rendering updates
src/ui/preview3d.js, src/ui/app.js
Adds setupLayout with guarded localStorage prefs, mode switching, PiP drag/clamp/corner snap, dot-size and sequence-number controls; WebGL gains two-pass lit/unlit LED rendering, wireframe bounding box, depth-correct point sizing, cursor-anchored zoom, and u32 count parsing for 0x02/0x03 frames with desync guard; app.js wires setupLayout.

MoonDeck inject defaults, scenario runner optional skipping, and driver light-count caps

Layer / File(s) Summary
MoonDeck board push helper and inject-defaults button
scripts/moondeck_ui/app.js, scripts/moondeck_ui/style.css
pushBoard POSTs to /api/push-board with a 10s timeout; the inject-defaults button validates selection, disables during request, shows ok/fail feedback; board-row CSS becomes a flex container so the picker and button share the row.
Scenario runner optional module skip propagation
scripts/scenario/run_live_scenario.py
skipped_ids tracks platform-unavailable optional add_module IDs; subsequent optional measure/set_control/remove_module steps on skipped IDs become no-ops without issuing REST calls.
NetworkSendDriver light_count control and RmtLedDriver pin-count cap
src/light/drivers/NetworkSendDriver.h, src/light/drivers/RmtLedDriver.h, docs/moonmodules/light/drivers/NetworkSendDriver.md
NetworkSendDriver adds a lightCount field and light_count UI control limiting transmitted frame size to N lights; RmtLedDriver accumulates txLightCount_ from pinCounts_ in parseConfig and uses it to cap encoded lights per loop.

Documentation and scenario fixture refreshes

Layer / File(s) Summary
Writing policy and landing page updates
CLAUDE.md, docs/landing/index.html
CLAUDE.md expands the present-tense rule to forbid absence-narration with explicit exceptions for backlog/history/decisions; docs/landing adds a getting-started note link and expanded links section.
Scenario observed metric and timestamp refresh
test/scenarios/light/scenario_*.json
All affected scenario JSON fixtures receive updated tick_us, free_heap, max_alloc_block, and at timestamps reflecting June 22, 2026 measurements across pc-macos, esp32s3-n16r8, and esp32p4-eth platforms.

Sequence Diagram(s)

sequenceDiagram
  participant Installer as Web Installer
  participant ImprovSerial as Improv Serial (ESP32)
  participant applyOp as HttpServerModule::applyOp
  participant applyControlValue
  participant SystemModule

  Installer->>ImprovSerial: APPLY_OP frame: set System.deviceModel
  ImprovSerial->>applyOp: opOut JSON (via pendingOp_)
  applyOp->>applyControlValue: JSON string value
  applyControlValue->>SystemModule: validateDeviceModel(scratch)
  alt invalid (non-printable / empty / too long)
    SystemModule-->>applyControlValue: false
    applyControlValue-->>applyOp: ApplyResult::Malformed
  else valid
    SystemModule-->>applyControlValue: true
    applyControlValue->>SystemModule: strncpy deviceModel_
    applyControlValue-->>applyOp: ApplyResult::Ok
    applyOp-->>ImprovSerial: ack OK
  end
Loading
sequenceDiagram
  participant PreviewDriver
  participant BinaryBroadcaster as HttpServerModule BinaryBroadcaster
  participant sendAllOrClose
  participant WsClient as WebSocket Client

  PreviewDriver->>BinaryBroadcaster: beginBinaryFrame(totalLen)
  BinaryBroadcaster->>sendAllOrClose: write WS frame header
  sendAllOrClose->>WsClient: writeSome (non-blocking)
  PreviewDriver->>BinaryBroadcaster: pushBinaryFrame(coordBytes)
  BinaryBroadcaster->>sendAllOrClose: write coord payload
  sendAllOrClose->>WsClient: writeSome (bounded spins)
  PreviewDriver->>BinaryBroadcaster: endBinaryFrame()
  BinaryBroadcaster-->>PreviewDriver: false = backpressure → increase previewStride_
  note over PreviewDriver: coordPending_ gates RGB send until endBinaryFrame() returns true
  PreviewDriver->>BinaryBroadcaster: beginBinaryFrame + push RGB frame
  BinaryBroadcaster-->>PreviewDriver: true = all clients accepted
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • MoonModules/projectMM#2: Both PRs modify ControlDescriptor in src/core/Control.h — this PR adds a validate callback field and validator-aware addText/write-path enforcement to the same struct changed in #2.
  • MoonModules/projectMM#22: Both PRs modify deviceModel identity plumbing in SystemModule/ImprovProvisioningModule; #22 introduced SET_DEVICE_MODEL which this PR removes in favor of APPLY_OP with per-control validation.
  • MoonModules/projectMM#23: Directly continues the device-model-via-APPLY_OP work from #23, touching the same installer framing (docs/install/improv-frame.js, install-orchestrator.js) and device-side provisioning paths.

Poem

🐇 Hop, hop — no more SET_DEVICE_MODEL in sight,
The model glides in via APPLY_OP, light!
The preview floats in PiP or docks with grace,
Non-blocking sends keep latency in place.
mDNS browses once and hops away,
A bunny's tidy warren — hip hooray! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the major preview streaming rewrite and its backpressure/degradation focus.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch next-iteration

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/moonmodules/core/ImprovProvisioningModule.md`:
- Line 30: The documentation states that the installer awaits each ack for
APPLY_OP operations, but the actual implementation in install-orchestrator.js
uses open-loop pacing without reading ack responses. Update the APPLY_OP
description in the ImprovProvisioningModule.md file to accurately reflect that
the installer does not wait for acks and instead uses open-loop pacing for
sending operations, removing or correcting the phrase "the installer awaits each
ack" to match the actual behavior documented in install-orchestrator.js lines
144-153.

In `@docs/moonmodules/core/SystemModule.md`:
- Line 19: The deviceModel bullet point in SystemModule.md contains historical
narration at the end that violates the documentation's present-state-only rule.
Remove the trailing sentence beginning with "Was its own `BoardModule` child
until folded into System" from the deviceModel bullet description, keeping all
the current operational details about how it functions but trimming the
historical context about its past organizational structure.

In `@src/core/Control.cpp`:
- Around line 275-279: The scratch buffer in the Text/Password control
validation path is fixed at 64 bytes, causing silent truncation when maxLen
exceeds 64. Instead of using a fixed char scratch[64] array, allocate the
scratch buffer dynamically to accommodate the actual maxLen value. The
mm::json::parseString call should parse into a buffer sized to hold the maximum
length specified by maxLen, not hardcoded to sizeof(scratch). This ensures
validated text values larger than 63 characters are properly preserved rather
than silently truncated.

In `@src/core/Control.h`:
- Around line 181-187: The validate function pointer is being added directly to
the ControlDescriptor structure, which unnecessarily increases memory overhead
for every control descriptor instance even when validation is not needed,
conflicting with the <16 bytes per descriptor guideline. Instead of adding this
pointer to the ControlDescriptor structure itself, implement a separate
validation registry or lookup mechanism that only stores validators for controls
that actually require them, keeping the descriptor structure lean and only
allocating validation storage when necessary.

In `@test/unit/core/unit_Control_apply_absent_key.cpp`:
- Around line 119-142: The test for applyControlValue with the
acceptPrintableAscii validator is missing boundary checks for the maximum
allowed string length. The validator accepts 1 to 31 characters maximum (31
chars + 1 null terminator in the 32-byte buffer), but the current test only
validates empty strings and control-byte rejection. Add two additional CHECK
statements after the existing tests: one that verifies a 31-character valid
string is accepted and applied successfully, and another that verifies a
32-character string is rejected as Malformed with the prior value preserved.
These boundary tests should use the same applyControlValue and applyPolicy calls
as the existing test cases to ensure length validation is properly exercised.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ae6f1c93-96c6-4caa-aa8c-ba80793ce62f

📥 Commits

Reviewing files that changed from the base of the PR and between 57be007 and 41303e4.

📒 Files selected for processing (19)
  • CLAUDE.md
  • docs/backlog/backlog.md
  • docs/install/README.md
  • docs/install/improv-frame.js
  • docs/install/index.html
  • docs/install/install-orchestrator.js
  • docs/landing/index.html
  • docs/moonmodules/core/ImprovProvisioningModule.md
  • docs/moonmodules/core/SystemModule.md
  • src/core/Control.cpp
  • src/core/Control.h
  • src/core/ImprovProvisioningModule.h
  • src/core/SystemModule.h
  • src/main.cpp
  • src/platform/desktop/platform_desktop.cpp
  • src/platform/esp32/platform_esp32_improv.cpp
  • src/platform/platform.h
  • test/unit/core/unit_Control_apply_absent_key.cpp
  • test/unit/core/unit_HttpServerModule_apply.cpp
💤 Files with no reviewable changes (3)
  • docs/install/improv-frame.js
  • docs/backlog/backlog.md
  • src/platform/desktop/platform_desktop.cpp

Comment thread docs/moonmodules/core/ImprovProvisioningModule.md Outdated
Comment thread docs/moonmodules/core/SystemModule.md Outdated
Comment thread src/core/Control.cpp Outdated
Comment thread src/core/Control.h
Comment on lines +181 to +187
// Optional per-control input validator (Text/Password only; nullptr = accept anything
// that fits the buffer). applyControlValue calls it on the incoming string BEFORE the
// write and returns ApplyResult::Malformed on reject, so the check covers EVERY write
// path — HTTP /api/control, APPLY_OP over serial, persistence load — in one place.
// A control with a wire-format constraint (e.g. deviceModel's printable-ASCII rule)
// declares it here, so the rule lives with the control, not with any one transport.
bool (*validate)(const char* value) = nullptr;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Avoid widening ControlDescriptor for every control entry.

Adding a raw function pointer to every descriptor increases per-control memory permanently, even for controls that never use validation. This pushes the structure farther from the ESP32 descriptor-size target and scales poorly with control count.

As per coding guidelines: "Target <16 bytes per control descriptor on ESP32."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/Control.h` around lines 181 - 187, The validate function pointer is
being added directly to the ControlDescriptor structure, which unnecessarily
increases memory overhead for every control descriptor instance even when
validation is not needed, conflicting with the <16 bytes per descriptor
guideline. Instead of adding this pointer to the ControlDescriptor structure
itself, implement a separate validation registry or lookup mechanism that only
stores validators for controls that actually require them, keeping the
descriptor structure lean and only allocating validation storage when necessary.

Source: Coding guidelines

Comment thread test/unit/core/unit_Control_apply_absent_key.cpp
Reworks the UI layout so the 3D preview no longer crowds the module cards, and processes the CodeRabbit review of the per-control-validator commit. On a wide screen the preview and cards sit side by side; on a narrow screen (or when popped out) the preview becomes a small draggable picture-in-picture and the cards take the full width, so configuration always has room.

KPI: 16384lights | PC:365KB | tick:111/88/112/9/1/310/36/15/18/111/11us(FPS:9009/11363/8928/111111/1000000/3225/27777/66666/55555/9009/90909) | ESP32:tick:3661us(FPS:273) | heap:8329KB | src:97(19663) | test:68(10251) | lizard:76w

(ESP32 tick captured live from the S3 running this firmware. The preview rework is front-end only; the device tick reflects the testbench render config, unchanged by it.)

UI:
- index.html / style.css: .content's main area is a .workspace with two modes. .mode-docked (wide ≥960px) lays the preview pane beside a fixed 480px card column that scrolls on its own — the preview is sticky and stable, replacing the old scroll-shrink stack that ate vertical space above the cards. .mode-pip (narrow, or popped out) detaches the preview into a fixed, draggable, corner-snapping picture-in-picture (a drag bar with dock/close), cards full-width; a re-show pill brings a dismissed PiP back.
- preview3d.js: setupShrink → setupLayout — width-driven mode toggle (matchMedia + rAF-throttled resize), pointer-drag with viewport clamp + corner snap, dock/close/show wiring, localStorage-persisted corner/dismissed/forcePip. The single WebGL canvas is restyled in place across modes (never duplicated), and the drag handle is a separate element so it doesn't fight the camera-orbit pointer handler. The existing resize re-fit keeps the canvas correct through dock↔PiP.
- app.js: setupShrink() → setupLayout() call site.

Core:
- Control.cpp: the validator scratch buffer is sized to the control's full buffer (maxLen ≤ 255), not a fixed 64, so a long-but-valid Text value reaches the validator intact rather than being silently truncated first. (🐇 CodeRabbit)

Tests:
- unit_Control_apply_absent_key: added the validator length boundary — a 31-char value is accepted, a 32-char value is rejected (Malformed) with the prior value preserved, using a 64-byte buffer so the validator's own length check (not parse truncation) is what rejects. (🐇 CodeRabbit)

Docs / CI:
- ImprovProvisioningModule.md: APPLY_OP pacing is open-loop (a fixed inter-op delay, not an ack read-back) with idempotent ops; corrected the inaccurate "awaits each ack". (🐇 CodeRabbit)
- SystemModule.md: dropped the trailing "Was its own BoardModule" history (present-tense rule). (🐇 CodeRabbit)
- docs/history/plans: saved the responsive-preview plan.

Reviews:
- 🐇 scratch buffer truncation (Control.cpp) — fixed.
- 🐇 "awaits each ack" inaccuracy (ImprovProvisioningModule.md) — fixed (open-loop pacing).
- 🐇 BoardModule history narration (SystemModule.md) — fixed (removed).
- 🐇 validator length-boundary test gap — fixed (31-accept / 32-reject).
- 🐇 validate pointer on ControlDescriptor (>16B) — accepted: the descriptor is already ~28B on ESP32 (int32 min/max + flags, predating the 16B target); a validator side-table to reclaim 4 bytes on a fixed-capacity array is a bespoke lookup that fails Common-patterns-first. The pointer-on-descriptor is the recognizable, minimal choice.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/moonmodules/core/ImprovProvisioningModule.md`:
- Line 30: The APPLY_OP description in the ImprovProvisioningModule.md
documentation contains backward-looking language that narrates the absence of
old features. Remove or rewrite the final parenthetical statement that begins
with "(The device-side catalog fetch + the old `?deviceModel=` handoff are
removed..." to focus only on present-state functionality. Either delete this
sentence entirely or rephrase it to describe current capabilities (such as using
MoonDeck on the LAN) without referencing what has been removed, as per the
documentation style guidelines requiring present-state wording only.

In `@src/ui/index.html`:
- Around line 37-42: The preview control buttons with IDs preview-dock,
preview-close, and preview-reset use only symbol icons and title attributes,
which is insufficient for screen reader accessibility. Add aria-label attributes
to each of these buttons to provide explicit accessible names that screen
readers can reliably announce, ensuring the accessible names clearly describe
the actions (e.g., "Pop preview out or dock", "Hide preview", "Reset camera").

In `@src/ui/preview3d.js`:
- Around line 205-214: The preview-close button currently persists the dismissed
state even when the preview is in docked mode, causing confusing behavior when
the narrow mode auto-PiP is triggered later. Find the click handler for the
preview-close element (referenced at lines 234-238 according to the comment) and
add a condition to only set dismissed=true when the preview is actually in PiP
mode (check the pip variable state or similar condition that indicates PiP
mode). This ensures that closing the preview in docked mode does not persist a
hidden state that will affect the appearance when auto-PiP is activated in
narrow mode.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 652e242d-56e4-4cda-8856-24ae9ca7e1e7

📥 Commits

Reviewing files that changed from the base of the PR and between 41303e4 and 37be156.

📒 Files selected for processing (10)
  • docs/history/plans/Plan-20260622 - Responsive split-pane preview with draggable PiP.md
  • docs/moonmodules/core/ImprovProvisioningModule.md
  • docs/moonmodules/core/SystemModule.md
  • src/core/Control.cpp
  • src/ui/app.js
  • src/ui/index.html
  • src/ui/preview3d.js
  • src/ui/style.css
  • test/scenarios/light/scenario_Driver_mutation.json
  • test/unit/core/unit_Control_apply_absent_key.cpp

Comment thread docs/moonmodules/core/ImprovProvisioningModule.md Outdated
Comment thread src/ui/index.html Outdated
Comment thread src/ui/preview3d.js
Fixes a crash where a UI refresh could reboot the device, sharpens the 3D preview rendering (crisp LEDs, off-LEDs now visible, a bounding box), and adds an explicit "inject defaults" button to MoonDeck. The preview no longer holds the device's mDNS browse handle across ticks, which was the crash; discovery still works via a throttled synchronous query.

KPI: 16384lights | PC:365KB | tick:118/90/134/9/2/379/45/24/22/124/15us(FPS:8474/11111/7462/111111/500000/2638/22222/41666/45454/8064/66666) | ESP32(S3,REST):tick:3215us(FPS:311) | src:97(19677) | test:68(10283) | lizard:76w

(ESP32 tick read from the live S3's REST, not the serial KPI capture — the S3's USB serial dropped mid-session; the device stayed up on the network, uptime 4135s, confirming the mDNS fix holds. P4 also stable: 2452us/407fps, uptime 2715s.)

Core:
- platform_esp32.cpp + platform.h: replaced the async mDNS browse (mdnsBrowseStart/Poll/Stop — held a query handle across ticks and polled it) with one synchronous mdnsBrowse(service, proto, timeoutMs, cb). The async handle's internal queue is owned + freed by the mDNS component's own task when the query window expires; a poll landing after that expiry asserted on a null queue (xQueueSemaphoreTake queue.c:1709), rebooting the device — intermittently, on a UI refresh, grid-size-sensitive (a longer tick widened the window). A self-contained synchronous call holds no handle across ticks, so the race can't exist. mDNS advertising (<deviceName>.local) is untouched — only the peer-discovery browse changed.
- DevicesModule: stepMdns calls the synchronous mdnsBrowse, throttled to one ~60ms query every ~8th tick (mdns_query_ptr blocks its full timeout and loop1s is charged to the tick, so an unthrottled per-tick query would tank FPS; one brief hiccup every ~8s is invisible for discovery). Removed the mdnsQuerying_ state.
- BinaryBroadcaster + HttpServerModule: clientGeneration() bumps on each new WS client so PreviewDriver re-sends the coordinate table the moment a page connects (a refresh shows the preview at once).
- PreviewDriver: send the coordinate table only when geometry changes (onBuildState) or a new client connects — never on a timer (the old ~1Hz rebuild was a hot-path waste).

Light domain:
- PreviewDriver: frame builder now sends EVERY light (dark ones too) so the shader can draw off LEDs as faint placeholder rings — the grid shape stays visible and an all-off scene isn't a black screen.

UI:
- preview3d.js: crisp-disc shader (a ~1px AA rim instead of a half-radius fade that read as blurry at small grids); off/near-black LEDs render as a dim placeholder ring; a faint wireframe bounding box around the light volume (a second line program). setupShrink → setupLayout already shipped; this is the render-quality pass.
- index.html: preview button aria-labels.

Scripts / MoonDeck:
- moondeck_ui: explicit "inject defaults" button per discovered device — re-pushes the SELECTED device-model's full deviceModels.json config on demand (reuses POST /api/push-board → _push_board_to_device), distinct from the implicit on-pick push. Inline injecting…/injected ✓/failed ✗ feedback.

Tests:
- unit_PreviewDriver: a new client (clientGeneration bump) forces a coord-table re-send; advancing the clock several seconds with no client change asserts NO re-send (guards against re-introducing a timer-based rebuild).
- scenario JSONs: live observed.* write-backs from running perf_light / perf_full / GridLayout_resize on the S3 and P4.

Docs:
- decisions.md: the mDNS async-handle-lifecycle-race lesson (don't hold a vendor library's async handle across your event loop; it races the library's internal timers) and the preview coord-cadence keepalive-timer lesson.
- ImprovProvisioningModule.md: present-tense wording fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
docs/moonmodules/core/ImprovProvisioningModule.md (1)

30-32: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep this paragraph in present tense.

The no HTTP, no browser handoff, and there's no STA phrasing still reads as absence narration. This is the same doc-style cleanup already called out above; please rephrase those clauses to describe the current flow directly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/moonmodules/core/ImprovProvisioningModule.md` around lines 30 - 32,
Rephrase the negative absence constructions in this documentation paragraph to
describe the actual present-tense flows directly. Specifically, convert phrases
like "no HTTP and no browser handoff", "there's no STA to provision", and
"aren't linked" into active descriptions of what actually happens or what is
available on eth-only builds. Replace absence narration with direct statements
of the current behavior and capabilities, keeping the entire paragraph in
present tense throughout.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/moondeck_ui/app.js`:
- Around line 893-899: The pushBoard function lacks timeout protection, which
can wedge the button indefinitely if a network request stalls. Add an
AbortController to the fetch call in pushBoard with a timeout (e.g., 10 seconds)
that aborts the request if it exceeds the duration, ensuring the abort triggers
the existing catch handler to call onDone(false) and allow the button to be
re-enabled. Pass the abort signal to the fetch request options.
- Around line 893-899: The pushBoard function is incorrectly using r.ok (HTTP
status) to determine operation success, but the /api/push-board endpoint returns
the actual success status in the JSON response body as {"ok": ...}. Since HTTP
200 can be returned even when the operation failed, this causes false success
reports. Fix this by parsing the response as JSON using r.json() and checking
the ok field from the parsed JSON object, then passing that value to onDone.
Apply the same fix to the other location mentioned at lines 926-928.

In `@src/platform/esp32/platform_esp32.cpp`:
- Around line 1005-1007: Update the documentation comments in the mDNS service
browse section to reflect the current synchronous implementation instead of the
removed async pattern. Remove or update the comments at lines 1005–1006 that
reference the old async behavior (mentions of "poll a few ms each tick",
"mdnsSearch_", and "cancelMdnsBrowse"). Additionally, update the section header
at line 1002 that labels this section as "(async, non-blocking)" to accurately
describe the current synchronous behavior, ensuring the documentation matches
the actual implementation.

In `@src/ui/preview3d.js`:
- Around line 480-483: The comment block describes adding "plus half a cell" to
the half-extents calculation to enclose the outermost LED centres, but the
actual code computing hx = previewBox_.x / 2 / md does not implement this
offset. Update the comment to accurately describe what the code actually
does—remove the mention of "plus half a cell" since that adjustment is not
present in the hx calculation, making the documentation match the actual
implementation.

---

Duplicate comments:
In `@docs/moonmodules/core/ImprovProvisioningModule.md`:
- Around line 30-32: Rephrase the negative absence constructions in this
documentation paragraph to describe the actual present-tense flows directly.
Specifically, convert phrases like "no HTTP and no browser handoff", "there's no
STA to provision", and "aren't linked" into active descriptions of what actually
happens or what is available on eth-only builds. Replace absence narration with
direct statements of the current behavior and capabilities, keeping the entire
paragraph in present tense throughout.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3aee67fe-270c-4cd5-9813-6759c2a6be24

📥 Commits

Reviewing files that changed from the base of the PR and between 37be156 and 0c9fbd3.

📒 Files selected for processing (18)
  • docs/history/decisions.md
  • docs/moonmodules/core/ImprovProvisioningModule.md
  • scripts/moondeck_ui/app.js
  • scripts/moondeck_ui/style.css
  • src/core/BinaryBroadcaster.h
  • src/core/DevicesModule.h
  • src/core/HttpServerModule.cpp
  • src/core/HttpServerModule.h
  • src/light/drivers/PreviewDriver.h
  • src/platform/desktop/platform_desktop.cpp
  • src/platform/esp32/platform_esp32.cpp
  • src/platform/platform.h
  • src/ui/index.html
  • src/ui/preview3d.js
  • test/scenarios/light/scenario_GridLayout_resize.json
  • test/scenarios/light/scenario_perf_full.json
  • test/scenarios/light/scenario_perf_light.json
  • test/unit/light/unit_PreviewDriver.cpp

Comment thread scripts/moondeck_ui/app.js
Comment thread src/platform/esp32/platform_esp32.cpp Outdated
Comment thread src/ui/preview3d.js Outdated
The 3D preview now streams every light at high resolution (up to 16K and beyond) over WebSocket without freezing, tearing, or stalling the render loop — large grids that previously showed nothing or a garbled image now render. The preview send was reworked from a blocking all-or-nothing socket write on the render task into a staged, non-blocking drain with adaptive resolution. Also bundles this session's preview UX polish (filled-panel dots, sequence numbers, cursor/deep zoom, grey-circle off-LEDs, one control bar, dot-size slider).

KPI: 16384lights | PC:365KB | tick:121/92/123/9/1/338/36/16/19/118/11us(FPS:8264/10869/8130/111111/1000000/2958/27777/62500/52631/8474/90909) | ESP32 (P4-eth, bench): tick~2.6ms idle / streams 16K live; src:97(19807) | test:68(10349) | lizard:76w

Core:
- HttpServerModule: broadcastBinary now STAGES one frame into an owned buffer and returns (never blocks the render task on the socket); new drainWsSends() on loop20ms streams it to all clients a slice per tick via the new non-blocking platform::TcpConnection::writeSome. Backpressure: a new frame is dropped while the previous drains (broadcastBinary returns bool); lastDrainTicks() reports per-frame drain latency. 64-bit WS length form for frames >64KB. A stuck client (no progress ~3s) is closed, never spliced. This is the producer→consumer seam the architecture's two-task split will host.
- platform: added writeSome (non-blocking partial write); removed the now-dead writeChunks + WriteResult enum + MAX_WRITE_CHUNKS from platform.h and both platform impls; dropped the now-unused <sys/uio.h> includes.
- BinaryBroadcaster: broadcastBinary returns bool (accepted/dropped) + new lastDrainTicks() — the producer's adaptive-resolution signal.

Light domain:
- PreviewDriver: RAM-derived MAX_PREVIEW_POINTS (131072 PSRAM / 16384 no-PSRAM) replacing the fixed 1800; spatial-lattice downsample kept as the fallback above the cap; wire count widened u16->u32 (>65535-light HUB75 walls). Adaptive downscaling driven by drain LATENCY (not dropped frames): sustained high latency coarsens the lattice, a low-latency stretch refines back, with hysteresis; the factor rides the coord header's stride field to the browser. Coord table is retried when dropped under backpressure and colour frames are withheld until it lands, so a rebuild can never desync the stream and freeze the preview.

UI:
- preview3d.js: parse u32 count + the new 10/7-byte headers; skip a colour frame whose count != the coord-table count (mid-rebuild) to prevent a scrambled mapping; "preview 1/N · link limited" status when the device downscales; plus this session's filled-panel dot sizing (cube-root pitch for 3D), fit-to-bulb sequence numbers, cursor-anchored + deep zoom, soft grey-circle off-LED placeholders, lit-on-top two-pass draw.
- index.html/style.css: dot-size slider + preview-status element + all preview controls consolidated into one bar; ⌖ reset now resets camera, dot size, numbers, and docked/PiP layout (browser-local, no backend).

Tests:
- unit_PreviewDriver: u32 headers, RAM-derived cap + spatial-lattice regularity, and a regression test pinning the coord-table-dropped-under-backpressure freeze the reviewer found (retry + frames-withheld-until-it-lands).

Docs:
- PreviewDriver.md / HttpServerModule.md: wire contracts rewritten for the u32 count, staged drain, RAM cap, spatial lattice, and adaptive downscale (the old single-writev / u16 / 1800-cap / index-downsample text was stale).
- Plan-20260622 - Non-blocking preview send.md saved to docs/history/plans/.

Reviews:
- 👾 BLOCKER (fixed): adaptive rebuild dropped the coord table under backpressure -> browser count-mismatch guard skipped every colour frame -> preview froze for the session; fixed via coord-pending retry + a regression test (the test mock had masked it by always accepting 0x03).
- 👾 (fixed): stale wire-contract specs rewritten; dead <sys/uio.h> includes + stale writeChunks/mdns comments removed; dropStreak_ renamed slowStreak_ (latency, not drops); sampledIdx_ alloc switched to new(std::nothrow) so OOM degrades instead of aborting; seq-number label shows the sent index (i*stride was wrong for a spatial lattice).
- 👾 (deferred): PreviewDriver still holds rgb_/coords_ alongside the HttpServer staging buffer (two frame copies); zero-copy rgb_ removal + a live classic-board 16K RAM-headroom check are a follow-up (need the classic board connected). 195² adaptive-transition churn is a known tuning rough edge, not a freeze.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/moonmodules/core/ImprovProvisioningModule.md (1)

23-30: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Stale vendor-extension count after SET_DEVICE_MODEL removal.

Line 23 states the implementation supports "four standard RPC commands plus three vendor extensions," but with SET_DEVICE_MODEL (0xFE) deleted in this PR, only two vendor extensions remain in the list below (SET_TX_POWER 0xFD, APPLY_OP 0xFC).

📝 Suggested correction
-Both transports speak the same Improv-WiFi serial protocol — frames of `IMPROV` + version byte + type + length + payload + checksum. Full protocol details: <https://www.improv-wifi.com/serial/>. The on-device implementation supports four standard RPC commands plus three vendor extensions:
+Both transports speak the same Improv-WiFi serial protocol — frames of `IMPROV` + version byte + type + length + payload + checksum. Full protocol details: <https://www.improv-wifi.com/serial/>. The on-device implementation supports four standard RPC commands plus two vendor extensions:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/moonmodules/core/ImprovProvisioningModule.md` around lines 23 - 30, The
vendor extension count in the opening paragraph is outdated. Update the text
that currently states "four standard RPC commands plus three vendor extensions"
to reflect that only two vendor extensions remain after the removal of
SET_DEVICE_MODEL in this PR. Change "three vendor extensions" to "two vendor
extensions" to align with the documented extensions below (SET_TX_POWER and
APPLY_OP).
src/core/HttpServerModule.cpp (1)

46-58: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Drain queued preview bytes before accepting HTTP work.

Line 51 returns after every accepted HTTP connection, so an HTTP request burst skips drainWsSends() indefinitely. The staged frame stays queued, wsPreviewLen_ remains non-zero, and later broadcastBinary() calls all see backpressure.

Proposed fix
 void HttpServerModule::loop20ms() {
+    // Drain staged preview bytes first; queued preview bytes keep advancing during HTTP bursts.
+    drainWsSends();
+
     // Accept one HTTP connection per tick
     auto conn = server_.accept();
     if (conn.valid()) {
         handleConnection(conn);
         return; // don't broadcast in same tick as accept (WebSocket needs time to process 101)
     }
-
-    // Drain the live-preview frame queued by broadcastBinary() — a slice per tick, never
-    // blocking. This is the consumer side of the preview producer/consumer handoff; the
-    // producer (PreviewDriver) only stages bytes, the transport poll sends them. (When the
-    // §145 two-task split lands, this call moves to the consumer/network task unchanged.)
-    drainWsSends();
 }

Based on learnings, “No UI action or API-call sequence should crash or wedge a running device; robustness is mandatory.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/HttpServerModule.cpp` around lines 46 - 58, The early return
statement in the loop20ms() method after handleConnection(conn) prevents
drainWsSends() from ever being called when HTTP connections are being accepted
in bursts, causing queued preview bytes to remain stuck indefinitely. To fix
this, move the drainWsSends() call to execute before the HTTP connection
accept/handle block so that queued WebSocket preview data is always drained on
each tick regardless of incoming HTTP traffic, ensuring the preview
producer/consumer handoff remains responsive and backpressure is relieved even
during connection bursts.

Source: Learnings

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/history/plans/Plan-20260622` - Non-blocking preview send.md:
- Line 33: In the section describing HttpServerModule::drainWsSends() at line
33, replace the reference to the outdated API name writeChunks with the current
implemented transport API name writeSome to maintain consistency with the actual
shipped interface. The rest of the explanation about the non-blocking behavior
and staging buffer flushing should remain unchanged.

In `@docs/moonmodules/core/HttpServerModule.md`:
- Around line 56-63: In the documentation around the `broadcastBinary` method
and backpressure description, replace the inaccurate "newest-wins" language with
wording that correctly states the in-flight frame is retained and newer frames
are dropped while draining. Additionally, convert all future-tense language
(such as "will later host") to present-tense and eliminate absence-narration to
comply with the style guideline requiring present-tense documentation. Ensure
the prose describes the current behavior and structure positively rather than
describing future intentions or what is absent.

In `@docs/moonmodules/light/drivers/PreviewDriver.md`:
- Line 17: The documentation for the 0x03 coordinate table message describes a
periodic (~1s) rebroadcast mechanism for newly-connected clients, but this does
not match the current implementation which re-sends based on generation change
or pending retry triggers. Update the coordinate table description to remove the
~1s periodic rebroadcast language and instead document that it is sent when the
generation changes or on pending retry attempts, removing the outdated
"newly-connected client catches up" concept.

In `@scripts/moondeck_ui/app.js`:
- Around line 898-904: The promise chain starting with the fetch call to
"/api/push-board" uses .then() and .catch() syntax, which conflicts with the
repository's requirement to use async/await for promise handling. Refactor this
code by converting the entire fetch operation into an async/await structure.
Ensure the fetch call (with AbortSignal.timeout), the response.json() call, and
the callback invocation are all handled using await, and replace the .catch()
block with a proper try/catch statement to handle any errors that occur during
the fetch or JSON parsing.

In `@scripts/scenario/run_live_scenario.py`:
- Around line 425-434: The HTTPError exception handler currently treats all HTTP
errors as skippable for optional steps, but this silently suppresses legitimate
failures like backend errors or permission issues. Instead, inspect the HTTP
status code within the urllib.error.HTTPError exception object. Only treat HTTP
404 (Not Found) responses as true "not present" cases that should be skipped for
optional steps and marked as success. For other HTTP error codes (500, 403,
etc.), either log the error details or re-raise the exception to prevent silent
failure masking.
- Around line 368-378: The code currently treats all HTTPError exceptions the
same way as when a step is explicitly optional, marking any failure as "ok" and
skipped. You need to distinguish between "type unavailable" errors and other
actual failures. In the except urllib.error.HTTPError block, inspect the error
response to check if it specifically indicates the type is unavailable (check
the HTTP status code or response body for the appropriate indicator). Only when
it's a "type unavailable" error should you set step_result["status"] to "ok" and
skip it; for all other HTTPError cases, allow the error to propagate or set
step_result["status"] to "error" to properly surface real issues like bad
parent_id or malformed requests.

In `@src/core/HttpServerModule.cpp`:
- Around line 1145-1147: The code in the HttpServerModule.cpp at the anyClient
check is incorrectly returning false when there are no connected WebSocket
clients, which BinaryBroadcaster consumers interpret as a backpressure signal.
This causes unnecessary producer congestion adaptation when no clients are
connected. Instead of returning false when anyClient is false, restructure the
logic to return true (indicating no backpressure) when there are no clients, and
only return false as a backpressure indicator when there are actual connected
clients experiencing congestion. This ensures idle periods without clients are
not misreported as backpressure events.
- Around line 1211-1246: The wsPreviewAge_ counter resets whenever any byte
moves across all WebSocket clients, allowing a single slow client that trickles
data to indefinitely block frame transmission for all clients. Modify the
stuck-client guard logic to track progress more granularly—either implement
per-client age counters or measure overall frame drain speed—so that clients
draining too slowly over consecutive ticks (not just completely stalled ones)
trigger the kPreviewMaxDrainTicks timeout. When the timeout fires, close any
unfinished clients and clear wsPreviewLen_ to allow broadcastBinary() to send
new frames instead of remaining stuck on a stale frame held up by a slow
trickling client.

In `@src/core/HttpServerModule.h`:
- Around line 97-121: The comment block describing wsPreviewBuf_ and related
fields contains two issues: it incorrectly describes backpressure as "PER
CLIENT" when the implementation actually enforces global backpressure (refusing
new frames while ANY client drains), and it violates present-tense coding
guidelines by using future-tense phrases like "will host" and absence-narration
like "never freezes" and "never stalling". Revise the entire comment block to
accurately characterize the global backpressure model where frame acceptance is
blocked until all clients complete draining, and rewrite all sentences in
present-tense without future or absence narration (e.g., replace "will host"
with current description, replace "never freezes" and "never stalling" with
present-tense descriptions of what actually happens).

In `@src/light/drivers/PreviewDriver.h`:
- Around line 83-84: The buildAndSendCoordTable() method contains heap
allocations (coords_.allocate and new[] operations) that execute on the hot path
when called from loop(). Remove all heap allocations from
buildAndSendCoordTable() and any other methods invoked from the loop path (as
referenced in lines 109-113 and 181-190). Pre-allocate buffers and memory during
initialization or configuration time instead, so that the loop-reachable code
paths only use pre-allocated resources without triggering memory growth or
fragmentation.
- Around line 167-173: The latticeCount lambda in the stride selection logic
computes lattice grid points based on bounding box dimensions (ax, ay, az)
rather than the actual light count n, causing sparse layouts to be prematurely
downsampled. Replace the latticeCount lambda to operate on the actual light
count n instead, computing the stride that would result in downsampled light
count using the formula (n + s - 1) / s, and update the while loop condition to
check if this downsampled light count exceeds MAX_PREVIEW_POINTS rather than
checking lattice grid points.

In `@src/platform/desktop/platform_desktop.cpp`:
- Around line 701-712: In the TcpConnection::writeSome method, the socket needs
to be toggled to non-blocking mode before the ::send() call and restored to
blocking mode after it to prevent indefinite blocking when the kernel send
buffer is full. Add a call to make_nonblocking(fd_) immediately before the
::send() line, and add a call to make_blocking(fd_) immediately after the
::send() line completes, ensuring the socket's blocking state is restored to
maintain SO_RCVTIMEO semantics for read operations as documented in the design
comments.

In `@src/ui/index.html`:
- Line 40: The input element with id="preview-dot" is missing an explicit
`<label>` element that associates with it via the `for` attribute, which is
required by the input-requires-label linting rule. Add a `<label>` element with
`for="preview-dot"` that contains descriptive text (such as "Dot size" to match
the aria-label and title attributes). The label can be visually hidden using CSS
if visual presentation is a concern, but the HTML element must be present to
satisfy both accessibility standards and the linter configuration.

In `@src/ui/preview3d.js`:
- Around line 57-58: The dotScale_ variable accepts unbounded values from
localStorage without validation, which can cause performance issues. Clamp the
parseFloat result to the range [0.25, 1.5] when initializing dotScale_ at the
load time. Additionally, apply the same clamping logic to any input handlers or
updates to dotScale_ that occur around lines 377-383 where user input modifies
this value. Ensure all assignments to dotScale_ use a helper function or inline
logic to constrain the value within the specified bounds before it takes effect.
- Around line 485-497: The stride field from the incoming frame header at offset
5 (as a u16 value) is currently being ignored during validation. Extract the
stride value from the header using the view object and compare it against the
expected stride from the active coordinate table (alongside the existing count
validation). Return early if the stride does not match, preventing color
misalignment when the stride changes even if the count remains the same. This
ensures both the count and stride are synchronized before drawing the colors.

In `@test/unit/light/unit_PreviewDriver.cpp`:
- Around line 128-160: The current test only validates dense-grid downsampling
where the pixel count exceeds the RAM cap, but does not cover the sparse-layout
edge case where bounding-box dimensions are large but totalLightCount() remains
under cap. Add a new TEST_CASE function following the same pattern as the
existing "PreviewDriver downsamples a large layout on a regular spatial lattice"
test that creates a mm::GridLayout with large dimensions (e.g., 512x512) but
sparse lighting content (low totalLightCount under 131072), instantiates a
PreviewRig with that layout, calls rig.produce(), and then asserts that
rig.cap.coordStride() equals 1 to verify full-resolution output is preserved
when the actual light count does not require downsampling despite large spatial
bounds.

---

Outside diff comments:
In `@docs/moonmodules/core/ImprovProvisioningModule.md`:
- Around line 23-30: The vendor extension count in the opening paragraph is
outdated. Update the text that currently states "four standard RPC commands plus
three vendor extensions" to reflect that only two vendor extensions remain after
the removal of SET_DEVICE_MODEL in this PR. Change "three vendor extensions" to
"two vendor extensions" to align with the documented extensions below
(SET_TX_POWER and APPLY_OP).

In `@src/core/HttpServerModule.cpp`:
- Around line 46-58: The early return statement in the loop20ms() method after
handleConnection(conn) prevents drainWsSends() from ever being called when HTTP
connections are being accepted in bursts, causing queued preview bytes to remain
stuck indefinitely. To fix this, move the drainWsSends() call to execute before
the HTTP connection accept/handle block so that queued WebSocket preview data is
always drained on each tick regardless of incoming HTTP traffic, ensuring the
preview producer/consumer handoff remains responsive and backpressure is
relieved even during connection bursts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 08f7c2a0-2883-4108-8da3-d914546a986e

📥 Commits

Reviewing files that changed from the base of the PR and between 0c9fbd3 and 1e48e92.

📒 Files selected for processing (22)
  • docs/history/plans/Plan-20260622 - Non-blocking preview send.md
  • docs/moonmodules/core/HttpServerModule.md
  • docs/moonmodules/core/ImprovProvisioningModule.md
  • docs/moonmodules/light/drivers/PreviewDriver.md
  • scripts/moondeck_ui/app.js
  • scripts/scenario/run_live_scenario.py
  • src/core/BinaryBroadcaster.h
  • src/core/HttpServerModule.cpp
  • src/core/HttpServerModule.h
  • src/light/drivers/PreviewDriver.h
  • src/platform/desktop/platform_desktop.cpp
  • src/platform/esp32/platform_esp32.cpp
  • src/platform/platform.h
  • src/ui/index.html
  • src/ui/preview3d.js
  • src/ui/style.css
  • test/scenarios/light/scenario_Audio_mutation.json
  • test/scenarios/light/scenario_Driver_mutation.json
  • test/scenarios/light/scenario_Layouts_mutation.json
  • test/scenarios/light/scenario_perf_full.json
  • test/scenarios/light/scenario_perf_light.json
  • test/unit/light/unit_PreviewDriver.cpp

Comment thread docs/history/plans/Plan-20260622 - Non-blocking preview send.md Outdated
Comment thread docs/moonmodules/core/HttpServerModule.md Outdated
Comment thread docs/moonmodules/light/drivers/PreviewDriver.md Outdated
Comment on lines +898 to +904
fetch("/api/push-board", {
method: "POST",
headers: {"Content-Type": "application/json"},
body: JSON.stringify({ip: device.ip, board}),
signal: AbortSignal.timeout(10000),
}).then(r => r.json()).then(j => onDone && onDone(!!j.ok))
.catch(() => onDone && onDone(false));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Use async/await for pushBoard promise handling.

Line 898-Line 904 uses .then/.catch, which conflicts with the repository JS rule requiring async/await.

Suggested refactor
-        const pushBoard = (board, onDone) => {
-            // Success is the device-side result in the JSON body ({"ok": bool} from
-            // _push_board_to_device) — HTTP 200 alone can wrap a failed push (a device
-            // timeout / non-2xx mid-fan-out), so r.ok would falsely report success.
-            // 10s AbortSignal timeout so a stalled request can't wedge the button forever.
-            fetch("/api/push-board", {
-                method: "POST",
-                headers: {"Content-Type": "application/json"},
-                body: JSON.stringify({ip: device.ip, board}),
-                signal: AbortSignal.timeout(10000),
-            }).then(r => r.json()).then(j => onDone && onDone(!!j.ok))
-              .catch(() => onDone && onDone(false));
-        };
+        const pushBoard = async (board, onDone) => {
+            // Success is the device-side result in the JSON body ({"ok": bool} from
+            // _push_board_to_device) — HTTP 200 alone can wrap a failed push (a device
+            // timeout / non-2xx mid-fan-out), so r.ok would falsely report success.
+            // 10s AbortSignal timeout so a stalled request can't wedge the button forever.
+            try {
+                const response = await fetch("/api/push-board", {
+                    method: "POST",
+                    headers: {"Content-Type": "application/json"},
+                    body: JSON.stringify({ip: device.ip, board}),
+                    signal: AbortSignal.timeout(10000),
+                });
+                const payload = await response.json();
+                if (onDone) onDone(!!payload.ok);
+            } catch (_) {
+                if (onDone) onDone(false);
+            }
+        };

As per coding guidelines, “Always use async/await for promises in JavaScript/TypeScript”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/moondeck_ui/app.js` around lines 898 - 904, The promise chain
starting with the fetch call to "/api/push-board" uses .then() and .catch()
syntax, which conflicts with the repository's requirement to use async/await for
promise handling. Refactor this code by converting the entire fetch operation
into an async/await structure. Ensure the fetch call (with AbortSignal.timeout),
the response.json() call, and the callback invocation are all handled using
await, and replace the .catch() block with a proper try/catch statement to
handle any errors that occur during the fetch or JSON parsing.

Source: Coding guidelines

Comment on lines +368 to +378
elif step.get("optional"):
step_result["status"] = "ok"
skipped_ids.add(step.get("id", ""))
print(f" + {step.get('id','?')} ({step['type']}) — skipped (optional, type unavailable on {target})")
else:
step_result["status"] = "error"
except urllib.error.HTTPError:
if step.get("optional"):
step_result["status"] = "ok"
skipped_ids.add(step.get("id", ""))
print(f" + {step.get('id','?')} ({step['type']}) — skipped (optional, type unavailable on {target})")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Narrow optional add_module skip to only “type unavailable” failures

Current logic marks any optional add_module failure as success (ok:false and every HTTPError). That hides real regressions (e.g., bad parent_id, malformed request, handler errors) and can turn broken scenarios into false passes.
Please only skip when the backend explicitly reports “unknown/unavailable type” (or the exact status/code used for that case), and fail other errors.

Proposed fix
                 try:
                     resp = client.post("/api/modules", data)
                     if resp.get("ok"):
                         step_result["status"] = "ok"
                         if resp.get("note") == "already exists":
                             print(f"  =     {step.get('id', '?')} (exists)")
                         else:
                             print(f"  +     {step.get('id', '?')} ({step['type']})")
                             created_modules.append(step.get("id", ""))
-                    elif step.get("optional"):
+                    elif step.get("optional") and "unknown type" in str(resp.get("error", "")).lower():
                         step_result["status"] = "ok"
                         skipped_ids.add(step.get("id", ""))
                         print(f"  +     {step.get('id','?')} ({step['type']}) — skipped (optional, type unavailable on {target})")
                     else:
                         step_result["status"] = "error"
-                except urllib.error.HTTPError:
-                    if step.get("optional"):
+                except urllib.error.HTTPError as e:
+                    if step.get("optional") and e.code == 400:
                         step_result["status"] = "ok"
                         skipped_ids.add(step.get("id", ""))
                         print(f"  +     {step.get('id','?')} ({step['type']}) — skipped (optional, type unavailable on {target})")
                     else:
                         raise
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/scenario/run_live_scenario.py` around lines 368 - 378, The code
currently treats all HTTPError exceptions the same way as when a step is
explicitly optional, marking any failure as "ok" and skipped. You need to
distinguish between "type unavailable" errors and other actual failures. In the
except urllib.error.HTTPError block, inspect the error response to check if it
specifically indicates the type is unavailable (check the HTTP status code or
response body for the appropriate indicator). Only when it's a "type
unavailable" error should you set step_result["status"] to "ok" and skip it; for
all other HTTPError cases, allow the error to propagate or set
step_result["status"] to "error" to properly surface real issues like bad
parent_id or malformed requests.

Comment thread src/platform/desktop/platform_desktop.cpp
Comment thread src/ui/index.html
Comment thread src/ui/preview3d.js Outdated
Comment thread src/ui/preview3d.js Outdated
Comment thread test/unit/light/unit_PreviewDriver.cpp
The 3D preview now streams straight from the render buffer with no preview-side frame buffers, freeing the contiguous RAM that blocked large grids on no-PSRAM boards. Separately, two output drivers stop doing work for lights they don't drive: RmtLed encodes only the LEDs its strands transmit, and NetworkSend gains a light_count so a sink covers just its slice. Together these cut a 16K-LED tick on the S3 from ~180 ms to ~11 ms. Known limitation: the preview send is synchronous on its loop, so a 128x128+ frame briefly pauses the device — the resumable cross-tick send is the next follow-up.

KPI: 16384lights | PC:365KB | tick:129/93/121/9/1/486/43/17/25/139/32us(FPS:7751/10752/8264/111111/1000000/2057/23255/58823/40000/7194/31250) | ESP32 (S3, bench): 128² tick ~11ms with strands capped (was ~180ms); src:97(19716) | test:68(10375) | lizard:74w

Core:
- HttpServerModule / BinaryBroadcaster: replaced the chunk-array broadcastBinary + per-client staging buffer + drainWsSends with a streaming begin/pushBinary/endBinaryFrame trio — the WS header goes out on begin, each pushed slice fans to every client, end reports all-sent. No frame-sized buffer on either side. Removed the now-dead staging members, drainWsSends, directBroadcast, lastDrainTicks, and the broadcastBinary chunk form.
- platform (desktop): writeSome toggles non-blocking around send so the streamed push never blocks on a full kernel buffer.

Light domain:
- PreviewDriver: streams both messages — the 0x03 coordinate table from forEachCoord (no coords_ buffer) and the per-frame 0x02 colours straight from the producer buffer 1:1 at full resolution (no rgb_/sampledIdx_). A downsampled frame re-applies the same per-axis lattice skip over forEachCoord, so colour k lines up with coord k with no stored index map. Dropped coords_, rgb_, sampledIdx_ (≈80KB+ of contiguous internal RAM on a 16K grid); per-frame preview cost is now ~75µs. Adaptive downscale now keys off endBinaryFrame() (a frame that didn't reach every client) instead of drain latency.
- RmtLedDriver: bound the symbol-encode loop to txLightCount_ (Σ pinCounts, the lights the strands actually transmit) instead of the whole source buffer — a 64-leds/pin config on a 16K grid encoded all 16384, ~93ms; now ~2.6ms.
- NetworkSendDriver: new light_count control (uint16, 0 = whole buffer) sends only the first N lights, so a sink covers its slice — split some lights to LEDs and the rest to ArtNet, or run multiple senders for different ranges — and a frame isn't packed/sent for lights it doesn't own (~80ms → ~0.38ms at 64). ParallelLedDriver/LcdLedDriver already bound by maxLaneLights_; a start-offset for arbitrary slices across all drivers is a planned follow-up.

UI:
- preview3d.js: unchanged wire parsing (u32 count, 10/7-byte headers, count+stride mismatch guard) works with the streamed frames; comment fix (positions re-sent on rebuild / new client, not a ~1Hz timer).

Tests:
- unit_PreviewDriver: CaptureBroadcaster reassembles the begin/push/end stream; assertions (count, lattice regularity, sparse-large-box stays full-res, coord-pending retry) pass against the streamed model.

Docs:
- PreviewDriver.md / HttpServerModule.md: rewritten to the streamed (no-buffer) model; NetworkSendDriver.md documents light_count; ImprovProvisioningModule.md vendor-extension count corrected to two. Plan-20260623 saved.

Reviews:
- 🐇 (processed earlier, carried): loop20ms drain ordering, anyClient backpressure, latticeCount sparse-layout fix, dot-size clamp, input label, browser stride guard, dead-include + comment cleanups — all preserved or superseded by the streamed model.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/light/drivers/NetworkSendDriver.h (1)

149-151: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

light_count cap is bypassed in the passthrough branch.

When correction is unavailable, the code still sends sourceBuffer_->bytes(), so light_count is ignored and full frames are transmitted.

Proposed fix
         } else {
             data = sourceBuffer_->data();
-            totalBytes = sourceBuffer_->bytes();
+            totalBytes = static_cast<size_t>(nLights) * sourceBuffer_->channelsPerLight();
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/light/drivers/NetworkSendDriver.h` around lines 149 - 151, In the
passthrough branch where correction is unavailable, the code assigns totalBytes
= sourceBuffer_->bytes() without applying the light_count cap, which bypasses
the intended frame size limit. Apply the same light_count capping logic that is
used in the correction path to this passthrough branch by ensuring totalBytes
respects the light_count limit when setting its value from
sourceBuffer_->bytes(), similar to how it is handled when correction is
available.
src/light/drivers/PreviewDriver.h (1)

261-269: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Update the cap rationale comment to the current streaming path.

The comment still cites broadcastBinary/drainWsSends, which no longer matches this implementation’s beginBinaryFrame/pushBinaryFrame/endBinaryFrame flow.

As per coding guidelines, src/**/*.{h,cpp,ts,tsx,js} content “describe[s] the system as it is now in present tense.”

Suggested fix
-    // Frame cap: the most points one preview frame carries before the spatial-lattice
-    // downsample engages. The old ~1800 limit was the single-writev wall; that's gone now —
-    // broadcastBinary enqueues the frame and HttpServerModule::drainWsSends() streams it
-    // across loop20ms ticks (non-blocking), so the cap is no longer "what fits one writev"
-    // but "how big a frame the device can stage + stream comfortably". That's RAM-bound:
-    // the staging frame is points×3 bytes (16K LEDs = 48 KB), so a no-PSRAM classic board
+    // Frame cap: the most points one preview frame carries before the spatial-lattice
+    // downsample engages. Frames stream via begin/push/end, so the cap reflects how much
+    // payload the device can generate and send comfortably at runtime. That's RAM-bound:
+    // 16K LEDs = 48 KB payload, so a no-PSRAM classic board
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/light/drivers/PreviewDriver.h` around lines 261 - 269, The frame cap
rationale comment in PreviewDriver.h references outdated method names
broadcastBinary and drainWsSends that no longer match the current
implementation. Update this multi-line comment (starting around the "Frame cap"
explanation) to replace references to broadcastBinary and drainWsSends with the
current streaming flow methods: beginBinaryFrame, pushBinaryFrame, and
endBinaryFrame. Ensure the comment uses present tense language that accurately
describes how the system currently works, rather than how it worked previously.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/history/plans/Plan-20260622` - Non-blocking preview send.md:
- Line 33: The documentation at line 33 in the file describes an outdated
implementation that references the removed `HttpServerModule::drainWsSends()`
method. Replace this stale description with documentation that accurately
reflects the current implementation using the streamed binary frame API.
Specifically, remove the references to `drainWsSends()` and instead document
that binary frames are streamed through `beginBinaryFrame()`,
`pushBinaryFrame()`, and `endBinaryFrame()` methods, and clarify that
`loop20ms()` now only handles connection acceptance rather than draining a send
queue. Either provide the corrected description of the current staged API design
or explicitly reference any newer documentation that supersedes this approach.

In `@src/core/HttpServerModule.cpp`:
- Around line 1127-1182: The frame streaming implementation does not track
frame-level state, causing the stall counter in sendAllOrClose to reset per
slice (wasting the intended stall budget) and lacking verification that pushed
bytes match the totalLen declared in beginBinaryFrame. Add frame-state member
variables to track whether a frame is open, the remainingBytes from totalLen,
and a frame-wide stall budget. In beginBinaryFrame, initialize the frame state
(open=true, remainingBytes=totalLen, reset the frame-wide stall counter). In
pushBinaryFrame, decrement remainingBytes by the len being pushed and close all
clients if remainingBytes goes negative (over-push detected), while using the
frame-wide stall counter instead of letting sendAllOrClose reset it per slice.
In endBinaryFrame, verify that remainingBytes equals zero before marking
success; if there is a mismatch or the frame-wide stall budget is exhausted,
return false and ensure the frame is marked closed.

In `@src/light/drivers/NetworkSendDriver.h`:
- Around line 40-43: The comment for the lightCount variable in
NetworkSendDriver.h contains future-tense roadmap wording with the phrase "A
start offset for arbitrary slices is a planned follow-up across all drivers."
This violates the coding guidelines requiring present-tense comments. Remove the
sentence about the planned follow-up and keep only the description of the
current behavior of the lightCount variable (that it controls how many lights to
send, with 0 meaning the whole buffer and greater than 0 sending the first N
lights, allowing sinks to cover their own slice).

In `@src/light/drivers/PreviewDriver.h`:
- Around line 241-249: In the ColCtx struct definition and its initialization in
the sendFrame() method, the stride value s is being truncated to uint8_t which
causes incorrect lattice predicate behavior when previewStride_ exceeds 255.
Change the type of the s member variable in the ColCtx struct from uint8_t to a
wider type (such as uint16_t) that can accommodate the full stride value, and
update the ColCtx initialization to remove the cast that truncates s to uint8_t,
ensuring the complete stride width is preserved for the modulo operation in the
lambda's keep predicate.

In `@src/light/drivers/RmtLedDriver.h`:
- Around line 200-205: The encoding is capped to variable n (minimum of
txLightCount_ and sourceBuffer_->count()), but each pin still transmits its full
configured pinCounts_[i], creating a mismatch. When txLightCount_ exceeds
sourceBuffer_->count(), pins will transmit unencoded lights with stale symbol
data. Fix this by ensuring each pin only transmits up to the number of lights
that were actually encoded in this frame (n). Either cap the per-pin
transmission counts to not exceed n, or adjust the transmission loop logic to
stop transmitting each pin at the encoded boundary rather than using the full
pinCounts_[i].

In `@src/ui/style.css`:
- Around line 287-290: In the `.visually-hidden` class, replace or augment the
deprecated `clip: rect(0 0 0 0)` property with the modern equivalent `clip-path:
inset(50%)`. This will resolve the stylelint warning while maintaining the
screen-reader-only styling functionality that hides the element visually but
keeps it accessible to assistive technologies.

---

Outside diff comments:
In `@src/light/drivers/NetworkSendDriver.h`:
- Around line 149-151: In the passthrough branch where correction is
unavailable, the code assigns totalBytes = sourceBuffer_->bytes() without
applying the light_count cap, which bypasses the intended frame size limit.
Apply the same light_count capping logic that is used in the correction path to
this passthrough branch by ensuring totalBytes respects the light_count limit
when setting its value from sourceBuffer_->bytes(), similar to how it is handled
when correction is available.

In `@src/light/drivers/PreviewDriver.h`:
- Around line 261-269: The frame cap rationale comment in PreviewDriver.h
references outdated method names broadcastBinary and drainWsSends that no longer
match the current implementation. Update this multi-line comment (starting
around the "Frame cap" explanation) to replace references to broadcastBinary and
drainWsSends with the current streaming flow methods: beginBinaryFrame,
pushBinaryFrame, and endBinaryFrame. Ensure the comment uses present tense
language that accurately describes how the system currently works, rather than
how it worked previously.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 313d08bf-6bfd-45db-b52f-b90ae9a1bd1f

📥 Commits

Reviewing files that changed from the base of the PR and between 1e48e92 and c26ef2a.

📒 Files selected for processing (17)
  • docs/history/plans/Plan-20260622 - Non-blocking preview send.md
  • docs/history/plans/Plan-20260623 - Stream preview from buffers, zero preview buffers.md
  • docs/moonmodules/core/HttpServerModule.md
  • docs/moonmodules/core/ImprovProvisioningModule.md
  • docs/moonmodules/light/drivers/NetworkSendDriver.md
  • docs/moonmodules/light/drivers/PreviewDriver.md
  • src/core/BinaryBroadcaster.h
  • src/core/HttpServerModule.cpp
  • src/core/HttpServerModule.h
  • src/light/drivers/NetworkSendDriver.h
  • src/light/drivers/PreviewDriver.h
  • src/light/drivers/RmtLedDriver.h
  • src/platform/desktop/platform_desktop.cpp
  • src/ui/index.html
  • src/ui/preview3d.js
  • src/ui/style.css
  • test/unit/light/unit_PreviewDriver.cpp


- **One staging buffer for the live-preview client**, sized to the RAM-derived point cap, allocated once via `platform::alloc` (PSRAM-preferred; classic falls back to internal RAM). Single live client (§2) → one buffer.
- `broadcastBinary` → **non-blocking enqueue**: **backpressure gate first** — if the live client still has unsent bytes from the previous frame, **drop this frame** (newest-wins). Else copy WS header + payload into the staging buffer, set `len`, `sent=0`, return. Never blocks.
- New **`HttpServerModule::drainWsSends()`** called from `loop20ms()`: flush the staging buffer with the **non-blocking** `writeSome` — send what the socket takes now, advance `sent`, leave the rest for the next tick. Mid-frame partial is expected (we own the offset); only a real socket error closes. The exact function §145 later hosts on the consumer task. (As implemented, the drain runs before the accept so a connection burst can't strand it.)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Replace the stale drain-queue description with the shipped streamed API.

Line 33 still says HttpServerModule::drainWsSends() exists and runs before accept, but the current implementation streams binary frames through beginBinaryFrame() / pushBinaryFrame() / endBinaryFrame() and loop20ms() only accepts/handles a connection. This design record now points readers at a removed path; update this bullet to describe the staged API or explicitly defer to the newer zero-buffer plan.

Suggested wording
-- New **`HttpServerModule::drainWsSends()`** called from `loop20ms()`: flush the staging buffer with the **non-blocking** `writeSome` — send what the socket takes now, advance `sent`, leave the rest for the next tick. Mid-frame partial is expected (we own the offset); only a real socket error closes. The exact function §145 later hosts on the consumer task. (As implemented, the drain runs before the accept so a connection burst can't strand it.)
+- Binary preview frames use **`BinaryBroadcaster::beginBinaryFrame()` / `pushBinaryFrame()` / `endBinaryFrame()`**: the server sends the WS header up front and fans each payload slice to clients through non-blocking `writeSome` via `sendAllOrClose()`. There is no staging-buffer drain step in the shipped path; the zero-buffer streaming plan records that follow-up design.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- New **`HttpServerModule::drainWsSends()`** called from `loop20ms()`: flush the staging buffer with the **non-blocking** `writeSome` — send what the socket takes now, advance `sent`, leave the rest for the next tick. Mid-frame partial is expected (we own the offset); only a real socket error closes. The exact function §145 later hosts on the consumer task. (As implemented, the drain runs before the accept so a connection burst can't strand it.)
- Binary preview frames use **`BinaryBroadcaster::beginBinaryFrame()` / `pushBinaryFrame()` / `endBinaryFrame()`**: the server sends the WS header up front and fans each payload slice to clients through non-blocking `writeSome` via `sendAllOrClose()`. There is no staging-buffer drain step in the shipped path; the zero-buffer streaming plan records that follow-up design.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/history/plans/Plan-20260622` - Non-blocking preview send.md at line 33,
The documentation at line 33 in the file describes an outdated implementation
that references the removed `HttpServerModule::drainWsSends()` method. Replace
this stale description with documentation that accurately reflects the current
implementation using the streamed binary frame API. Specifically, remove the
references to `drainWsSends()` and instead document that binary frames are
streamed through `beginBinaryFrame()`, `pushBinaryFrame()`, and
`endBinaryFrame()` methods, and clarify that `loop20ms()` now only handles
connection acceptance rather than draining a send queue. Either provide the
corrected description of the current staged API design or explicitly reference
any newer documentation that supersedes this approach.

Comment on lines +1127 to +1182
bool HttpServerModule::sendAllOrClose(platform::TcpConnection& ws, const uint8_t* data, size_t len) {
size_t sent = 0;
int stalls = 0; // TOTAL WouldBlock spins this span (NOT reset on progress) — hard-bounds
// how long this synchronous send can occupy the tick.
while (sent < len) {
int n = ws.writeSome(data + sent, len - sent);
if (n < 0) { ws.close(); return false; } // real socket error
if (n == 0) { // WouldBlock — lwIP send buffer momentarily full
// Brief no-sleep spin to let the lwIP buffer drain (TCP ACKs free it sub-ms). Capped
// on TOTAL spins so a slow link can't hold the tick: once the budget is spent the
// send gives up (close + false), and the producer's adaptive downscale shrinks the
// next frame so it fits. NOTE: this whole send is still synchronous on the caller's
// loop — a large frame on a slow link briefly pauses it. The resumable cross-tick
// send (carry a byte cursor, resume next loop) is the follow-up that removes that.
if (++stalls > kDirectSendSpins) { ws.close(); return false; }
continue;
}
sent += static_cast<size_t>(n);
}
return true;
}

// Scatter-gather: our WS header, then the caller's payload chunks. The
// payload buffers are caller-owned (e.g. PreviewDriver's downsample buffer);
// no copy here. Stack array sized for WS header + a small fixed payload
// (the preview uses 2 chunks). MAX_PAYLOAD_CHUNKS caps it so this stays a
// stack array, not an allocation in the broadcast path.
static constexpr int MAX_PAYLOAD_CHUNKS = 4;
if (chunkCount > MAX_PAYLOAD_CHUNKS) return; // caller bug; don't allocate
platform::WriteChunk chunks[1 + MAX_PAYLOAD_CHUNKS];
chunks[0] = { wsHeader, static_cast<size_t>(wsHeaderLen) };
for (int i = 0; i < chunkCount; i++) chunks[1 + i] = payload[i];
const int totalChunks = 1 + chunkCount;
// Streamed frame: header now, payload pushed in slices, no frame-sized staging buffer — so a
// large frame (PreviewDriver's coordinate table or colour frame) goes out on a memory-tight
// board where a contiguous block won't fit. The producer (forEachCoord) pushes forward-only;
// each slice fans to every client before the next push. A client that can't keep up is closed
// (its WS message ends incomplete → it reconnects), so this never blocks the tick indefinitely.
void HttpServerModule::beginBinaryFrame(size_t totalLen) {
wsFrameAllSent_ = true;
uint8_t wsHeader[10];
int wsHeaderLen;
wsHeader[0] = 0x82;
if (totalLen < 126) { wsHeader[1] = static_cast<uint8_t>(totalLen); wsHeaderLen = 2; }
else if (totalLen < 65536) {
wsHeader[1] = 126; wsHeader[2] = static_cast<uint8_t>((totalLen >> 8) & 0xFF);
wsHeader[3] = static_cast<uint8_t>(totalLen & 0xFF); wsHeaderLen = 4;
} else {
wsHeader[1] = 127;
for (int i = 0; i < 8; i++)
wsHeader[2 + i] = static_cast<uint8_t>((static_cast<uint64_t>(totalLen) >> (56 - 8 * i)) & 0xFF);
wsHeaderLen = 10;
}
for (auto& ws : wsClients_) {
if (ws.valid() && !sendAllOrClose(ws, wsHeader, static_cast<size_t>(wsHeaderLen)))
wsFrameAllSent_ = false;
}
}

void HttpServerModule::pushBinaryFrame(const uint8_t* data, size_t len) {
if (!data || len == 0) return;
for (auto& ws : wsClients_) {
if (!ws.valid()) continue;
switch (ws.writeChunks(chunks, totalChunks)) {
case platform::WriteResult::Complete:
case platform::WriteResult::WouldBlock:
// WouldBlock: browser is backpressured — skip this frame,
// keep the connection open (the next frame may fit).
break;
case platform::WriteResult::Partial:
case platform::WriteResult::Error:
// Partial: a truncated WS message went out — the stream is
// corrupt, the connection must be dropped. Error: dead socket.
ws.close();
break;
}
if (ws.valid() && !sendAllOrClose(ws, data, len)) wsFrameAllSent_ = false;
}
}

bool HttpServerModule::endBinaryFrame() { return wsFrameAllSent_; }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Track frame-level send state instead of resetting it per slice.

sendAllOrClose() resets its stall counter on every pushBinaryFrame() slice, so a large coordinate table streamed in many chunks can spend kDirectSendSpins repeatedly on the scheduler loop. The begin/push/end implementation also never verifies that pushed bytes add up to totalLen; an under-push leaves clients waiting for bytes, and an over-push corrupts the WebSocket message boundary. Carry frame state from begin to end (open, remainingBytes, and a frame-wide or per-client stall budget), then close/return false on mismatch or budget exhaustion. As per coding guidelines, “No UI action or API-call sequence must crash or wedge a running device, including deleting, replacing, or clearing modules in any order; crashes and hangs are bugs, not user error.”

Sketch of the expected state tracking
 // HttpServerModule.h
     bool wsFrameAllSent_ = true;
+    bool wsFrameOpen_ = false;
+    size_t wsFrameRemaining_ = 0;
+    int wsFrameStallBudget_ = 0;
-bool HttpServerModule::sendAllOrClose(platform::TcpConnection& ws, const uint8_t* data, size_t len) {
+bool HttpServerModule::sendAllOrClose(platform::TcpConnection& ws, const uint8_t* data, size_t len,
+                                      int& stallBudget) {
     size_t sent = 0;
-    int stalls = 0;
     while (sent < len) {
         int n = ws.writeSome(data + sent, len - sent);
         if (n < 0) { ws.close(); return false; }
         if (n == 0) {
-            if (++stalls > kDirectSendSpins) { ws.close(); return false; }
+            if (--stallBudget < 0) { ws.close(); return false; }
             continue;
         }
         sent += static_cast<size_t>(n);
     }
     return true;
 }

 void HttpServerModule::beginBinaryFrame(size_t totalLen) {
     wsFrameAllSent_ = true;
+    wsFrameOpen_ = true;
+    wsFrameRemaining_ = totalLen;
+    wsFrameStallBudget_ = kDirectSendSpins;
     ...
 }

 void HttpServerModule::pushBinaryFrame(const uint8_t* data, size_t len) {
-    if (!data || len == 0) return;
+    if (!wsFrameOpen_ || len == 0) return;
+    if (!data || len > wsFrameRemaining_) {
+        wsFrameAllSent_ = false;
+        return;
+    }
     ...
+    wsFrameRemaining_ -= len;
 }

-bool HttpServerModule::endBinaryFrame() { return wsFrameAllSent_; }
+bool HttpServerModule::endBinaryFrame() {
+    const bool ok = wsFrameOpen_ && wsFrameRemaining_ == 0 && wsFrameAllSent_;
+    if (!ok) {
+        for (auto& wsClient : wsClients_) {
+            if (wsClient.valid()) {
+                wsClient.close();
+            }
+        }
+    }
+    wsFrameOpen_ = false;
+    wsFrameRemaining_ = 0;
+    return ok;
+}
🧰 Tools
🪛 Clang (14.0.6)

[warning] 1127-1127: use a trailing return type for this function

(modernize-use-trailing-return-type)


[warning] 1127-1127: parameter name 'ws' is too short, expected at least 3 characters

(readability-identifier-length)


[warning] 1132-1132: variable name 'n' is too short, expected at least 3 characters

(readability-identifier-length)


[warning] 1156-1156: do not declare C-style arrays, use std::array<> instead

(modernize-avoid-c-arrays)


[warning] 1157-1157: variable 'wsHeaderLen' is not initialized

(cppcoreguidelines-init-variables)


[warning] 1165-1165: statement should be inside braces

(readability-braces-around-statements)


[warning] 1169-1169: variable name 'ws' is too short, expected at least 3 characters

(readability-identifier-length)


[warning] 1170-1170: statement should be inside braces

(readability-braces-around-statements)


[warning] 1176-1176: implicit conversion 'const uint8_t *' (aka 'const unsigned char *') -> bool

(readability-implicit-bool-conversion)


[warning] 1176-1176: statement should be inside braces

(readability-braces-around-statements)


[warning] 1177-1177: variable name 'ws' is too short, expected at least 3 characters

(readability-identifier-length)


[warning] 1178-1178: statement should be inside braces

(readability-braces-around-statements)


[warning] 1182-1182: use a trailing return type for this function

(modernize-use-trailing-return-type)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/HttpServerModule.cpp` around lines 1127 - 1182, The frame streaming
implementation does not track frame-level state, causing the stall counter in
sendAllOrClose to reset per slice (wasting the intended stall budget) and
lacking verification that pushed bytes match the totalLen declared in
beginBinaryFrame. Add frame-state member variables to track whether a frame is
open, the remainingBytes from totalLen, and a frame-wide stall budget. In
beginBinaryFrame, initialize the frame state (open=true,
remainingBytes=totalLen, reset the frame-wide stall counter). In
pushBinaryFrame, decrement remainingBytes by the len being pushed and close all
clients if remainingBytes goes negative (over-push detected), while using the
frame-wide stall counter instead of letting sendAllOrClose reset it per slice.
In endBinaryFrame, verify that remainingBytes equals zero before marking
success; if there is a mismatch or the frame-wide stall budget is exhausted,
return false and ensure the frame is marked closed.

Source: Coding guidelines

Comment thread src/light/drivers/NetworkSendDriver.h Outdated
Comment thread src/light/drivers/PreviewDriver.h Outdated
Comment thread src/light/drivers/RmtLedDriver.h
Comment thread src/ui/style.css
Adds a serpentine control to GridLayout so a dense grid can be wired in reverse on odd rows — a one-control lever to exercise the non-identity mapping path without a sparse layout or modifier. The Layer now MEASURES whether a dense grid is in natural order (rather than assuming it) before taking the memcpy fast path, so a shuffled-but-dense grid correctly routes through the mapping LUT. Also processes a batch of CodeRabbit findings, fixes the preview not drawing on a UI refresh, names the preview's graceful-degradation model in the architecture doc, and genericises incidental "Olimex" references (the classic board's default Ethernet pin map isn't specific to that one early board).

KPI: 16384lights | PC:365KB | tick:120/89/122/9/1/326/37/15/19/119/11us(FPS:8333/11235/8196/111111/1000000/3067/27027/66666/52631/8403/90909) | ESP32:1221KB | src:97(19757) | test:68(10442) | lizard:74w

Core:
- Layer: identity mapping is now MEASURED, not declared — isNaturalOrder() walks the layout's coords once (allocation-free, cold path) and the dense memcpy fast path is taken only when driver index i actually equals box cell i. A dense-but-shuffled grid (serpentine) falls through to the box→driver LUT exactly as a sparse layout does. Replaces the count-only sparse check that would have wrongly memcpy'd a serpentine grid. No isIdentityOrder() virtual — the order is a fact of the coords, with one source of truth.
- platform (esp32): reworded the Ethernet default-pin-map comments — the classic default is the common LAN8720 RMII wiring (reset GPIO5, MDIO addr 0, clock GPIO17), described as the capability rather than as one specific board's map (Olimex demoted to an example).

Light domain:
- GridLayout: new serpentine bool control — odd rows wired in reverse (boustrophedon); forEachCoord reverses x there while still emitting the true (x,y,z), so the index→position order is what changes. Drives the non-identity mapping path on demand.
- PreviewDriver (🐇 R1): the downsampled colour pass kept its lattice stride in a uint8_t clamped to 255, so above stride 255 it disagreed with the coord table's predicate and the browser dropped every frame. Widened to the full nrOfLightsType so colour[k] lines up with coord[k] at any stride.
- RmtLedDriver (🐇 R2): cap each pin's transmit at the n lights actually encoded this frame — if the buffer shrank since the last config parse, a pin would otherwise clock out stale symbols past the encoded boundary.
- NetworkSendDriver (🐇 R3): the no-correction passthrough branch ignored light_count and sent the whole buffer; it now honours the same slice cap as the corrected path. (🐇 R4) light_count comment made present-tense.

UI:
- preview3d.js: the grid now draws the instant the coordinate table arrives (a new drawLights() helper inits GL + builds the geometry with off-LED placeholders), so a page/UI refresh shows the layout immediately instead of waiting for the first colour frame (which never comes on a paused scene).
- style.css (🐇 R6): .visually-hidden uses clip-path: inset(50%) instead of the deprecated clip: rect().

Scripts / MoonDeck:
- run_live_scenario.py, MoonDeck.md, build_esp32.py: Ethernet-default and settle-time references genericised from "Olimex" to the capability ("classic board", "default LAN8720 pins"); firmwares.json regenerated to match the build_esp32.py description edit.

Tests:
- unit_GridLayout: serpentine reverses x on odd rows (even rows L→R, odd R→L), non-serpentine unchanged.
- unit_Layer_sparse_mapping: a serpentine grid leaves the identity path and builds a box→driver LUT (with the reversed-row box→driver assertions), and returns to the memcpy fast path when toggled off.
- unit_HttpServerModule_apply: arbitrary "Olimex Gateway" test label → "Living Room".

Docs:
- architecture.md: named "graceful degradation under transport backpressure" as the transport sibling of the memory degradation cascade (resumable send + newest-wins backpressure + adaptive lattice); fixed a stale writeChunks reference to writeSome.
- GridLayout.md: documents serpentine + the measured-identity mapping. testing.md/building.md: classic-tier framing leads with the capability, board named only as measured-on provenance / example.
- backlog: captured the mid/long-term preview items (self-describing frame header, RGBW end-to-end, fixture model for moving heads, ThrottledChannel extraction).

Reviews:
- 🐇 R5 (frame-level stall budget + over-push guard): deferred into the resumable cross-tick send — it needs exactly that frame-state, so building it now then rebuilding would be throwaway.
- 🐇 R7 (Plan-20260622 stale drainWsSends line): skipped — docs/history plan archives intentionally record the approved design, not the final code, and are present-tense-exempt per CLAUDE.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Note

Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

1 similar comment
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Note

Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

✅ Created PR with unit tests: #25

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

✅ Created PR with unit tests: #26

@ewowi

ewowi commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai, please review e0e5b32

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the commit details
gh api repos/MoonModules/projectMM/commits/e0e5b328e277892f9611d6650e8c3fb47dd8d4fe \
  --jq '{sha: .sha, message: .commit.message, author: .commit.author.name, date: .commit.author.date, files: [.files[] | {filename: .filename, status: .status, additions: .additions, deletions: .deletions, patch: .patch}]}'

Length of output: 47629



@ewowi Here is the review of commit e0e5b328:


Commit Review

Summary: Adds serpentine to GridLayout, switches the Layer identity-path check from a count heuristic to a measured isNaturalOrder() walk, and batch-processes a set of CodeRabbit bug findings (stride overflow in PreviewDriver, stale RMT symbols in RmtLedDriver, and the passthrough branch ignoring light_count in NetworkSendDriver). Also includes a UI fix for immediate preview rendering on coord-table arrival and various doc/comment cleanups.


Bugs fixed — all verified correct

Driver Finding Fix
PreviewDriver Stride stored as uint8_t → overflows at stride > 255, misaligning colour and coord frames on large layouts Widened to nrOfLightsType
RmtLedDriver Pin could transmit stale symbols past the encoded boundary when the buffer shrank between a parseConfig and the next loop() Caps each pin's transmit to min(pinLights, n - pinStart)
NetworkSendDriver No-correction passthrough branch sent sourceBuffer_->bytes() ignoring light_count Now uses nLights * channelsPerLight()

src/light/layers/Layer.hisNaturalOrder()

Unused box field in Ctx:

struct Ctx { lengthType w, h; nrOfLightsType box; bool ok; };
Ctx ctx{physicalWidth_, physicalHeight_, boxCount, true};

The box member is initialized to boxCount but never read inside the lambda — the lambda recomputes the index locally from x/y/z. It can be removed:

struct Ctx { lengthType w, h; bool ok; };
Ctx ctx{physicalWidth_, physicalHeight_, true};

No early exit once mismatch is found:

Once k->ok = false, the lambda continues iterating all remaining coordinates. For a 128×128 serpentine grid the first mismatch lands at driver index 4 (first reversed-row entry) — still 16380 iterations to go. This runs on the cold path (onBuildState) so it is not a correctness issue, but an early-exit would be a cheap improvement. Since forEachCoord uses a plain C callback with no stop signal, the cleanest approach is a flag check at the top of the lambda:

layouts_->forEachCoord([](void* c, nrOfLightsType driverIdx, lengthType x, lengthType y, lengthType z) {
    auto* k = static_cast<Ctx*>(c);
    if (!k->ok) return;   // short-circuit once mismatch found
    nrOfLightsType box = ...;
    if (driverIdx != box) k->ok = false;
}, &ctx);

src/light/layouts/GridLayout.h — serpentine logic

The boustrophedon logic is correct and clean:

const bool reverse = serpentine && (y & 1);
for (lengthType i = 0; i < width && idx < limit; i++) {
    const lengthType x = reverse ? static_cast<lengthType>(width - 1 - i) : i;
    cb(ctx, static_cast<nrOfLightsType>(idx++), x, y, z);
}

Emits the true (x, y, z) coordinate while advancing the driver index linearly — the order change (not the position) is what breaks the identity path. The z-loop outer nesting is preserved correctly.


src/ui/preview3d.jsdrawLights(null) on coord table arrival

Good fix. Extracting drawLights(rgb) with rgb ? rgb[i*3]/255 : 0 correctly renders placeholder geometry (all-off LEDs) the moment the coordinate table lands, rather than waiting for the first colour frame (which never arrives on a paused scene). No concerns.


src/ui/style.css.visually-hidden

clip-path: inset(50%) in place of the deprecated clip: rect(0 0 0 0) is the currently recommended pattern — correct change.


Tests

  • unit_GridLayout serpentine assertions are correct — even rows L→R, odd rows R→L, toggling off restores natural order. ✅
  • unit_Layer_sparse_mapping LUT assertions are correct:
    • driverOf(4) == 7 (box cell (x=0,y=1) maps to driver index 7 — far end of the reversed row). ✅
    • driverOf(7) == 4 (box cell (x=3,y=1) maps to driver index 4 — near end). ✅
    • Toggling serpentine = false and calling onBuildState() returns to !hasLUT(). ✅

Overall: Solid batch of fixes with comprehensive tests. The only actionable item is removing the unused box field from isNaturalOrder()'s Ctx and optionally adding the early-exit guard in the lambda.

The 3D preview now streams the full-resolution colour frame resumably — drained from the producer buffer a chunk per transport poll, off the LED render tick — so a 128x128 grid previews smoothly without stalling the device, verified sustaining on all three boards (classic ~9 fps, S3 ~7.7 fps, P4 ~8 fps). The downsample for large grids is now closed-form index arithmetic instead of a per-frame coordinate walk, the point budget adapts to a sane display limit and to free memory, and a batch of smaller fixes ride along (preview wireframe box, Lines sweep range, a discovery-sweep stutter). Effective preview frame rate self-limits to what the link sustains; the LED output path is never charged for preview work.

KPI: 16384lights | PC:383KB | tick:117/87/119/9/1/320/36/15/18/116/11us(FPS:8547/11494/8403/111111/1000000/3125/27777/66666/55555/8620/90909) | ESP32:1261KB | src:97(20031) | test:68(10574) | lizard:75w

Core:
- HttpServerModule / BinaryBroadcaster: added a resumable buffered send (sendBufferedFrame / bufferedSendIdle / cancelBufferedSend) that streams one WS message from a stable caller-owned buffer, drained a memory-adaptive chunk per client on loop20ms — never spins the caller's loop, never blocks the render tick, newest-wins backpressure drop. The full-res colour frame uses it (body = the producer buffer, zero copy); the rarely-sent coord table + downsampled frames keep the synchronous begin/push/end stream. A new client resets its slot cursor and cancels any in-flight send so it starts clean.
- Layer: identity mapping is MEASURED (isNaturalOrder walks the coords once) so a dense grid in natural order takes the memcpy fast path and a shuffled/sparse one routes through the LUT — feeds the preview's closed-form vs forEachCoord choice.

Light domain:
- PreviewDriver: full-res colour frame routed through the resumable send (gated on bufferedSendIdle, so the effective fps self-limits to the link). Downsample is now closed-form for a dense grid — count, coord positions, and downsampled colours stride the box directly (z*H*W + y*W + x), no per-frame forEachCoord walk; sparse/serpentine layouts (arbitrary index<->position map) still walk forEachCoord. Point cap = min(display cap ~4096, memory-derived from maxAllocBlock) so every board downsamples a large grid to a link-friendly, visually-sufficient frame. Coord-table box extent is size-1 (the max coordinate the positions reach), so the wireframe box hugs the lights instead of sitting one cell large. Adaptive downscale keys off send latency.
- LinesEffect: the sweep now reaches index 0..N-1 (beat * N / 65536) — the prior beat * (N-1) / 65535 never hit the last row/column because beat tops out below full scale; the strip stalled one short of the far edge.

Core (Devices):
- DevicesModule: boot HTTP discovery probe timeout cut 150ms -> 30ms and the mDNS browse throttled, so the once-a-second blocking probe stops hitching animation during the ~minutes-long boot sweep (a live host on a LAN answers in a few ms).

UI:
- preview3d.js: the grid draws the instant the coordinate table arrives (a drawLights() helper renders off-LED placeholders), so a page refresh shows the layout immediately. Effective preview fps shown in the status bar as a 1-second sliding-window count (immune to bursty WS arrivals).

Tests:
- unit_PreviewDriver: CaptureBroadcaster models the resumable send; cases pin the closed-form downsample exact placement, dense vs sparse buffer selection (sphere uses the LUT-mapped driver buffer, not the dense box), full-res routing through the buffered send, resize-during-send cancel, and adaptive-fps gating. unit_GridLayout / unit_Layer_sparse_mapping cover serpentine + measured identity.

Docs / CI:
- architecture.md: named "graceful degradation under transport backpressure" as the transport sibling of the memory degradation cascade. HttpServerModule.md / PreviewDriver.md: the resumable + closed-form + display-cap model. backlog: mid/long-term preview items (self-describing frame header, RGBW, fixture model for moving heads, ThrottledChannel extraction); socket-pair WS test fixture. Plan-20260623 saved.

Reviews:
- 👾 self: further send-path changes were tried to chase a P4-Ethernet preview drop (treating lwIP ENOMEM as transient, yielding instead of spinning in sendAllOrClose, a non-fatal periodic state push, a buffered coord table) and ALL reverted — they regressed the working S3/classic 128 preview and the P4 drop turned out to be those very changes, not a pre-existing bug. The committed midpoint (resumable colour + synchronous coord table + the original spin) sustains on all three boards. Lesson: stop at the first failed fix on a working path; don't re-engineer the seam that already works.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/ui/preview3d.js (2)

467-471: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Commit previewStride_ only after accepting the coordinate packet.

Lines 469-470 update global parser state before the payload-length guard. A truncated 0x03 table can change previewStride_ without replacing previewCoords_, so subsequent valid 0x02 frames for the current table fail the stride check and the preview freezes until another valid coord table arrives.

🐛 Proposed fix
     const count = view.getUint32(1, true);
     const bx = view.getUint8(5), by = view.getUint8(6), bz = view.getUint8(7);
-    previewStride_ = view.getUint16(8, true) || 1;   // = device's adaptive downscale factor
-    updatePreviewStatus();
+    const stride = view.getUint16(8, true) || 1;     // = device's adaptive downscale factor
     if (buf.byteLength < 10 + count * 3) return;
     const pos = new Uint8Array(buf, 10);
@@
     }
+    previewStride_ = stride;
     previewCoordCount_ = count;
     previewBox_ = { x: bx, y: by, z: bz };
+    updatePreviewStatus();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ui/preview3d.js` around lines 467 - 471, The global parser state updates
(the previewStride_ assignment and updatePreviewStatus() call on lines 469-470)
must be moved to after the payload-length validation check on line 471.
Currently, if the buffer is truncated and the function returns early due to
insufficient bytes, previewStride_ and preview status have already been modified
without the corresponding coordinate data being updated, causing the parser
state to become inconsistent. Move both the previewStride_ assignment and
updatePreviewStatus() call to execute only after confirming that buf.byteLength
is sufficient for the full payload.

338-356: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Rewrite changed comments in present tense.

These comments use hypothetical/future narration (would only catch, would be, would land, would spike). The repo rule asks JS comments/docs to stay present-tense; rewrite them as state descriptions.

♻️ Proposed comment cleanup
-    // matchMedia would only catch the breakpoint crossing; a resize listener also keeps
+    // matchMedia catches only the breakpoint crossing; a resize listener also keeps
@@
-    // only vanished later when narrow auto-PiP kicked in would be a confusing surprise.
+    // disappears later when narrow auto-PiP activates.
@@
-    // mid-flight: the colours would land on the wrong positions (a visibly scrambled frame).
+    // mid-flight: the colours map to the wrong positions (a visibly scrambled frame).
@@
-// reading would spike to absurd values on a 0 ms gap (the "200 fps" artifact). Counting over a
+// reading spikes to absurd values on a 0 ms gap (the "200 fps" artifact). Counting over a

As per coding guidelines, "**/*.{h,cpp,hpp,js,py,md}: Sweep documentation for forward-looking language ... rewrite present-tense or move to docs/backlog/backlog.md." Based on learnings, "Use present tense only in code, comments, and documentation."

Also applies to: 501-516

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ui/preview3d.js` around lines 338 - 356, The comments in the resize event
listener section and the preview-dock click handler use future/hypothetical
tense ("would only catch", "would be") which violates the repo's present-tense
documentation guideline. Rewrite the comment above the resize addEventListener
call to describe in present tense what matchMedia and resize listeners actually
do together, and rewrite the comment following the preview-dock click handler to
describe in present tense how the dismissal behavior works across docked and PiP
modes instead of using conditional language like "would be". Also apply the same
present-tense rewriting to the comments in the sections mentioned to exist
around lines 501-516.

Sources: Coding guidelines, Learnings

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/architecture.md`:
- Line 314: In the graceful degradation section of the architecture
documentation, correct the backpressure mechanism description by replacing
"newest-wins backpressure" with "drop-new backpressure" to accurately reflect
that the mechanism keeps the in-flight frame and rejects new frames (not the
inverse), and replace the forward-looking phrase "a future bulky stream could
ride the same transport" with present-tense wording such as "other bulky streams
can ride the same transport" to comply with documentation guidelines requiring
present-tense language throughout.

In `@docs/history/plans/Plan-20260623` - Resumable adaptive preview send.md:
- Around line 29-75: Add the missing language specifier to the code fence around
the PreviewSend struct definition by changing the opening triple backticks to
include the language tag (use `text`). Additionally, add a blank line after the
"### §2 Adaptive frame rate (the elegant part)" heading to satisfy the MD022
rule for blank lines around headings. These changes will resolve the
markdownlint violations MD040 and MD022 respectively.

In `@docs/history/plans/Plan-20260623` - Stream preview from buffers, zero preview
buffers.md:
- Around line 1-3: The document at the title "# Plan — Stream the preview from
the producer buffer; eliminate all preview-side buffers" uses forward-looking
language but lacks the required one-line archival justification. Add a one-line
note immediately after the main title heading that explains this is an
intentional archival plan document using forward-looking language, following the
same pattern used in the sibling 2026-06-23 plan document to satisfy the
documentation policy requirements for archived product-owner plans.

In `@docs/moonmodules/light/drivers/NetworkSendDriver.md`:
- Line 12: The documentation for the `light_count` parameter in
NetworkSendDriver.md contains forward-looking language with a "planned
follow-up" reference, which violates the present-tense documentation style
guide. Remove or rephrase the parenthetical statement about the planned offset
feature to eliminate forward-looking language. Either delete the parenthetical
sentence entirely if it's not essential to current functionality, or if the
roadmap item is important, move that roadmap note to `docs/backlog/backlog.md`
instead. Keep the remainder of the `light_count` documentation description in
present tense, describing only current behavior.

In `@src/core/BinaryBroadcaster.h`:
- Around line 35-43: Update the documentation comments for the BinaryBroadcaster
class methods starting at sendBufferedFrame(). First, change the "newest-wins
backpressure" terminology to "drop-new-frame-wins" or "current-frame-wins" to
accurately describe the behavior. Second, fix the cancelBufferedSend()
documentation to reflect that browsers do not discard partially sent WebSocket
frames but instead consume subsequent bytes as part of the payload, and update
the contract to require implementations to close clients that have received a
partial buffered frame upon cancellation rather than relying on browser-side
discard behavior.

In `@src/core/DevicesModule.h`:
- Around line 383-390: The comment block containing kMdnsBrowseMs and
kMdnsEveryTicks constants includes historical explanation about the old async
API behavior (the portion describing how "The old async API polled cheaply every
tick but raced..."). Remove this changelog prose that describes past
implementation details. Additionally, update the nearby comment block that still
describes mDNS as async and non-blocking every tick to accurately reflect the
current synchronous and throttled contract. Rewrite both comment sections in
present-tense to describe what the current implementation does (the synchronous
polling with throttling mechanism) rather than explaining what the old behavior
was.
- Around line 273-278: The comment block for the kProbeTimeoutMs constant
contains forward-looking language about a backlog fix that should be removed to
keep the comment focused on the current invariant. Remove the clause starting
with "The real fix — running the probe on its own task..." and rewrite the
remaining comment to describe only the present-tense rationale for the 30ms
timeout value. Leave any backlog items for tracking in docs/backlog/backlog.md
instead of including them in source code comments.

In `@src/core/HttpServerModule.cpp`:
- Around line 49-51: In the comment block within HttpServerModule.cpp, replace
the forward-looking roadmap language about the two-core render/transport split
with present tense architecture guidance. Rewrite the phrase "When the two-core
render/transport split lands" to describe the current two-task architecture in
present tense, clarifying where the drain belongs and identifying the
architectural seam (such as loop20ms) that represents this boundary. Remove all
future-looking language and use statements that describe the existing design
rather than planned changes.
- Around line 1239-1267: Add timeout tracking to prevent buffered sends from
blocking indefinitely when valid clients repeatedly return 0 from writeSome().
Modify the drainPreviewSend() method to track either per-client idle tick
counters in the previewSend_ structure or the total age of the frame being sent,
then close any client that exceeds a bounded timeout window within the loop that
iterates through wsClients_ and checks ws.valid(). This allows allDone to
eventually become true and previewSend_.active to be set to false, enabling the
producer to send a replacement frame without waiting for external events.
- Around line 1081-1086: The current code at the previewSend_.active assignment
abandons in-flight frames without closing clients that are mid-transmission,
causing parsing corruption. Create a cancelBufferedSend() helper method that
checks if previewSend_.active is true, calculates the total frame size by adding
previewSend_.hdrLen and previewSend_.bodyLen, iterates through all
MAX_WS_CLIENTS to identify any clients where 0 < sent[i] < total (indicating
partial transmission), closes those clients via wsClients_[i].close(), then
clears all sent[i] counters and resets previewSend_.active, previewSend_.body,
previewSend_.bodyLen, and previewSend_.hdrLen to their default states. Replace
the direct assignment of previewSend_.active = false with a call to this
cancelBufferedSend() helper, and use the same helper for any existing
cancelBufferedSend() method to ensure consistent behavior.

In `@src/core/HttpServerModule.h`:
- Around line 116-122: The comment in the HttpServerModule.h file (starting with
"Resumable full-frame send") contains references to a nonexistent
bodyGeneration_ field and guard mechanism that do not exist in the PreviewSend
class. Remove the portion of the comment that describes bodyGeneration_
invalidating sends and the stale drain guard behavior (the text about
"bodyGeneration_ invalidates a send..." through "...between cancel and the next
begin"). Additionally, replace the phrase "newest-wins" with
"drop-new/current-frame-wins" to better describe the frame dropping behavior.
Keep the core explanation about the resumable send mechanism, buffering
strategy, and the one-in-flight-at-a-time constraint.

In `@src/light/drivers/PreviewDriver.h`:
- Around line 108-110: The framesWaiting_ member variable is declared as
uint8_t, which can only hold values 0-255, causing the counter to wrap when
incremented past 255 fps slots. This wrapping breaks the slow-frame detection in
the sentFrameWasSlow check, delaying downscaling on slow links. Implement a
saturating counter pattern for framesWaiting_ by preventing it from exceeding
its maximum useful value instead of allowing it to overflow. Apply this fix to
both the increment operation around line 109 (in the else block) and the other
location mentioned at line 417.
- Around line 84-87: The issue is that `loop()` method calls
`buildAndSendCoordTable()` and `sendFrame()` which trigger staged WebSocket
writes using beginBinaryFrame/pushBinaryFrame/endBinaryFrame operations that are
drained synchronously with repeated writeSome() calls in the hot path. This can
block the render tick when dealing with slow browser connections. Refactor these
operations to either queue the staged streams for transport cadence similar to
how sendBufferedFrame() works, or convert the writes to single non-blocking
attempts with resumable state that can complete asynchronously. Ensure that
buildAndSendCoordTable() and the downsampled sendFrame() path do not perform
blocking socket drain work directly within the loop() method execution.

In `@test/unit/light/unit_GridLayout.cpp`:
- Around line 88-94: The second pass of the test accesses coords[4] and
coords[5] after clearing and refilling the coords vector without verifying the
vector has sufficient size. If the count of coordinates regresses, accessing
these indices will cause undefined behavior instead of a clear test failure. Add
a size assertion (CHECK) immediately after the grid.forEachCoord call in the
second pass to verify that coords has at least 6 elements before attempting to
access coords[4] and coords[5].

---

Outside diff comments:
In `@src/ui/preview3d.js`:
- Around line 467-471: The global parser state updates (the previewStride_
assignment and updatePreviewStatus() call on lines 469-470) must be moved to
after the payload-length validation check on line 471. Currently, if the buffer
is truncated and the function returns early due to insufficient bytes,
previewStride_ and preview status have already been modified without the
corresponding coordinate data being updated, causing the parser state to become
inconsistent. Move both the previewStride_ assignment and updatePreviewStatus()
call to execute only after confirming that buf.byteLength is sufficient for the
full payload.
- Around line 338-356: The comments in the resize event listener section and the
preview-dock click handler use future/hypothetical tense ("would only catch",
"would be") which violates the repo's present-tense documentation guideline.
Rewrite the comment above the resize addEventListener call to describe in
present tense what matchMedia and resize listeners actually do together, and
rewrite the comment following the preview-dock click handler to describe in
present tense how the dismissal behavior works across docked and PiP modes
instead of using conditional language like "would be". Also apply the same
present-tense rewriting to the comments in the sections mentioned to exist
around lines 501-516.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 899c75b1-6b10-4ebf-ae1e-5272fd1339f4

📥 Commits

Reviewing files that changed from the base of the PR and between 1e48e92 and 63a8fe8.

⛔ Files ignored due to path filters (1)
  • scripts/build/build_esp32.py is excluded by !**/build/**
📒 Files selected for processing (36)
  • docs/architecture.md
  • docs/backlog/backlog.md
  • docs/building.md
  • docs/history/plans/Plan-20260622 - Non-blocking preview send.md
  • docs/history/plans/Plan-20260623 - Resumable adaptive preview send.md
  • docs/history/plans/Plan-20260623 - Stream preview from buffers, zero preview buffers.md
  • docs/install/firmwares.json
  • docs/moonmodules/core/HttpServerModule.md
  • docs/moonmodules/core/ImprovProvisioningModule.md
  • docs/moonmodules/light/drivers/NetworkSendDriver.md
  • docs/moonmodules/light/drivers/PreviewDriver.md
  • docs/moonmodules/light/layouts/GridLayout.md
  • docs/testing.md
  • scripts/MoonDeck.md
  • scripts/scenario/run_live_scenario.py
  • src/core/BinaryBroadcaster.h
  • src/core/DevicesModule.h
  • src/core/HttpServerModule.cpp
  • src/core/HttpServerModule.h
  • src/light/drivers/NetworkSendDriver.h
  • src/light/drivers/PreviewDriver.h
  • src/light/drivers/RmtLedDriver.h
  • src/light/effects/LinesEffect.h
  • src/light/layers/Layer.h
  • src/light/layouts/GridLayout.h
  • src/platform/desktop/platform_desktop.cpp
  • src/platform/esp32/platform_config.h
  • src/platform/esp32/platform_esp32.cpp
  • src/ui/index.html
  • src/ui/preview3d.js
  • src/ui/style.css
  • test/scenarios/light/scenario_Driver_mutation.json
  • test/unit/core/unit_HttpServerModule_apply.cpp
  • test/unit/light/unit_GridLayout.cpp
  • test/unit/light/unit_Layer_sparse_mapping.cpp
  • test/unit/light/unit_PreviewDriver.cpp

Comment thread docs/architecture.md Outdated
Comment thread docs/moonmodules/light/drivers/NetworkSendDriver.md Outdated
Comment thread src/core/BinaryBroadcaster.h Outdated
Comment on lines +35 to +43
// sendBufferedFrame(...) — begin a send; while one is in flight a new call is DROPPED
// (newest-wins backpressure), the caller reads that as "link busy".
// bufferedSendIdle() — true when no send is in flight (the previous frame fully drained
// or was cancelled). The caller gates the next frame on this, so the
// effective frame rate self-limits to what the link sustains.
// cancelBufferedSend() — abandon the in-flight send NOW (its WS messages end incomplete →
// the browser discards them). The caller MUST call this before the
// `body` buffer is freed/reallocated (e.g. a geometry rebuild), so a
// cursor never reads freed memory.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Fix the buffered-send abort contract.

A browser does not discard a partially sent WebSocket frame while the TCP connection stays open; it consumes subsequent frame bytes as the unfinished payload. The contract should require implementations to close clients that have received a partial buffered frame on cancel. Also rename newest-wins to drop-new/current-frame-wins because the in-flight frame is retained.

Contract wording sketch
-    //   sendBufferedFrame(...) — begin a send; while one is in flight a new call is DROPPED
-    //                            (newest-wins backpressure), the caller reads that as "link busy".
+    //   sendBufferedFrame(...) — begin a send; while one is in flight a new call is dropped
+    //                            (drop-new backpressure), the caller reads that as "link busy".
@@
-    //   cancelBufferedSend()   — abandon the in-flight send NOW (its WS messages end incomplete →
-    //                            the browser discards them). The caller MUST call this before the
+    //   cancelBufferedSend()   — abort the in-flight send and close clients that have received a
+    //                            partial WS frame. The caller MUST call this before the
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// sendBufferedFrame(...) — begin a send; while one is in flight a new call is DROPPED
// (newest-wins backpressure), the caller reads that as "link busy".
// bufferedSendIdle() — true when no send is in flight (the previous frame fully drained
// or was cancelled). The caller gates the next frame on this, so the
// effective frame rate self-limits to what the link sustains.
// cancelBufferedSend() — abandon the in-flight send NOW (its WS messages end incomplete →
// the browser discards them). The caller MUST call this before the
// `body` buffer is freed/reallocated (e.g. a geometry rebuild), so a
// cursor never reads freed memory.
// sendBufferedFrame(...) — begin a send; while one is in flight a new call is dropped
// (drop-new backpressure), the caller reads that as "link busy".
// bufferedSendIdle() — true when no send is in flight (the previous frame fully drained
// or was cancelled). The caller gates the next frame on this, so the
// effective frame rate self-limits to what the link sustains.
// cancelBufferedSend() — abort the in-flight send and close clients that have received a
// partial WS frame. The caller MUST call this before the
// `body` buffer is freed/reallocated (e.g. a geometry rebuild), so a
// cursor never reads freed memory.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/BinaryBroadcaster.h` around lines 35 - 43, Update the documentation
comments for the BinaryBroadcaster class methods starting at
sendBufferedFrame(). First, change the "newest-wins backpressure" terminology to
"drop-new-frame-wins" or "current-frame-wins" to accurately describe the
behavior. Second, fix the cancelBufferedSend() documentation to reflect that
browsers do not discard partially sent WebSocket frames but instead consume
subsequent bytes as part of the payload, and update the contract to require
implementations to close clients that have received a partial buffered frame
upon cancellation rather than relying on browser-side discard behavior.

Comment on lines +1239 to +1267
void HttpServerModule::drainPreviewSend() {
if (!previewSend_.active) return;
const size_t total = previewSend_.hdrLen + previewSend_.bodyLen;
const size_t chunk = previewChunkBytes();
bool anyLiveClient = false;
bool allDone = true;
for (int i = 0; i < MAX_WS_CLIENTS; i++) {
auto& ws = wsClients_[i];
if (!ws.valid()) continue;
switch (ws.writeChunks(chunks, totalChunks)) {
case platform::WriteResult::Complete:
case platform::WriteResult::WouldBlock:
// WouldBlock: browser is backpressured — skip this frame,
// keep the connection open (the next frame may fit).
break;
case platform::WriteResult::Partial:
case platform::WriteResult::Error:
// Partial: a truncated WS message went out — the stream is
// corrupt, the connection must be dropped. Error: dead socket.
ws.close();
break;
anyLiveClient = true;
size_t& cur = previewSend_.sent[i];
size_t budget = chunk; // bound bytes pushed to THIS client this tick → bounded tick cost
while (cur < total && budget > 0) {
// Source the next byte run from hdr (cursor < hdrLen) or body (cursor >= hdrLen).
const uint8_t* src;
size_t span;
if (cur < previewSend_.hdrLen) { src = previewSend_.hdr + cur; span = previewSend_.hdrLen - cur; }
else { src = previewSend_.body + (cur - previewSend_.hdrLen); span = total - cur; }
if (span > budget) span = budget;
int n = ws.writeSome(src, span);
if (n < 0) { ws.close(); break; } // real error — drop this client
if (n == 0) break; // WouldBlock — leave the rest for next tick (no spin)
cur += static_cast<size_t>(n);
budget -= static_cast<size_t>(n);
}
if (ws.valid() && cur < total) allDone = false;
}
// Done when every live client finished, or no client remains to send to.
if (!anyLiveClient || allDone) previewSend_.active = false;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Add a timeout for buffered sends that stop draining.

If a valid client keeps returning 0 from writeSome(), allDone stays false and previewSend_.active remains true forever. The producer only observes bufferedSendIdle() == false, so it cannot send a smaller replacement frame until an external rebuild/client event cancels the send. Track per-client idle ticks or total frame age and close unfinished clients after a bounded window.

🧰 Tools
🪛 Clang (14.0.6)

[warning] 1240-1240: statement should be inside braces

(readability-braces-around-statements)


[warning] 1246-1246: variable name 'ws' is too short, expected at least 3 characters

(readability-identifier-length)


[warning] 1247-1247: statement should be inside braces

(readability-braces-around-statements)


[warning] 1253-1253: variable 'src' is not initialized

(cppcoreguidelines-init-variables)


[warning] 1254-1254: variable 'span' is not initialized

(cppcoreguidelines-init-variables)


[warning] 1257-1257: statement should be inside braces

(readability-braces-around-statements)


[warning] 1258-1258: variable name 'n' is too short, expected at least 3 characters

(readability-identifier-length)


[warning] 1260-1260: statement should be inside braces

(readability-braces-around-statements)


[warning] 1264-1264: statement should be inside braces

(readability-braces-around-statements)


[warning] 1267-1267: statement should be inside braces

(readability-braces-around-statements)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/HttpServerModule.cpp` around lines 1239 - 1267, Add timeout tracking
to prevent buffered sends from blocking indefinitely when valid clients
repeatedly return 0 from writeSome(). Modify the drainPreviewSend() method to
track either per-client idle tick counters in the previewSend_ structure or the
total age of the frame being sent, then close any client that exceeds a bounded
timeout window within the loop that iterates through wsClients_ and checks
ws.valid(). This allows allDone to eventually become true and
previewSend_.active to be set to false, enabling the producer to send a
replacement frame without waiting for external events.

Comment thread src/core/HttpServerModule.h Outdated
Comment on lines +84 to +87
if (coordCount_ == 0 || gen != lastClientGen_ || coordPending_) {
lastClientGen_ = gen;
buildAndSendCoordTable(); // streams positions; sets coordPending_ if not all clients got it
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Keep staged WebSocket writes off loop().

loop() can call buildAndSendCoordTable() and the downsampled sendFrame() path, both of which use beginBinaryFrame() / pushBinaryFrame() / endBinaryFrame(). The current server implementation drains each pushed slice with repeated writeSome() calls, so a slow browser can charge socket-drain work to the render tick. Queue these staged streams for the transport cadence like sendBufferedFrame(), or make each push a single non-blocking attempt with resumable state. Based on learnings, hot-path code must avoid blocking work in loop() methods.

Also applies to: 123-131, 297-330

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/light/drivers/PreviewDriver.h` around lines 84 - 87, The issue is that
`loop()` method calls `buildAndSendCoordTable()` and `sendFrame()` which trigger
staged WebSocket writes using beginBinaryFrame/pushBinaryFrame/endBinaryFrame
operations that are drained synchronously with repeated writeSome() calls in the
hot path. This can block the render tick when dealing with slow browser
connections. Refactor these operations to either queue the staged streams for
transport cadence similar to how sendBufferedFrame() works, or convert the
writes to single non-blocking attempts with resumable state that can complete
asynchronously. Ensure that buildAndSendCoordTable() and the downsampled
sendFrame() path do not perform blocking socket drain work directly within the
loop() method execution.

Source: Learnings

Comment thread src/light/drivers/PreviewDriver.h
Comment thread test/unit/light/unit_GridLayout.cpp
The UI's audio level read 0 even while the level LEDs moved — it sampled the live value once a second and caught the gaps between beats; it now shows the per-second peak and is renamed `level RMS` (the algorithm and what drives the LEDs are unchanged). The four firmware-identity controls move from the System card to the Firmware card where they belong. Plus a batch of CodeRabbit review findings on the preview/devices code.

KPI: 16384lights | PC:383KB | tick:117/89/117/9/1/316/37/16/20/117/11us(FPS:8547/11235/8547/111111/1000000/3164/27027/62500/50000/8547/90909) | ESP32:1223KB | tick:3228us(FPS:309) | heap:8349KB | src:97(20038) | test:69(10594) | lizard:75w

Core:
- AudioModule: the `level` UI control was snapshotted from frame.level once per second, catching the silence between beats and reading 0 even while the level meter LEDs (driven live every render tick from the same value) moved with the music. It now displays the PEAK level over each 1-second window (tracked in loop(), reset in loop1s()) — display-only, the live frame.level the effects/LEDs use is untouched. Renamed the control `level` → `level RMS` to make clear it is the RMS loudness, distinct from the per-band spectrum values.
- FirmwareUpdateModule: now owns the firmware-identity controls — `version`, `build`, `firmware` (populated in setup() from build_info.h) and the `firmwarePartition` progress bar (queried in onBuildControls()). These moved from SystemModule: firmware identity belongs on the firmware card, next to the OTA status it relates to.
- SystemModule: dropped `version` / `build` / `firmware` / `firmwarePartition` and their backing fields + population (moved to FirmwareUpdateModule); dropped the now-unused build_info.h include. System keeps the hardware/system fields (chip, sdk, flash, bootReason, heap/psram/filesystem, wifiCoproc).
- HttpServerModule / BinaryBroadcaster (🐇): "newest-wins" → "drop-new" in the resumable-send docs (a frame offered while one is in flight is dropped, the in-flight one kept — the comments said the inverse); removed the stale bodyGeneration_ comment (no such field); corrected the cancelBufferedSend contract (a new-client cancel also re-sends the coord table, so a partial frame is re-primed, not left mis-framed); present-tense the loop20ms drain comment (no "when the split lands").
- DevicesModule (🐇): dropped the changelog prose ("the old async API…") and the backlog clause from the mDNS / probe-timeout comments; present-tense the synchronous-throttled mDNS contract.
- PreviewDriver (🐇): framesWaiting_ saturates at 255 instead of wrapping (a uint8_t wrap delayed the slow-link downscale).

UI:
- app.js: the install-picker read the `firmware` variant key off SystemModule; it now reads it off FirmwareUpdateModule's own control (the control moved there).
- preview3d.js (🐇): the 0x03 parser updated previewStride_ + the status line BEFORE validating the payload length — a truncated buffer left the parser state describing coords it never stored; the state mutation now happens only after the length check. Present-tense two resize/dock comments.

Scripts:
- run_live_scenario.py: reads the build-target `firmware` key from FirmwareUpdateModule now (it moved off SystemModule).

Tests:
- unit_FirmwareUpdateModule (new): the firmware control is present + non-empty, and version/build are present — moved from unit_SystemModule (which dropped that case). Registered in test/CMakeLists.txt.
- unit_GridLayout (🐇): guard the serpentine-off index reads with a size REQUIRE so a count regression fails clearly instead of UB.

Docs:
- moonmodules: firmware control descriptions moved SystemModule.md → FirmwareUpdateModule.md; AudioModule.md documents `level RMS` (peak-over-window display). architecture.md: drop-new wording + present-tense. The two history/ plan files: markdownlint (code-fence language tag, heading blank line) + the archival-justification note on the sibling plan.

Reviews:
- 🐇 CodeRabbit findings processed as above. Skipped (with reason in the PR thread): the suggestions to make the coord-table / downsampled send resumable and to close partial clients on cancel — those re-introduce the send-path changes reverted earlier this branch as a regression on all three boards; the current synchronous coord path is the verified-working state.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/core/HttpServerModule.h (1)

44-45: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Rewrite absence narration in the preview-send comments.

never spins / never reads freed memory violate the C++ comment tense rule. Prefer direct present-tense behavior wording such as “drains with bounded loop20ms work” and “the caller cancels before freeing/reallocating the body.”

Also applies to: 120-122

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/HttpServerModule.h` around lines 44 - 45, Revise the preview-send
comments in HttpServerModule to use direct present-tense behavior wording
instead of absence narration; update the comment near the resumable one-frame
send and the related comment around BinaryBroadcaster so they describe what the
code does now, such as bounded loop20ms draining and caller cancellation before
body reallocation, rather than saying it “never spins” or “never reads freed
memory.”

Source: Coding guidelines

docs/history/plans/Plan-20260623 - Resumable adaptive preview send.md (1)

37-43: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Mark the superseded generation-guard design or update it to the implemented cancel contract.

This archive still documents bodyGeneration, a drain-time tag check, and immediate header sends, but the current PreviewSend state uses only copied header/body pointers, cursors, and active. Either prune these absorbed details or add a short “superseded by cancelBufferedSend() + clientGeneration re-prime” note. As per path instructions, “History documents are backward-looking and should be pruned when their lessons are absorbed into code, docs, or principles.”

Also applies to: 57-65

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/history/plans/Plan-20260623` - Resumable adaptive preview send.md around
lines 37 - 43, The history note still describes the old generation-guard based
PreviewSend design, but the implemented contract now uses cancelBufferedSend()
and clientGeneration re-prime with copied header/body pointers, cursors, and
active state. Update the plan entry by removing or pruning the obsolete
bodyGeneration/drain-time tag-check language, or add a brief superseded-by note
that points to cancelBufferedSend() and the current PreviewSend flow so the
documentation matches the code.

Source: Path instructions

src/core/HttpServerModule.cpp (1)

1196-1199: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Rename this to drop-new backpressure.

The active frame is kept and the newly offered frame is rejected, so Newest-wins describes the opposite behavior and conflicts with the interface wording.

Suggested wording
-    // Newest-wins backpressure: one frame in flight at a time. A caller that asks while a send is
-    // active is told "busy" and drops this frame — the producer reads that as the link not keeping
+    // Drop-new backpressure: one frame in flight at a time. A caller that asks while a send is
+    // active is told "busy" and drops the offered frame — the producer reads that as the link not keeping
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/HttpServerModule.cpp` around lines 1196 - 1199, The backpressure
comment and related wording in HttpServerModule’s preview send path describe the
wrong policy: the code in the previewSend_ check keeps the active frame and
rejects the newly offered one, so rename the behavior from “Newest-wins” to
“drop-new” and update the surrounding comment/message to match the actual
interface semantics. Use the previewSend_ active-state check to locate the spot
and make the terminology consistent everywhere it’s referenced.
src/light/drivers/PreviewDriver.h (1)

83-87: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Cancel the in-flight buffered frame before the generation-triggered coord resend.

A clientGeneration() bump can arrive while the full-res 0x02 sendBufferedFrame() is still draining. This block immediately starts a streamed 0x03 via buildAndSendCoordTable(), so the server can write a new WebSocket message before the old buffered one finishes. BinaryBroadcaster’s contract already calls out new-client connect as a cancelBufferedSend() case.

Suggested fix
         uint32_t gen = broadcaster_ ? broadcaster_->clientGeneration() : 0;
         if (coordCount_ == 0 || gen != lastClientGen_ || coordPending_) {
+            if (gen != lastClientGen_ && broadcaster_) broadcaster_->cancelBufferedSend();
             lastClientGen_ = gen;
             buildAndSendCoordTable();   // streams positions; sets coordPending_ if not all clients got it
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/light/drivers/PreviewDriver.h` around lines 83 - 87, The
generation-triggered coord resend in PreviewDriver::buildAndSendCoordTable is
starting while a buffered send may still be in flight from sendBufferedFrame,
which can interleave WebSocket messages. Before calling buildAndSendCoordTable
when clientGeneration() changes, cancel any active buffered frame through
BinaryBroadcaster’s cancelBufferedSend path (the same lifecycle used for
new-client connect) and then proceed with the coord resend; keep the
lastClientGen_ update and coordPending_ flow unchanged.
♻️ Duplicate comments (6)
src/core/HttpServerModule.cpp (3)

1082-1086: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Use the real buffered-send cancel path here.

This direct active = false has the same partial-frame corruption risk as cancelBufferedSend(): existing clients can remain mid-message while the next bytes are parsed as payload. After fixing the helper, call it here instead of clearing the flag.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/HttpServerModule.cpp` around lines 1082 - 1086, The buffered-send
abandon logic in HttpServerModule should use the existing cancellation helper
instead of manually clearing the active flag. Update the code path that
currently sets previewSend_.active to false so it calls cancelBufferedSend() (or
the helper you just fixed) to ensure any in-flight frame is fully cancelled and
clients cannot stay mid-message. Keep the generation bump behavior intact around
this change.

1239-1268: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Bound buffered sends that make no progress.

If a live client keeps returning 0 from writeSome(), previewSend_.active remains true forever and the producer only sees bufferedSendIdle() == false. Track per-client idle ticks or frame age and close unfinished clients after a bounded window so replacement frames can resume.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/HttpServerModule.cpp` around lines 1239 - 1268, drainPreviewSend()
can stay active forever if a live wsClients_ entry keeps returning 0 from
writeSome(), because previewSend_.sent never advances and previewSend_.active is
never cleared. Add a bounded progress watchdog in
HttpServerModule::drainPreviewSend() tied to each client’s previewSend_ state,
such as per-client idle ticks or frame age, and close or drop unfinished clients
once they exceed the limit so new preview frames can proceed. Use the existing
symbols previewSend_, wsClients_, ws.writeSome(), and previewChunkBytes() to
implement the timeout cleanly.

1161-1189: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Track the streamed frame’s declared length across begin/push/end.

beginBinaryFrame(totalLen) sends a WS header but no state verifies that later pushBinaryFrame() calls add up to exactly totalLen; under-push leaves clients waiting for payload, and over-push corrupts the next WS frame boundary. Add open/remainingBytes state and make endBinaryFrame() fail/close partial clients on mismatch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/HttpServerModule.cpp` around lines 1161 - 1189,
`HttpServerModule::beginBinaryFrame`, `pushBinaryFrame`, and `endBinaryFrame`
currently send a header and payload without tracking whether the pushed bytes
match the declared `totalLen`. Add per-frame state on `HttpServerModule` (such
as an open flag and remaining/expected byte count) so `pushBinaryFrame()`
decrements and validates progress, and make `endBinaryFrame()` detect
mismatches, fail the frame, and close or mark partial websocket clients
accordingly.
src/core/BinaryBroadcaster.h (1)

41-46: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Require cancel to close partially framed clients.

This still documents an unsafe abort: a browser that has received part of a WebSocket message stays inside that message, so the next “fresh” frame is consumed as payload unless the connection is closed. Make the contract require closing clients with partial buffered-frame bytes before clearing the in-flight send.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/BinaryBroadcaster.h` around lines 41 - 46, The BinaryBroadcaster
buffered-send contract still allows an unsafe abort path: a client that has
already received part of a WebSocket frame can remain partially framed and
consume the next message incorrectly. Update the cancelBufferedSend-related
contract in BinaryBroadcaster to require closing any client with partial
buffered-frame bytes before clearing the in-flight send, and adjust the
surrounding documentation/comments so it reflects that partial sends must not be
left open.
src/core/HttpServerModule.h (1)

49-49: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Make buffered-send cancellation close partial clients and reset state.

Clearing only active abandons any clients already inside the WS payload and leaves stale cursors/body metadata. Move this out of line and close clients where 0 < sent[i] < hdrLen + bodyLen, then reset sent, hdrLen, body, and bodyLen; use the same helper from new-client handling.

Suggested direction
-    void cancelBufferedSend() override { previewSend_.active = false; }
+    void cancelBufferedSend() override;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/HttpServerModule.h` at line 49, The buffered-send cancellation in
`cancelBufferedSend()` only flips `previewSend_.active`, but it leaves partially
sent clients and stale payload state behind. Move this logic out of the inline
override into a shared helper used by both cancellation and new-client handling,
and ensure it closes any client with `0 < sent[i] < hdrLen + bodyLen` before
resetting `sent`, `hdrLen`, `body`, and `bodyLen` on the preview send state.
src/light/drivers/PreviewDriver.h (1)

84-87: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

loop() still does synchronous socket-drain work on the coord-table and downsampled paths.

This is the same hot-path problem noted earlier: loop() can still reach beginBinaryFrame() / pushBinaryFrame() / endBinaryFrame(), and HttpServerModule::sendAllOrClose() drains those slices with repeated writeSome() calls. Slow clients therefore still charge variable socket work to PreviewDriver::loop() whenever coords resend or stride > 1/cpl != 3 forces the streamed path.

Also applies to: 105-106, 130-130, 297-330

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/light/drivers/PreviewDriver.h` around lines 84 - 87,
`PreviewDriver::loop()` still performs synchronous socket-drain work through
coord-table resends and the streamed/downsampled paths, so move that send/drain
work off the hot path. Update `buildAndSendCoordTable()`, the
`beginBinaryFrame()`/`pushBinaryFrame()`/`endBinaryFrame()` flow, and
`HttpServerModule::sendAllOrClose()` so `loop()` only queues or marks pending
data while the actual `writeSome()` draining happens asynchronously or in a
separate send step. Keep the fix centered on `PreviewDriver`,
`buildAndSendCoordTable`, and `HttpServerModule::sendAllOrClose` so slow clients
no longer add variable socket cost to `loop()`.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/architecture.md`:
- Line 314: The transport contract in the preview architecture text uses
absolute “never” guarantees that overstate the implementation. Rephrase the
`PreviewDriver` + `HttpServerModule` description to focus on bounded work,
complete delivered frames, adaptive downsampling, and reconnect behavior, and
remove any wording that promises the connection or render loop is never shed
when `sendAllOrClose()` may close slow direct-stream clients.

In `@docs/moonmodules/core/FirmwareUpdateModule.md`:
- Around line 11-14: The control-location rename is incomplete in this spec: the
table now places `version`, `build`, `firmware`, and `firmwarePartition` under
`FirmwareUpdateModule`, but the later lifecycle/OTA text still references the
`SystemModule` card. Update the remaining prose in this document to consistently
describe these controls as belonging to `FirmwareUpdateModule` and align any OTA
flow wording so readers follow the same module name throughout.

In `@docs/moonmodules/light/drivers/NetworkSendDriver.md`:
- Line 12: The NetworkSendDriver docs entry for light_count uses
absence-narration and negative phrasing that violates the docs style rule.
Reword the sentence in positive present tense in the same Markdown section so it
describes what happens for owned lights and slice-based sending without using
phrases like “isn’t” or “doesn’t”; update the wording around the light_count
description to keep the behavior clear while staying in positive phrasing.

In `@src/core/AudioModule.h`:
- Line 93: The control name exposed from AudioModule::initControls() is
misleading because loop1s() publishes the maximum RMS-derived level over the
last second, not a raw RMS snapshot. Rename the read-only control added via
controls_.addReadOnly("level RMS", ...) to match the sampled statistic, and keep
the label consistent with the value computed in loop1s() and stored in
levelStr_.

In `@src/core/DevicesModule.h`:
- Around line 376-382: Update the mDNS browse comment near
DevicesModule::mdnsBrowse so it matches the platform.h contract by saying it
blocks up to timeoutMs, not the full timeout, and rewrite the wording in present
tense. Also remove absence-narration phrases like “does not,” “doesn’t,” and
“can’t” from that comment block while preserving the same intent and details.

In `@test/unit/core/unit_FirmwareUpdateModule.cpp`:
- Around line 33-40: The FirmwareUpdateModule test currently verifies only the
version/build/firmware string controls, but it does not assert the new
firmwarePartition control added to the module contract. Update the unit test
around fw.controls() to also check for firmwarePartition alongside the existing
control-name checks, using the same control iteration pattern and identifiers in
unit_FirmwareUpdateModule so the spec change is covered.

---

Outside diff comments:
In `@docs/history/plans/Plan-20260623` - Resumable adaptive preview send.md:
- Around line 37-43: The history note still describes the old generation-guard
based PreviewSend design, but the implemented contract now uses
cancelBufferedSend() and clientGeneration re-prime with copied header/body
pointers, cursors, and active state. Update the plan entry by removing or
pruning the obsolete bodyGeneration/drain-time tag-check language, or add a
brief superseded-by note that points to cancelBufferedSend() and the current
PreviewSend flow so the documentation matches the code.

In `@src/core/HttpServerModule.cpp`:
- Around line 1196-1199: The backpressure comment and related wording in
HttpServerModule’s preview send path describe the wrong policy: the code in the
previewSend_ check keeps the active frame and rejects the newly offered one, so
rename the behavior from “Newest-wins” to “drop-new” and update the surrounding
comment/message to match the actual interface semantics. Use the previewSend_
active-state check to locate the spot and make the terminology consistent
everywhere it’s referenced.

In `@src/core/HttpServerModule.h`:
- Around line 44-45: Revise the preview-send comments in HttpServerModule to use
direct present-tense behavior wording instead of absence narration; update the
comment near the resumable one-frame send and the related comment around
BinaryBroadcaster so they describe what the code does now, such as bounded
loop20ms draining and caller cancellation before body reallocation, rather than
saying it “never spins” or “never reads freed memory.”

In `@src/light/drivers/PreviewDriver.h`:
- Around line 83-87: The generation-triggered coord resend in
PreviewDriver::buildAndSendCoordTable is starting while a buffered send may
still be in flight from sendBufferedFrame, which can interleave WebSocket
messages. Before calling buildAndSendCoordTable when clientGeneration() changes,
cancel any active buffered frame through BinaryBroadcaster’s cancelBufferedSend
path (the same lifecycle used for new-client connect) and then proceed with the
coord resend; keep the lastClientGen_ update and coordPending_ flow unchanged.

---

Duplicate comments:
In `@src/core/BinaryBroadcaster.h`:
- Around line 41-46: The BinaryBroadcaster buffered-send contract still allows
an unsafe abort path: a client that has already received part of a WebSocket
frame can remain partially framed and consume the next message incorrectly.
Update the cancelBufferedSend-related contract in BinaryBroadcaster to require
closing any client with partial buffered-frame bytes before clearing the
in-flight send, and adjust the surrounding documentation/comments so it reflects
that partial sends must not be left open.

In `@src/core/HttpServerModule.cpp`:
- Around line 1082-1086: The buffered-send abandon logic in HttpServerModule
should use the existing cancellation helper instead of manually clearing the
active flag. Update the code path that currently sets previewSend_.active to
false so it calls cancelBufferedSend() (or the helper you just fixed) to ensure
any in-flight frame is fully cancelled and clients cannot stay mid-message. Keep
the generation bump behavior intact around this change.
- Around line 1239-1268: drainPreviewSend() can stay active forever if a live
wsClients_ entry keeps returning 0 from writeSome(), because previewSend_.sent
never advances and previewSend_.active is never cleared. Add a bounded progress
watchdog in HttpServerModule::drainPreviewSend() tied to each client’s
previewSend_ state, such as per-client idle ticks or frame age, and close or
drop unfinished clients once they exceed the limit so new preview frames can
proceed. Use the existing symbols previewSend_, wsClients_, ws.writeSome(), and
previewChunkBytes() to implement the timeout cleanly.
- Around line 1161-1189: `HttpServerModule::beginBinaryFrame`,
`pushBinaryFrame`, and `endBinaryFrame` currently send a header and payload
without tracking whether the pushed bytes match the declared `totalLen`. Add
per-frame state on `HttpServerModule` (such as an open flag and
remaining/expected byte count) so `pushBinaryFrame()` decrements and validates
progress, and make `endBinaryFrame()` detect mismatches, fail the frame, and
close or mark partial websocket clients accordingly.

In `@src/core/HttpServerModule.h`:
- Line 49: The buffered-send cancellation in `cancelBufferedSend()` only flips
`previewSend_.active`, but it leaves partially sent clients and stale payload
state behind. Move this logic out of the inline override into a shared helper
used by both cancellation and new-client handling, and ensure it closes any
client with `0 < sent[i] < hdrLen + bodyLen` before resetting `sent`, `hdrLen`,
`body`, and `bodyLen` on the preview send state.

In `@src/light/drivers/PreviewDriver.h`:
- Around line 84-87: `PreviewDriver::loop()` still performs synchronous
socket-drain work through coord-table resends and the streamed/downsampled
paths, so move that send/drain work off the hot path. Update
`buildAndSendCoordTable()`, the
`beginBinaryFrame()`/`pushBinaryFrame()`/`endBinaryFrame()` flow, and
`HttpServerModule::sendAllOrClose()` so `loop()` only queues or marks pending
data while the actual `writeSome()` draining happens asynchronously or in a
separate send step. Keep the fix centered on `PreviewDriver`,
`buildAndSendCoordTable`, and `HttpServerModule::sendAllOrClose` so slow clients
no longer add variable socket cost to `loop()`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7788f162-9162-47f1-b1fd-3cc48c361e44

📥 Commits

Reviewing files that changed from the base of the PR and between 63a8fe8 and 0c0e992.

📒 Files selected for processing (22)
  • docs/architecture.md
  • docs/history/plans/Plan-20260623 - Resumable adaptive preview send.md
  • docs/history/plans/Plan-20260623 - Stream preview from buffers, zero preview buffers.md
  • docs/moonmodules/core/AudioModule.md
  • docs/moonmodules/core/FirmwareUpdateModule.md
  • docs/moonmodules/core/SystemModule.md
  • docs/moonmodules/light/drivers/NetworkSendDriver.md
  • scripts/scenario/run_live_scenario.py
  • src/core/AudioModule.h
  • src/core/BinaryBroadcaster.h
  • src/core/DevicesModule.h
  • src/core/FirmwareUpdateModule.h
  • src/core/HttpServerModule.cpp
  • src/core/HttpServerModule.h
  • src/core/SystemModule.h
  • src/light/drivers/PreviewDriver.h
  • src/ui/app.js
  • src/ui/preview3d.js
  • test/CMakeLists.txt
  • test/unit/core/unit_FirmwareUpdateModule.cpp
  • test/unit/core/unit_SystemModule.cpp
  • test/unit/light/unit_GridLayout.cpp
💤 Files with no reviewable changes (1)
  • docs/moonmodules/core/SystemModule.md

Comment thread docs/architecture.md Outdated
Comment thread docs/moonmodules/core/FirmwareUpdateModule.md Outdated
Comment thread docs/moonmodules/light/drivers/NetworkSendDriver.md Outdated
Comment thread src/core/AudioModule.h
// nothing to persist, so ReadOnly (the display-only type) not a flipped
// Text — same idiom as SystemModule's uptime/fps.
controls_.addReadOnly("level", levelStr_, sizeof(levelStr_));
controls_.addReadOnly("level RMS", levelStr_, sizeof(levelStr_));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Rename the control to match the sampled statistic.

Line 93 exposes level RMS, but loop1s() publishes the maximum RMS-derived level seen over the last second, not a plain RMS snapshot. That makes the wire/UI name misleading for consumers reading this control.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/AudioModule.h` at line 93, The control name exposed from
AudioModule::initControls() is misleading because loop1s() publishes the maximum
RMS-derived level over the last second, not a raw RMS snapshot. Rename the
read-only control added via controls_.addReadOnly("level RMS", ...) to match the
sampled statistic, and keep the label consistent with the value computed in
loop1s() and stored in levelStr_.

Comment thread src/core/DevicesModule.h Outdated
Comment thread test/unit/core/unit_FirmwareUpdateModule.cpp Outdated
The Getting Started guide now has two chapters: chapter 1 installs projectMM (as before), chapter 2 tours the device's web interface for a new user — the list/preview/controls layout, the system modules, and the layout→layers→drivers pipeline, with links to dig deeper. A new browser-faithful preview-health probe (MoonDeck Live tab + CLI) measures the 3D-preview stream the way a real browser experiences it. Plus a batch of CodeRabbit review findings on docs and comments.

KPI: 16384lights | PC:383KB | tick:217/152/206/18/3/555/62/26/33/204/20/1us(FPS:4608/6578/4854/55555/333333/1801/16129/38461/30303/4901/50000/1000000) | ESP32:1223KB | tick:14650us(FPS:68) | heap:6129KB | src:97(20045) | test:69(10599) | lizard:75w

Core:
- AudioModule (🐇): the `level RMS` read-out clarified in a comment — the displayed value is the per-second PEAK of the RMS level (loop1s publishes levelPeak_), not a raw RMS snapshot; the live frame_.level the LEDs use is the instantaneous RMS. Kept the deliberate control name, fixed the misleading impression.
- DevicesModule (🐇): mDNS browse comment now says it blocks UP TO the timeout (matches the IDF mdns_query_ptr contract, not "the full timeout"), present-tense, dropped the absence-narration ("does not", "doesn't", "can't").
- HttpServerModule / BinaryBroadcaster (🐇): "Newest-wins" → "drop-new" in the resumable-send comment (the code keeps the in-flight frame, rejects the new one — the term was backwards); the resumable-send and cancelBufferedSend contract comments rewritten present-tense ("stays off the hot path", "keeping a cursor reading only live memory") instead of "never spins" / "never reads freed".

Scripts / MoonDeck:
- preview_health.py (new): browser-faithful preview-stream health probe — reads the binary frames, sends the 25s keepalive ping, and auto-reconnects on close like src/ui/app.js's connectWs, so its fps / reconnects / maxgap / SMOOTH-CHOPPY-DEAD verdict match what a person watching the preview sees. Pure WebSocket client, no device-side changes. Registered as "Preview Health" on MoonDeck's Live tab (needs_device) and documented in MoonDeck.md.

Tests:
- unit_FirmwareUpdateModule (🐇): also asserts the firmwarePartition control's type (Progress) when present — conditional, since it's gated on platform::firmwarePartition() > 0 and so is absent on desktop/test where that returns 0.

Docs / CI:
- gettingstarted.md: split into chapter 1 (install) + chapter 2 (the UI tour) — written for a new end user, no TL;DR, engaging rather than reference-style. Chapter 2 walks the responsive list/preview/controls layout, the ⏻ enable toggle, the 🕒 loop-speed and 🧠 memory read-outs, the System/Firmware/Network modules, and the layout→layers→drivers pipeline, each with a "know more" link to its module spec. Chapter 1 gained an explanation of the device-card capability pills (🟢 active / 🟡 supported / 🟠 planned) with a one-line teaser of the spec-and-test-first method, and the two previously-unused install screenshots (port-selected, device-details). The microphone note now names the Audio module and adds that any sensor/input peripheral lives in the System area, with more coming.
- assets/gettingstarted: renamed all 12 chapter-1 install images to the <chapter><sequence>-name convention (01-01 … 01-12, in install-flow order); added 10 chapter-2 UI images (02-01 … 02-10), compressing the oversized ones (the 1.9MB 2836px hero down to 150KB at 1600px, three others from 400-700KB to ~80KB) into the chapter-1 size range.
- architecture.md (🐇): the transport-backpressure description no longer claims the connection is "never" shed — the resumable full-res path never drops the link, while the synchronous coord/downsampled path closes a wedged client past the spin budget (it reconnects, a brief blip), which the prior absolute wording overstated.
- FirmwareUpdateModule.md (🐇): the OTA-flow step now says the new version/firmware appear on this Firmware card, not the SystemModule card (those controls moved here).
- NetworkSendDriver.md (🐇): the light_count description reworded in positive present tense (each frame packs and sends exactly the lights this sink owns) instead of absence-narration.

Reviews:
- 🐇 CodeRabbit findings processed as above. Skipped (with reason): the behavioral preview-send findings (drain stall-watchdog, partial-client close on cancel, pushBinaryFrame totalLen tracking, move loop() drain off the hot path, PreviewDriver gen-resend cancel) — these change preview/LED-hotpath behavior and overlap a resumable-send rework that regressed all three boards in testing; backlogged with the diagnosis rather than applied. Also skipped the Plan-20260623 history note — a docs/history/plans/ design-intent record is expected to diverge from the shipped code and is not kept in sync.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/core/AudioModule.h (1)

169-173: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Use present-tense wording in this comment block.

Line 171 and Line 172 use backward-looking phrasing (“caught… showed”). Please rewrite in present tense to match repo policy.

As per coding guidelines, "**/*.{h,hh,hpp,hxx,cpp,cc,cxx}: Write code, comments, and documentation in the present tense only; avoid changelog-style or backward-looking phrasing...".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/core/AudioModule.h` around lines 169 - 173, Update the comment block in
AudioModule.h to use present-tense wording throughout. In the PEAK-level
explanation near the frame_.level and UI string logic, replace backward-looking
phrasing like “caught” and “showed” with present-tense equivalents while keeping
the technical meaning unchanged. Preserve the same intent around display-only
window peak behavior and the live frame_.level path used by the effects/LEDs.

Source: Coding guidelines

docs/moonmodules/core/FirmwareUpdateModule.md (1)

11-14: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Rewrite this controls-row prose in present tense.

Line 13 and Line 14 include backward-looking/history phrasing (for example “flashed before…”, “once broke…”, “historical/predates…”). Rephrase in present tense, and keep historical notes in allowed history/backlog locations only.

As per coding guidelines, "**/*.{md,mdx}: Write code, comments, and documentation in the present tense only; avoid changelog-style or backward-looking phrasing except in docs/backlog/, docs/history/, and decisions.md lessons."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/moonmodules/core/FirmwareUpdateModule.md` around lines 11 - 14, The
controls-row prose in the firmware update module still uses backward-looking
phrasing in the `version` and `firmware` entries. Rewrite the affected
descriptions in present tense only, keeping the meaning intact, and remove
references like “flashed before,” “once broke,” or similar history-style wording
from this table. Use the existing `version` and `firmware` labels to locate the
text and leave any historical notes for approved history/backlog docs only.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/diag/preview_health.py`:
- Around line 146-170: The reconnect logic in preview_health.py is not matching
the browser because the backoff only applies during the _Ws(host) handshake and
is skipped after an established ws.recv_frame() session closes. Update the
reconnect path around the websocket receive loop so any close/error from
ws.recv_frame() or the surrounding try/except uses the same 500 ms to 5 s delay
as src/ui/app.js connectWs, and reset that delay only after a connection is
successfully established.
- Line 1: The script entrypoint still uses python3 directly, bypassing the
repo-managed uv environment. Update the shebang in preview_health.py so direct
execution goes through uv run instead of python3, matching the scripting
convention used across scripts/**/*.py and keeping the execution environment
consistent.
- Around line 227-231: The preview_health main flow always exits 0 even when
measure() returns CHOPPY or DEAD, so MoonDeck treats unhealthy runs as passing.
Update main() to propagate a non-zero exit code for unhealthy verdicts, or
change the /api/stream contract and the Live-tab logic in app.js to read the
verdict separately instead of relying only on exitCode. Use the existing main()
and measure() symbols to locate the fix.
- Around line 125-128: The grid probe in preview_health.py mutates Grid.width,
Grid.height, and Grid.depth and never restores them, which leaves the device
changed after the diagnostic runs. Update the logic around the grid handling
block to snapshot the original control values before calling _set_control in the
grid path, then restore those values in a finally block so the probe remains
non-destructive; use the existing _set_control helper and the grid branch in the
preview/health flow to locate the fix.

In `@scripts/moondeck_config.json`:
- Around line 155-163: The preview_health config currently forces device
selection through needs_device: true, which blocks the no-host sweep path from
the UI. Update the preview_health entry in moondeck_config.json so it can launch
without a selected device, either by removing needs_device or by introducing a
separate non-device launch configuration that maps to
scripts/diag/preview_health.py and preserves the documented sweep behavior.

In `@test/unit/core/unit_FirmwareUpdateModule.cpp`:
- Around line 33-36: The test comment in unit_FirmwareUpdateModule.cpp is
written in backward-looking language and should be rewritten in present tense.
Update the nearby comment describing version/build and firmwarePartition so it
states the current behavior of the FirmwareUpdateModule test setup, avoiding
phrasing like “moved here” while keeping the explanation about
platform::firmwarePartition() and the Progress control.

---

Outside diff comments:
In `@docs/moonmodules/core/FirmwareUpdateModule.md`:
- Around line 11-14: The controls-row prose in the firmware update module still
uses backward-looking phrasing in the `version` and `firmware` entries. Rewrite
the affected descriptions in present tense only, keeping the meaning intact, and
remove references like “flashed before,” “once broke,” or similar history-style
wording from this table. Use the existing `version` and `firmware` labels to
locate the text and leave any historical notes for approved history/backlog docs
only.

In `@src/core/AudioModule.h`:
- Around line 169-173: Update the comment block in AudioModule.h to use
present-tense wording throughout. In the PEAK-level explanation near the
frame_.level and UI string logic, replace backward-looking phrasing like
“caught” and “showed” with present-tense equivalents while keeping the technical
meaning unchanged. Preserve the same intent around display-only window peak
behavior and the live frame_.level path used by the effects/LEDs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0ab13031-4542-449e-953d-792238b7a918

📥 Commits

Reviewing files that changed from the base of the PR and between 0c0e992 and bf1a05b.

⛔ Files ignored due to path filters (22)
  • docs/assets/gettingstarted/01-01-installer-start.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-02-select-port.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-03-port-selected.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-04-pick-device.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-05-device-details.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-06-device-setup.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-07-erasing.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-08-installing.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-09-wifi-credentials.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-10-online-wifi.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-11-online-ethernet.png is excluded by !**/*.png
  • docs/assets/gettingstarted/01-12-online-existing-ip.png is excluded by !**/*.png
  • docs/assets/gettingstarted/02-01-UI-large.png is excluded by !**/*.png
  • docs/assets/gettingstarted/02-02-UI-mid.png is excluded by !**/*.png
  • docs/assets/gettingstarted/02-03-UI-small.png is excluded by !**/*.png
  • docs/assets/gettingstarted/02-04-UI-Preview.png is excluded by !**/*.png
  • docs/assets/gettingstarted/02-05-UI-System.png is excluded by !**/*.png
  • docs/assets/gettingstarted/02-06-UI-Firmware.png is excluded by !**/*.png
  • docs/assets/gettingstarted/02-07-UI-Network.png is excluded by !**/*.png
  • docs/assets/gettingstarted/02-08-UI-Layouts.png is excluded by !**/*.png
  • docs/assets/gettingstarted/02-09-UI-Layers.png is excluded by !**/*.png
  • docs/assets/gettingstarted/02-10-UI-Drivers.png is excluded by !**/*.png
📒 Files selected for processing (18)
  • docs/architecture.md
  • docs/gettingstarted.md
  • docs/moonmodules/core/FirmwareUpdateModule.md
  • docs/moonmodules/light/drivers/NetworkSendDriver.md
  • scripts/MoonDeck.md
  • scripts/diag/preview_health.py
  • scripts/moondeck_config.json
  • src/core/AudioModule.h
  • src/core/BinaryBroadcaster.h
  • src/core/DevicesModule.h
  • src/core/HttpServerModule.cpp
  • src/core/HttpServerModule.h
  • test/scenarios/light/scenario_Audio_mutation.json
  • test/scenarios/light/scenario_Driver_mutation.json
  • test/scenarios/light/scenario_MultiplyModifier_pipeline.json
  • test/scenarios/light/scenario_perf_full.json
  • test/scenarios/light/scenario_perf_light.json
  • test/unit/core/unit_FirmwareUpdateModule.cpp

Comment thread scripts/diag/preview_health.py Outdated
@@ -0,0 +1,235 @@
#!/usr/bin/env python3

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Use uv for direct script execution.

The shebang still invokes python3 directly, so anyone running the file directly bypasses the repo-managed uv environment.

As per coding guidelines, scripts/**/*.py: Use uv run for every Python invocation; never call python or python3 directly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/diag/preview_health.py` at line 1, The script entrypoint still uses
python3 directly, bypassing the repo-managed uv environment. Update the shebang
in preview_health.py so direct execution goes through uv run instead of python3,
matching the scripting convention used across scripts/**/*.py and keeping the
execution environment consistent.

Source: Coding guidelines

Comment on lines +125 to +128
if grid:
for ctrl in (("width", grid), ("height", grid), ("depth", 1)):
_set_control(host, "Grid", ctrl[0], ctrl[1])
time.sleep(3) # let the geometry rebuild settle

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

--grid leaves the device mutated after the probe.

This path writes Grid.width, Grid.height, and Grid.depth and never restores the previous values, so the diagnostic changes live device state after measurement ends. That also contradicts the "Stamps nothing on the device" contract in scripts/MoonDeck.md. Snapshot the original controls and restore them in a finally block.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/diag/preview_health.py` around lines 125 - 128, The grid probe in
preview_health.py mutates Grid.width, Grid.height, and Grid.depth and never
restores them, which leaves the device changed after the diagnostic runs. Update
the logic around the grid handling block to snapshot the original control values
before calling _set_control in the grid path, then restore those values in a
finally block so the probe remains non-destructive; use the existing
_set_control helper and the grid branch in the preview/health flow to locate the
fix.

Comment thread scripts/diag/preview_health.py Outdated
Comment on lines +146 to +170
last_ping = time.monotonic()
try:
while time.monotonic() < deadline:
now = time.monotonic()
if now - last_ping >= 25:
ws.send_text("ping")
last_ping = now
opcode, payload = ws.recv_frame(deadline)
if opcode == 0x08: # server close frame
raise ConnectionError("server close")
if opcode != 0x02 or not payload:
continue # text (state JSON) / ping / pong — skip
t = time.monotonic()
if payload[0] == 0x02:
if last_colour and t - last_colour > maxgap:
maxgap = t - last_colour
last_colour = t
colour += 1
elif payload[0] == 0x03:
coord += 1
if len(payload) >= 5:
pts = struct.unpack("<I", payload[1:5])[0]
backoff = 0.5
except Exception:
pass # closed/timeout → outer loop reconnects (like the browser)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Reconnect timing is not browser-faithful yet.

After an established socket closes, the outer loop reconnects immediately. The 500 ms→5 s backoff only runs when _Ws(host) fails during the handshake, so short device-side closes look healthier here than they do in src/ui/app.js's connectWs. Apply the same delay on every close/error reconnect path, and reset the delay when a connection opens successfully.

🧰 Tools
🪛 Ruff (0.15.18)

[warning] 155-155: Abstract raise to an inner function

(TRY301)


[warning] 155-155: Avoid specifying long messages outside the exception class

(TRY003)


[error] 169-170: try-except-pass detected, consider logging the exception

(S110)


[warning] 169-169: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/diag/preview_health.py` around lines 146 - 170, The reconnect logic
in preview_health.py is not matching the browser because the backoff only
applies during the _Ws(host) handshake and is skipped after an established
ws.recv_frame() session closes. Update the reconnect path around the websocket
receive loop so any close/error from ws.recv_frame() or the surrounding
try/except uses the same 500 ms to 5 s delay as src/ui/app.js connectWs, and
reset that delay only after a connection is successfully established.

Comment thread scripts/diag/preview_health.py Outdated
Comment on lines +227 to +231
worst_ok = True
for h in hosts:
v = measure(h, args.seconds, args.grid)
worst_ok = worst_ok and v == "SMOOTH"
return 0 if worst_ok else 0 # diagnostic, not a gate: always exit 0 (report, don't fail CI)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

MoonDeck will show DEAD runs as passing.

main() always returns 0, but scripts/moondeck_ui/app.js:542-553 marks the Live-tab status dot as pass whenever exitCode === 0. With this integration, a CHOPPY or DEAD verdict still renders green. Either propagate unhealthy verdicts via exit code, or extend the /api/stream payload/UI contract to carry the verdict separately.

🧰 Tools
🪛 Ruff (0.15.18)

[warning] 231-231: Useless if-else condition

(RUF034)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/diag/preview_health.py` around lines 227 - 231, The preview_health
main flow always exits 0 even when measure() returns CHOPPY or DEAD, so MoonDeck
treats unhealthy runs as passing. Update main() to propagate a non-zero exit
code for unhealthy verdicts, or change the /api/stream contract and the Live-tab
logic in app.js to read the verdict separately instead of relying only on
exitCode. Use the existing main() and measure() symbols to locate the fix.

Comment on lines +155 to +163
{
"id": "preview_health",
"tab": "live",
"group": "scenario",
"label": "Preview Health",
"help": "preview_health",
"script": "diag/preview_health.py",
"needs_device": true
},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
ast-grep outline scripts/moondeck_ui/app.js --view expanded
rg -n -C3 '"needs_device"|needs_device|host' scripts/moondeck_ui/app.js scripts/moondeck.py
sed -n '480,620p' scripts/moondeck_ui/app.js
sed -n '1210,1265p' scripts/moondeck.py

Repository: MoonModules/projectMM

Length of output: 23570


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,260p' scripts/diag/preview_health.py
printf '\n--- MoonDeck.md ---\n'
rg -n -C3 'preview_health|no host|all online devices|needs_device' scripts/MoonDeck.md scripts/moondeck_config.json

Repository: MoonModules/projectMM

Length of output: 15734


🏁 Script executed:

#!/bin/bash
set -euo pipefail

ast-grep outline scripts/diag/preview_health.py --view expanded

Repository: MoonModules/projectMM

Length of output: 612


Allow preview_health to launch without a selected device. needs_device: true makes the Live tab require selected devices and forces --host per device, so the documented no-host sweep path in scripts/diag/preview_health.py is not reachable from the UI. Either drop needs_device here or add a separate non-device launch path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/moondeck_config.json` around lines 155 - 163, The preview_health
config currently forces device selection through needs_device: true, which
blocks the no-host sweep path from the UI. Update the preview_health entry in
moondeck_config.json so it can launch without a selected device, either by
removing needs_device or by introducing a separate non-device launch
configuration that maps to scripts/diag/preview_health.py and preserves the
documented sweep behavior.

Comment on lines +33 to +36
// version + build are also present (firmware identity moved here). firmwarePartition is gated on
// platform::firmwarePartition() > 0 (the app-partition size), so it appears on a real device but
// NOT on desktop/test where firmwarePartition() returns 0 — assert its TYPE only when present
// (a Progress control), rather than its presence, so this stays valid on both.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Keep the test comment in present tense.

Line 33 uses backward-looking wording (“moved here”). Please rephrase to present tense to align with the repo rule.

As per coding guidelines, "**/*.{h,hh,hpp,hxx,cpp,cc,cxx}: Write code, comments, and documentation in the present tense only; avoid changelog-style or backward-looking phrasing..."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/unit/core/unit_FirmwareUpdateModule.cpp` around lines 33 - 36, The test
comment in unit_FirmwareUpdateModule.cpp is written in backward-looking language
and should be rewritten in present tense. Update the nearby comment describing
version/build and firmwarePartition so it states the current behavior of the
FirmwareUpdateModule test setup, avoiding phrasing like “moved here” while
keeping the explanation about platform::firmwarePartition() and the Progress
control.

Source: Coding guidelines

Handles the CodeRabbit findings and the pre-merge Reviewer agent's one should-fix, refreshes the docs/history reconciliation (plans kept and marked, decisions carried forward, a stale backlog item removed), and adds a short note to Getting Started that projectMM's engine is reusable beyond lights.

Core:
- platform.h / BinaryBroadcaster.h (👾): removed the dead `WriteChunk` struct + its stale `broadcastBinary()` comment (the scatter-gather API was fully replaced by writeSome + the begin/push/end + resumable send paths; grep-confirmed zero users), and dropped BinaryBroadcaster.h's now-unneeded `platform.h` include. Net subtraction.
- AudioModule (🐇): the level-peak comment rewritten present-tense (no "caught"/"showed").

Scripts / MoonDeck:
- preview_health.py (🐇): reconnect now backs off 500ms→5s on ANY session close, not only a failed handshake, matching app.js's ws.onclose; the `--grid` path snapshots the device's Grid and restores it in a finally so the probe stays non-destructive (like the live scenarios); main() exits non-zero on a CHOPPY/DEAD verdict so the exit code is honest (the per-device verdict still prints); shebang routes through `uv run` per the project convention.

Tests:
- unit_FirmwareUpdateModule (🐇): the firmwarePartition/version/build comment rephrased present-tense (no "moved here").

Docs / CI:
- gettingstarted.md: added a short callout before the light-show section — the modules/controls/3D-view/web-UI/networking are a general-purpose engine that knows nothing about LEDs; lights are one domain on top, and FastLED-MM (FastLED as the LED driver) is an example, on hold until projectMM ships as a reusable library.
- FirmwareUpdateModule.md (🐇): the `version` / `firmware` / `firmwarePartition` descriptions and the OTA-compat note rewritten present-tense (no "flashed before" / "once broke" / "dropped in the variant collapse").
- CLAUDE.md: plans are now KEPT, not pruned — they are the permanent design-intent record; mark a plan's outcome in the filename (`(shipped)` once it lands, `(attempted, abandoned)` if dropped). Reconciles the doc with the decision to retain plans for history.
- history/plans: the four shipped preview plans (Non-blocking send, Resumable send, Stream-from-buffers, Responsive split-pane PiP) renamed with the `(shipped)` marker — kept as the historical record, just tagged done.
- history/decisions.md: carried forward this branch's preview-arc lessons — stop at the first failed fix on a working seam, a measurement tool must be faithful to the real client or it invents and hides bugs, and a stale long-running process masquerades as a code bug (check the artifact's uptime before bisecting a "rebuild didn't fix it").
- backlog.md: removed the stale "Preview memory optimizations" item (it described a `detail` control + dedicated preview buffer that no longer exist, and its performance.md link was dead).

Reviews:
- 🐇 CodeRabbit findings processed as above. Accepted (with reason): preview_health's `needs_device: true` MoonDeck config — the Live-tab UX runs per selected device by design; the no-host sweep stays a CLI convenience.
- 👾 Reviewer agent (pre-merge, whole-branch diff): verdict architecturally sound, no blockers; its one should-fix (dead WriteChunk) is applied above. Two accept-with-reason items (the bounded sendAllOrClose spin on the coord path; single-threaded previewSend_ access) are documented and correct under the current single-core model.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ewowi ewowi changed the title Validate controls in the backend; drop the SET_DEVICE_MODEL Improv RPC Preview subsystem rework: non-blocking resumable send + graceful degradation Jun 24, 2026
@ewowi ewowi merged commit 55a4e86 into main Jun 24, 2026
3 checks passed
@ewowi ewowi deleted the next-iteration branch June 24, 2026 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant