fix(serial): watchdog UI Pico CDC link; detect silent stalls #42
Open
When the USB CDC stream from the UI Pico stalls without closing the FD (seen after DUT BOOTSEL flash cycles on the shared xHCI hub), the reader thread blocks forever in readline() with no exception. Status shows ui_pico_connected: true while physical buttons silently stop working until someone notices and resets USB manually.

Add a per-connection watchdog that:
- Tracks last_rx_monotonic on every byte received.
- After UI_PICO_HEARTBEAT_INTERVAL seconds of silence, sends PING to poke the link. Any response (PONG or otherwise) updates last_rx and defuses the watchdog.
- After HEARTBEAT_INTERVAL * STALL_FACTOR seconds of continuous silence, closes the port so the reader exits with OSError and the existing reconnect loop reattaches.

Also harden the reader loop: catch TypeError from readline (seen during shutdown race where port.close() nulls the fd mid-os.read) and bail quietly when stop_event is set.

Refs #41
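The watchdog policy described in the commit message can be sketched as a pure decision function, which keeps the timing rules easy to reason about separately from the serial I/O. This is a minimal sketch, not the code in this PR: the function name `watchdog_action`, the `last_ping_monotonic` argument, and the string return values are hypothetical, and the one-PING-per-interval rate limit is an assumption of the sketch. The defaults of 5 s and 3.0 are the values quoted in the PR description.

```python
from typing import Optional

# Defaults as quoted in the PR description.
UI_PICO_HEARTBEAT_INTERVAL = 5.0
UI_PICO_HEARTBEAT_STALL_FACTOR = 3.0


def watchdog_action(
    now: float,
    last_rx_monotonic: float,
    last_ping_monotonic: Optional[float],
    interval: float = UI_PICO_HEARTBEAT_INTERVAL,
    stall_factor: float = UI_PICO_HEARTBEAT_STALL_FACTOR,
) -> str:
    """Decide what one watchdog tick should do (hypothetical helper).

    Returns "close" once the link has been silent for interval * stall_factor,
    "ping" once it has been silent for interval (at most one PING per
    interval), and "idle" otherwise.
    """
    silence = now - last_rx_monotonic
    if silence >= interval * stall_factor:
        return "close"
    if silence >= interval:
        # Rate limit: only ping if no PING was sent within the last interval.
        if last_ping_monotonic is None or now - last_ping_monotonic >= interval:
            return "ping"
    return "idle"
```

In a watchdog thread, a function like this would be called periodically with `time.monotonic()` as `now`. With the defaults and a last byte received at t=0, it asks for a PING at t=5 and t=10 and for the port to be closed at t=15.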
Summary
Closes #41.
When the USB CDC stream from the UI Pico stalls without closing the FD (observed after DUT BOOTSEL flash cycles on the shared xHCI hub on 2026-04-24), the reader thread blocks forever in `readline()` with no exception. `/api/status` keeps reporting `ui_pico_connected: true` while physical start / e-stop / switches silently stop emitting events, which is the runtime gap that triggered this issue.
Until now the only way out was a manual `echo 1-1 | sudo tee /sys/bus/usb/drivers/usb/unbind; echo 1-1 | sudo tee .../bind`.
Approach
Per-connection watchdog thread started alongside the reader when the UI Pico connects:
- `PicoConnection.last_rx_monotonic` is updated by the reader on every byte that arrives (events, responses, INFO lines, anything).
- After `UI_PICO_HEARTBEAT_INTERVAL` seconds of silence, the watchdog writes `PING\n` to nudge the link. Any response updates `last_rx_monotonic` and defuses the watchdog for another interval. PINGs are rate-limited to one per interval.
- After `UI_PICO_HEARTBEAT_INTERVAL * UI_PICO_HEARTBEAT_STALL_FACTOR` seconds of continuous silence, the watchdog closes the port. The reader loop hits `OSError`, marks `_ui_pico = None`, emits `ui_pico_disconnected`, and the existing `_reconnect_loop` reattaches within `SERIAL_RECONNECT_INTERVAL`.
Defaults: interval 5 s, factor 3.0 → ~15 s to detect a full stall. Configurable via `HALSPA_RUNNER_UI_PICO_HEARTBEAT_INTERVAL` and `HALSPA_RUNNER_UI_PICO_HEARTBEAT_STALL_FACTOR`.
Also in this PR
- The reader loop now catches `TypeError` and bails quietly when `stop_event` is set. This addresses the shutdown race where `port.close()` nulls the FD mid-`os.read`, which previously logged a scary traceback at every systemd restart.
- The `_ui_pico = None` update now only nullifies the slot if it still points at this connection (protects against a race where reconnect has already swapped in a fresh connection).
Test plan