Fix Android USB missed-cycle stalls #92
Open
businesscurry123 wants to merge 1 commit into
Summary
Fixes #46.
- Bounded `Panda::can_receive()` CAN bulk reads to 5 ms instead of using the repo default `TIMEOUT=0`.
- Made the `boardd_can_recv` 100 Hz deadline arithmetic signed and added `recv=<ns>` timing to missed-cycle logs.
- Added `docs/android_usb_host_missed_cycles.md` with the Android no-root USB path, root-cause analysis, and a field checklist for ROM/cable/power/scheduler diagnosis.
- Added `scripts/android_usb_host_missed_cycles_triage.sh`, which can collect a diagnostic bundle either directly inside Android/Termux or from a desktop through `adb`.
- Updated `README.md`.

Why This Targets the Blocking Path
This PR focuses on the USB receive path that can actually consume the 10 ms `boardd_can_recv` loop budget on Android no-root setups. Timing reset alone is not enough here because the current loop already resets `next_frame_time = cur_time` when it falls behind; if `Panda::can_receive()` can still block indefinitely inside `libusb_bulk_transfer(..., timeout=0)`, the next cycle can miss again for the same reason.

The added `recv=<ns>` log makes that distinction visible in field reports: if missed cycles correlate with CAN receive time, the Android USB host/ROM/cable/power path is the suspect rather than only scheduler catch-up arithmetic.

Root Cause
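A simplified model of the loop helps show why resetting the deadline does not help while a single receive can block longer than the period. The sketch below is an illustration under assumptions, not the `boardd` source: `count_missed_cycles` and its parameters are hypothetical names, and each receive is modeled as a fixed blocking time in nanoseconds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model (hypothetical, not the boardd source).
// recv_ns[i] is how long the i-th receive call blocks; period_ns is the loop
// budget. Returns the number of cycles that overran their deadline.
int count_missed_cycles(const std::vector<int64_t>& recv_ns, int64_t period_ns) {
  assert(period_ns > 0);
  int64_t cur_time = 0;
  int64_t next_frame_time = period_ns;
  int missed = 0;
  for (int64_t blocked : recv_ns) {
    cur_time += blocked;  // time spent inside the receive call
    const int64_t remaining = next_frame_time - cur_time;  // signed on purpose
    if (remaining > 0) {
      cur_time = next_frame_time;  // on time: sleep until the deadline
    } else {
      ++missed;                    // remaining <= 0 counts as a miss here
      next_frame_time = cur_time;  // reset when behind, as the PR text describes
    }
    next_frame_time += period_ns;
  }
  return missed;
}
```

With a 10 ms period, three receives of 15 ms each give three missed cycles in this model despite the reset, while receives capped at 5 ms give none.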
`boardd_can_recv` has a 10 ms loop budget. On Android no-root, `pandad.py` launches `boardd` through `termux-usb`, and `boardd` wraps the Android USB permission file descriptor with `libusb_wrap_sys_device`.

Before this PR, `Panda::can_receive()` issued its bulk read with the default `TIMEOUT`, which is `0` in this repo. libusb documents a zero timeout on a synchronous transfer as unlimited, so a slow Android USB host/ROM path can hold the receive loop longer than the 10 ms cycle budget and trigger `missed cycles`.

The existing `usb_bulk_read()` and `can_receive()` logic already treats timeout/no-data as healthy, so bounding the CAN receive read is compatible with the current behavior while preventing one USB read from blocking indefinitely.

Testing
- `git diff --check`
- `git diff --cached --check`
- `bash -n scripts/android_usb_host_missed_cycles_triage.sh`
- Ran `scripts/android_usb_host_missed_cycles_triage.sh .triage-test` locally and verified it produced the output directory, then removed the test output.
- `termux-usb` launch path.

I could not run the SCons build in this local Windows checkout because `scons` and a C++ compiler are not installed here.
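
For reviewers, the signed deadline arithmetic listed in the Summary can be illustrated with a minimal sketch. It assumes timestamps are held as `uint64_t` nanoseconds and small enough to fit in `int64_t`; `remaining_ns`, `remaining_ns_unsigned`, and `example` are hypothetical names, not functions from this repo.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: time left until the deadline, in ns.
// A negative result means the deadline has already passed.
constexpr int64_t remaining_ns(uint64_t next_frame_time, uint64_t cur_time) {
  return static_cast<int64_t>(next_frame_time) - static_cast<int64_t>(cur_time);
}

// The same subtraction kept unsigned: a passed deadline wraps around to a
// huge positive value instead of going negative.
constexpr uint64_t remaining_ns_unsigned(uint64_t next_frame_time, uint64_t cur_time) {
  return next_frame_time - cur_time;
}

// Usage: a deadline that passed 50 ns ago.
inline void example() {
  assert(remaining_ns(100, 150) == -50);
  assert(remaining_ns_unsigned(100, 150) > (1ULL << 63));  // wrapped
}
```

Keeping the difference signed lets a "behind schedule" check be a plain `remaining < 0` comparison, whereas the unsigned form would look like a very long wait.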