
Fix Android USB missed-cycle stalls #92

Open
businesscurry123 wants to merge 1 commit into flowdriveai:master from businesscurry123:bounty/46-android-usb-missed-cycles


Summary

Fixes #46.

  • Bound Panda::can_receive() CAN bulk reads to 5 ms instead of using the repo default TIMEOUT=0.
  • Made the boardd_can_recv 100 Hz deadline arithmetic signed and added recv=<ns> timing to missed-cycle logs.
  • Added docs/android_usb_host_missed_cycles.md with the Android no-root USB path, root-cause analysis, and a field checklist for ROM/cable/power/scheduler diagnosis.
  • Added scripts/android_usb_host_missed_cycles_triage.sh, which can collect a diagnostic bundle either directly inside Android/Termux or from a desktop through adb.
  • Linked the guide from README.md.
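The signed deadline arithmetic mentioned in the list above can be illustrated with a minimal sketch. The function names here are hypothetical and are not taken from the boardd source; the sketch only shows why an unsigned subtraction misreports a missed deadline as a very large amount of remaining time, while a signed one yields a negative value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the actual boardd code.
// Returns the time left until the deadline; negative means the deadline passed.
inline int64_t remaining_ns_signed(uint64_t next_frame_time, uint64_t cur_time) {
  // Assumption: both timestamps are monotonic nanosecond counts that fit in int64_t.
  assert(next_frame_time <= static_cast<uint64_t>(INT64_MAX));
  assert(cur_time <= static_cast<uint64_t>(INT64_MAX));
  return static_cast<int64_t>(next_frame_time) - static_cast<int64_t>(cur_time);
}

// The unsigned form, for comparison only: it wraps around when cur_time is
// already past next_frame_time, so "behind" looks like "far ahead".
inline uint64_t remaining_ns_unsigned(uint64_t next_frame_time, uint64_t cur_time) {
  return next_frame_time - cur_time;
}
```

With a deadline of 100 ns and a current time of 130 ns, the signed helper returns -30, whereas the unsigned form wraps to a value close to 2^64.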

Why This Targets the Blocking Path

This PR focuses on the USB receive path that can actually consume the 10 ms boardd_can_recv loop budget on Android no-root setups. Timing reset alone is not enough here because the current loop already resets next_frame_time = cur_time when it falls behind; if Panda::can_receive() can still block indefinitely inside libusb_bulk_transfer(..., timeout=0), the next cycle can miss again for the same reason.

The added recv=<ns> log makes that distinction visible in field reports: if missed cycles correlate with CAN receive time, the Android USB host/ROM/cable/power path is the suspect rather than only scheduler catch-up arithmetic.
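A rough sketch of how receive time could be measured and attached to a missed-cycle log line follows. Only the recv=<ns> field comes from this PR; the helper names, the lag field, and the rest of the message layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Times a single call with a monotonic clock and returns elapsed nanoseconds.
template <typename Fn>
int64_t time_call_ns(Fn&& fn) {
  auto start = std::chrono::steady_clock::now();
  fn();
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}

// Hypothetical message layout: only the "recv=<ns>" field is taken from the PR.
inline std::string missed_cycle_message(int64_t lag_ns, int64_t recv_ns) {
  assert(recv_ns >= 0);  // elapsed time from a monotonic clock is never negative
  return "missed cycle: lag=" + std::to_string(lag_ns) +
         "ns recv=" + std::to_string(recv_ns);
}

// True when the receive call alone used up the whole cycle budget
// (10 ms by default), which points at the USB path rather than the scheduler.
inline bool recv_dominates(int64_t recv_ns, int64_t cycle_ns = 10000000) {
  return recv_ns >= cycle_ns;
}
```

In a field report, a line whose recv value is at or above 10000000 would suggest the USB read itself consumed the cycle; a small recv value alongside a missed cycle would point elsewhere.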

Root Cause

boardd_can_recv has a 10 ms loop budget. On Android no-root, pandad.py launches boardd through termux-usb, and boardd wraps the Android USB permission file descriptor with libusb_wrap_sys_device.

Before this PR, Panda::can_receive() called:

usb_bulk_read(0x81, data, RECV_SIZE);

That used the default TIMEOUT, which is 0 in this repo. libusb documents a zero synchronous transfer timeout as unlimited, so a slow Android USB host/ROM path can hold the receive loop longer than the 10 ms cycle budget and trigger missed cycles.

The existing usb_bulk_read() and can_receive() logic already treats timeout/no-data as healthy, so bounding the CAN receive read is compatible with the current behavior while preventing one USB read from blocking indefinitely.
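The bounded-read behavior described above can be sketched without linking libusb by passing the transfer in as a callable. Everything named here (BulkTransfer, bounded_bulk_read, the status constants) is a stand-in for illustration and not the repo's usb_bulk_read(); the callable is assumed to follow the same argument order and return convention as libusb_bulk_transfer, minus the device handle.

```cpp
#include <cassert>
#include <functional>

// Stand-in status codes for this sketch; real code would use libusb's enum.
// The numeric values mirror LIBUSB_ERROR_TIMEOUT and LIBUSB_ERROR_NO_DEVICE.
enum TransferStatus { kOk = 0, kNoDevice = -4, kTimeout = -7 };

// Assumed shape of the transfer call: endpoint, buffer, length,
// out-parameter for bytes transferred, timeout in milliseconds.
using BulkTransfer = std::function<int(unsigned char endpoint, unsigned char* data,
                                       int length, int* transferred,
                                       unsigned int timeout_ms)>;

constexpr unsigned int kCanRecvTimeoutMs = 5;  // bound from the PR; 0 would mean unlimited

struct ReadResult {
  int bytes;     // bytes actually received (may be 0 on timeout)
  bool healthy;  // false only for errors other than timeout
};

inline ReadResult bounded_bulk_read(const BulkTransfer& transfer, unsigned char endpoint,
                                    unsigned char* data, int length,
                                    unsigned int timeout_ms = kCanRecvTimeoutMs) {
  assert(data != nullptr && length > 0);
  int transferred = 0;
  int err = transfer(endpoint, data, length, &transferred, timeout_ms);
  // A timeout with no data is treated as a healthy "nothing to read" result,
  // so the caller's loop can move on to its next cycle.
  if (err == kOk || err == kTimeout) return {transferred, true};
  return {0, false};
}
```

The design point is that the timeout is a normal outcome rather than a fault: the loop gives up at most 5 ms of its 10 ms budget on one read instead of waiting without limit.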

Testing

  • git diff --check
  • git diff --cached --check
  • bash -n scripts/android_usb_host_missed_cycles_triage.sh
  • Ran scripts/android_usb_host_missed_cycles_triage.sh .triage-test locally and verified it produced the output directory, then removed the test output.
  • Reviewed the touched C++ paths and the Android no-root termux-usb launch path.

I could not run the SCons build in this local Windows checkout because scons and a C++ compiler are not installed here.


Development

Successfully merging this pull request may close these issues.

BOUNTY $100: Research about usb host mode on android and reasons and fix for why missed cycles happen.
