Description:
In a complex async or multi-threaded application, tasks often get "stuck" because of deadlocks (waiting on a lock that is never released), infinite loops, or hanging network requests. Since Ferret will track "Open Spans" (active functions) in real time (via Issue 7), we can implement a Watchdog that flags spans that have been running longer than a safety threshold.
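The core check the Watchdog needs is small. Below is a minimal sketch, assuming each open span records a start time taken from `time.monotonic()`. `OpenSpan` and `find_stalled_spans` are hypothetical names for illustration, not Ferret's actual API:

```python
import time
from dataclasses import dataclass, field


@dataclass
class OpenSpan:
    """Hypothetical stand-in for one of Ferret's open spans."""
    name: str
    # Seconds from time.monotonic(); a monotonic clock is unaffected by wall-clock changes.
    start_time: float = field(default_factory=time.monotonic)


def find_stalled_spans(spans, stall_threshold, now=None):
    """Return the spans that have been open for longer than stall_threshold seconds."""
    if now is None:
        now = time.monotonic()
    return [span for span in spans if now - span.start_time > stall_threshold]


# Fixed timestamps keep the example deterministic.
spans = [OpenSpan("fast_handler", start_time=100.0), OpenSpan("stuck_handler", start_time=80.0)]
print([span.name for span in find_stalled_spans(spans, stall_threshold=10.0, now=105.0)])
# prints ['stuck_handler']
```

Passing `now` explicitly is only there to make the example reproducible; a real scan would use the current clock reading, and `start_time` must come from the same clock.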
Proposed Solution:
- Stall Detection: The `ferret watch` system scans the list of active spans. If a span's duration exceeds a configured `stall_threshold` (e.g., 10s), it is flagged as "Stalled".
- Stack Trace Dumping: When a stall is detected, Ferret should attempt to capture the exact line number where the code is stuck.
  - For Threads: Use `sys._current_frames()` to find the stack trace of the thread owning the span.
  - For Async: Inspect the `asyncio` task associated with the span.
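A rough sketch of both capture paths follows. It assumes the watchdog knows the owning thread's `threading.get_ident()` value for thread spans and holds a reference to the `asyncio.Task` for async spans (both hypothetical bookkeeping). A suspended coroutine is not on any thread's call stack, so `sys._current_frames()` cannot see it; instead, the sketch walks the chain of `cr_await` references from the task's coroutine down to the innermost `await`. `thread_stack` and `task_stack` are illustrative names, and `sys._current_frames()` is a CPython-specific, underscore-prefixed function meant for specialized uses like this one.

```python
import asyncio
import sys
import threading
import traceback


def thread_stack(thread_id):
    """Return the formatted stack of a live thread, or None if the thread is gone."""
    frame = sys._current_frames().get(thread_id)
    if frame is None:
        return None
    # format_stack follows f_back from this frame, so entries run oldest to newest.
    return "".join(traceback.format_stack(frame))


def task_stack(task):
    """Return 'file:line in function' entries for a suspended asyncio task, outermost first."""
    entries = []
    awaitable = task.get_coro()
    while awaitable is not None:
        # Native coroutines expose cr_frame/cr_await; generator-based ones gi_frame/gi_yieldfrom.
        frame = getattr(awaitable, "cr_frame", None) or getattr(awaitable, "gi_frame", None)
        if frame is None:
            break  # no Python frame left (e.g. the C-implemented Future iterator)
        code = frame.f_code
        entries.append(f"{code.co_filename}:{frame.f_lineno} in {code.co_name}")
        awaitable = getattr(awaitable, "cr_await", None) or getattr(awaitable, "gi_yieldfrom", None)
    return entries


# Demo: one thread blocked on an Event, one task blocked in an await.
release = threading.Event()
started = threading.Event()


def blocked_worker():
    started.set()
    release.wait()  # simulated lock that never releases (until the demo sets it)


worker = threading.Thread(target=blocked_worker)
worker.start()
started.wait()
thread_dump = thread_stack(worker.ident)
release.set()
worker.join()


async def main():
    gate = asyncio.Event()

    async def stuck_coroutine():
        await gate.wait()

    task = asyncio.create_task(stuck_coroutine())
    await asyncio.sleep(0)  # let the task run until it suspends
    entries = task_stack(task)
    gate.set()
    await task
    return entries


task_dump = asyncio.run(main())
print(thread_dump)
print("\n".join(task_dump))
```

For a task blocked in `await asyncio.sleep(...)`, the innermost entry points into asyncio's own `sleep` coroutine, and the entry above it is the user code that called it, which is usually the line worth highlighting.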
Tasks:
- `ferret/core.py`:
  - Add `stall_threshold` (float, seconds) to the `Profiler` config.
  - Add `enable_watchdog` (bool).
- `ferret/watchdog.py`:
  - Scan the active `Span` objects in memory.
  - If `(now - span.start_time) > stall_threshold`, with `now` read from the same clock as `span.start_time`:
    - Mark the span as `STALLED`.
    - Capture its stack trace using `traceback` and `sys._current_frames()`.
    - Write a `METRIC` or `ERROR` event to BeaverDB containing this stack dump.
- `ferret/tui.py`:
  - Update `ferret watch` to show a "💀 Deadlocks / Stalls" section.
Acceptance Criteria:
- If a profiled function sleeps for 30s (with a 5s threshold), it appears in the "Deadlocks" list after 5s.
- The detailed view shows the filename and line number of the statement where the code is hanging (e.g., `await future` or `lock.acquire()`).
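The watchdog tasks can be sketched as a daemon thread that rescans the open spans every `poll_interval` seconds. This is a minimal sketch under several assumptions: `SpanRecord`, `Watchdog`, `poll_interval`, and the event dictionary layout are hypothetical; `emit` is a placeholder callable standing in for the BeaverDB write, whose API is not shown here; and only thread-owned spans get a stack dump (async spans would need the task-walking approach instead).

```python
import sys
import threading
import time
import traceback
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    """Hypothetical open-span record; Ferret's real Span class may look different."""
    name: str
    thread_id: int  # threading.get_ident() of the thread that opened the span
    start_time: float = field(default_factory=time.monotonic)
    status: str = "RUNNING"


class Watchdog:
    """Daemon thread that flags spans left open for longer than stall_threshold seconds."""

    def __init__(self, open_spans, stall_threshold, emit, poll_interval=0.5):
        self._open_spans = open_spans  # callable returning a snapshot list of open spans
        self._stall_threshold = stall_threshold
        self._emit = emit  # placeholder for writing an event to BeaverDB
        self._poll_interval = poll_interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="ferret-watchdog", daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def check_once(self):
        now = time.monotonic()
        frames = sys._current_frames()  # thread ident -> innermost frame, sampled once per scan
        for span in self._open_spans():
            if span.status == "STALLED" or now - span.start_time <= self._stall_threshold:
                continue
            span.status = "STALLED"  # flag before emitting so each stall is reported once
            frame = frames.get(span.thread_id)
            stack = "".join(traceback.format_stack(frame)) if frame is not None else ""
            self._emit({
                "type": "ERROR",  # hypothetical event layout
                "span": span.name,
                "elapsed": now - span.start_time,
                "stack": stack,
            })

    def _run(self):
        # Event.wait doubles as an interruptible sleep: it returns True once stop() is called.
        while not self._stop.wait(self._poll_interval):
            self.check_once()
```

Scaled down, the first acceptance criterion maps onto this sketch directly: with `stall_threshold=0.2` and a span whose function sleeps for 1s, the span is flagged on the first scan after 0.2s, and `emit` receives a single event whose `stack` names the sleeping function. Asking the caller for a snapshot list keeps the watchdog from iterating a registry that the profiler is mutating concurrently.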