Enhance warmup inference handling and error logging#1475
Enhance warmup inference handling and error logging#1475jayy-77 wants to merge 1 commit intoexo-explore:mainfrom
Conversation
d05a75e to
5bd8258
Compare
Co-authored-by: Cursor <cursoragent@cursor.com>
5bd8258 to
1ca98c4
Compare
PR #1475 Review: "Enhance warmup inference handling and error logging"References: #1431 Summary of ChangesThis PR wraps warmup inference in try/except blocks so that if warmup fails (e.g. MLX CPU JIT compile failure on Linux ARM), the runner continues instead of crashing. It also adds error handling in the runner supervisor for unexpected channel closures. Note: The PR template is mostly empty — only the Changes section references issue #1431. No motivation, explanation, or test plan provided. Files Changed (3)
Detailed Analysis1. generate.py — warmup_inference() error handling
Assessment: Sound approach. Warmup is a performance optimization, not a correctness requirement — failing gracefully is the right behavior. Minor concern: Catching only 2. runner.py — warmup caller error handling
Assessment: Good — this is the caller-side complement to the generate.py change. Without this, even if Concerns:
3. runner_supervisor.py — _forward_events() error handlingThis is the most concerning change. A new Concerns:
Overall AssessmentPositives:
Concerns:
Recommendation: The |
Code ReviewThanks for working on this! Graceful handling of warmup failures is the right approach — warmup is a performance optimization, not a correctness requirement, so the runner should continue even if it fails (#1431). A few issues to address: 1. Remove the duplicate
|
Code Review — PR #1475: Enhance warmup inference handling and error loggingCI: No checks (fork PR — CI doesn't run automatically) Overview+68/-35 across 3 files. Attempts to handle MLX CPU JIT compile failures during warmup (e.g.,
Critical bugs1. Runner crashes on first inference after failed warmup When warmup fails,
The PR's stated goal ("runner will still be ready") is not achieved. The first real inference request crashes the runner. Fix: set a default in the except block: 2. Dead code in runner_supervisor.py The second Remove this entire block. Design issues3. Double catch is redundant RuntimeError is caught in both 4. Broad RuntimeError catch may mask bugs Catching all Minor
VerdictDo not merge. The core idea (graceful warmup failure) is correct, but the implementation has a critical bug: the runner crashes on first inference after failed warmup due to |
Code Review: PR #1475 — Enhance warmup inference handling and error loggingSummaryMakes warmup inference failures non-fatal — catches ReviewWarmup error handling ( try:
for _r in stream_generate(...):
tokens_generated += 1
except RuntimeError as e:
logger.warning("Warmup inference failed: {}", e)Correctly catches warmup failures (e.g., MLX CPU JIT compile errors on Linux ARM) and continues. The runner can still serve requests — first request may just be slower. ✅ Runner-side error handling ( Runner supervisor error handling ( except Exception as e:
if type(e).__name__ in ("ClosedResourceError", "BrokenResourceError"):This is string-matching on exception class names instead of using Issues1. Logger format strings use logger.warning("Warmup inference failed (e.g. MLX CPU compile on this platform): {}", e)Python's 2. VerdictGood defensive improvement — warmup failures shouldn't crash the runner. The main concern is the LGTM with the cancel-check concern. |
Motivation
Changes
#1431
Why It Works
Test Plan
Manual Testing
Automated Testing