Skip to content

[codex] Fix driver error monitoring#999

Open
Alb3e3 wants to merge 1 commit into
txpipe:mainfrom
Alb3e3:fix-stale-ouroboros-socket
Open

[codex] Fix driver error monitoring#999
Alb3e3 wants to merge 1 commit into
txpipe:mainfrom
Alb3e3:fix-stale-ouroboros-socket

Conversation

@Alb3e3
Copy link
Copy Markdown

@Alb3e3 Alb3e3 commented May 21, 2026

Summary

  • add a shared driver monitor that polls driver tasks by completion order
  • use it from both serve and daemon so driver failures are observed promptly
  • add a regression test for a later failing driver sitting behind an earlier long-running driver

Root cause

serve and daemon collected driver join handles in a FuturesUnordered, but then awaited each join handle from the collection iterator. That made failures depend on insertion order rather than completion order. If an earlier driver kept running and a later Ouroboros Unix socket driver failed to bind, the bind error could remain unread until shutdown.

Fixes #293.

Bounty note

I linked Githoney bounty 64 from #293 with /githoney link-bounty --bountyId 64. The issue bot comment shows a 2025-01-26 deadline, so please confirm whether it is still claimable.

Verification

  • cargo test monitor_drivers_observes_later_failures_before_waiting_for_earlier_drivers --no-default-features
  • cargo test --no-default-features
  • git diff --check

I also tried cargo fmt --check, but this checkout has pre-existing rustfmt drift outside this PR on stable rustfmt, so I left those unrelated files untouched.

Summary by CodeRabbit

  • Bug Fixes

    • Improved driver failure detection to respond promptly to errors without waiting for unrelated tasks to complete.
  • Refactor

    • Unified driver monitoring and error handling logic across service configurations for consistency.

Review Change Stack

Copy link
Copy Markdown
Author

Alb3e3 commented May 21, 2026

/githoney link-bounty --bountyId 64

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7aaf0a61-c249-4229-a90b-4b2c15129044

📥 Commits

Reviewing files that changed from the base of the PR and between 6136eac and ba00b6c.

📒 Files selected for processing (3)
  • src/bin/dolos/common.rs
  • src/bin/dolos/daemon.rs
  • src/bin/dolos/serve.rs

📝 Walkthrough

Walkthrough

Centralizes driver task monitoring across daemon and serve entry points by extracting inline error-handling logic into a shared monitor_drivers function. The function iterates FuturesUnordered driver tasks, logs failures, and cancels the exit token when any driver fails, with a test validating prompt detection of later failures.

Changes

Driver monitoring refactor

Layer / File(s) Summary
monitor_drivers implementation and test
src/bin/dolos/common.rs
Adds imports for FuturesUnordered, StreamExt, JoinHandle, and expanded tracing levels. Implements async monitor_drivers that awaits driver task handles, ignores successes, and cancels the exit token on any error or join failure. Includes test verifying the function detects a later failure before waiting on a slow earlier task.
daemon.rs integration
src/bin/dolos/daemon.rs
Updates tracing imports to remove error (now handled in monitor_drivers). Replaces inline loop over driver results with a single call to monitor_drivers.
serve.rs integration
src/bin/dolos/serve.rs
Removes direct tracing::error import. Replaces local async control flow awaiting each driver with a call to monitor_drivers.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • scarmuega

Poem

A rabbit refactors with care and delight, 🐰
Three drivers now monitored—centralized right!
Failures are caught at one shared checkpoint,
No code duplication, no logic misdirect.
One helper to guide them when things go amiss!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title '[codex] Fix driver error monitoring' accurately describes the main change: adding a shared driver monitor function to fix driver error detection ordering.
Linked Issues check ✅ Passed The PR addresses the core technical requirement from #293 by implementing prompt detection of driver failures through a monitor_drivers function that observes failures by completion order rather than insertion order.
Out of Scope Changes check ✅ Passed All changes are in-scope: monitor_drivers function implementation, integration into serve and daemon, and regression test are all aligned with fixing driver error monitoring.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Alb3e3 Alb3e3 marked this pull request as ready for review May 21, 2026 15:02
@Alb3e3 Alb3e3 requested a review from scarmuega as a code owner May 21, 2026 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to recover after forceful shutdown

1 participant