Skip to content

Lee1258561/sync shutdown executor#50

Open
lee1258561 wants to merge 4 commits intopinterest/main-2.52.1from
lee1258561/sync_shutdown_executor
Open

Lee1258561/sync shutdown executor#50
lee1258561 wants to merge 4 commits intopinterest/main-2.52.1from
lee1258561/sync_shutdown_executor

Conversation

@lee1258561
Copy link

Thank you for contributing to Ray! 🚀
Please review the Ray Contribution Guide before opening a pull request.

⚠️ Remove these instructions before submitting your PR.

💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete.

Description

Briefly describe what this PR accomplishes and why it's needed.

Related issues

Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

lee1258561 and others added 4 commits March 17, 2026 18:31
Adds optional synchronized shutdown mode for streaming_split where all N
shards must call shutdown_executor before the actual shutdown occurs.
This enables proper coordination in distributed training scenarios.

- Add sync_shutdown and split_idx parameters to shutdown_executor
- Only shard 0 performs actual shutdown; others wait
- Fail if same shard calls shutdown twice before sync completes
- Ignore shutdown calls if called before any epoch has started
- Add unit tests for the new functionality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The shutdown state variables (_shutdown_received_from, _shutdown_complete,
_shutdown_force_requested) were not reset when a new epoch started, causing
duplicate shutdown errors when reusing streaming split iterators across
multiple epochs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When iterators go out of scope or are deleted, cleanup is automatically
triggered on the coordinator via end_epoch(), eliminating the need for
explicit shutdown_executor calls between epochs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace end_epoch() with shutdown_executor(sync_shutdown=True) for
epoch cleanup. This removes duplicate logic since _barrier already
resets shutdown state when starting a new epoch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant