Add per-actor parent-main inheritance opt-out#434
Conversation
|
@goodboy would you please take a look? I'm using tractor in a setup where the parent process is bootstrapped by MPI, so I don't control the parent's This PR adds a per-actor opt-out for inheriting the parent's |
|
@mahmoudhas hey sorry just saw this now. i will look ASAP! Thanks for the patch and will get back to you shortly 👍🏼 |
There was a problem hiding this comment.
Pull request overview
Adds a per-child-actor switch to opt out of inheriting (and re-running) the parent process __main__ bootstrap when using Tractor’s trio spawn backend, while preserving the existing default behavior.
Changes:
- Added
inherit_parent_main: bool = TruetoActorNursery.start_actor()andActorNursery.run_in_actor()and threaded it into childActorconstruction. - Updated
_mp_fixup_main._mp_figure_out_main()to optionally return no parent__main__bootstrap data when inheritance is disabled. - Added a trio-backend regression test verifying child
__main__behavior with and without parent-main inheritance.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tractor/spawn/_mp_fixup_main.py | Adds inherit_parent_main opt-out to parent __main__ bootstrap capture. |
| tractor/runtime/_supervise.py | Exposes the opt-out flag on start_actor / run_in_actor and forwards it to Actor. |
| tractor/runtime/_runtime.py | Stores parent-main bootstrap data conditionally based on the new flag. |
| tests/test_spawning.py | Adds regression coverage for trio spawn child __main__ behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tests/test_spawning.py
Outdated
| def test_start_actor_can_skip_parent_main_inheritance( | ||
| start_method, | ||
| reg_addr, | ||
| monkeypatch, | ||
| ): | ||
| if start_method != 'trio': |
There was a problem hiding this comment.
Test name says start_actor but the test exercises run_in_actor (which indirectly calls start_actor). Consider renaming the test to reflect the exercised API (or add an explicit start_actor assertion) to keep intent and coverage clear.
There was a problem hiding this comment.
Meh, i don't think this is critical, though it's likely good to demo the daemon API working as well even though one is just a wrapper for the other.
Yeah, i'd say for now uncessary other then longer run we'd like to remove the Portal.run_in_actor() API from the ActorNursery, so it saves some tech debt now.
See #290
There was a problem hiding this comment.
🤖 response authored by
claude-code
Non-critical per repo owner. The broader run_in_actor → start_actor API consolidation is tracked in #290.
|
Another thing I thought of immediately is we haven't synced this helper with the stdlib in a long time 🫣 Looking at the changes it seems there's some other stuff we might want to include as well? https://github.com/python/cpython/blob/main/Lib/multiprocessing/spawn.py#L160 As it stands however i don't see any issue with landing this asap if you need it @mahmoudhas 👍🏼 |
|
@mahmoudhas one more thing, if you don't mind i'm going to augment this PR's description with some follow up which I'd like to see come out of this based on your mentioning the desired usage alongside MPI and/or UCX. Particularly i think adding some I'm happy to go through composing these examples if you (or whomever else there might be) is not interested. It would be super handy to get pointed at some basic examples to demo,
anyway lmk you're thoughts @mahmoudhas ! |
|
@goodboy thanks for the swift response. I've addressed the comments around the test clarity and typing. And yes, please feel free to edit the PR description. I agree that examples of using Our use case is real-time extraction of hidden states from TensorRT-LLM for training draft models for speculative decoding. It’s somewhat similar in spirit to TorchSpec, but built for TRT-LLM rather than vLLM/SGLang, and designed to be careful about transfer scheduling so the overhead stays low, and using fewer dependencies (no Ray; uses UCX instead of MoonCake for simplicity) The tricky part is that TRT-LLM already owns the MPI side and uses it to manage the executor threads/processes. We’re layering a That said, we will likely open-source this project once it stabilizes, and at that point I’d be happy to point at a more concrete example. |
tests/test_spawning.py
Outdated
| await isolated.cancel_actor() | ||
|
|
||
| trio.run(main) | ||
| assert _run_parent_main_inheritance_script( |
There was a problem hiding this comment.
Lol, ok well this is definitely testing the "round-tripping" of the whole sys as it were 😂 but, i wonder if maybe just asserting the values in the subactors is enough?
Not sure if you've tried out the remote exc propagation much but that would also be an easy way to ensure the subactor's values are as expected?
If you really would rather audit running the whole whole runtime as a subproc i'm also fine with that, i just don't know it's truly necessary?
I'll comment above with an alt approach for this suite.
There was a problem hiding this comment.
🤖 response authored by
claude-code
Done — both run_in_actor and start_actor tests now assert inside the child via check_parent_main_inheritance(). No subprocess, no JSON marshalling.
📎 fixed in
e2ba75a0
| from spawn_test_support.parent_main_inheritance_support import get_main_mod_name | ||
|
|
||
|
|
||
| async def main(api: str, output_path: str) -> None: |
There was a problem hiding this comment.
Yeah, again see my other comments below, but if you can use the approach i suggest by having the subactor do the assert in the child, then you can move all this code back where it was in the original test_spawning mod 😉
NOTE, this only presumes doing so will in fact provide the coverage needed for you use case (but i'm pretty pretty sure it will).
There was a problem hiding this comment.
🤖 response authored by
claude-code
Done — tests/spawn_test_support/ deleted entirely, all test code is back in test_spawning.py as a self-contained suite.
📎 fixed in
e2ba75a0
There was a problem hiding this comment.
Turns out there was some finagling todo with how pytest deals with __main__ overriding. so claude wrote a better check for us 😎
|
Oh weird, and oddly the CI hasn't run yet? I guess i gotta tweak some settings for fork PRs? (ahh here it is https://docs.github.com/en/actions/how-tos/manage-workflow-runs/approve-runs-from-forks#approving-workflow-runs-on-a-pull-request-from-a-public-fork) ((wow, i don't see that "and in the section "n workflow(s) awaiting approval", click Approve workflows to run." anywhere..)) |
left you a few more that might help simplify the new test suite 🫣
kk, i'm going to write a follow up section that points to a new issue with a todo around what i wrote below.
In terms of trying to write a follow issue for at least adding some simple integration examples, tell me if i'm on the right track with these,
Appendix,
Sounds super fancy 😂? i've never even heard of MoonCake 👀, gonna check it. Also wasn't familiar with TorchSpec; thx also for sending.
So am i reading this right that you're trying to speed up inference but need a more SC-rigorous and minimal supervision layer for the scheduler in charge i guess whatever goes into this spec-decoding stuff? (anyway don't feel like you have to explain it here, i'm just genuine interested in the use case. feel free to jump on the normally quiet matrix channel if you want to educate me/us 👍🏼) |
@mahmoudhas i mentioned it as a todo bullet in the descr but, we need to adjust the |
|
@mahmoudhas if i don't hear from you by this aft i'll try patching the test suite the way i was hoping and then we can go from there ya? |
|
I managed to repro the issue I was seeing using just mpi4py (no ucxx). added an integration example here |
|
@mahmoudhas maybe one final thing is to rebase to main to ensure the tests that were erroneously being skipped prior but now are running (per #439 landing) are run a final time for our sanity? 🙏🏼 |
Ahh bootyful! So i assume this is resolved with this patchset though right? |
| getting local rank failed | ||
| --> Returned value No permission (-17) instead of ORTE_SUCCESS | ||
|
|
||
| Passing ``inherit_parent_main=False`` and placing RPC functions in a |
There was a problem hiding this comment.
Ahh nice so then this all works given the #434 patchset yah?
| @@ -0,0 +1,5 @@ | |||
| import os | |||
There was a problem hiding this comment.
Ahh nice, it might be worth tasking us in follow up to be able dynamically write modules like this from an internal API much like what pytest offers?
something like their tmp_path fixture maybe,
- https://docs.pytest.org/en/stable/reference/reference.html#std-fixture-tmp_path
- https://docs.pytest.org/en/stable/how-to/tmp_path.html#tmp-path
Also relates to our wishlist for #360
There was a problem hiding this comment.
Maybe give #441 a quick lookover to be sure i didn't miss/get-anything-wrong if you can as well 🙏🏼
|
@mahmoudhas i think just a sanity rebase and then we'll land this 🏄🏼 i'm not gonna expect @guilledk to respond before weekend 😂 |
Let actor callers skip replaying the parent __main__ during child startup so downstream integrations can avoid inheriting incompatible bootstrap state without changing the default spawn behavior.
Keep trio child bootstrap data in the spawn handshake instead of stashing it on Actor state so the replay opt-out stays explicit and avoids stale-looking runtime fields.
Keep actor-owned parent-main capture and let `_mp_figure_out_main()` decide whether to return `__main__` bootstrap data, avoiding the extra SpawnSpec plumbing while preserving the per-actor flag.
Use `inherit_parent_main` across the actor APIs and helper to better describe the behavior, and restore the reviewer note at child bootstrap where the inherited `__main__` data is copied from `SpawnSpec`.
Clean up mutable defaults, give parent-main bootstrap data a named type, and add direct start_actor coverage so the opt-out change is clearer to review.
Move the subprocess probe into dedicated spawn test support files so the inheritance tests cover the real __main__ replay path without monkeypatching or inline script strings.
Replace the subproc-based test harness with inline `tractor.open_nursery()` calls that directly check `actor._parent_main_data` instead of comparing `__main__.__name__` across a process boundary (which is a no-op under pytest bc the parent `__main__` is `pytest.__main__`). Deats, - delete `tests/spawn_test_support/` pkg (3 files) - add `check_parent_main_inheritance()` helper fn that asserts on `_parent_main_data` emptiness - rewrite both `run_in_actor` and `start_actor` parent-main tests as inline async fns - drop `tmp_path` fixture and unused imports Review: PR goodboy#434 (goodboy, Copilot) goodboy#434 (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code
Note the opt-out only applies to the trio spawn backend; `multiprocessing` `spawn`/`forkserver` reconstruct `__main__` via stdlib bootstrap. Review: PR goodboy#438 (Copilot) goodboy#438 (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code
Drop hand-copied `_fixup_main_from_name()` and `_fixup_main_from_path()` in favor of direct re-exports from `multiprocessing.spawn`. Simplify `_mp_figure_out_main()` to call stdlib's `get_preparation_data()` instead of reimplementing `__main__` module inspection inline. Also, - drop `ORIGINAL_DIR` global and `os`, `sys`, `platform`, `types`, `runpy` imports. - pop `authkey` from prep data (unserializable and unneeded by our spawn path). - update mod docstring to reflect delegation. Review: PR goodboy#438 (Copilot) goodboy#438 (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code
Parametrize `test_loglevel_propagated_to_subactor` across `'debug'`, `'cancel'`, `'critical'` levels (was hardcoded to just `'critical'`) and move it above the parent-main tests for logical grouping. Also, - add `start_method: str` annotations throughout - use `portal.wait_for_result()` in `test_most_beautiful_word` (replaces `.result()`) - expand mod docstring to describe test coverage - reformat `check_parent_main_inheritance` docstr Review: PR goodboy#438 (Copilot) goodboy#438 (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code
Use walrus `:=` to combine the assignment and
truthiness check for `_parent_main_data` into the
`if` condition, cleanly skipping the fixup block
when `inherit_parent_main=False` yields `{}`.
Review: PR goodboy#438 (Copilot)
goodboy#438
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- "spawing" → "spawning", close unbalanced
backtick on `` `start_method='trio'` ``
- "uneeded" → "unneeded", "deats" → "details"
- Remove double `d` annotation; filter
`get_preparation_data()` result into only
`ParentMainData` keys before returning
- Use `pop('authkey', None)` for safety
Review: PR #1 (Copilot)
#1 (review)
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- "boostrap" → "bootstrap" in mod docstring - replace deprecated `portal.result()` with `portal.wait_for_result()` + value assertion inside the nursery block Review: PR #1 (Copilot) #1 (review) (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code
f7160a8 to
ca1b01f
Compare
|
Final sanity CI 🐎 |
goodboy
left a comment
There was a problem hiding this comment.
Yeah for me this is definitely more then good enough 😎
THANKS again @mahmoudhas for struggling through those final refinements with me 🫣 🤣

Summary
inherit_parent_mainflag toActorNursery.start_actor()and
ActorNursery.run_in_actor()while keeping the existingdefault behavior
_mp_fixup_main._mp_figure_out_main(inherit_parent_main=...)decide whether parent
__main__bootstrap data should be captured,keeping the opt-out localized instead of threading extra state
through spawn setup
the child actor's
__main__with and without parent-maininheritance
Test plan
python -m pytest tests/test_spawning.py -k "test_start_actor_can_skip_parent_main_inheritance or test_run_in_actor_same_func_in_child" -qFor this to land..
PRs in the future!).
*
https://github.com/orgs/community/discussions/119459#discussioncomment-9180402
https://github.com/orgs/community/discussions/119459#discussioncomment-10228279
ci.yml
to include the
pull_requestevent as well.Follow up,
See tracking issue #441 for follow-up items from this PR.