Add per-actor parent-main inheritance opt-out by mahmoudhas · Pull Request #434 · goodboy/tractor

mahmoudhas · 2026-04-06T05:51:26Z

Summary

add an inherit_parent_main flag to ActorNursery.start_actor()
and ActorNursery.run_in_actor() while keeping the existing
default behavior
let _mp_fixup_main._mp_figure_out_main(inherit_parent_main=...)
decide whether parent __main__ bootstrap data should be captured,
keeping the opt-out localized instead of threading extra state
through spawn setup
cover the trio spawn behavior with a regression test that checks
the child actor's __main__ with and without parent-main
inheritance

Test plan

python -m pytest tests/test_spawning.py -k "test_start_actor_can_skip_parent_main_inheritance or test_run_in_actor_same_func_in_child" -q
downstream usage in the Baseten inference server

For this to land..

oustanding test suite adjustments (if applicable)
get CI running against this patch (and all further fork-based
PRs in the future!).
*
https://github.com/orgs/community/discussions/119459#discussioncomment-9180402
- for multiple see:
  https://github.com/orgs/community/discussions/119459#discussioncomment-10228279
- @mahmoudhas change the
  ci.yml
  to include the pull_request event as well.
mk tracking ish for below follow up.
- linked as Follow-up: MPI + UCX integration examples (PR #434) #441!
- get @mahmoudhas approval on the tracking issue's content!

Follow up,

See tracking issue #441 for follow-up items from this PR.

mahmoudhas · 2026-04-06T06:52:07Z

@goodboy would you please take a look?

I'm using tractor in a setup where the parent process is bootstrapped by MPI, so I don't control the parent's __main__. When a child actor inherits/replays that parent __main__, it ends up importing/initializing MPI in the child as well, which causes errors in some libraries like ucxx.

This PR adds a per-actor opt-out for inheriting the parent's __main__.

goodboy · 2026-04-06T20:44:05Z

@mahmoudhas hey sorry just saw this now.

i will look ASAP!

Thanks for the patch and will get back to you shortly 👍🏼

Copilot

Pull request overview

Adds a per-child-actor switch to opt out of inheriting (and re-running) the parent process __main__ bootstrap when using Tractor’s trio spawn backend, while preserving the existing default behavior.

Changes:

Added inherit_parent_main: bool = True to ActorNursery.start_actor() and ActorNursery.run_in_actor() and threaded it into child Actor construction.
Updated _mp_fixup_main._mp_figure_out_main() to optionally return no parent __main__ bootstrap data when inheritance is disabled.
Added a trio-backend regression test verifying child __main__ behavior with and without parent-main inheritance.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
tractor/spawn/_mp_fixup_main.py	Adds `inherit_parent_main` opt-out to parent `__main__` bootstrap capture.
tractor/runtime/_supervise.py	Exposes the opt-out flag on `start_actor` / `run_in_actor` and forwards it to `Actor`.
tractor/runtime/_runtime.py	Stores parent-main bootstrap data conditionally based on the new flag.
tests/test_spawning.py	Adds regression coverage for trio spawn child `__main__` behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tractor/runtime/_runtime.py

tractor/runtime/_supervise.py

Copilot · 2026-04-06T20:48:48Z

tests/test_spawning.py

+def test_start_actor_can_skip_parent_main_inheritance(
+    start_method,
+    reg_addr,
+    monkeypatch,
+):
+    if start_method != 'trio':


Test name says start_actor but the test exercises run_in_actor (which indirectly calls start_actor). Consider renaming the test to reflect the exercised API (or add an explicit start_actor assertion) to keep intent and coverage clear.

Meh, i don't think this is critical, though it's likely good to demo the daemon API working as well even though one is just a wrapper for the other.

Yeah, i'd say for now uncessary other then longer run we'd like to remove the Portal.run_in_actor() API from the ActorNursery, so it saves some tech debt now.

See #290

🤖 response authored by claude-code

Non-critical per repo owner. The broader run_in_actor → start_actor API consolidation is tracked in #290.

tractor/runtime/_supervise.py

tractor/runtime/_runtime.py

tests/test_spawning.py

goodboy · 2026-04-06T21:20:09Z

Another thing I thought of immediately is we haven't synced this helper with the stdlib in a long time 🫣

Looking at the changes it seems there's some other stuff we might want to include as well?

https://github.com/python/cpython/blob/main/Lib/multiprocessing/spawn.py#L160

As it stands however i don't see any issue with landing this asap if you need it @mahmoudhas 👍🏼

goodboy · 2026-04-06T22:12:07Z

@mahmoudhas one more thing, if you don't mind i'm going to augment this PR's description with some follow up which I'd like to see come out of this based on your mentioning the desired usage alongside MPI and/or UCX.

Particularly i think adding some examples/integrations/* scripts which demo how tractor can be used alongside/on-top-of such grander (or more general purpose?), pre-existing IPC frameworks and libs to potentially enforce SC would be pretty stellar no 🤔 ?

I'm happy to go through composing these examples if you (or whomever else there might be) is not interested. It would be super handy to get pointed at some basic examples to demo,

how to use multiple actors to conduct inter-proc IPC over each lib, ideally with some ode to cancellation and error propagation up a process tree.
some demo of how you would normally not be able to (as easily) enforce strict SC supervision without tractor if using those libs alone?

anyway lmk you're thoughts @mahmoudhas !

mahmoudhas · 2026-04-06T22:49:47Z

@goodboy thanks for the swift response. I've addressed the comments around the test clarity and typing.

And yes, please feel free to edit the PR description.

I agree that examples of using tractor alongside broader IPC frameworks like MPI/UCX would be very valuable. In our specific case though, I think it would be difficult to turn the integration into a clean upstream example or test.

Our use case is real-time extraction of hidden states from TensorRT-LLM for training draft models for speculative decoding. It’s somewhat similar in spirit to TorchSpec, but built for TRT-LLM rather than vLLM/SGLang, and designed to be careful about transfer scheduling so the overhead stays low, and using fewer dependencies (no Ray; uses UCX instead of MoonCake for simplicity)

The tricky part is that TRT-LLM already owns the MPI side and uses it to manage the executor threads/processes. We’re layering a tractor scope on top of that in order to supervise and offload the UCX transfer logic. Since the MPI setup is provided by TRT-LLM rather than by us, it’s not easy to extract a minimal standalone integration example from the full system.

That said, we will likely open-source this project once it stabilizes, and at that point I’d be happy to point at a more concrete example.

tests/test_spawning.py

goodboy · 2026-04-06T23:50:11Z

tests/test_spawning.py

-                await isolated.cancel_actor()
-
-    trio.run(main)
+    assert _run_parent_main_inheritance_script(


Lol, ok well this is definitely testing the "round-tripping" of the whole sys as it were 😂 but, i wonder if maybe just asserting the values in the subactors is enough?

Not sure if you've tried out the remote exc propagation much but that would also be an easy way to ensure the subactor's values are as expected?

If you really would rather audit running the whole whole runtime as a subproc i'm also fine with that, i just don't know it's truly necessary?

I'll comment above with an alt approach for this suite.

🤖 response authored by claude-code

Done — both run_in_actor and start_actor tests now assert inside the child via check_parent_main_inheritance(). No subprocess, no JSON marshalling.

📎 fixed in e2ba75a0

tests/spawn_test_support/parent_main_inheritance_support.py

goodboy · 2026-04-07T00:01:08Z

tests/spawn_test_support/parent_main_inheritance_case.py

+from spawn_test_support.parent_main_inheritance_support import get_main_mod_name
+
+
+async def main(api: str, output_path: str) -> None:


Yeah, again see my other comments below, but if you can use the approach i suggest by having the subactor do the assert in the child, then you can move all this code back where it was in the original test_spawning mod 😉

NOTE, this only presumes doing so will in fact provide the coverage needed for you use case (but i'm pretty pretty sure it will).

🤖 response authored by claude-code

Done — tests/spawn_test_support/ deleted entirely, all test code is back in test_spawning.py as a self-contained suite.

📎 fixed in e2ba75a0

Turns out there was some finagling todo with how pytest deals with __main__ overriding. so claude wrote a better check for us 😎

goodboy · 2026-04-07T00:13:09Z

Oh weird, and oddly the CI hasn't run yet?

I guess i gotta tweak some settings for fork PRs?

(ahh here it is https://docs.github.com/en/actions/how-tos/manage-workflow-runs/approve-runs-from-forks#approving-workflow-runs-on-a-pull-request-from-a-public-fork)

((wow, i don't see that "and in the section "n workflow(s) awaiting approval", click Approve workflows to run." anywhere..))

goodboy · 2026-04-07T00:25:58Z

@goodboy thanks for the swift response. I've addressed the comments around the test clarity and typing.f

left you a few more that might help simplify the new test suite 🫣
But definitely push back on me if you prefer to have this super duper end-2-end coverage 👍🏼

And yes, please feel free to edit the PR description.

kk, i'm going to write a follow up section that points to a new issue with a todo around what i wrote below.

I agree that examples of using tractor alongside broader IPC frameworks like MPI/UCX would be very valuable. In our specific case though, I think it would be difficult to turn the integration into a clean upstream example or test.
...
That said, we will likely open-source this project once it stabilizes, and at that point I’d be happy to point at a more concrete example.

In terms of trying to write a follow issue for at least adding some simple integration examples, tell me if i'm on the right track with these,

for MPI we can likely start by wrapping a few of these examples?
for UCX and more specifically the ucxx project, we can try gleaning some naive examples from the benchmark scripts ?

Appendix,

Our use case is real-time extraction of hidden states from TensorRT-LLM for training draft models for speculative decoding. It’s somewhat similar in spirit to TorchSpec, but built for TRT-LLM rather than vLLM/SGLang, and designed to be careful about transfer scheduling so the overhead stays low, and using fewer dependencies (no Ray; uses UCX instead of MoonCake for simplicity)

Sounds super fancy 😂?

i've never even heard of MoonCake 👀, gonna check it. Also wasn't familiar with TorchSpec; thx also for sending.

The tricky part is that TRT-LLM already owns the MPI side and uses it to manage the executor threads/processes. We’re layering a tractor scope on top of that in order to supervise and offload the UCX transfer logic. Since the MPI setup is provided by TRT-LLM rather than by us, it’s not easy to extract a minimal standalone integration example from the full system.

So am i reading this right that you're trying to speed up inference but need a more SC-rigorous and minimal supervision layer for the scheduler in charge i guess whatever goes into this spec-decoding stuff? (anyway don't feel like you have to explain it here, i'm just genuine interested in the use case. feel free to jump on the normally quiet matrix channel if you want to educate me/us 👍🏼)

goodboy · 2026-04-07T02:41:26Z

Oh weird, and oddly the CI hasn't run yet?

I guess i gotta tweak some settings for fork PRs?

(ahh here it is https://docs.github.com/en/actions/how-tos/manage-workflow-runs/approve-runs-from-forks#approving-workflow-runs-on-a-pull-request-from-a-public-fork)

((wow, i don't see that "and in the section "n workflow(s) awaiting approval", click Approve workflows to run." anywhere..))

@mahmoudhas i mentioned it as a todo bullet in the descr but, we need to adjust the ci.yml to make this happen. I'm also happy to bring in a PR beforehand if you don't feel like doing it here, just lmk 👍🏼

goodboy · 2026-04-07T15:50:52Z

@mahmoudhas if i don't hear from you by this aft i'll try patching the test suite the way i was hoping and then we can go from there ya?

mahmoudhas · 2026-04-10T20:52:21Z

I managed to repro the issue I was seeing using just mpi4py (no ucxx). added an integration example here

goodboy · 2026-04-10T22:33:42Z

@mahmoudhas maybe one final thing is to rebase to main to ensure the tests that were erroneously being skipped prior but now are running (per #439 landing) are run a final time for our sanity? 🙏🏼

goodboy · 2026-04-10T22:35:03Z

I managed to repro the issue I was seeing using just mpi4py (no ucxx). added an integration example here

Ahh bootyful!

So i assume this is resolved with this patchset though right?

goodboy · 2026-04-10T22:39:44Z

examples/integration/mpi4py/inherit_parent_main.py

+    getting local rank failed
+      --> Returned value No permission (-17) instead of ORTE_SUCCESS
+
+Passing ``inherit_parent_main=False`` and placing RPC functions in a


Ahh nice so then this all works given the #434 patchset yah?

goodboy · 2026-04-10T22:49:17Z

examples/integration/mpi4py/_child.py

@@ -0,0 +1,5 @@
+import os


Ahh nice, it might be worth tasking us in follow up to be able dynamically write modules like this from an internal API much like what pytest offers?

something like their tmp_path fixture maybe,

https://docs.pytest.org/en/stable/reference/reference.html#std-fixture-tmp_path

https://docs.pytest.org/en/stable/how-to/tmp_path.html#tmp-path

Also relates to our wishlist for #360

Maybe give #441 a quick lookover to be sure i didn't miss/get-anything-wrong if you can as well 🙏🏼

goodboy · 2026-04-11T00:11:40Z

@mahmoudhas i think just a sanity rebase and then we'll land this 🏄🏼

i'm not gonna expect @guilledk to respond before weekend 😂

goodboy · 2026-04-11T00:55:39Z

Dunno if i trust that button 😂

Let actor callers skip replaying the parent __main__ during child startup so downstream integrations can avoid inheriting incompatible bootstrap state without changing the default spawn behavior.

Keep trio child bootstrap data in the spawn handshake instead of stashing it on Actor state so the replay opt-out stays explicit and avoids stale-looking runtime fields.

Keep actor-owned parent-main capture and let `_mp_figure_out_main()` decide whether to return `__main__` bootstrap data, avoiding the extra SpawnSpec plumbing while preserving the per-actor flag.

Use `inherit_parent_main` across the actor APIs and helper to better describe the behavior, and restore the reviewer note at child bootstrap where the inherited `__main__` data is copied from `SpawnSpec`.

Clean up mutable defaults, give parent-main bootstrap data a named type, and add direct start_actor coverage so the opt-out change is clearer to review.

Move the subprocess probe into dedicated spawn test support files so the inheritance tests cover the real __main__ replay path without monkeypatching or inline script strings.

Replace the subproc-based test harness with inline `tractor.open_nursery()` calls that directly check `actor._parent_main_data` instead of comparing `__main__.__name__` across a process boundary (which is a no-op under pytest bc the parent `__main__` is `pytest.__main__`). Deats, - delete `tests/spawn_test_support/` pkg (3 files) - add `check_parent_main_inheritance()` helper fn that asserts on `_parent_main_data` emptiness - rewrite both `run_in_actor` and `start_actor` parent-main tests as inline async fns - drop `tmp_path` fixture and unused imports Review: PR goodboy#434 (goodboy, Copilot) goodboy#434 (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code

Note the opt-out only applies to the trio spawn backend; `multiprocessing` `spawn`/`forkserver` reconstruct `__main__` via stdlib bootstrap. Review: PR goodboy#438 (Copilot) goodboy#438 (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code

Drop hand-copied `_fixup_main_from_name()` and `_fixup_main_from_path()` in favor of direct re-exports from `multiprocessing.spawn`. Simplify `_mp_figure_out_main()` to call stdlib's `get_preparation_data()` instead of reimplementing `__main__` module inspection inline. Also, - drop `ORIGINAL_DIR` global and `os`, `sys`, `platform`, `types`, `runpy` imports. - pop `authkey` from prep data (unserializable and unneeded by our spawn path). - update mod docstring to reflect delegation. Review: PR goodboy#438 (Copilot) goodboy#438 (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code

Parametrize `test_loglevel_propagated_to_subactor` across `'debug'`, `'cancel'`, `'critical'` levels (was hardcoded to just `'critical'`) and move it above the parent-main tests for logical grouping. Also, - add `start_method: str` annotations throughout - use `portal.wait_for_result()` in `test_most_beautiful_word` (replaces `.result()`) - expand mod docstring to describe test coverage - reformat `check_parent_main_inheritance` docstr Review: PR goodboy#438 (Copilot) goodboy#438 (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code

Use walrus `:=` to combine the assignment and truthiness check for `_parent_main_data` into the `if` condition, cleanly skipping the fixup block when `inherit_parent_main=False` yields `{}`. Review: PR goodboy#438 (Copilot) goodboy#438 (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code

- "spawing" → "spawning", close unbalanced backtick on `` `start_method='trio'` `` - "uneeded" → "unneeded", "deats" → "details" - Remove double `d` annotation; filter `get_preparation_data()` result into only `ParentMainData` keys before returning - Use `pop('authkey', None)` for safety Review: PR #1 (Copilot) #1 (review) (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code

- "boostrap" → "bootstrap" in mod docstring - replace deprecated `portal.result()` with `portal.wait_for_result()` + value assertion inside the nursery block Review: PR #1 (Copilot) #1 (review) (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code

goodboy · 2026-04-11T01:03:21Z

Final sanity CI 🐎

goodboy

Yeah for me this is definitely more then good enough 😎

THANKS again @mahmoudhas for struggling through those final refinements with me 🫣 🤣

mahmoudhas changed the title ~~Add per-actor parent-main replay opt-out~~ Add per-actor parent-main inheritance opt-out Apr 6, 2026

goodboy requested a review from Copilot April 6, 2026 20:44

Copilot started reviewing on behalf of goodboy April 6, 2026 20:45 View session

goodboy self-requested a review April 6, 2026 20:48

goodboy added the spawning of processes, (shm) threads, tasks on varied (OS-specific) backends label Apr 6, 2026

Copilot AI reviewed Apr 6, 2026

View reviewed changes