feat: isolation serialization, CUDA IPC hardening, and CUDA wheel resolution (v0.9.2) #6
pollockjj wants to merge 4 commits into Comfy-Org:main
Conversation
Caution: review failed. The pull request was closed or merged during review.

📝 Walkthrough

Adds CUDA wheel resolution and integration into dependency installation, expands the benchmark harness and runners with torch-aware modes and async extension APIs, extends the serialization registry with `data_type` semantics, raises the RPC JSON transport limit to 2GB with warnings, and updates tests and project metadata (v0.9.2).

Changes
Sequence diagram:

```mermaid
sequenceDiagram
    participant Client as Client Code
    participant Resolver as resolve_cuda_wheel_url()
    participant Runtime as get_cuda_wheel_runtime()
    participant Config as Config Normalizer
    participant Fetcher as _fetch_index_html()
    participant Parser as _parse_index_links()
    participant Evaluator as Wheel Candidate Evaluator
    Client->>Resolver: request resolution(requirement, config)
    Resolver->>Runtime: query torch/CUDA/python runtime
    Runtime-->>Resolver: CUDAWheelRuntime
    Resolver->>Config: normalize/validate cuda_wheels config
    Config-->>Resolver: normalized config
    Resolver->>Fetcher: fetch index HTML
    Fetcher-->>Resolver: HTML content
    Resolver->>Parser: extract wheel links
    Parser-->>Resolver: wheel URLs
    Resolver->>Evaluator: filter/rank candidates by runtime, tags, version
    Evaluator-->>Resolver: best wheel URL
    Resolver-->>Client: resolved wheel URL
```
Pull request overview
This PR extends PyIsolate’s serialization/RPC infrastructure and dependency installation to better handle large payloads and CUDA-specific wheel resolution, with accompanying unit tests and a version bump.
Changes:
- Add custom “CUDA wheels” resolution via a simple-index HTML fetch/parse flow, and thread CUDA-runtime info into dependency-cache invalidation.
- Add `data_type` support to the serializer registry/protocol and add guardrails to avoid double-deserialization in `deserialize_from_isolation`.
- Enforce a 2GB hard receive limit (with warnings above 100MB) for JSON socket RPC messages, and add targeted tests for these behaviors.
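The receive-limit behavior described above can be sketched as a guard on the declared payload size before the transport reads it. The constant names and the helper below are illustrative, not PyIsolate's actual internals; only the 100MB/2GB thresholds come from the PR description:

```python
import warnings

# Thresholds matching the PR description; the real constants live in
# pyisolate/_internal/rpc_transports.py and may be named differently.
WARN_THRESHOLD = 100 * 1024 * 1024    # 100 MB: warn but proceed
HARD_LIMIT = 2 * 1024 * 1024 * 1024   # 2 GB: refuse the message

def check_recv_size(declared_size: int) -> None:
    """Validate a declared RPC message size before reading the payload."""
    if declared_size > HARD_LIMIT:
        raise ValueError(
            f"RPC message of {declared_size} bytes exceeds the 2GB hard limit"
        )
    if declared_size > WARN_THRESHOLD:
        warnings.warn(
            f"Large RPC message ({declared_size} bytes); "
            "consider a tensor/shared-memory transport instead",
            RuntimeWarning,
        )
```

Rejecting on the declared size rather than after buffering keeps a misbehaving peer from forcing a multi-gigabyte allocation before the limit is enforced.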
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_serialization_registry.py | Expands coverage for new data_type registry behavior and protocol surface. |
| tests/test_rpc_transports.py | New tests covering JSONSocketTransport recv size limits, warnings, and connection errors. |
| tests/test_model_serialization.py | New tests validating dict-only deserializer application and passthrough behavior. |
| tests/test_cuda_wheels.py | New tests for CUDA wheel URL resolution and dependency-cache invalidation behavior. |
| pyproject.toml | Bumps version to 0.9.2 and adds packaging dependency. |
| pyisolate/interfaces.py | Extends SerializerRegistryProtocol.register with kw-only data_type and adds is_data_type. |
| pyisolate/config.py | Introduces CUDAWheelConfig + cuda_wheels extension config wiring. |
| pyisolate/_internal/tensor_serializer.py | Adjusts atexit cleanup behavior to avoid shutdown race/double-unlink issues. |
| pyisolate/_internal/serialization_registry.py | Implements data_type tracking and is_data_type() with clear/reset behavior. |
| pyisolate/_internal/rpc_transports.py | Adds 2GB hard limit and 100MB warning threshold to recv path. |
| pyisolate/_internal/rpc_protocol.py | Minor refactor to store prepared response before send. |
| pyisolate/_internal/model_serialization.py | Adds dict-only guard for adapter deserializers and tweaks RemoteObjectHandle handling. |
| pyisolate/_internal/environment.py | Integrates CUDA wheel resolution into dependency install flow + cache descriptor. |
| pyisolate/_internal/cuda_wheels.py | New module implementing runtime detection and wheel URL resolution from a simple index. |
| benchmarks/simple_benchmark.py | Minor formatting change. |
| benchmarks/memory_benchmark.py | Import cleanup/typing tweaks and runner class addition. |
| benchmarks/benchmark.py | Formatting/refactor improvements and safer dependency presence checks. |
| benchmarks/benchmark_harness.py | Refactor harness loading and torch multiprocessing setup handling. |
```diff
@@ -28,6 +29,7 @@ keywords = ["virtual environment", "venv", "development"]
 dependencies = [
     "uv>=0.1.0",
     "tomli>=2.0.1; python_version < '3.11'",
```
pyisolate/config.py now imports typing_extensions.NotRequired, but typing-extensions is not listed in the project dependencies. This will raise ModuleNotFoundError for users on Python 3.10 (and any envs without the transitive dep). Add an explicit typing-extensions dependency (optionally environment-marked for Python < 3.11).
Suggested change:

```diff
 "tomli>=2.0.1; python_version < '3.11'",
+"typing-extensions>=4.0.0; python_version < '3.11'",
```
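The conditional-import alternative to a hard dependency would look roughly like this. The `CUDAWheelConfig` shape shown is illustrative only; the real TypedDict lives in `pyisolate/config.py`:

```python
try:
    # NotRequired is in the stdlib typing module on Python 3.11+
    from typing import NotRequired, TypedDict
except ImportError:
    # On Python < 3.11 this requires typing_extensions to be declared
    # as a dependency, which is the point of the review comment above.
    from typing_extensions import NotRequired, TypedDict

class CUDAWheelConfig(TypedDict):
    """Illustrative config shape, not the actual pyisolate definition."""
    index_url: str
    packages: NotRequired[list[str]]  # optional key, may be omitted
```

Either route works; declaring the dependency explicitly is simpler, while the conditional import avoids pulling in `typing_extensions` on interpreters that no longer need it.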
```python
def _fetch_index_html(url: str) -> str | None:
    try:
        with urlopen(url) as response:  # noqa: S310 - URL is explicit extension config
            content: bytes = response.read()
            return content.decode("utf-8")
    except (HTTPError, URLError, FileNotFoundError):
        return None
```
urlopen() is called without a timeout, which can cause install_dependencies() to hang indefinitely if the wheel index is slow/unresponsive. Add a reasonable default timeout (and consider restricting schemes to http/https if this config can come from untrusted sources).
tests/test_rpc_transports.py (outdated)
```python
def _make_transport() -> JSONSocketTransport:
    a, _ = socket.socketpair()
```
_make_transport() creates a socketpair() but drops the peer socket without closing it, leaking file descriptors across the test run. Close the unused peer (or return both ends and ensure both transports/sockets are closed).
Suggested change:

```diff
-a, _ = socket.socketpair()
+a, b = socket.socketpair()
+b.close()
```
tests/test_rpc_transports.py (outdated)
```python
@pytest.fixture()
def socket_pair() -> tuple[JSONSocketTransport, JSONSocketTransport]:
    a, b = socket.socketpair()
    return JSONSocketTransport(a), JSONSocketTransport(b)
```
socket_pair yields two transports but never closes them, which can leak sockets/FDs and make the test suite flaky under low ulimit settings. Convert this to a yielding fixture with a teardown that calls close() on both transports (and/or closes the underlying sockets).
Suggested change:

```diff
-return JSONSocketTransport(a), JSONSocketTransport(b)
+transport_a = JSONSocketTransport(a)
+transport_b = JSONSocketTransport(b)
+try:
+    yield transport_a, transport_b
+finally:
+    transport_a.close()
+    transport_b.close()
```
Force-pushed a484333 to 9350f9f (compare).
Codecov: once this PR is merged into the default branch, Codecov will compare coverage reports and display results in future pull requests. Project coverage checks and coverage reporting on pull request comments can also be turned on.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
benchmarks/memory_benchmark.py (2)
Lines 374-377: ⚠️ Potential issue | 🟠 Major

Always clean up extensions and tensors in a `finally` block. Any exception before the tail cleanup path leaves isolated child processes and CUDA/IPC state alive, which can skew later measurements in the same run and leak resources after a failed benchmark iteration.
Suggested pattern
```diff
     manager = ExtensionManager(
         MemoryBenchmarkExtensionBase,
         ExtensionManagerConfig(venv_root_path=str(self.test_base.test_root / "extension-venvs")),
     )
+    test_tensor = None
-    # Clean up and reset baseline before measuring
-    gc.collect()
-    if CUDA_AVAILABLE:
-        torch.cuda.empty_cache()
-        torch.cuda.synchronize()  # Ensure all operations complete
+    try:
+        # Clean up and reset baseline before measuring
+        gc.collect()
+        if CUDA_AVAILABLE:
+            torch.cuda.empty_cache()
+            torch.cuda.synchronize()  # Ensure all operations complete
-    # Reset GPU memory baseline for this test
-    self.memory_tracker.reset_baseline()
+        # Reset GPU memory baseline for this test
+        self.memory_tracker.reset_baseline()
-    # Wait a moment for memory to settle
-    await asyncio.sleep(1)
+        # Wait a moment for memory to settle
+        await asyncio.sleep(1)
-    ...
+        ...
-    manager.stop_all_extensions()
-    del test_tensor
-    gc.collect()
-    if CUDA_AVAILABLE:
-        torch.cuda.empty_cache()
-        torch.cuda.synchronize()
-
-    # Wait for cleanup
-    await asyncio.sleep(2)
+    finally:
+        manager.stop_all_extensions()
+        if test_tensor is not None:
+            del test_tensor
+        gc.collect()
+        if CUDA_AVAILABLE:
+            torch.cuda.empty_cache()
+            torch.cuda.synchronize()
+        await asyncio.sleep(2)
```

Apply the same structure in `run_large_tensor_sharing_test` for `large_tensor`. Also applies to lines 397-550 and 584-689.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/memory_benchmark.py` around lines 374 - 377, The ExtensionManager instantiation and large tensor allocation in run_large_tensor_sharing_test (and similar blocks) must be wrapped in try/finally so resources are always released; specifically, ensure the ExtensionManager (created from MemoryBenchmarkExtensionBase and ExtensionManagerConfig) is shut down in a finally block and that any allocated large_tensor is freed/closed in the same finally, mirroring the suggested pattern referenced in the comment so child processes and CUDA/IPC state are always cleaned up even on exceptions.
Lines 367-368: ⚠️ Potential issue | 🟡 Minor

Reject `num_extensions <= 0` before computing per-extension metrics. `run_scaling_test()` divides by `num_extensions`, so `--counts 0` currently crashes with `ZeroDivisionError`. The large-tensor path already guards the equivalent calculation, so this method should validate or guard too.

Minimal fix:
```diff
 for num_extensions in num_extensions_list:
+    if num_extensions <= 0:
+        raise ValueError("num_extensions must be greater than 0")
+
     print(f"\n{'=' * 60}")
     print(f"Testing with {num_extensions} extensions (share_torch={share_torch})")
     print("=" * 60)
```

Also applies to lines 488-499.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/memory_benchmark.py` around lines 367 - 368, run_scaling_test() currently divides by num_extensions when computing per-extension metrics and crashes on num_extensions <= 0; before doing the per-extension calculations (the loop over num_extensions_list and the computations later around the per-extension metrics block near the other guarded large-tensor path), validate num_extensions and either raise a clear ValueError for num_extensions <= 0 or skip/continue that iteration (e.g., if num_extensions == 0, set per-extension metrics to None or skip printing) so you never perform a division-by-zero; update the checks where per-extension metrics are computed (refer to run_scaling_test and the per-extension metric calculation block) to include this guard.

benchmarks/simple_benchmark.py (1)

Lines 39-52: 🧹 Nitpick | 🔵 Trivial

Duplicate import statement. `os` is imported twice within `measure_rpc_overhead`: once at line 39 and again at line 52. The second import is redundant.

🧹 Proposed fix:
```diff
 import os
+from typing import TypedDict, cast

 if sys.platform == "linux" and os.environ.get("TMPDIR") != "/dev/shm":
     print("WARNING: TMPDIR is not set to /dev/shm on Linux.")
@@ -49,8 +50,6 @@
     print("Setting up extensions (this may take a moment)...")
     # Use the same setup as the example
     try:
-        import os
-        from typing import TypedDict, cast
         import yaml
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/simple_benchmark.py` around lines 39 - 52, The function measure_rpc_overhead contains a duplicate import of the os module (import os appears at the top of the function and again later); remove the redundant second import (the one inside the try block) so os is only imported once (prefer top-level imports), and run lint/tests to confirm no other duplicate imports remain — references: measure_rpc_overhead, the redundant "import os" inside the try block.

pyisolate/_internal/tensor_serializer.py (1)

Lines 168-195: ⚠️ Potential issue | 🟡 Minor

Improve the docstring to document the `force` parameter's race-condition risk. `purge_orphan_sender_shm_files` is intentionally exported for external/manual invocation. However, the docstring lacks critical safety information: using `force=True` races with the C++ `RefcountedMapAllocator` destructor and can cause ENOENT → SIGABRT at process exit if orphan files are unlinked while the C++ mmap is still open. Add this warning to the docstring so users understand when it's safe to call with `force=True`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyisolate/_internal/tensor_serializer.py` around lines 168 - 195, Update the docstring for purge_orphan_sender_shm_files to document that using force=True can race with the C++ RefcountedMapAllocator destructor and may cause an ENOENT→SIGABRT at process exit if the underlying mmap is still open; explicitly state when it is safe to use force (e.g., only when you can guarantee the C++ allocator/mmap is no longer active or from a supervisory process after the originating process has exited) and advise preferring the default guarded behavior (PYISOLATE_PURGE_SENDER_SHM or force=False) otherwise so callers understand the risk and safe usage patterns.
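The guarded behavior this comment refers to can be sketched as an env-var gate. The `PYISOLATE_PURGE_SENDER_SHM` variable name comes from the review discussion; the helper itself is hypothetical, not the module's real API:

```python
import os

def should_purge_shm(force: bool = False) -> bool:
    """Decide whether orphan sender shm files may be unlinked.

    force=True races with PyTorch's C++ RefcountedMapAllocator destructor:
    unlinking a file whose mmap is still open turns the allocator's own
    unlink at exit into ENOENT -> SIGABRT. Only force from a supervisory
    process after the originating process has exited; otherwise prefer
    the opt-in environment variable.
    """
    if force:
        return True
    return os.environ.get("PYISOLATE_PURGE_SENDER_SHM") == "1"
```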
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benchmarks/benchmark_harness.py`:
- Around line 68-76: The ExtensionConfig TypedDict is missing required keys
(share_cuda_ipc, sandbox, sandbox_mode, env) when constructed before calling
self.manager.load_extension; add those fields with sensible defaults (e.g.,
share_cuda_ipc=False, sandbox=False, sandbox_mode="default" or the module's
default, and env={} or None as appropriate) in the config passed to
ExtensionConfig so the TypedDict contract is satisfied (alternatively update
ExtensionConfig to mark those keys NotRequired if you prefer), and ensure this
aligns with how Extension.__init__ consumes .get() defaults.
In `@pyisolate/_internal/cuda_wheels.py`:
- Around line 234-238: resolve_cuda_wheel_requirements currently calls
get_cuda_wheel_runtime() eagerly which raises on CPU-only hosts even when no
requirements need rewriting; instead defer probing until a specific requirement
actually requires CUDA handling. Remove the early runtime =
get_cuda_wheel_runtime() call and call get_cuda_wheel_runtime() lazily inside
the loop that iterates over requirements (only when req.package in
configured_packages and its marker does not skip it), cache the runtime result
for subsequent rewrites, and ensure any exception is only raised when a rewrite
is actually attempted; update the same pattern in the related helper code
referenced around the later block handling cuda_wheels.packages.
In `@pyisolate/_internal/environment.py`:
- Around line 313-317: The code eagerly calls
get_cuda_wheel_runtime_descriptor() whenever cuda_wheels_config is present,
causing unnecessary host CUDA probes; change the logic so cuda_wheel_runtime
remains None unless the current install set contains at least one
resolver-targeted dependency that could be rewritten for CUDA (i.e., detect
resolver-targeted deps in the install set before probing). Specifically, keep
the existing cuda_wheels_config and cuda_wheel_runtime variables but defer
invoking get_cuda_wheel_runtime_descriptor() until after checking the install
set for resolver-targeted dependencies; only then assign cuda_wheel_runtime =
get_cuda_wheel_runtime_descriptor().
- Around line 366-373: The logs currently print raw dependency URLs (e.g.,
original_dep, resolved_dep and any install target lists or uv output) which may
contain credentials; before calling logger.info in the CUDA_WHEEL_RESOLVED site
(the loop over safe_deps/resolved_deps using zip), and the other two logging
sites you flagged, sanitize values by parsing any string that looks like a URL
and replacing it with only stable metadata (host/netloc and wheel filename/path
basename) while stripping query strings and userinfo; fallback to the original
string when parsing fails. Update the logger.info calls that reference
original_dep, resolved_dep, install target lists or raw uv output to pass the
sanitized versions instead so only host and filename (no credentials/query)
appear in logs.
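A sanitizer of the kind this prompt describes could look like the sketch below; the function name and the exact host/filename reduction are assumptions, not the PR's actual helper:

```python
from posixpath import basename
from urllib.parse import urlparse

def sanitize_dep_for_log(dep: str) -> str:
    """Reduce a dependency URL to host + wheel filename for safe logging.

    Strips userinfo (user:token@) and query strings so credentials and
    signed-URL parameters never reach the logs; non-URL requirement
    strings pass through unchanged.
    """
    parsed = urlparse(dep)
    if not parsed.scheme or not parsed.netloc:
        return dep  # plain requirement like "numpy>=1.24"; log as-is
    host = parsed.hostname or parsed.netloc  # .hostname drops user:pass@
    name = basename(parsed.path)             # keep only the wheel filename
    return f"{host}/{name}" if name else host
```

Logging sites would then pass `sanitize_dep_for_log(original_dep)` instead of the raw string, so only stable, non-sensitive metadata appears.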
In `@pyisolate/config.py`:
- Line 6: The import of NotRequired from typing_extensions in config.py causes
runtime failures because typing_extensions is not declared as a dependency;
either add "typing_extensions" to pyproject.toml dependencies, or change the
import in config.py to a safe conditional import: try to import NotRequired from
typing (for Python versions that include it) and fall back to importing from
typing_extensions, ensuring the symbol NotRequired is always defined at runtime.
In `@pyproject.toml`:
- Around line 29-33: The dependency list is missing typing_extensions for older
Python versions which pyisolate/config.py relies on (it imports NotRequired);
update the dependencies array in pyproject.toml to include typing_extensions
(e.g., "typing_extensions>=4.0; python_version < '3.11'") alongside tomli and
packaging so NotRequired is available on Python <3.11.
In `@tests/test_cuda_wheels.py`:
- Around line 250-253: The test always creates venv/bin/python but the
environment lookup (pyisolate/_internal/environment.py search for
Scripts/python.exe) expects the Windows layout; update the fixture that sets
venv_path/python_exe to create the platform-specific venv layout instead of
hard-coding "bin/python": detect the platform (e.g., sys.platform or os.name)
and create "Scripts/python.exe" on Windows (with appropriate executable content
and extension) and "bin/python" on POSIX, or create both paths for reliability;
reference venv_path and python_exe when making the path change so the test
mirrors the real environment layout used by the code under test.
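The platform-specific layout this prompt describes can be captured in a small helper; this is a sketch mirroring the lookup the test should emulate, not code from the PR:

```python
import os
from pathlib import Path

def venv_python_path(venv_path: Path) -> Path:
    """Return the interpreter path for a venv using the platform's layout."""
    if os.name == "nt":
        return venv_path / "Scripts" / "python.exe"  # Windows layout
    return venv_path / "bin" / "python"              # POSIX layout
```

A fixture built on this helper creates the same path the code under test will search for, regardless of which OS the suite runs on.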
---
Outside diff comments:
In `@benchmarks/memory_benchmark.py`:
- Around line 374-377: The ExtensionManager instantiation and large tensor
allocation in run_large_tensor_sharing_test (and similar blocks) must be wrapped
in try/finally so resources are always released; specifically, ensure the
ExtensionManager (created from MemoryBenchmarkExtensionBase and
ExtensionManagerConfig) is shut down in a finally block and that any allocated
large_tensor is freed/closed in the same finally, mirroring the suggested
pattern referenced in the comment so child processes and CUDA/IPC state are
always cleaned up even on exceptions.
- Around line 367-368: run_scaling_test() currently divides by num_extensions
when computing per-extension metrics and crashes on num_extensions <= 0; before
doing the per-extension calculations (the loop over num_extensions_list and the
computations later around the per-extension metrics block near the other guarded
large-tensor path), validate num_extensions and either raise a clear ValueError
for num_extensions <= 0 or skip/continue that iteration (e.g., if num_extensions
== 0, set per-extension metrics to None or skip printing) so you never perform a
division-by-zero; update the checks where per-extension metrics are computed
(refer to run_scaling_test and the per-extension metric calculation block) to
include this guard.
In `@benchmarks/simple_benchmark.py`:
- Around line 39-52: The function measure_rpc_overhead contains a duplicate
import of the os module (import os appears at the top of the function and again
later); remove the redundant second import (the one inside the try block) so os
is only imported once (prefer top-level imports), and run lint/tests to confirm
no other duplicate imports remain — references: measure_rpc_overhead, the
redundant "import os" inside the try block.
In `@pyisolate/_internal/tensor_serializer.py`:
- Around line 168-195: Update the docstring for purge_orphan_sender_shm_files to
document that using force=True can race with the C++ RefcountedMapAllocator
destructor and may cause an ENOENT→SIGABRT at process exit if the underlying
mmap is still open; explicitly state when it is safe to use force (e.g., only
when you can guarantee the C++ allocator/mmap is no longer active or from a
supervisory process after the originating process has exited) and advise
preferring the default guarded behavior (PYISOLATE_PURGE_SENDER_SHM or
force=False) otherwise so callers understand the risk and safe usage patterns.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1aef4ea9-b241-4ccb-9008-a02deeb2d35d
📒 Files selected for processing (18)
- benchmarks/benchmark.py
- benchmarks/benchmark_harness.py
- benchmarks/memory_benchmark.py
- benchmarks/simple_benchmark.py
- pyisolate/_internal/cuda_wheels.py
- pyisolate/_internal/environment.py
- pyisolate/_internal/model_serialization.py
- pyisolate/_internal/rpc_protocol.py
- pyisolate/_internal/rpc_transports.py
- pyisolate/_internal/serialization_registry.py
- pyisolate/_internal/tensor_serializer.py
- pyisolate/config.py
- pyisolate/interfaces.py
- pyproject.toml
- tests/test_cuda_wheels.py
- tests/test_model_serialization.py
- tests/test_rpc_transports.py
- tests/test_serialization_registry.py
```python
def resolve_cuda_wheel_requirements(requirements: list[str], config: CUDAWheelConfig) -> list[str]:
    normalized_config = _normalize_cuda_wheel_config(config)
    configured_packages = set(normalized_config["packages"])
    environment = cast(dict[str, str], default_environment())
    runtime = get_cuda_wheel_runtime()
```
Defer CUDA runtime probing until a dependency actually needs rewriting.
get_cuda_wheel_runtime() runs before the loop, so this helper raises on CPU-only hosts even when none of the input requirements are in cuda_wheels.packages or every matching entry is skipped by its marker. That turns a no-op rewrite pass into an unnecessary install blocker.
Also applies to: 256-270
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyisolate/_internal/cuda_wheels.py` around lines 234 - 238,
resolve_cuda_wheel_requirements currently calls get_cuda_wheel_runtime() eagerly
which raises on CPU-only hosts even when no requirements need rewriting; instead
defer probing until a specific requirement actually requires CUDA handling.
Remove the early runtime = get_cuda_wheel_runtime() call and call
get_cuda_wheel_runtime() lazily inside the loop that iterates over requirements
(only when req.package in configured_packages and its marker does not skip it),
cache the runtime result for subsequent rewrites, and ensure any exception is
only raised when a rewrite is actually attempted; update the same pattern in the
related helper code referenced around the later block handling
cuda_wheels.packages.
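The deferred-probe pattern this comment asks for can be sketched as a cached lazy lookup. The function below is a placeholder, not the module's real implementation: the probe callback and the rewrite target URL are illustrative:

```python
from collections.abc import Callable

def rewrite_requirements(
    requirements: list[str],
    configured_packages: set[str],
    probe_runtime: Callable[[], str],
) -> list[str]:
    """Rewrite only CUDA-targeted requirements, probing the runtime at most once."""
    runtime: str | None = None
    out: list[str] = []
    for req in requirements:
        # Naive name extraction for the sketch; real code would parse the requirement.
        name = req.split("==")[0].split(">=")[0].strip()
        if name not in configured_packages:
            out.append(req)  # untouched requirement: no probe needed
            continue
        if runtime is None:
            runtime = probe_runtime()  # first CUDA-targeted dep triggers the probe
        # Hypothetical rewrite to a direct wheel URL for the detected runtime.
        out.append(f"{name} @ https://example.invalid/{runtime}/{name}.whl")
    return out
```

Because the probe only fires when a configured package is actually present, a CPU-only host installing unrelated dependencies never hits the CUDA detection path, and repeated matches reuse the cached result.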
Force-pushed 9350f9f to 94a29d4 (compare).
…dexes: Add opt-in CUDA wheel resolution for isolated extensions based on the host Python/torch/CUDA tuple, rewrite selected requirements to direct wheel URLs, and include resolver inputs in dependency-cache invalidation. Also add resolver tests and clean up benchmark files so repo checks pass.
…serialization guard
Force-pushed 94a29d4 to 7e407d0 (compare).
Summary

- Fix `purge_orphan_sender_shm_files` racing the C++ `RefcountedMapAllocator` destructor: SIGKILL'd child peers leave the IPC refcount file at count > 1, so the host's C++ `close()` decrements but does not unlink; `purge_orphan` then unlinks the file while the C++ mmap is still open, causing a double-unlink → ENOENT → SIGABRT.

Test plan

- `pytest tests/test_*.py --ignore=tests/test_sandbox_integration.py`
- `pytest tests/integration_v2/test_tensors.py`
- `ruff check`, `ruff format --check`, `mypy pyisolate`