feat: isolation serialization, CUDA IPC hardening, and CUDA wheel resolution (v0.9.2) #7
Conversation
…dexes Add opt-in CUDA wheel resolution for isolated extensions based on the host Python/torch/CUDA tuple, rewrite selected requirements to direct wheel URLs, and include resolver inputs in dependency cache invalidation. Also add resolver tests and clean up benchmark files so repo checks pass.
…serialization guard
📝 Walkthrough

This pull request introduces a CUDA wheel resolution subsystem for custom PyTorch installations, refactors the benchmarking system (including async database methods and standardized timing collection), enhances serialization handling with a data-type flag mechanism, increases RPC message limits to 2GB with size warnings, and adds comprehensive test coverage across new and modified functionality.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Resolver
    participant Config as Config Parser
    participant Runtime as Runtime Detector
    participant Index as Wheel Index
    participant Parser as Candidate Parser
    participant Selector as Wheel Selector
    participant Handler as Result Handler
    Client->>Config: load CUDAWheelConfig
    Client->>Runtime: get_cuda_wheel_runtime()
    Runtime-->>Client: torch_version, cuda_version, python_tags
    Client->>Client: resolve_cuda_wheel_requirements()
    loop for each requirement
        Client->>Index: fetch index HTML
        Index-->>Client: wheel links & metadata
        Client->>Parser: parse wheel filenames & tags
        Parser-->>Client: candidate wheels
        Client->>Selector: filter by compatibility<br/>(CUDA, torch, python)
        Selector-->>Client: matching wheels
        Client->>Selector: select highest version<br/>& best platform
        Selector-->>Client: chosen wheel URL
    end
    Client->>Handler: resolve_cuda_wheel_url()
    Handler-->>Client: concrete wheel URL or error
```
Pull request overview
This PR prepares the v0.9.2 release by improving isolation-time serialization support, hardening RPC/CUDA IPC behavior, and adding a CUDA wheel resolution mechanism for isolated venv installs.
Changes:
- Add CUDA wheel resolution utilities plus dependency-install integration and unit tests.
- Extend SerializerRegistry/protocols with a data_type concept and add deserialization guard tests.
- Harden RPC transport limits/logging and adjust CUDA IPC cleanup to avoid exit-time aborts.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pyproject.toml | Bumps version to 0.9.2 and adds typing_extensions/packaging deps. |
| pyisolate/interfaces.py | Extends registry protocol with data_type and is_data_type(). |
| pyisolate/config.py | Adds CUDAWheelConfig and ExtensionConfig.cuda_wheels. |
| pyisolate/_internal/cuda_wheels.py | New CUDA wheel runtime probing + simple-index wheel selection logic. |
| pyisolate/_internal/environment.py | Integrates CUDA wheel resolution into dependency installation and cache keying. |
| pyisolate/_internal/serialization_registry.py | Adds _data_types tracking and is_data_type(). |
| pyisolate/_internal/model_serialization.py | Adds dict-only guard for adapter deserializers and tweaks handle resolution. |
| pyisolate/_internal/rpc_transports.py | Adds 2GB hard recv limit and >100MB warning logging. |
| pyisolate/_internal/rpc_protocol.py | Minor refactor to store prepared response before sending. |
| pyisolate/_internal/tensor_serializer.py | Removes exit-time SHM purge to avoid destructor race/SIGABRT. |
| tests/test_serialization_registry.py | Expands registry tests (data_type semantics + protocol checks). |
| tests/test_model_serialization.py | New tests for dict-guard deserialization behavior. |
| tests/test_rpc_transports.py | New tests for JSON socket recv size limits and connection errors. |
| tests/test_cuda_wheels.py | New unit tests for CUDA wheel resolution and dependency cache invalidation. |
| benchmarks/* | Formatting/import hygiene and benchmark harness loading implementation. |
```python
def _make_transport() -> JSONSocketTransport:
    a, b = socket.socketpair()
    b.close()
    return JSONSocketTransport(a)
```
_make_transport() creates a socketpair and returns a JSONSocketTransport, but the transport/socket is never closed in the size-limit and connection-error tests that call it. This can leak file descriptors across the test run; consider making _make_transport() a contextmanager/fixture that yields the transport and closes it in finally, or ensure each test calls transport.close().
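The suggested fix can be sketched as a context manager. `JSONSocketTransport` is project-specific, so this hypothetical version wraps only the raw socket to show the cleanup pattern:

```python
import contextlib
import socket


@contextlib.contextmanager
def make_transport_socket():
    # Hypothetical stand-in for _make_transport(): yields the live end of a
    # socketpair and always closes it when the with-block exits, so tests
    # cannot leak the file descriptor even if an assertion fails mid-test.
    a, b = socket.socketpair()
    b.close()  # simulate the disconnected peer, as in the original tests
    try:
        yield a
    finally:
        a.close()


with make_transport_socket() as sock:
    assert sock.fileno() >= 0  # usable inside the block
assert sock.fileno() == -1     # closed on exit: no fd leak
```

The same shape works as a pytest fixture by replacing the `contextmanager` decorator with `@pytest.fixture` and keeping the `try/finally`.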
```python
logger.debug("Dependency cache read failed: %s", exc)

resolved_deps = safe_deps
if cuda_wheels_config:
```
resolve_cuda_wheel_requirements(...) is called whenever cuda_wheels_config is set, even if none of the current safe_deps match the configured CUDA wheel packages (you already computed needs_cuda_probe above). Consider only running resolution when a match was detected (e.g., gate on needs_cuda_probe / cuda_wheel_runtime is not None) to avoid unnecessary host probing and unexpected failures.
```diff
-if cuda_wheels_config:
+if cuda_wheels_config and cuda_wheel_runtime is not None:
```
```python
def resolve_cuda_wheel_requirements(requirements: list[str], config: CUDAWheelConfig) -> list[str]:
    normalized_config = _normalize_cuda_wheel_config(config)
    configured_packages = set(normalized_config["packages"])
    environment = cast(dict[str, str], default_environment())
    runtime = get_cuda_wheel_runtime()
```
resolve_cuda_wheel_requirements() probes get_cuda_wheel_runtime() unconditionally before checking whether any requirement actually needs CUDA wheel resolution. This can raise (missing host torch / non-CUDA torch) even when no configured packages are present or when markers skip them. Consider lazily probing runtime only once you encounter a configured package, or accept an optional precomputed runtime similar to resolve_cuda_wheel_url.
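The lazy-probe idea can be sketched as follows. Names here are illustrative, not the actual pyisolate API: the point is that the (potentially failing) runtime lookup is deferred until the first configured package is actually seen, and cached so it runs at most once:

```python
def resolve_lazily(requirements, configured_packages, probe_runtime):
    """Sketch: probe_runtime() may raise (e.g. no CUDA-enabled torch on the
    host), so call it only when a configured package actually appears."""
    runtime = None
    resolved = []
    for req in requirements:
        # Crude name extraction for the sketch; real code would use
        # packaging.requirements.Requirement to parse the specifier.
        name = req.split(">=")[0].split("==")[0].strip()
        if name in configured_packages:
            if runtime is None:
                runtime = probe_runtime()  # probed at most once, only on demand
            resolved.append((req, runtime))
        else:
            resolved.append((req, None))
    return resolved
```

With no configured package in the requirement list, `probe_runtime` is never called, so a missing host torch can no longer break unrelated installs.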
```python
if isinstance(data, RemoteObjectHandle):
    if _nested or extension is None:
        return data
    if registry.has_handler(data.type_name):
        return data
```
The docstring says top-level RemoteObjectHandle values are resolved when an extension proxy is available, but this new registry.has_handler(data.type_name) branch skips resolution for some top-level handles. If this is intentional, update the docstring/behavior contract; if not, consider using a more specific signal than has_handler (e.g., is_data_type) so serializer registration alone doesn’t change handle-resolution semantics.
Actionable comments posted: 5
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 14e81aa2-c0ce-4b3a-9dd7-5eb041fbee72
📒 Files selected for processing (18)
- benchmarks/benchmark.py
- benchmarks/benchmark_harness.py
- benchmarks/memory_benchmark.py
- benchmarks/simple_benchmark.py
- pyisolate/_internal/cuda_wheels.py
- pyisolate/_internal/environment.py
- pyisolate/_internal/model_serialization.py
- pyisolate/_internal/rpc_protocol.py
- pyisolate/_internal/rpc_transports.py
- pyisolate/_internal/serialization_registry.py
- pyisolate/_internal/tensor_serializer.py
- pyisolate/config.py
- pyisolate/interfaces.py
- pyproject.toml
- tests/test_cuda_wheels.py
- tests/test_model_serialization.py
- tests/test_rpc_transports.py
- tests/test_serialization_registry.py
```python
if find_spec("numpy") is None or find_spec("psutil") is None:
    print("Please install dependencies: pip install numpy psutil tabulate")
    return 1

asyncio.run(run_benchmarks(args.quick, args.no_torch, args.no_gpu, args.torch_mode))
return 0
```
Error message lists optional dependency; run_benchmarks return value is ignored.
Two minor issues:
- Line 350 mentions tabulate in the required dependencies message, but tabulate is optional (handled gracefully via TABULATE_AVAILABLE). Consider updating the message to only list truly required dependencies.
- Line 353 calls run_benchmarks(), but its return value is ignored and line 354 unconditionally returns 0. If run_benchmarks is intended to signal success/failure via its return value, propagate it.
Proposed fix

```diff
 if find_spec("numpy") is None or find_spec("psutil") is None:
-    print("Please install dependencies: pip install numpy psutil tabulate")
+    print("Please install dependencies: pip install numpy psutil")
+    print("Optional: pip install tabulate (for formatted output)")
     return 1
-asyncio.run(run_benchmarks(args.quick, args.no_torch, args.no_gpu, args.torch_mode))
-return 0
+return asyncio.run(run_benchmarks(args.quick, args.no_torch, args.no_gpu, args.torch_mode))
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if find_spec("numpy") is None or find_spec("psutil") is None:
    print("Please install dependencies: pip install numpy psutil")
    print("Optional: pip install tabulate (for formatted output)")
    return 1
return asyncio.run(run_benchmarks(args.quick, args.no_torch, args.no_gpu, args.torch_mode))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/benchmark.py` around lines 349 - 354, Update the dependency check
message to only list truly required packages (remove "tabulate" from the printed
string) and propagate the exit code returned by run_benchmarks: call result =
asyncio.run(run_benchmarks(args.quick, args.no_torch, args.no_gpu,
args.torch_mode)) and return that result instead of unconditionally returning 0.
Locate the check using find_spec and the run_benchmarks invocation to make these
changes; ensure run_benchmarks indeed returns an int and keep TABULATE_AVAILABLE
handling unchanged.
```python
def _fetch_index_html(url: str) -> str | None:
    try:
        with urlopen(url, timeout=30) as response:  # noqa: S310 - URL is explicit extension config
            content: bytes = response.read()
            return content.decode("utf-8")
    except (HTTPError, URLError, FileNotFoundError):
        return None
```
🧹 Nitpick | 🔵 Trivial
Network timeout is appropriate; consider logging fetch failures.
The 30-second timeout is reasonable for index fetches. The function silently returns None on failure, which is handled by callers. Consider whether debug-level logging would help troubleshoot network issues.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyisolate/_internal/cuda_wheels.py` around lines 143 - 149, The
_fetch_index_html function currently swallows network errors and returns None;
update it to log debug-level failure details before returning None. Catch the
exceptions as e (e.g., except (HTTPError, URLError, FileNotFoundError) as e:)
and emit a debug log via the module logger (logging.getLogger(__name__) or the
existing module-level logger) that includes the URL, timeout, and exception
message/traceback to aid troubleshooting, then return None as before.
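A sketch of the suggested change, keeping the same fallback behavior but making failures visible at debug level (the added `timeout` parameter is illustrative, not part of the original signature):

```python
from __future__ import annotations

import logging
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

logger = logging.getLogger(__name__)


def fetch_index_html(url: str, timeout: float = 30) -> str | None:
    try:
        with urlopen(url, timeout=timeout) as response:  # noqa: S310
            content: bytes = response.read()
            return content.decode("utf-8")
    except (HTTPError, URLError, FileNotFoundError) as exc:
        # Same None fallback as before, but the failure is now recorded
        # for troubleshooting flaky or misconfigured wheel indexes.
        logger.debug("Index fetch failed for %s (timeout=%ss): %s", url, timeout, exc)
        return None
```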
```python
if msg_len > 2 * 1024 * 1024 * 1024:  # 2GB hard limit
    raise ValueError(f"Message too large: {msg_len} bytes")
if msg_len > 100 * 1024 * 1024:  # 100MB — flag large payloads
    logger.warning(
        "Large RPC message: %.1fMB — consider SHM-backed transfer for this type",
        msg_len / (1024 * 1024),
    )
```
🧹 Nitpick | 🔵 Trivial
LGTM! Consider extracting named constants for clarity.
The limit increase to 2GB and the warning threshold at 100MB are sensible. The boundary tests in test_rpc_transports.py properly verify these limits.
Minor suggestion: extract magic numbers as module-level constants for self-documentation:
♻️ Optional refactor

```diff
+_MSG_SIZE_WARN_THRESHOLD = 100 * 1024 * 1024  # 100MB
+_MSG_SIZE_HARD_LIMIT = 2 * 1024 * 1024 * 1024  # 2GB
+
 # In recv():
-if msg_len > 2 * 1024 * 1024 * 1024:  # 2GB hard limit
+if msg_len > _MSG_SIZE_HARD_LIMIT:
     raise ValueError(f"Message too large: {msg_len} bytes")
-if msg_len > 100 * 1024 * 1024:  # 100MB — flag large payloads
+if msg_len > _MSG_SIZE_WARN_THRESHOLD:
```

🤖 Prompt for AI Agents
+ if msg_len > _MSG_SIZE_WARN_THRESHOLD:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyisolate/_internal/rpc_transports.py` around lines 148 - 154, Replace the
magic numeric literals with descriptive module-level constants: define e.g.
MAX_RPC_MESSAGE_SIZE = 2 * 1024 * 1024 * 1024 and
LARGE_MESSAGE_WARNING_THRESHOLD = 100 * 1024 * 1024 at the top of the module,
then update the checks that currently use the literals (the comparisons against
msg_len in the block that raises ValueError and calls logger.warning) to use
these constants, and update the logger.warning message to reference the named
constant where helpful; ensure references to msg_len, logger.warning, and the
ValueError remain unchanged except for using the constants.
```python
# purge_orphan_sender_shm_files is intentionally NOT called here.
# Calling it with force=True races with the C++ RefcountedMapAllocator
# destructor: SIGKILL'd child peers leave the refcount file at count > 1,
# so the host's C++ close() decrements but does NOT unlink (refcount still > 0).
# purge_orphan would then unlink the file while the C++ mmap is still open,
# causing a double-unlink → ENOENT → SIGABRT at process exit.
# Files are cleaned up by the C++ destructor when the last consumer closes.
with contextlib.suppress(Exception):
    flush_tensor_keeper()
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "== Internal call sites for purge_orphan_sender_shm_files =="
rg -n -C2 '\bpurge_orphan_sender_shm_files\s*\('
echo
echo "== Cleanup-related hooks and env flags =="
rg -n -C2 'PYISOLATE_PURGE_SENDER_SHM|PYISOLATE_SIGNAL_CLEANUP|atexit\.register|signal\.signal'
echo
echo "== User-facing docs mentioning manual orphan SHM cleanup =="
fd -t f 'README(\..*)?$|.*\.md$' . | xargs -r rg -n -C2 'purge_orphan_sender_shm_files|PYISOLATE_PURGE_SENDER_SHM|orphan.*shm|shared memory'
```

Repository: Comfy-Org/pyisolate
Length of output: 5413
Restore a safe orphan-SHM cleanup owner, or explicitly document manual invocation.
This removes the only automatic cleanup path for sender-side torch_<pid>_* SHM files. The SIGABRT fix (avoiding the double-unlink race) is necessary, but removes the safeguard against unbounded /dev/shm leakage in long-lived hosts when children terminate abnormally. Cleanup is now manual—either by setting the opt-in environment variable PYISOLATE_PURGE_SENDER_SHM=1 (disabled by default) or by explicitly calling purge_orphan_sender_shm_files() after abnormal termination—but user-facing documentation does not surface this trade-off. Either move orphan cleanup to a safe internal owner (for example a context manager or periodic janitor) or clearly document that applications must invoke purge_orphan_sender_shm_files() themselves or set PYISOLATE_PURGE_SENDER_SHM=1 when children may be killed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyisolate/_internal/tensor_serializer.py` around lines 199 - 207, The current
change removed the only automatic cleanup of sender-side SHM and left orphan
files unmanaged; restore a safe automatic owner or make the opt-in behavior
explicit: either (A) reinstate a guarded cleanup call to
purge_orphan_sender_shm_files() executed by a safe internal janitor (e.g., a
background thread/process or context manager) that runs only if
os.getenv('PYISOLATE_PURGE_SENDER_SHM') == '1' to avoid the double-unlink race
with the C++ allocator, or (B) keep automatic suppression but add explicit
documentation and runtime guidance and add a clear opt-in check here (near
flush_tensor_keeper) so that when PYISOLATE_PURGE_SENDER_SHM is enabled the code
will call purge_orphan_sender_shm_files() (with appropriate
exception-suppression and locking to avoid races); reference
purge_orphan_sender_shm_files, flush_tensor_keeper and the env var
PYISOLATE_PURGE_SENDER_SHM in the change.
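Option (B) above could be sketched as a small guard placed next to `flush_tensor_keeper`. The env var name `PYISOLATE_PURGE_SENDER_SHM` comes from the review; `maybe_purge_sender_shm` and the `purge_fn` parameter are hypothetical:

```python
import contextlib
import os


def maybe_purge_sender_shm(purge_fn) -> bool:
    """Run the risky SHM purge only when the user opted in explicitly.

    Returns True if the purge was attempted, False if skipped.
    """
    if os.getenv("PYISOLATE_PURGE_SENDER_SHM") != "1":
        return False  # default: leave cleanup to the C++ destructor
    with contextlib.suppress(Exception):
        purge_fn()  # e.g. purge_orphan_sender_shm_files
    return True
```

This keeps the SIGABRT fix as the default while giving long-lived hosts an explicit escape hatch against /dev/shm leakage; the trade-off still needs documenting either way.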
```python
def test_install_dependencies_cache_invalidation_tracks_cuda_runtime(monkeypatch, tmp_path):
    import os

    venv_path = tmp_path / "venv"
    python_exe = venv_path / "Scripts" / "python.exe" if os.name == "nt" else venv_path / "bin" / "python"
    python_exe.parent.mkdir(parents=True, exist_ok=True)
    python_exe.write_text("#!/usr/bin/env python\n", encoding="utf-8")

    monkeypatch.setattr(environment.shutil, "which", lambda binary: "/usr/bin/uv")
    monkeypatch.setattr(
        environment,
        "exclude_satisfied_requirements",
        lambda config, requirements, python_exe: requirements,
    )
    monkeypatch.setattr(
        environment,
        "resolve_cuda_wheel_requirements",
        lambda requirements, config: ["https://example.invalid/flash_attn.whl"],
    )

    current_runtime = {"value": {"torch": "2.8", "cuda": "12.8", "python_tags": ["cp312"]}}
    monkeypatch.setattr(
        environment,
        "get_cuda_wheel_runtime_descriptor",
        lambda: current_runtime["value"],
    )

    popen_calls: list[list[str]] = []

    class MockPopen:
        def __init__(self, cmd, **kwargs):
            popen_calls.append(cmd)
            self.stdout = io.StringIO("installed\n")

        def wait(self):
            return 0

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc, tb):
            return False

    monkeypatch.setattr(environment.subprocess, "Popen", MockPopen)

    config = {
        "dependencies": ["flash-attn>=1.0"],
        "share_torch": True,
        "cuda_wheels": {
            "index_url": "https://example.invalid/cuda-wheels/",
            "packages": ["flash-attn"],
            "package_map": {},
        },
    }

    environment.install_dependencies(venv_path, config, "demo")
    environment.install_dependencies(venv_path, config, "demo")

    current_runtime["value"] = {"torch": "2.8", "cuda": "12.9", "python_tags": ["cp312"]}
    environment.install_dependencies(venv_path, config, "demo")

    assert len(popen_calls) == 2
```
🧹 Nitpick | 🔵 Trivial
Consider importing os at module level for consistency.
The os import on line 257 could be moved to the module level with other imports. This is a minor style concern but would improve consistency.
The test logic itself is excellent - it properly validates cache invalidation by:
- Running install twice with same runtime (should only call Popen once)
- Changing CUDA version and running again (should trigger second Popen call)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_cuda_wheels.py` around lines 256 - 317, Move the local import of
os in the test_install_dependencies_cache_invalidation_tracks_cuda_runtime
function to the top-level imports for the test module; update the module imports
so os is imported alongside other test imports and remove the in-function
import, ensuring the test function still references os for path logic (venv_path
/ "Scripts" / "python.exe" branch) without changing any test behavior.
Summary
- Stop `purge_orphan_sender_shm_files` racing the C++ `RefcountedMapAllocator` destructor — SIGKILL'd child peers leave the IPC refcount file at count > 1, so the host's C++ `close()` decrements but does not unlink; `purge_orphan` then unlinks the file while the C++ mmap is still open, causing a double-unlink → ENOENT → SIGABRT

Test plan

- `pytest tests/test_*.py --ignore=tests/test_sandbox_integration.py`
- `pytest tests/integration_v2/test_tensors.py`
- `ruff check`, `ruff format --check`, `mypy pyisolate`