test(core,bindings): add mini parallel plugins in test conftests by seberg · Pull Request #2228 · NVIDIA/cuda-python

seberg · 2026-06-16T08:47:33Z

Add pytest plugins in bindings/core conftest files to wrap parallel worker tests with CUDA context setup and fixture-aware initialization paths used by mini plugin execution.

This can wait until the end, but I thought I'd split it out now since it is the most tedious part about pytest-run-parallel testing (beyond just churn) and more likely to need some iterations.

(The plugin part runs fine locally so I think it can be merged; there is just a very small chance of follow-ups eventually since there are many test fixes. -- without using pytest-run-parallel this is dead-code beyond some fixture tweaks, though.)

Add lightweight pytest plugins in bindings/core conftest files to wrap parallel worker tests with CUDA context setup and fixture-aware initialization paths used by mini plugin execution.

copy-pr-bot · 2026-06-16T08:47:38Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rwgk · 2026-06-30T01:30:48Z

gpt-5.5

PR 2228 Findings

Reviewed PR 2228 (test(core,bindings): add mini parallel plugins in test conftests) at head 2d94fc4.

Findings

High: per-worker CUDA fixture replacement mutates shared pytest-run-parallel kwargs

Files:

cuda_bindings/tests/conftest.py:62
cuda_core/tests/conftest.py:125
pytest-run-parallel reference: /wrk/clone/pytest-run-parallel/src/pytest_run_parallel/plugin.py:94

PR 2228 mutates kwargs in place before calling the test function. pytest-run-parallel passes the same kwargs dict to every worker thread (worker_kwargs = kwargs), so one worker can overwrite another worker's ctx, device, init_cuda, or mempool fixture value. That defeats the purpose of per-thread CUDA setup and can cause tests to run with another worker's context/device.

Suggested fix if keeping this approach: copy first, mutate the copy, and call the test with the copy:

call_kwargs = dict(kwargs)
# mutate call_kwargs only
return func(*args, **call_kwargs)

Medium: bindings coverage misses `ctx`-only cuFile tests

Files:

cuda_bindings/tests/conftest.py:76
cuda_bindings/tests/test_cufile.py:1435
cuda_bindings/tests/test_cufile.py:1472
cuda_bindings/tests/test_cufile.py:1513

The bindings mini-plugin wraps tests only when device or driver appears in fixturenames. Some cuFile tests request ctx and cufile_env_json but not device or driver, so they are not wrapped. Their ctx fixture sets a current context in the pytest setup thread, while pytest-run-parallel executes the test body in worker threads. These tests may still run without a current CUDA context in the worker thread.

Suggested fix: include an explicit cuFile trigger such as cufile_env_json, or otherwise include ctx users while excluding the cuDLA subtree, which intentionally overrides ctx as a no-op to avoid CUDA initialization.

Low: global-option check misses marker-only parallelism

Files:

cuda_bindings/tests/conftest.py:31
cuda_core/tests/conftest.py:95

The mini-plugins register only when the global --parallel-threads option is auto or >1. pytest-run-parallel can also force parallelism per test via markers such as @pytest.mark.force_parallel_threads(n) or the deprecated @pytest.mark.parallel_threads(n). In marker-only parallel runs, the CUDA context workaround would not register.

Suggested fix: install/register the workaround whenever pytest-run-parallel is importable, or delay the decision to collection time and inspect per-item worker count/markers.

pytest-run-parallel Support Assessment

The claim from the team discussion is basically correct: pytest-run-parallel does not currently expose a first-class hook for "after worker thread creation" or "before each worker invokes the test".

Relevant implementation points:

Worker threads are created inside wrap_function_parallel() in /wrk/clone/pytest-run-parallel/src/pytest_run_parallel/plugin.py.
The worker target is a nested closure() in that function.
The plugin does not define custom hookspecs, pytest_addhooks, or an add-hooks registration path.
The README explicitly documents that fixture values are shared between threads.

Because of that, PR 2228's broad approach of wrapping item.obj before pytest-run-parallel wraps it is a reasonable workaround direction, but the shared-kwargs mutation bug needs fixing before it is safe.

Fixture Timing Clarification

The CUDA fixtures are not executed at import time. They run during pytest setup. pytest-run-parallel then takes the fixture-produced kwargs and shares them across worker threads. The thread-local issue is the CUDA current context, not cuInit() itself.

Verification Performed During Review

Fetched seberg/ft-testing-mini-plugin and fast-forward pulled the current branch; it was already up to date.
Checked PR 2228 metadata and diff via the GitHub connector.
Inspected changed files:
- cuda_bindings/tests/conftest.py
- cuda_core/tests/conftest.py
Inspected local pytest-run-parallel source under /wrk/clone/pytest-run-parallel.
Ran git diff --check for the PR range; no whitespace errors were reported.
Did not run CUDA pytest at that review point because TestVenv/ was absent then and repo guidance says to ask how to run tests when TestVenv/ is missing.

rwgk · 2026-06-30T01:33:41Z

@seberg I used gpt-5.5 to create the alternative under #2282

I was hoping based on the monkeypatch the conftest.py changes would turn out smaller than they are here in this PR, but the complexity seems pretty similar at first sight.

What direction do you prefer?

seberg · 2026-06-30T07:03:54Z

FWIW, I opened an issue on pytest-run-parallel, but have to see if I can just implement it myself: Quansight-Labs/pytest-run-parallel#189

I don't think the code will get much shorter with that, but nicer/cleaner (and the threading part of the tests actually better).
If we don't get the exact fixture information super easily, I am not sure it matters, as I suspect we can just always init cuda. The end goal there would be something like (I admit, I am not sure I am remembering pytest hooks quite right):

@pytest.hookimpl("parallel_thread_setup")
def parallel_thread_setup(*args, **kwargs):
    init_cuda()
    if "ctx" in kwargs:
         kwargs[ctx] = ...
    return args, kwargs

Although, now that I write that. A hook that runs during item collection and (may) return such a setup function may also work and would have the fixture information more naturally, I think.

I suppose w.r.t. to this: I would probably rather stick with this unless we figure out a nicer patch to upstream. But we could do that.

TST: add mini parallel plugins in test conftests

2d94fc4

Add lightweight pytest plugins in bindings/core conftest files to wrap parallel worker tests with CUDA context setup and fixture-aware initialization paths used by mini plugin execution.

seberg added this to the cuda.core v1.1.0 milestone Jun 16, 2026

seberg self-assigned this Jun 16, 2026

seberg added P0 High priority - Must do! test Improvements or additions to tests cuda.bindings Everything related to the cuda.bindings module cuda.core Everything related to the cuda.core module labels Jun 16, 2026

seberg added P1 Medium priority - Should do PR review get-together Mark PRs you'd like the team to review at the weekly PR review get-together. and removed P0 High priority - Must do! labels Jun 16, 2026

rwgk mentioned this pull request Jun 30, 2026

test: patch pytest-run-parallel worker context setup #2282

Draft

rwgk self-requested a review June 30, 2026 01:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(core,bindings): add mini parallel plugins in test conftests#2228

test(core,bindings): add mini parallel plugins in test conftests#2228
seberg wants to merge 1 commit into
NVIDIA:mainfrom
seberg:ft-testing-mini-plugin

seberg commented Jun 16, 2026

Uh oh!

copy-pr-bot Bot commented Jun 16, 2026

Uh oh!

rwgk commented Jun 30, 2026

Uh oh!

rwgk commented Jun 30, 2026

Uh oh!

seberg commented Jun 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

seberg commented Jun 16, 2026

Uh oh!

copy-pr-bot Bot commented Jun 16, 2026

Uh oh!

rwgk commented Jun 30, 2026

PR 2228 Findings

Findings

High: per-worker CUDA fixture replacement mutates shared pytest-run-parallel kwargs

Medium: bindings coverage misses ctx-only cuFile tests

Low: global-option check misses marker-only parallelism

pytest-run-parallel Support Assessment

Fixture Timing Clarification

Verification Performed During Review

Uh oh!

rwgk commented Jun 30, 2026

Uh oh!

seberg commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Medium: bindings coverage misses `ctx`-only cuFile tests

seberg commented Jun 30, 2026 •

edited

Loading