Conversation

@wujingyue (Collaborator)

No description provided.

@wujingyue (Collaborator, Author)

!test

@github-actions bot commented Jan 2, 2026

Review updated until commit b9b2c19

Description

  • Simplify the getCUDAStream implementation with try_emplace and remove the explicit device index

  • Add new AssignStreams pass for stream parallelism in host IR

  • Remove HostIrLowering option from C++ StreamTest and move end-to-end tests to Python

  • Clean up includes and remove unused dependencies across multiple files

Changes walkthrough

Relevant files:

Enhancement (7 files)

  • assign_streams.cpp: Implement AssignStreams pass for stream parallelism (+63/-0)
  • evaluator.cpp: Simplify getCUDAStream with try_emplace (+7/-14)
  • passes.cpp: Add AssignStreams to optimization passes (+2/-0)
  • allocate_and_deallocate.h: Remove unnecessary include (+0/-1)
  • assign_streams.h: Define AssignStreams optimization pass (+26/-0)
  • ir.h: Remove scheduler heuristic include (+0/-1)
  • communicator.h: Remove exceptions include (+0/-1)

Tests (3 files)

  • test_multidevice_lower_communication_cuda.cpp: Remove large memory size test cases (+1/-3)
  • test_stream.cpp: Remove HostIrLowering option and add documentation (+4/-6)
  • test_stream.py: Remove nvfuser_direct_test parameter from tests (+3/-3)

Documentation (1 file)

  • internal_nodes.h: Add documentation comment for Scope::insert (+1/-1)

Configuration changes (1 file)

  • CMakeLists.txt: Add new source files to build (+2/-1)

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review
Stream Creation Simplification

The getCUDAStream method has been simplified with try_emplace, and streams are no longer created with an explicit device index; they now come from the pool for the current device. While this simplifies the code, verify that it does not break multi-GPU scenarios where streams must be created on specific devices.

```cpp
c10::cuda::CUDAStream HostIrEvaluator::getCUDAStream(Stream* stream) {
  StreamKey stream_key = stream;
  // if stream points to an index, it represents the dynamic value of that index
  if (Val* index = stream->index(); index != nullptr) {
    auto value = expr_evaluator_.evaluate(index);
    NVF_ERROR(value.hasValue() && value.is<int64_t>());
    stream_key = value.as<int64_t>();
  }

  auto [it, inserted] =
      streams_.try_emplace(stream_key, c10::cuda::getStreamFromPool());
  return it->second;
}
```
Reduced Test Coverage

Test parameters were reduced from five values (2 MB, 8 MB, 32 MB, 128 MB, 256 MB) to three (2 MB, 8 MB, 32 MB), removing the 128 MB and 256 MB cases. This reduces coverage for larger message sizes; the removal should be justified, or the cases restored if they provide important coverage.

```cpp
INSTANTIATE_TEST_SUITE_P(
    ,
    LowerCollectiveCudaAndNcclTest,
    testing::Combine(
        testing::Values(
            2 * 1024 * 1024LL, // 2 MB
            8 * 1024 * 1024LL, // 8 MB
            32 * 1024 * 1024LL // 32 MB
            ),
        testing::Values(
            CommunicationProtocol::kMemcpy,
            CommunicationProtocol::kNccl,
```