@wujingyue (Collaborator) commented Jan 2, 2026

... to speed up CI and local runs

The way forward could be to reduce warmup_iters and timing_iters and move these benchmarks to benchmarks/cpp so they don't run by default.
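Reducing warmup_iters and timing_iters helps because each configuration runs its body warmup_iters + timing_iters times, so runtime shrinks roughly in proportion. Below is a minimal sketch of such a loop, assuming a plain host-side callable. `timeIterations` is a hypothetical helper, not the existing runBenchmark. Timing real GPU collectives would also require stream synchronization or CUDA events, which this sketch omits.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <ratio>

// Hypothetical helper (not nvFuser's runBenchmark): runs `fn` warmup_iters
// times untimed, then timing_iters times timed, and returns the mean
// wall-clock time per timed iteration in microseconds.
double timeIterations(
    int64_t warmup_iters,
    int64_t timing_iters,
    const std::function<void()>& fn) {
  assert(warmup_iters >= 0 && timing_iters >= 0);
  for (int64_t i = 0; i < warmup_iters; ++i) {
    fn();  // Warmup: not timed, absorbs one-time setup costs.
  }
  if (timing_iters == 0) {
    return 0.0;
  }
  const auto start = std::chrono::steady_clock::now();
  for (int64_t i = 0; i < timing_iters; ++i) {
    fn();
  }
  const auto end = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::micro> elapsed = end - start;
  return elapsed.count() / static_cast<double>(timing_iters);
}
```

Halving both counts halves the number of calls to `fn`, which is the lever the comment above proposes.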


TEST_P(LowerCollectiveCudaAndNcclTest, Allgather) {
const auto& [msg_size_bytes, protocol_enum] = GetParam();
const int64_t kMsgSize = msg_size_bytes / sizeof(float);
Collaborator Author

at::Tensor runBenchmark(
MultiDeviceExecutor& executor,
const std::vector<c10::IValue>& inputs,
int64_t msg_size_bytes,
Collaborator Author

}

if (message_size_bytes > 32LL * 1024 * 1024) {
GTEST_SKIP() << "Takes >30 seconds to run in CI: http://nv/e.)";
Collaborator Author

Skips here

}

if (message_size_bytes > 32LL * 1024 * 1024) {
GTEST_SKIP() << "Takes >5 seconds to run in CI: http://nv/e.)";
Collaborator Author

... and here
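Both skips compare against the same literal, 32LL * 1024 * 1024 bytes (32 MiB, or 33554432 bytes). Because the comparison is strict, a 32 MiB message still runs. The sketch below factors that literal into one named constant so the two tests cannot drift apart. `kMaxCiMessageSizeBytes` and `exceedsCiMessageSizeLimit` are hypothetical names and are not part of this PR.

```cpp
#include <cstdint>

// Hypothetical shared constant; the PR repeats this literal in each test.
constexpr int64_t kMaxCiMessageSizeBytes = 32LL * 1024 * 1024;  // 32 MiB

// True for the configurations the PR skips. The comparison is strict, so a
// message of exactly 32 MiB is still tested.
constexpr bool exceedsCiMessageSizeLimit(int64_t message_size_bytes) {
  return message_size_bytes > kMaxCiMessageSizeBytes;
}

static_assert(kMaxCiMessageSizeBytes == 33554432, "32 MiB in bytes");
static_assert(!exceedsCiMessageSizeLimit(32LL * 1024 * 1024), "32 MiB runs");
static_assert(exceedsCiMessageSizeLimit(128LL * 1024 * 1024), "128 MiB skips");

// Intended use inside each TEST_P body (requires GoogleTest):
//   if (exceedsCiMessageSizeLimit(message_size_bytes)) {
//     GTEST_SKIP() << ...;
//   }
```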

@wujingyue requested a review from mdavis36 on January 2, 2026 02:09
@greptile-apps bot (Contributor) commented Jan 2, 2026

Greptile Summary

This PR reduces CI runtime by skipping large message size tests (128MB and 256MB) in LowerCollectiveCudaAndNcclTest. The skip logic is added to both Allgather and Broadcast tests, filtering out configurations where message_size_bytes > 32MB.

  • Skips 2 out of 5 message sizes (128MB, 256MB) for both test cases
  • Each skipped configuration would test 4 protocols (kMemcpy, kNccl, kMultimem, kBatchedMemcpy)
  • Total: 16 test configurations skipped (2 tests × 2 sizes × 4 protocols)
  • Refactors variable names from msg_size_bytes/kMsgSize to message_size_bytes/message_size for consistency
  • Contains malformed URLs in skip messages that should be corrected
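The figure of 16 follows from the parameter grid. Below is a compile-time sketch of that count. It assumes the five sizes are exactly 2, 8, 32, 128 and 256 MiB and that both tests use the same four protocols, as the summary states. The constants are illustrative and do not reproduce the test's actual instantiation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: the parameterized sizes, in MiB, as listed in the summary.
constexpr std::array<int64_t, 5> kMessageSizesMiB = {2, 8, 32, 128, 256};
constexpr int64_t kNumProtocols = 4;  // kMemcpy, kNccl, kMultimem, kBatchedMemcpy
constexpr int64_t kNumTests = 2;      // Allgather, Broadcast
constexpr int64_t kThresholdBytes = 32LL * 1024 * 1024;

constexpr int64_t kTotalConfigs =
    kNumTests * static_cast<int64_t>(kMessageSizesMiB.size()) * kNumProtocols;

// Counts (test, size, protocol) configurations that hit the > 32 MiB skip.
constexpr int64_t countSkippedConfigs() {
  int64_t skipped_sizes = 0;
  for (std::size_t i = 0; i < kMessageSizesMiB.size(); ++i) {
    if (kMessageSizesMiB[i] * 1024 * 1024 > kThresholdBytes) {
      ++skipped_sizes;
    }
  }
  return kNumTests * skipped_sizes * kNumProtocols;
}

static_assert(countSkippedConfigs() == 16, "2 tests x 2 sizes x 4 protocols");
```

Under these assumptions, 16 of the 40 configurations are skipped and 24 still run.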

Confidence Score: 4/5

  • Safe to merge after fixing the malformed URLs in skip messages
  • The change achieves its goal of reducing CI time by skipping slow-running large message size tests. The logic is sound and preserves test coverage for smaller sizes (2MB, 8MB, 32MB). Variable renaming improves code consistency. One minor issue remains: the skip messages contain incomplete URLs that should be corrected before merge.
  • Fix malformed URLs in tests/cpp/test_multidevice_lower_communication_cuda.cpp at lines 197 and 265

Important Files Changed

| Filename | Overview |
| --- | --- |
| tests/cpp/test_multidevice_lower_communication_cuda.cpp | Skips large message size tests (>32MB) for both Allgather and Broadcast to improve CI time; renames msg_size_bytes to message_size_bytes for consistency. Contains malformed URLs in skip messages. |

Sequence Diagram

sequenceDiagram
    participant Test as Test Harness
    participant Allgather as Allgather Test
    participant Broadcast as Broadcast Test
    participant Skip as GTEST_SKIP
    
    Test->>Allgather: Execute with params (message_size, protocol)
    Allgather->>Allgather: Check message_size_bytes > 32MB?
    alt message_size > 32MB
        Allgather->>Skip: Skip test (128MB, 256MB configs)
        Note over Skip: "Takes >30 seconds in CI"
    else message_size <= 32MB
        Allgather->>Allgather: Run test (2MB, 8MB, 32MB)
    end
    
    Test->>Broadcast: Execute with params (message_size, protocol)
    Broadcast->>Broadcast: Check message_size_bytes > 32MB?
    alt message_size > 32MB
        Broadcast->>Skip: Skip test (128MB, 256MB configs)
        Note over Skip: "Takes >5 seconds in CI"
    else message_size <= 32MB
        Broadcast->>Broadcast: Run test (2MB, 8MB, 32MB)
    end

@greptile-apps bot (Contributor) left a comment

1 file reviewed, 2 comments

}

if (message_size_bytes > 32LL * 1024 * 1024) {
GTEST_SKIP() << "Takes >30 seconds to run in CI: http://nv/e.)";
Contributor

syntax: malformed URL in skip message - http://nv/e.) appears incomplete

Suggested change
- GTEST_SKIP() << "Takes >30 seconds to run in CI: http://nv/e.)";
+ GTEST_SKIP() << "Takes >30 seconds to run in CI";

}

if (message_size_bytes > 32LL * 1024 * 1024) {
GTEST_SKIP() << "Takes >5 seconds to run in CI: http://nv/e.)";
Contributor

syntax: malformed URL in skip message - http://nv/e.) appears incomplete

Suggested change
- GTEST_SKIP() << "Takes >5 seconds to run in CI: http://nv/e.)";
+ GTEST_SKIP() << "Takes >5 seconds to run in CI";

@github-actions bot commented Jan 2, 2026

Description

  • Rename variables for clarity: msg_size_bytes → message_size_bytes, kMsgSize → message_size

  • Add skip conditions for large message sizes (>32MB) in Allgather and Broadcast tests

  • Skip Allgather tests taking >30 seconds and Broadcast tests taking >5 seconds in CI

  • Minor formatting improvements to break long lines for better readability


PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review
Incomplete skip reason URL

The skip messages reference "http://nv/e.)" which appears to be an incomplete or placeholder URL. This should be replaced with a proper reference or documentation link explaining why these specific message size thresholds were chosen.

GTEST_SKIP() << "Takes >30 seconds to run in CI: http://nv/e.)";
Inconsistent skip thresholds

Both Allgather and Broadcast tests use the same 32MB threshold but have different time estimates (>30s vs >5s). This inconsistency should be verified to ensure the thresholds are appropriate for each operation type.

  if (message_size_bytes > 32LL * 1024 * 1024) {
    GTEST_SKIP() << "Takes >30 seconds to run in CI: http://nv/e.)";
  }

  // cudaMemcpyBatchAsync requires a non-default stream
  c10::cuda::CUDAStream stream =
      c10::cuda::getStreamFromPool(/*isHighPriority=*/false);
  c10::cuda::setCurrentCUDAStream(stream);

  EnableOptionsGuard guard;
  setupProtocolOptions(protocol_enum, guard);

  auto fusion = std::make_unique<Fusion>();
  FusionGuard fg(fusion.get());

  const auto num_devices = communicator_->size();
  TensorView* in = makeContigTensor(2);
  TensorView* out = set(in);
  fusion->addInput(in);
  fusion->addOutput(out);

  if (backend_type == CommunicatorBackend::kCuda) {
    out->setMemoryType(MemoryType::Symmetric);
  }

  auto mesh = DeviceMesh::createForNumDevices(num_devices);
  in->setDeviceMesh(mesh);
  out->setDeviceMesh(mesh);
  in->axis(0)->parallelize(ParallelType::DIDx);

  at::Tensor unsharded_tensor =
      at::randn({num_devices, message_size}, tensor_options_);
  at::Tensor in_tensor = shardTensor(unsharded_tensor, in);

  MultiDeviceExecutorParams params;
  params.lower.communicator_backend = backend_type;
  params.executor.use_allocation_cache = true;
  MultiDeviceExecutor executor(
      std::move(fusion), Communicator::getInstance(), params);

  // Run benchmark and validate correctness
  at::Tensor out_tensor = runBenchmark(
      executor,
      {in_tensor},
      message_size_bytes,
      backend_type,
      "Allgather/" + protocol_str,
      static_cast<float>(communicator_->size()));

  EXPECT_TRUE(at::allclose(out_tensor, unsharded_tensor));
}

TEST_P(LowerCollectiveCudaAndNcclTest, Broadcast) {
  const auto& [message_size_bytes, protocol_enum] = GetParam();
  const CommunicatorBackend backend_type = getBackend(protocol_enum);
  const std::string protocol_str = getProtocolString(protocol_enum);
  const int64_t message_size = message_size_bytes / sizeof(float);

  if (!communicator_->is_available() || communicator_->size() < 2) {
    GTEST_SKIP() << "This test needs at least 2 ranks.";
  }

  if (!isMulticastSupported() &&
      (protocol_enum == CommunicationProtocol::kMemcpy ||
       protocol_enum == CommunicationProtocol::kMultimem)) {
    GTEST_SKIP() << "Device does not support Multicast; skipping.";
  }

  if (message_size_bytes > 32LL * 1024 * 1024) {
    GTEST_SKIP() << "Takes >5 seconds to run in CI: http://nv/e.)";
  }
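In the Allgather excerpt, message_size is an element count (message_size_bytes / sizeof(float)). The input of shape {num_devices, message_size} is sharded on axis 0, so each rank contributes message_size_bytes. The gathered output, which is compared against unsharded_tensor, is num_devices times that size. The sketch below works through this arithmetic, assuming a 4-byte float. `allgatherSizes` and the 8-rank example are hypothetical.

```cpp
#include <cstdint>

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

// Illustrative arithmetic mirroring the excerpt:
//   message_size = message_size_bytes / sizeof(float)
//   unsharded_tensor has shape {num_devices, message_size}
//   sharding axis 0 over DIDx leaves one row per rank.
struct AllgatherSizes {
  int64_t elements_per_rank;      // message_size
  int64_t input_bytes_per_rank;   // one row of floats
  int64_t output_bytes_per_rank;  // the full gathered tensor
};

constexpr AllgatherSizes allgatherSizes(
    int64_t message_size_bytes,
    int64_t num_devices) {
  const int64_t message_size =
      message_size_bytes / static_cast<int64_t>(sizeof(float));
  const int64_t input_bytes =
      message_size * static_cast<int64_t>(sizeof(float));
  return {message_size, input_bytes, input_bytes * num_devices};
}

// Example: a 32 MiB message on a hypothetical 8-rank run.
static_assert(allgatherSizes(32LL * 1024 * 1024, 8).elements_per_rank == 8388608,
              "32 MiB of floats");
static_assert(allgatherSizes(32LL * 1024 * 1024, 8).output_bytes_per_rank ==
                  256LL * 1024 * 1024,
              "8 ranks x 32 MiB");
```

With this shape, the largest parameters also produce the largest per-rank output buffers.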

@wujingyue changed the title from "Skip certain configs in LowerCollectiveCudaAndNcclTest to speed up CI" to "Skip certain configs in LowerCollectiveCudaAndNcclTest" on Jan 2, 2026