Skip to content

Conversation

@Rachmanino
Copy link
Collaborator

@Rachmanino Rachmanino commented Dec 23, 2025

Tasks

  • Let CI pass
  • Use pip to install NVSHMEM instead of 3rdparty repo, and modify relative env vars
  • Use python multiprocess instead of the launching script
  • Sort legacy NVSHMEM distributed examples

Summary by CodeRabbit

Release Notes

  • Documentation

    • Simplified NVSHMEM installation process with support for optional pre-built wheels as an alternative to source-based builds.
  • Chores

    • Removed distributed communication benchmark scripts and related utilities.
    • Updated build configuration to dynamically discover NVSHMEM installation paths from environment.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 23, 2025

📝 Walkthrough

Walkthrough

This PR removes multiple distributed benchmark scripts, utilities, and build infrastructure while refactoring NVSHMEM discovery to support both explicit source paths and pre-installed wheel-based configurations. The changes eliminate AG-GEMM, all-gather, all-to-all, GEMM-RS, and reduce-scatter benchmark implementations, along with IPC benchmarks and related documentation.

Changes

Cohort / File(s) Summary
Distributed Benchmarks
benchmark/distributed/benchmark_ag_gemm.py, benchmark/distributed/benchmark_all_gather.py, benchmark/distributed/benchmark_all_to_all.py, benchmark/distributed/benchmark_gemm_rs.py, benchmark/distributed/benchmark_reduce_scatter.py
Removed five comprehensive distributed benchmark modules implementing AG-GEMM, all-gather, all-to-all, GEMM with reduce-scatter, and reduce-scatter operations across Torch, Triton-dist, and TileLang-dist backends with kernel implementations, distributed initialization, and correctness validation.
IPC Benchmarks
benchmark/distributed/ipc_impls/benchmark_nvshmem_p2p.py, benchmark/distributed/ipc_impls/benchmark_unrolledcp_p2p.py, benchmark/distributed/ipc_impls/README.md
Removed NVSHMEM and TileLang-based point-to-point IPC bandwidth benchmarks and associated documentation describing push/pull operations and unrolled-copy approaches.
Distributed Utilities
benchmark/distributed/utils.py, benchmark/distributed/README.md
Removed AllToAllContext class for managing shared memory buffers and distributed communication setup, dtype mapping utilities, and benchmark directory documentation.
Build Configuration
tilelang/distributed/build_nvshmem.sh
Removed the NVSHMEM build automation script that handled source download, extraction, CMake configuration, and compilation with feature flags.
NVSHMEM CMake Integration
tilelang/distributed/pynvshmem/CMakeLists.txt
Refactored to dynamically discover NVSHMEM include and library paths via Python environment variables (NVSHMEM_INCLUDE_DIRS, NVSHMEM_LIB_DIR) instead of relying on find_package, enabling better compatibility with pre-installed wheels.
NVSHMEM Setup & Discovery
tilelang/distributed/pynvshmem/setup.py, tilelang/env.py
Added symlink creation utility and wheel-based NVSHMEM discovery fallback; when NVSHMEM_SRC is unset, now discovers NVIDIA NVSHMEM wheel via importlib and extracts include/lib paths automatically with warning. Modified environment path derivation to support both source-based and wheel-based configurations.
Engine & Compiler
tilelang/engine/lower.py
Updated NVSHMEM path resolution to source from environment variables (NVSHMEM_INCLUDE_DIR, NVSHMEM_LIB_PATH) instead of constructing from NVSHMEM_SRC.
Documentation
docs/get_started/Installation.md, examples/distributed/README.md
Updated NVSHMEM installation instructions to support pre-installed wheels and environment variable configuration; removed distributed examples documentation describing build prerequisites and setup steps.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • chengyupku
  • tzj-fxz

Poem

🐰 The benchmarks hop away with a fond farewell,
While wheels roll in where sources once did dwell,
NVSHMEM now finds its path with grace,
No build scripts needed—discovery takes their place! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title '[Refactor] Refactor NVSHMEM dependency' directly describes the main objective of the PR, which is refactoring how NVSHMEM dependencies are managed (moving from 3rdparty repo to pip installation and updating environment variables).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tilelang/engine/lower.py (1)

91-96: Critical: Missing comma causes incorrect linker flags concatenation.

Line 94-95 is missing a comma between the linker library flags and -rdc=true. This will concatenate the strings into "-lnvshmem_host -lnvshmem_device-rdc=true" instead of being separate options, causing compilation failures.

🐛 Proposed fix
             options += [
                 "-I" + nvshmem_include_path,
                 "-L" + nvshmem_lib_path,
-                "-lnvshmem_host -lnvshmem_device"
-                "-rdc=true",
+                "-lnvshmem_host",
+                "-lnvshmem_device",
+                "-rdc=true",
             ]
docs/get_started/Installation.md (1)

71-75: Incorrect path and LD_LIBRARY_PATH instructions.

  1. The path ./pynvshmem appears incorrect—based on the repository structure, it should be ./tilelang/distributed/pynvshmem.
  2. LD_LIBRARY_PATH uses $NVSHMEM_SRC which won't be set when using the wheel fallback. Consider documenting both scenarios.
📝 Proposed fix
 You also need to install the `pynvshmem` package, which provides wrapped host-side Python API for NVSHMEM.

 ```bash
-cd ./pynvshmem
+cd ./tilelang/distributed/pynvshmem
 python setup.py install
-export LD_LIBRARY_PATH="$NVSHMEM_SRC/build/src/lib:$LD_LIBRARY_PATH"

+If using NVSHMEM built from source:
+bash +export LD_LIBRARY_PATH="$NVSHMEM_SRC/build/src/lib:$LD_LIBRARY_PATH" +
+
+If using the default NVSHMEM wheel, the library path is automatically resolved.

</details>

</blockquote></details>

</blockquote></details>
🤖 Fix all issues with AI agents
In `@tilelang/env.py`:
- Line 268: The assertion uses an unnecessary f-string: update the assert
statement that checks nvshmem_wheel_path.exists() to use a normal string for the
message (remove the leading 'f' before the quoted message) so it reads as a
plain assertion message for nvshmem_wheel_path.exists().
🧹 Nitpick comments (7)
tilelang/engine/lower.py (1)

89-98: Redundant NVSHMEM_SRC check.

The check at line 90 is redundant—if env.NVSHMEM_SRC were falsy, line 74 would have already raised the error. Consider removing the second conditional or consolidating the logic.

♻️ Proposed simplification
     if env.USE_DISTRIBUTED:
-        if env.NVSHMEM_SRC:
-            options += [
-                "-I" + nvshmem_include_path,
-                "-L" + nvshmem_lib_path,
-                "-lnvshmem_host",
-                "-lnvshmem_device",
-                "-rdc=true",
-            ]
-        else:
-            raise ValueError("NVSHMEM_SRC is not set")
+        options += [
+            "-I" + nvshmem_include_path,
+            "-L" + nvshmem_lib_path,
+            "-lnvshmem_host",
+            "-lnvshmem_device",
+            "-rdc=true",
+        ]
tilelang/env.py (1)

262-271: Add error handling for missing nvidia-nvshmem-cu12 wheel.

If USE_DISTRIBUTED is enabled but neither NVSHMEM_SRC is set nor the nvidia-nvshmem-cu12 wheel is installed, distribution() will raise PackageNotFoundError without a helpful message.

♻️ Proposed fix with clearer error handling
         else:
             # installed from wheel
-            from importlib.metadata import distribution
+            from importlib.metadata import distribution, PackageNotFoundError
 
-            root = pathlib.Path(distribution("nvidia-nvshmem-cu12").locate_file(""))
+            try:
+                root = pathlib.Path(distribution("nvidia-nvshmem-cu12").locate_file(""))
+            except PackageNotFoundError:
+                raise RuntimeError(
+                    "NVSHMEM_SRC is not set and nvidia-nvshmem-cu12 wheel is not installed. "
+                    "Please either set NVSHMEM_SRC or install: pip install nvidia-nvshmem-cu12"
+                ) from None
             nvshmem_wheel_path = root / "nvidia" / "nvshmem"
tilelang/distributed/pynvshmem/setup.py (3)

101-106: Consider catching a more specific exception.

Catching bare Exception can mask unexpected errors. For symlink creation, typical failures are OSError (permission denied, filesystem issues) or FileExistsError.

♻️ Proposed fix
         try:
             # Create the symlink
             symlink_path.symlink_to(target)
             print(f"Created symlink: {symlink_path} -> {target}")
-        except Exception as e:
+        except OSError as e:
             print(f"Warning: Could not create symlink {symlink_path}: {e}")

93-99: Non-deterministic version selection when multiple versioned libraries exist.

The comment mentions "latest if multiple" but glob() doesn't guarantee ordering. If multiple versions exist (e.g., libnvshmem_host.so.2, libnvshmem_host.so.3), selection is arbitrary.

♻️ Proposed fix to select the latest version
         # Find the versioned library file
         versioned_libs = list(lib_path.glob(pattern))
         if not versioned_libs:
             continue
 
-        # Use the first match (or latest if multiple)
-        target = versioned_libs[0].name
+        # Use the latest version if multiple exist
+        target = sorted(versioned_libs)[-1].name

110-117: Potential None values passed to build system.

If USE_DISTRIBUTED is not enabled in the environment, env.NVSHMEM_INCLUDE_DIR and env.NVSHMEM_LIB_PATH will be None, causing the build to fail with an unclear error. Consider adding validation.

♻️ Proposed fix
 `@pathlib_wrapper`
 def nvshmem_deps():
     # Ensure symlinks exist before returning dependencies
     ensure_nvshmem_symlinks()
 
+    if env.NVSHMEM_INCLUDE_DIR is None or env.NVSHMEM_LIB_PATH is None:
+        raise RuntimeError(
+            "NVSHMEM paths not configured. Ensure TILELANG_USE_DISTRIBUTED=1 is set "
+            "and either NVSHMEM_SRC is set or nvidia-nvshmem-cu12 is installed."
+        )
+
     include_dirs = [env.NVSHMEM_INCLUDE_DIR]
     library_dirs = [env.NVSHMEM_LIB_PATH]
     libraries = ["nvshmem_host", "nvshmem_device"]
     return include_dirs, library_dirs, libraries
tilelang/distributed/pynvshmem/CMakeLists.txt (1)

47-57: Add error handling for NVSHMEM path discovery.

If TILELANG_USE_DISTRIBUTED is not enabled, the Python command will output None as strings, causing find_library to fail with an unclear error. Consider adding validation.

♻️ Proposed fix
 # find NVSHMEM from tilelang.env
 execute_process(
   COMMAND ${PYTHON_EXECUTABLE} "-c"
-          "from tilelang.env import env; print(env.NVSHMEM_INCLUDE_DIR); print(env.NVSHMEM_LIB_PATH)"
-  OUTPUT_VARIABLE NVSHMEM_PATHS)
+          "from tilelang.env import env; print(env.NVSHMEM_INCLUDE_DIR or ''); print(env.NVSHMEM_LIB_PATH or '')"
+  OUTPUT_VARIABLE NVSHMEM_PATHS
+  RESULT_VARIABLE _NVSHMEM_SUCCESS)
+if(NOT _NVSHMEM_SUCCESS MATCHES 0)
+  message(FATAL_ERROR "Failed to get NVSHMEM paths from tilelang.env")
+endif()
 string(REGEX REPLACE "\n" ";" NVSHMEM_PATHS "${NVSHMEM_PATHS}")
 list(GET NVSHMEM_PATHS 0 NVSHMEM_INCLUDE_DIR)
 list(GET NVSHMEM_PATHS 1 NVSHMEM_LIB_DIR)
+if(NOT NVSHMEM_INCLUDE_DIR OR NOT NVSHMEM_LIB_DIR)
+  message(FATAL_ERROR "NVSHMEM paths not configured. Set TILELANG_USE_DISTRIBUTED=1 and ensure NVSHMEM is available.")
+endif()
 set(NVSHMEM_INCLUDE_DIRS ${NVSHMEM_INCLUDE_DIR})
docs/get_started/Installation.md (1)

61-68: Consider restructuring for clarity.

The current flow could be clearer. Consider organizing as two distinct options:

📝 Suggested restructure
-Before running the examples using NVSHMEM APIs (e.g., [example_allgather.py](../../examples/distributed/example_allgather.py)), you need to install NVSHMEM dependencies.
-You can set environment variable `NVSHMEM_SRC` to force using NVSHMEM built from source.
+Before running examples using NVSHMEM APIs (e.g., [example_allgather.py](../../examples/distributed/example_allgather.py)), you need to install NVSHMEM dependencies.
+
+**Option 1: Use pre-installed NVSHMEM wheel (recommended)**
+
+No additional setup required—the default `nvidia-nvshmem-cu12` wheel will be used automatically.
 
-```bash 
+**Option 2: Use NVSHMEM built from source**
+
+Set the environment variable to point to your custom NVSHMEM directory:
+```bash
 export NVSHMEM_SRC="your_custom_nvshmem_dir"

-You can also skip this and use the default pre-installed NVSHMEM wheel.

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between eccbd61c37671beccb5208e65088a49987496dc2 and 9f96ff2ad60cd7b9e60875278fdae3a049e8b483.

</details>

<details>
<summary>📒 Files selected for processing (30)</summary>

* `benchmark/distributed/README.md`
* `benchmark/distributed/benchmark_ag_gemm.py`
* `benchmark/distributed/benchmark_all_gather.py`
* `benchmark/distributed/benchmark_all_to_all.py`
* `benchmark/distributed/benchmark_gemm_rs.py`
* `benchmark/distributed/benchmark_reduce_scatter.py`
* `benchmark/distributed/ipc_impls/README.md`
* `benchmark/distributed/ipc_impls/benchmark_nvshmem_p2p.py`
* `benchmark/distributed/ipc_impls/benchmark_unrolledcp_p2p.py`
* `benchmark/distributed/utils.py`
* `docs/get_started/Installation.md`
* `examples/distributed/README.md`
* `examples/distributed/nvshmem_legacy/example_all_to_all.py`
* `examples/distributed/nvshmem_legacy/example_allgather.py`
* `examples/distributed/nvshmem_legacy/example_allgather_gemm.py`
* `examples/distributed/nvshmem_legacy/example_cannon.py`
* `examples/distributed/nvshmem_legacy/example_nvshmem.py`
* `examples/distributed/nvshmem_legacy/example_overlapping_allgather.py`
* `examples/distributed/nvshmem_legacy/example_post_attn_all2all_transpose.py`
* `examples/distributed/nvshmem_legacy/example_pre_attn_all2all.py`
* `examples/distributed/nvshmem_legacy/example_pre_attn_all2all_transpose.py`
* `examples/distributed/nvshmem_legacy/example_simple_shift.py`
* `examples/distributed/nvshmem_legacy/example_summa.py`
* `examples/distributed/nvshmem_legacy/gemm_rs_utils.py`
* `testing/python/distributed/test_tilelang_language_ldst_options.py`
* `tilelang/distributed/build_nvshmem.sh`
* `tilelang/distributed/pynvshmem/CMakeLists.txt`
* `tilelang/distributed/pynvshmem/setup.py`
* `tilelang/engine/lower.py`
* `tilelang/env.py`

</details>

<details>
<summary>💤 Files with no reviewable changes (12)</summary>

* tilelang/distributed/build_nvshmem.sh
* benchmark/distributed/README.md
* benchmark/distributed/ipc_impls/benchmark_unrolledcp_p2p.py
* benchmark/distributed/benchmark_ag_gemm.py
* benchmark/distributed/ipc_impls/benchmark_nvshmem_p2p.py
* benchmark/distributed/utils.py
* examples/distributed/README.md
* benchmark/distributed/benchmark_reduce_scatter.py
* benchmark/distributed/benchmark_all_gather.py
* benchmark/distributed/benchmark_all_to_all.py
* benchmark/distributed/benchmark_gemm_rs.py
* benchmark/distributed/ipc_impls/README.md

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>docs/get_started/Installation.md</summary>

[style] ~68-~68: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...he default pre-installed NVSHMEM wheel. You also need to install the `pynvshmem` pa...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

</details>

</details>
<details>
<summary>🪛 Ruff (0.14.11)</summary>

<details>
<summary>tilelang/distributed/pynvshmem/setup.py</summary>

105-105: Do not catch blind exception: `Exception`

(BLE001)

</details>
<details>
<summary>tilelang/env.py</summary>

268-268: f-string without any placeholders

Remove extraneous `f` prefix

(F541)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (2)</summary><blockquote>

<details>
<summary>tilelang/env.py (1)</summary><blockquote>

`255-274`: **LGTM on the NVSHMEM path discovery logic.**

The dual-path approach (source build vs. wheel fallback) is well-structured. The paths for source builds (`build/src/include`, `build/src/lib`) align with typical NVSHMEM build outputs.

</blockquote></details>
<details>
<summary>tilelang/distributed/pynvshmem/CMakeLists.txt (1)</summary><blockquote>

`70-75`: **LGTM on the target linking changes.**

Using CMake variables for library paths is cleaner and more maintainable than hardcoding.

</blockquote></details>

</blockquote></details>

<sub>✏️ Tip: You can disable this entire section by setting `review_details` to `false` in your review settings.</sub>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

root = pathlib.Path(distribution("nvidia-nvshmem-cu12").locate_file(""))
nvshmem_wheel_path = root / "nvidia" / "nvshmem"
logger.warning(NVSHMEM_USE_DEFAULT_WHEEL_MESSAGE % nvshmem_wheel_path)
assert nvshmem_wheel_path.exists(), f"NVSHMEM wheel path does not exist"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefix from string.

This f-string has no placeholders. Remove the f prefix.

🐛 Proposed fix
-            assert nvshmem_wheel_path.exists(), f"NVSHMEM wheel path does not exist"
+            assert nvshmem_wheel_path.exists(), "NVSHMEM wheel path does not exist"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert nvshmem_wheel_path.exists(), f"NVSHMEM wheel path does not exist"
assert nvshmem_wheel_path.exists(), "NVSHMEM wheel path does not exist"
🧰 Tools
🪛 Ruff (0.14.11)

268-268: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In `@tilelang/env.py` at line 268, The assertion uses an unnecessary f-string:
update the assert statement that checks nvshmem_wheel_path.exists() to use a
normal string for the message (remove the leading 'f' before the quoted message)
so it reads as a plain assertion message for nvshmem_wheel_path.exists().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants