ipanfilo
reviewed
Mar 26, 2026
| --- | ||
| name: ifu-merge | ||
| description: > | ||
| Guide for performing IFU (Internal Feature Update) merges on the TransformerEngine ROCm fork. |
Collaborator
IFU stands for "integrate from upstream".
| - Preprocessor guards (`#ifndef USE_ROCM`, `#ifdef __HIP_PLATFORM_AMD__`). This means adding guards to source `.cpp` files will propagate into the generated `_hip.cpp` output. Use this to exclude CUDA-only code paths from ROCm builds. | ||
| **Rules that follow:** | ||
| - Never edit `*_hip.cpp` or `.hip` files — they are regenerated from source files |
Collaborator
We have one exception: there is a `.hip` file in the repo. Maybe we can rename it for consistency.
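The rule above could be enforced with a small check over the list of changed paths. This is a minimal sketch, not part of the PR: the function names, the example paths, and the `allowed` set (for the one `.hip` exception mentioned in the comment) are all hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def is_generated_hip_file(path: str, allowed: frozenset[str] = frozenset()) -> bool:
    """Return True if `path` looks like hipify output that should not be hand-edited.

    `allowed` lists known exceptions (assumption: checked-in .hip files that are
    maintained by hand rather than regenerated).
    """
    if path in allowed:
        return False
    name = PurePosixPath(path).name
    return name.endswith("_hip.cpp") or name.endswith(".hip")


def find_forbidden_edits(changed_paths: list[str],
                         allowed: frozenset[str] = frozenset()) -> list[str]:
    """Filter a list of changed paths down to those that violate the rule."""
    return [p for p in changed_paths if is_generated_hip_file(p, allowed)]


# Hypothetical paths, for illustration only.
changed = ["csrc/foo_hip.cpp", "csrc/foo.cpp", "kernels/bar.hip"]
violations = find_forbidden_edits(changed, allowed=frozenset({"kernels/bar.hip"}))
```

Keeping the exception in an explicit allow-list makes it visible; renaming the file, as suggested in the comment, would remove the need for it.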
| |---|---|---| | ||
| | PyTorch CSRC (`.cpp` source files) | `#ifdef USE_ROCM` / `#ifndef USE_ROCM` | DeviceGuard, scale swizzling | | ||
| | Common layer (`.cu` files that get hipified) | `#ifdef __HIP_PLATFORM_AMD__` | Warp masks, kernel dispatch | | ||
| | Python code | `IS_HIP_EXTENSION` (from `torch.utils.cpp_extension`) | Workspace sizing, feature flags | |
Collaborator
Also list the guard for JAX Python code.
| git diff <rocm-parent>..<upstream-parent> --stat | ||
| # Check for removed guards | ||
| git diff <rocm-parent>..<upstream-parent> -- <file> | grep -E "^-.*(__HIP_PLATFORM_AMD__|USE_ROCM|IS_HIP_EXTENSION)" |
Collaborator
I would also add "ROCm" and "upstream" to the pattern. Those words appear in comments that mark changes made by us.
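For illustration, the extended check could look like the following sketch, which scans unified-diff text for removed lines that mention any of the guard macros or the two extra markers. The function name and the sample diff are hypothetical; the marker list is taken from the `grep` pattern in the hunk plus the two words suggested here.

```python
import re

# Guard macros from the original grep pattern, plus the two comment markers
# suggested in the review ("ROCm" and "upstream"). Matching is case-sensitive.
MARKERS = re.compile(r"__HIP_PLATFORM_AMD__|USE_ROCM|IS_HIP_EXTENSION|ROCm|upstream")


def removed_marker_lines(diff_text: str) -> list[str]:
    """Return removed lines in a unified diff that mention a ROCm marker."""
    hits = []
    for line in diff_text.splitlines():
        if line.startswith("---"):
            # Skip the "--- a/file" header. Caveat: a removed line whose own
            # content starts with "--" is skipped as well.
            continue
        if line.startswith("-") and MARKERS.search(line):
            hits.append(line[1:].strip())
    return hits


# Hypothetical diff text, for illustration only.
sample = (
    "--- a/x.cpp\n"
    "+++ b/x.cpp\n"
    "-#ifdef USE_ROCM\n"
    "+int y;\n"
    "- // ROCm: keep this path\n"
    " context line mentioning upstream\n"
)
hits = removed_marker_lines(sample)
```

Plain words such as "upstream" will produce more false positives than the macro names, so the output is a list of candidates for manual review rather than a list of confirmed regressions.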
| 5. **Convention Changes**: Upstream changes a data format, tensor shape, or API contract without any code conflict. Every downstream consumer of that convention must be updated manually — the compiler won't catch these. | ||
| **How to systematically audit:** |
Collaborator
Should the "what to pay attention to" points be here?
We have two big semantic differences:
- `__shfl` vs `__shfl_sync` and other lane communication built-ins
- fp8 data types, e.g. `torch.float8_e4m3fn` vs `get_torch_e4m3_type`, etc.
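One way such points could feed a systematic audit is a scan that flags every line touching these two areas, so a person can review them after the merge. The sketch below is an assumption about how that might be done, not something the PR contains: the pattern table, labels, and function name are hypothetical, and the regular expressions only cover the spellings named in this comment and their common variants.

```python
import re

# Hypothetical pattern table covering the two semantic differences named above.
PATTERNS = {
    "lane communication built-in": re.compile(
        r"\b__shfl(?:_up|_down|_xor)?(?:_sync)?\s*\("
    ),
    "fp8 dtype": re.compile(
        r"\btorch\.float8_e(?:4m3fn|5m2)\w*|\bget_torch_e4m3_type\b"
    ),
}


def audit_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) for each line that needs manual review."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Hypothetical source snippet mixing C++ and Python lines, for illustration only.
snippet = (
    "v = __shfl_sync(mask, v, 0);\n"
    "x = 1\n"
    "dt = torch.float8_e4m3fn\n"
    "w = __shfl(v, 0);\n"
)
findings = audit_lines(snippet)
```

The scan only locates candidate lines. Whether a given call or dtype is correct for ROCm still has to be decided by reading the surrounding code.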
Description
Includes an initial addition of repository-level AI agent instructions/context via `CLAUDE.md`, as well as example skills via `.claude/**/SKILL.md`. This mainly serves as a demonstration of how to add additional context to AI coding agents, as well as how to develop a reasonably complex skill.

TODO: Back-test against old cases and refine as needed
Type of change
Changes
Please list the changes introduced in this PR:
- `CLAUDE.md`
- `.claude/ck-debugging/SKILL.md`
- `.claude/ifu-merge/SKILL.md`

Checklist: