feat(kernelgen): import NKIPyKernelGen as a subfolder #55
Closed
shaojiex-aws wants to merge 1 commit into aws-neuron:feat/kernelgen from shaojiex-aws:feat/kernelgen
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| --- | ||
| name: build_nkipykernelgen | ||
| description: Rebuild NKIPyKernelGen (C++ passes and Python package) | ||
| user-invocable: true | ||
| --- | ||
|
|
||
| ## Usage | ||
|
|
||
| `/build_nkipykernelgen` | ||
|
|
||
| ## Instructions | ||
|
|
||
| Run the build script with `bash` (not `sh`); the script targets bash (`#!/bin/bash`). Use a timeout of 300000 ms (5 minutes). | ||
|
|
||
| ```bash | ||
| bash .claude/skills/build_nkipykernelgen/scripts/build.sh | ||
| ``` | ||
|
|
||
| Note: Run this from the NKIPyKernelGen repo root. | ||
|
|
||
| ## Important | ||
|
|
||
| `pip install -e .` builds BOTH the C++ passes (nkipy-opt binary) AND the Python package in one step. There is NO need to run cmake separately — the pyproject.toml build system handles the full C++ compilation via cmake internally. | ||
12 changes: 12 additions & 0 deletions
kernelgen/.claude/skills/build_nkipykernelgen/scripts/build.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| #!/bin/bash | ||
| # Rebuild NKIPyKernelGen (C++ passes and Python package). | ||
| set -eo pipefail  # pipefail: do not let `| tail -5` hide a failed pip install | ||
|
|
||
| # Derive repo root from script location: scripts/ -> build_nkipykernelgen/ -> skills/ -> .claude/ -> repo root | ||
| REPO_ROOT="$(cd "$(dirname "$0")/../../../.." && pwd)" | ||
|
|
||
| cd "$REPO_ROOT" | ||
|
|
||
| echo "=== Rebuilding NKIPyKernelGen ===" | ||
| pip install -e . 2>&1 | tail -5 | ||
| echo "=== Build complete ===" |
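
Because the build output above is piped through `tail -5`, it helps to recall how bash reports a pipeline's exit status: by default it is the status of the last command, so `set -e` only sees a failure on the left-hand side when `pipefail` is enabled. The sketch below illustrates this in isolation. `failing_step` and `pipeline_status` are hypothetical names, and `false` stands in for a failing build command; none of this is part of the PR.

```bash
#!/bin/bash
# Stand-in for a build command that fails; NOT the real build step.
failing_step() { false; }

# Print the exit status of `failing_step | tail -5`.
# Pass "pipefail" as the first argument to enable `set -o pipefail` first.
pipeline_status() {
  (
    if [ "$1" = "pipefail" ]; then
      set -o pipefail
    fi
    status=0
    failing_step 2>&1 | tail -5 || status=$?
    echo "$status"
  )
}

echo "without pipefail: $(pipeline_status default)"
echo "with pipefail:    $(pipeline_status pipefail)"
```

Run as is, the first line reports status `0` (the failure is hidden behind `tail`) and the second reports `1`.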
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| --- | ||
| name: debug_nisa_ir | ||
| description: Debug NISA MLIR that fails BIRSim. Creates a debug case under tests/debug/ with buggy.mlir, kernel.py, iterative fixes, and a README proposing compiler pass changes. | ||
| user-invocable: true | ||
| --- | ||
|
|
||
| ## Usage | ||
|
|
||
| `/debug_nisa_ir <bug_name> [kernel.py path] [buggy NISA MLIR path or inline]` | ||
|
|
||
| - `bug_name`: Short snake_case name for the debug case (e.g., `rope_partition_oob`) | ||
| - `kernel.py path`: Path to the Python source that was fed into `nkipy_opt`. If omitted, ask the user. | ||
| - `buggy NISA MLIR`: Path to the `.mlir` file that `nkipy_opt` produced, or the user may paste it inline. If omitted, ask the user. | ||
|
|
||
| ## Instructions | ||
|
|
||
| You are debugging a NISA-level MLIR kernel that `nkipy_opt` generated but that fails BIRSim verification or produces incorrect numerical results. Follow this systematic workflow. | ||
|
|
||
| ### Step 1: Set up the debug case directory | ||
|
|
||
| Create `tests/debug/<bug_name>/` with: | ||
|
|
||
| ``` | ||
| tests/debug/<bug_name>/ | ||
| kernel.py # Copy of the input Python kernel | ||
| buggy.mlir # The failing NISA MLIR from nkipy_opt | ||
| README.md # Will be populated in Step 6 | ||
| ``` | ||
|
|
||
| Copy the user-provided `kernel.py` and `buggy.mlir` into this directory. Ensure `kernel.py` contains a function whose name matches the `sym_name` in the MLIR (this is required by `run_sim.py`). | ||
|
|
||
| ### Step 2: Reproduce the failure | ||
|
|
||
| Run the buggy MLIR through BIRSim: | ||
|
|
||
| ```bash | ||
| cd tests/debug && source ./run.sh <bug_name>/buggy.mlir | ||
| ``` | ||
|
|
||
| Record the exact error output. Common failure modes: | ||
| - **BIR verification error**: `Invalid access of N partitions starting at partition M` or `Access pattern out of bounds` | ||
| - **BIRSim runtime error**: `NCC_ISIM*` errors (e.g., uninitialized PSUM read) | ||
| - **Numerical mismatch**: `SIMULATION FAILED (max_diff=...)` -- BIRSim runs but output doesn't match kernel.py | ||
|
|
||
| ### Step 3: Analyze the bug | ||
|
|
||
| Read the MLIR carefully and identify the root cause. Common patterns: | ||
|
|
||
| 1. **Multi-partition SBUF with vector engine**: `tensor_tensor_arith` (engine=vector) reading from a loop-indexed partition of a multi-partition SBUF tensor. The vector engine processes all 128 partitions simultaneously and cannot address partition N selectively. | ||
|
|
||
| 2. **Wrong reshape/transpose lowering**: Column-by-column transposes that conflate head and head_dim dimensions. Often manifests as `<128|2>` tile on a dim of size 2 (OOB), or silent numerical corruption. | ||
|
|
||
| 3. **Missing accumulate flags**: Matmul K-loops without `psum_accumulate_flags`, causing PSUM overwrite instead of accumulate. | ||
|
|
||
| 4. **SBUF OOM**: Too many live SBUF tensors. Check if intermediates can be fused or freed earlier. | ||
|
|
||
| Focus on understanding: | ||
| - Which MLIR lines are problematic (cite line numbers) | ||
| - What the pass *intended* to generate vs what it actually generated | ||
| - Why the hardware rejects it (BIR rules violated) | ||
|
|
||
| ### Step 4: Create iterative fixes | ||
|
|
||
| For each fix attempt, create a new MLIR file: | ||
|
|
||
| ``` | ||
| fix_<number>_<what_was_fixed>.mlir | ||
| ``` | ||
|
|
||
| For example: | ||
| - `fix_01_fuse_rope_elementwise.mlir` | ||
| - `fix_02_reshape_head_granularity.mlir` | ||
|
|
||
| Edit the MLIR by hand to correct the identified issue. Then run: | ||
|
|
||
| ```bash | ||
| cd tests/debug && source ./run.sh <bug_name>/fix_01_<description>.mlir | ||
| ``` | ||
|
|
||
| If it still fails, analyze the new error, create another fix file, and iterate. Keep each attempt as a separate file so the progression is visible. | ||
|
|
||
| ### Step 5: Verify the final fix | ||
|
|
||
| The last `fix_*.mlir` should produce: | ||
|
|
||
| ``` | ||
| BIRSim PASSED | ||
| SIMULATION PASSED | ||
| ``` | ||
|
|
||
| Confirm that the numerical output matches `kernel.py` within tolerance (atol=1e-2, rtol=1e-2). | ||
|
|
||
| ### Step 6: Write the README | ||
|
|
||
| Create `tests/debug/<bug_name>/README.md` documenting: | ||
|
|
||
| 1. **Overview**: One paragraph summarizing what `buggy.mlir` is (which kernel, what it does) and what goes wrong. | ||
|
|
||
| 2. **How to reproduce**: The exact `source ../run.sh` commands for buggy and fixed versions. | ||
|
|
||
| 3. **Bug analysis**: For each bug found: | ||
| - **Symptom**: The exact error message | ||
| - **Location in MLIR**: Line numbers and what the code does | ||
| - **What happens**: Why the hardware rejects it or produces wrong results | ||
| - **Fix**: What was changed in the MLIR (with code snippets) | ||
|
|
||
| 4. **Root cause summary**: Table mapping each bug to the compiler pass responsible and whether it causes a compilation error or silent corruption. | ||
|
|
||
| 5. **Proposed compiler pass fixes**: For each bug, describe: | ||
| - Which pass to fix (e.g., `simplify-linalg`, `linalg-to-nisa`, tiling) | ||
| - The root cause *in the pass* (not just the MLIR symptom) | ||
| - A concrete proposed change (pseudocode or description of the algorithm change) | ||
|
|
||
| Use the format from existing debug cases (see `tests/debug/qwen3_layer/README.md` for reference). | ||
|
|
||
| ### Tips | ||
|
|
||
| - The debug harness (`run.sh` / `run_sim.py`) automatically sets up the NKI environment, generates random inputs (seed=42), compiles to NEFF with BIRSim, and compares against `kernel.py`. | ||
| - Artifacts (NEFF, BIR) are written to `artifacts_<stem>/` next to each MLIR file (git-ignored). | ||
| - When editing MLIR, keep changes minimal and targeted. Change only the ops/loops related to the bug. | ||
| - If you're unsure which pass generated a problematic pattern, check the pass pipeline in `nkipy_opt` or ask the user. |
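
Steps 2 and 4 of this skill can be partly mechanized. The following is a minimal sketch, not part of the PR: `classify_failure` and `next_fix_name` are hypothetical helpers, the harness output is assumed to have been saved to a log file, and the only patterns matched are the error strings quoted in Step 2.

```bash
#!/bin/bash
# Hypothetical helper: map a saved harness log to one of the Step 2 failure modes.
classify_failure() {
  local log="$1"
  if grep -qE 'Invalid access of .* partitions|Access pattern out of bounds' "$log"; then
    echo "bir_verification"
  elif grep -q 'NCC_ISIM' "$log"; then
    echo "birsim_runtime"
  elif grep -q 'SIMULATION FAILED' "$log"; then
    echo "numerical_mismatch"
  else
    echo "unknown"
  fi
}

# Hypothetical helper: print the next file name in the
# fix_<number>_<what_was_fixed>.mlir sequence for a debug case directory.
next_fix_name() {
  local case_dir="$1" description="$2"
  local max=0 f n
  for f in "$case_dir"/fix_[0-9]*_*.mlir; do
    [ -e "$f" ] || continue                # the glob matched nothing
    n="${f##*/fix_}"                       # drop the directory and "fix_" prefix
    n="${n%%_*}"                           # keep only the leading number
    case "$n" in *[!0-9]*) continue ;; esac
    n=$((10#$n))                           # force base 10 so "08" is not read as octal
    if [ "$n" -gt "$max" ]; then
      max="$n"
    fi
  done
  printf 'fix_%02d_%s.mlir\n' "$((max + 1))" "$description"
}
```

For example, `next_fix_name tests/debug/rope_partition_oob reshape_head_granularity` would print the next unused name in that directory, which keeps the progression of attempts ordered.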
28 changes: 28 additions & 0 deletions
kernelgen/.claude/skills/run_nkipykernelgen_tests/SKILL.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| --- | ||
| name: run_nkipykernelgen_tests | ||
| description: Run NKIPyKernelGen tests (without rebuilding) | ||
| user-invocable: true | ||
| --- | ||
|
|
||
| ## Usage | ||
|
|
||
| `/run_nkipykernelgen_tests [scope]` | ||
|
|
||
| Where `scope` is `all` (default), `passes`, `e2e`, or a path relative to `tests/`, such as `passes/infer_layout` or `e2e/nkipy_tests`. | ||
|
|
||
| ## Instructions | ||
|
|
||
| 1. Run the script at `.claude/skills/run_nkipykernelgen_tests/scripts/run_tests.sh` (relative to the repo root) with the requested scope as its argument. Invoke it with `bash` (not `sh`), since it relies on bash-only features such as `PIPESTATUS`. Use a timeout of 600000 ms (10 minutes). | ||
|
|
||
| ```bash | ||
| bash .claude/skills/run_nkipykernelgen_tests/scripts/run_tests.sh <scope> | ||
| ``` | ||
|
|
||
| Note: Run this from the NKIPyKernelGen repo root. | ||
|
|
||
| 2. The script saves full test output to `/tmp/nkipykernelgen_test_results.txt`. After the script finishes, use the Read tool to read that file for the complete results. This avoids context window issues with long test output. | ||
|
|
||
| 3. When reporting results, summarize: | ||
| - Total passed/failed/xfailed/xpassed/skipped counts | ||
| - List any unexpected failures (FAILED, not XFAIL) | ||
| - Note any XPASS (unexpected passes) that indicate xfail markers should be removed |
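
The reporting step above could be sketched as a small filter over the saved results file. It assumes pytest's standard `-v` output, where each test line ends with its outcome and the final line summarizes the counts. `summarize_results` is a hypothetical helper, not part of the PR.

```bash
#!/bin/bash
# Hypothetical helper: condense a saved `pytest -v` log into the items to report.
summarize_results() {
  local results="$1"
  # Final summary line, e.g. "==== 1 failed, 2 passed, 1 xpassed in 0.05s ====".
  grep -E '^=+ .*(passed|failed|errors?|skipped|xfailed|xpassed).* in [0-9.]+s' "$results" | tail -1
  # Per-test lines whose outcome is FAILED or XPASS (XFAIL lines are expected, so skipped).
  grep -E ' (FAILED|XPASS)( |$)' "$results" || true
}

# Usage against the file written by the test script:
# summarize_results /tmp/nkipykernelgen_test_results.txt
```

The first line of output carries the passed/failed/xfailed/xpassed/skipped counts; the remaining lines are the tests that need attention.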
36 changes: 36 additions & 0 deletions
kernelgen/.claude/skills/run_nkipykernelgen_tests/scripts/run_tests.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| #!/bin/bash | ||
| # Run NKIPyKernelGen tests from the repo root and save the full output to a file. | ||
| # Usage: run_tests.sh [scope] | ||
| # scope: all (default), passes, e2e, or a specific path like passes/infer_layout | ||
|
|
||
| SCOPE="${1:-all}" | ||
| RESULTS_FILE="/tmp/nkipykernelgen_test_results.txt" | ||
|
|
||
| # Derive repo root from script location: scripts/ -> run_nkipykernelgen_tests/ -> skills/ -> .claude/ -> repo root | ||
| REPO_ROOT="$(cd "$(dirname "$0")/../../../.." && pwd)" | ||
|
|
||
| cd "$REPO_ROOT" | ||
|
|
||
| # Run tests, capturing full output to file | ||
| echo "=== Running tests (scope: $SCOPE) ===" | ||
| echo "Results will be saved to: $RESULTS_FILE" | ||
|
|
||
| case "$SCOPE" in | ||
| all) | ||
| python -m pytest tests/ -v --tb=short 2>&1 | tee "$RESULTS_FILE" | ||
| ;; | ||
| passes) | ||
| python -m pytest tests/passes/ -v --tb=short 2>&1 | tee "$RESULTS_FILE" | ||
| ;; | ||
| e2e) | ||
| python -m pytest tests/e2e/ -v --tb=short 2>&1 | tee "$RESULTS_FILE" | ||
| ;; | ||
| *) | ||
| python -m pytest "tests/$SCOPE" -v --tb=short 2>&1 | tee "$RESULTS_FILE" | ||
| ;; | ||
| esac | ||
| EXIT_CODE=${PIPESTATUS[0]} | ||
|
|
||
| echo "" | ||
| echo "=== Full results saved to: $RESULTS_FILE ===" | ||
| exit $EXIT_CODE |
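
The test script reads pytest's status from `PIPESTATUS` because `$?` after a pipeline reports the status of the last command, here `tee`. A minimal sketch of the same pattern as a reusable function follows. `run_and_log` is a hypothetical name, and `PIPESTATUS` is bash-specific.

```bash
#!/bin/bash
# Hypothetical helper: run a command, mirror its output into a log file,
# and return the command's own exit status rather than tee's.
run_and_log() {
  local log="$1"
  shift
  "$@" 2>&1 | tee "$log"
  return "${PIPESTATUS[0]}"   # status of "$@", the first command in the pipeline
}

# Usage: the inner command prints one line and exits with status 3.
log_file="$(mktemp)"
status=0
run_and_log "$log_file" bash -c 'echo "one test failed"; exit 3' || status=$?
echo "exit status: $status"
```

Reading `PIPESTATUS[0]` immediately after the pipeline matters, since the array is overwritten by the next pipeline that runs.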
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| # Override parent nkipy/.gitignore's `lib/` rule so MLIR C++ sources in | ||
| # mlir/lib/ are tracked (the parent rule is aimed at Python venv lib/ dirs). | ||
| !mlir/lib/ | ||
| !mlir/lib/** | ||
|
|
||
| # Python | ||
| __pycache__/ | ||
| *.py[cod] | ||
| *.so | ||
|
|
||
| # Distribution / packaging | ||
| build/ | ||
| dist/ | ||
| *.egg-info/ | ||
| .eggs/ | ||
| *.whl | ||
|
|
||
| # Built MLIR bindings (generated during build) | ||
| nkipy_kernelgen/_mlir/ | ||
|
|
||
| # Virtual environments | ||
| venv/ | ||
| .env | ||
|
|
||
| # Testing | ||
| .pytest_cache/ | ||
| .coverage | ||
| tests/**/outputs/ | ||
| tests/**/artifacts/ | ||
|
|
||
| # IDE | ||
| .vscode/ | ||
| .idea/ | ||
|
|
||
| # OS | ||
| .DS_Store | ||
| Thumbs.db | ||
|
|
||
| # Logs | ||
| *.log | ||
|
|
||
| # LLVM lit test outputs | ||
| .lit_test_times.txt | ||
| Output/ | ||
|
|
||
| # Compiler Explorer (cloned repo) | ||
| compiler_explorer/compiler-explorer/ |
Let's use uv to manage the build and avoid these scripts/skills.