Add incore-profiling skill: msprof op-simulator kernel profiler by Hzfengsy · Pull Request #1386 · hw-native-sys/pypto

Hzfengsy · 2026-05-17T08:48:21Z

Summary

Add an incore-profiling project skill: cycle-accurate, single-AI-core profiling of every kernel in a PyPTO build via the Ascend msprof op simulator.
New self-contained .claude/skills/incore-profiling/ folder — incore_profile.py (the profiler driver), a vendored generate_testcase.py harness generator with its .in templates, and SKILL.md.
Environment-agnostic: a single --target {a2a3,a5} knob plus CANN and camodel-SoC auto-discovery, so the tool runs on any Ascend setup without hardcoded paths. No PTOAS source checkout required (the generator is vendored).

Testing

Full 17/17 kernel export verified on a Qwen3 decode build (--target a2a3, CANN / SoC / arch all auto-resolved).
All pre-commit hooks pass — ruff, ruff-format, pyright, check-headers, markdownlint.

Related Issues

None.

Relocate the msprof op-simulator kernel profiler into a self-contained .claude/skills/incore-profiling/ folder: incore_profile.py plus a vendored generate_testcase.py harness generator. Add a --target {a2a3,a5} knob with CANN and camodel-SoC auto-discovery so the tool runs on any Ascend setup.

coderabbitai · 2026-05-17T08:48:37Z

📝 Walkthrough

Walkthrough

This PR introduces a complete in-core profiling skill for PyPTO kernels using CANN's msprof op simulator. It includes skill documentation, a Python orchestration script that discovers kernels and runs msprof collection, and supporting code-generation templates for golden reference generation, output comparison, and build automation.

Changes

In-Core Profiling Skill for PyPTO Kernels

Layer / File(s)	Summary
Skill Documentation `.claude/skills/incore-profiling/SKILL.md`	Explains when to use the profiling skill, lists prerequisites (CANN environment, case layout, ptoas-bin, testcase generator), provides quick-start CLI examples with `--target`/`--build-dir`/`--case`/`--func` options, describes output artifacts (trace.json, per-core CSV), troubleshoots common issues (target mismatch, missing camodel, environment paths), and documents the per-kernel workflow.
CLI Parsing and Toolchain Validation `.claude/skills/incore-profiling/incore_profile.py` (lines 617–760)	Defines argument groups for build input, toolchain selection, output naming, and profiling parameters; parses extra args after `--` for case script passthrough; validates msprof and generate_testcase availability after sourcing CANN environment; implements build directory resolution via case command or existing directory.
Environment and Camodel Discovery `.claude/skills/incore-profiling/incore_profile.py` (lines 1–187)	Establishes target-family-to-SoC mapping, auto-discovers CANN set_env.sh and camodel SoC versions, provides SoC selection with explicit override, introduces StepError for error signaling, and supplies foundational utilities for logging, path handling, environment sourcing, command execution with logging, and safe shell quoting.
Build and Output Directory Orchestration `.claude/skills/incore-profiling/incore_profile.py` (lines 189–266)	Wraps command invocation with combined stdout/stderr logging and optional exception raising; constructs run commands from `--run-cmd` or `--case` with `--run-env` prefixes; discovers existing build output directories; detects valid case-build layouts; selects newest matching build post-run.
PTOAS Source and Kernel Symbol Discovery `.claude/skills/incore-profiling/incore_profile.py` (lines 268–366)	Discovers PTOAS `.cpp` sources from explicit or default locations; extracts kernel entrypoint names via regex; detects host triplets from ASCEND libraries; constructs LD_LIBRARY_PATH with devlib and simulator paths; resolves kernel symbols from compiled libraries via `nm`/`c++filt` with preferred-name matching and detailed failure reporting.
Artifact Collection and Golden Reference `.claude/skills/incore-profiling/incore_profile.py` (lines 368–414)	Runs golden.py in case directory; locates msprof export dump directories; propagates pc_start_addr.txt when needed; collects artifact counts and first-found paths from simulator export trace/visualization directories.
Per-Function Export Pipeline `.claude/skills/incore-profiling/incore_profile.py` (lines 416–615)	Iterates over kernel sources to generate testcases, build in isolation, resolve symbols, run golden baseline, execute msprof collection, conditionally run msprof export when collection skipped artifacts, collect artifact metadata, log step failures with tails and return codes, and convert uncaught exceptions into failed results.
Main Entry Point and Orchestration `.claude/skills/incore-profiling/incore_profile.py` (lines 762–864)	Validates AICore arch defaults; decides toolchain validation scope (skipping for list-only ops); resolves build directory; discovers and filters sources; optionally lists kernel names; creates run root with pointer file; writes README with metadata; iterates export_one with per-func indexing; writes outputs after each iteration; enforces early-stop or keep-going on failure; converts StepError to exit status.
Code Generation Templates `.claude/skills/incore-profiling/vendor/templates/compare_template.py.in`, `golden_template.py.in`, `main_template.cpp.in`, `run_sh_template.sh.in`	compare_template.py.in: BF16 conversion and array comparison with ULP-distance and NaN/zero handling; generic numeric comparison via `np.allclose` with configurable tolerance; packed predicate mask exact comparison; strict/non-strict failure gating. golden_template.py.in: Python template with deterministic RNG seed and placeholder for generated input code. main_template.cpp.in: ACL-based C++ runtime with device selection, stream lifecycle, error logging, and cleanup. run_sh_template.sh.in: Build orchestration, CANN environment sourcing, simulator/NPU library path handling, CMake/Make execution, golden mode control flow (sim/npu/skip), and strict comparison enforcement.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

lyfne123

Poem

🐇 Behold! A skill to profile kernels bright,
With msprof traces shining in the night,
From source to symbols, testcases to gold,
Each cycle counted—stories to be told! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.59% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: adding an incore-profiling skill for msprof op-simulator kernel profiling, which is the primary focus of this PR.
Description check	✅ Passed	The description clearly relates to the changeset, explaining the new incore-profiling skill, its components (incore_profile.py, vendored generate_testcase.py, templates, SKILL.md), design approach (environment-agnostic with --target option), and testing verification.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a new in-core kernel profiling skill for PyPTO using the Ascend msprof op-simulator, including an automation script and several test harness templates. Feedback focused on improving the robustness of environment variable parsing to avoid shell noise and addressing security vulnerabilities related to command injection and insecure library path configurations in both the Python script and shell templates.

gemini-code-assist · 2026-05-17T08:51:23Z

+    cmd = f"source {sh_quote(str(set_env_path))} >/dev/null && env -0"
+    cp = subprocess.run(
+        ["bash", "-lc", cmd],
+        capture_output=True,
+        check=False,
+    )
+    if cp.returncode != 0:
+        raise StepError(cp.stderr.decode("utf-8", errors="replace"))
+    env = base_env.copy()
+    for item in cp.stdout.split(b"\0"):
+        if not item or b"=" not in item:
+            continue
+        key, val = item.split(b"=", 1)
+        env[key.decode()] = val.decode("utf-8", errors="replace")


Using bash -lc to source the environment can capture unexpected output from shell profiles (e.g., .bashrc or /etc/profile) into stdout. This noise can corrupt the environment parsing logic, especially if the noise contains = or if it merges with the first environment variable. Using a unique marker to separate the shell noise from the env -0 output makes the parsing more robust. Additionally, ensure that failures in the external process are handled explicitly with informative error messages to avoid silent failures and aid debugging.

Suggested change

cmd = f"source {sh_quote(str(set_env_path))} >/dev/null && env -0"

cp = subprocess.run(

["bash", "-lc", cmd],

capture_output=True,

check=False,

)

if cp.returncode != 0:

raise StepError(cp.stderr.decode("utf-8", errors="replace"))

env = base_env.copy()

for item in cp.stdout.split(b"\0"):

if not item or b"=" not in item:

continue

key, val = item.split(b"=", 1)

env[key.decode()] = val.decode("utf-8", errors="replace")

marker = b"---ENV_START---"

cmd = f"source {sh_quote(str(set_env_path))} >/dev/null && echo -n {marker.decode()} && env -0"

cp = subprocess.run(

["bash", "-lc", cmd],

capture_output=True,

check=False,

)

if cp.returncode != 0:

raise StepError(cp.stderr.decode("utf-8", errors="replace"))

if marker not in cp.stdout:

raise StepError(f"CANN set_env.sh failed or produced no environment: {set_env_path}")

env = base_env.copy()

_, env_data = cp.stdout.split(marker, 1)

for item in env_data.split(b"\0"):

if not item or b"=" not in item:

continue

key, val = item.split(b"=", 1)

env[key.decode()] = val.decode("utf-8", errors="replace")

References

When running external processes, handle cases where the process fails with a non-zero exit code but produces no output. Log an informative error message to prevent silent failures and aid debugging.

gemini-code-assist · 2026-05-17T08:51:23Z

+            if "=" not in item:
+                raise StepError(f"--run-env expects KEY=VALUE, got: {item}")
+            prefix.append(item)
+        base_cmd = " ".join(prefix + ["python", sh_quote(str(case))] + [sh_quote(x) for x in case_args])


The items in prefix (derived from --run-env) are joined into base_cmd without quoting. If a user provides an environment variable with spaces (e.g., --run-env "VAR=val with spaces"), the resulting shell command will be malformed. Quoting each item ensures the environment variables are correctly interpreted by the shell and prevents potential command injection.

Suggested change

base_cmd = " ".join(prefix + ["python", sh_quote(str(case))] + [sh_quote(x) for x in case_args])

base_cmd = " ".join([sh_quote(p) for p in prefix] + ["python", sh_quote(str(case))] + [sh_quote(x) for x in case_args])

gemini-code-assist · 2026-05-17T08:51:23Z

+
+# Improve runtime linking robustness.
+if [[ -n "${ASCEND_HOME_PATH:-}" ]]; then
+  export LD_LIBRARY_PATH="${ASCEND_HOME_PATH}/lib64:${LD_LIBRARY_PATH:-}"


Appending :${LD_LIBRARY_PATH:-} when LD_LIBRARY_PATH is empty or unset results in a trailing colon. In many shells, a trailing colon in LD_LIBRARY_PATH is interpreted as including the current directory (.), which is a security risk and can lead to unexpected behavior. Use the ${VAR:+:${VAR}} pattern to safely append the separator only when the variable is non-empty.

export LD_LIBRARY_PATH="${ASCEND_HOME_PATH}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

gemini-code-assist · 2026-05-17T08:51:23Z

+    "${ASCEND_HOME_PATH}/aarch64-linux/simulator/${SIM_SOC_VERSION}/lib" \
+    "${ASCEND_HOME_PATH}/simulator/${SIM_SOC_VERSION}/lib" \
+    "${ASCEND_HOME_PATH}/tools/simulator/${SIM_SOC_VERSION}/lib"; do
+    [[ -d "$d" ]] && LD_LIBRARY_PATH_SIM="$d:${LD_LIBRARY_PATH_SIM}"


Similar to the previous issue, prepending $d: when LD_LIBRARY_PATH_SIM is empty results in a trailing colon, which includes the current directory in the library search path. Use the ${VAR:+:${VAR}} pattern for safe concatenation.

[[ -d "$d" ]] && LD_LIBRARY_PATH_SIM="$d${LD_LIBRARY_PATH_SIM:+:${LD_LIBRARY_PATH_SIM}}"

Copilot

Pull request overview

Adds a new .claude project skill for profiling PyPTO PTOAS kernels with Ascend msprof op simulator, including the profiling driver, documentation, and a vendored testcase generator/templates so profiling can run without a PTOAS source checkout.

Changes:

Adds incore_profile.py to discover builds/kernels, generate standalone simulator cases, run msprof, and collect trace artifacts.
Adds SKILL.md usage documentation for the new in-core profiling workflow.
Vendors generate_testcase.py plus CMake/run/compare/golden/main templates used to build and validate simulator testcases.

Reviewed changes

Copilot reviewed 2 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`.claude/skills/incore-profiling/SKILL.md`	Documents when and how to use the new profiling skill.
`.claude/skills/incore-profiling/incore_profile.py`	Implements build discovery, toolchain setup, testcase generation, simulator profiling, and artifact collection.
`.claude/skills/incore-profiling/vendor/generate_testcase.py`	Vendored testcase generator adapted to local templates and profiling use cases.
`.claude/skills/incore-profiling/vendor/templates/compare_template.py.in`	Generates output comparison helpers for simulator/NPU validation.
`.claude/skills/incore-profiling/vendor/templates/golden_template.py.in`	Generates input/golden data creation script.
`.claude/skills/incore-profiling/vendor/templates/main_template.cpp.in`	Generates ACL host runner source for standalone kernels.
`.claude/skills/incore-profiling/vendor/templates/run_sh_template.sh.in`	Generates shell runner for building and validating standalone cases.

+        candidates += [Path(ascend_home) / "set_env.sh", Path(ascend_home).parent / "set_env.sh"]
+    for root in ("/usr/local/Ascend", str(Path.home() / "Ascend"), "/opt/Ascend"):
+        candidates += [
+            Path(root) / "ascend-toolkit/set_env.sh",
+            Path(root) / "ascend-toolkit/latest/set_env.sh",


+        for p in [ascend_home / "lib64", ascend_home / "devlib"]:
+            if p.exists():
+                parts.append(str(p))
+        for triplet in detect_host_triplets(ascend_home):
+            for p in [
+                ascend_home / triplet / "devlib",
+                ascend_home / triplet / "simulator" / soc_version / "lib",
+            ]:
+                if p.exists():
+                    parts.append(str(p))


+    if golden.shape != output.shape:
+        print(f"[ERROR] Shape mismatch: {golden_path} {golden.shape} vs {output_path} {output.shape}")
+        return False
+    if not np.allclose(golden, output, atol=eps, rtol=eps, equal_nan=True):


+# Maps a user-facing NPU target to its compile arch and the SoC-name keywords
+# used to pick a camodel simulator directory.
+TARGET_PROFILES: dict[str, dict[str, object]] = {
+    "a2a3": {"aicore_arch": "dav-c220", "soc_keywords": ("910b",)},


coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/incore-profiling/incore_profile.py:
- Around line 108-112: The code in discover_camodel_socs treats a missing
ASCEND_HOME_PATH as Path(""), which resolves to the current directory; change
the logic to first read the raw value = env.get("ASCEND_HOME_PATH") and if not
value (None or empty string) return [] immediately, otherwise construct
ascend_home = Path(value). Update the same pattern in the other functions that
currently use Path(env.get("ASCEND_HOME_PATH", "")) (the block around lines
326-339) so they all check for a truthy env value before creating a Path.
- Around line 683-687: The current --aicore-arch argument uses
os.environ.get("AICORE_ARCH") as its default which lets a stale env var override
the target-derived default; change the CLI handling so the add_argument for
"--aicore-arch" sets default=None (or no env fallback) and after parsing compute
args.aicore_arch from args.target using the existing target->arch mapping, only
using os.environ.get("AICORE_ARCH") as a last-resort fallback when target is not
provided; apply the same change where the other occurrence (lines around the
second add_argument) sets the default.

In @.claude/skills/incore-profiling/SKILL.md:
- Around line 88-90: The doc text claiming "`--target` must currently be passed
explicitly" is out of sync with the CLI which defaults --target to "a2a3";
update the SKILL.md text to state that --target is optional and defaults to
"a2a3" (and remains an override), or alternatively change the CLI default in the
argument parser in incore_profile.py (the add_argument for "--target" /
parse_args) to require explicit passing; pick one fix so the documentation and
the --target behavior (incore_profile.py's argument definition) match.

In @.claude/skills/incore-profiling/vendor/templates/compare_template.py.in:
- Around line 315-370: The code confuses whether cols is "predicate bit count"
or "buffer byte width" in compare_packed_pred_mask; fix by treating cols as the
number of predicate bits: compute row_bytes = ((cols + 63) // 64) * 8 (remove
the min(row_bytes, cols) cap), then set need = rows * row_bytes (instead of rows
* cols), read/reshape using that need and slice golden/output with [:row_bytes]
(golden_sel/output_sel use those bytes); update the docstring to state cols is
predicate bit count and keep variable names rows and cols, or rename cols to
num_pred_bits if you prefer.

In @.claude/skills/incore-profiling/vendor/templates/main_template.cpp.in:
- Around line 9-16: Remove the redundant copyright block: delete the second
documentation comment block (the /** ... */ block that contains the 2025 Huawei
copyright text) in main_template.cpp.in so only the original 2026 header
remains; locate the duplicate block by searching for the /** ... */ comment
containing the 2025 date and remove it, leaving the first header intact.

In @.claude/skills/incore-profiling/vendor/templates/run_sh_template.sh.in:
- Around line 122-133: The npu branch currently runs golden.py, the NPU
executable, copy_outputs_as_golden, then golden.py + executable again before
running compare.py, which makes the two runs use different regenerated inputs;
decide the intent and adjust: for determinism testing remove the second
invocation of golden.py so both runs use identical inputs (change the sequence
around the golden.py call referenced as golden.py and the second golden.py
call), or for golden-vs-test ensure copy_outputs_as_golden runs after the
initial golden.py/generation step and before regenerating inputs (i.e., move or
reorder the copy_outputs_as_golden call relative to the golden.py calls), and
add a short comment describing the chosen intent near GOLDEN_MODE=npu, RUN_MODE
check and around copy_outputs_as_golden, golden.py and compare.py invocations to
make the behavior explicit.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 18ce266c-bbf3-453c-9af5-5f68f34dd206

📥 Commits

Reviewing files that changed from the base of the PR and between 95f793f and 0e6f34b.

📒 Files selected for processing (7)

.claude/skills/incore-profiling/SKILL.md
.claude/skills/incore-profiling/incore_profile.py
.claude/skills/incore-profiling/vendor/generate_testcase.py
.claude/skills/incore-profiling/vendor/templates/compare_template.py.in
.claude/skills/incore-profiling/vendor/templates/golden_template.py.in
.claude/skills/incore-profiling/vendor/templates/main_template.cpp.in
.claude/skills/incore-profiling/vendor/templates/run_sh_template.sh.in

coderabbitai · 2026-05-17T08:54:28Z

+def discover_camodel_socs(env: dict[str, str]) -> list[str]:
+    """List simulator SoC names that ship libruntime_camodel.so in the CANN install."""
+    ascend_home = Path(env.get("ASCEND_HOME_PATH", ""))
+    if not ascend_home.is_dir():
+        return []


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat missing ASCEND_HOME_PATH as unset instead of current directory.

Using Path("") makes discovery/path assembly operate on . when ASCEND_HOME_PATH is absent. That can pick unintended local directories and produce invalid SoC/LD paths.

Suggested fix

def discover_camodel_socs(env: dict[str, str]) -> list[str]: """List simulator SoC names that ship libruntime_camodel.so in the CANN install.""" - ascend_home = Path(env.get("ASCEND_HOME_PATH", "")) - if not ascend_home.is_dir(): + ascend_home_raw = env.get("ASCEND_HOME_PATH") + if not ascend_home_raw: + return [] + ascend_home = Path(ascend_home_raw) + if not ascend_home.is_dir(): return [] @@ def make_ld_library_path(build_dir: Path, env: dict[str, str], soc_version: str) -> str: - ascend_home = Path(env.get("ASCEND_HOME_PATH", "")) + ascend_home_raw = env.get("ASCEND_HOME_PATH") parts = [str(build_dir)] - if ascend_home: + if ascend_home_raw: + ascend_home = Path(ascend_home_raw) for p in [ascend_home / "lib64", ascend_home / "devlib"]: if p.exists(): parts.append(str(p))

Also applies to: 326-339

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/incore-profiling/incore_profile.py around lines 108 - 112, The code in discover_camodel_socs treats a missing ASCEND_HOME_PATH as Path(""), which resolves to the current directory; change the logic to first read the raw value = env.get("ASCEND_HOME_PATH") and if not value (None or empty string) return [] immediately, otherwise construct ascend_home = Path(value). Update the same pattern in the other functions that currently use Path(env.get("ASCEND_HOME_PATH", "")) (the block around lines 326-339) so they all check for a truthy env value before creating a Path.

coderabbitai · 2026-05-17T08:54:28Z

+        "--aicore-arch",
+        default=os.environ.get("AICORE_ARCH"),
+        help="AICore arch for generate_testcase.py; defaults from --target "
+        "(a2a3 -> dav-c220, a5 -> dav-c310)",
+    )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

AICORE_ARCH currently overrides target-derived defaults unexpectedly.

Help text says the arch defaults from --target, but current defaults pull from AICORE_ARCH first. If that env var is stale, kernels can be generated for the wrong arch.

Suggested fix

- tool_group.add_argument( - "--aicore-arch", - default=os.environ.get("AICORE_ARCH"), - help="AICore arch for generate_testcase.py; defaults from --target " - "(a2a3 -> dav-c220, a5 -> dav-c310)", - ) + tool_group.add_argument( + "--aicore-arch", + default=None, + help="AICore arch for generate_testcase.py; defaults from --target " + "(a2a3 -> dav-c220, a5 -> dav-c310)", + ) ... - if not args.aicore_arch: + if args.aicore_arch is None: args.aicore_arch = TARGET_PROFILES[args.target]["aicore_arch"]

Also applies to: 770-771

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/incore-profiling/incore_profile.py around lines 683 - 687, The current --aicore-arch argument uses os.environ.get("AICORE_ARCH") as its default which lets a stale env var override the target-derived default; change the CLI handling so the add_argument for "--aicore-arch" sets default=None (or no env fallback) and after parsing compute args.aicore_arch from args.target using the existing target->arch mapping, only using os.environ.get("AICORE_ARCH") as a last-resort fallback when target is not provided; apply the same change where the other occurrence (lines around the second add_argument) sets the default.

coderabbitai · 2026-05-17T08:54:28Z

+`--target` must currently be passed explicitly. Once the build pipeline records
+the backend arch into the build folder, `incore_profile.py` can auto-detect it
+and `--build-dir` alone will suffice. `--target` will remain as an override.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align --target requirement text with actual CLI behavior.

This section says --target must be passed explicitly, but the CLI currently defaults --target to a2a3. Please update either the docs or argument default so they match.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/incore-profiling/SKILL.md around lines 88 - 90, The doc text claiming "`--target` must currently be passed explicitly" is out of sync with the CLI which defaults --target to "a2a3"; update the SKILL.md text to state that --target is optional and defaults to "a2a3" (and remains an override), or alternatively change the CLI default in the argument parser in incore_profile.py (the add_argument for "--target" / parse_args) to require explicit passing; pick one fix so the documentation and the --target behavior (incore_profile.py's argument definition) match.

coderabbitai · 2026-05-17T08:54:28Z

+def compare_packed_pred_mask(golden_path, output_path, rows, cols):
+    """
+    Compare outputs of pto.tcmp / pto.tcmps.
+
+    These ops produce a *packed predicate mask* and do not define every byte in
+    the logical u8 tile buffer. In practice, only the first N bytes of each row
+    are meaningful (packed as 64-bit chunks). Ignore the rest to avoid flaky
+    compares caused by undefined bytes.
+    """
+    if not os.path.exists(output_path):
+        print(f"[ERROR] Output missing: {output_path}")
+        return False
+    if not os.path.exists(golden_path):
+        print(f"[ERROR] Golden missing: {golden_path}")
+        return False
+    try:
+        rows = int(rows)
+        cols = int(cols)
+    except Exception:
+        print(f"[ERROR] Invalid rows/cols for packed mask compare: rows={rows} cols={cols}")
+        return False
+    if rows <= 0 or cols <= 0:
+        print(f"[ERROR] Invalid rows/cols for packed mask compare: rows={rows} cols={cols}")
+        return False
+
+    golden = np.fromfile(golden_path, dtype=np.uint8)
+    output = np.fromfile(output_path, dtype=np.uint8)
+
+    need = rows * cols
+    if golden.size < need or output.size < need:
+        print(
+            f"[ERROR] Packed mask buffer too small: need={need} bytes, "
+            f"golden={golden.size}, out={output.size}"
+        )
+        return False
+
+    golden = golden[:need].reshape(rows, cols)
+    output = output[:need].reshape(rows, cols)
+
+    # Packed mask layout: 1 predicate bit per element, packed into 64-bit words
+    # per row (so 8 bytes per 64 columns). For cols <= 64 we still use one word.
+    row_bytes = ((cols + 63) // 64) * 8
+    row_bytes = min(row_bytes, cols)
+
+    golden_sel = golden[:, :row_bytes].reshape(-1)
+    output_sel = output[:, :row_bytes].reshape(-1)
+
+    if not np.array_equal(golden_sel, output_sel):
+        diff = np.nonzero(golden_sel != output_sel)[0]
+        idx = int(diff[0]) if diff.size else 0
+        print(
+            f"[ERROR] Mismatch (packed mask): {golden_path} vs {output_path}, first diff at idx={idx} "
+            f"(golden={int(golden_sel[idx])}, out={int(output_sel[idx])}, rows={rows}, cols={cols}, row_bytes={row_bytes})"
+        )
+        return False
+    return True


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify the rows/cols parameters in compare_packed_pred_mask.

The function signature and internal logic create ambiguity about what cols represents:

Line 343 (need = rows * cols) treats cols as the buffer byte-width per row

Line 356 (row_bytes = ((cols + 63) // 64) * 8) treats cols as the number of predicate bits

The comment on line 355 mentions "8 bytes per 64 columns" but doesn't clarify the unit

For small cols values (e.g., cols=5 predicate bits), line 356 computes row_bytes = 8 (one 64-bit word), but line 357 caps it to min(8, 5) = 5, which would under-read the packed data.

Consider either:

Renaming parameters to distinguish buffer dimensions from predicate count (e.g., buffer_cols_bytes and num_predicate_bits), or

Adding a docstring clarifying that cols must equal both the buffer byte-width and the number of predicate bits, or

Removing line 357 if cols strictly represents predicate bits and the buffer is always sized accordingly

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/incore-profiling/vendor/templates/compare_template.py.in around lines 315 - 370, The code confuses whether cols is "predicate bit count" or "buffer byte width" in compare_packed_pred_mask; fix by treating cols as the number of predicate bits: compute row_bytes = ((cols + 63) // 64) * 8 (remove the min(row_bytes, cols) cap), then set need = rows * row_bytes (instead of rows * cols), read/reshape using that need and slice golden/output with [:row_bytes] (golden_sel/output_sel use those bytes); update the docstring to state cols is predicate bit count and keep variable names rows and cols, or rename cols to num_pred_bits if you prefer.

coderabbitai · 2026-05-17T08:54:28Z

+/**
+Copyright (c) 2025 Huawei Technologies Co., Ltd.
+This program is free software, you can redistribute it and/or modify it under the terms and conditions of
+Please refer to the License for details. You may not use this file except in compliance with the License.
+THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED,
+INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
+See LICENSE in the root of the software repository for the full text of the License.
+*/


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove duplicate copyright header.

Lines 1-8 already contain a 2026 copyright header. The second header on lines 9-16 (with a 2025 date) is redundant and should be removed.

🧹 Proposed fix

// See LICENSE in the root of the software repository for the full text of the License. -/** -Copyright (c) 2025 Huawei Technologies Co., Ltd. -This program is free software, you can redistribute it and/or modify it under the terms and conditions of -Please refer to the License for details. You may not use this file except in compliance with the License. -THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, -INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. -See LICENSE in the root of the software repository for the full text of the License. -*/ - `#include` "test_common.h" `#include` "acl/acl.h"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/incore-profiling/vendor/templates/main_template.cpp.in around lines 9 - 16, Remove the redundant copyright block: delete the second documentation comment block (the /** ... */ block that contains the 2025 Huawei copyright text) in main_template.cpp.in so only the original 2026 header remains; locate the duplicate block by searching for the /** ... */ comment containing the 2025 date and remove it, leaving the first header intact.

coderabbitai · 2026-05-17T08:54:28Z

+  npu)
+    if [[ "${RUN_MODE}" != "npu" ]]; then
+      echo "[ERROR] GOLDEN_MODE=npu requires RUN_MODE=npu" >&2
+      exit 2
+    fi
+    python3 "${ROOT_DIR}/golden.py"
+    LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" "${ROOT_DIR}/${BUILD_DIR}/@EXECUTABLE@"
+    copy_outputs_as_golden
+    python3 "${ROOT_DIR}/golden.py"
+    LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" "${ROOT_DIR}/${BUILD_DIR}/@EXECUTABLE@"
+    COMPARE_STRICT=1 python3 "${ROOT_DIR}/compare.py"
+    ;;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify the redundant execution sequence in GOLDEN_MODE=npu.

The npu mode runs the following sequence:

golden.py (line 20, script startup)

golden.py (line 127)

NPU executable (line 128)

copy_outputs_as_golden (line 129)

golden.py (line 130)

NPU executable (line 131)

compare.py (line 132)

This copies the first run's outputs (step 3) as golden (step 4), then regenerates inputs (step 5) and runs again (step 6), effectively comparing two separate NPU runs with potentially different inputs. If the intent is determinism testing (same inputs, two runs), step 5 should be removed. If the intent is golden-vs-test comparison, step 4 should happen before step 2.

Consider adding a comment explaining the rationale, or restructuring to make the intent clearer.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/incore-profiling/vendor/templates/run_sh_template.sh.in around lines 122 - 133, The npu branch currently runs golden.py, the NPU executable, copy_outputs_as_golden, then golden.py + executable again before running compare.py, which makes the two runs use different regenerated inputs; decide the intent and adjust: for determinism testing remove the second invocation of golden.py so both runs use identical inputs (change the sequence around the golden.py call referenced as golden.py and the second golden.py call), or for golden-vs-test ensure copy_outputs_as_golden runs after the initial golden.py/generation step and before regenerating inputs (i.e., move or reorder the copy_outputs_as_golden call relative to the golden.py calls), and add a short comment describing the chosen intent near GOLDEN_MODE=npu, RUN_MODE check and around copy_outputs_as_golden, golden.py and compare.py invocations to make the behavior explicit.

Hzfengsy added 4 commits May 17, 2026 16:47

docs: add in-core profiling skill design spec

8c13f02

docs(skill): add incore-profiling skill guide

e76c8c5

chore: remove transient incore-profiling design docs

0e6f34b

Copilot AI review requested due to automatic review settings May 17, 2026 08:48

github-project-automation Bot added this to pto project May 17, 2026

Copilot started reviewing on behalf of Hzfengsy May 17, 2026 08:48 View session

Hzfengsy marked this pull request as draft May 17, 2026 08:50

gemini-code-assist Bot reviewed May 17, 2026

View reviewed changes

Copilot AI reviewed May 17, 2026

View reviewed changes

coderabbitai Bot reviewed May 17, 2026

View reviewed changes

	base_cmd = " ".join(prefix + ["python", sh_quote(str(case))] + [sh_quote(x) for x in case_args])
	base_cmd = " ".join([sh_quote(p) for p in prefix] + ["python", sh_quote(str(case))] + [sh_quote(x) for x in case_args])

Conversation

Hzfengsy commented May 17, 2026

Summary

Testing

Related Issues

Uh oh!

coderabbitai Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented May 17, 2026 •

edited

Loading