Add incore-profiling skill: msprof op-simulator kernel profiler#1386
Add incore-profiling skill: msprof op-simulator kernel profiler#1386Hzfengsy wants to merge 4 commits into
Conversation
Relocate the msprof op-simulator kernel profiler into a self-contained
.claude/skills/incore-profiling/ folder: incore_profile.py plus a vendored
generate_testcase.py harness generator. Add a --target {a2a3,a5} knob with
CANN and camodel-SoC auto-discovery so the tool runs on any Ascend setup.
📝 WalkthroughWalkthroughThis PR introduces a complete in-core profiling skill for PyPTO kernels using CANN's msprof op simulator. It includes skill documentation, a Python orchestration script that discovers kernels and runs msprof collection, and supporting code-generation templates for golden reference generation, output comparison, and build automation. ChangesIn-Core Profiling Skill for PyPTO Kernels
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new in-core kernel profiling skill for PyPTO using the Ascend msprof op-simulator, including an automation script and several test harness templates. Feedback focused on improving the robustness of environment variable parsing to avoid shell noise and addressing security vulnerabilities related to command injection and insecure library path configurations in both the Python script and shell templates.
| cmd = f"source {sh_quote(str(set_env_path))} >/dev/null && env -0" | ||
| cp = subprocess.run( | ||
| ["bash", "-lc", cmd], | ||
| capture_output=True, | ||
| check=False, | ||
| ) | ||
| if cp.returncode != 0: | ||
| raise StepError(cp.stderr.decode("utf-8", errors="replace")) | ||
| env = base_env.copy() | ||
| for item in cp.stdout.split(b"\0"): | ||
| if not item or b"=" not in item: | ||
| continue | ||
| key, val = item.split(b"=", 1) | ||
| env[key.decode()] = val.decode("utf-8", errors="replace") |
There was a problem hiding this comment.
Using bash -lc to source the environment can capture unexpected output from shell profiles (e.g., .bashrc or /etc/profile) into stdout. This noise can corrupt the environment parsing logic, especially if the noise contains = or if it merges with the first environment variable. Using a unique marker to separate the shell noise from the env -0 output makes the parsing more robust. Additionally, ensure that failures in the external process are handled explicitly with informative error messages to avoid silent failures and aid debugging.
| cmd = f"source {sh_quote(str(set_env_path))} >/dev/null && env -0" | |
| cp = subprocess.run( | |
| ["bash", "-lc", cmd], | |
| capture_output=True, | |
| check=False, | |
| ) | |
| if cp.returncode != 0: | |
| raise StepError(cp.stderr.decode("utf-8", errors="replace")) | |
| env = base_env.copy() | |
| for item in cp.stdout.split(b"\0"): | |
| if not item or b"=" not in item: | |
| continue | |
| key, val = item.split(b"=", 1) | |
| env[key.decode()] = val.decode("utf-8", errors="replace") | |
| marker = b"---ENV_START---" | |
| cmd = f"source {sh_quote(str(set_env_path))} >/dev/null && echo -n {marker.decode()} && env -0" | |
| cp = subprocess.run( | |
| ["bash", "-lc", cmd], | |
| capture_output=True, | |
| check=False, | |
| ) | |
| if cp.returncode != 0: | |
| raise StepError(cp.stderr.decode("utf-8", errors="replace")) | |
| if marker not in cp.stdout: | |
| raise StepError(f"CANN set_env.sh failed or produced no environment: {set_env_path}") | |
| env = base_env.copy() | |
| _, env_data = cp.stdout.split(marker, 1) | |
| for item in env_data.split(b"\0"): | |
| if not item or b"=" not in item: | |
| continue | |
| key, val = item.split(b"=", 1) | |
| env[key.decode()] = val.decode("utf-8", errors="replace") |
References
- When running external processes, handle cases where the process fails with a non-zero exit code but produces no output. Log an informative error message to prevent silent failures and aid debugging.
| if "=" not in item: | ||
| raise StepError(f"--run-env expects KEY=VALUE, got: {item}") | ||
| prefix.append(item) | ||
| base_cmd = " ".join(prefix + ["python", sh_quote(str(case))] + [sh_quote(x) for x in case_args]) |
There was a problem hiding this comment.
The items in prefix (derived from --run-env) are joined into base_cmd without quoting. If a user provides an environment variable with spaces (e.g., --run-env "VAR=val with spaces"), the resulting shell command will be malformed. Quoting each item ensures the environment variables are correctly interpreted by the shell and prevents potential command injection.
| base_cmd = " ".join(prefix + ["python", sh_quote(str(case))] + [sh_quote(x) for x in case_args]) | |
| base_cmd = " ".join([sh_quote(p) for p in prefix] + ["python", sh_quote(str(case))] + [sh_quote(x) for x in case_args]) |
|
|
||
| # Improve runtime linking robustness. | ||
| if [[ -n "${ASCEND_HOME_PATH:-}" ]]; then | ||
| export LD_LIBRARY_PATH="${ASCEND_HOME_PATH}/lib64:${LD_LIBRARY_PATH:-}" |
There was a problem hiding this comment.
Appending :${LD_LIBRARY_PATH:-} when LD_LIBRARY_PATH is empty or unset results in a trailing colon. In many shells, a trailing colon in LD_LIBRARY_PATH is interpreted as including the current directory (.), which is a security risk and can lead to unexpected behavior. Use the ${VAR:+:${VAR}} pattern to safely append the separator only when the variable is non-empty.
export LD_LIBRARY_PATH="${ASCEND_HOME_PATH}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
| "${ASCEND_HOME_PATH}/aarch64-linux/simulator/${SIM_SOC_VERSION}/lib" \ | ||
| "${ASCEND_HOME_PATH}/simulator/${SIM_SOC_VERSION}/lib" \ | ||
| "${ASCEND_HOME_PATH}/tools/simulator/${SIM_SOC_VERSION}/lib"; do | ||
| [[ -d "$d" ]] && LD_LIBRARY_PATH_SIM="$d:${LD_LIBRARY_PATH_SIM}" |
There was a problem hiding this comment.
Similar to the previous issue, prepending $d: when LD_LIBRARY_PATH_SIM is empty results in a trailing colon, which includes the current directory in the library search path. Use the ${VAR:+:${VAR}} pattern for safe concatenation.
[[ -d "$d" ]] && LD_LIBRARY_PATH_SIM="$d${LD_LIBRARY_PATH_SIM:+:${LD_LIBRARY_PATH_SIM}}"
There was a problem hiding this comment.
Pull request overview
Adds a new .claude project skill for profiling PyPTO PTOAS kernels with Ascend msprof op simulator, including the profiling driver, documentation, and a vendored testcase generator/templates so profiling can run without a PTOAS source checkout.
Changes:
- Adds
incore_profile.pyto discover builds/kernels, generate standalone simulator cases, runmsprof, and collect trace artifacts. - Adds
SKILL.mdusage documentation for the new in-core profiling workflow. - Vendors
generate_testcase.pyplus CMake/run/compare/golden/main templates used to build and validate simulator testcases.
Reviewed changes
Copilot reviewed 2 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
.claude/skills/incore-profiling/SKILL.md |
Documents when and how to use the new profiling skill. |
.claude/skills/incore-profiling/incore_profile.py |
Implements build discovery, toolchain setup, testcase generation, simulator profiling, and artifact collection. |
.claude/skills/incore-profiling/vendor/generate_testcase.py |
Vendored testcase generator adapted to local templates and profiling use cases. |
.claude/skills/incore-profiling/vendor/templates/compare_template.py.in |
Generates output comparison helpers for simulator/NPU validation. |
.claude/skills/incore-profiling/vendor/templates/golden_template.py.in |
Generates input/golden data creation script. |
.claude/skills/incore-profiling/vendor/templates/main_template.cpp.in |
Generates ACL host runner source for standalone kernels. |
.claude/skills/incore-profiling/vendor/templates/run_sh_template.sh.in |
Generates shell runner for building and validating standalone cases. |
| candidates += [Path(ascend_home) / "set_env.sh", Path(ascend_home).parent / "set_env.sh"] | ||
| for root in ("/usr/local/Ascend", str(Path.home() / "Ascend"), "/opt/Ascend"): | ||
| candidates += [ | ||
| Path(root) / "ascend-toolkit/set_env.sh", | ||
| Path(root) / "ascend-toolkit/latest/set_env.sh", |
| for p in [ascend_home / "lib64", ascend_home / "devlib"]: | ||
| if p.exists(): | ||
| parts.append(str(p)) | ||
| for triplet in detect_host_triplets(ascend_home): | ||
| for p in [ | ||
| ascend_home / triplet / "devlib", | ||
| ascend_home / triplet / "simulator" / soc_version / "lib", | ||
| ]: | ||
| if p.exists(): | ||
| parts.append(str(p)) |
| if golden.shape != output.shape: | ||
| print(f"[ERROR] Shape mismatch: {golden_path} {golden.shape} vs {output_path} {output.shape}") | ||
| return False | ||
| if not np.allclose(golden, output, atol=eps, rtol=eps, equal_nan=True): |
| # Maps a user-facing NPU target to its compile arch and the SoC-name keywords | ||
| # used to pick a camodel simulator directory. | ||
| TARGET_PROFILES: dict[str, dict[str, object]] = { | ||
| "a2a3": {"aicore_arch": "dav-c220", "soc_keywords": ("910b",)}, |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/skills/incore-profiling/incore_profile.py:
- Around line 108-112: The code in discover_camodel_socs treats a missing
ASCEND_HOME_PATH as Path(""), which resolves to the current directory; change
the logic to first read the raw value = env.get("ASCEND_HOME_PATH") and if not
value (None or empty string) return [] immediately, otherwise construct
ascend_home = Path(value). Update the same pattern in the other functions that
currently use Path(env.get("ASCEND_HOME_PATH", "")) (the block around lines
326-339) so they all check for a truthy env value before creating a Path.
- Around line 683-687: The current --aicore-arch argument uses
os.environ.get("AICORE_ARCH") as its default which lets a stale env var override
the target-derived default; change the CLI handling so the add_argument for
"--aicore-arch" sets default=None (or no env fallback) and after parsing compute
args.aicore_arch from args.target using the existing target->arch mapping, only
using os.environ.get("AICORE_ARCH") as a last-resort fallback when target is not
provided; apply the same change where the other occurrence (lines around the
second add_argument) sets the default.
In @.claude/skills/incore-profiling/SKILL.md:
- Around line 88-90: The doc text claiming "`--target` must currently be passed
explicitly" is out of sync with the CLI which defaults --target to "a2a3";
update the SKILL.md text to state that --target is optional and defaults to
"a2a3" (and remains an override), or alternatively change the CLI default in the
argument parser in incore_profile.py (the add_argument for "--target" /
parse_args) to require explicit passing; pick one fix so the documentation and
the --target behavior (incore_profile.py's argument definition) match.
In @.claude/skills/incore-profiling/vendor/templates/compare_template.py.in:
- Around line 315-370: The code confuses whether cols is "predicate bit count"
or "buffer byte width" in compare_packed_pred_mask; fix by treating cols as the
number of predicate bits: compute row_bytes = ((cols + 63) // 64) * 8 (remove
the min(row_bytes, cols) cap), then set need = rows * row_bytes (instead of rows
* cols), read/reshape using that need and slice golden/output with [:row_bytes]
(golden_sel/output_sel use those bytes); update the docstring to state cols is
predicate bit count and keep variable names rows and cols, or rename cols to
num_pred_bits if you prefer.
In @.claude/skills/incore-profiling/vendor/templates/main_template.cpp.in:
- Around line 9-16: Remove the redundant copyright block: delete the second
documentation comment block (the /** ... */ block that contains the 2025 Huawei
copyright text) in main_template.cpp.in so only the original 2026 header
remains; locate the duplicate block by searching for the /** ... */ comment
containing the 2025 date and remove it, leaving the first header intact.
In @.claude/skills/incore-profiling/vendor/templates/run_sh_template.sh.in:
- Around line 122-133: The npu branch currently runs golden.py, the NPU
executable, copy_outputs_as_golden, then golden.py + executable again before
running compare.py, which makes the two runs use different regenerated inputs;
decide the intent and adjust: for determinism testing remove the second
invocation of golden.py so both runs use identical inputs (change the sequence
around the golden.py call referenced as golden.py and the second golden.py
call), or for golden-vs-test ensure copy_outputs_as_golden runs after the
initial golden.py/generation step and before regenerating inputs (i.e., move or
reorder the copy_outputs_as_golden call relative to the golden.py calls), and
add a short comment describing the chosen intent near GOLDEN_MODE=npu, RUN_MODE
check and around copy_outputs_as_golden, golden.py and compare.py invocations to
make the behavior explicit.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 18ce266c-bbf3-453c-9af5-5f68f34dd206
📒 Files selected for processing (7)
.claude/skills/incore-profiling/SKILL.md.claude/skills/incore-profiling/incore_profile.py.claude/skills/incore-profiling/vendor/generate_testcase.py.claude/skills/incore-profiling/vendor/templates/compare_template.py.in.claude/skills/incore-profiling/vendor/templates/golden_template.py.in.claude/skills/incore-profiling/vendor/templates/main_template.cpp.in.claude/skills/incore-profiling/vendor/templates/run_sh_template.sh.in
| def discover_camodel_socs(env: dict[str, str]) -> list[str]: | ||
| """List simulator SoC names that ship libruntime_camodel.so in the CANN install.""" | ||
| ascend_home = Path(env.get("ASCEND_HOME_PATH", "")) | ||
| if not ascend_home.is_dir(): | ||
| return [] |
There was a problem hiding this comment.
Treat missing ASCEND_HOME_PATH as unset instead of current directory.
Using Path("") makes discovery/path assembly operate on . when ASCEND_HOME_PATH is absent. That can pick unintended local directories and produce invalid SoC/LD paths.
Suggested fix
def discover_camodel_socs(env: dict[str, str]) -> list[str]:
"""List simulator SoC names that ship libruntime_camodel.so in the CANN install."""
- ascend_home = Path(env.get("ASCEND_HOME_PATH", ""))
- if not ascend_home.is_dir():
+ ascend_home_raw = env.get("ASCEND_HOME_PATH")
+ if not ascend_home_raw:
+ return []
+ ascend_home = Path(ascend_home_raw)
+ if not ascend_home.is_dir():
return []
@@
def make_ld_library_path(build_dir: Path, env: dict[str, str], soc_version: str) -> str:
- ascend_home = Path(env.get("ASCEND_HOME_PATH", ""))
+ ascend_home_raw = env.get("ASCEND_HOME_PATH")
parts = [str(build_dir)]
- if ascend_home:
+ if ascend_home_raw:
+ ascend_home = Path(ascend_home_raw)
for p in [ascend_home / "lib64", ascend_home / "devlib"]:
if p.exists():
parts.append(str(p))Also applies to: 326-339
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/incore-profiling/incore_profile.py around lines 108 - 112,
The code in discover_camodel_socs treats a missing ASCEND_HOME_PATH as Path(""),
which resolves to the current directory; change the logic to first read the raw
value = env.get("ASCEND_HOME_PATH") and if not value (None or empty string)
return [] immediately, otherwise construct ascend_home = Path(value). Update the
same pattern in the other functions that currently use
Path(env.get("ASCEND_HOME_PATH", "")) (the block around lines 326-339) so they
all check for a truthy env value before creating a Path.
| "--aicore-arch", | ||
| default=os.environ.get("AICORE_ARCH"), | ||
| help="AICore arch for generate_testcase.py; defaults from --target " | ||
| "(a2a3 -> dav-c220, a5 -> dav-c310)", | ||
| ) |
There was a problem hiding this comment.
AICORE_ARCH currently overrides target-derived defaults unexpectedly.
Help text says the arch defaults from --target, but current defaults pull from AICORE_ARCH first. If that env var is stale, kernels can be generated for the wrong arch.
Suggested fix
- tool_group.add_argument(
- "--aicore-arch",
- default=os.environ.get("AICORE_ARCH"),
- help="AICore arch for generate_testcase.py; defaults from --target "
- "(a2a3 -> dav-c220, a5 -> dav-c310)",
- )
+ tool_group.add_argument(
+ "--aicore-arch",
+ default=None,
+ help="AICore arch for generate_testcase.py; defaults from --target "
+ "(a2a3 -> dav-c220, a5 -> dav-c310)",
+ )
...
- if not args.aicore_arch:
+ if args.aicore_arch is None:
args.aicore_arch = TARGET_PROFILES[args.target]["aicore_arch"]Also applies to: 770-771
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/incore-profiling/incore_profile.py around lines 683 - 687,
The current --aicore-arch argument uses os.environ.get("AICORE_ARCH") as its
default which lets a stale env var override the target-derived default; change
the CLI handling so the add_argument for "--aicore-arch" sets default=None (or
no env fallback) and after parsing compute args.aicore_arch from args.target
using the existing target->arch mapping, only using
os.environ.get("AICORE_ARCH") as a last-resort fallback when target is not
provided; apply the same change where the other occurrence (lines around the
second add_argument) sets the default.
| `--target` must currently be passed explicitly. Once the build pipeline records | ||
| the backend arch into the build folder, `incore_profile.py` can auto-detect it | ||
| and `--build-dir` alone will suffice. `--target` will remain as an override. |
There was a problem hiding this comment.
Align --target requirement text with actual CLI behavior.
This section says --target must be passed explicitly, but the CLI currently defaults --target to a2a3. Please update either the docs or argument default so they match.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/incore-profiling/SKILL.md around lines 88 - 90, The doc text
claiming "`--target` must currently be passed explicitly" is out of sync with
the CLI which defaults --target to "a2a3"; update the SKILL.md text to state
that --target is optional and defaults to "a2a3" (and remains an override), or
alternatively change the CLI default in the argument parser in incore_profile.py
(the add_argument for "--target" / parse_args) to require explicit passing; pick
one fix so the documentation and the --target behavior (incore_profile.py's
argument definition) match.
| def compare_packed_pred_mask(golden_path, output_path, rows, cols): | ||
| """ | ||
| Compare outputs of pto.tcmp / pto.tcmps. | ||
|
|
||
| These ops produce a *packed predicate mask* and do not define every byte in | ||
| the logical u8 tile buffer. In practice, only the first N bytes of each row | ||
| are meaningful (packed as 64-bit chunks). Ignore the rest to avoid flaky | ||
| compares caused by undefined bytes. | ||
| """ | ||
| if not os.path.exists(output_path): | ||
| print(f"[ERROR] Output missing: {output_path}") | ||
| return False | ||
| if not os.path.exists(golden_path): | ||
| print(f"[ERROR] Golden missing: {golden_path}") | ||
| return False | ||
| try: | ||
| rows = int(rows) | ||
| cols = int(cols) | ||
| except Exception: | ||
| print(f"[ERROR] Invalid rows/cols for packed mask compare: rows={rows} cols={cols}") | ||
| return False | ||
| if rows <= 0 or cols <= 0: | ||
| print(f"[ERROR] Invalid rows/cols for packed mask compare: rows={rows} cols={cols}") | ||
| return False | ||
|
|
||
| golden = np.fromfile(golden_path, dtype=np.uint8) | ||
| output = np.fromfile(output_path, dtype=np.uint8) | ||
|
|
||
| need = rows * cols | ||
| if golden.size < need or output.size < need: | ||
| print( | ||
| f"[ERROR] Packed mask buffer too small: need={need} bytes, " | ||
| f"golden={golden.size}, out={output.size}" | ||
| ) | ||
| return False | ||
|
|
||
| golden = golden[:need].reshape(rows, cols) | ||
| output = output[:need].reshape(rows, cols) | ||
|
|
||
| # Packed mask layout: 1 predicate bit per element, packed into 64-bit words | ||
| # per row (so 8 bytes per 64 columns). For cols <= 64 we still use one word. | ||
| row_bytes = ((cols + 63) // 64) * 8 | ||
| row_bytes = min(row_bytes, cols) | ||
|
|
||
| golden_sel = golden[:, :row_bytes].reshape(-1) | ||
| output_sel = output[:, :row_bytes].reshape(-1) | ||
|
|
||
| if not np.array_equal(golden_sel, output_sel): | ||
| diff = np.nonzero(golden_sel != output_sel)[0] | ||
| idx = int(diff[0]) if diff.size else 0 | ||
| print( | ||
| f"[ERROR] Mismatch (packed mask): {golden_path} vs {output_path}, first diff at idx={idx} " | ||
| f"(golden={int(golden_sel[idx])}, out={int(output_sel[idx])}, rows={rows}, cols={cols}, row_bytes={row_bytes})" | ||
| ) | ||
| return False | ||
| return True |
There was a problem hiding this comment.
Clarify the rows/cols parameters in compare_packed_pred_mask.
The function signature and internal logic create ambiguity about what cols represents:
- Line 343 (
need = rows * cols) treatscolsas the buffer byte-width per row - Line 356 (
row_bytes = ((cols + 63) // 64) * 8) treatscolsas the number of predicate bits - The comment on line 355 mentions "8 bytes per 64 columns" but doesn't clarify the unit
For small cols values (e.g., cols=5 predicate bits), line 356 computes row_bytes = 8 (one 64-bit word), but line 357 caps it to min(8, 5) = 5, which would under-read the packed data.
Consider either:
- Renaming parameters to distinguish buffer dimensions from predicate count (e.g.,
buffer_cols_bytesandnum_predicate_bits), or - Adding a docstring clarifying that
colsmust equal both the buffer byte-width and the number of predicate bits, or - Removing line 357 if
colsstrictly represents predicate bits and the buffer is always sized accordingly
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/incore-profiling/vendor/templates/compare_template.py.in
around lines 315 - 370, The code confuses whether cols is "predicate bit count"
or "buffer byte width" in compare_packed_pred_mask; fix by treating cols as the
number of predicate bits: compute row_bytes = ((cols + 63) // 64) * 8 (remove
the min(row_bytes, cols) cap), then set need = rows * row_bytes (instead of rows
* cols), read/reshape using that need and slice golden/output with [:row_bytes]
(golden_sel/output_sel use those bytes); update the docstring to state cols is
predicate bit count and keep variable names rows and cols, or rename cols to
num_pred_bits if you prefer.
| /** | ||
| Copyright (c) 2025 Huawei Technologies Co., Ltd. | ||
| This program is free software, you can redistribute it and/or modify it under the terms and conditions of | ||
| Please refer to the License for details. You may not use this file except in compliance with the License. | ||
| THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, | ||
| INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. | ||
| See LICENSE in the root of the software repository for the full text of the License. | ||
| */ |
There was a problem hiding this comment.
Remove duplicate copyright header.
Lines 1-8 already contain a 2026 copyright header. The second header on lines 9-16 (with a 2025 date) is redundant and should be removed.
🧹 Proposed fix
// See LICENSE in the root of the software repository for the full text of the License.
-/**
-Copyright (c) 2025 Huawei Technologies Co., Ltd.
-This program is free software, you can redistribute it and/or modify it under the terms and conditions of
-Please refer to the License for details. You may not use this file except in compliance with the License.
-THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED,
-INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
-See LICENSE in the root of the software repository for the full text of the License.
-*/
-
`#include` "test_common.h"
`#include` "acl/acl.h"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/incore-profiling/vendor/templates/main_template.cpp.in around
lines 9 - 16, Remove the redundant copyright block: delete the second
documentation comment block (the /** ... */ block that contains the 2025 Huawei
copyright text) in main_template.cpp.in so only the original 2026 header
remains; locate the duplicate block by searching for the /** ... */ comment
containing the 2025 date and remove it, leaving the first header intact.
| npu) | ||
| if [[ "${RUN_MODE}" != "npu" ]]; then | ||
| echo "[ERROR] GOLDEN_MODE=npu requires RUN_MODE=npu" >&2 | ||
| exit 2 | ||
| fi | ||
| python3 "${ROOT_DIR}/golden.py" | ||
| LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" "${ROOT_DIR}/${BUILD_DIR}/@EXECUTABLE@" | ||
| copy_outputs_as_golden | ||
| python3 "${ROOT_DIR}/golden.py" | ||
| LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" "${ROOT_DIR}/${BUILD_DIR}/@EXECUTABLE@" | ||
| COMPARE_STRICT=1 python3 "${ROOT_DIR}/compare.py" | ||
| ;; |
There was a problem hiding this comment.
Clarify the redundant execution sequence in GOLDEN_MODE=npu.
The npu mode runs the following sequence:
golden.py(line 20, script startup)golden.py(line 127)- NPU executable (line 128)
copy_outputs_as_golden(line 129)golden.py(line 130)- NPU executable (line 131)
compare.py(line 132)
This copies the first run's outputs (step 3) as golden (step 4), then regenerates inputs (step 5) and runs again (step 6), effectively comparing two separate NPU runs with potentially different inputs. If the intent is determinism testing (same inputs, two runs), step 5 should be removed. If the intent is golden-vs-test comparison, step 4 should happen before step 2.
Consider adding a comment explaining the rationale, or restructuring to make the intent clearer.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/incore-profiling/vendor/templates/run_sh_template.sh.in
around lines 122 - 133, The npu branch currently runs golden.py, the NPU
executable, copy_outputs_as_golden, then golden.py + executable again before
running compare.py, which makes the two runs use different regenerated inputs;
decide the intent and adjust: for determinism testing remove the second
invocation of golden.py so both runs use identical inputs (change the sequence
around the golden.py call referenced as golden.py and the second golden.py
call), or for golden-vs-test ensure copy_outputs_as_golden runs after the
initial golden.py/generation step and before regenerating inputs (i.e., move or
reorder the copy_outputs_as_golden call relative to the golden.py calls), and
add a short comment describing the chosen intent near GOLDEN_MODE=npu, RUN_MODE
check and around copy_outputs_as_golden, golden.py and compare.py invocations to
make the behavior explicit.
Summary
incore-profilingproject skill: cycle-accurate, single-AI-core profiling of every kernel in a PyPTO build via the Ascendmsprof op simulator..claude/skills/incore-profiling/folder —incore_profile.py(the profiler driver), a vendoredgenerate_testcase.pyharness generator with its.intemplates, andSKILL.md.--target {a2a3,a5}knob plus CANN and camodel-SoC auto-discovery, so the tool runs on any Ascend setup without hardcoded paths. No PTOAS source checkout required (the generator is vendored).Testing
--target a2a3, CANN / SoC / arch all auto-resolved).Related Issues
None.