Skip to content

[Codecheck]refactor: split nbnc hotspot files#701

Closed
HecreReed wants to merge 9 commits into
hw-native-sys:mainfrom
HecreReed:codex/nbnc-hotspot-split-20260526
Closed

[Codecheck]refactor: split nbnc hotspot files#701
HecreReed wants to merge 9 commits into
hw-native-sys:mainfrom
HecreReed:codex/nbnc-hotspot-split-20260526

Conversation

@HecreReed
Copy link
Copy Markdown
Collaborator

@HecreReed HecreReed commented May 26, 2026

Summary

Rework the NBNC cleanup as real source decomposition. The earlier .def include-only approach was reverted.

Resolved in this branch:

  • tools/ptobc/generated/ptobc_opcodes_v0.h: split declarations into the header and definitions into compiled ptobc_opcodes_v0.cpp.
  • lib/PTO/Transforms/GraphSyncSolver/SyncSolver.cpp: split merge/backward-sync logic into compiled SyncSolverMerge.cpp.
  • lib/PTO/Transforms/PTOViewToMemref.cpp: split compute-op rewrite stage into compiled PTOViewToMemrefCompute.cpp.

In progress:

  • lib/PTO/Transforms/PTOToEmitC.cpp: Arith/Affine conversion patterns are now in compiled PTOToEmitCArith.cpp; more PTO op pattern groups still need to be split.

Remaining hotspots:

  • lib/PTO/IR/PTO.cpp
  • lib/PTO/Transforms/PTOToEmitC.cpp

Current NBNC snapshot:

  • SyncSolver.cpp: 1620
  • SyncSolverMerge.cpp: 646
  • PTOViewToMemref.cpp: 1466
  • PTOViewToMemrefCompute.cpp: 1419
  • ptobc_opcodes_v0.h: 48
  • ptobc_opcodes_v0.cpp: 657
  • PTOToEmitC.cpp: 9112
  • PTO.cpp: 11280

Validation

  • cmake -G Ninja -S . -B /tmp/ptoas-nbnc-build -DLLVM_DIR=/Users/laoda/llvm-workspace/llvm-project/llvm/build-shared/lib/cmake/llvm -DMLIR_DIR=/Users/laoda/llvm-workspace/llvm-project/llvm/build-shared/lib/cmake/mlir -DCMAKE_BUILD_TYPE=Release -DPTO_ENABLE_PYTHON_BINDING=OFF
  • ninja -C /tmp/ptoas-nbnc-build ptoas ptobc
  • ninja -C /tmp/ptoas-nbnc-build check-pto (244/244 passed)

Notes

Draft status is intentional until PTO.cpp and the remaining PTOToEmitC.cpp pattern groups are split below the NBNC thresholds.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the codebase by moving large implementation blocks from source and header files (SyncSolver.cpp and ptobc_opcodes_v0.h) into separate definition files (SyncSolver.def and ptobc_opcodes_v0.def) and including them. Feedback points out that ptobc_opcodes_v0.h is missing #pragma once or header guards, which could lead to multiple inclusion issues, and suggests updating the generator script to target the .def file directly.

}

} // namespace ptobc::v0
#include "ptobc_opcodes_v0.def"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The header file ptobc_opcodes_v0.h is missing #pragma once or header guards, which can lead to multiple inclusion issues. Additionally, this file is generated by docs/bytecode/tools/gen_v0_tables.py. The generator script should be updated to output to ptobc_opcodes_v0.def instead of ptobc_opcodes_v0.h to prevent future runs from overwriting this file and reverting the refactoring.

Suggested change
#include "ptobc_opcodes_v0.def"
#pragma once\n#include \"ptobc_opcodes_v0.def\"

@reedhecre
Copy link
Copy Markdown

reedhecre commented May 26, 2026

Codex Review

该评论由 review 机器人自动更新。

  • PR: [Codecheck]refactor: split nbnc hotspot files #701 refactor: split nbnc hotspot files
  • Author: HecreReed
  • Base/Head: main / codex/nbnc-hotspot-split-20260526
  • Head SHA: b535fdc4813e
  • Trigger: PR 有新提交
  • Generated At: 2026-05-26T12:21:24Z
  • Previous Head SHA: 3ce0de370726
  • Status: completed

Summary

未检查到 PR #701 存在问题

Findings

No issues found.

@HecreReed HecreReed marked this pull request as ready for review May 26, 2026 11:21
@HecreReed
Copy link
Copy Markdown
Collaborator Author

/review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ce0de3707

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if (v == 0xFFFF) return std::nullopt;
return v;
}
extern const OpInfo kOpTable[];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep opcode table visible to coverage checker

Replacing the inline kOpTable definition with an extern declaration means the header no longer contains any {0x..., "op"} entries, but tools/ptobc/tests/opcode_coverage_check.py (invoked by ptobc_opcode_coverage_check in tools/ptobc/tests/CMakeLists.txt) still parses this header text to validate opcode coverage. After this refactor the checker sees no table rows in the header, so this test no longer validates the generated opcode map and can miss opcode-table regressions.

Useful? React with 👍 / 👎.

@reedhecre
Copy link
Copy Markdown

Manual Codex Review

该评论由 /review 手动触发。

Summary

收到 /review,正在执行 Codex review。

Findings

Review in progress.

@reedhecre
Copy link
Copy Markdown

Manual Codex Review

该评论由 /review 手动触发。

Summary

ptobc opcode table splitting regresses the existing coverage check: CI still scans the header, but the actual table moved into the new .cpp file.

Findings

  1. P2 ptobc opcode coverage check is disconnected from the generated table tools/ptobc/generated/ptobc_opcodes_v0.h:56

This refactor moves the opcode table and lookup implementations out of ptobc_opcodes_v0.h into ptobc_opcodes_v0.cpp, but the existing ptobc_opcode_coverage_check test is still wired to scan the header for table entries (parse_header_names() looks for {0x..., "name"} records in ptobc_opcodes_v0.h). Before this PR the header contained the 179 generated entries; after this change it only contains declarations like extern const OpInfo kOpTable[];, so the coverage check no longer validates the compiled opcode table at all. That is a CI regression: missing/forgotten opcode mappings can now slip through without the intended guard catching them.

@HecreReed HecreReed changed the title refactor: split nbnc hotspot files [Codecheck]refactor: split nbnc hotspot files May 27, 2026
@HecreReed
Copy link
Copy Markdown
Collaborator Author

/run a3

@reedhecre
Copy link
Copy Markdown

已接收 /run a3,A3 板测器会处理这条请求。

页面会自动刷新,可以直接看当前阶段、排队情况和最近结果。

@reedhecre
Copy link
Copy Markdown

A3 板测失败

  • 触发方式:manual
  • 源码提交:52843f0fb608
  • 结果汇总:OK 217 / FAIL 2 / SKIP 1
  • 日志:/home/zhongxuan/ptoas-board-monitor/runtime/logs/20260527_090605_manual_pr701.log
  • 手动指令:/run a3
  • 触发人:HecreReed
  • 触发评论:[Codecheck]refactor: split nbnc hotspot files #701 (comment)
  • 失败阶段:board-validation / exit=1

失败用例

  • syncall_binding (run, exit=1)
  • tprefetch_async_binding (run, exit=1)

@reedhecre
Copy link
Copy Markdown

A3 板测失败详情:PR #701

syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507014 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260527_090605_manual_pr701/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 2339786] 2026-05-27-09:48:48.293.501 (EZ9999):  The error from device(chipId:2, dieId:0), serial number is 144, there is an exception of aicore error, core id is 5, error code = 0, dump info: pc start: 0x124800000000, current: 0x124800000188, vec error info: 0, mte error info: 0x7a06000031, ifu error info: 0x212c20008c900, ccu error info: 0x40a01900778000d8, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0, 0) errorStr: timeout or trap error. fixp_error0 info: 0x6000031, fixp_error1 info: 0x7a, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x25, [aicore timeout].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AICORE Kernel task happen error, retCode=0x25.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=4, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=3129332313788381512.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=aicore timeout[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507014[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-27 09:48:50] ERROR: testcase failed (exit 1): syncall_binding
tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260527_090605_manual_pr701/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 2616324] 2026-05-27-09:49:27.959.095 (EZ9999):  The error from device(chipId:2, dieId:0), serial number is 145, there is an exception of aivec error, core id is 2, error code = 0, dump info: pc start: 0x124800000000, current: 0x124800000160, vec error info: 0x1e000000a8, mte error info: 0xa8060000cb, ifu error info: 0x3000000000000, ccu error info: 0x40a00e0000000052, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0x200000000000000, 0) errorStr: The MPU address access is invalid. fixp_error0 info: 0x60000cb, fixp_error1 info: 0xa8, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=4, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=8435686547367685641.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-27 09:49:29] ERROR: testcase failed (exit 1): tprefetch_async_binding

@HecreReed HecreReed closed this May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants