
fix(codegen): Emit per-kernel ArgDirection signatures in kernel_config#1466

Open
luohuan19 wants to merge 2 commits into hw-native-sys:main from luohuan19:issue-1458-codegen-kernel-signatures

@luohuan19
Contributor

Summary

  • Codegen-generated kernels were registered with an empty CoreCallable signature: kernel_config.py's KERNELS entries carried no signature field, so the runtime built each callable with sig_count()==0.
  • The tensormap_and_ringbuffer tensor dump skips a task when the summed per-active-subtask tensor-arg count does not match the task payload tensor_count, so --dump-tensor captured nothing for a codegen matmul (count 0 != payload 3).
  • Orchestration codegen now records each kernel's runtime ArgDirection signature from the same ParamEntry vector that emits the task-payload calls (add_input/output/inout/scalar), which guarantees that the signature's non-SCALAR entries line up 1:1 with the payload tensors.
  • The signature is threaded through OrchestrationResult and emitted into kernel_config.py as a signature field holding a list of simpler.task_interface ArgDirection members. Kernels without a recorded signature fall back to the prior empty-signature behavior.
  • Populating signatures affects only the tensor dump: the leaf CoreCallable signature is not consumed by the dispatch or arg-marshalling path on either runtime.
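To make the failure mode concrete, here is a minimal Python model of the tensor dump's count check. It is a sketch, not the runtime's code: `ArgDirection` is a local stand-in for `simpler.task_interface.ArgDirection`, the helper names are hypothetical, and the IN, IN, OUT matmul signature is an assumed example chosen to match the payload count of 3 reported in the issue.

```python
from enum import Enum


class ArgDirection(Enum):
    """Hypothetical stand-in for simpler.task_interface.ArgDirection."""

    IN = "IN"
    OUT = "OUT"
    INOUT = "INOUT"
    SCALAR = "SCALAR"


def tensor_arg_count(signature):
    """Number of tensor (non-SCALAR) entries a callable's signature declares."""
    return sum(1 for d in signature if d is not ArgDirection.SCALAR)


def dump_would_capture(active_subtask_signatures, payload_tensor_count):
    """Model of the dump's consistency check: capture only if the summed
    tensor-arg count over active subtasks equals the payload tensor_count."""
    total = sum(tensor_arg_count(sig) for sig in active_subtask_signatures)
    return total == payload_tensor_count


# Before the fix: the codegen matmul callable had an empty signature (0 != 3).
before = dump_would_capture([[]], payload_tensor_count=3)

# After the fix: assuming the matmul payload carries two inputs and one output.
matmul_sig = [ArgDirection.IN, ArgDirection.IN, ArgDirection.OUT]
after = dump_would_capture([matmul_sig], payload_tensor_count=3)

print(before, after)  # False True
```

The point of the model is that nothing about the kernel itself changes; only the declared signature length moves from 0 to the payload's tensor count, which is why the fix is confined to the dump path.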

Testing

  • Adds regression tests for kernel_config.py signature emission and for OrchestrationResult.func_name_to_signature population.

Related Issues

Fixes #1458

@coderabbitai

coderabbitai Bot commented May 22, 2026

Walkthrough

Records per-kernel runtime argument-direction lists during orchestration codegen, returns them in OrchestrationResult, and uses them to optionally emit per-kernel "signature" entries (ArgDirection enum members) in generated kernel_config.py.

Changes

Kernel Signature Data Flow

  • OrchestrationResult signature field and bindings (include/pypto/codegen/orchestration/orchestration_codegen.h, python/pypto/pypto_core/codegen.pyi, python/bindings/modules/codegen.cpp): Adds func_name_to_signature: std::map<std::string, std::vector<std::string>> to OrchestrationResult, exposes the property to Python, and documents it in the type stubs and nanobind docstrings.
  • C++ signature helpers and orchestration threading (src/codegen/orchestration/orchestration_codegen.cpp): Adds an ArgDirection→runtime-name mapping and RecordKernelSignature, stores a shared func_name_to_signature_ in OrchestrationStmtCodegen, declares a local map in GenerateOrchestration, passes it into the statement codegen, and returns it in OrchestrationResult.
  • Signature recording at task emission sites (src/codegen/orchestration/orchestration_codegen.cpp): Calls RecordKernelSignature after building task parameters for normal function calls, SPMD wrappers, AIV-only Group dispatches, and MixedKernels Group dispatch (a duplicate call was observed for aic_name).
  • Config file generation with signature emission (python/pypto/backend/pto_backend.py): _generate_config_file gains an optional func_name_to_signature argument, imports ArgDirection as _D only when any signatures exist, and emits per-kernel "signature": [_D.IN, _D.OUT, ...] lists; the call site in _generate_single_chip passes orch_result.func_name_to_signature.
  • Test coverage (tests/ut/backend/test_kernel_config_signature.py, tests/ut/codegen/test_orchestration_codegen.py): Adds backend tests validating the conditional import and signature emission, and augments the orchestration tests to assert func_name_to_signature contents and that scalar args are excluded from recorded signatures.
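The config-emission step can be pictured with the simplified Python emitter below. It is not `_generate_config_file`: the function name, the `"name"` key, and the direction strings ("IN", "OUT", "INOUT") are assumptions. Only the conditional `ArgDirection as _D` import and the per-kernel `"signature": [_D.IN, ...]` list follow the walkthrough.

```python
def render_kernel_config(kernel_names, func_name_to_signature=None):
    """Hypothetical, simplified emitter for a kernel_config.py-style KERNELS list.

    func_name_to_signature maps kernel name -> list of direction names
    (assumed to be "IN", "OUT", "INOUT"), mirroring the shape of
    OrchestrationResult.func_name_to_signature.
    """
    sigs = func_name_to_signature or {}
    lines = []
    # Import ArgDirection only when at least one signature was recorded.
    if any(sigs.values()):
        lines += ["from simpler.task_interface import ArgDirection as _D", ""]
    lines.append("KERNELS = [")
    for name in kernel_names:
        sig = sigs.get(name)
        if sig:
            members = ", ".join(f"_D.{d}" for d in sig)
            lines.append(f'    {{"name": "{name}", "signature": [{members}]}},')
        else:
            # No recorded signature: keep the prior entry shape (empty signature).
            lines.append(f'    {{"name": "{name}"}},')
    lines.append("]")
    return "\n".join(lines)


print(render_kernel_config(["matmul", "legacy"], {"matmul": ["IN", "IN", "OUT"]}))
```

Emitting the import conditionally keeps configs for signature-less builds byte-for-byte unchanged, which matches the fallback behavior described in the PR summary.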

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • lyfne123
  • Hzfengsy

Poem

🐰 I hopped through headers, maps, and lines,
I caught each kernel's arg-direction signs;
IN, OUT, INOUT — the order kept tight,
So dumps and payloads now match just right. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 35.71%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title accurately summarizes the main change: emitting per-kernel ArgDirection signatures in kernel_config to fix the tensor dump issue for codegen matmul.
  • Description check (✅ Passed): The description comprehensively explains the problem (empty CoreCallable signatures causing the tensor dump to skip tasks), the solution (threading ArgDirection signatures through OrchestrationResult into kernel_config), and testing coverage.
  • Linked Issues check (✅ Passed): The PR fully addresses the requirements in #1458: recording kernel ArgDirection signatures in orchestration codegen, threading them through OrchestrationResult, emitting them in kernel_config.py, excluding scalars from signatures, and adding comprehensive regression tests.
  • Out of Scope Changes check (✅ Passed): All changes are directly scoped to implementing per-kernel ArgDirection signatures: header declarations, C++ codegen logic, Python bindings, config generation, type stubs, and regression tests, with no unrelated modifications.


Contributor

@gemini-code-assist Bot left a comment


Code Review

This pull request implements per-kernel signature emission to resolve an issue where the runtime tensor dump failed to capture data due to empty signatures. The changes span the C++ orchestration codegen, Python bindings, and the backend configuration generator, ensuring that kernel argument directions (IN, OUT, INOUT, SCALAR) are correctly propagated to the runtime configuration. Feedback was provided regarding the RecordKernelSignature implementation in orchestration_codegen.cpp, noting that the 'first call wins' strategy using try_emplace may be inaccurate for Spmd and Group wrapper types; it is recommended to compute effective directions by inspecting inner kernel calls to ensure accurate dependency tracking.
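The reviewer's concern about the "first call wins" policy can be illustrated with a small Python model. Here `dict.setdefault` plays the role of `std::map::try_emplace` (insert only if the key is absent); the function name, wrapper name, and the two direction lists are hypothetical, chosen only to show that a later call site with different directions is silently dropped.

```python
def record_kernel_signature(table, func_name, directions):
    """Hypothetical model of RecordKernelSignature's insertion policy.

    dict.setdefault inserts only when the key is absent, like
    std::map::try_emplace, so the first call site recorded for a
    function name wins and later call sites are ignored.
    """
    table.setdefault(func_name, list(directions))


signatures = {}
# Two call sites reach the same wrapper with different effective directions.
record_kernel_signature(signatures, "spmd_wrapper", ["IN", "OUT"])
record_kernel_signature(signatures, "spmd_wrapper", ["IN", "INOUT"])
print(signatures)  # {'spmd_wrapper': ['IN', 'OUT']}
```

For a plain kernel every call site builds the same ParamEntry list, so the policy is harmless; the risk the review points at is wrapper types whose effective directions depend on the inner kernel calls.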

Comment thread src/codegen/orchestration/orchestration_codegen.cpp
luohuan19 added a commit to luohuan19/pypto that referenced this pull request May 22, 2026
Exclude scalar args from the per-kernel CoreCallable signature.

The CoreCallable signature_[] array is sized to CORE_MAX_TENSOR_ARGS (16)
and is a per-tensor-arg direction list. RecordKernelSignature previously
recorded every ParamEntry, including scalars, so a kernel with more than
16 total params (tensors + scalars) overflowed make_callable's MaxSig
guard ("sig_count exceeds MaxSig") — observed building the Qwen3-14B
decode kernels.

Scalars live in a separate scalar-arg store (CORE_MAX_SCALAR_ARGS) and
the runtime tensor dump skips SCALAR entries anyway, so excluding them is
behaviorally identical for the dump and makes sig_count equal to the
payload tensor_count, bounded by the same CORE_MAX_TENSOR_ARGS cap that
check_add_tensor_valid enforces on the payload.

- orchestration_codegen.cpp: skip Scalar-direction params when recording
  the signature; update the explanatory comment.
- Update binding / stub / config-emitter docstrings to state scalars are
  excluded (tensor-arg directions only, tensors-first order).
- Add a regression test asserting a scalar-bearing kernel's signature
  contains only its tensor directions; tighten existing tests to assert
  no SCALAR members are emitted.
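A minimal Python sketch of the scalar-exclusion fix, assuming a simple (name, direction) encoding for ParamEntry (hypothetical): scalars are filtered out before the length check, so a kernel whose tensors fit within CORE_MAX_TENSOR_ARGS no longer trips the guard just because it also takes scalars. The function name and error text are illustrative, not the codegen's or the runtime's.

```python
CORE_MAX_TENSOR_ARGS = 16  # cap stated in the commit message


def build_tensor_signature(params):
    """Keep only tensor-arg directions, preserving their relative order.

    params is a list of (name, direction) pairs where direction is one of
    "IN", "OUT", "INOUT", "SCALAR" (an assumed encoding).
    """
    signature = [direction for _, direction in params if direction != "SCALAR"]
    # Models the MaxSig guard: the signature must fit CORE_MAX_TENSOR_ARGS.
    if len(signature) > CORE_MAX_TENSOR_ARGS:
        raise ValueError(
            f"sig_count {len(signature)} exceeds MaxSig {CORE_MAX_TENSOR_ARGS}"
        )
    return signature


# 14 tensors + 4 scalars = 18 params: over the cap if scalars were counted.
params = [(f"x{i}", "IN") for i in range(13)] + [("y", "OUT")]
params += [(f"s{i}", "SCALAR") for i in range(4)]
sig = build_tensor_signature(params)
print(len(params), len(sig))  # 18 14
```

Because the filtered length equals the payload's tensor count, the same cap that check_add_tensor_valid enforces on the payload also bounds the signature.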
luohuan19 added a commit to luohuan19/pypto that referenced this pull request May 24, 2026
@luohuan19 force-pushed the issue-1458-codegen-kernel-signatures branch from 2b3db53 to 26fc0fb on May 24, 2026 12:55
luohuan19 added 2 commits May 25, 2026 09:24
Fixes hw-native-sys#1458

Codegen-generated kernels were registered with an empty CoreCallable
signature: kernel_config.py's KERNELS entries carried no "signature"
field, so the runtime built each callable with sig_count()==0. The
tensormap_and_ringbuffer tensor dump skips a task when the summed
per-active-subtask tensor-arg count does not match the task payload
tensor_count, so --dump-tensor captured nothing for a codegen matmul
(count 0 != payload 3).

Orchestration codegen now records each kernel's runtime ArgDirection
signature, taken from the same ParamEntry vector that emits the task
payload (add_input/output/inout/scalar) calls — guaranteeing the
signature's non-SCALAR entries line up 1:1 with the payload tensors.
The signature is threaded through OrchestrationResult and emitted into
kernel_config.py as a "signature" field of simpler.task_interface
ArgDirection members. Kernels without a recorded signature fall back to
the prior empty-signature behavior.

Populating signatures affects only the tensor dump: the leaf
CoreCallable signature is unconsumed by the dispatch / arg-marshalling
path on both runtimes.

Adds regression tests for kernel_config.py signature emission and for
OrchestrationResult.func_name_to_signature population.
@luohuan19 force-pushed the issue-1458-codegen-kernel-signatures branch from 26fc0fb to 6872487 on May 25, 2026 01:30

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[Bug] --dump-tensor captures nothing for codegen matmul on a2a3: TRB dump skips task (active callable tensor count 0 != payload 3)

1 participant