fix(codegen): Emit per-kernel ArgDirection signatures in kernel_config#1466
luohuan19 wants to merge 2 commits into
📝 Walkthrough

Records per-kernel runtime argument-direction lists during orchestration codegen, returns them in OrchestrationResult, and uses them to optionally emit per-kernel "signature" entries (ArgDirection enum members) in generated kernel_config.py.

Changes: Kernel Signature Data Flow
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Code Review
This pull request implements per-kernel signature emission to resolve an issue where the runtime tensor dump failed to capture data due to empty signatures. The changes span the C++ orchestration codegen, Python bindings, and the backend configuration generator, ensuring that kernel argument directions (IN, OUT, INOUT, SCALAR) are correctly propagated to the runtime configuration. Feedback was provided regarding the RecordKernelSignature implementation in orchestration_codegen.cpp, noting that the 'first call wins' strategy using try_emplace may be inaccurate for Spmd and Group wrapper types; it is recommended to compute effective directions by inspecting inner kernel calls to ensure accurate dependency tracking.
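As a rough illustration of the reviewer's concern, the sketch below contrasts a first-call-wins record (the behavior of `std::map::try_emplace`, modeled with `dict.setdefault`) with a merge that widens conflicting IN/OUT observations to INOUT. The local `ArgDirection` enum, `record_first_wins`, `merge_direction`, and `record_merged` are hypothetical stand-ins; this is one possible reading of "compute effective directions", not the PR's implementation.

```python
from enum import Enum


class ArgDirection(Enum):
    # Hypothetical stand-in for simpler.task_interface.ArgDirection.
    IN = "IN"
    OUT = "OUT"
    INOUT = "INOUT"
    SCALAR = "SCALAR"


def record_first_wins(table, name, sig):
    # Like try_emplace: only the first observed signature for `name` is kept.
    table.setdefault(name, list(sig))


def merge_direction(a, b):
    # Identical directions stay as-is; any IN/OUT disagreement widens to INOUT.
    if a == b:
        return a
    if ArgDirection.SCALAR in (a, b):
        raise ValueError("tensor/scalar mismatch at the same position")
    return ArgDirection.INOUT


def record_merged(table, name, sig):
    # Hypothetical alternative: fold every observed inner call into one signature.
    prev = table.get(name)
    if prev is None:
        table[name] = list(sig)
        return
    if len(prev) != len(sig):
        raise ValueError(f"arity mismatch for {name}")
    table[name] = [merge_direction(a, b) for a, b in zip(prev, sig)]


IN, OUT = ArgDirection.IN, ArgDirection.OUT
first, merged = {}, {}
# Two inner calls of one wrapper kernel that disagree on the first argument.
for call in ([IN, OUT], [OUT, OUT]):
    record_first_wins(first, "kernel_a", call)
    record_merged(merged, "kernel_a", call)
```

Here `first["kernel_a"]` keeps the first call's `[IN, OUT]`, while `merged["kernel_a"]` becomes `[INOUT, OUT]`, which is the kind of discrepancy the review flags for Spmd and Group wrappers.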
Exclude scalar args from the per-kernel CoreCallable signature.
The CoreCallable signature_[] array is sized to CORE_MAX_TENSOR_ARGS (16)
and is a per-tensor-arg direction list. RecordKernelSignature previously
recorded every ParamEntry, including scalars, so a kernel with more than
16 total params (tensors + scalars) overflowed make_callable's MaxSig
guard ("sig_count exceeds MaxSig") — observed building the Qwen3-14B
decode kernels.
Scalars live in a separate scalar-arg store (CORE_MAX_SCALAR_ARGS) and
the runtime tensor dump skips SCALAR entries anyway, so excluding them is
behaviorally identical for the dump and makes sig_count equal to the
payload tensor_count, bounded by the same CORE_MAX_TENSOR_ARGS cap that
check_add_tensor_valid enforces on the payload.
- orchestration_codegen.cpp: skip Scalar-direction params when recording
the signature; update the explanatory comment.
- Update binding / stub / config-emitter docstrings to state scalars are
excluded (tensor-arg directions only, tensors-first order).
- Add a regression test asserting a scalar-bearing kernel's signature
contains only its tensor directions; tighten existing tests to assert
no SCALAR members are emitted.
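The rule in this commit can be sketched as follows. `ParamEntry`, `tensor_signature`, and the local `ArgDirection` enum are hypothetical Python stand-ins for the C++ codegen types; only the cap of 16 (`CORE_MAX_TENSOR_ARGS`) comes from the commit message. A kernel with 12 tensors and 6 scalars would have recorded 18 entries under the old rule, past the 16-entry cap; with scalars excluded it records 12.

```python
from dataclasses import dataclass
from enum import Enum


class ArgDirection(Enum):
    # Hypothetical stand-in for simpler.task_interface.ArgDirection.
    IN = "IN"
    OUT = "OUT"
    INOUT = "INOUT"
    SCALAR = "SCALAR"


CORE_MAX_TENSOR_ARGS = 16  # capacity of CoreCallable signature_[]


@dataclass
class ParamEntry:
    # Hypothetical stand-in for the codegen's ParamEntry.
    name: str
    direction: ArgDirection


def tensor_signature(params):
    """Return tensor-arg directions only, preserving their order; scalars dropped."""
    sig = [p.direction for p in params if p.direction is not ArgDirection.SCALAR]
    if len(sig) > CORE_MAX_TENSOR_ARGS:
        raise ValueError(f"sig_count {len(sig)} exceeds {CORE_MAX_TENSOR_ARGS}")
    return sig


params = [ParamEntry(f"t{i}", ArgDirection.IN) for i in range(12)]
params += [ParamEntry(f"s{i}", ArgDirection.SCALAR) for i in range(6)]
sig = tensor_signature(params)  # 12 entries, although the kernel has 18 params
```

Because the filtered length equals the number of tensors added to the payload, the same cap that bounds the payload also bounds the signature.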
Branch updated from 2b3db53 to 26fc0fb.
Fixes hw-native-sys#1458

Codegen-generated kernels were registered with an empty CoreCallable signature: kernel_config.py's KERNELS entries carried no "signature" field, so the runtime built each callable with sig_count()==0. The tensormap_and_ringbuffer tensor dump skips a task when the summed per-active-subtask tensor-arg count does not match the task payload tensor_count, so --dump-tensor captured nothing for a codegen matmul (count 0 != payload 3).

Orchestration codegen now records each kernel's runtime ArgDirection signature, taken from the same ParamEntry vector that emits the task payload (add_input/output/inout/scalar) calls, guaranteeing the signature's non-SCALAR entries line up 1:1 with the payload tensors. The signature is threaded through OrchestrationResult and emitted into kernel_config.py as a "signature" field of simpler.task_interface ArgDirection members. Kernels without a recorded signature fall back to the prior empty-signature behavior.

Populating signatures affects only the tensor dump: the leaf CoreCallable signature is unconsumed by the dispatch / arg-marshalling path on both runtimes.

Adds regression tests for kernel_config.py signature emission and for OrchestrationResult.func_name_to_signature population.
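The dump-side mismatch can be modeled in a few lines. This is a simplified Python model, not the runtime's code: `tensor_arg_count` and `dump_would_capture` are hypothetical helpers, directions are plain strings, and the matmul is assumed to take two IN tensors and one OUT tensor (consistent with the payload count of 3).

```python
def tensor_arg_count(signature):
    # Only tensor directions count toward the comparison; SCALAR is skipped.
    return sum(1 for d in signature if d != "SCALAR")


def dump_would_capture(active_subtask_signatures, payload_tensor_count):
    # The dump proceeds only when the summed per-subtask tensor-arg count
    # equals the task payload's tensor_count.
    total = sum(tensor_arg_count(sig) for sig in active_subtask_signatures)
    return total == payload_tensor_count


# Empty signature (sig_count() == 0): 0 != 3, so the task is skipped.
before = dump_would_capture([[]], payload_tensor_count=3)
# Recorded signature (assumed IN, IN, OUT): 3 == 3, so the task is captured.
after = dump_would_capture([["IN", "IN", "OUT"]], payload_tensor_count=3)
```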
Branch updated from 26fc0fb to 6872487.
Summary

- Codegen-generated kernels were registered with an empty `CoreCallable` signature: `kernel_config.py`'s `KERNELS` entries carried no `signature` field, so the runtime built each callable with `sig_count()==0`.
- The `tensormap_and_ringbuffer` tensor dump skips a task when the summed per-active-subtask tensor-arg count does not match the task payload `tensor_count`, so `--dump-tensor` captured nothing for a codegen matmul (count 0 != payload 3).
- Orchestration codegen now records each kernel's runtime `ArgDirection` signature, taken from the same `ParamEntry` vector that emits the task payload (`add_input/output/inout/scalar`) calls, guaranteeing the signature's non-`SCALAR` entries line up 1:1 with the payload tensors.
- The signature is threaded through `OrchestrationResult` and emitted into `kernel_config.py` as a `signature` field of `simpler.task_interface` `ArgDirection` members. Kernels without a recorded signature fall back to the prior empty-signature behavior.
- Populating signatures affects only the tensor dump: the leaf `CoreCallable` signature is unconsumed by the dispatch / arg-marshalling path on both runtimes.

Testing

- Adds regression tests for `kernel_config.py` signature emission and for `OrchestrationResult.func_name_to_signature` population.

Related Issues

- Fixes #1458
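For orientation, here is a rough sketch of the emitter behavior described above: kernels with a recorded signature get a `signature` list of `ArgDirection` members, and kernels without one omit the field. The `render_kernel_config` helper and the `name`/`source` entry keys are hypothetical placeholders; only the `signature` field, `ArgDirection`, `simpler.task_interface`, and the `KERNELS` list name come from this PR.

```python
def render_kernel_config(kernels, func_name_to_signature):
    """Render a kernel_config.py-like module as source text (sketch only).

    kernels: list of (func_name, source_path) pairs (hypothetical layout).
    func_name_to_signature: func name -> list of direction names, e.g. "IN".
    """
    out = ["from simpler.task_interface import ArgDirection", "", "KERNELS = ["]
    for func_name, source in kernels:
        out.append("    {")
        out.append(f'        "name": {func_name!r},')
        out.append(f'        "source": {source!r},')
        sig = func_name_to_signature.get(func_name)
        if sig:  # no recorded signature: omit the field (empty-signature fallback)
            members = ", ".join(f"ArgDirection.{d}" for d in sig)
            out.append(f'        "signature": [{members}],')
        out.append("    },")
    out.append("]")
    return "\n".join(out) + "\n"


text = render_kernel_config(
    [("matmul", "matmul.cpp"), ("relu", "relu.cpp")],
    {"matmul": ["IN", "IN", "OUT"]},
)
```

Emitting enum member references rather than raw strings means a typo in a direction name surfaces as an `AttributeError` when the generated module is imported, instead of silently producing a bad signature.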