Describe the bug
Dear AWS Neuron Team,
This is Lucas Cordeiro (https://ssvlab.github.io/lucasccordeiro/) again; thank you for merging #126 (closing #125). This is a small, lower-severity follow-on of the same class (host-side arithmetic with an admissible-but-unhandled input), surfaced by the same mechanical sweep of // <param> expressions across the kernel set.
tutorials/average_pool2d/tensor_avgpool_kernel(in_tensor, pool_size) divides by zero at trace time if pool_size is set to 0. The pool_size parameter is untyped, has no default, and the body adds no pool_size >= 1 precondition, so pool_size == 0 is admissible from the public signature.
The kernel is unchanged on the current main branch (commit 0dd1b48b612c6a1604a50bc6d034b010739e508d, 2026-05-19) — PR #126 fixed only the interpolate kernels and left this one untouched. It is byte-identical to the NKI 0.3.0 / Neuron SDK 2.29 release commit a87aaa44f7b26241bdb152af8838e287669c3947. In src/nki_samples/tutorials/average_pool2d/average_pool2d_nki_kernels.py:
@nki.jit
def tensor_avgpool_kernel(in_tensor, pool_size): # line 15 — pool_size untyped, no default
sz_cin, sz_hin, sz_win = in_tensor.shape
sz_hout = sz_hin // pool_size # line 28 — ZeroDivisionError when pool_size == 0
sz_wout = sz_win // pool_size # line 29 — same
...
sz_pool = pool_size
pool_view = in_tile.ap([
[sz_hin * sz_win, sz_p],
[sz_pool * sz_win, sz_hin // sz_pool], # lines 51–52 — also divide by sz_pool
[sz_pool, sz_win // sz_pool],
...
])
...
nisa.tensor_scalar(dst=out_tile, data=sum_tile, op0=nl.multiply,
operand0=1.0 / (pool_size * pool_size)) # line 59 — also unsafe at pool_size == 0
The first floor-div (sz_hout, line 28) is the trace-time crash point; the sz_wout floor-div (line 29), the two .ap() stride divisions (lines 51–52), and the 1.0 / (pool_size * pool_size) reciprocal (line 59) are all equally unsafe at pool_size == 0 and would fire if line 28 were guarded individually.
Expected Behavior
pool_size <= 0 is rejected at the API boundary with a clear message specifying the violated precondition, e.g., AssertionError: pool_size must be >= 1, rather than an uncaught ZeroDivisionError in the kernel body.
Current Behavior
pool_size == 0 raises ZeroDivisionError: division by zero at JIT trace time, from sz_hout = sz_hin // pool_size (line 28), before any device-side codegen. In a full @nki.jit trace, the exception propagates out of tensor_avgpool_kernel.
Reproduction Steps
Pure host-side Python; reproduces in plain CPython without a Trainium device or neuronx-cc:
sz_hin = 8
pool_size = 0 # the bug
sz_hout = sz_hin // pool_size # ZeroDivisionError: division by zero
Actual traceback (CPython 3.14):
Traceback (most recent call last):
File "<string>", line 4, in <module>
sz_hout = sz_hin // pool_size
~~~~~~~^^~~~~~~~~~~
ZeroDivisionError: division by zero
This is the exact expression at average_pool2d_nki_kernels.py:28; the kernel body never executes under pool_size == 0.
A symbolic counterexample (witness from ESBMC-Python against a stubbed port) that enumerates both floor-div sites in a single run via --multi-property is at:
ESBMC output (trimmed; reproducer lines 24/25 mirror upstream sz_hout/sz_wout at lines 28/29):
[Counterexample]
Violated property:
file harness/avgpool_hostarith_unguarded.py line 24 column 4 function output_shape
division by zero
CWE: CWE-369
pool_size = 0
✗ FAILED: 'division by zero at ... line 24 ... function output_shape'
✗ FAILED: 'division by zero at ... line 25 ... function output_shape'
VERIFICATION FAILED
Regression Issue
Possible Solution
Add a one-line precondition (and, optionally, a type annotation documenting it), mirroring the precondition style adopted in #126 for the interpolate kernels:
@nki.jit
def tensor_avgpool_kernel(in_tensor, pool_size: int):
assert pool_size >= 1, "pool_size must be >= 1"
sz_cin, sz_hin, sz_win = in_tensor.shape
sz_hout = sz_hin // pool_size
...
Additional Information/Context
Found while extending our formal verifier ESBMC-Python (https://github.com/esbmc/esbmc) with a proof-of-concept verification flow for NKI kernels. The same // <param> sweep that surfaced #125 (interpolate, chunk_size == 1) flags this site. The flow compiles a thin stub of nki.language and runs symbolic verification against host-side shape and trip-count arithmetic; ESBMC's integer division-by-zero check fires at pool_size == 0, and --multi-property enumerates both the sz_hout and sz_wout floor-divs in one run (witness pool_size = 0, CWE-369).
Full methodology, harness, regression suite, and audit notes (Finding 16):
https://github.com/lucasccordeiro/AWS-Neuron
neuronx-cc version used
Not applicable — the reproducer is pure CPython against nki-samples source at commit 0dd1b48b612c6a1604a50bc6d034b010739e508d (current main; avgpool kernel unchanged since the NKI 0.3.0 / Neuron SDK 2.29 release commit a87aaa44f7b26241bdb152af8838e287669c3947). No neuronx-cc invocation is required to trigger the defect.
Framework(s) and their versions used (JAX, PyTorch, etc..)
- Python 3.14.5 (latest stable; the standalone reproducer is version-independent) - ESBMC 8.3.0 (Python frontend, default solver Bitwuzla 0.9.0) — used to produce the symbolic counterexample - nki-samples @
0dd1b48b612c6a1604a50bc6d034b010739e508d (current main HEAD, 2026-05-19)
Describe the bug
Dear AWS Neuron Team,
This is Lucas Cordeiro (https://ssvlab.github.io/lucasccordeiro/) again; thank you for merging #126 (closing #125). This is a small, lower-severity follow-on of the same class (host-side arithmetic with an admissible-but-unhandled input), surfaced by the same mechanical sweep of
// <param>expressions across the kernel set.tutorials/average_pool2d/tensor_avgpool_kernel(in_tensor, pool_size)divides by zero at trace time ifpool_sizeis set to0. Thepool_sizeparameter is untyped, has no default, and the body adds nopool_size >= 1precondition, sopool_size == 0is admissible from the public signature.The kernel is unchanged on the current
mainbranch (commit0dd1b48b612c6a1604a50bc6d034b010739e508d, 2026-05-19) — PR #126 fixed only the interpolate kernels and left this one untouched. It is byte-identical to the NKI 0.3.0 / Neuron SDK 2.29 release commita87aaa44f7b26241bdb152af8838e287669c3947. Insrc/nki_samples/tutorials/average_pool2d/average_pool2d_nki_kernels.py:The first floor-div (
sz_hout, line 28) is the trace-time crash point; thesz_woutfloor-div (line 29), the two.ap()stride divisions (lines 51–52), and the1.0 / (pool_size * pool_size)reciprocal (line 59) are all equally unsafe atpool_size == 0and would fire if line 28 were guarded individually.Expected Behavior
pool_size <= 0is rejected at the API boundary with a clear message specifying the violated precondition, e.g.,AssertionError: pool_size must be >= 1, rather than an uncaughtZeroDivisionErrorin the kernel body.Current Behavior
pool_size == 0raisesZeroDivisionError: division by zeroat JIT trace time, fromsz_hout = sz_hin // pool_size(line 28), before any device-side codegen. In a full@nki.jittrace, the exception propagates out oftensor_avgpool_kernel.Reproduction Steps
Pure host-side Python; reproduces in plain CPython without a Trainium device or
neuronx-cc:Actual traceback (CPython 3.14):
This is the exact expression at
average_pool2d_nki_kernels.py:28; the kernel body never executes underpool_size == 0.A symbolic counterexample (witness from ESBMC-Python against a stubbed port) that enumerates both floor-div sites in a single run via
--multi-propertyis at:ESBMC output (trimmed; reproducer lines 24/25 mirror upstream
sz_hout/sz_woutat lines 28/29):Regression Issue
Possible Solution
Add a one-line precondition (and, optionally, a type annotation documenting it), mirroring the precondition style adopted in #126 for the interpolate kernels:
Additional Information/Context
Found while extending our formal verifier ESBMC-Python (https://github.com/esbmc/esbmc) with a proof-of-concept verification flow for NKI kernels. The same
// <param>sweep that surfaced #125 (interpolate,chunk_size == 1) flags this site. The flow compiles a thin stub ofnki.languageand runs symbolic verification against host-side shape and trip-count arithmetic; ESBMC's integer division-by-zero check fires atpool_size == 0, and--multi-propertyenumerates both thesz_houtandsz_woutfloor-divs in one run (witnesspool_size = 0, CWE-369).Full methodology, harness, regression suite, and audit notes (Finding 16):
https://github.com/lucasccordeiro/AWS-Neuron
neuronx-cc version used
Not applicable — the reproducer is pure CPython against nki-samples source at commit
0dd1b48b612c6a1604a50bc6d034b010739e508d(currentmain; avgpool kernel unchanged since the NKI 0.3.0 / Neuron SDK 2.29 release commita87aaa44f7b26241bdb152af8838e287669c3947). Noneuronx-ccinvocation is required to trigger the defect.Framework(s) and their versions used (JAX, PyTorch, etc..)
0dd1b48b612c6a1604a50bc6d034b010739e508d(currentmainHEAD, 2026-05-19)