`interpolate_{bi,tri}linear_2x_fwd`: `ZeroDivisionError` at trace time when `chunk_size == 1`

### Describe the bug

Dear AWS Neuron Team,

This is Lucas Cordeiro (https://ssvlab.github.io/lucasccordeiro/). Thank you for the NKI samples repository — it has been a valuable resource for extending our verifier ESBMC-Python (https://github.com/esbmc/esbmc) to handle this codebase.

While reviewing the contributed interpolation kernels, I found that both `interpolate_bilinear_2x_fwd` and `interpolate_trilinear_2x_fwd` will divide by zero at trace time if `chunk_size` is set to `1`.

The defect appears in the following files, based on upstream commit `a87aaa44f7b26241bdb152af8838e287669c3947` (NKI 0.3.0 / Neuron SDK 2.29 release, 22 April 2026):

- `contributed/interpolate_bilinear_fwd.py`, lines 59–62
- `contributed/interpolate_trilinear_fwd.py`, lines 67–69

Both kernels use the same host-side trip-count expression:

```python
wdw_size  = chunk_size
step_size = wdw_size - 1                                # == 0 when chunk_size == 1
for _ in nl.static_range(math.ceil((h_src - wdw_size) / step_size) + 1):
    ...
```

When `chunk_size` is `1`, `step_size` becomes zero, and the division inside `math.ceil(...)` raises `ZeroDivisionError`.

### Expected Behavior

Either:

1. `chunk_size == 1` is rejected at the API boundary with a clear `AssertionError` / `ValueError` naming the violated precondition (`chunk_size >= 2`), or
2. `chunk_size == 1` is supported as a meaningful "no-overlap" mode and the trip-count expression is rewritten to avoid the zero divisor.

The public type signatures (`chunk_size: int = 10`) admit `chunk_size == 1`, so the current behaviour — an uncaught `ZeroDivisionError` from inside the kernel body — might be surprising.

### Current Behavior

`chunk_size == 1` raises `ZeroDivisionError: division by zero` at JIT trace time, from inside the loop-bound expression. Stack trace from the minimal reproducer below:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
```

In a full `@nki.jit` trace the same exception propagates out of `interpolate_bilinear_2x_fwd` / `interpolate_trilinear_2x_fwd` before any device-side codegen takes place.

### Reproduction Steps

The defect is in pure host-side Python and reproduces in plain CPython without a Trainium device or `neuronx-cc`:

```python
import math

chunk_size = 1                  # the bug
x_src      = 5                  # any positive int

wdw_size   = chunk_size
step_size  = wdw_size - 1       # 0

n = math.ceil((x_src - wdw_size) / step_size) + 1
# ZeroDivisionError: division by zero
```

This is the exact expression at `contributed/interpolate_bilinear_fwd.py:62` and `contributed/interpolate_trilinear_fwd.py:69`; the bodies of both kernels never execute under `chunk_size == 1`.

A symbolic counterexample (witness produced by ESBMC-Python against a stubbed port of the kernels) and the corresponding `@nki.jit` entry-point harnesses are at:

- <https://github.com/lucasccordeiro/AWS-Neuron/blob/main/harness/interpolate_bilinear_chunk1.py>
- <https://github.com/lucasccordeiro/AWS-Neuron/blob/main/harness/interpolate_trilinear_chunk1.py>

### Regression Issue

- [ ] Select this option if this issue appears to be a regression.

### Possible Solution

Add the missing precondition early in each kernel and (optionally) handle the single-window regime explicitly:

```python
assert chunk_size >= 2, "chunk_size must be >= 2 (step_size must be positive)"
assert h_src     >= 1, "h_src must be positive"

wdw_size  = chunk_size
step_size = wdw_size - 1
n_iter    = 1 if h_src <= wdw_size else math.ceil((h_src - wdw_size) / step_size) + 1
for h in nl.static_range(n_iter):
    ...
```

(Equivalent change for `d_src` in `interpolate_trilinear_fwd.py`.) If the design intent is that `chunk_size == 1` should be a meaningful "no-overlap" mode, the right fix is different — replace `step_size = chunk_size - 1` with the desired stride — but as the code stands today `chunk_size == 1` cannot work under any reading.


### Additional Information/Context

Found while extending our formal verifier ESBMC-Python (https://github.com/esbmc/esbmc) with a proof-of-concept verification flow for NKI kernels. The flow compiles a thin stub of `nki.language` and runs symbolic verification against host-side shape and trip-count arithmetic.

A related but distinct boundary case — `h_src == 1` (or `d_src == 1`) with `chunk_size >= 2` — causes the same outer loop to be *zero-trip*: `math.ceil((1 - chunk_size) / (chunk_size - 1)) + 1 == 0`. The kernel allocates `dst_arr` via `nl.ndarray(...)` but never writes to it, so the caller receives a correctly-shaped tensor with no kernel writes (no exception). I have not filed this as a separate issue yet because shape-and-bounds verification does not flag it, but the precondition `h_src >= chunk_size` (or an explicit single-window branch as above) would close it.

Full methodology, harnesses, regression suite (`interpolate_bilinear_chunk1`, `interpolate_trilinear_chunk1` pin `chunk_size == 1`), and audit notes (Finding 15):

<https://github.com/lucasccordeiro/AWS-Neuron>

### neuronx-cc version used

Not applicable — the reproducer is pure CPython against nki-samples source at commit `a87aaa44f7b26241bdb152af8838e287669c3947` (NKI 0.3.0, Neuron SDK 2.29). No `neuronx-cc` invocation is required to trigger the defect.

### Framework(s) and their versions used (JAX, PyTorch, etc..)

- Python 3.14.4 (for the standalone reproducer) - ESBMC 8.3.0 (Python frontend) — used to produce the symbolic counterexample - nki-samples @ `a87aaa44f7b26241bdb152af8838e287669c3947` (NKI 0.3.0)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`interpolate_{bi,tri}linear_2x_fwd`: `ZeroDivisionError` at trace time when `chunk_size == 1` #125

Describe the bug

Expected Behavior

Current Behavior

Reproduction Steps

Regression Issue

Possible Solution

Additional Information/Context

neuronx-cc version used

Framework(s) and their versions used (JAX, PyTorch, etc..)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

interpolate_{bi,tri}linear_2x_fwd: ZeroDivisionError at trace time when chunk_size == 1 #125

Description

Describe the bug

Expected Behavior

Current Behavior

Reproduction Steps

Regression Issue

Possible Solution

Additional Information/Context

neuronx-cc version used

Framework(s) and their versions used (JAX, PyTorch, etc..)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`interpolate_{bi,tri}linear_2x_fwd`: `ZeroDivisionError` at trace time when `chunk_size == 1` #125