Skip to content

interpolate_{bi,tri}linear_2x_fwd: ZeroDivisionError at trace time when chunk_size == 1 #125

@lucasccordeiro

Description

@lucasccordeiro

Describe the bug

Dear AWS Neuron Team,

This is Lucas Cordeiro (https://ssvlab.github.io/lucasccordeiro/). Thank you for the NKI samples repository — it has been a valuable resource for extending our verifier ESBMC-Python (https://github.com/esbmc/esbmc) to handle this codebase.

While reviewing the contributed interpolation kernels, I found that both interpolate_bilinear_2x_fwd and interpolate_trilinear_2x_fwd will divide by zero at trace time if chunk_size is set to 1.

The defect appears in the following files, based on upstream commit a87aaa44f7b26241bdb152af8838e287669c3947 (NKI 0.3.0 / Neuron SDK 2.29 release, 22 April 2026):

  • contributed/interpolate_bilinear_fwd.py, lines 59–62
  • contributed/interpolate_trilinear_fwd.py, lines 67–69

Both kernels use the same host-side trip-count expression:

wdw_size  = chunk_size
step_size = wdw_size - 1                                # == 0 when chunk_size == 1
for _ in nl.static_range(math.ceil((h_src - wdw_size) / step_size) + 1):
    ...

When chunk_size is 1, step_size becomes zero, and the division inside math.ceil(...) raises ZeroDivisionError.

Expected Behavior

Either:

  1. chunk_size == 1 is rejected at the API boundary with a clear AssertionError / ValueError naming the violated precondition (chunk_size >= 2), or
  2. chunk_size == 1 is supported as a meaningful "no-overlap" mode and the trip-count expression is rewritten to avoid the zero divisor.

The public type signatures (chunk_size: int = 10) admit chunk_size == 1, so the current behaviour — an uncaught ZeroDivisionError from inside the kernel body — might be surprising.

Current Behavior

chunk_size == 1 raises ZeroDivisionError: division by zero at JIT trace time, from inside the loop-bound expression. Stack trace from the minimal reproducer below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

In a full @nki.jit trace the same exception propagates out of interpolate_bilinear_2x_fwd / interpolate_trilinear_2x_fwd before any device-side codegen takes place.

Reproduction Steps

The defect is in pure host-side Python and reproduces in plain CPython without a Trainium device or neuronx-cc:

import math

chunk_size = 1                  # the bug
x_src      = 5                  # any positive int

wdw_size   = chunk_size
step_size  = wdw_size - 1       # 0

n = math.ceil((x_src - wdw_size) / step_size) + 1
# ZeroDivisionError: division by zero

This is the exact expression at contributed/interpolate_bilinear_fwd.py:62 and contributed/interpolate_trilinear_fwd.py:69; the bodies of both kernels never execute under chunk_size == 1.

A symbolic counterexample (witness produced by ESBMC-Python against a stubbed port of the kernels) and the corresponding @nki.jit entry-point harnesses are at:

Regression Issue

  • Select this option if this issue appears to be a regression.

Possible Solution

Add the missing precondition early in each kernel and (optionally) handle the single-window regime explicitly:

assert chunk_size >= 2, "chunk_size must be >= 2 (step_size must be positive)"
assert h_src     >= 1, "h_src must be positive"

wdw_size  = chunk_size
step_size = wdw_size - 1
n_iter    = 1 if h_src <= wdw_size else math.ceil((h_src - wdw_size) / step_size) + 1
for h in nl.static_range(n_iter):
    ...

(Equivalent change for d_src in interpolate_trilinear_fwd.py.) If the design intent is that chunk_size == 1 should be a meaningful "no-overlap" mode, the right fix is different — replace step_size = chunk_size - 1 with the desired stride — but as the code stands today chunk_size == 1 cannot work under any reading.

Additional Information/Context

Found while extending our formal verifier ESBMC-Python (https://github.com/esbmc/esbmc) with a proof-of-concept verification flow for NKI kernels. The flow compiles a thin stub of nki.language and runs symbolic verification against host-side shape and trip-count arithmetic.

A related but distinct boundary case — h_src == 1 (or d_src == 1) with chunk_size >= 2 — causes the same outer loop to be zero-trip: math.ceil((1 - chunk_size) / (chunk_size - 1)) + 1 == 0. The kernel allocates dst_arr via nl.ndarray(...) but never writes to it, so the caller receives a correctly-shaped tensor with no kernel writes (no exception). I have not filed this as a separate issue yet because shape-and-bounds verification does not flag it, but the precondition h_src >= chunk_size (or an explicit single-window branch as above) would close it.

Full methodology, harnesses, regression suite (interpolate_bilinear_chunk1, interpolate_trilinear_chunk1 pin chunk_size == 1), and audit notes (Finding 15):

https://github.com/lucasccordeiro/AWS-Neuron

neuronx-cc version used

Not applicable — the reproducer is pure CPython against nki-samples source at commit a87aaa44f7b26241bdb152af8838e287669c3947 (NKI 0.3.0, Neuron SDK 2.29). No neuronx-cc invocation is required to trigger the defect.

Framework(s) and their versions used (JAX, PyTorch, etc..)

  • Python 3.14.4 (for the standalone reproducer) - ESBMC 8.3.0 (Python frontend) — used to produce the symbolic counterexample - nki-samples @ a87aaa44f7b26241bdb152af8838e287669c3947 (NKI 0.3.0)

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions