Part of WS1 — Full Batch-Invariant Forward Chain (epic: #)
Why
A forward-aligned chain still breaks training if gradient reductions drift with batch shape — the optimizer then sees batch-dependent gradients and the run diverges even though inference looked aligned. Backward invariance is a separate, explicit requirement, not something the forward checks cover. This issue makes "backward also invariant" a first-class, cross-op acceptance condition.
Scope
Make batch-invariant backward a required, tested property of every WS1 op.
Ensure each op's gradient reduction (dx, dweight, dW, etc.) uses a fixed, batch-shape-independent order — no atomicAdd in backward.
Cover the most reduction-heavy backward first: RMSNorm dweight, GEMM dW/dX, attention backward, embedding-grad scatter.
Validate gradients across batch=1/N, chunked-prefill on/off, and padding layouts.
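The fixed-order requirement above comes down to floating-point addition not being associative: the same contributions summed in a different arrival order can round differently. A minimal sketch in plain Python, with float64 standing in for the GPU accumulator; `accumulate` and the sample values are hypothetical and not taken from any WS1 kernel:

```python
from typing import Sequence


def accumulate(contribs: Sequence[float], order: Sequence[int]) -> float:
    """Sum contribs one at a time, visiting them in the given order."""
    total = 0.0
    for i in order:
        total += contribs[i]
    return total


# Hypothetical per-row gradient contributions to a single weight element.
contribs = [1e16, 1.0, -1e16, 1.0]

# atomicAdd-style: arrival order depends on scheduling, so two runs may differ.
run_a = accumulate(contribs, [0, 1, 2, 3])  # 1.0 (the first +1.0 is rounded away)
run_b = accumulate(contribs, [0, 2, 1, 3])  # 2.0

# Fixed order: always visit indices 0..n-1, whatever the launch shape.
fixed = accumulate(contribs, range(len(contribs)))
```

A real kernel would presumably get its fixed order from a sequential or tree reduction over a layout that does not change with batch size; the sketch only shows why the order has to be pinned.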
Out of scope
Re-implementing each op (each op issue owns its own backward kernel and fix; this issue owns the cross-cutting requirement, reusable gradient check, and status matrix).
Acceptance criteria
No op uses atomicAdd or shape-dependent accumulation in its backward path.
A short matrix records, per op, both forward-pass and backward-pass invariance status; gradient-drift reports include max abs diff, relative diff, tensor name, and first failing op.
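One possible shape for the gradient-drift report described above, as a sketch only. It assumes gradients have already been collected into plain nested dicts (op name, then tensor name, then a flat list of values) for a reference run and a comparison run; `Drift`, `gradient_drift`, and that layout are hypothetical names for illustration, not an existing API. NaN handling is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Assumed layout: op name -> tensor name -> flattened gradient values.
Grads = Dict[str, Dict[str, List[float]]]


@dataclass
class Drift:
    op: str
    tensor: str
    max_abs_diff: float
    max_rel_diff: float


def gradient_drift(ref: Grads, other: Grads, op_order: Sequence[str]) -> List[Drift]:
    """Return one Drift per tensor that is not bitwise equal, in op order.

    The first entry, if any, names the first failing op.
    """
    drifts: List[Drift] = []
    for op in op_order:
        for tensor, ref_vals in ref[op].items():
            other_vals = other[op][tensor]
            if len(ref_vals) != len(other_vals):
                raise ValueError(f"{op}.{tensor}: gradient sizes differ")
            max_abs = 0.0
            max_rel = 0.0
            for r, o in zip(ref_vals, other_vals):
                abs_diff = abs(r - o)
                if abs_diff > 0.0:
                    # At least one operand is nonzero here, so the
                    # denominator cannot be zero.
                    max_abs = max(max_abs, abs_diff)
                    max_rel = max(max_rel, abs_diff / max(abs(r), abs(o)))
            if max_abs > 0.0:
                drifts.append(Drift(op, tensor, max_abs, max_rel))
    return drifts


# Example: batch=1 gradients as reference, batch=N gradients as comparison.
ref = {"rmsnorm": {"dweight": [0.5, 0.25]}, "gemm": {"dW": [1.0, 2.0], "dX": [3.0]}}
other = {"rmsnorm": {"dweight": [0.5, 0.25]}, "gemm": {"dW": [1.0, 2.5], "dX": [3.0]}}
report = gradient_drift(ref, other, ["rmsnorm", "gemm"])
first_failing_op = report[0].op if report else None
```

The comparison is exact (zero tolerance) on the assumption that batch invariance here means bitwise-identical gradients; a tolerance parameter could be added if the acceptance criteria allow one.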
Notes
Best owned as a shared concern: this issue provides the gradient check and status matrix; each op issue is responsible for fixing its own backward path and passing the check.
Planned PRs
… (RMSNorm dweight, GEMM dW/dX, attention backward first)
… atomicAdd / fixed-order accumulation in backward paths