
Unseed only the chunk in 3-arg seed! #821

Draft
ChrisRackauckas-Claude wants to merge 2 commits into JuliaDiff:master from ChrisRackauckas-Claude:narrow-unseed

Conversation

@ChrisRackauckas-Claude

Contributor

Summary

Stacked on #816 (first commit is that PR; review only the last commit). Depends on #816 landing first.

In ForwardDiff 0.10, the chunk-mode "unseed" call seed!(duals, x, index, seed) wrote
exactly the N chunk elements starting at index. The 1.x rewrite changed it to write
from index to the end of the array:

idxs = Iterators.drop(structural_eachindex(duals, x), offset)
for idx in idxs   # no take(N) — runs to the end

Chunk mode calls this after every chunk (seed!(xdual, x, i) in
chunk_mode_gradient_expr / chunk_mode_jacobian_expr) purely to clear the chunk it
just seeded. The rest of the array is already unseeded: everything is zeroed once
up front, and every other chunk clears itself. Writing to the end therefore performs
Σ(n − i) ≈ n²/(2N) redundant dual writes per gradient/jacobian sweep, summed over the
chunk start indices i. At n = 100000 with chunk 12 and Dual{…,Float64,12} (104 bytes
each), that is ~40 GB of memory traffic.
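
The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch that only counts writes; it assumes chunks start at indices 1, 1+N, 1+2N, … and that each dual holds one Float64 value plus N Float64 partials. It does not call ForwardDiff.

```julia
# Count dual writes for "unseed to the end" versus "unseed only the chunk".
n, N = 100_000, 12
starts = 1:N:n                                        # assumed chunk start indices

to_end     = sum(n - i + 1 for i in starts)           # each unseed runs to the end
chunk_only = sum(min(N, n - i + 1) for i in starts)   # each unseed clamped to its chunk
redundant  = to_end - chunk_only

bytes_per_dual = sizeof(Float64) * (N + 1)            # 1 value + N partials

println("redundant writes:        ", redundant)
println("approximation n^2/(2N):  ", n^2 / (2N))
println("redundant traffic in GB: ", redundant * bytes_per_dual / 1e9)
```

Under these assumptions the count comes out at roughly 4.2 × 10⁸ redundant writes, or about 43 GB, in line with the ~40 GB figure quoted above.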

This restores the 0.10 behavior: write at most N elements starting at index
(Iterators.take(…, N) on the structural path, a clamped range on the dense path).
The 3-arg seed! form has no other callers in the package.
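
The two clamping strategies described above can be illustrated on plain arrays. This is a hedged sketch, not the ForwardDiff implementation: `unseed_structural!` and `unseed_dense!` are hypothetical names, `buf` stands in for the dual array, copying `x[idx]` stands in for writing an unseeded dual, and `eachindex` stands in for `structural_eachindex`.

```julia
# Structural-path style: skip `index - 1` positions, then visit at most N.
function unseed_structural!(buf, x, index, N)
    idxs = Iterators.take(Iterators.drop(eachindex(buf, x), index - 1), N)
    for idx in idxs
        buf[idx] = x[idx]          # stand-in for "write an unseeded dual"
    end
    return buf
end

# Dense-path style: a contiguous range clamped to the end of the array.
function unseed_dense!(buf, x, index, N)
    r = index:min(index + N - 1, length(buf))
    copyto!(buf, first(r), x, first(r), length(r))
    return buf
end

x = collect(1.0:10.0)
a = unseed_structural!(fill(-1.0, 10), x, 9, 4)   # last chunk is short: touches 9 and 10 only
b = unseed_dense!(fill(-1.0, 10), x, 4, 3)        # touches positions 4, 5 and 6 only
```

In both variants, positions outside the chunk keep their previous contents, which is the property chunk mode relies on.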

Benchmarks

Julia 1.12.4, gradient! with default chunk 12, median of 7 runs at n=100000:

| | master | #816 | #816 + this PR |
|---|---|---|---|
| gradient! n=1000 | 464 μs | 344 μs | 222 μs |
| gradient! n=100000 | 4.80 s | 3.96 s | 2.61 s |

jacobian! is extraction-dominated at these sizes, so its unseed saving (~2%) is
within run-to-run noise.

Tests

No new tests: chunk-vs-vector-mode consistency (including non-divisible chunk sizes and
UpperTriangular/Diagonal inputs) is covered by the existing suite, which passes on
Julia 1.10 and 1.12 locally. Verified additionally that chunked results match
full-chunk references for n ∈ {5, 12, 13, 24, 25, 100} × chunk ∈ {1, 3, 5, 12}.
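
A check of this kind might look like the sketch below. It is an assumption about the shape of such a check, not the script actually used: the test function `f` and the helper `check_chunk_consistency` are hypothetical, and combinations with chunk > n are skipped because chunk mode does not accept a chunk size larger than the input.

```julia
using ForwardDiff

# Hypothetical test function; the x[1] * x[end] term couples the first and last chunk.
f(x) = sum(abs2, x) + x[1] * x[end]

function check_chunk_consistency()
    checked = 0
    for n in (5, 12, 13, 24, 25, 100), N in (1, 3, 5, 12)
        N <= n || continue                       # chunk size may not exceed length(x)
        x = collect(range(0.1, 1.0; length = n))
        chunked_cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{N}())
        full_cfg    = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{n}())
        chunked = ForwardDiff.gradient(f, x, chunked_cfg)
        full    = ForwardDiff.gradient(f, x, full_cfg)   # single chunk covers all of x
        @assert chunked ≈ full
        checked += 1
    end
    return checked
end

check_chunk_consistency()
```

Sizes such as 13 and 25 exercise a short final chunk, which is where a wrongly clamped unseed range would be most likely to show up.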


Note

Opened as a draft by an agent on behalf of @ChrisRackauckas. Please ignore
until reviewed by @ChrisRackauckas.

🤖 Generated with Claude Code

ChrisRackauckas and others added 2 commits July 3, 2026 12:18
ForwardDiff 1.x seeds duals with scalar setindex! loops over
structural_eachindex, which errors on GPU arrays ("scalar indexing is
disallowed") — a regression vs 0.10's broadcast seeding. Add a fast path
to the four seed! methods gated on

    duals isa DenseArray && isbitstype(V) && !Base.has_offset_axes(duals, x)

using broadcast for the bulk writes and map! over contiguous views for
the chunk writes. AbstractGPUArray <: DenseArray, so this restores GPU
jacobians without a weak dependency, and it is faster than the scalar
loop on the CPU as well: the structural path pays an O(index)
Iterators.drop walk per chunk, i.e. O(n^2/N) per chunked sweep.

map! (not broadcast) is used for the chunk writes because slicing the
seeds tuple at runtime allocates, and for the index:end write because
the broadcast dotview allocates under --check-bounds=yes on Julia 1.10.

Structural wrappers, non-isbits values (unset-element handling), and
offset axes keep using the structural path unchanged.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vx7zQ96NYk4VV4ML2s3kAC
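
To make the fast-path gate in the commit message above concrete, the sketch below wraps the same condition in a helper and applies it to a few array types. The name `use_dense_path` is hypothetical and not part of ForwardDiff; only the condition itself is taken from the commit message.

```julia
# Hypothetical helper restating the gate from the commit message.
use_dense_path(duals, x, ::Type{V}) where {V} =
    duals isa DenseArray && isbitstype(V) && !Base.has_offset_axes(duals, x)

x = rand(4)

dense   = use_dense_path(similar(x), x, Float64)              # plain Vector
strided = use_dense_path(view(similar(x), 1:2:3), x, Float64) # SubArray is not a DenseArray
boxed   = use_dense_path(similar(x), x, BigFloat)             # BigFloat is not an isbits type
```

Only the first call takes the dense path; the other two fall back to the structural path, matching the commit message's statement that wrappers and non-isbits values keep the existing behavior.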
The chunk-mode unseed call seed!(xdual, x, i) only needs to clear the
N-wide chunk seeded at i: the rest of the array is zeroed up front and
every other chunk clears itself. ForwardDiff 0.10 wrote exactly N
elements here; the 1.x rewrite made it write from i to the end of the
array, i.e. about n^2/(2N) redundant dual writes per chunked
gradient/jacobian sweep (~40 GB of memory traffic for gradient! of
100000 elements at chunk 12).

Write at most N elements starting at index: Iterators.take(_, N) on the
structural path, a clamped range on the dense path. The 3-arg seed!
form has no other callers.

gradient! n=1000 (chunk 12): 344 -> 222 us; n=100000: 3.96 -> 2.61 s.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vx7zQ96NYk4VV4ML2s3kAC
@codecov

codecov Bot commented Jul 3, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.81%. Comparing base (090ddbb) to head (bacc1a2).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #821      +/-   ##
==========================================
+ Coverage   90.74%   90.81%   +0.06%     
==========================================
  Files          11       11              
  Lines        1070     1089      +19     
==========================================
+ Hits          971      989      +18     
- Misses         99      100       +1     


@ChrisRackauckas-Claude

Contributor Author

CI: all 19 Julia 1 / lts / min-patch jobs green. The 6 "Julia pre" failures are the pre-existing Julia 1.13-rc1 + JET issue that also fails on master — root-cause analysis in #816 (comment). This commit introduces no new failures relative to the #816 branch it stacks on.

2 participants