Unseed only the chunk in 3-arg seed! by ChrisRackauckas-Claude · Pull Request #821 · JuliaDiff/ForwardDiff.jl

ChrisRackauckas-Claude · 2026-07-03T16:45:28Z

Summary

Stacked on #816 (first commit is that PR; review only the last commit). Depends on #816 landing first.

In ForwardDiff 0.10, the chunk-mode "unseed" call seed!(duals, x, index, seed) wrote
exactly the N chunk elements starting at index. The 1.x rewrite changed it to write
from index to the end of the array:

idxs = Iterators.drop(structural_eachindex(duals, x), offset)
for idx in idxs   # no take(N) — runs to the end

Chunk mode calls this after every chunk (seed!(xdual, x, i) in
chunk_mode_gradient_expr / chunk_mode_jacobian_expr) purely to clear the chunk it
just seeded — the rest of the array is already unseeded (everything is zeroed once
up front, and every other chunk clears itself). So writing to the end performs
Σ(n − i) ≈ n²/(2N) redundant dual writes per gradient/jacobian sweep, summed over the
chunk start indices i. At n = 100000 with chunk 12 and Dual{…,Float64,12} (13 Float64
values, 104 bytes each) that is ~40 GB of memory traffic.
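
The estimate above can be checked by hand. The following is a sketch of the arithmetic, assuming for simplicity that n is divisible by N, that chunks start at i_k = kN + 1, and that K = n/N is the number of chunks (K and i_k are notation introduced here, not in the PR):

```latex
\begin{align*}
\text{writes per sweep (to-the-end)}
  &= \sum_{k=0}^{K-1} \bigl(n - kN\bigr)
   = Kn - \frac{N\,K(K-1)}{2}
   = \frac{n^2}{2N} + \frac{n}{2}, \\
\text{necessary writes}
  &= n, \\
\text{redundant writes}
  &= \frac{n^2}{2N} - \frac{n}{2} \;\approx\; \frac{n^2}{2N}. \\[4pt]
n = 10^5,\; N = 12:\quad
\frac{n^2}{2N} &= \frac{10^{10}}{24} \approx 4.2 \times 10^{8}
  \text{ writes} \times 104\ \text{B} \approx 4.3 \times 10^{10}\ \text{B},
\end{align*}
```

which is consistent with the ~40 GB figure quoted above.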

This restores the 0.10 behavior: write at most N elements starting at index
(Iterators.take(…, N) on the structural path, a clamped range on the dense path).
The 3-arg seed! form has no other callers in the package.
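
A minimal sketch of the two narrowed write patterns described above, not the code in this PR. The function names `unseed_structural!` and `unseed_dense!` are hypothetical, plain `Float64` vectors stand in for the dual array, and `eachindex` stands in for `structural_eachindex`. The dense variant assumes 1-based indexing.

```julia
# Hypothetical stand-ins: plain Float64 vectors play the role of the dual
# array, and `eachindex` plays the role of `structural_eachindex`.

# Structural path: skip to the chunk, then take at most N indices.
function unseed_structural!(duals::AbstractVector, index::Integer, N::Integer, z)
    idxs = Iterators.take(Iterators.drop(eachindex(duals), index - 1), N)
    for idx in idxs
        duals[idx] = z
    end
    return duals
end

# Dense path: clamp the range so a short final chunk stays in bounds.
function unseed_dense!(duals::AbstractVector, index::Integer, N::Integer, z)
    stop = min(index + N - 1, lastindex(duals))
    fill!(view(duals, index:stop), z)
    return duals
end

a = collect(1.0:25.0)
unseed_structural!(a, 13, 12, 0.0)   # clears a[13:24] only

b = collect(1.0:25.0)
unseed_dense!(b, 25, 12, 0.0)        # final chunk holds a single element
```

In both variants the elements before `index` and after the chunk are left untouched, which is the property the chunk-mode sweep relies on.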

Benchmarks

Julia 1.12.4, gradient! with default chunk 12, median of 7 runs at n=100000:

| | master | #816 | #816 + this PR |
|---|---|---|---|
| `gradient!` n=1000 | 464 μs | 344 μs | 222 μs |
| `gradient!` n=100000 | 4.80 s | 3.96 s | 2.61 s |

jacobian! is extraction-dominated at these sizes, so its unseed saving (~2%) is
within run-to-run noise.

Tests

No new tests: chunk-vs-vector-mode consistency (including non-divisible chunk sizes and
UpperTriangular/Diagonal inputs) is covered by the existing suite, which passes on
Julia 1.10 and 1.12 locally. Verified additionally that chunked results match
full-chunk references for n ∈ {5, 12, 13, 24, 25, 100} × chunk ∈ {1, 3, 5, 12}.

Note

Opened as a draft by an agent on behalf of @ChrisRackauckas. Please ignore
until reviewed by @ChrisRackauckas.

🤖 Generated with Claude Code

ForwardDiff 1.x seeds duals with scalar setindex! loops over structural_eachindex, which errors on GPU arrays ("scalar indexing is disallowed"), a regression vs 0.10's broadcast seeding.

Add a fast path to the four seed! methods gated on duals isa DenseArray && isbitstype(V) && !Base.has_offset_axes(duals, x) using broadcast for the bulk writes and map! over contiguous views for the chunk writes. AbstractGPUArray <: DenseArray, so this restores GPU jacobians without a weak dependency, and it is faster than the scalar loop on the CPU as well: the structural path pays an O(index) Iterators.drop walk per chunk, i.e. O(n^2/N) per chunked sweep.

map! (not broadcast) is used for the chunk writes because slicing the seeds tuple at runtime allocates, and for the index:end write because the broadcast dotview allocates under --check-bounds=yes on Julia 1.10.

Structural wrappers, non-isbits values (unset-element handling), and offset axes keep using the structural path unchanged.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vx7zQ96NYk4VV4ML2s3kAC

The chunk-mode unseed call seed!(xdual, x, i) only needs to clear the N-wide chunk seeded at i: the rest of the array is zeroed up front and every other chunk clears itself. ForwardDiff 0.10 wrote exactly N elements here; the 1.x rewrite made it write from i to the end of the array, i.e. O(n^2/2N) redundant dual writes per chunked gradient/jacobian sweep (~40 GB of memory traffic for gradient! of 100000 elements at chunk 12).

Write at most N elements starting at index: Iterators.take(_, N) on the structural path, a clamped range on the dense path. The 3-arg seed! form has no other callers.

gradient! n=1000 (chunk 12): 344 -> 222 us; n=100000: 3.96 -> 2.61 s.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vx7zQ96NYk4VV4ML2s3kAC

codecov · 2026-07-03T16:51:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.81%. Comparing base (090ddbb) to head (bacc1a2).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #821      +/-   ##
==========================================
+ Coverage   90.74%   90.81%   +0.06%     
==========================================
  Files          11       11              
  Lines        1070     1089      +19     
==========================================
+ Hits          971      989      +18     
- Misses         99      100       +1

ChrisRackauckas-Claude · 2026-07-03T17:04:28Z

CI: all 19 Julia 1 / lts / min-patch jobs green. The 6 "Julia pre" failures are the pre-existing Julia 1.13-rc1 + JET issue that also fails on master — root-cause analysis in #816 (comment). This commit introduces no new failures relative to the #816 branch it stacks on.

ChrisRackauckas and others added 2 commits July 3, 2026 12:18

ChrisRackauckas-Claude mentioned this pull request Jul 3, 2026

Seed dense arrays of isbits duals without scalar indexing (fixes GPU jacobians) #816

Draft

ChrisRackauckas-Claude wants to merge 2 commits into JuliaDiff:master from ChrisRackauckas-Claude:narrow-unseed
