@KristofferC

HyperHessians (https://github.com/KristofferC/HyperHessians.jl) is a forward-mode AD package that specializes in second-order derivatives (Hessians). I initially made it about 4 years ago as an experiment to see what would happen if you built an AD package using hyper-duals instead of nested duals for second-order derivatives, but only recently freshened it up into a proper package. The hyper-dual number approach has some extra flexibility over nested duals, which allows it to squeeze out more performance than e.g. ForwardDiff:

Hessian:

| Function | Input length | Time ForwardDiff | Time HyperHessians | Speedup |
|---|---|---|---|---|
| ackley | 1 | 50.215 ns | 34.872 ns | 1.4 |
| ackley | 8 | 753.440 ns | 292.278 ns | 2.6 |
| ackley | 128 | 2.026 ms | 494.049 μs | 4.1 |
| ackley | 1024 | 1.062 s | 236.522 ms | 4.5 |
| rosenbrock_1 | 1 | 24.240 ns | 7.756 ns | 3.1 |
| rosenbrock_1 | 8 | 908.258 ns | 298.307 ns | 3.0 |
| rosenbrock_1 | 128 | 3.163 ms | 576.477 μs | 5.5 |
| rosenbrock_1 | 1024 | 1.704 s | 280.535 ms | 6.1 |

Hvp:

| Function | Input length | Time ForwardDiff | Time HyperHessians | Speedup |
|---|---|---|---|---|
| rosenbrock_1 | 8 | 198.708 ns | 127.058 ns | 1.6 |
| rosenbrock_1 | 16 | 893.167 ns | 555.688 ns | 1.6 |
| rosenbrock_1 | 128 | 46.955 μs | 32.693 μs | 1.4 |
| rosenbrock_1 | 1024 | 2.897 ms | 2.089 ms | 1.4 |
| ackley | 8 | 297.756 ns | 124.167 ns | 2.4 |
| ackley | 16 | 1.138 μs | 464.128 ns | 2.5 |
| ackley | 128 | 46.364 μs | 24.874 μs | 1.9 |
| ackley | 1024 | 2.807 ms | 1.559 ms | 1.8 |

(Note that I am not claiming it is impossible to improve ForwardDiff further w.r.t. Hessians; I am just saying that these are the numbers as of right now.)

The package is not registered yet, but I thought I would open this for some feedback anyway. Note that some of the stuff in here was written with an LLM because figuring out exactly what to overload, etc., was tricky. Hopefully, it is a good starting point to iterate on. I also haven't yet checked whether the overhead of DI over raw HyperHessians is negligible.
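
For readers unfamiliar with hyper-dual numbers, here is a minimal, self-contained sketch of the idea mentioned above. It is not how HyperHessians.jl is implemented; the `HyperDual` type and the `hessian_entry` helper are hypothetical names made up for this illustration, and only `+`, `*`, and `sin` are overloaded. A hyper-dual number carries a value, two first-order parts, and one mixed second-order part, so a single evaluation of `f` yields one Hessian entry in the `ε₁ε₂` coefficient.

```julia
# Hypothetical minimal hyper-dual number: v + e1*ε₁ + e2*ε₂ + e12*ε₁ε₂,
# with ε₁² = ε₂² = 0 and ε₁ε₂ ≠ 0.
struct HyperDual
    v::Float64
    e1::Float64
    e2::Float64
    e12::Float64
end

Base.:+(a::HyperDual, b::HyperDual) =
    HyperDual(a.v + b.v, a.e1 + b.e1, a.e2 + b.e2, a.e12 + b.e12)

# Product rule, keeping terms up to ε₁ε₂.
Base.:*(a::HyperDual, b::HyperDual) = HyperDual(
    a.v * b.v,
    a.v * b.e1 + a.e1 * b.v,
    a.v * b.e2 + a.e2 * b.v,
    a.v * b.e12 + a.e1 * b.e2 + a.e2 * b.e1 + a.e12 * b.v,
)

# Chain rule for sin, using sin' = cos and sin'' = -sin.
Base.sin(a::HyperDual) = HyperDual(
    sin(a.v),
    cos(a.v) * a.e1,
    cos(a.v) * a.e2,
    cos(a.v) * a.e12 - sin(a.v) * a.e1 * a.e2,
)

# Seed ε₁ along coordinate i and ε₂ along coordinate j, evaluate f once,
# and read the (i, j) Hessian entry from the ε₁ε₂ coefficient.
function hessian_entry(f, x::Vector{Float64}, i::Int, j::Int)
    xs = [HyperDual(x[k], k == i ? 1.0 : 0.0, k == j ? 1.0 : 0.0, 0.0)
          for k in eachindex(x)]
    return f(xs).e12
end

f(x) = x[1] * x[1] * x[2] + sin(x[2])
x = [2.0, 3.0]
H = [hessian_entry(f, x, i, j) for i in 1:2, j in 1:2]
```

For this toy `f`, the analytic Hessian is `[2x₂ 2x₁; 2x₁ -sin(x₂)]`, which is what the sketch should reproduce. A real package would avoid one evaluation per entry by seeding several directions at once.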

@codecov

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 92.13483% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 2.93%. Comparing base (0dd1abf) to head (6f2f234).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ansExt/DifferentiationInterfaceHyperHessiansExt.jl | 91.35% | 7 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (0dd1abf) and HEAD (6f2f234).

HEAD has 60 uploads less than BASE

| Flag | BASE (0dd1abf) | HEAD (6f2f234) |
|---|---|---|
| DI | 50 | 1 |
| DIT | 11 | 0 |

@@            Coverage Diff             @@
##             main    #940       +/-   ##
==========================================
- Coverage   98.23%   2.93%   -95.30%     
==========================================
  Files         133     100       -33     
  Lines        7968    5582     -2386     
==========================================
- Hits         7827     164     -7663     
- Misses        141    5418     +5277     

| Flag | Coverage Δ | |
|---|---|---|
| DI | 2.93% <92.13%> (-96.08%) | ⬇️ |
| DIT | ? | |


@gdalle
Member

gdalle commented Dec 3, 2025

Thanks for opening a PR!
If you're going to iterate a lot, can you comment out the irrelevant tests in Test.yml (remove everything related to DIT, and toggle a comment on everything that is not your backend)? The CI suite of DI is extremely expensive.

@gdalle
Member

gdalle commented Dec 3, 2025

As far as knowing what to overload, maybe this page can help: https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface/stable/dev/contributing/

@KristofferC
Author

> Thanks for opening a PR!
> If you're going to iterate a lot, can you comment out the irrelevant tests in Test.yml (remove everything related to DIT, and toggle a comment on everything that is not your backend)? The CI suite of DI is extremely expensive

I did so; feel free to cancel the tests still running on the previous commit.

@gdalle
Member

gdalle commented Dec 3, 2025

You can also comment out the whole test-DI-Core and test-DIT jobs.
I wish I knew a smarter way to indicate which test sets to run, but it's hard to anticipate which parts of the codebase need to be re-tested based on a given change.

@KristofferC
Author

At least the tests for the HyperHessians backend seem to pass.

@KristofferC
Author

What's my next step? Do I need to add AutoHyperHessians to ADTypes.jl?

@gdalle
Member

gdalle commented Dec 3, 2025

Maybe give me a few days to review this first? Also, it would be the first backend that implements only a few first-order operations, so I'm not sure how to document that

@KristofferC
Author

> Also, it would be the first backend that implements only a few first-order operations, so I'm not sure how to document that

Maybe the table of supported features is enough?

I could add gradient, etc, but the hyperdual number would then degenerate to a normal dual number, and it would feel a bit silly to just re-implement things that are functionally identical to ForwardDiff.

@gdalle
Member

gdalle commented Dec 4, 2025

It's not a table of supported features; it's a table of which operators are implemented natively by the backend. For instance, Mooncake only implements a pullback, but thanks to DI's internal machinery you get a Jacobian operator for free. With this package, there is no pullback or pushforward, so that machinery will fail.
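
As a rough illustration of the kind of fallback described here (not DI's actual implementation), a Jacobian can be assembled from a pullback alone by applying it to each basis vector of the output space. The helper name `jacobian_from_pullback` and the toy function are hypothetical, chosen only for this sketch:

```julia
# Hypothetical helper, not part of DifferentiationInterface.
# `pullback(dy)` is assumed to return Jᵀ * dy at a fixed input x,
# where J is the m×n Jacobian of f at x.
function jacobian_from_pullback(pullback, m::Integer, n::Integer)
    J = zeros(m, n)
    for i in 1:m
        dy = zeros(m)
        dy[i] = 1.0             # i-th basis vector of the output space
        J[i, :] = pullback(dy)  # Jᵀ eᵢ is the i-th row of J
    end
    return J
end

# Toy example: f(x) = [x₁ x₂, x₁ + x₂] at x = (2, 3).
x = [2.0, 3.0]
# Hand-written pullback of f at x: dy ↦ Jᵀ dy.
pb(dy) = [x[2] * dy[1] + dy[2], x[1] * dy[1] + dy[2]]
J = jacobian_from_pullback(pb, 2, 2)
```

A backend that exposes neither a pullback nor a pushforward gives a generic fallback like this nothing to build on, which is the situation described for this package.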

@gdalle
Member

gdalle commented Dec 4, 2025

It's not necessarily a bad thing, just a new thing

Member

@gdalle gdalle left a comment


Overall this seems reasonable; the main decision to make now is whether you want to support Cache or not. I don't think it is crucial at first glance.

Could you maybe comment on #828 to see if I missed anything vis-a-vis creating a new backend? I should probably turn that issue into docs but I'd be curious about your (LLM-enhanced) experience

be set to a positive `Int` to override HyperHessians' chunk heuristic; `nothing`
lets HyperHessians choose.
"""
struct AutoHyperHessians{CS} <: ADTypes.AbstractADType
Member

Just adding a comment as a reminder that we shouldn't merge this, since it belongs in ADTypes.

| `AutoFiniteDifferences` |||
| `AutoForwardDiff` |||
| `AutoGTPSA` |||
| `AutoHyperHessians` |||
Member

I don't think you support caches; that would require dealing with hyperdual element types.

import DifferentiationInterface as DI
import .DI: AutoHyperHessians
using ADTypes: ForwardMode
using HyperHessians:
Member

Are all of these public, as in, protected from API changes in minor or patch versions?

## Traits
DI.check_available(::DI.AutoHyperHessians) = true
DI.inplace_support(::DI.AutoHyperHessians) = DI.InPlaceSupported()
DI.hvp_mode(::DI.AutoHyperHessians) = DI.ForwardOverForward()
Member

Suggested change (delete this line):
DI.hvp_mode(::DI.AutoHyperHessians) = DI.ForwardOverForward()

This is deduced automatically from the mode.

chunk_from_backend(backend::DI.AutoHyperHessians, N::Integer, ::Type{T}) where {T} =
    isnothing(backend.chunksize) ? Chunk(pickchunksize(N, T), T) : Chunk{backend.chunksize}()

function DI.pick_batchsize(backend::DI.AutoHyperHessians, x::AbstractArray)
Member

You're missing the same function for integer arguments instead of arrays.

contexts::Vararg{DI.Context, C},
) where {C}
DI.check_prep(f, prep, backend, x, contexts...)
fc = DI.fix_tail(f, map(DI.unwrap, contexts)...)
Member

This will only work with Constant contexts (see the ForwardDiff extension for how I deal with Cache).

@gdalle
Member

gdalle commented Dec 18, 2025

Also, the tests are failing.
