
Fix flaky gamma_inc partials test #822

Draft
ChrisRackauckas-Claude wants to merge 1 commit into JuliaDiff:master from ChrisRackauckas-Claude:fix-gamma-inc-flaky-test

@ChrisRackauckas-Claude

Contributor

Summary

The gamma_inc partials assertions in test/DualTest.jl fail sporadically on CI —
roughly one job per full-matrix run, on arbitrary OS/Julia-version combinations (e.g.
"Julia lts - windows-latest" in the 2026-06-22 master run, "Julia 1 - ubuntu-latest" in
the 2026-06-07 run), always of the form

partials(pq[i]) ≈ PARTIALS * Calculus.derivative(x -> gamma_inc(a, x, ind...)[i], 1 + PRIMAL) rtol=tol

with V === Float32 and random a/PRIMAL narrowly missing rtol = 5f-4.

Diagnosis

The reference, not ForwardDiff, is at fault, for two reasons:

  1. For V === Float32, Calculus.derivative central-differences the Float32
    evaluation of gamma_inc, whose finite-difference error is of the same order as the
    5f-4 tolerance — so random draws sporadically cross it (~4% of draws, measured).
  2. For ind = 1 / ind = 2, gamma_inc deliberately evaluates a reduced-accuracy
    approximation (~14 / ~6 significant digits). Finite-differencing a function with
    ~1e-6 intrinsic noise amplifies that noise by 1/h; the measured reference error
    reaches 2000–3000% of the true derivative in the worst draws. The existing
    tol^(1/2^ind) loosening (up to rtol ≈ 0.15) was compensating for this.

In every sampled failing draw, ForwardDiff's Float32 derivative was within 4e-8
relative of the true derivative (Float64 dual evaluation), while the finite-difference
reference was off by 2%–3000%.
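
The two effects above can be reproduced at a single point with a short sketch. It assumes SpecialFunctions.jl is installed; `central_diff`, `dP`, and `relerr` are hypothetical helpers, the cube-root-of-eps step is an assumed approximation of the default step in Calculus.derivative, and the values of `a32` and `x32` are illustrative rather than taken from a failing draw.

```julia
using SpecialFunctions  # provides gamma_inc and gamma

# Hypothetical helper: central difference with a cube-root-of-eps step,
# an assumed stand-in for the default behavior of Calculus.derivative.
function central_diff(f, x::T) where {T<:AbstractFloat}
    h = cbrt(eps(T)) * max(one(T), abs(x))
    return (f(x + h) - f(x - h)) / (2h)
end

# Analytic x-derivative of the regularized lower incomplete gamma P(a, x).
dP(a, x) = exp(-x) * x^(a - 1) / gamma(a)

# Relative error of an estimate against a Float64 reference value.
relerr(est, truth) = abs(Float64(est) - truth) / abs(truth)

a32, x32 = 0.7f0, 1.3f0                     # illustrative values only
truth = dP(Float64(a32), Float64(x32))      # Float64 analytic derivative

for ind in 0:2
    # Old-style reference: finite difference of the Float32, ind-approximation function.
    fd32 = central_diff(x -> first(gamma_inc(a32, x, ind)), x32)
    # Same finite difference in Float64, still using the ind approximation.
    fd64 = central_diff(x -> first(gamma_inc(Float64(a32), x, ind)), Float64(x32))
    println("ind=$ind  Float32 FD relerr=", relerr(fd32, truth),
            "  Float64 FD relerr=", relerr(fd64, truth))
end
```

The printed errors depend on the chosen point, so no output is quoted here. Only the ind = 0 Float64 column is expected to be reliably small.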

Fix

ForwardDiff's rule (src/dual.jl) is the analytic exp(-x)·x^(a-1)/Γ(a), independent
of ind. So compare the partials against a finite difference of the full-accuracy
(ind = 0) Float64 evaluation, at the base tolerance. The ind-dependent loosening
remains only on the value comparison, where it belongs (values genuinely differ by
the requested approximation accuracy). Float64(1 + PRIMAL) converts after the
V-precision addition, so the reference is evaluated at exactly the primal of the
dual input.
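
A minimal sketch of the revised comparison follows; it is not the actual diff to test/DualTest.jl. It assumes ForwardDiff.jl and SpecialFunctions.jl are installed and that ForwardDiff's gamma_inc rule accepts the three-argument form. `central_diff` is a hypothetical stand-in for Calculus.derivative, and the fixed `a` and `PRIMAL` values replace the random draws used by the real test.

```julia
using ForwardDiff, SpecialFunctions

# Hypothetical stand-in for Calculus.derivative: Float64 central difference.
function central_diff(f, x::Float64)
    h = cbrt(eps(Float64)) * max(1.0, abs(x))
    return (f(x + h) - f(x - h)) / (2h)
end

V = Float32
a, PRIMAL = V(0.7), V(0.3)   # illustrative values; the test draws these randomly
tol = 5f-4                   # base tolerance quoted in the summary

x = 1 + PRIMAL               # addition happens at V precision, as for the dual input
xref = Float64(x)            # convert afterwards so both sides use the same primal

for ind in 0:2, i in 1:2
    # Derivative from ForwardDiff at V precision, with the requested ind approximation.
    d = ForwardDiff.derivative(t -> gamma_inc(a, t, ind)[i], x)
    # Reference: full-accuracy (ind = 0) Float64 evaluation, independent of ind.
    ref = central_diff(t -> gamma_inc(Float64(a), t, 0)[i], xref)
    @assert isapprox(d, ref; rtol = tol)
end
```

Because the reference no longer depends on ind or on V, the same base tolerance applies to every variant in the loop.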

Measured over 300000 random (a, PRIMAL, PARTIALS) draws × all ind variants
(2.4M assertions): old reference fails ~4% of draws; new reference fails 0, even at
the base (unloosened) tolerance.

test/DualTest.jl passes locally (Julia 1.12, --check-bounds=yes).


Note

Opened as a draft by an agent on behalf of @ChrisRackauckas. Please ignore
until reviewed by @ChrisRackauckas.

🤖 Generated with Claude Code

The partials assertions compared ForwardDiff's analytic gamma_inc
derivative against Calculus.jl central differences of the V-precision,
ind-approximation function. Two problems: for V === Float32 the finite
difference of a Float32 function has error comparable to the 5f-4
tolerance, and for ind = 1/2 gamma_inc itself is a reduced-accuracy
approximation whose intrinsic noise the finite difference amplifies far
beyond any tolerance. With random `a`/`PRIMAL` this failed sporadically
(~4% of random draws; roughly one CI job per master run).

ForwardDiff's rule is exp(-x)x^(a-1)/gamma(a), independent of `ind`, so
compare against a finite difference of the full-accuracy (ind = 0)
Float64 evaluation instead, at the *base* tolerance (the ind-dependent
loosening is only needed for the value comparison, which keeps it).

Measured over 300000 random (a, PRIMAL, PARTIALS) draws x all ind
variants (2.4M assertions): old reference fails ~4% of draws, new
reference fails 0 - while every observed old-reference failure had the
Float32 dual derivative within 4e-8 of the true derivative, i.e. the
reference, not ForwardDiff, was wrong.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vx7zQ96NYk4VV4ML2s3kAC
@codecov

codecov Bot commented Jul 3, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.74%. Comparing base (090ddbb) to head (cb0e8ac).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #822   +/-   ##
=======================================
  Coverage   90.74%   90.74%           
=======================================
  Files          11       11           
  Lines        1070     1070           
=======================================
  Hits          971      971           
  Misses         99       99           


@ChrisRackauckas-Claude

Contributor Author

A fresh occurrence of this flake just hit CI on #823 ("Julia min-patch - macOS - NaN-safe disabled", 2026-07-03): Partials(0.0029103528, 0.0038809243, 0.002506475) ≈ Partials(0.0029121826, 0.0038833646, 0.0025080508) at rtol=0.0005 — a ~6e-4 relative miss of the Float32 finite-difference reference, exactly the failure mode this PR removes.
