
Fix master test reds: BBO loss return, ensemble solve API, FlexiChains chain access #299

Draft
ChrisRackauckas-Claude wants to merge 4 commits into SciML:main from ChrisRackauckas-Claude:fix-ensemble-datafit-master-reds

ChrisRackauckas-Claude commented Jun 20, 2026

Master CI was red across Core, Datafit, Downgrade (and QA; Documentation has recovered). Investigation found several distinct root causes, all stemming from the SciML/Turing stack churn. This PR fixes the concrete API-break errors; the remaining reds are pre-existing performance/numerical issues documented below.

Branched off main (b450a3c). PR #298 (AbstractMCMC import) is unrelated.

Fixed by this PR

1. l2loss/relative_l2loss return type (Datafit)

Both functions now return only the scalar tot_loss. OptimizationBBO's BlackBoxOptim wrapper requires a Float64 objective, and the sol element of the old (tot_loss, sol) tuple was never consumed by any caller.
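
A minimal sketch of the change in return shape, assuming a plain sum-of-squares loss over precomputed predictions. The names l2loss_tuple and l2loss_scalar, and the pred/data arguments, are hypothetical stand-ins; the package's own loss functions solve an ODE problem before comparing against data.

```julia
# Hypothetical stand-ins for the old and new return shapes of the loss.
# `pred` and `data` are plain vectors here, not ODE solutions.

# Old shape: a tuple whose second element no caller used.
l2loss_tuple(pred, data) = (sum(abs2, pred .- data), pred)

# New shape: only the scalar, which is what a Float64-typed objective expects.
l2loss_scalar(pred, data) = sum(abs2, pred .- data)

pred = [1.0, 2.0, 3.0]
data = [1.0, 2.5, 2.0]

loss = l2loss_scalar(pred, data)
```

A caller that previously took `first` of the tuple sees the same number either way, since `first` of a scalar is the scalar itself.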

2. DynamicPPL sample rejection (Datafit + Downgrade)

bayesianODE rejected failed solves with the removed Turing.DynamicPPL.acclogp!!(__varinfo__, -Inf) (no method for the AD-time OnlyAccsVarInfo{...Dual...}). Switched to the scalar @addlogprob! -Inf, valid on both DynamicPPL 0.30 (Downgrade) and 0.41 (current).
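
The scalar rejection idiom can be sketched on a toy Turing model. This is an illustration only: reject_sketch and its `p < 0` condition are hypothetical and stand in for the "solve was not successful" check in bayesianODE.

```julia
using Turing

# Hypothetical toy model: reject any draw with p < 0, standing in for
# "the ODE solve failed" in the real model.
@model function reject_sketch(y)
    p ~ Normal(0, 1)
    if p < 0
        # Scalar form: adds -Inf to the accumulated log density,
        # so this draw has zero probability.
        @addlogprob! -Inf
        # Exit the model evaluation early.
        return nothing
    end
    y ~ Normal(p, 1)
end

model = reject_sketch(0.5)
```

The early `return nothing` keeps the likelihood term from being evaluated on a draw that has already been rejected.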

3. Cross-version chain extraction (Datafit + Downgrade)

New _pprior_samples(chain, i) tries chain[@varname(pprior[i])] (newer Turing's FlexiChains.VNChain) and falls back to the legacy chain["pprior[i]"] string key (MCMCChains.Chains on the Downgrade stack). The @varname-only form broke Downgrade with MethodError: getindex(::MCMCChains.Chains, ::VarName); verified the fallback against a real MCMCChains.Chains.
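
The try-then-fall-back shape can be sketched without either chain backend. Everything below is a hypothetical stand-in: KeySketch plays the role of a VarName, the two chain structs mimic "indexable by structured key" and "indexable by string", and catching MethodError is an assumption about how the fallback is triggered, not a copy of the actual helper.

```julia
# Hypothetical stand-in for a VarName such as @varname(pprior[i]).
struct KeySketch
    name::Symbol
    index::Int
end

# Mimics a chain that only supports legacy string keys.
struct StringKeyedChain
    data::Dict{String, Vector{Float64}}
end
Base.getindex(c::StringKeyedChain, k::String) = c.data[k]

# Mimics a chain that only supports structured keys.
struct StructKeyedChain
    data::Dict{KeySketch, Vector{Float64}}
end
Base.getindex(c::StructKeyedChain, k::KeySketch) = c.data[k]

# Try the structured key first; fall back to the string key only when the
# chain type has no getindex method for it.
function pprior_samples_sketch(chain, i)
    try
        return vec(chain[KeySketch(:pprior, i)])
    catch err
        err isa MethodError || rethrow()
        return vec(chain["pprior[$i]"])
    end
end

legacy = StringKeyedChain(Dict("pprior[1]" => [0.1, 0.2]))
modern = StructKeyedChain(Dict(KeySketch(:pprior, 1) => [0.3, 0.4]))
```

Rethrowing anything other than a MethodError keeps genuine lookup failures (for example a missing parameter) from being silently routed to the string path.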

4. EnsembleProblem prob_func arity (Core + Downgrade)

The deprecated vector-of-problems EnsembleProblem([...]) form is broken. A 2-arg (prob, ctx) form only works on SciMLBase 3.x; the Downgrade floor (SciMLBase 2.55) still calls the legacy 3-arg prob_func(prob, i, repeat). Fixed in two places:

  • bayesian_ensemble (src/ensemble.jl) now uses EnsembleProbForwarder(all_probs), a callable supporting both arities; enprob.prob_func.all_probs exposes the trajectory count.
  • _get_sensitivity (src/sensitivity.jl) gained the (prob, ctx) method (it previously crashed Core's sensitivity.jl with MethodError: (::#prob_func#25)(::ODEProblem, ::EnsembleContext)).
    Per-trajectory solutions are now read via sol.u[i] throughout. The ensemble test's inline prob_func is made arity-robust the same way.
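
A callable that accepts both arities can be sketched in a few lines. ProbForwarderSketch is a hypothetical re-creation of the idea, not the EnsembleProbForwarder source; the NamedTuple `ctx` stands in for the context object, of which only the `sim_id` field named in this PR is assumed.

```julia
# Hypothetical re-creation of the idea; the real type lives in src/ensemble.jl
# and may differ in detail.
struct ProbForwarderSketch{T}
    all_probs::Vector{T}
end

# Legacy interface: prob_func(prob, i, repeat).
(f::ProbForwarderSketch)(prob, i::Integer, repeat) = f.all_probs[i]

# Newer interface: prob_func(prob, ctx), where ctx carries the trajectory index.
(f::ProbForwarderSketch)(prob, ctx) = f.all_probs[ctx.sim_id]

forwarder = ProbForwarderSketch(["prob A", "prob B", "prob C"])
ctx = (sim_id = 2,)   # NamedTuple standing in for the context object

legacy_pick = forwarder(nothing, 2, 1)
modern_pick = forwarder(nothing, ctx)
ntraj = length(forwarder.all_probs)   # trajectory count read off the stored vector
```

Because the two methods differ in argument count, dispatch picks the right one with no version check in the package code.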

CI result of these fixes

  • Downgrade: GREEN (was red).
  • Documentation: GREEN.
  • Locally on Julia 1.12: test/ensemble.jl runs end-to-end (weights recover [0.2,0.5,0.3]; bayesian_ensemble builds 303 models; exit 0); bayesian_datafit both forms pass the variance assertion incl. the rejection branch; sensitivity.jl no longer crashes (computes the Sobol GSA); basics.jl/examples.jl pass.

NOT fixed — pre-existing issues (reproduce on clean main, separate root causes)

A. Core/Datafit are too slow → CI cancels/kills the jobs

With the API errors removed, the Bayesian/GSA paths now run to completion but are pathologically slow on the modern stack (Turing 0.45 NUTS + ForwardDiff). Each solve is fast at ~2.6 ms; the cost is the MCMC machinery:

  • bayesian_ensemble (Core ensemble.jl): ~50 min locally.
  • get_sensitivity with samples=1000 (Core sensitivity.jl): >40 min.
  • bayesian_datafit at niter=3000+5000×nchains=4 (Datafit): ~98 min on CI before the runner kills it.

The Core/Datafit jobs exceed the CI window and are cancelled ("The operation was canceled") rather than failing an assertion. This is a test-runtime regression from the stack churn, not something a dep cap or code adaptation can fix, and I will not mask it by cutting iteration counts.

B. Core test/threshold.jl:123: optimal_parameter_threshold optimizer regression

Reproduces deterministically on clean main (threshold.jl/src untouched here). optimal_parameter_threshold uses NLopt GN_ISRES (global stochastic, maxtime-budgeted) and converges with ret=XTOL_REACHED to p1=-2, p2≈1.337, giving x(50)≈2.988 ≥ 2, so @test s2.u[end][1] < 2 fails. Constraint-satisfying parameters that drive x(50) far below 2 exist (e.g. p2≈-0.1), so the threshold is achievable — the constrained optimizer no longer finds them within budget. Separate root cause (NLopt/Optimization constrained-solve behavior), not dep-cappable, and the assertion is not loosened.

C. QA — Aqua.find_persistent_tasks_deps (Julia 1.12 only; lts passes)

On Julia 1.12, Pkg.develop honors OptimizationBBO 0.4.4–0.4.7's released relative-path [sources] OptimizationBase = {path = "../OptimizationBase"} and errors (expected package OptimizationBase to exist at path .../OptimizationBBO/OptimizationBase). Identical OptimizationBBO 0.4.7 passes on Julia 1.10. Upstream Optimization.jl packaging × Julia 1.12 Pkg change — not an EMA bug. Capping OptimizationBBO < 0.4.4 would drag SciMLBase 3→2 / ModelingToolkit 11→9 / OptimizationBase 5→2 (major regression), so a cap is the wrong fix.

Runic-formatted.

Please ignore until reviewed by @ChrisRackauckas

ChrisRackauckas and others added 4 commits June 20, 2026 06:27
…s chain access

Three independent breakages from upstream SciML updates were turning the
Core and Datafit CI groups red on master:

1. Optimization loss return type (Datafit group). `l2loss`/`relative_l2loss`
   returned `(tot_loss, sol)`. OptimizationBBO's BBO wrapper now strictly
   requires the objective to return a `Float64`, so `global_datafit` errored
   with "fitness function does NOT return the expected fitness type Float64".
   The `sol` element of the tuple was never consumed by any caller, so the loss
   functions now return only the scalar `tot_loss`. NLopt's `datafit` path
   (which tolerated the tuple by taking `first`) is unaffected.

2. EnsembleProblem / ensemble solve (Core group). The vector-of-problems form
   `EnsembleProblem([prob1, prob2, prob3])` is deprecated and broken in current
   SciMLBase (the default prob_func passes the whole vector to the per-trajectory
   solve), so `solve(enprob; saveat=1)` failed with a MethodError on
   `init(::Vector{ODEProblem}, ::Nothing)`. Migrated `bayesian_ensemble` and the
   ensemble test to the modern `prob_func = (prob, ctx) -> probs[ctx.sim_id]`
   form, pass an explicit algorithm (`Tsit5()`) and `trajectories`, and access
   per-trajectory solutions via `sol.u[i]` (EnsembleSolution's `length`/symbolic
   `getindex` now flatten across all timepoints). `ensemble_weights` updated to
   use `sol.u` accordingly.

3. Turing chain backend (Core + Datafit groups). Turing now returns a
   `FlexiChains.VNChain`, which no longer supports `chain["pprior[i]"]` string
   indexing. `bayesian_datafit` now extracts posterior samples with
   `chain[@varname(pprior[i])]`.
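
The migration in item 2 can be sketched end to end on a toy decay problem. This is an illustration under assumptions, not the package's code: the problems and the `select` function are hypothetical, both arities are defined so the same snippet should run whichever calling convention the installed SciMLBase uses, and `ctx.sim_id` is the field named in this commit message.

```julia
using OrdinaryDiffEq

# Toy problems: exponential decay with three different rates.
decay(u, p, t) = -p * u
probs = [ODEProblem(decay, 1.0, (0.0, 1.0), p) for p in (0.5, 1.0, 2.0)]

# prob_func defined for both calling conventions.
select(prob, i::Integer, repeat) = probs[i]   # legacy (prob, i, repeat)
select(prob, ctx) = probs[ctx.sim_id]         # newer (prob, ctx)

enprob = EnsembleProblem(probs[1]; prob_func = select)

# Explicit algorithm and trajectory count, as described in item 2.
sol = solve(enprob, Tsit5(); trajectories = length(probs), saveat = 0.5)

# Per-trajectory access goes through sol.u, not sol[i].
second_traj = sol.u[2]
final_value = second_traj.u[end]
```

Reading trajectories through `sol.u[i]` avoids depending on how `length` and `getindex` behave on the ensemble solution itself.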

Verified locally on Julia 1.12 against the master dependency set:
- global_datafit (BBO) recovers [2/3, 4/3, 1, 1]; datafit (NLopt) still passes.
- prob_func ensemble solve produces a Vector{ODESolution}; ensemble_weights runs.
- bayesian_datafit returns per-parameter posterior sample vectors with reduced
  variance vs the prior (the Datafit test's assertion).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on ensemble prob_func

Builds on the prior fixes in this branch (BBO scalar loss return, FlexiChains
@varname extraction). Three remaining breakages from upstream SciML/Turing churn:

1. Datafit group (DynamicPPL 0.41). bayesianODE rejected non-successful solves
   with `Turing.DynamicPPL.acclogp!!(__varinfo__, -Inf)`, which no longer has a
   method when __varinfo__ is the AD-time `OnlyAccsVarInfo{...Dual...}`:
       MethodError: no method matching acclogp!!(::OnlyAccsVarInfo{...}, ::Float64)
   Replaced with the canonical sample-rejection idiom `@addlogprob! (; loglikelihood = -Inf)`,
   which dispatches correctly on the LogLikelihood accumulator under ForwardDiff.

2/3. Core + Downgrade groups (EnsembleProblem prob_func arity). The prior
   `(prob, ctx) -> probs[ctx.sim_id]` form only works on SciMLBase 3.x (2-arg
   `prob_func(prob, ctx)`). On the Downgrade floor (SciMLBase 2.55) EnsembleProblem
   still calls the legacy 3-arg `prob_func(prob, i, repeat)`, so it errored with
       MethodError: no method matching (::SciML#1#2)(::ODEProblem, ::Int64, ::Int64)
   Introduced `EnsembleProbForwarder(all_probs)`, a callable supporting both
   interfaces (integer index and `ctx.sim_id`), used as `bayesian_ensemble`'s
   prob_func. Storing `all_probs` lets callers read the trajectory count via
   `enprob.prob_func.all_probs` (the access the ensemble test already uses). The
   ensemble test's inline prob_func is likewise made arity-robust.

Verified locally on Julia 1.12 against the master dependency set (SciMLBase 3.21,
DynamicPPL 0.41.8, Turing 0.45, OptimizationBBO 0.4.7):
- simple EnsembleProblem solve(enprob, Tsit5(); trajectories) recovers weights
  [0.2, 0.5, 0.3]; EnsembleProbForwarder dispatches on both (prob, i, repeat) and
  (prob, ctx); ensemble_weights runs.
- bayesian_datafit (both t and (t, timeseries) forms) returns per-parameter
  posterior sample vectors with variance reduced vs the prior (the Datafit assertion).

Not fixed here (upstream, reported separately): QA group's
`Aqua.find_persistent_tasks_deps` fails ONLY on Julia 1.12 (passes on lts) because
Pkg now honors OptimizationBBO 0.4.4-0.4.7's released relative-path
`[sources] OptimizationBase = {path = "../OptimizationBase"}` and errors with
"expected package OptimizationBase to exist at path .../OptimizationBBO/OptimizationBase".
Same OptimizationBBO version passes on Julia 1.10. Capping OptimizationBBO < 0.4.4
would drag SciMLBase 3->2, ModelingToolkit 11->9, OptimizationBase 5->2 (major
regression), so the correct fix is upstream (stop shipping [sources] in releases).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on (Downgrade)

The previous commit fixed the Core/Datafit reds on the current stack but two
bayesian_datafit details still broke the Downgrade group (Turing 0.35 /
MCMCChains 6 / DynamicPPL 0.30):

1. Chain sample extraction. `chain[@varname(pprior[i])]` works on newer Turing's
   FlexiChains.VNChain but errors on the legacy MCMCChains.Chains:
       MethodError: no method matching getindex(::MCMCChains.Chains{...}, ::VarName{:pprior, IndexLens{Tuple{Int64}}})
   Added `_pprior_samples(chain, i)` which tries the VarName form and falls back to
   the legacy `chain["pprior[i]"]` string key, supporting both backends.

2. Sample rejection. `@addlogprob! (; loglikelihood = -Inf)` (NamedTuple form) only
   exists on DynamicPPL 0.41+. Switched to the scalar `@addlogprob! -Inf`, which is
   valid on both 0.30 (added to the log-prob) and 0.41 (routed to the LogLikelihood
   accumulator), and still avoids the removed `acclogp!!(__varinfo__, ::Float64)`.

Verified the VarName→string fallback against a real MCMCChains.Chains (the exact
type the Downgrade run reported), and re-verified bayesian_datafit on the current
stack (Julia 1.12, Turing 0.45/FlexiChains): both forms pass the variance-reduction
assertion, including runs that actually hit the rejection branch (solves aborting
under ForwardDiff), with no acclogp!! error.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
….jl)

`_get_sensitivity`'s internal `prob_func(prob, i, repeat)` only defined the legacy
3-arg form, so on current SciMLBase (3.x) the EnsembleProblem solve called it with
the 2-arg `(prob, ctx)` form and errored:
    MethodError: no method matching (::#prob_func#25)(::ODEProblem, ::EnsembleContext)
Added the `(prob, ctx) -> remake(...; p = ...[:, ctx.sim_id])` method alongside the
integer-index one (same cross-version pattern as the ensemble fix), and read
per-trajectory solutions via `sol.u[i]` (matching the EnsembleSolution access used
in src/ensemble.jl). This is the same upstream EnsembleProblem prob_func API change;
it surfaced in the Core group only after the ensemble.jl fix let Core run past it.
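
The one-parameter-column-per-trajectory pattern can be sketched without SciMLBase. All names here are hypothetical: a NamedTuple plays the role of the ODEProblem and `merge` plays the role of `remake(prob; p = ...)`.

```julia
# Hypothetical stand-ins: a NamedTuple plays the role of the ODEProblem, and
# `merge` plays the role of `remake(prob; p = ...)`.
base_prob = (u0 = [1.0], p = [0.0, 0.0])

# One parameter set per column, one column per trajectory.
param_matrix = [1.0 2.0 3.0;
                10.0 20.0 30.0]

with_column(prob, j) = merge(prob, (p = param_matrix[:, j],))

# Integer-index method (legacy 3-arg call).
prob_func_sketch(prob, i::Integer, repeat) = with_column(prob, i)

# Context method (2-arg call); the trajectory index is read from ctx.sim_id.
prob_func_sketch(prob, ctx) = with_column(prob, ctx.sim_id)

legacy = prob_func_sketch(base_prob, 3, 1)
modern = prob_func_sketch(base_prob, (sim_id = 3,))
```

Routing both methods through one helper keeps the column-selection logic in a single place, so the two arities cannot drift apart.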

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>