Fix master test reds: BBO loss return, ensemble solve API, FlexiChains chain access by ChrisRackauckas-Claude · Pull Request #299 · SciML/EasyModelAnalysis.jl

ChrisRackauckas-Claude · 2026-06-20T10:28:50Z

Master CI was red across Core, Datafit, Downgrade (and QA; Documentation has recovered). Investigation found several distinct root causes, all stemming from the SciML/Turing stack churn. This PR fixes the concrete API-break errors; the remaining reds are pre-existing performance/numerical issues documented below.

Branched off main (b450a3c). PR #298 (AbstractMCMC import) is unrelated.

Fixed by this PR

1. `l2loss`/`relative_l2loss` return type (Datafit)

Now return only the scalar tot_loss — OptimizationBBO's BlackBoxOptim wrapper requires a Float64 objective; the sol element was never consumed.
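As a minimal sketch of the return-type change: the names `l2loss_tuple` and `l2loss_scalar` and the toy data below are hypothetical stand-ins, not the package's actual implementation. An objective that returns a tuple cannot be used where a plain `Float64` is required, whereas the scalar form can. A caller that takes `first` of the result is unaffected either way.

```julia
# Hypothetical stand-ins illustrating the change; not the package's actual code.
# `pred` plays the role of the solved trajectory, `data` the measurements.

# Old shape: returns (loss, extra), which a Float64-only objective rejects.
l2loss_tuple(pred, data) = (sum(abs2, pred .- data), pred)

# New shape: returns only the scalar loss.
l2loss_scalar(pred, data) = sum(abs2, pred .- data)

pred = [1.0, 2.0, 3.0]
data = [1.0, 2.5, 2.0]

old = l2loss_tuple(pred, data)
new = l2loss_scalar(pred, data)

# The scalar form satisfies a Float64-only objective; the tuple form does not.
@assert new isa Float64
@assert !(old isa Float64)

# Taking `first` works on both shapes, since `first` of a number is the number.
@assert first(old) == first(new)
```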

2. DynamicPPL sample rejection (Datafit + Downgrade)

bayesianODE rejected failed solves with the removed Turing.DynamicPPL.acclogp!!(__varinfo__, -Inf) (no method for the AD-time OnlyAccsVarInfo{...Dual...}). Switched to the scalar @addlogprob! -Inf, valid on both DynamicPPL 0.30 (Downgrade) and 0.41 (current).

3. Cross-version chain extraction (Datafit + Downgrade)

New _pprior_samples(chain, i) tries chain[@varname(pprior[i])] (newer Turing's FlexiChains.VNChain) and falls back to the legacy chain["pprior[i]"] string key (MCMCChains.Chains on the Downgrade stack). The @varname-only form broke Downgrade with MethodError: getindex(::MCMCChains.Chains, ::VarName); verified the fallback against a real MCMCChains.Chains.
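The try-then-fall-back pattern can be sketched without the real chain packages. Everything below is hypothetical: `VarNameKey`, `NewChain`, `LegacyChain`, and `pprior_samples_sketch` are stand-ins for `VarName`, `FlexiChains.VNChain`, `MCMCChains.Chains`, and `_pprior_samples`, and catching `MethodError` is an assumption about how the fallback is triggered, not a statement of the PR's actual code.

```julia
# Hypothetical stand-ins for the two chain backends.
struct VarNameKey          # plays the role of a VarName such as @varname(pprior[2])
    sym::Symbol
    idx::Int
end

struct NewChain            # indexable by VarNameKey only
    data::Dict{VarNameKey, Vector{Float64}}
end
Base.getindex(c::NewChain, k::VarNameKey) = c.data[k]

struct LegacyChain         # indexable by string key only
    data::Dict{String, Vector{Float64}}
end
Base.getindex(c::LegacyChain, k::String) = c.data[k]

# Try the structured key first; fall back to the string key on MethodError.
function pprior_samples_sketch(chain, i)
    try
        return chain[VarNameKey(:pprior, i)]
    catch err
        err isa MethodError || rethrow()
        return chain["pprior[$i]"]
    end
end

new_chain = NewChain(Dict(VarNameKey(:pprior, 1) => [0.1, 0.2]))
old_chain = LegacyChain(Dict("pprior[1]" => [0.3, 0.4]))

@assert pprior_samples_sketch(new_chain, 1) == [0.1, 0.2]
@assert pprior_samples_sketch(old_chain, 1) == [0.3, 0.4]
```

Rethrowing anything other than a `MethodError` keeps genuine failures (for example a missing key) visible instead of silently routing them to the string path.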

4. EnsembleProblem `prob_func` arity (Core + Downgrade)

The deprecated vector-of-problems EnsembleProblem([...]) form is broken. A 2-arg (prob, ctx) form only works on SciMLBase 3.x; the Downgrade floor (SciMLBase 2.55) still calls the legacy 3-arg prob_func(prob, i, repeat). Fixed in two places:

- bayesian_ensemble (src/ensemble.jl) now uses EnsembleProbForwarder(all_probs), a callable supporting both arities; enprob.prob_func.all_probs exposes the trajectory count.
- _get_sensitivity (src/sensitivity.jl) gained the (prob, ctx) method (it previously crashed Core's sensitivity.jl with MethodError: (::#prob_func#25)(::ODEProblem, ::EnsembleContext)).

Per-trajectory solutions read via sol.u[i] throughout. The ensemble test's inline prob_func is made arity-robust the same way.
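A callable that accepts both `prob_func` calling conventions can be sketched in plain Julia. `ForwarderSketch` is a hypothetical name, the strings stand in for ODE problems, and the NamedTuple stands in for an ensemble context that is assumed to expose a `sim_id` field. The real `EnsembleProbForwarder` may differ in detail.

```julia
# Sketch of a callable struct accepting both prob_func calling conventions.
# `ForwarderSketch` is hypothetical; it only illustrates the dispatch pattern.
struct ForwarderSketch{P}
    all_probs::Vector{P}
end

# Legacy 3-arg form: prob_func(prob, i, repeat)
(f::ForwarderSketch)(prob, i::Integer, repeat) = f.all_probs[i]

# Newer 2-arg form: prob_func(prob, ctx), with the index read from ctx.sim_id
(f::ForwarderSketch)(prob, ctx) = f.all_probs[ctx.sim_id]

probs = ["prob_a", "prob_b", "prob_c"]   # placeholders for ODEProblems
fwd = ForwarderSketch(probs)
ctx = (; sim_id = 2)                      # stand-in for an ensemble context

@assert fwd(nothing, 2, 1) == "prob_b"    # legacy call
@assert fwd(nothing, ctx) == "prob_b"     # context-based call
@assert length(fwd.all_probs) == 3        # trajectory count stays accessible
```

Because the two methods differ in arity, dispatch picks the right one without any version check, and keeping `all_probs` as a field lets callers read the trajectory count from the callable itself.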

CI result of these fixes

- Downgrade: GREEN (was red).
- Documentation: GREEN.
- Locally on Julia 1.12: test/ensemble.jl runs end-to-end (weights recover [0.2,0.5,0.3]; bayesian_ensemble builds 303 models; exit 0); bayesian_datafit both forms pass the variance assertion incl. the rejection branch; sensitivity.jl no longer crashes (computes the Sobol GSA); basics.jl/examples.jl pass.

NOT fixed — pre-existing issues (reproduce on clean `main`, separate root causes)

A. Core/Datafit are too slow → CI cancels/kills the jobs

With the API errors removed, the Bayesian/GSA paths now run to completion but are pathologically slow on the modern stack (Turing 0.45 NUTS + ForwardDiff; per-solve is fast at ~2.6 ms, the cost is the MCMC machinery): bayesian_ensemble (Core ensemble.jl) ~50 min locally; get_sensitivity with samples=1000 (Core sensitivity.jl) >40 min; bayesian_datafit at niter=3000+5000×nchains=4 (Datafit) ~98 min on CI before the runner kills it. The Core/Datafit jobs exceed the CI window and are cancelled (The operation was canceled), not failing an assertion. This is a test-runtime regression from the stack churn, not a dep cap or code-adaptation fix, and I will not mask it by cutting iteration counts.

B. Core `test/threshold.jl:123` — `optimal_parameter_threshold` optimizer regression

Reproduces deterministically on clean main (threshold.jl/src untouched here). optimal_parameter_threshold uses NLopt GN_ISRES (global stochastic, maxtime-budgeted) and converges with ret=XTOL_REACHED to p1=-2, p2≈1.337, giving x(50)≈2.988 ≥ 2, so @test s2.u[end][1] < 2 fails. Constraint-satisfying parameters that drive x(50) far below 2 exist (e.g. p2≈-0.1), so the threshold is achievable — the constrained optimizer no longer finds them within budget. Separate root cause (NLopt/Optimization constrained-solve behavior), not dep-cappable, and the assertion is not loosened.

C. QA — `Aqua.find_persistent_tasks_deps` (Julia 1.12 only; `lts` passes)

On Julia 1.12, Pkg.develop honors OptimizationBBO 0.4.4–0.4.7's released relative-path [sources] OptimizationBase = {path = "../OptimizationBase"} and errors (expected package OptimizationBase to exist at path .../OptimizationBBO/OptimizationBase). Identical OptimizationBBO 0.4.7 passes on Julia 1.10. Upstream Optimization.jl packaging × Julia 1.12 Pkg change — not an EMA bug. Capping OptimizationBBO < 0.4.4 would drag SciMLBase 3→2 / ModelingToolkit 11→9 / OptimizationBase 5→2 (major regression), so a cap is the wrong fix.

Runic-formatted.

Please ignore until reviewed by @ChrisRackauckas

…s chain access

Three independent breakages from upstream SciML updates were turning the Core and Datafit CI groups red on master:

1. Optimization loss return type (Datafit group). `l2loss`/`relative_l2loss` returned `(tot_loss, sol)`. OptimizationBBO's BBO wrapper now strictly requires the objective to return a `Float64`, so `global_datafit` errored with "fitness function does NOT return the expected fitness type Float64". The `sol` element of the tuple was never consumed by any caller, so the loss functions now return only the scalar `tot_loss`. NLopt's `datafit` path (which tolerated the tuple by taking `first`) is unaffected.

2. EnsembleProblem / ensemble solve (Core group). The vector-of-problems form `EnsembleProblem([prob1, prob2, prob3])` is deprecated and broken in current SciMLBase (the default prob_func passes the whole vector to the per-trajectory solve), so `solve(enprob; saveat=1)` failed with a MethodError on `init(::Vector{ODEProblem}, ::Nothing)`. Migrated `bayesian_ensemble` and the ensemble test to the modern `prob_func = (prob, ctx) -> probs[ctx.sim_id]` form, pass an explicit algorithm (`Tsit5()`) and `trajectories`, and access per-trajectory solutions via `sol.u[i]` (EnsembleSolution's `length`/symbolic `getindex` now flatten across all timepoints). `ensemble_weights` updated to use `sol.u` accordingly.

3. Turing chain backend (Core + Datafit groups). Turing now returns a `FlexiChains.VNChain`, which no longer supports `chain["pprior[i]"]` string indexing. `bayesian_datafit` now extracts posterior samples with `chain[@varname(pprior[i])]`.

Verified locally on Julia 1.12 against the master dependency set:

- global_datafit (BBO) recovers [2/3, 4/3, 1, 1]; datafit (NLopt) still passes.
- prob_func ensemble solve produces a Vector{ODESolution}; ensemble_weights runs.
- bayesian_datafit returns per-parameter posterior sample vectors with reduced variance vs the prior (the Datafit test's assertion).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…on ensemble prob_func

Builds on the prior fixes in this branch (BBO scalar loss return, FlexiChains @varname extraction). Three remaining breakages from upstream SciML/Turing churn:

1. Datafit group (DynamicPPL 0.41). bayesianODE rejected non-successful solves with `Turing.DynamicPPL.acclogp!!(__varinfo__, -Inf)`, which no longer has a method when __varinfo__ is the AD-time `OnlyAccsVarInfo{...Dual...}`:

    MethodError: no method matching acclogp!!(::OnlyAccsVarInfo{...}, ::Float64)

Replaced with the canonical sample-rejection idiom `@addlogprob! (; loglikelihood = -Inf)`, which dispatches correctly on the LogLikelihood accumulator under ForwardDiff.

2/3. Core + Downgrade groups (EnsembleProblem prob_func arity). The prior `(prob, ctx) -> probs[ctx.sim_id]` form only works on SciMLBase 3.x (2-arg `prob_func(prob, ctx)`). On the Downgrade floor (SciMLBase 2.55) EnsembleProblem still calls the legacy 3-arg `prob_func(prob, i, repeat)`, so it errored with

    MethodError: no method matching (::SciML#1#2)(::ODEProblem, ::Int64, ::Int64)

Introduced `EnsembleProbForwarder(all_probs)`, a callable supporting both interfaces (integer index and `ctx.sim_id`), used as `bayesian_ensemble`'s prob_func. Storing `all_probs` lets callers read the trajectory count via `enprob.prob_func.all_probs` (the access the ensemble test already uses). The ensemble test's inline prob_func is likewise made arity-robust.

Verified locally on Julia 1.12 against the master dependency set (SciMLBase 3.21, DynamicPPL 0.41.8, Turing 0.45, OptimizationBBO 0.4.7):

- simple EnsembleProblem solve(enprob, Tsit5(); trajectories) recovers weights [0.2, 0.5, 0.3]; EnsembleProbForwarder dispatches on both (prob, i, repeat) and (prob, ctx); ensemble_weights runs.
- bayesian_datafit (both t and (t, timeseries) forms) returns per-parameter posterior sample vectors with variance reduced vs the prior (the Datafit assertion).

Not fixed here (upstream, reported separately): QA group's `Aqua.find_persistent_tasks_deps` fails ONLY on Julia 1.12 (passes on lts) because Pkg now honors OptimizationBBO 0.4.4-0.4.7's released relative-path `[sources] OptimizationBase = {path = "../OptimizationBase"}` and errors with "expected package OptimizationBase to exist at path .../OptimizationBBO/OptimizationBase". Same OptimizationBBO version passes on Julia 1.10. Capping OptimizationBBO < 0.4.4 would drag SciMLBase 3->2, ModelingToolkit 11->9, OptimizationBase 5->2 (major regression), so the correct fix is upstream (stop shipping [sources] in releases).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…on (Downgrade)

The previous commit fixed the Core/Datafit reds on the current stack but two bayesian_datafit details still broke the Downgrade group (Turing 0.35 / MCMCChains 6 / DynamicPPL 0.30):

1. Chain sample extraction. `chain[@varname(pprior[i])]` works on newer Turing's FlexiChains.VNChain but errors on the legacy MCMCChains.Chains:

    MethodError: no method matching getindex(::MCMCChains.Chains{...}, ::VarName{:pprior, IndexLens{Tuple{Int64}}})

Added `_pprior_samples(chain, i)` which tries the VarName form and falls back to the legacy `chain["pprior[i]"]` string key, supporting both backends.

2. Sample rejection. `@addlogprob! (; loglikelihood = -Inf)` (NamedTuple form) only exists on DynamicPPL 0.41+. Switched to the scalar `@addlogprob! -Inf`, which is valid on both 0.30 (added to the log-prob) and 0.41 (routed to the LogLikelihood accumulator), and still avoids the removed `acclogp!!(__varinfo__, ::Float64)`.

Verified the VarName→string fallback against a real MCMCChains.Chains (the exact type the Downgrade run reported), and re-verified bayesian_datafit on the current stack (Julia 1.12, Turing 0.45/FlexiChains): both forms pass the variance-reduction assertion, including runs that actually hit the rejection branch (solves aborting under ForwardDiff), with no acclogp!! error.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

….jl)

`_get_sensitivity`'s internal `prob_func(prob, i, repeat)` only defined the legacy 3-arg form, so on current SciMLBase (3.x) the EnsembleProblem solve called it with the 2-arg `(prob, ctx)` form and errored:

    MethodError: no method matching (::#prob_func#25)(::ODEProblem, ::EnsembleContext)

Added the `(prob, ctx) -> remake(...; p = ...[:, ctx.sim_id])` method alongside the integer-index one (same cross-version pattern as the ensemble fix), and read per-trajectory solutions via `sol.u[i]` (matching the EnsembleSolution access used in src/ensemble.jl). This is the same upstream EnsembleProblem prob_func API change; it surfaced in the Core group only after the ensemble.jl fix let Core run past it.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ChrisRackauckas and others added 4 commits June 20, 2026 06:27

ChrisRackauckas-Claude wants to merge 4 commits into SciML:main from ChrisRackauckas-Claude:fix-ensemble-datafit-master-reds
