Fix master test reds: BBO loss return, ensemble solve API, FlexiChains chain access #299
Draft
ChrisRackauckas-Claude wants to merge 4 commits into
…s chain access
Three independent breakages from upstream SciML updates were turning the
Core and Datafit CI groups red on master:
1. Optimization loss return type (Datafit group). `l2loss`/`relative_l2loss`
returned `(tot_loss, sol)`. OptimizationBBO's BBO wrapper now strictly
requires the objective to return a `Float64`, so `global_datafit` errored
with "fitness function does NOT return the expected fitness type Float64".
The `sol` element of the tuple was never consumed by any caller, so the loss
functions now return only the scalar `tot_loss`. NLopt's `datafit` path
(which tolerated the tuple by taking `first`) is unaffected.
2. EnsembleProblem / ensemble solve (Core group). The vector-of-problems form
`EnsembleProblem([prob1, prob2, prob3])` is deprecated and broken in current
SciMLBase (the default prob_func passes the whole vector to the per-trajectory
solve), so `solve(enprob; saveat=1)` failed with a MethodError on
`init(::Vector{ODEProblem}, ::Nothing)`. Migrated `bayesian_ensemble` and the
ensemble test to the modern `prob_func = (prob, ctx) -> probs[ctx.sim_id]`
form, pass an explicit algorithm (`Tsit5()`) and `trajectories`, and access
per-trajectory solutions via `sol.u[i]` (EnsembleSolution's `length`/symbolic
`getindex` now flatten across all timepoints). `ensemble_weights` updated to
use `sol.u` accordingly.
3. Turing chain backend (Core + Datafit groups). Turing now returns a
`FlexiChains.VNChain`, which no longer supports `chain["pprior[i]"]` string
indexing. `bayesian_datafit` now extracts posterior samples with
`chain[@varname(pprior[i])]`.
Verified locally on Julia 1.12 against the master dependency set:
- global_datafit (BBO) recovers [2/3, 4/3, 1, 1]; datafit (NLopt) still passes.
- prob_func ensemble solve produces a Vector{ODESolution}; ensemble_weights runs.
- bayesian_datafit returns per-parameter posterior sample vectors with reduced
variance vs the prior (the Datafit test's assertion).
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
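The scalar-return change in item 1 can be illustrated with a minimal sketch. `l2loss_sketch`, `predict`, and `data` are hypothetical stand-ins rather than the package's actual signatures; the only point is that the objective returns a plain `Float64` instead of a `(tot_loss, sol)` tuple.

```julia
# Hypothetical stand-in for the package's loss function; the name and
# signature are illustrative only.
function l2loss_sketch(p, predict, data)
    pred = predict(p)
    tot_loss = 0.0
    for (yhat, y) in zip(pred, data)
        tot_loss += (yhat - y)^2
    end
    return tot_loss  # scalar only, not a (tot_loss, sol) tuple
end

# Toy "model": a straight line sampled at t = 0, 1, 2, 3 (assumption for the demo).
predict(p) = [p[1] * t + p[2] for t in 0:3]
data = [1.0, 3.0, 5.0, 7.0]

loss = l2loss_sketch([2.0, 1.0], predict, data)
```

An optimizer wrapper that insists on a `Float64` fitness value can consume this directly; a caller that still needs the solution would have to recompute it separately.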
…on ensemble prob_func

Builds on the prior fixes in this branch (BBO scalar loss return, FlexiChains
@varname extraction). Three remaining breakages from upstream SciML/Turing churn:

1. Datafit group (DynamicPPL 0.41). bayesianODE rejected non-successful solves
with `Turing.DynamicPPL.acclogp!!(__varinfo__, -Inf)`, which no longer has a
method when __varinfo__ is the AD-time `OnlyAccsVarInfo{...Dual...}`:
MethodError: no method matching acclogp!!(::OnlyAccsVarInfo{...}, ::Float64)
Replaced with the canonical sample-rejection idiom
`@addlogprob! (; loglikelihood = -Inf)`, which dispatches correctly on the
LogLikelihood accumulator under ForwardDiff.

2/3. Core + Downgrade groups (EnsembleProblem prob_func arity). The prior
`(prob, ctx) -> probs[ctx.sim_id]` form only works on SciMLBase 3.x (2-arg
`prob_func(prob, ctx)`). On the Downgrade floor (SciMLBase 2.55) EnsembleProblem
still calls the legacy 3-arg `prob_func(prob, i, repeat)`, so it errored with
MethodError: no method matching (::SciML#1#2)(::ODEProblem, ::Int64, ::Int64)
Introduced `EnsembleProbForwarder(all_probs)`, a callable supporting both
interfaces (integer index and `ctx.sim_id`), used as `bayesian_ensemble`'s
prob_func. Storing `all_probs` lets callers read the trajectory count via
`enprob.prob_func.all_probs` (the access the ensemble test already uses). The
ensemble test's inline prob_func is likewise made arity-robust.

Verified locally on Julia 1.12 against the master dependency set (SciMLBase 3.21,
DynamicPPL 0.41.8, Turing 0.45, OptimizationBBO 0.4.7):
- simple EnsembleProblem solve(enprob, Tsit5(); trajectories) recovers weights
[0.2, 0.5, 0.3]; EnsembleProbForwarder dispatches on both (prob, i, repeat) and
(prob, ctx); ensemble_weights runs.
- bayesian_datafit (both t and (t, timeseries) forms) returns per-parameter
posterior sample vectors with variance reduced vs the prior (the Datafit
assertion).

Not fixed here (upstream, reported separately): QA group's
`Aqua.find_persistent_tasks_deps` fails ONLY on Julia 1.12 (passes on lts) because
Pkg now honors OptimizationBBO 0.4.4-0.4.7's released relative-path
`[sources] OptimizationBase = {path = "../OptimizationBase"}` and errors with
"expected package OptimizationBase to exist at path .../OptimizationBBO/OptimizationBase".
Same OptimizationBBO version passes on Julia 1.10. Capping OptimizationBBO < 0.4.4
would drag SciMLBase 3->2, ModelingToolkit 11->9, OptimizationBase 5->2 (major
regression), so the correct fix is upstream (stop shipping [sources] in releases).
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
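The dual-arity callable described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the actual `EnsembleProbForwarder` from src/ensemble.jl: `ProbForwarderSketch` is a hypothetical name, strings stand in for ODE problems, and a NamedTuple with a `sim_id` field stands in for the context object passed by newer SciMLBase.

```julia
# Hypothetical sketch; the real EnsembleProbForwarder may be defined differently.
struct ProbForwarderSketch{T}
    all_probs::Vector{T}
end

# Legacy 3-arg interface: prob_func(prob, i, repeat)
(f::ProbForwarderSketch)(prob, i::Integer, repeat) = f.all_probs[i]

# Newer 2-arg interface: prob_func(prob, ctx), where ctx carries `sim_id`
(f::ProbForwarderSketch)(prob, ctx) = f.all_probs[ctx.sim_id]

probs = ["prob1", "prob2", "prob3"]  # stand-ins for ODEProblems
fwd = ProbForwarderSketch(probs)
mock_ctx = (; sim_id = 2)            # NamedTuple standing in for the context object

legacy_pick = fwd(nothing, 3, 1)     # called the legacy way
modern_pick = fwd(nothing, mock_ctx) # called the newer way
n_traj = length(fwd.all_probs)       # trajectory count read from the stored vector
```

Because the two methods differ in argument count, dispatch picks the right one without any version check, and keeping `all_probs` as a field is what makes the trajectory count readable from the callable itself.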
…on (Downgrade)

The previous commit fixed the Core/Datafit reds on the current stack but two
bayesian_datafit details still broke the Downgrade group (Turing 0.35 /
MCMCChains 6 / DynamicPPL 0.30):

1. Chain sample extraction. `chain[@varname(pprior[i])]` works on newer Turing's
FlexiChains.VNChain but errors on the legacy MCMCChains.Chains:
MethodError: no method matching getindex(::MCMCChains.Chains{...}, ::VarName{:pprior, IndexLens{Tuple{Int64}}})
Added `_pprior_samples(chain, i)` which tries the VarName form and falls back to
the legacy `chain["pprior[i]"]` string key, supporting both backends.

2. Sample rejection. `@addlogprob! (; loglikelihood = -Inf)` (NamedTuple form)
only exists on DynamicPPL 0.41+. Switched to the scalar `@addlogprob! -Inf`,
which is valid on both 0.30 (added to the log-prob) and 0.41 (routed to the
LogLikelihood accumulator), and still avoids the removed
`acclogp!!(__varinfo__, ::Float64)`.

Verified the VarName→string fallback against a real MCMCChains.Chains (the exact
type the Downgrade run reported), and re-verified bayesian_datafit on the current
stack (Julia 1.12, Turing 0.45/FlexiChains): both forms pass the
variance-reduction assertion, including runs that actually hit the rejection
branch (solves aborting under ForwardDiff), with no acclogp!! error.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
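A try-then-fall-back lookup of the kind described for `_pprior_samples` might look like the sketch below. Everything here is hypothetical: the two mock chain types stand in for `MCMCChains.Chains` and `FlexiChains.VNChain`, a `Symbol` key stands in for the `@varname(...)` key, and the commit does not say whether the real helper uses `try`/`catch` or some other mechanism.

```julia
# Mock of a legacy chain that only supports string keys (assumption for the demo).
struct LegacyChainMock
    data::Dict{String, Vector{Float64}}
end
Base.getindex(c::LegacyChainMock, key::String) = c.data[key]

# Mock of a newer chain that only supports structured keys (a Symbol here).
struct NewChainMock
    data::Dict{Symbol, Vector{Float64}}
end
Base.getindex(c::NewChainMock, key::Symbol) = c.data[key]

# Try the structured key first; fall back to the string key only when the
# chain type has no matching getindex method.
function samples_sketch(chain, structured_key, string_key)
    try
        return chain[structured_key]
    catch err
        err isa MethodError || rethrow()
        return chain[string_key]
    end
end

legacy = LegacyChainMock(Dict("pprior[1]" => [0.1, 0.2]))
newer = NewChainMock(Dict(:pprior1 => [0.3]))

from_legacy = samples_sketch(legacy, :pprior1, "pprior[1]")
from_newer = samples_sketch(newer, :pprior1, "pprior[1]")
```

Restricting the fallback to `MethodError` keeps other failures (for example a missing key) visible, although a `MethodError` raised deeper inside a working `getindex` would also trigger the fallback.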
….jl)
`_get_sensitivity`'s internal `prob_func(prob, i, repeat)` only defined the legacy
3-arg form, so on current SciMLBase (3.x) the EnsembleProblem solve called it with
the 2-arg `(prob, ctx)` form and errored:
MethodError: no method matching (::#prob_func#25)(::ODEProblem, ::EnsembleContext)
Added the `(prob, ctx) -> remake(...; p = ...[:, ctx.sim_id])` method alongside the
integer-index one (same cross-version pattern as the ensemble fix), and read
per-trajectory solutions via `sol.u[i]` (matching the EnsembleSolution access used
in src/ensemble.jl). This is the same upstream EnsembleProblem prob_func API change;
it surfaced in the Core group only after the ensemble.jl fix let Core run past it.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Master CI was red across Core, Datafit, Downgrade (and QA; Documentation has recovered). Investigation found several distinct root causes, all stemming from the SciML/Turing stack churn. This PR fixes the concrete API-break errors; the remaining reds are pre-existing performance/numerical issues documented below.

Branched off `main` (b450a3c). PR #298 (AbstractMCMC import) is unrelated.

Fixed by this PR

1. `l2loss`/`relative_l2loss` return type (Datafit)

Now return only the scalar `tot_loss`: OptimizationBBO's BlackBoxOptim wrapper requires a `Float64` objective; the `sol` element was never consumed.

2. DynamicPPL sample rejection (Datafit + Downgrade)

`bayesianODE` rejected failed solves with the removed `Turing.DynamicPPL.acclogp!!(__varinfo__, -Inf)` (no method for the AD-time `OnlyAccsVarInfo{...Dual...}`). Switched to the scalar `@addlogprob! -Inf`, valid on both DynamicPPL 0.30 (Downgrade) and 0.41 (current).

3. Cross-version chain extraction (Datafit + Downgrade)

New `_pprior_samples(chain, i)` tries `chain[@varname(pprior[i])]` (newer Turing's `FlexiChains.VNChain`) and falls back to the legacy `chain["pprior[i]"]` string key (`MCMCChains.Chains` on the Downgrade stack). The `@varname`-only form broke Downgrade with `MethodError: getindex(::MCMCChains.Chains, ::VarName)`; verified the fallback against a real `MCMCChains.Chains`.

4. EnsembleProblem `prob_func` arity (Core + Downgrade)

The deprecated vector-of-problems `EnsembleProblem([...])` form is broken. A 2-arg `(prob, ctx)` form only works on SciMLBase 3.x; the Downgrade floor (SciMLBase 2.55) still calls the legacy 3-arg `prob_func(prob, i, repeat)`. Fixed in two places:

- `bayesian_ensemble` (src/ensemble.jl) now uses `EnsembleProbForwarder(all_probs)`, a callable supporting both arities; `enprob.prob_func.all_probs` exposes the trajectory count.
- `_get_sensitivity` (src/sensitivity.jl) gained the `(prob, ctx)` method (it previously crashed Core's sensitivity.jl with `MethodError: (::#prob_func#25)(::ODEProblem, ::EnsembleContext)`).

Per-trajectory solutions read via `sol.u[i]` throughout. The ensemble test's inline `prob_func` is made arity-robust the same way.

CI result of these fixes

- `test/ensemble.jl` runs end-to-end (weights recover `[0.2,0.5,0.3]`; `bayesian_ensemble` builds 303 models; exit 0);
- `bayesian_datafit` both forms pass the variance assertion incl. the rejection branch;
- `sensitivity.jl` no longer crashes (computes the Sobol GSA);
- `basics.jl`/`examples.jl` pass.

NOT fixed: pre-existing issues (reproduce on clean `main`, separate root causes)

A. Core/Datafit are too slow → CI cancels/kills the jobs

With the API errors removed, the Bayesian/GSA paths now run to completion but are pathologically slow on the modern stack (Turing 0.45 NUTS + ForwardDiff; per-solve is fast at ~2.6 ms, the cost is the MCMC machinery):

- `bayesian_ensemble` (Core ensemble.jl) ~50 min locally;
- `get_sensitivity` with `samples=1000` (Core sensitivity.jl) >40 min;
- `bayesian_datafit` at `niter=3000+5000` × `nchains=4` (Datafit) ~98 min on CI before the runner kills it.

The Core/Datafit jobs exceed the CI window and are cancelled (`The operation was canceled`), not failing an assertion. This is a test-runtime regression from the stack churn, not a dep cap or code-adaptation fix, and I will not mask it by cutting iteration counts.

B. Core `test/threshold.jl:123`: `optimal_parameter_threshold` optimizer regression

Reproduces deterministically on clean `main` (threshold.jl/src untouched here). `optimal_parameter_threshold` uses NLopt `GN_ISRES` (global stochastic, `maxtime`-budgeted) and converges with `ret=XTOL_REACHED` to `p1=-2, p2≈1.337`, giving `x(50)≈2.988 ≥ 2`, so `@test s2.u[end][1] < 2` fails. Constraint-satisfying parameters that drive `x(50)` far below 2 exist (e.g. `p2≈-0.1`), so the threshold is achievable; the constrained optimizer no longer finds them within budget. Separate root cause (NLopt/Optimization constrained-solve behavior), not dep-cappable, and the assertion is not loosened.

C. QA: `Aqua.find_persistent_tasks_deps` (Julia 1.12 only; `lts` passes)

On Julia 1.12, `Pkg.develop` honors OptimizationBBO 0.4.4–0.4.7's released relative-path `[sources] OptimizationBase = {path = "../OptimizationBase"}` and errors (`expected package OptimizationBase to exist at path .../OptimizationBBO/OptimizationBase`). Identical OptimizationBBO 0.4.7 passes on Julia 1.10. This is an upstream Optimization.jl packaging × Julia 1.12 Pkg change, not an EMA bug. Capping OptimizationBBO `< 0.4.4` would drag SciMLBase 3→2 / ModelingToolkit 11→9 / OptimizationBase 5→2 (major regression), so a cap is the wrong fix.

Runic-formatted.
Please ignore until reviewed by @ChrisRackauckas