
PEtab v2 problem importer — the 'two-adapter' proof (first step: parameters table → FreeParameter/Prior) #407

@wshlavacek

Description


Status

  • Step 1 — parameters table → FreeParameter/Prior: ✅ DONE (commit f151914, ADR-0019). Dependency-free, registry-driven, 30 tests green. See the checklist at the bottom.
  • ⚠️ Spec correction: this issue was originally drafted against a v1-flavoured picture of PEtab v2. The current v2 spec differs materially (parameterScale removed; prior columns renamed; bounds truncate; richer catalog). The sections below are corrected; the original mapping table is superseded.

Motivation

The M2 modularization gave PyBNF first-class, registry-backed Prior (ADR-0010, pybnf/priors/) and NoiseModel (ADR-0011, pybnf/noise/) abstractions, deliberately PEtab-defaulted but not PEtab-bound (ADR-0004). The payoff that justifies that shape is a PEtab v2 problem importer: a thin adapter that reads a problem.yaml + its TSV tables + SBML model and produces the same internal objects a native .conf produces.

That makes it the "two-adapter" proof the refactor plan calls out — native .conf and a PEtab problem feeding one set of FreeParameter/Prior/NoiseModel/exp-data objects. If both adapters land on the same objects, the abstractions are right; if PEtab forces a special case, we learn where they're wrong.

This is an umbrella/tracking issue. It scopes the whole importer; each chunk splits into its own issue when work begins.

Spec correction — what PEtab v2 actually specifies

Verified against the live v2 data-format spec. Three premises in the original draft are now wrong:

  • parameterScale was removed entirely. v2 parameters are all in linear space; a scale change is expected to be done in the model file. PyBNF derives a parameter's Scale (Linear/Log10) from its prior family instead — so the original "natural-log scale gap" is moot.
  • Prior columns renamed to priorDistribution / priorParameters (from objectivePriorType / objectivePriorParameters). There is a single prior, used for the objective only (initializationPrior* was also removed).
  • Bounds truncate the prior, and the catalog is richer than the draft assumed: uniform, normal, laplace, log-normal, log-laplace, log-uniform, cauchy, gamma, exponential, chisquare, rayleigh. log-normal / log-laplace use the natural log.
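The natural-log convention matters because PyBNF's log-scale priors work in log10, which is why the mapping table converts μ/ln10 and σ/ln10. If ln θ has location μ and scale s, then log10 θ = ln θ / ln 10 has location μ/ln 10 and scale s/ln 10, so the distribution of θ itself is unchanged. The minimal stdlib sketch below checks this by comparing the CDF of θ under both parameterizations. The function names are illustrative and are not PyBNF's.

```python
from __future__ import annotations

import math
from statistics import NormalDist

LN10 = math.log(10.0)


def ln_to_log10(mu: float, scale: float) -> tuple[float, float]:
    """Rescale a natural-log (location, scale) pair to log10 space.

    log10(theta) = ln(theta) / ln(10), so the location and the scale
    (normal sigma or Laplace b) are both divided by ln(10).
    """
    return mu / LN10, scale / LN10


def laplace_cdf(x: float, mu: float, b: float) -> float:
    """CDF of a Laplace(mu, b) distribution evaluated at x."""
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def lognormal_cdf(t: float, mu: float, sigma: float, log=math.log) -> float:
    """P(theta <= t) when log(theta) ~ Normal(mu, sigma); `log` picks the base."""
    return NormalDist(mu, sigma).cdf(log(t))


def loglaplace_cdf(t: float, mu: float, b: float, log=math.log) -> float:
    """P(theta <= t) when log(theta) ~ Laplace(mu, b); `log` picks the base."""
    return laplace_cdf(log(t), mu, b)


if __name__ == "__main__":
    mu, sigma = 0.7, 1.3
    mu10, sigma10 = ln_to_log10(mu, sigma)
    for t in (0.05, 1.0, 20.0):
        # The CDF of theta is the same in ln space and in converted log10 space.
        print(t, lognormal_cdf(t, mu, sigma), lognormal_cdf(t, mu10, sigma10, log=math.log10))
```

No Jacobian appears. Only the parameterization of log θ changes; the density over θ does not.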

What a PEtab v2 problem is

  • problem.yaml: references the model file(s) + the TSV tables
  • model: SBML (PyBNF already imports SBML/Antimony: SbmlModel, BngsimAntimony)
  • parameters.tsv: parameterId, lowerBound, upperBound, nominalValue, estimate (true|false), priorDistribution, priorParameters (no parameterScale)
  • observables.tsv: observableId, observableFormula, noiseFormula, observableTransformation, noiseDistribution
  • measurements.tsv: observableId, simulationConditionId, measurement, time, …
  • conditions.tsv: per-condition parameter/species overrides
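The sketch below shows a dependency-free reader for the parameters table that uses only the stdlib csv module. `ParameterRow` and `read_parameters_tsv` are hypothetical names and are not PyBNF's PetabParameterRow. Two details are assumptions about the v2 format and should be checked against the spec: empty bounds are treated as ±inf, and priorParameters is split on `;`.

```python
from __future__ import annotations

import csv
import io
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ParameterRow:
    """Hypothetical neutral row type (a stand-in, not PyBNF's PetabParameterRow)."""

    parameter_id: str
    lower_bound: float
    upper_bound: float
    nominal_value: Optional[float]
    estimate: bool
    prior_distribution: Optional[str]
    prior_parameters: tuple[float, ...]


def _float_or(raw: Optional[str], default: Optional[float]) -> Optional[float]:
    """Parse a numeric cell; empty or missing cells fall back to `default`."""
    raw = (raw or "").strip()
    return float(raw) if raw else default  # float() also accepts "inf" / "-inf"


def _parse_bool(raw: Optional[str]) -> bool:
    value = (raw or "").strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"estimate must be 'true' or 'false', got {raw!r}")


def read_parameters_tsv(lines: Iterable[str]) -> list[ParameterRow]:
    """Read a PEtab v2 parameters table (tab-separated, header row first)."""
    rows = []
    for rec in csv.DictReader(lines, delimiter="\t"):
        prior_raw = (rec.get("priorParameters") or "").strip()
        rows.append(
            ParameterRow(
                parameter_id=rec["parameterId"].strip(),
                # Assumption: an empty bound means unbounded on that side.
                lower_bound=_float_or(rec.get("lowerBound"), -math.inf),
                upper_bound=_float_or(rec.get("upperBound"), math.inf),
                nominal_value=_float_or(rec.get("nominalValue"), None),
                estimate=_parse_bool(rec.get("estimate")),
                prior_distribution=(rec.get("priorDistribution") or "").strip() or None,
                # Assumption: priorParameters is a semicolon-separated list.
                prior_parameters=tuple(float(p) for p in prior_raw.split(";")) if prior_raw else (),
            )
        )
    return rows


SAMPLE = (
    "parameterId\tlowerBound\tupperBound\tnominalValue\testimate\tpriorDistribution\tpriorParameters\n"
    "k1\t1e-3\t1e3\t1\ttrue\tlog-normal\t0;1\n"
    "k2\t\t\t0.5\tfalse\t\t\n"
)

if __name__ == "__main__":
    for row in read_parameters_tsv(io.StringIO(SAMPLE)):
        print(row)
```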

Mapping to PyBNF's existing abstractions (corrected for v2)

| PEtab v2 concept | PyBNF target | Status |
|---|---|---|
| priorDistribution uniform / normal / laplace (linear) | Uniform / Normal / Laplace family, Linear scale | ✅ exact |
| log-uniform | loguniform_var (Uniform × Log10); params are linear bounds | ✅ exact (base-independent) |
| log-normal / log-laplace (natural log) | lognormal_var / loglaplace_var; convert μ/ln10, σ/ln10 | ✅ θ-distribution identical, no Jacobian (ADR-0003: the scale lives in the sampling parameterization) |
| omitted prior + bounds | uniform over [lowerBound, upperBound] | ✅ matches v2's default-to-uniform rule |
| estimate = false (fixed) | model constant, not a FreeParameter | ⏭ later chunk (conditions / model overrides) |
| lowerBound / upperBound truncate the prior | reflecting bounds exist only on Uniform | ⚠️ Uniform truncates exactly (box intersection); truncation of an unbounded family raises (Step 1); a truncation feature is a follow-up |
| cauchy, gamma, exponential, chisquare, rayleigh | (none) | ⚠️ 5 families PyBNF lacks (catalog parity; the 1-param ones need grammar/arity work) |
| noiseDistribution normal / laplace | Gaussian / (Laplace noise, only as a prior today) | ⚠️ partial |
| observableTransformation lin / log / log10 | NoiseModel additive-noise-scale axis (ADR-0011) | ✅ partial |
| location = median (PEtab hardcodes) | Location Interpretation axis | ✅ exists |
| observableFormula / noiseFormula | (sympy over model entities) | ⚠️ biggest chunk: a formula layer |
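The ✅ prior rows of the table can be written as one dispatch, shown below as a minimal sketch. Step 1 itself goes through the prior registry rather than a parallel table, so the inline dict here is only an illustration of the row semantics. The keyword names loguniform_var, lognormal_var and loglaplace_var come from the table. The names uniform_var, normal_var and laplace_var, and the `map_prior` function, are assumptions.

```python
from __future__ import annotations

import math
from typing import Callable, Optional, Sequence

LN10 = math.log(10.0)

Converter = Callable[[tuple], tuple]


def _same(p: tuple) -> tuple:
    return p


def _ln_to_log10(p: tuple) -> tuple:
    # ln-space (location, scale) -> log10-space (location, scale)
    return (p[0] / LN10, p[1] / LN10)


# Hypothetical inline map: PEtab v2 priorDistribution -> (PyBNF keyword, Scale, converter).
PRIOR_MAP: dict[str, tuple[str, str, Converter]] = {
    "uniform": ("uniform_var", "Linear", _same),          # keyword name assumed
    "normal": ("normal_var", "Linear", _same),            # keyword name assumed
    "laplace": ("laplace_var", "Linear", _same),          # keyword name assumed
    "log-uniform": ("loguniform_var", "Log10", _same),    # linear bounds; base-independent
    "log-normal": ("lognormal_var", "Log10", _ln_to_log10),
    "log-laplace": ("loglaplace_var", "Log10", _ln_to_log10),
}
UNSUPPORTED = frozenset({"cauchy", "gamma", "exponential", "chisquare", "rayleigh"})


def map_prior(distribution: Optional[str], params: Sequence[float],
              lower: float, upper: float) -> tuple[str, str, tuple]:
    """Return (keyword, scale, parameters) for one estimated parameter row."""
    if distribution is None:
        # v2 rule: no prior given -> uniform over [lowerBound, upperBound]
        return "uniform_var", "Linear", (lower, upper)
    if distribution in UNSUPPORTED:
        raise NotImplementedError(f"prior family {distribution!r} has no PyBNF equivalent yet")
    if distribution not in PRIOR_MAP:
        raise ValueError(f"unknown priorDistribution {distribution!r}")
    keyword, scale, convert = PRIOR_MAP[distribution]
    if len(params) != 2:
        raise ValueError(f"{distribution!r} expects 2 priorParameters, got {len(params)}")
    return keyword, scale, convert(tuple(params))
```

With this shape, the Scale is a consequence of the family, and no parameterScale column is consulted.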

Step 1 — parameters table → Prior/FreeParameter (DONE)

pybnf/petab/parameters.py reads parameters.tsv and maps each estimated row to a FreeParameter carrying a Prior. The mapping is driven by the prior registry rather than a parallel mapping table: it synthesizes the *_var keyword, validates it against PRIOR_KEYWORD_MAP, and builds through the FreeParameter constructor, so the result is bit-identical to the native .conf path. It is dependency-free (stdlib csv; runs in the bngsim-less CI tier) and sits behind a neutral PetabParameterRow seam, so a later petab-library adoption can feed the same mapping from Problem.parameter_df with no rewrite. The PEtab/PyBNF boundaries are explicit NotImplementedErrors: the 5 unsupported families, truncation of an unbounded family, and estimate=false. Commit f151914, ADR-0019; 30 tests covering equivalence across all 6 mappable families, a scipy lognorm sampling oracle for the natural-log conversion, the gap boundaries, and the TSV reader.
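The sketch below covers the two Step 1 boundaries that do not depend on a family's parameter conversion. The first is bound handling: exact box intersection for Uniform-type priors, and NotImplementedError when an unbounded family gets a finite bound. The second is the estimate=false guard. The function names and messages are hypothetical, and treating log-uniform as a box family (it is Uniform × Log10) is an assumption.

```python
from __future__ import annotations

import math
from typing import Sequence

# Assumption: families whose priorParameters are themselves a [low, high] box.
BOX_FAMILIES = frozenset({"uniform", "log-uniform"})


def truncate_prior(distribution: str, params: Sequence[float],
                   lower: float, upper: float) -> tuple[float, float]:
    """Apply PEtab v2 bounds to a prior, following the Step 1 boundaries.

    Box families are truncated exactly by intersecting the two intervals;
    any other family with a finite bound raises NotImplementedError.
    """
    if distribution in BOX_FAMILIES:
        lo, hi = max(params[0], lower), min(params[1], upper)
        if not lo < hi:
            raise ValueError(f"prior box {tuple(params)} does not overlap [{lower}, {upper}]")
        return lo, hi
    if math.isfinite(lower) or math.isfinite(upper):
        raise NotImplementedError(
            f"truncating an unbounded {distribution!r} prior to [{lower}, {upper}] is a follow-up")
    return params[0], params[1]


def require_estimated(parameter_id: str, estimate: bool) -> None:
    """Fixed parameters (estimate=false) belong to a later chunk, not to FreeParameter."""
    if not estimate:
        raise NotImplementedError(f"{parameter_id}: estimate=false is handled in a later chunk")
```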

Chunks (rough order, each its own issue when reached)

  • Step 1 — parameters → Prior/FreeParameter (✅ f151914, ADR-0019)
  • observables.tsv → NoiseModel selection + noiseDistribution/transformation → (family, scale-additive-on, location) mapping (ADR-0011)
  • measurements.tsv + conditions.tsv → PyBNF exp-data + per-condition model overrides
  • observableFormula / noiseFormula expression layer (sympy over model entities) — the largest piece; where the petab library is adopted as an optional extra (ADR-0019)
  • problem.yaml top-level wiring + SBML model load → a complete Configuration
  • End-to-end: import a small published PEtab benchmark problem and fit it
  • Catalog parity for the ⚠️ gaps: the 5 missing prior families (cauchy/gamma/exponential/chisquare/rayleigh); the truncation-of-unbounded feature; Laplace noise family
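For the observables chunk, the sketch below shows one way the noise columns could land on the three NoiseModel axes (family, additive-noise scale, location). Only the axes themselves, the median location, and the "Laplace noise is a gap" status come from this issue. `NoiseSpec`, `map_noise`, the axis value strings, and the normal/lin defaults are assumptions.

```python
from __future__ import annotations

from typing import NamedTuple


class NoiseSpec(NamedTuple):
    """Hypothetical (family, additive-noise scale, location) triple, after ADR-0011's axes."""

    family: str
    additive_on: str
    location: str


# observableTransformation -> the space in which noise is additive.
_TRANSFORM_SCALE = {"lin": "linear", "log": "ln", "log10": "log10"}


def map_noise(noise_distribution: str = "normal", transformation: str = "lin") -> NoiseSpec:
    """Map one observables.tsv row's noise columns onto the three NoiseModel axes.

    The "normal" / "lin" defaults are an assumption about empty cells.
    """
    if transformation not in _TRANSFORM_SCALE:
        raise ValueError(f"unknown observableTransformation {transformation!r}")
    if noise_distribution == "laplace":
        # Laplace exists only as a prior today; the noise family is a catalog-parity item.
        raise NotImplementedError("Laplace noise is not available yet")
    if noise_distribution != "normal":
        raise ValueError(f"unknown noiseDistribution {noise_distribution!r}")
    # PEtab hardcodes location = median of the measurement distribution.
    return NoiseSpec("gaussian", _TRANSFORM_SCALE[transformation], "median")
```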

Notes / constraints

  • New runtime deps (petab, python-libsbml, sympy) must be hand-mirrored into .github/actions/setup-pybnf or the tests/integration CI tiers go red (the recurring single-sync-point gotcha). Decision (ADR-0019): petab is adopted as an optional extra (pybnf[petab]) at the formula/SBML chunk, not in core — Step 1 stays dependency-free.
  • Keep the importer simulator-free where possible so it runs in the bngsim-less CI tier.
  • Out-of-scope framing comes from dev/refactor-plan.md. Relevant ADRs: 0003 (no Jacobian), 0004 (PEtab-defaulted not -bound), 0010 (Prior), 0011 (NoiseModel), 0019 (importer Step 1).
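The optional-extra decision could surface in code as a lazy import helper that only the formula/SBML chunk calls. Step 1 would then keep importing nothing beyond the stdlib. The sketch below shows this pattern. The extra name pybnf[petab] is from ADR-0019; `require_optional` is a hypothetical name.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an install hint for its extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"{module_name!r} is needed for this feature; "
            f"install it with: pip install 'pybnf[{extra}]'"
        ) from exc


# Step 1 code never calls this; only the formula/SBML chunk would, e.g.:
#     petab = require_optional("petab", "petab")
```

Because the import happens at call time, a core install without the extra can still import the importer module. It only fails, with an actionable message, when a formula-level feature is actually used.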
