PEtab v2 problem importer — the 'two-adapter' proof (first step: parameters table → FreeParameter/Prior) #407

## Status

- **Step 1 — `parameters` table → `FreeParameter`/`Prior`: ✅ DONE** (commit `f151914`, ADR-0019). Dependency-free, registry-driven, 30 tests green. See the checklist at the bottom.
- ⚠️ **Spec correction:** this issue was originally drafted against a v1-flavoured picture of PEtab v2. The current v2 spec differs materially (`parameterScale` removed; prior columns renamed; bounds truncate; richer catalog). The sections below are corrected; the original mapping table is superseded.

## Motivation

The M2 modularization gave PyBNF first-class, registry-backed **Prior** (ADR-0010, `pybnf/priors/`) and **NoiseModel** (ADR-0011, `pybnf/noise/`) abstractions, deliberately *PEtab-defaulted but not PEtab-bound* (ADR-0004). The payoff that justifies that shape is a **PEtab v2 problem importer**: a thin adapter that reads a `problem.yaml` + its TSV tables + SBML model and produces the *same* internal objects a native `.conf` produces.

That makes it the **"two-adapter" proof** the refactor plan calls out — native `.conf` and a PEtab problem feeding one set of `FreeParameter`/`Prior`/`NoiseModel`/exp-data objects. If both adapters land on the same objects, the abstractions are right; if PEtab forces a special case, we learn where they're wrong.

This is an **umbrella/tracking issue**. It scopes the whole importer; each chunk splits into its own issue when work begins.

## Spec correction — what PEtab v2 actually specifies

Verified against the live v2 data-format spec. Three premises in the original draft are now wrong:

- **`parameterScale` was removed entirely.** v2 parameters are all in linear space; a scale change is expected to be done in the model file. PyBNF derives a parameter's **Scale** (Linear/Log10) from its *prior family* instead — so the original "natural-log scale gap" is moot.
- **Prior columns renamed** to `priorDistribution` / `priorParameters` (from `objectivePriorType` / `objectivePriorParameters`). There is a **single** prior, used for the objective only (`initializationPrior*` was also removed).
- **Bounds truncate the prior**, and the catalog is richer than the draft assumed: `uniform, normal, laplace, log-normal, log-laplace, log-uniform, cauchy, gamma, exponential, chisquare, rayleigh`. `log-normal` / `log-laplace` use the **natural** log.

## What a PEtab v2 problem is

- `problem.yaml` — references the model file(s) + the TSV tables
- **model** — SBML (PyBNF already imports SBML/Antimony: `SbmlModel`, `BngsimAntimony`)
- `parameters.tsv` — `parameterId, lowerBound, upperBound, nominalValue, estimate (true|false), priorDistribution, priorParameters` (**no** `parameterScale`)
- `observables.tsv` — `observableId, observableFormula, noiseFormula, observableTransformation, noiseDistribution`
- `measurements.tsv` — `observableId, simulationConditionId, measurement, time, …`
- `conditions.tsv` — per-condition parameter/species overrides
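
As a rough illustration of the `parameters.tsv` shape listed above, the sketch below parses a small in-memory table with the stdlib `csv` module. The `ParameterRow` dataclass, the `read_parameters` helper, and the sample rows are hypothetical and are not the importer's actual `PetabParameterRow`. It also assumes that `priorParameters` is `;`-separated and that empty cells mean "not given".

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical row type for illustration; the real PetabParameterRow may differ.
@dataclass(frozen=True)
class ParameterRow:
    parameter_id: str
    lower_bound: Optional[float]
    upper_bound: Optional[float]
    nominal_value: Optional[float]
    estimate: bool
    prior_distribution: Optional[str]
    prior_parameters: Tuple[float, ...]


def _opt_float(text):
    """Parse a cell as float, treating an empty or missing cell as None."""
    text = (text or "").strip()
    return float(text) if text else None


def read_parameters(handle):
    """Read a tab-separated parameters table into ParameterRow objects."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        raw = (rec.get("priorParameters") or "").strip()
        # Assumption: prior parameters are separated by ';'.
        params = tuple(float(p) for p in raw.split(";") if p.strip())
        rows.append(ParameterRow(
            parameter_id=rec["parameterId"],
            lower_bound=_opt_float(rec.get("lowerBound")),
            upper_bound=_opt_float(rec.get("upperBound")),
            nominal_value=_opt_float(rec.get("nominalValue")),
            estimate=(rec.get("estimate") or "").strip().lower() == "true",
            prior_distribution=(rec.get("priorDistribution") or "").strip() or None,
            prior_parameters=params,
        ))
    return rows


# Made-up sample data: one row with a prior, one with bounds only, one fixed.
SAMPLE = (
    "parameterId\tlowerBound\tupperBound\tnominalValue\testimate"
    "\tpriorDistribution\tpriorParameters\n"
    "k1\t0.001\t1000\t1.0\ttrue\tlog-normal\t0;2\n"
    "k2\t0\t10\t5\ttrue\t\t\n"
    "k3\t\t\t2.5\tfalse\t\t\n"
)

rows = read_parameters(io.StringIO(SAMPLE))
```

Note that there is no `parameterScale` column to read: in this sketch the scale would be inferred later from `prior_distribution`.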

## Mapping to PyBNF's existing abstractions (corrected for v2)

| PEtab v2 concept | PyBNF target | Status |
|---|---|---|
| `priorDistribution` `uniform` / `normal` / `laplace` (linear) | Uniform / Normal / Laplace family, Linear scale | ✅ exact |
| `log-uniform` | `loguniform_var` (Uniform × Log10); params are linear bounds | ✅ exact (base-independent) |
| `log-normal` / `log-laplace` (**natural** log) | `lognormal_var` / `loglaplace_var`; convert `μ/ln10, σ/ln10` | ✅ θ-distribution identical, **no Jacobian** (ADR-0003 — the scale lives in the sampling parameterization) |
| omitted prior + bounds | uniform over `[lowerBound, upperBound]` | ✅ matches v2's default-to-uniform rule |
| `estimate = false` (fixed) | model constant, **not** a `FreeParameter` | ⏭ later chunk (conditions / model overrides) |
| `lowerBound` / `upperBound` **truncate** the prior | reflecting bounds exist only on Uniform | ⚠️ Uniform truncates exactly (box intersection); truncation of an **unbounded** family raises (Step 1); a truncation feature is a follow-up |
| `cauchy`, `gamma`, `exponential`, `chisquare`, `rayleigh` | — | ⚠️ **5 families PyBNF lacks** (catalog parity; the 1-param ones need grammar/arity work) |
| `noiseDistribution` normal / laplace | Gaussian / (Laplace noise — only as a prior today) | ⚠️ partial |
| `observableTransformation` lin / log / log10 | NoiseModel additive-noise-scale axis (ADR-0011) | ✅ partial |
| location = median (PEtab hardcodes) | Location Interpretation axis | ✅ exists |
| `observableFormula` / `noiseFormula` (sympy over model entities) | — | ⚠️ **biggest chunk**: a formula layer |
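
The `μ/ln10, σ/ln10` conversion in the `log-normal` / `log-laplace` row can be checked numerically. The sketch below is a stdlib-only sanity check, not the scipy `lognorm` oracle used in the test suite. The helper name `natural_to_log10` and the example values are made up for illustration. It draws ln θ directly from the natural-log parameterization, draws log10 θ from the converted one, and compares both on the ln θ axis.

```python
import math
import random
import statistics

LN10 = math.log(10.0)


def natural_to_log10(mu, sigma):
    """Convert (location, scale) stated on ln(theta) to the same on log10(theta)."""
    return mu / LN10, sigma / LN10


# Example values (made up): a PEtab-style log-normal on the natural-log axis.
mu_ln, sigma_ln = 0.5, 1.2
mu_10, sigma_10 = natural_to_log10(mu_ln, sigma_ln)

rng = random.Random(0)
n = 50_000

# PEtab-style draw: ln(theta) ~ Normal(mu_ln, sigma_ln).
ln_direct = [rng.gauss(mu_ln, sigma_ln) for _ in range(n)]

# Log10-style draw: log10(theta) ~ Normal(mu_10, sigma_10), mapped back to ln(theta).
ln_via_log10 = [rng.gauss(mu_10, sigma_10) * LN10 for _ in range(n)]

mean_gap = abs(statistics.fmean(ln_direct) - statistics.fmean(ln_via_log10))
stdev_gap = abs(statistics.stdev(ln_direct) - statistics.stdev(ln_via_log10))

# The median of theta is the same under both parameterizations.
median_matches = math.isclose(math.exp(mu_ln), 10.0 ** mu_10)
```

Because log10 θ = ln θ / ln 10 is a fixed rescaling, dividing location and scale by ln 10 describes the same distribution over θ; the same argument carries over to `log-laplace`, whose location and scale rescale the same way.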

## Step 1 — `parameters` table → `Prior`/`FreeParameter` (DONE)

`pybnf/petab/parameters.py` reads `parameters.tsv` and maps each estimated row to a `FreeParameter` carrying a `Prior`. The mapping is **driven by the prior registry**: it synthesizes the `*_var` keyword, validates it against `PRIOR_KEYWORD_MAP`, and builds through the `FreeParameter` constructor, so the result is **bit-identical** to the native `.conf` path rather than a parallel mapping table.

- **Dependency-free:** stdlib `csv` only, so it runs in the bngsim-less CI tier. The reader sits behind a neutral `PetabParameterRow` seam, so a later `petab`-library adoption feeds the same mapping from `Problem.parameter_df` with no rewrite.
- **Explicit boundaries:** PEtab/PyBNF gaps raise `NotImplementedError` (the 5 unsupported families; unbounded-family truncation; `estimate=false`).
- **Tests:** 30, covering equivalence across all 6 mappable families, a scipy `lognorm` sampling oracle for the natural-log conversion, the gap boundaries, and the TSV reader.
- **Reference:** commit `f151914`, ADR-0019.
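
The decision logic described above can be sketched roughly as follows. This is an illustrative stand-in, not the code in `pybnf/petab/parameters.py`. The `PETAB_TO_KEYWORD` dict is a hypothetical substitute for the `PRIOR_KEYWORD_MAP` lookup, the keyword spellings other than `loguniform_var` / `lognormal_var` / `loglaplace_var` are assumed, and the function returns a plain tuple instead of constructing a `FreeParameter`, whose signature is not shown here. Treating any finite bound on an unbounded family as a truncation is a simplification.

```python
import math

LN10 = math.log(10.0)

# Hypothetical stand-in for the registry lookup; the real PRIOR_KEYWORD_MAP may differ.
PETAB_TO_KEYWORD = {
    "uniform": "uniform_var",
    "normal": "normal_var",
    "laplace": "laplace_var",
    "log-uniform": "loguniform_var",
    "log-normal": "lognormal_var",
    "log-laplace": "loglaplace_var",
}
UNSUPPORTED_FAMILIES = {"cauchy", "gamma", "exponential", "chisquare", "rayleigh"}
BOX_FAMILIES = {"uniform", "log-uniform"}      # prior parameters are linear bounds
LOG_FAMILIES = {"log-normal", "log-laplace"}   # PEtab states these in natural log


def _finite(x):
    return x is not None and math.isfinite(x)


def map_parameter(pid, estimate, lower, upper, family=None, params=()):
    """Return (keyword, parameter id, arguments) for one estimated row."""
    if not estimate:
        raise NotImplementedError(f"{pid}: estimate=false is handled in a later chunk")
    if family is None:
        # An omitted prior defaults to uniform over the bounds.
        if not (_finite(lower) and _finite(upper)):
            raise ValueError(f"{pid}: an omitted prior needs finite bounds")
        return ("uniform_var", pid, (lower, upper))
    if family in UNSUPPORTED_FAMILIES:
        raise NotImplementedError(f"{pid}: prior family '{family}' is not supported yet")
    if family not in PETAB_TO_KEYWORD:
        raise ValueError(f"{pid}: unknown prior family '{family}'")
    keyword = PETAB_TO_KEYWORD[family]
    if family in BOX_FAMILIES:
        # Bounds truncate exactly: intersect the prior's box with the bounds.
        lo, hi = params
        if _finite(lower):
            lo = max(lo, lower)
        if _finite(upper):
            hi = min(hi, upper)
        if lo >= hi:
            raise ValueError(f"{pid}: bounds and prior do not overlap")
        return (keyword, pid, (lo, hi))
    if _finite(lower) or _finite(upper):
        raise NotImplementedError(f"{pid}: truncating '{family}' is not supported yet")
    location, scale = params
    if family in LOG_FAMILIES:
        # Natural-log parameters are re-expressed on the log10 axis.
        location, scale = location / LN10, scale / LN10
    return (keyword, pid, (location, scale))
```

For example, `map_parameter("k2", True, 1.0, 5.0, "uniform", (0.0, 3.0))` intersects the prior's `[0, 3]` with the bounds `[1, 5]`, while a `gamma` row or a bounded `normal` row raises `NotImplementedError`.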

## Chunks (rough order, each its own issue when reached)

- [x] **Step 1 — parameters → Prior/FreeParameter** (✅ `f151914`, ADR-0019)
- [ ] `observables.tsv` → NoiseModel selection + `noiseDistribution`/transformation → (family, scale-additive-on, location) mapping (ADR-0011)
- [ ] `measurements.tsv` + `conditions.tsv` → PyBNF exp-data + per-condition model overrides
- [ ] `observableFormula` / `noiseFormula` expression layer (sympy over model entities) — the largest piece; where the `petab` library is adopted as an optional extra (ADR-0019)
- [ ] `problem.yaml` top-level wiring + SBML model load → a complete `Configuration`
- [ ] End-to-end: import a small published PEtab benchmark problem and fit it
- [ ] Catalog parity for the ⚠️ gaps: the 5 missing prior families (`cauchy`/`gamma`/`exponential`/`chisquare`/`rayleigh`); the truncation-of-unbounded feature; Laplace **noise** family

## Notes / constraints

- New runtime deps (`petab`, `python-libsbml`, `sympy`) must be hand-mirrored into `.github/actions/setup-pybnf` or the `tests`/`integration` CI tiers go red (the recurring single-sync-point gotcha). Decision (ADR-0019): `petab` is adopted as an **optional extra** (`pybnf[petab]`) at the formula/SBML chunk, not in core — Step 1 stays dependency-free.
- Keep the importer **simulator-free** where possible so it runs in the bngsim-less CI tier.
- Out-of-scope framing comes from `dev/refactor-plan.md`. Relevant ADRs: 0003 (no Jacobian), 0004 (PEtab-defaulted not -bound), 0010 (Prior), 0011 (NoiseModel), 0019 (importer Step 1).
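
One way to keep core code free of the optional dependency is a guarded import at the point of use. The sketch below is a generic pattern, not something the issue or ADR-0019 prescribes, and the helper name `require_optional` is hypothetical.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency, or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is needed for this feature; "
            f"install it with: pip install 'pybnf[{extra}]'"
        ) from exc
```

A formula-layer module could then call `require_optional("petab", "petab")` inside the function that needs it, so that importing the Step 1 code never touches `petab`.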

