fix: handle Stan tuple and complex types #1174

Open: avehtari wants to merge 3 commits into master from
fix-925-handle-tuple-and-complex-types

avehtari (Member) commented Apr 8, 2026

Closes #925

Stan examples by me, @spinkney, and @WardBrian (from https://discourse.mc-stan.org/t/proof-of-concept-binary-output-format-for-cmdstan/40846/67). Careful iterative prompting by me. Code, tests, and PR description assisted by Claude.

Fix tuple and complex variable handling

Summary

CmdStan's tuple and complex types use naming conventions in CSV output
that CmdStanR did not understand. This PR adds full support for these
types across metadata parsing, model methods, and init value handling.

  • Fix repair_variable_names() and variable_dims() to handle complex
    suffixes (.real/.imag) and tuple separators (:) in CSV column names
  • Fix variable_skeleton() and unconstrain_draws() for models with
    tuple and complex parameters
  • Add tuple init support: mod$sample(init = fit) now correctly passes
    tuple parameter values as nested JSON objects
  • Add write_stan_json() support for tuple values as named lists

Problem

CmdStan CSV headers now use three different separators:

| Separator | Meaning | CSV name | CmdStanR name |
|---|---|---|---|
| . (before numeric indices) | array/matrix index | beta.1.2 | beta[1,2] |
| : | tuple element | b_tuple:1:1.2 | b_tuple:1:1[2] |
| .real / .imag | complex part | z.real | z[real] |

These can combine: arr_pair.1:1 (array + tuple), z3D.1.1.1.real
(array + complex), nested:2:2.real (tuple + complex).

Before this PR, the following issues occurred:

  1. Warning: NAs introduced by coercion from variable_dims() on
    every model with complex or tuple types — as.numeric("imag") returns
    NA.

  2. Wrong metadata: fit$metadata()$stan_variable_sizes contained NA
    for complex variables and incorrect dimensions.

  3. Broken variable_skeleton(): Returned NA names for tuple
    parameters because create_skeleton() couldn't match Stan-level names
    ("b_tuple") against C++ leaf names ("b_tuple.1.1").

  4. Crashing unconstrain_draws(): Tuple parameters were silently
    dropped from the draw subset (Stan-level name "b_tuple" not found in
    leaf names "b_tuple:1:1"), causing CmdStan to receive the wrong number
    of scalars.

  5. Missing init values: mod$sample(init = fit) dropped tuple
    parameters entirely, and write_stan_json() could not serialize tuple
    values (heterogeneous lists crashed list_to_array()).

Changes

R/csv.R

repair_variable_names() — Detects and strips .real/.imag suffixes
before the dot-to-bracket conversion, then re-attaches them in the correct
position. The : tuple separator passes through unchanged.
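To make the suffix handling concrete, here is a minimal Python sketch (not CmdStanR's R code) that reproduces the examples in the separator table. It assumes Stan identifiers never contain ".", so the first "." in a column name always opens the index list and later ones become commas. The function name repair_variable_name is hypothetical, and the bracket forms it yields for combined names (e.g. z3D[1,1,1,imag]) follow from these assumptions rather than from CmdStanR's own output.

```python
import re

# Matches a trailing complex-part suffix such as ".real" or ".imag".
COMPLEX_SUFFIX = re.compile(r"\.(real|imag)$")


def repair_variable_name(name: str) -> str:
    """Convert one CmdStan CSV column name to bracket notation (sketch)."""
    # 1. Detach the complex-part suffix, if present, before touching the dots.
    match = COMPLEX_SUFFIX.search(name)
    part = match.group(1) if match else None
    if match:
        name = name[: match.start()]
    # 2. Dot-to-bracket: the first "." opens the index list, later ones
    #    separate indices. The ":" tuple separator is left untouched.
    base, sep, rest = name.partition(".")
    indices = rest.split(".") if sep else []
    # 3. Re-attach the complex part as the last index.
    if part:
        indices.append(part)
    return f"{base}[{','.join(indices)}]" if indices else base


for col in ["beta.1.2", "b_tuple:1:1.2", "z.real", "nested:2:2.real", "arr_pair.1:1"]:
    print(col, "->", repair_variable_name(col))
```

In this sketch the tuple path of an array of tuples ends up inside the brackets (arr_pair.1:1 becomes arr_pair[1:1]), which is consistent with variable_dims() having to cope with index strings like "1:2".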

variable_dims() — For non-numeric indices ("real", "imag", tuple
indices like "1:2"), counts unique values across all entries for that
dimension position instead of calling as.numeric().
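A Python sketch of the dimension rule above, operating on names already in bracket form: numeric positions take the largest index, while positions holding labels such as "real"/"imag" or tuple paths such as "1:2" count distinct values. The function here is a stand-in for illustration, not the package function, and reporting scalars as 1 is an assumption of the sketch.

```python
def variable_dims(repaired_names):
    """Infer per-variable dimensions from bracket-form names (sketch)."""
    # Group the index lists by variable root, e.g. "zv[2,imag]" -> ("zv", ["2", "imag"]).
    indices_by_var = {}
    for name in repaired_names:
        base, bracket, rest = name.partition("[")
        idx = rest[:-1].split(",") if bracket else []  # drop the closing "]"
        indices_by_var.setdefault(base, []).append(idx)

    dims = {}
    for base, idx_lists in indices_by_var.items():
        if not idx_lists[0]:
            dims[base] = 1  # scalar (assumed convention for this sketch)
            continue
        var_dims = []
        for pos in range(len(idx_lists[0])):
            values = [idx[pos] for idx in idx_lists]
            if all(v.isdigit() for v in values):
                # Numeric index: the largest value is the extent.
                var_dims.append(max(int(v) for v in values))
            else:
                # "real"/"imag" or tuple paths: count distinct labels
                # instead of calling a numeric conversion that yields NA.
                var_dims.append(len(set(values)))
        dims[base] = var_dims
    return dims
```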

R/utils.R

New helper functions for bridging between Stan-level names and leaf names:

  • stan_param_has_leaf() — Checks if Stan-level names have matching
    leaf names using ":" prefix matching.
  • expand_stan_params_to_leaves() — Expands Stan-level names to
    their leaf equivalents for posterior::subset_draws().
  • is_tuple_type() — Detects tuple parameters from model_variables
    type info ($type is a list for tuples).
  • build_tuple_init_value() — Recursively reconstructs a nested
    named-list init value from flat leaf draws, also validating for NA/Inf.
  • .extract_draw_value() — Shared helper for the draw extraction
    pipeline.
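The ":" prefix matching behind stan_param_has_leaf() and expand_stan_params_to_leaves() can be sketched in Python as follows. The names expand_to_leaves and has_leaf are hypothetical, and leaf names are assumed to be unindexed names such as "b_tuple:1:1".

```python
def expand_to_leaves(stan_names, leaf_names):
    """Expand Stan-level names (e.g. "b_tuple") to their leaf names (sketch)."""
    expanded = []
    for name in stan_names:
        if name in leaf_names:
            expanded.append(name)  # ordinary parameter: already a leaf
        else:
            # Tuple root: match on "name:" so that "b_tuple" does not
            # also pick up leaves of a different variable like "b_tuple2:1".
            expanded.extend(leaf for leaf in leaf_names if leaf.startswith(name + ":"))
    return expanded


def has_leaf(stan_names, leaf_names):
    """For each Stan-level name, report whether any leaf belongs to it."""
    return [len(expand_to_leaves([name], leaf_names)) > 0 for name in stan_names]
```

Appending ":" to the root before prefix matching is what keeps similarly named variables apart; a bare startswith("b_tuple") would be wrong.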

create_skeleton() — Expands tuple Stan-level names to leaf components
from param_metadata_ using "." prefix matching.

R/fit.R

unconstrain_draws() and unconstrain_variables() — Use
stan_param_has_leaf() for the zero-length parameter check and
expand_stan_params_to_leaves() for draw subsetting.

R/args.R

validate_fit_init() — Uses stan_param_has_leaf() instead of %in%.

process_init.draws() — Separates parameters into tuple and non-tuple.
For tuples, uses build_tuple_init_value() to reconstruct nested init
values. NA/Inf validation is done inline during extraction.
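A minimal Python sketch of the reconstruction step, assuming one draw has already been extracted into a dict mapping leaf names to values, with NA represented as None or NaN. It only handles tuples that are not inside arrays, and build_tuple_init is a hypothetical name. The nested dict serializes to the JSON shape CmdStan uses for tuples, an object keyed "1", "2", and so on.

```python
import json
import math


def _check_finite(name, value):
    """Reject NA (None/NaN) and Inf anywhere inside a possibly nested value."""
    if isinstance(value, (list, tuple)):
        for item in value:
            _check_finite(name, item)
    elif value is None or not math.isfinite(value):
        raise ValueError(f"init value for {name!r} contains NA or Inf")


def build_tuple_init(root, leaf_values):
    """Nest flat leaves like {"b_tuple:1:2": v} into {"1": {"2": v}} (sketch)."""
    nested = {}
    prefix = root + ":"
    for name, value in leaf_values.items():
        if not name.startswith(prefix):
            continue  # belongs to another variable
        _check_finite(name, value)  # validate inline during extraction
        path = name[len(prefix):].split(":")  # "1:2" -> ["1", "2"]
        node = nested
        for key in path[:-1]:
            node = node.setdefault(key, {})  # create inner tuples as needed
        node[path[-1]] = value
    return nested


draw = {"alpha": 0.5, "b_tuple:1:1": [0.1, 0.2], "b_tuple:1:2": 3.0,
        "b_tuple:2": [[1.0, 2.0], [3.0, 4.0]]}
print(json.dumps({"b_tuple": build_tuple_init("b_tuple", draw)}))
```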

R/data.R

write_stan_json() — Detects tuple-style named lists (keys "1",
"2", ...) and processes them recursively instead of calling
list_to_array().

New helpers: is_tuple_list(), prepare_tuple_for_json().
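The name heuristic can be sketched in Python, with dicts standing in for R named lists. Both function names are hypothetical, and treating any other dict as an array only loosely mirrors the list_to_array() fallback. Note that, by construction, every dict keyed exactly "1" to "n" is emitted as a JSON object.

```python
import json


def is_tuple_list(value):
    """Name heuristic (sketch): a non-empty dict keyed "1", "2", ..., "n" in order."""
    return (
        isinstance(value, dict)
        and len(value) > 0
        and list(value) == [str(i) for i in range(1, len(value) + 1)]
    )


def to_stan_json_heuristic(data):
    """Serialize data, writing tuple-looking dicts as JSON objects (sketch)."""
    def convert(value):
        if is_tuple_list(value):
            # Tuple: keep the "1", "2", ... keys and recurse into elements.
            return {key: convert(item) for key, item in value.items()}
        if isinstance(value, dict):
            # Any other named container is treated as an array.
            return [convert(item) for item in value.values()]
        if isinstance(value, (list, tuple)):
            return [convert(item) for item in value]
        return value

    return json.dumps({name: convert(value) for name, value in data.items()})
```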

Tests

tests/testthat/resources/stan/tuple_complex.stan — Test model with
tuple and complex types as both parameters and generated quantities:
nested tuples, tuples with complex, arrays of tuples, complex vectors,
complex matrices, arrays of complex matrices.

tests/testthat/test-tuple-complex.R — 137 tests covering:

  • Unit tests for repair_variable_names, unrepair_variable_names,
    variable_dims with all tuple/complex name patterns
  • Helper function tests (stan_param_has_leaf, expand_stan_params_to_leaves)
  • write_stan_json with tuple values
  • build_tuple_init_value reconstruction
  • End-to-end: sampling with no warnings, correct metadata, variable_skeleton(),
    unconstrain_draws(), init = fit round-trip, manual init lists

Test plan

  • All 397 existing test-csv.R tests pass
  • All 72 existing test-data.R tests pass
  • All 15 existing test-fit-init.R tests pass
  • 137 new test-tuple-complex.R tests pass
  • Manual verification with three Stan models covering: scalar/array/matrix
    params, complex scalar/vector/matrix/array params, simple/nested/arrayed
    tuples, tuples as parameters and generated quantities

Copyright and Licensing

Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):

Aki Vehtari

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

avehtari requested a review from jgabry on April 8, 2026 11:49
codecov-commenter commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 95.20548% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.23%. Comparing base (801b2b4) to head (db15c82).
⚠️ Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| R/utils.R | 93.61% | 3 Missing ⚠️ |
| R/args.R | 95.00% | 2 Missing ⚠️ |
| R/data.R | 88.23% | 2 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #1174    +/-   ##
========================================
  Coverage   91.23%   91.23%            
========================================
  Files          15       15            
  Lines        6070     6173   +103     
========================================
+ Hits         5538     5632    +94     
- Misses        532      541     +9     


jgabry (Member) commented Apr 8, 2026

Thanks Aki! This one might take me some time to review.

jgabry (Member) left a comment:

I haven't reviewed the code yet, but the failure on WSL is because some of the tests error with:

Error: Additional model methods are not currently available with WSL CmdStan and will not be compiled

Adding skip_if(os_is_wsl()) to those specific tests should fix it.

EDIT: oops I meant to "request changes" not "approve" yet

jgabry self-requested a review on April 8, 2026 21:19
jgabry (Member) left a comment:

I haven't reviewed the code yet, but the failure on WSL is because a few of the tests error with:

Error: Additional model methods are not currently available with WSL CmdStan and will not be compiled

Adding skip_if(os_is_wsl()) to those specific tests should fix it.

WardBrian (Member) commented:

Just to add a note, it would be nice if some of these IO utilities were moved somewhere independent, so we could use them from e.g. bridgestan or similar (I know that having more packages is more CRAN pain...)
stan-dev/stanio#7

jgabry (Member) commented Apr 14, 2026

I still haven't had a chance to go through this myself yet (I definitely will at some point), but I asked codex to take a look to see if it could find anything that's clearly an issue:

  1. Comment on tuple detection in write_stan_json():

I don't think write_stan_json() can safely infer "tuple" from names alone here. This function has
historically treated lists as array-like containers, and it does not have model-type information to
tell whether a sequentially named list is actually meant to be a tuple.

For example, write_stan_json(list(x = list("1" = 1:3, "2" = 4:6)), ... ) used to serialize x as an
array, but with this change it becomes a JSON object with "1" and "2" keys. That looks like a
breaking change for existing code that happens to produce named lists via split(), setNames(), or
subsetting.

Other than backwards compatibility, I don't think those names add useful semantics for Stan arrays
anyway. If we want tuple serialization here, I think it needs to be driven by model metadata or an
explicit opt-in, not by a name heuristic.

  2. Comment asking for a regression test around the ambiguity:

Can we add a regression test for the ambiguous non-tuple case too? Right now the new tests cover tuple
serialization, but they don't cover the case where a sequentially named list should still behave like
an array.

For example, I'd want write_stan_json(list(x = list("1" = 1:3, "2" = 4:6)), ...) to keep matching
the existing array behavior unless we have explicit type information saying x is a tuple. Without
that test, it's easy to lock in the new heuristic and accidentally break old callers.

  3. Comment on tuple root variable filtering:

I think tuple support is still incomplete in the variable-filter path. The helpers used by
fit$draws(), fit$print(), and read_cmdstan_csv(..., variables = ...) still only expand name[
prefixes, so tuple roots like b_tuple, pair, or nested won't resolve to their name:... leaves.

For example, if the CSV contains variables like b_tuple:1:1[1], b_tuple:1:1[2], and
b_tuple:2[1,1], I'd expect fit$draws(variables = "b_tuple") to return those columns, just like
fit$draws(variables = "theta") expands to theta[1], theta[2], etc. But with the current matching
logic, "b_tuple" is treated as not found.

I'd update the same prefix-matching logic here to treat name: as another expansion case.
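The suggested change amounts to accepting "name:" alongside "name[" when expanding a root name. A Python sketch, where match_variable is a hypothetical name and the column names are the examples from the comment above:

```python
def match_variable(variable, columns):
    """Return the CSV columns selected by a root variable name (sketch)."""
    if variable in columns:
        return [variable]  # exact match, e.g. "lp__"
    # Expand array roots ("theta" -> "theta[1]") and, as suggested,
    # tuple roots ("b_tuple" -> "b_tuple:1:1[1]"). Matching on "b_tuple:"
    # rather than "b_tuple" keeps unrelated names like "b_tuple2:1" out.
    return [c for c in columns
            if c.startswith(variable + "[") or c.startswith(variable + ":")]


cols = ["lp__", "theta[1]", "theta[2]",
        "b_tuple:1:1[1]", "b_tuple:1:1[2]", "b_tuple:2[1,1]", "b_tuple2:1"]
print(match_variable("b_tuple", cols))
```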

I haven't verified yet that these are all actually issues. But if 1 is true, then I guess we could break backwards compatibility here, because we're going to do v1.0. We could say that any list with names will be interpreted as a tuple. It's not ideal, but it's doable if there's no other solution.
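For comparison with the name heuristic, here is a Python sketch of the metadata-driven alternative from comment 1, which also encodes the regression case from comment 2: the same sequentially keyed value is written as an array unless the variable is declared a tuple. tuple_vars is a hypothetical stand-in for type information such as the model's variable metadata; nothing here is an existing CmdStanR API.

```python
import json


def to_stan_json_typed(data, tuple_vars=frozenset()):
    """Serialize data using explicit type info instead of key names (sketch)."""
    def as_tuple(value):
        # Nested tuples are assumed to arrive as dicts keyed "1".."n".
        return {key: as_tuple(item) if isinstance(item, dict) else item
                for key, item in value.items()}

    def as_array(value):
        if isinstance(value, dict):
            value = list(value.values())  # names carry no meaning for Stan arrays
        if isinstance(value, (list, tuple)):
            return [as_array(item) for item in value]
        return value

    return json.dumps({name: as_tuple(value) if name in tuple_vars else as_array(value)
                       for name, value in data.items()})


x = {"x": {"1": [1, 2, 3], "2": [4, 5, 6]}}
print(to_stan_json_typed(x))                    # array, as before
print(to_stan_json_typed(x, tuple_vars={"x"}))  # tuple only when declared
```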

Development

Successfully merging this pull request may close these issues.

NA dimensions for complex and tuple types
