Skip to content

fix: R wrapper cleanup, DataFrame mutation bug, test coverage and docs#127

Merged
MitchellAcoustics merged 20 commits intodevfrom
claude/jolly-joliot
Mar 1, 2026
Merged

fix: R wrapper cleanup, DataFrame mutation bug, test coverage and docs#127
MitchellAcoustics merged 20 commits intodevfrom
claude/jolly-joliot

Conversation

@MitchellAcoustics
Copy link
Owner

Summary

  • Bug fix: MultiSkewNorm.fit() was mutating the caller's DataFrame in-place via data.columns = ["x", "y"]; now copies before renaming
  • Error handling: Replace fragile if "sn" in str(e) string-matching in check_sn_package/check_circe_package with canonical except ImportError: raise pattern; remove redundant from rpy2 import robjects re-imports inside functions
  • Test coverage: Add sample_mtsn() tests (shape, bounds, storage, unfitted error), from_params() branch coverage (all 4 paths), _ver() unit tests, regression test for DataFrame mutation fix, SATP graceful-degradation tests in test_basic.py
  • Dead code removal: Remove test_initialize_r_session_fails (always skipped when rpy2 available; assertions referenced non-existent error strings)
  • Documentation: Fix stale MultiSkewNorm class docstring method names (ks2dsks2d2s, spispi_score); expand msn.py module docstring to document standalone public functions; replace vague TODO in DirectParams.from_cp warning with specific message

Test plan

  • uv run pytest test/spi/ test/satp/ test/test_basic.py — 78 passed
  • uv run tox run -e py312-core (no R/rpy2) — 147 passed, graceful degradation verified
  • uv run tox run -e py312-r (with R) — 211 passed

🤖 Generated with Claude Code

Andrew Mitchell and others added 20 commits February 28, 2026 18:29
- Replace deprecated numpy2ri.activate()/pandas2ri.activate() with
  explicit context managers throughout _rsn_wrapper.py, resolving
  issue #111
- Add _r2np() helper for consistent R→numpy conversion via context manager
- Fix correctness bug: bfgs() now requires explicit n (sample count)
  parameter instead of silently using data_cor.shape[0] (which returned
  8 — the variable count — not the participant count), causing incorrect
  chi-square and RMSEA statistics
- Update CircE.compute_bfgs_fit() and SATP.run() to pass the correct n
- Remove module-level get_r_session() calls from _circe_wrapper.py and
  _rsn_wrapper.py; session is now acquired lazily inside each function
- Remove _raw_bfgs_fit: ro.ListVector from CircE dataclass to avoid
  keeping live R objects in long-lived Python dataclasses
- Remove unused ro import and ConfigDict from circe.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace stats_package.pchisq() with scipy.chi2.sf() in extract_bfgs_fit()
to avoid the rpy2 py2rpy conversion failure: pandas2ri.converter produces
pandas Series from rpy2py, which cannot be passed back to R outside the
context. scipy.chi2.sf(x, df) == 1 - pchisq(x, df) by definition.

Also restore ConfigDict(arbitrary_types_allowed=True) on the CircE
dataclass, required for the polar_angles: pd.DataFrame | None field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ntext

MultiSkewNorm.fit() was storing the rpy2 RS4 selm_model object as an
instance attribute, creating a persistent reference into R's heap that
can outlive the R session. Since cp and dp are extracted immediately
after fitting, the R object is no longer needed. Sampling now always
goes through the dp (Direct Parameters) branch, which constructs the
required R vectors on the fly.

Also extend the converter context in bfgs() to include the as_matrix()
call, keeping all DataFrame→R operations within the same context boundary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
soundscapy/__init__.py
- Replace unconditional try/except imports of spi/satp with module-level
  __getattr__ + __dir__ (PEP 562). R now only starts when the user first
  accesses sspy.satp, sspy.spi, sspy.SATP, sspy.MultiSkewNorm, etc.
- Use importlib.import_module() for dynamic sub-module loading.
- __dir__ exposes all lazy names to dir() and IDE autocomplete.

_r_wrapper.py
- Add RSession NamedTuple return type for get_r_session(). Callers now
  use r.sn / r.base / r.circe instead of fragile positional unpacking.
- Add _ver() helper for tuple-based version comparison, fixing a
  lexicographic bug where "1.10" < "1.2" was True (affected CircE
  and sn version checks).
- Rename copy-paste error _raise_sn_check_error -> _raise_circe_check_error
  inside check_circe_package().
- Remove stale comment about imports "not used in the code".

_circe_wrapper.py / _rsn_wrapper.py
- Update all get_r_session() call sites to use RSession named fields.
- Move base.as_matrix() call back outside the pandas2ri converter context
  to prevent its R-matrix return value being auto-converted to numpy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove `selm_model` assertions from test_init and three fit tests
(test_fit_with_dataframe, test_fit_with_numpy_array, test_fit_with_x_y)
since the R RS4 object is no longer stored on the instance.

Update test_sample_not_fitted_or_defined error message match to reflect
the new message raised by MultiSkewNorm.sample() rather than the old
message from the underlying _rsn_wrapper.py sample_msn() function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix p-value calculation in extract_bfgs_fit() to use model df ('d') not
  null model df ('dfnull'); p was ~0.95 instead of the correct ~0.041
- Fix circe.py CircE.df field to map from 'd' not 'dfnull'
- Remove dead 'R not found' error paths from check_r_availability()
  (R is always already running by the time the function is called)
- Rename shutdown_r_session() -> reset_r_session() to accurately reflect
  that only Python package references are cleared; R process stays alive
- Update test_r_wrapper.py and test_MSN.py for the above changes
- Replace stale test_cp2dp skip with a dp→cp→dp2cp round-trip test
- Add test/satp/test_circe.py: integration tests for bfgs() and
  extract_bfgs_fit() verified against the vocational interests example
  from CircE.BFGS.Rd (exact reference values from the R package), plus
  structural tests for CircE.compute_bfgs_fit() and SATP pipeline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Strip r_wrapper.__all__ to empty: it is an internal implementation
  package; user-facing names live in soundscapy.spi / soundscapy.satp
- Replace AUTO_INSTALL_R_PACKAGES=True with _confirm_install_r_packages():
  checks SOUNDSCAPY_AUTO_INSTALL_R env var first (CI/script opt-in),
  then prompts interactively when stdin is a TTY, otherwise does nothing
- Move experimental UserWarning from module-level into SATP.__init__ and
  MultiSkewNorm.__init__ so stacklevel=2 points at the user's call site
  and Python's default filter deduplicates per call location
- Remove now-unused `import warnings` from satp/__init__ and spi/__init__
- Add spi_score to _SPI_ATTRS so sspy.spi_score works via lazy loading
- Add comment in _r_wrapper.py and _rsn_wrapper.py explaining that R
  starts unconditionally when rpy2 is imported

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…G_SRC ordering

- circe.py from_bfgs(): fix model_type comparison — was comparing a ModelType
  dataclass against CircModelE enum members (always False); must compare
  model_type.name against the enum
- circe.py from_bfgs(): fix key name — R returns 'polar.angles' (dot), not
  'polar_angles' (underscore); polar_angles was always None for every model
- _circe_wrapper.py extract_bfgs_fit(): normalise all shape-(1,) numpy arrays
  to Python scalars so callers never need .item() and numpy >= 1.25
  DeprecationWarnings are eliminated
- _r_wrapper.py: move PKG_SRC definition above _confirm_install_r_packages()
  which references it, fixing forward-reference style issue
- test_circe.py: add polar_angles tests for free-angle and constrained models;
  update .item() calls that are now redundant after scalar normalisation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both functions are dead code — no call site exists in src, tests, or docs.
MultiSkewNorm.fit() calls selm() + extract_cp() + extract_dp() separately
to share the single fitted model object; calc_cp/dp would call selm() twice,
doubling R computation for the most common use case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mprove docs and test coverage

- Remove redundant `from rpy2 import robjects` re-imports in check_sn_package,
  check_circe_package, and initialize_r_session (module-level import already covers these)
- Replace fragile `if "sn" in str(e)` / `if "CircE" in str(e)` string-matching with
  canonical `except ImportError: raise` / `except Exception as e:` split
- Fix MultiSkewNorm.fit() mutating the caller's DataFrame in-place via data.columns=["x","y"];
  now copies before renaming
- Fix stale MultiSkewNorm class docstring: ks2ds→ks2d2s, spi→spi_score, add sample_mtsn
- Expand msn.py module docstring to document standalone public functions
- Replace vague TODO in DirectParams.from_cp warning with specific message
- Remove dead test_initialize_r_session_fails (always skipped, assertions referenced
  non-existent error strings)
- Add _ver() unit tests covering lexicographic comparison correctness
- Add sample_mtsn() tests: shape, bounds enforcement, sample_data storage, unfitted error
- Add from_params() branch coverage: DirectParams, CentredParams, CP kwargs, no-args error
- Add test_fit_does_not_mutate_input_dataframe regression test
- Replace TODO comments in ks2d2s/spi_score tests with concrete range assertions
- Add test_soundscapy_satp_module and test_satp_import_error to test_basic.py for
  symmetric SATP coverage alongside existing SPI tests
- Remove commented-out dead assertions in test_soundscapy_spi_module

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Regression test for DataFrame mutation was vacuous (MOCK_DF already
  has ["x","y"] columns); use ["ISOPleasant","ISOEventful"] so a
  reintroduced bug would be visible
- Fix PKG_SRC.CIRCE assertion to use .value (str(enum) yields
  "PKG_SRC.CIRCE", not the underlying install-path string)
- from_params(xi=..., omega=..., alpha=...) was leaving instance.cp=None;
  add CentredParams.from_dp() call after setting instance.dp
- Add test_from_params_with_xi_omega_alpha_kwargs_sets_cp to cover the
  fixed branch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- r_wrapper/__init__.py: add noqa F401 on re-export imports
- _r_wrapper.py: noqa N801 (PKG_SRC), noqa BLE001 (exception wrappers),
  restructure try/except/else to satisfy TRY300
- msn.py: shorten over-long comment (E501)
- test_MSN.py: rename df→input_df (PD901), shorten docstring (E501),
  add match= to pytest.raises (PT011)
- test_circe.py: use [*PAQ_IDS, ...] unpacking (RUF005), inline rename
  for RET504, shorten assertion messages (E501), fix docstring (D205)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**CIRCUMPLEX model equal_ang was wrong (correctness bug)**
The four CircE models form a 2×2 grid of {equal, free} × {angles, comms}.
EQUAL_COM should have *free* angles (equal_ang=False), and CIRCUMPLEX
should have *both* constrained (equal_ang=True, equal_com=True). The
previous sets had them swapped, so CIRCUMPLEX and EQUAL_COM were
producing identical CircE_BFGS calls. Confirmed by the polar_angles tests.

**ImportError masking in check_r_availability**
_raise_r_version_too_old_error raises ImportError, which was caught by
the bare `except Exception` and re-wrapped with "Error querying R version",
swallowing the informative "R version X is too old" message. Added
`except ImportError: raise` before the generic handler.

**R version float formula replaced with _ver() tuple comparison**
`major + minor/10` fails for multi-digit minor versions and is
inconsistent with the _ver() utility added for exactly this purpose.
Now uses `paste(major, minor, sep='.')` → _ver() comparison.

**Double-ipsatize guard**
ipsatize() had an _ipsatized flag but never checked it. A second call
would KeyError because groupby.transform drops the `participant` column.
Added an early return with a warning.

**numpy import moved to module level in _circe_wrapper.py**

**DESCRIPTION: add CircE GitHub remote for CI**
setup-r-dependencies installs packages from DESCRIPTION. Only `sn` was
listed; `CircE` (from GitHub) was missing, causing the py311-r CI job to
fail with PackageNotInstalledError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use usethis::use_dev_package() to add CircE to DESCRIPTION in the
canonical R format: Imports field with version pin (>= 1.1) plus a
Remotes entry pointing to MitchellAcoustics/CircE-R on GitHub.

r-lib/actions/setup-r-dependencies reads this via pak, which fetches
the GitHub repo DESCRIPTION to resolve the package name → source
mapping. No workflow changes needed; this is the standard R approach
for non-CRAN dependencies.

Also reverts the erroneous extra-packages workflow addition from the
previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pak/pkgdepends cannot auto-map a package name to a repository when they
differ (CircE package vs CircE-R repo). The explicit `CircE=MitchellAcoustics/CircE-R`
syntax in the Remotes field tells pak which GitHub repo satisfies the
`CircE (>= 1.1)` Imports dependency, fixing the CI failure in
setup-r-dependencies.

Also removes the commented-out (dead) Rscript lines from the py-r tox
environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… syntax

setup-r-dependencies installs R packages into a library that rpy2
cannot see from within the tox virtualenv. The explicit
`Rscript -e "if(!require(...)) { pak::... }"` pattern in commands_pre
is the reliable workaround (same pattern py-all already uses successfully).

Changes:
- py-r: add Rscript install steps for sn and CircE (was missing entirely)
- py-all, py-tutorials: update CircE pak ref to explicit Package=user/repo
  syntax (`CircE=MitchellAcoustics/CircE-R`) required when the package name
  differs from the repo name

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove redundant `# tox.ini` file header
- Unify R install comment across py-r, py-all, py-tutorials to accurately
  describe both packages being installed (sn and CircE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- _r_wrapper.py: add missing single-quotes around PKG_SRC.CIRCE.value in
  three CircE error-message f-strings (produced invalid R syntax)
- _r_wrapper.py: remove dead 'tvtnorm' from install_r_packages() default
  list and update stale docstring
- _r_wrapper.py: reset _sn_checked/_circe_checked in reset_r_session() and
  update docstring to accurately describe re-verification behaviour
- _rsn_wrapper.py: add max_iter=100_000 guard to sample_mtsn() rejection
  loop; raises RuntimeError with a clear message instead of hanging forever
  when the distribution has negligible mass inside [a, b]
- circe.py: change self.data annotation from pandera DataFrame to pd.DataFrame
  since ipsatize() overwrites it with an unvalidated result
- tox.ini: replace pak::local_install_deps() with pak::pkg_install('sn') for
  direct, reliable sn installation; add quietly=TRUE to both require() guards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- sample_mtsn: replace sentinel-row + np.append loop with list
  accumulation + np.vstack; add max_iter guard against infinite loops
- REQUIRED_R_VERSION: change from float 3.6 to str "3.6" to avoid
  str(4.10) == "4.1" pitfall with future two-digit minor versions
- reset_r_session: reset _sn_checked/_circe_checked flags so next call
  re-verifies package availability from scratch
- test_r_wrapper: remove unreachable SPI_DEPS branching; tests already
  gated by optional_deps("r") so both branches always run with R present
- msn.py: remove ks2d() from module docstring — no such top-level fn
- CircE.df → CircE.d to match raw R output key; update from_bfgs(),
  test_circe.py (4 refs), and scratch/circe_test.py integration script

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the nine scattered module-level globals (_r_checked, _sn_checked,
_circe_checked, _r_session, _sn_package, _circe_package, _stats_package,
_base_package, _session_active) with a single @DataClass RSession instance
(_state).

Key changes:
- RSession promoted from NamedTuple return type to mutable @DataClass state
  container, adding active/r_checked/sn_checked/circe_checked fields and an
  is_ready property
- _state = RSession() is the sole module-level state; functions mutate its
  attributes directly — no global declarations needed (mutating an object
  attribute does not rebind the module-level name)
- reset_r_session() becomes: global _state; _state = RSession() — atomically
  clears all state including check flags, making it impossible to forget a
  field (fixes the _r_checked omission found in code review)
- get_r_session() validation collapses to: if not _state.is_ready
- initialize_r_session() error path calls reset_r_session() for clean retry
- is_session_active() needs no global declaration
- Module docstring updated to explain the _state singleton pattern

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@MitchellAcoustics MitchellAcoustics merged commit d6f5e34 into dev Mar 1, 2026
17 checks passed
@MitchellAcoustics MitchellAcoustics deleted the claude/jolly-joliot branch March 1, 2026 00:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant