fix: R wrapper cleanup, DataFrame mutation bug, test coverage and docs#127
Merged
MitchellAcoustics merged 20 commits intodevfrom Mar 1, 2026
Merged
fix: R wrapper cleanup, DataFrame mutation bug, test coverage and docs#127MitchellAcoustics merged 20 commits intodevfrom
MitchellAcoustics merged 20 commits intodevfrom
Conversation
- Replace deprecated numpy2ri.activate()/pandas2ri.activate() with explicit context managers throughout _rsn_wrapper.py, resolving issue #111 - Add _r2np() helper for consistent R→numpy conversion via context manager - Fix correctness bug: bfgs() now requires explicit n (sample count) parameter instead of silently using data_cor.shape[0] (which returned 8 — the variable count — not the participant count), causing incorrect chi-square and RMSEA statistics - Update CircE.compute_bfgs_fit() and SATP.run() to pass the correct n - Remove module-level get_r_session() calls from _circe_wrapper.py and _rsn_wrapper.py; session is now acquired lazily inside each function - Remove _raw_bfgs_fit: ro.ListVector from CircE dataclass to avoid keeping live R objects in long-lived Python dataclasses - Remove unused ro import and ConfigDict from circe.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace stats_package.pchisq() with scipy.chi2.sf() in extract_bfgs_fit() to avoid the rpy2 py2rpy conversion failure: pandas2ri.converter produces pandas Series from rpy2py, which cannot be passed back to R outside the context. scipy.chi2.sf(x, df) == 1 - pchisq(x, df) by definition. Also restore ConfigDict(arbitrary_types_allowed=True) on the CircE dataclass, required for the polar_angles: pd.DataFrame | None field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ntext MultiSkewNorm.fit() was storing the rpy2 RS4 selm_model object as an instance attribute, creating a persistent reference into R's heap that can outlive the R session. Since cp and dp are extracted immediately after fitting, the R object is no longer needed. Sampling now always goes through the dp (Direct Parameters) branch, which constructs the required R vectors on the fly. Also extend the converter context in bfgs() to include the as_matrix() call, keeping all DataFrame→R operations within the same context boundary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
soundscapy/__init__.py - Replace unconditional try/except imports of spi/satp with module-level __getattr__ + __dir__ (PEP 562). R now only starts when the user first accesses sspy.satp, sspy.spi, sspy.SATP, sspy.MultiSkewNorm, etc. - Use importlib.import_module() for dynamic sub-module loading. - __dir__ exposes all lazy names to dir() and IDE autocomplete. _r_wrapper.py - Add RSession NamedTuple return type for get_r_session(). Callers now use r.sn / r.base / r.circe instead of fragile positional unpacking. - Add _ver() helper for tuple-based version comparison, fixing a lexicographic bug where "1.10" < "1.2" was True (affected CircE and sn version checks). - Rename copy-paste error _raise_sn_check_error -> _raise_circe_check_error inside check_circe_package(). - Remove stale comment about imports "not used in the code". _circe_wrapper.py / _rsn_wrapper.py - Update all get_r_session() call sites to use RSession named fields. - Move base.as_matrix() call back outside the pandas2ri converter context to prevent its R-matrix return value being auto-converted to numpy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove `selm_model` assertions from test_init and three fit tests (test_fit_with_dataframe, test_fit_with_numpy_array, test_fit_with_x_y) since the R RS4 object is no longer stored on the instance. Update test_sample_not_fitted_or_defined error message match to reflect the new message raised by MultiSkewNorm.sample() rather than the old message from the underlying _rsn_wrapper.py sample_msn() function. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix p-value calculation in extract_bfgs_fit() to use model df ('d') not
null model df ('dfnull'); p was ~0.95 instead of the correct ~0.041
- Fix circe.py CircE.df field to map from 'd' not 'dfnull'
- Remove dead 'R not found' error paths from check_r_availability()
(R is always already running by the time the function is called)
- Rename shutdown_r_session() -> reset_r_session() to accurately reflect
that only Python package references are cleared; R process stays alive
- Update test_r_wrapper.py and test_MSN.py for the above changes
- Replace stale test_cp2dp skip with a dp→cp→dp2cp round-trip test
- Add test/satp/test_circe.py: integration tests for bfgs() and
extract_bfgs_fit() verified against the vocational interests example
from CircE.BFGS.Rd (exact reference values from the R package), plus
structural tests for CircE.compute_bfgs_fit() and SATP pipeline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Strip r_wrapper.__all__ to empty: it is an internal implementation package; user-facing names live in soundscapy.spi / soundscapy.satp - Replace AUTO_INSTALL_R_PACKAGES=True with _confirm_install_r_packages(): checks SOUNDSCAPY_AUTO_INSTALL_R env var first (CI/script opt-in), then prompts interactively when stdin is a TTY, otherwise does nothing - Move experimental UserWarning from module-level into SATP.__init__ and MultiSkewNorm.__init__ so stacklevel=2 points at the user's call site and Python's default filter deduplicates per call location - Remove now-unused `import warnings` from satp/__init__ and spi/__init__ - Add spi_score to _SPI_ATTRS so sspy.spi_score works via lazy loading - Add comment in _r_wrapper.py and _rsn_wrapper.py explaining that R starts unconditionally when rpy2 is imported Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…G_SRC ordering - circe.py from_bfgs(): fix model_type comparison — was comparing a ModelType dataclass against CircModelE enum members (always False); must compare model_type.name against the enum - circe.py from_bfgs(): fix key name — R returns 'polar.angles' (dot), not 'polar_angles' (underscore); polar_angles was always None for every model - _circe_wrapper.py extract_bfgs_fit(): normalise all shape-(1,) numpy arrays to Python scalars so callers never need .item() and numpy >= 1.25 DeprecationWarnings are eliminated - _r_wrapper.py: move PKG_SRC definition above _confirm_install_r_packages() which references it, fixing forward-reference style issue - test_circe.py: add polar_angles tests for free-angle and constrained models; update .item() calls that are now redundant after scalar normalisation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both functions are dead code — no call site exists in src, tests, or docs. MultiSkewNorm.fit() calls selm() + extract_cp() + extract_dp() separately to share the single fitted model object; calc_cp/dp would call selm() twice, doubling R computation for the most common use case. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mprove docs and test coverage - Remove redundant `from rpy2 import robjects` re-imports in check_sn_package, check_circe_package, and initialize_r_session (module-level import already covers these) - Replace fragile `if "sn" in str(e)` / `if "CircE" in str(e)` string-matching with canonical `except ImportError: raise` / `except Exception as e:` split - Fix MultiSkewNorm.fit() mutating the caller's DataFrame in-place via data.columns=["x","y"]; now copies before renaming - Fix stale MultiSkewNorm class docstring: ks2ds→ks2d2s, spi→spi_score, add sample_mtsn - Expand msn.py module docstring to document standalone public functions - Replace vague TODO in DirectParams.from_cp warning with specific message - Remove dead test_initialize_r_session_fails (always skipped, assertions referenced non-existent error strings) - Add _ver() unit tests covering lexicographic comparison correctness - Add sample_mtsn() tests: shape, bounds enforcement, sample_data storage, unfitted error - Add from_params() branch coverage: DirectParams, CentredParams, CP kwargs, no-args error - Add test_fit_does_not_mutate_input_dataframe regression test - Replace TODO comments in ks2d2s/spi_score tests with concrete range assertions - Add test_soundscapy_satp_module and test_satp_import_error to test_basic.py for symmetric SATP coverage alongside existing SPI tests - Remove commented-out dead assertions in test_soundscapy_spi_module Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Regression test for DataFrame mutation was vacuous (MOCK_DF already has ["x","y"] columns); use ["ISOPleasant","ISOEventful"] so a reintroduced bug would be visible - Fix PKG_SRC.CIRCE assertion to use .value (str(enum) yields "PKG_SRC.CIRCE", not the underlying install-path string) - from_params(xi=..., omega=..., alpha=...) was leaving instance.cp=None; add CentredParams.from_dp() call after setting instance.dp - Add test_from_params_with_xi_omega_alpha_kwargs_sets_cp to cover the fixed branch Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- r_wrapper/__init__.py: add noqa F401 on re-export imports - _r_wrapper.py: noqa N801 (PKG_SRC), noqa BLE001 (exception wrappers), restructure try/except/else to satisfy TRY300 - msn.py: shorten over-long comment (E501) - test_MSN.py: rename df→input_df (PD901), shorten docstring (E501), add match= to pytest.raises (PT011) - test_circe.py: use [*PAQ_IDS, ...] unpacking (RUF005), inline rename for RET504, shorten assertion messages (E501), fix docstring (D205) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**CIRCUMPLEX model equal_ang was wrong (correctness bug)**
The four CircE models form a 2×2 grid of {equal, free} × {angles, comms}.
EQUAL_COM should have *free* angles (equal_ang=False), and CIRCUMPLEX
should have *both* constrained (equal_ang=True, equal_com=True). The
previous sets had them swapped, so CIRCUMPLEX and EQUAL_COM were
producing identical CircE_BFGS calls. Confirmed by the polar_angles tests.
**ImportError masking in check_r_availability**
_raise_r_version_too_old_error raises ImportError, which was caught by
the bare `except Exception` and re-wrapped with "Error querying R version",
swallowing the informative "R version X is too old" message. Added
`except ImportError: raise` before the generic handler.
**R version float formula replaced with _ver() tuple comparison**
`major + minor/10` fails for multi-digit minor versions and is
inconsistent with the _ver() utility added for exactly this purpose.
Now uses `paste(major, minor, sep='.')` → _ver() comparison.
**Double-ipsatize guard**
ipsatize() had an _ipsatized flag but never checked it. A second call
would KeyError because groupby.transform drops the `participant` column.
Added an early return with a warning.
**numpy import moved to module level in _circe_wrapper.py**
**DESCRIPTION: add CircE GitHub remote for CI**
setup-r-dependencies installs packages from DESCRIPTION. Only `sn` was
listed; `CircE` (from GitHub) was missing, causing the py311-r CI job to
fail with PackageNotInstalledError.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use usethis::use_dev_package() to add CircE to DESCRIPTION in the canonical R format: Imports field with version pin (>= 1.1) plus a Remotes entry pointing to MitchellAcoustics/CircE-R on GitHub. r-lib/actions/setup-r-dependencies reads this via pak, which fetches the GitHub repo DESCRIPTION to resolve the package name → source mapping. No workflow changes needed; this is the standard R approach for non-CRAN dependencies. Also reverts the erroneous extra-packages workflow addition from the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pak/pkgdepends cannot auto-map a package name to a repository when they differ (CircE package vs CircE-R repo). The explicit `CircE=MitchellAcoustics/CircE-R` syntax in the Remotes field tells pak which GitHub repo satisfies the `CircE (>= 1.1)` Imports dependency, fixing the CI failure in setup-r-dependencies. Also removes the commented-out (dead) Rscript lines from the py-r tox environment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… syntax
setup-r-dependencies installs R packages into a library that rpy2
cannot see from within the tox virtualenv. The explicit
`Rscript -e "if(!require(...)) { pak::... }"` pattern in commands_pre
is the reliable workaround (same pattern py-all already uses successfully).
Changes:
- py-r: add Rscript install steps for sn and CircE (was missing entirely)
- py-all, py-tutorials: update CircE pak ref to explicit Package=user/repo
syntax (`CircE=MitchellAcoustics/CircE-R`) required when the package name
differs from the repo name
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove redundant `# tox.ini` file header - Unify R install comment across py-r, py-all, py-tutorials to accurately describe both packages being installed (sn and CircE) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- _r_wrapper.py: add missing single-quotes around PKG_SRC.CIRCE.value in
three CircE error-message f-strings (produced invalid R syntax)
- _r_wrapper.py: remove dead 'tvtnorm' from install_r_packages() default
list and update stale docstring
- _r_wrapper.py: reset _sn_checked/_circe_checked in reset_r_session() and
update docstring to accurately describe re-verification behaviour
- _rsn_wrapper.py: add max_iter=100_000 guard to sample_mtsn() rejection
loop; raises RuntimeError with a clear message instead of hanging forever
when the distribution has negligible mass inside [a, b]
- circe.py: change self.data annotation from pandera DataFrame to pd.DataFrame
since ipsatize() overwrites it with an unvalidated result
- tox.ini: replace pak::local_install_deps() with pak::pkg_install('sn') for
direct, reliable sn installation; add quietly=TRUE to both require() guards
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- sample_mtsn: replace sentinel-row + np.append loop with list
accumulation + np.vstack; add max_iter guard against infinite loops
- REQUIRED_R_VERSION: change from float 3.6 to str "3.6" to avoid
str(4.10) == "4.1" pitfall with future two-digit minor versions
- reset_r_session: reset _sn_checked/_circe_checked flags so next call
re-verifies package availability from scratch
- test_r_wrapper: remove unreachable SPI_DEPS branching; tests already
gated by optional_deps("r") so both branches always run with R present
- msn.py: remove ks2d() from module docstring — no such top-level fn
- CircE.df → CircE.d to match raw R output key; update from_bfgs(),
test_circe.py (4 refs), and scratch/circe_test.py integration script
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the nine scattered module-level globals (_r_checked, _sn_checked, _circe_checked, _r_session, _sn_package, _circe_package, _stats_package, _base_package, _session_active) with a single @DataClass RSession instance (_state). Key changes: - RSession promoted from NamedTuple return type to mutable @DataClass state container, adding active/r_checked/sn_checked/circe_checked fields and an is_ready property - _state = RSession() is the sole module-level state; functions mutate its attributes directly — no global declarations needed (mutating an object attribute does not rebind the module-level name) - reset_r_session() becomes: global _state; _state = RSession() — atomically clears all state including check flags, making it impossible to forget a field (fixes the _r_checked omission found in code review) - get_r_session() validation collapses to: if not _state.is_ready - initialize_r_session() error path calls reset_r_session() for clean retry - is_session_active() needs no global declaration - Module docstring updated to explain the _state singleton pattern Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
MultiSkewNorm.fit()was mutating the caller's DataFrame in-place viadata.columns = ["x", "y"]; now copies before renamingif "sn" in str(e)string-matching incheck_sn_package/check_circe_packagewith canonicalexcept ImportError: raisepattern; remove redundantfrom rpy2 import robjectsre-imports inside functionssample_mtsn()tests (shape, bounds, storage, unfitted error),from_params()branch coverage (all 4 paths),_ver()unit tests, regression test for DataFrame mutation fix, SATP graceful-degradation tests intest_basic.pytest_initialize_r_session_fails(always skipped when rpy2 available; assertions referenced non-existent error strings)MultiSkewNormclass docstring method names (ks2ds→ks2d2s,spi→spi_score); expandmsn.pymodule docstring to document standalone public functions; replace vague TODO inDirectParams.from_cpwarning with specific messageTest plan
uv run pytest test/spi/ test/satp/ test/test_basic.py— 78 passeduv run tox run -e py312-core(no R/rpy2) — 147 passed, graceful degradation verifieduv run tox run -e py312-r(with R) — 211 passed🤖 Generated with Claude Code