feat: SATP circumplex SEM module - functional API and architecture refactor #130
Open
MitchellAcoustics wants to merge 6 commits into dev
…ebook
- Add `gdiff` computed property to `CircE` dataclass: RMSD between fitted polar angles and ideal 45°-spaced circumplex positions. Returns None for models with fixed angles (EQUAL_ANG, CIRCUMPLEX). Adds module-level `_IDEAL_ANGLES` and `_IDEAL_ANGLES_REV` constants mirroring the R `sem_funcs.R` implementation.
- Add `test/satp/fixtures/sem-fit-ipsatized-canonical.csv`: canonical reference output from the original R analysis (2024-06-13, SATP v1.4, 16 languages × 4 models). Documents a known RMSEA.L/RMSEA.U swap bug in the original R CSV export code.
- Add `docs/tutorials/SATP_CircE_Analysis.qmd`: Quarto notebook replicating the SATP circumplex SEM analysis using Soundscapy. Confirms numerical consistency against the canonical: all df values match exactly, RMSEA bounds are correctly ordered, and 6 reflected (equivalent) angular solutions are detected and documented.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
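The `gdiff` statistic described in this commit can be sketched as follows. This is a minimal illustration, not the code in `circe.py`: the function name `gdiff_sketch`, the circular wrapping of differences, and the choice to take the smaller of the direct and reflected RMSD are all assumptions. The commit only states that `gdiff` is an RMSD against ideal 45°-spaced positions and that a reversed set of ideal angles exists.

```python
import numpy as np

# Ideal circumplex positions: eight scales spaced 45 degrees apart.
IDEAL_ANGLES = np.arange(0.0, 360.0, 45.0)          # 0, 45, ..., 315
# Reflected (mirror-image) ordering of the same positions (assumed definition).
IDEAL_ANGLES_REV = (360.0 - IDEAL_ANGLES) % 360.0   # 0, 315, ..., 45


def circular_diff(a, b):
    """Signed angular difference a - b, wrapped into [-180, 180)."""
    return (np.asarray(a, dtype=float) - np.asarray(b, dtype=float) + 180.0) % 360.0 - 180.0


def gdiff_sketch(angles):
    """RMSD between fitted angles and the ideal positions.

    Returns None when no free angles were estimated (fixed-angle models).
    Assumption: the better of the direct and reflected orderings is reported.
    """
    if angles is None:
        return None
    angles = np.asarray(angles, dtype=float)

    def rmsd(ideal):
        return float(np.sqrt(np.mean(circular_diff(angles, ideal) ** 2)))

    return min(rmsd(IDEAL_ANGLES), rmsd(IDEAL_ANGLES_REV))
```

Under these assumptions, angles that sit exactly on the ideal positions give 0, and a uniform 10° offset from every ideal position gives 10.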
…rchitecture
- Delete SATP class and ModelType class; replace with fit_circe() function returning a tidy DataFrame directly (one row per model)
- Fold equal_ang/equal_com boolean properties into CircModelE enum directly, removing the redundant ModelType wrapper
- Convert CircE from pydantic dataclass to stdlib dataclasses.dataclass; remove dead BeforeValidator/length_1_array_to_number machinery (already handled by extract_bfgs_fit())
- Change polar_angles: pd.DataFrame|None → pd.Series|None with PAQ_IDS index; fix extraction to correctly use pd.DataFrame(raw_pa).T.iloc[0] for R matrix orientation (variables × stats)
- Add CircE.to_dict() with PAQ angle columns expanded for DataFrame construction
- Add public ipsatize() function (was private SATP._ipsatize_df())
- Fix n/correlation to use listwise deletion (complete cases), consistent with R's na.omit; resolves n discrepancies for languages with NaN PAQ values
- Update exports: fit_circe, ipsatize added; SATP, ModelType removed
- Rewrite test suite: preserve all numerical regression anchors in TestBfgsWrapper unchanged; replace TestSATP with TestFitCirce using new API; add TestCircModelEProperties; add to_dict and listwise deletion tests
- Update SATP_CircE_Analysis.qmd: use fit_circe() loop, normalize canonical CSV columns to lowercase/snake_case for comparison (language, model, chisq_can etc.), remove mixed-case column gymnastics
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
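Folding the `equal_ang`/`equal_com` flags into the enum might look roughly like the sketch below. The class name `CircModelSketch`, the `EQUAL_COM` member name, and the string values are illustrative assumptions. Only the `UNCONSTRAINED`, `EQUAL_ANG` and `CIRCUMPLEX` members and the two property names appear in this PR.

```python
from enum import Enum


class CircModelSketch(Enum):
    """Hypothetical stand-in for CircModelE with constraint flags as properties."""

    UNCONSTRAINED = "unconstrained"  # free angles, free communalities
    EQUAL_ANG = "equal_ang"          # angles fixed to equal spacing
    EQUAL_COM = "equal_com"          # communalities constrained equal (assumed name)
    CIRCUMPLEX = "circumplex"        # both constraints applied

    @property
    def equal_ang(self) -> bool:
        # True for the models whose angles are fixed rather than estimated.
        return self in (CircModelSketch.EQUAL_ANG, CircModelSketch.CIRCUMPLEX)

    @property
    def equal_com(self) -> bool:
        # True for the models with equal communalities.
        return self in (CircModelSketch.EQUAL_COM, CircModelSketch.CIRCUMPLEX)
```

Keeping the flags on the enum means callers can iterate over the members and read the constraints directly, without a separate wrapper object.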
- circe.py: generalize SATPSchema.column_alias to normalize all schema field names case-insensitively via a lowercase→canonical mapping dict, covering PAQ_IDS and 'participant' without hardcoded special cases
- circe.py: add pre-validation empty-data guard in fit_circe(); raises ValueError immediately rather than producing 4 cryptic R error rows
- circe.py: add post-ipsatization n=0 guard for cases where validation passes but no complete PAQ rows survive listwise deletion
- circe.py: fix to_dict() return annotation dict → dict[str, Any]
- _circe_wrapper.py: fix docstring example (sspy.spi.bfgs → sspyr.bfgs), add Any import, fix extract_bfgs_fit() return annotation dict → dict[str, Any]
- test_circe.py: add 8 new tests: gdiff None/float for constrained/free-angle models, rmsea_l≤rmsea≤rmsea_u invariant, ipsatize_data=False path, models=[] returns empty DataFrame, error row structure via mock, n=0 raises ValueError, case-insensitive PARTICIPANT → participant schema normalization
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
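The two guards and the listwise deletion step can be illustrated with a small sketch. The helper name `complete_case_correlation` and the error messages are hypothetical; the real checks live inside `fit_circe()` and may be arranged differently.

```python
import pandas as pd

PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]


def complete_case_correlation(data: pd.DataFrame):
    """Return (n, correlation matrix) using complete cases only."""
    # Guard 1: fail fast on empty input instead of passing it on to R.
    if data.empty:
        raise ValueError("No data provided.")

    # Listwise deletion: keep only rows with all eight PAQ values present,
    # analogous to R's na.omit.
    complete = data[PAQ_IDS].dropna()
    n = len(complete)

    # Guard 2: rows existed, but none survived listwise deletion.
    if n == 0:
        raise ValueError("No complete PAQ rows remain after listwise deletion.")

    return n, complete.corr()
```

Computing `n` from the same complete-case frame that feeds the correlation keeps the reported sample size and the matrix consistent with each other.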
…ions
- Extract _COLUMN_ALIASES module-level constant combining PAQ label names,
PAQ IDs, and participant field into a single lowercase→canonical lookup.
Built once at import time instead of inside column_alias on every call.
- Extend case-insensitive normalization to PAQ label names: 'Pleasant',
'PLEASANT' etc. now correctly map to 'PAQ1' (previously only exact-match
lowercase labels like 'pleasant' were handled).
- Simplify column_alias parser to a single dict comprehension over _COLUMN_ALIASES
replacing the two-pass rename_dict construction.
- Fix CircE dataclass field type annotations: m, chisq, d, p, cfi, gfi, agfi,
srmr, mcsc, rmsea, rmsea_l, rmsea_u declared as T|None to match from_bfgs()
which uses .get(key, None) for all fit statistics.
- Add test_satp_schema_paq_label_case_insensitive: verifies title-cased PAQ
label names ('Pleasant', 'Vibrant', ...) are normalized to PAQ_IDS.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
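A lowercase→canonical lookup of the kind described above could be built as in this sketch. The label order after 'Pleasant' and 'Vibrant' follows the usual PAQ1 to PAQ8 ordering and is an assumption here, as are the names `COLUMN_ALIASES` and `normalize_columns`; the actual constant is `_COLUMN_ALIASES` in `circe.py`.

```python
import pandas as pd

PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]
# Assumed label order for PAQ1..PAQ8.
PAQ_LABELS = [
    "pleasant", "vibrant", "eventful", "chaotic",
    "annoying", "monotonous", "uneventful", "calm",
]

# Built once at import time: every key is lowercase, every value canonical.
COLUMN_ALIASES = {
    **dict(zip(PAQ_LABELS, PAQ_IDS)),            # "pleasant" -> "PAQ1"
    **{paq.lower(): paq for paq in PAQ_IDS},     # "paq1"     -> "PAQ1"
    "participant": "participant",                # "PARTICIPANT" -> "participant"
}


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename recognised columns case-insensitively; leave others untouched."""
    rename = {
        col: COLUMN_ALIASES[col.lower()]
        for col in df.columns
        if isinstance(col, str) and col.lower() in COLUMN_ALIASES
    }
    return df.rename(columns=rename)
```

Because the lookup lowercases the incoming name before matching, 'Pleasant', 'PLEASANT' and 'pleasant' all resolve to the same canonical column.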
SATP class was replaced by fit_circe() in the refactor; the smoke test hadn't been updated and was failing in CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pant optional
- polar_angles extraction: use label-based column access ("estimates") with
iloc[:, 0] fallback instead of fragile positional .T.iloc[0]
- extract_bfgs_fit: explicit int() cast for m/d/dfnull stats to guarantee
annotation holds regardless of rpy2 storage type
- fit_circe error rows: populate all expected columns with None to prevent
pandas from promoting numeric dtypes across successful rows
- SATPSchema: make participant Optional so ipsatize_data=False callers do not
need a participant column; add runtime ValueError if ipsatize_data=True
without participant
- Tests: 3 new tests covering ipsatize_data=False sans participant, the
ValueError path, and dtype preservation under partial failure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
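The label-based extraction with a positional fallback can be sketched as below. The helper name `extract_angle_estimates` and the shape of `raw` are assumptions; in the real code the table comes from the R fit object, converted via rpy2.

```python
import pandas as pd


def extract_angle_estimates(raw, index) -> pd.Series:
    """Pull the angle estimates out of a (variables x stats) table."""
    table = pd.DataFrame(raw)
    if "estimates" in table.columns:
        # Preferred path: select the column by label.
        column = table["estimates"]
    else:
        # Fallback: labels were lost in conversion, so take the first column.
        column = table.iloc[:, 0]
    # Re-index by PAQ ID so downstream code can look angles up by name.
    return pd.Series(column.to_numpy(dtype=float), index=index, name="polar_angle")
```

Selecting by label keeps the extraction correct if the column order changes, while the fallback preserves the previous positional behaviour for unlabeled input.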
Summary
Replaces the `SATP` class-based API with a clean functional design and delivers a canonical verification notebook confirming numerical parity with the original R analysis.
- `fit_circe(data, language, datasource)`: primary public API; validates, ipsatizes, fits all four circumplex model types, and returns a tidy DataFrame directly
- `SATP` class and `ModelType` class removed: replaced with a stateless function; `equal_ang`/`equal_com` folded into `CircModelE` enum properties
- `CircE` converted from pydantic dataclass → stdlib `dataclasses.dataclass`: removes dead `BeforeValidator` machinery (already handled by `extract_bfgs_fit`)
- `polar_angles: pd.Series | None` (was `pd.DataFrame | None`): PAQ_IDS index, estimates only; `gdiff` property computes RMSD against ideal circumplex
- `ipsatize()` promoted to public module-level function
- Listwise deletion (`complete.dropna()` before correlation): consistent with R's `na.omit`
- Case-insensitive column normalization via the `_COLUMN_ALIASES` constant: handles PAQ label names (`"Pleasant"` → `"PAQ1"`), PAQ IDs (`"paq1"` → `"PAQ1"`), and participant field (`"PARTICIPANT"` → `"participant"`)
- Canonical verification notebook (`.qmd`) verifying 16 languages × 4 models against canonical R CSV; confirms RMSEA.L/U swap bug in canonical data
- New tests (`ipsatize_data=False`, RMSEA bounds ordering, gdiff, case-insensitive schema)
Test plan
- `uv run pytest test/satp/test_circe.py -v`: 41 tests pass
- `uv run quarto render docs/tutorials/SATP_CircE_Analysis.qmd`: all 12 cells execute cleanly
- `fit_circe(data, language=..., datasource=...)` returns a 4-row DataFrame with correct columns
- `CircModelE.UNCONSTRAINED.equal_ang` is `False`, `CircModelE.CIRCUMPLEX.equal_ang` is `True`
- `"PARTICIPANT"` and `"Pleasant"` column names are accepted by `SATPSchema`
🤖 Generated with Claude Code
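The RMSEA bounds invariant checked in the tests, and the kind of swap found in the canonical CSV, can be illustrated with a small sketch. The helper names are hypothetical, and any values passed in are illustrative rather than results from the analysis. The column names `rmsea_l`, `rmsea` and `rmsea_u` follow the snake_case fields used in this PR.

```python
import pandas as pd


def bounds_ordered(df: pd.DataFrame) -> pd.Series:
    """Row-wise check of the invariant rmsea_l <= rmsea <= rmsea_u."""
    return (df["rmsea_l"] <= df["rmsea"]) & (df["rmsea"] <= df["rmsea_u"])


def fix_swapped_bounds(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lower/upper bounds exchanged where they are reversed."""
    out = df.copy()
    swapped = out["rmsea_l"] > out["rmsea_u"]
    # Assign via a NumPy array so pandas does not realign by column label,
    # which would silently undo the swap.
    out.loc[swapped, ["rmsea_l", "rmsea_u"]] = (
        out.loc[swapped, ["rmsea_u", "rmsea_l"]].to_numpy()
    )
    return out
```

A check like `bounds_ordered(df).all()` is one way to flag a reference file with reversed bounds before comparing it against freshly fitted results.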