
feat: SATP circumplex SEM module — functional API and architecture refactor #130

Open
MitchellAcoustics wants to merge 6 commits into dev from analysis/satp-circe-notebook

MitchellAcoustics commented Mar 1, 2026

Summary

Replaces the class-based SATP API with a functional design and adds a verification notebook confirming numerical consistency with the canonical output of the original R analysis.

  • New fit_circe(data, language, datasource) — primary public API; validates, ipsatizes, fits all four circumplex model types, and returns a tidy DataFrame directly
  • Deleted the SATP and ModelType classes, replacing them with the stateless fit_circe() function; equal_ang/equal_com are now properties of the CircModelE enum
  • CircE converted from pydantic dataclass → stdlib dataclasses.dataclass — removes dead BeforeValidator machinery (already handled by extract_bfgs_fit)
  • polar_angles: pd.Series | None (was pd.DataFrame | None) — PAQ_IDS index, estimates only; gdiff property computes RMSD against ideal circumplex
  • ipsatize() promoted to public module-level function
  • Listwise deletion (complete.dropna() before correlation) — consistent with R's na.omit
  • Unified case-insensitive column normalization via the _COLUMN_ALIASES constant: handles PAQ label names ("Pleasant" → "PAQ1"), PAQ IDs ("paq1" → "PAQ1"), and the participant field ("PARTICIPANT" → "participant")
  • SATP CircE Analysis notebook — Quarto .qmd verifying 16 languages × 4 models against canonical R CSV; confirms RMSEA.L/U swap bug in canonical data
  • 41 tests covering numerical regression anchors, all new API paths, edge cases (n=0, error rows, ipsatize_data=False, RMSEA bounds ordering, gdiff, case-insensitive schema)
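The canonical R CSV is described above as carrying an RMSEA.L/RMSEA.U swap. A minimal sketch of how swapped rows could be detected and corrected before comparison, assuming the canonical columns have already been normalized to snake_case (rmsea, rmsea_l, rmsea_u) as the notebook does. The helpers fix_rmsea_bounds and bounds_ordered are hypothetical and not part of the Soundscapy API.

```python
import pandas as pd


def fix_rmsea_bounds(df: pd.DataFrame) -> pd.DataFrame:
    """Swap rmsea_l/rmsea_u wherever the lower bound exceeds the upper bound.

    Hypothetical helper: assumes snake_case columns rmsea, rmsea_l, rmsea_u.
    """
    out = df.copy()
    swapped = out["rmsea_l"] > out["rmsea_u"]
    # Exchange the two bounds on the offending rows only; .to_numpy() avoids
    # pandas re-aligning the values back onto their original column labels.
    out.loc[swapped, ["rmsea_l", "rmsea_u"]] = out.loc[swapped, ["rmsea_u", "rmsea_l"]].to_numpy()
    return out


def bounds_ordered(df: pd.DataFrame) -> pd.Series:
    """Row-wise check of the invariant rmsea_l <= rmsea <= rmsea_u."""
    return (df["rmsea_l"] <= df["rmsea"]) & (df["rmsea"] <= df["rmsea_u"])


# Illustrative values only, not taken from the canonical CSV.
canonical = pd.DataFrame(
    {"rmsea": [0.05, 0.07], "rmsea_l": [0.03, 0.09], "rmsea_u": [0.08, 0.05]}
)
fixed = fix_rmsea_bounds(canonical)
```

Swapping conditionally (rather than unconditionally) keeps the helper safe to run on data that is already correctly ordered.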

Test plan

  • uv run pytest test/satp/test_circe.py -v — 41 tests pass
  • uv run quarto render docs/tutorials/SATP_CircE_Analysis.qmd — all 12 cells execute cleanly
  • Verify fit_circe(data, language=..., datasource=...) returns a 4-row DataFrame with correct columns
  • Verify CircModelE.UNCONSTRAINED.equal_ang is False, CircModelE.CIRCUMPLEX.equal_ang is True
  • Verify "PARTICIPANT" and "Pleasant" column names are accepted by SATPSchema
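The CircModelE check above relies on equal_ang/equal_com being derived from the enum member itself instead of living on a separate ModelType object. A minimal stand-in illustrating that design; the class name CircModel, the member values, and the EQUAL_COM member name are assumptions (only UNCONSTRAINED, EQUAL_ANG and CIRCUMPLEX are named in this PR), as is the rule that CIRCUMPLEX constrains communalities as well as angles.

```python
from enum import Enum


class CircModel(Enum):
    """Hypothetical stand-in for CircModelE; member values are illustrative."""

    UNCONSTRAINED = "unconstrained"
    EQUAL_ANG = "equal_ang"
    EQUAL_COM = "equal_com"
    CIRCUMPLEX = "circumplex"

    @property
    def equal_ang(self) -> bool:
        # Angles are constrained to equal spacing for EQUAL_ANG and the full circumplex.
        return self in (CircModel.EQUAL_ANG, CircModel.CIRCUMPLEX)

    @property
    def equal_com(self) -> bool:
        # Communalities are constrained equal for EQUAL_COM and the full circumplex (assumed).
        return self in (CircModel.EQUAL_COM, CircModel.CIRCUMPLEX)
```

Because the flags are computed from the member, there is no second object that can drift out of sync with the enum.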

🤖 Generated with Claude Code

Andrew Mitchell and others added 6 commits March 1, 2026 02:06
…ebook

- Add `gdiff` computed property to `CircE` dataclass: RMSD between fitted
  polar angles and ideal 45°-spaced circumplex positions. Returns None for
  models with fixed angles (EQUAL_ANG, CIRCUMPLEX). Adds module-level
  `_IDEAL_ANGLES` and `_IDEAL_ANGLES_REV` constants mirroring the R
  `sem_funcs.R` implementation.

- Add `test/satp/fixtures/sem-fit-ipsatized-canonical.csv`: canonical
  reference output from the original R analysis (2024-06-13, SATP v1.4,
  16 languages × 4 models). Documents a known RMSEA.L/RMSEA.U swap bug
  in the original R CSV export code.

- Add `docs/tutorials/SATP_CircE_Analysis.qmd`: Quarto notebook replicating
  the SATP circumplex SEM analysis using Soundscapy. Confirms numerical
  consistency against the canonical reference output: all df values match exactly, RMSEA
  bounds are correctly ordered, and 6 reflected (equivalent) angular
  solutions are detected and documented.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rchitecture

- Delete SATP class and ModelType class; replace with fit_circe() function
  returning a tidy DataFrame directly (one row per model)
- Fold equal_ang/equal_com boolean properties into CircModelE enum directly,
  removing the redundant ModelType wrapper
- Convert CircE from pydantic dataclass to stdlib dataclasses.dataclass;
  remove dead BeforeValidator/length_1_array_to_number machinery (already
  handled by extract_bfgs_fit())
- Change polar_angles: pd.DataFrame|None → pd.Series|None with PAQ_IDS index;
  fix extraction to correctly use pd.DataFrame(raw_pa).T.iloc[0] for R matrix
  orientation (variables × stats)
- Add CircE.to_dict() with PAQ angle columns expanded for DataFrame construction
- Add public ipsatize() function (was private SATP._ipsatize_df())
- Fix n/correlation to use listwise deletion (complete cases), consistent with
  R's na.omit — resolves n discrepancies for languages with NaN PAQ values
- Update exports: fit_circe, ipsatize added; SATP, ModelType removed
- Rewrite test suite: preserve all numerical regression anchors in
  TestBfgsWrapper unchanged; replace TestSATP with TestFitCirce using new API;
  add TestCircModelEProperties; add to_dict and listwise deletion tests
- Update SATP_CircE_Analysis.qmd: use fit_circe() loop, normalize canonical
  CSV columns to lowercase/snake_case for comparison (language, model, chisq_can
  etc.), remove mixed-case column gymnastics
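  Two of the changes above, the public ipsatize() and listwise deletion before the correlation step, can be sketched in plain pandas. This assumes ipsatization subtracts each participant's mean over all of their PAQ ratings and that deletion runs after ipsatization; ipsatize_sketch and complete_case_corr are hypothetical illustrations, not Soundscapy's ipsatize() or fit_circe() signatures.

```python
import numpy as np
import pandas as pd

PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]


def ipsatize_sketch(df: pd.DataFrame, participant: str = "participant") -> pd.DataFrame:
    """Centre each participant's PAQ ratings on that participant's overall mean."""
    long = df[[participant, *PAQ_IDS]].melt(id_vars=participant)
    # Mean over every PAQ rating the participant gave; NaNs are skipped.
    person_mean = long.groupby(participant)["value"].mean()
    out = df.copy()
    out[PAQ_IDS] = df[PAQ_IDS].sub(df[participant].map(person_mean), axis=0)
    return out


def complete_case_corr(df: pd.DataFrame) -> tuple[int, pd.DataFrame]:
    """Listwise deletion (as with R's na.omit), then the PAQ correlation matrix."""
    complete = df[PAQ_IDS].dropna()
    return len(complete), complete.corr()


# Illustrative toy data (not SATP data): three participants, two rows each.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(1, 6, size=(6, 8)).astype(float), columns=PAQ_IDS)
toy.insert(0, "participant", ["a", "a", "b", "b", "c", "c"])
toy.loc[2, "PAQ3"] = np.nan  # one incomplete row
n, corr = complete_case_corr(ipsatize_sketch(toy))
```

Taking n from the same complete-case frame that feeds the correlation keeps the reported sample size consistent with the matrix actually fitted.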

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- circe.py: generalize SATPSchema.column_alias to normalize all schema field
  names case-insensitively via a lowercase→canonical mapping dict, covering
  PAQ_IDS and 'participant' without hardcoded special cases
- circe.py: add pre-validation empty-data guard in fit_circe() — raises
  ValueError immediately rather than producing 4 cryptic R error rows
- circe.py: add post-ipsatization n=0 guard for cases where validation passes
  but no complete PAQ rows survive listwise deletion
- circe.py: fix to_dict() return annotation dict → dict[str, Any]
- _circe_wrapper.py: fix docstring example (sspy.spi.bfgs → sspyr.bfgs),
  add Any import, fix extract_bfgs_fit() return annotation dict → dict[str, Any]
- test_circe.py: add 8 new tests — gdiff None/float for constrained/free-angle
  models, rmsea_l≤rmsea≤rmsea_u invariant, ipsatize_data=False path,
  models=[] returns empty DataFrame, error row structure via mock, n=0 raises
  ValueError, case-insensitive PARTICIPANT → participant schema normalization
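  The two n=0 guards can be illustrated with a small hypothetical helper. For brevity it collapses both checks into one function; per the bullets above, the PR runs the empty-data check before schema validation and the complete-case check after ipsatization. The helper name and error messages are invented for illustration.

```python
import pandas as pd

PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]


def check_fit_inputs(data: pd.DataFrame) -> int:
    """Return the complete-case n, or raise ValueError before any model is fitted."""
    # Guard 1: fail fast on empty input instead of producing one R error row per model.
    if data.empty:
        raise ValueError("fit_circe() received an empty DataFrame")
    # Guard 2: rows may exist, yet none may be complete across all eight PAQs.
    n = len(data[PAQ_IDS].dropna())
    if n == 0:
        raise ValueError("no complete PAQ rows remain after listwise deletion")
    return n
```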

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ions

- Extract _COLUMN_ALIASES module-level constant combining PAQ label names,
  PAQ IDs, and participant field into a single lowercase→canonical lookup.
  Built once at import time instead of inside column_alias on every call.
- Extend case-insensitive normalization to PAQ label names: 'Pleasant',
  'PLEASANT' etc. now correctly map to 'PAQ1' (previously only exact-match
  lowercase labels like 'pleasant' were handled).
- Simplify column_alias parser to a single dict comprehension over _COLUMN_ALIASES
  replacing the two-pass rename_dict construction.
- Fix CircE dataclass field type annotations: m, chisq, d, p, cfi, gfi, agfi,
  srmr, mcsc, rmsea, rmsea_l, rmsea_u declared as T|None to match from_bfgs()
  which uses .get(key, None) for all fit statistics.
- Add test_satp_schema_paq_label_case_insensitive: verifies title-cased PAQ
  label names ('Pleasant', 'Vibrant', ...) are normalized to PAQ_IDS.
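  A plain-pandas sketch of the _COLUMN_ALIASES idea: one lowercase → canonical lookup built once at import time and applied with a single dict comprehension. The PAQ label ordering below is an assumption (only "Pleasant" → "PAQ1" is stated in this PR), and normalize_columns is a hypothetical stand-in for the SATPSchema.column_alias parser, not its actual signature.

```python
import pandas as pd

PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]
# Assumed label order for PAQ1..PAQ8.
PAQ_LABELS = ["pleasant", "vibrant", "eventful", "chaotic",
              "annoying", "monotonous", "uneventful", "calm"]

# Built once at import time: lowercase alias -> canonical schema name.
COLUMN_ALIASES: dict[str, str] = {
    **dict(zip(PAQ_LABELS, PAQ_IDS)),         # "pleasant" -> "PAQ1"
    **{pid.lower(): pid for pid in PAQ_IDS},  # "paq1" -> "PAQ1"
    "participant": "participant",
}


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename any column whose lowercased name is a known alias; leave others alone."""
    return df.rename(
        columns={
            col: COLUMN_ALIASES[col.lower()]
            for col in df.columns
            if isinstance(col, str) and col.lower() in COLUMN_ALIASES
        }
    )
```

Lowercasing the incoming name, rather than enumerating case variants, is what makes "Pleasant", "PLEASANT" and "pleasant" all resolve through one entry.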

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SATP class was replaced by fit_circe() in the refactor; the smoke test
hadn't been updated and was failing in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pant optional

- polar_angles extraction: use label-based column access ("estimates") with
  iloc[:, 0] fallback instead of fragile positional .T.iloc[0]
- extract_bfgs_fit: explicit int() cast for m/d/dfnull stats to guarantee
  the int annotation holds regardless of the rpy2 storage type
- fit_circe error rows: populate all expected columns with None to prevent
  pandas from promoting numeric dtypes across successful rows
- SATPSchema: make participant Optional so ipsatize_data=False callers do not
  need a participant column; add runtime ValueError if ipsatize_data=True
  without participant
- Tests: 3 new tests covering ipsatize_data=False sans participant, the
  ValueError path, and dtype preservation under partial failure
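  The error-row fix relies on every row carrying the same keys. A minimal sketch with a hypothetical, abbreviated column list and illustrative numbers (not SATP results): error rows start from dict.fromkeys(EXPECTED_COLUMNS), so failed fits contribute None, which pandas stores as NaN in the numeric columns.

```python
import pandas as pd

# Hypothetical, abbreviated output schema; the real fit_circe() frame has more columns.
EXPECTED_COLUMNS = ["language", "model", "n", "chisq", "rmsea", "rmsea_l", "rmsea_u", "error"]
NUMERIC_COLUMNS = ["n", "chisq", "rmsea", "rmsea_l", "rmsea_u"]


def error_row(language: str, model: str, message: str) -> dict:
    """Row for a failed fit: every expected column present, fit statistics None."""
    row = dict.fromkeys(EXPECTED_COLUMNS)
    row.update(language=language, model=model, error=message)
    return row


# Illustrative values only.
ok_row = {"language": "eng", "model": "CIRCUMPLEX", "n": 120, "chisq": 35.2,
          "rmsea": 0.06, "rmsea_l": 0.03, "rmsea_u": 0.09, "error": None}
results = pd.DataFrame(
    [ok_row, error_row("eng", "UNCONSTRAINED", "optimizer failed")],
    columns=EXPECTED_COLUMNS,
)
# None in a numeric column becomes NaN, so these columns keep a numeric dtype.
```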

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>