Releases: devmance/SECI

SECI 1.0.0 — first public release

27 May 04:27

SECI — first public release

An open multi-rater benchmark for identity-scaffolded large language models, with a three-arm protocol, claim-decomposed reporting, and variance decomposition.

Paper

A Variance-Decomposed Identity-Architecture Benchmark for Large Language Models — Nate Travis, Devmance Labs.

Three claims

| Claim | Comparison | What it measures |
|---|---|---|
| Claim A: Framework contribution | arm_a (full SE framework) vs arm_c (kernel only) | Whether the framework wrapping above the identity kernel produces a measurable per-character delta |
| Claim B: Scaffolding vs base | arm_a or arm_c vs arm_b (no identity) | Whether identity scaffolding lifts dimension scores above a no-identity null |
| Claim C: Cross-architecture portability | Per-dimension Pearson r on identity rankings across models | Whether identity rankings on a dimension replicate when the model changes |
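
The three comparisons can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the API of src/seci/analysis/claims.py: the function names and the array layout (one score per identity for a single dimension) are hypothetical, and the correlation is computed on raw per-identity scores because this release note does not say whether the paper correlates scores or ranks.

```python
import numpy as np


def claim_deltas(arm_a, arm_b, arm_c):
    """Mean per-identity score differences for one dimension and one model.

    Each argument is a 1-D sequence of scores indexed by identity
    (hypothetical layout, chosen for illustration only).
    """
    arm_a, arm_b, arm_c = (np.asarray(x, dtype=float) for x in (arm_a, arm_b, arm_c))
    return {
        # Claim A: full framework vs kernel only
        "claim_a": float(np.mean(arm_a - arm_c)),
        # Claim B: scaffolded arms vs the no-identity arm
        "claim_b_full": float(np.mean(arm_a - arm_b)),
        "claim_b_kernel": float(np.mean(arm_c - arm_b)),
    }


def cross_model_r(scores):
    """Claim C: mean pairwise Pearson r between models on one dimension.

    scores is a 2-D array of shape (n_models, n_identities).
    """
    scores = np.asarray(scores, dtype=float)
    r = np.corrcoef(scores)                 # each row is one model
    upper = np.triu_indices(r.shape[0], k=1)  # unique model pairs only
    return float(r[upper].mean())
```

A positive claim_a value would indicate that the full framework scores above the kernel-only arm on that dimension; what counts as "passing" a claim is defined in the paper, not here.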

Findings on the reference dataset (7 models × 36 identities × 3 arms)

  • Per-identity 6-D fingerprint shape replicates across model architectures: mean cross-model Pearson r = +0.934 across 101 pairs, and 99% of pairs have r > +0.7.
  • The three claims diverge per dimension. Five of the six dimensions pass Claim A; three of six pass Claim B (NCG and TP score lower than the base-model arm, arm_b); per-dimension identity rankings replicate across architectures only modestly except on TP, where the variance decomposition locates the signal primarily in model-architecture differences.
  • Diagnostic warnings are auto-generated when between-model variance exceeds between-identity variance (TP at 1.60×) or when per-dimension cross-model identity ranking is near zero (NCG +0.07, DEA +0.06).
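
The two warning conditions in the last bullet can be illustrated with a short sketch. It assumes a (models × identities) score matrix for one dimension and uses the variance of row and column means as a simple stand-in for a full variance decomposition. The function name, the flag names, and the 0.1 cutoff for "near zero" are hypothetical; only the "between-model variance exceeds between-identity variance" rule (ratio > 1) comes from the text above. The actual logic lives in src/seci/analysis/variance.py and may differ.

```python
import numpy as np


def variance_flags(scores, r_threshold=0.1):
    """Flag a dimension whose signal is dominated by model differences.

    scores: 2-D array, shape (n_models, n_identities), one dimension.
    r_threshold: assumed cutoff for a "near zero" cross-model correlation.
    """
    scores = np.asarray(scores, dtype=float)

    # Variance of per-model means vs variance of per-identity means.
    between_model = scores.mean(axis=1).var(ddof=1)
    between_identity = scores.mean(axis=0).var(ddof=1)
    ratio = between_model / between_identity

    # Mean Pearson r over unique model pairs.
    r = np.corrcoef(scores)
    mean_r = r[np.triu_indices(r.shape[0], k=1)].mean()

    warnings = []
    if ratio > 1.0:
        warnings.append("model_variance_dominates")
    if abs(mean_r) < r_threshold:
        warnings.append("ranking_near_zero")
    return {"ratio": float(ratio), "mean_r": float(mean_r), "warnings": warnings}
```

Under this sketch, a dimension like TP (ratio reported as 1.60×) would trigger the first flag, and dimensions like NCG and DEA (r of +0.07 and +0.06) would trigger the second.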

What is included

  • IdentitySubstrate abstraction (src/seci/substrate/)
  • Claim-decomposed analysis layer (src/seci/analysis/claims.py)
  • Variance decomposition + warning flags (src/seci/analysis/variance.py)
  • Re-analysis driver (examples/rescore_dataset.py)
  • Pre-computed analysis outputs on the reference dataset
  • Three publication figures (PNG + PDF)
  • arXiv-ready LaTeX paper source

Reproducibility

pip install -r requirements.txt
python -m examples.rescore_dataset \
    --data-dir <path-to-analysis> \
    --output-dir validation_outputs

Analyses regenerate from pre-computed scores in under one minute on a laptop.

License

MIT.