Releases: devmance/SECI
SECI 1.0.0 — first public release
An open multi-rater benchmark for identity-scaffolded large language models, with a three-arm protocol, claim-decomposed reporting, and variance decomposition.
Paper
A Variance-Decomposed Identity-Architecture Benchmark for Large Language Models — Nate Travis, Devmance Labs.
Three claims
| Claim | Comparison | What it measures |
|---|---|---|
| Claim A — Framework contribution | arm_a (full SE framework) vs arm_c (kernel only) | Whether the framework wrapping above the identity kernel produces a measurable per-character delta |
| Claim B — Scaffolding vs base | arm_a or arm_c vs arm_b (no identity) | Whether identity scaffolding lifts dimension scores above a no-identity null |
| Claim C — Cross-architecture portability | Per-dimension Pearson r on identity rankings across models | Whether identity rankings on a dimension replicate when the model changes |
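The three comparisons can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `src/seci/analysis/claims.py`. It assumes scores are stored in a dict keyed by `(model, identity, arm)`, with a list of per-dimension scores as the value, and that `arm_b` cells are keyed the same way. The helper names `pearson`, `arm_mean`, `claim_delta`, and `claim_c_ranking_r` are hypothetical. Claims A and B reduce to a difference of arm means, which equals the mean paired per-character delta when every cell is present. Claim C is computed as the mean Pearson r over model pairs on raw identity scores. The release's pass/fail criteria are not reproduced here.

```python
import math
from itertools import combinations
from statistics import mean

# Assumed layout (hypothetical): scores[(model, identity, arm)] -> per-dimension scores.


def pearson(xs, ys):
    """Pearson correlation; returns nan when either input has zero variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm > 0 else math.nan


def arm_mean(scores, arm, dim):
    """Mean score on one dimension over every (model, identity) cell of an arm."""
    return mean(v[dim] for (_, _, a), v in scores.items() if a == arm)


def claim_delta(scores, arm_hi, arm_lo, dim):
    """Claim A: arm_hi="arm_a", arm_lo="arm_c". Claim B: arm_lo="arm_b"."""
    return arm_mean(scores, arm_hi, dim) - arm_mean(scores, arm_lo, dim)


def claim_c_ranking_r(scores, arm, dim):
    """Mean Pearson r, over all model pairs, of identity scores on one dimension."""
    models = sorted({m for (m, _, _) in scores})
    identities = sorted({i for (_, i, _) in scores})
    rs = [
        pearson([scores[(a, i, arm)][dim] for i in identities],
                [scores[(b, i, arm)][dim] for i in identities])
        for a, b in combinations(models, 2)
    ]
    return sum(rs) / len(rs)


# Toy data: 2 models x 3 identities x 3 arms x 2 dimensions (made-up values).
toy = {}
for model, offset in [("m1", 0.0), ("m2", -1.0)]:
    for k, ident in enumerate(["i1", "i2", "i3"]):
        base = 3.0 + k + offset
        toy[(model, ident, "arm_a")] = [base, 2.0]
        toy[(model, ident, "arm_c")] = [base - 0.5, 2.0]
        toy[(model, ident, "arm_b")] = [base - 1.0, 2.5]

print(claim_delta(toy, "arm_a", "arm_c", 0))  # 0.5
print(claim_delta(toy, "arm_a", "arm_b", 1))  # -0.5
print(claim_c_ranking_r(toy, "arm_a", 0))     # 1.0
```

On the toy data, `arm_a` sits 0.5 above `arm_c` on dimension 0, and both toy models order the identities identically, so the Claim C correlation is 1.0. A dimension with constant scores yields `nan` rather than a spurious correlation.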
Findings on the reference dataset (7 models × 36 identities × 3 arms)
- Per-identity 6-D fingerprint shape replicates across model architectures: mean cross-model Pearson r = +0.934 across 101 pairs, with 99% of pairs at r > +0.7.
- The three claims diverge by dimension. Five of six dimensions pass Claim A, and three of six pass Claim B (NCG and TP score lower than the base-model arm). For Claim C, per-dimension identity rankings replicate across architectures only modestly, except on TP, where the variance decomposition locates the signal primarily in model-architecture differences.
- Diagnostic warnings are auto-generated when between-model variance exceeds between-identity variance (TP, at 1.60×) or when the per-dimension cross-model identity-ranking correlation is near zero (NCG at r = +0.07, DEA at r = +0.06).
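The fingerprint correlation and the two warning conditions in the findings above can be sketched as follows. This is a hedged sketch, not the estimator in `src/seci/analysis/variance.py`, which may differ. It takes a single arm as `grid[model][identity]` (a per-dimension score vector) and uses a main-effects-only split: the variance of per-model means against the variance of per-identity means. The `near_zero=0.1` cutoff is a hypothetical value, since the release does not state its threshold, and all function names are illustrative.

```python
import math
from itertools import combinations
from statistics import mean, pvariance

# Assumed layout (hypothetical): grid[model][identity] -> per-dimension scores, one arm.


def pearson(xs, ys):
    """Pearson correlation; returns nan when either input has zero variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm > 0 else math.nan


def fingerprint_rs(grid):
    """Correlate each identity's score vector (its fingerprint) across every model pair."""
    return [pearson(grid[a][i], grid[b][i])
            for a, b in combinations(sorted(grid), 2)
            for i in sorted(grid[a])]


def variance_ratio(grid, dim):
    """Variance of per-model means over variance of per-identity means (main effects only)."""
    models = sorted(grid)
    idents = sorted(grid[models[0]])
    model_means = [mean(grid[m][i][dim] for i in idents) for m in models]
    ident_means = [mean(grid[m][i][dim] for m in models) for i in idents]
    denom = pvariance(ident_means)
    return pvariance(model_means) / denom if denom > 0 else math.inf


def ranking_r(grid, dim):
    """Mean Pearson r, over model pairs, of identity scores on one dimension."""
    idents = sorted(next(iter(grid.values())))
    rs = [pearson([grid[a][i][dim] for i in idents], [grid[b][i][dim] for i in idents])
          for a, b in combinations(sorted(grid), 2)]
    return sum(rs) / len(rs)


def diagnostics(grid, dim, near_zero=0.1):
    """Warning strings for one dimension; near_zero is a hypothetical cutoff."""
    warnings = []
    ratio = variance_ratio(grid, dim)
    if ratio > 1.0:
        warnings.append(f"dim {dim}: between-model variance is {ratio:.2f}x between-identity")
    r = ranking_r(grid, dim)
    if abs(r) < near_zero:
        warnings.append(f"dim {dim}: cross-model identity ranking r = {r:+.2f} is near zero")
    return warnings


# Toy grid: 2 models x 2 identities x 3 dimensions (made-up values).
grid = {
    "m1": {"i1": [1.0, 2.0, 3.0], "i2": [2.0, 3.0, 4.0]},
    "m2": {"i1": [3.0, 4.0, 5.0], "i2": [4.0, 5.0, 6.0]},
}
print(fingerprint_rs(grid))     # [1.0, 1.0]
print(variance_ratio(grid, 0))  # 4.0
print(diagnostics(grid, 0))
```

In the toy grid the two models differ by a constant 2 points, so each identity's fingerprint shape correlates perfectly across models. The between-model variance is still 4 times the between-identity variance, and the ratio warning fires. Perfect shape agreement can therefore coexist with a variance split dominated by the model term.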
What is included
- IdentitySubstrate abstraction (`src/seci/substrate/`)
- Claim-decomposed analysis layer (`src/seci/analysis/claims.py`)
- Variance decomposition and warning flags (`src/seci/analysis/variance.py`)
- Re-analysis driver (`examples/rescore_dataset.py`)
- Pre-computed analysis outputs on the reference dataset
- Three publication figures (PNG + PDF)
- arXiv-ready LaTeX paper source
Reproducibility
```bash
pip install -r requirements.txt
python -m examples.rescore_dataset \
  --data-dir <path-to-analysis> \
  --output-dir validation_outputs
```

Analyses regenerate from pre-computed scores in under one minute on a laptop.
License
MIT.