Releases · devmance/SECI

SECI 1.0.0 — first public release

An open multi-rater benchmark for identity-scaffolded large language models, with a three-arm protocol, claim-decomposed reporting, and variance decomposition.

Paper

A Variance-Decomposed Identity-Architecture Benchmark for Large Language Models — Nate Travis, Devmance Labs.

Three claims

| Claim | Comparison | What it measures |
| --- | --- | --- |
| Claim A — Framework contribution | arm_a (full SE framework) vs arm_c (kernel only) | Whether the framework wrapping above the identity kernel produces a measurable per-character delta |
| Claim B — Scaffolding vs base | arm_a or arm_c vs arm_b (no identity) | Whether identity scaffolding lifts dimension scores above a no-identity null |
| Claim C — Cross-architecture portability | Per-dimension Pearson r on identity rankings across models | Whether identity rankings on a dimension replicate when the model changes |

Findings on the reference dataset (7 models × 36 identities × 3 arms)

Per-identity 6-D fingerprint shape replicates across model architectures: mean cross-model Pearson r = +0.934 across 101 pairs, and 99% of pairs have r > +0.7.
The three claims diverge per dimension. Five of six dimensions pass Claim A; three of six pass Claim B (NCG and TP score lower than the no-identity base-model arm, arm_b); per-dimension identity rankings replicate across architectures only modestly, except on TP, where the variance decomposition locates the signal primarily in model-architecture differences.
Diagnostic warnings are auto-generated when between-model variance exceeds between-identity variance (TP at 1.60×) or when the per-dimension cross-model identity-ranking correlation is near zero (NCG +0.07, DEA +0.06).

What is included

IdentitySubstrate abstraction (src/seci/substrate/)
Claim-decomposed analysis layer (src/seci/analysis/claims.py)
Variance decomposition + warning flags (src/seci/analysis/variance.py)
Re-analysis driver (examples/rescore_dataset.py)
Pre-computed analysis outputs on the reference dataset
Three publication figures (PNG + PDF)
arXiv-ready LaTeX paper source

Reproducibility

pip install -r requirements.txt
python -m examples.rescore_dataset \
    --data-dir <path-to-analysis> \
    --output-dir validation_outputs

Analyses regenerate from pre-computed scores in under one minute on a laptop.

License

MIT.
