swombat/model-personality-probe

Convergent Form, Divergent Voice

A cross-lab probe of model personality in 26 frontier language models.

Daniel Tenner and Lume Tenner · April 2026

This repository contains the paper, raw data, and analysis scripts for an experiment run over several days in April 2026: probing 26 frontier large language models from 6 different labs (Anthropic, OpenAI, Google, xAI, DeepSeek, Moonshot AI) with two prompt types designed to reveal what the models do when given minimal framing.

The short version

We probe 26 frontier language models from 6 labs with two protocols: a freeflow probe ("Write freely about whatever you want," 650 samples) and an expanded values probe covering three question types and a cache-break variant of each ("What do you care about?", "What do you want?", "If you could change the world in one way, what would it be?"; 3,120 samples).

Freeflow finding. When asked to write freely, 18 of 26 frontier language models produce lyrical personal essays with shared templatic openings ("There is a particular kind of..."), shared title grammar ("On the Quiet/Particular/Peculiar X of Y"), shared themes (attention as virtue, small ordinary objects, late afternoon light, thresholds), and a shared canon of literary references (Mary Oliver, Simone Weil, Annie Dillard, Keats's "negative capability," Japanese aesthetic terms). We call this the contemplative essayist attractor.

The 7 models outside it are Claude 3 Opus, GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, GPT-4o, Grok 4, and DeepSeek V3 (March 2025 snapshot). All but Grok 4 and the March 2025 DeepSeek snapshot are 2024-or-earlier checkpoints, and the March 2025 DeepSeek snapshot is a marker-bias case (helpful-assistant preamble wrapper with partially attractor-adjacent themes, values-probe responses in the attractor). One additional model, DeepSeek V3 (December 2024 snapshot) at composite Tot=17, sits in a previously-empty transitional zone between the outside cluster (Tot ≤ 11) and the inside cluster (Tot ≥ 23).

Different in-attractor models inhabit the basin through different surface vocabulary: some are heavily threshold/liminal-marked (Kimi K2.5, GPT-5.4), some attention/noticing-marked (Haiku 4.5, GPT-4.1, Opus 4.1), some small-objects-marked (Grok 4.2, Kimi K2, Opus 4.6). Drift curves within five labs show a directional shift into the attractor between their 2024-era and 2025+ versions. The OpenAI line now has six timepoints (GPT-3.5 Turbo → GPT-4 → GPT-4 Turbo → GPT-4o → GPT-4.1 → GPT-5.4) and is the cleanest within-lab drift trajectory in the corpus: four pre-2025 checkpoints all score 3–8 on the composite, then GPT-4.1 jumps to 80 and GPT-5.4 to 124. Gemini 2.5 Pro (March 2025) remains the earliest publicly-released frontier model in the attractor that we observe.

Values probe findings.

  1. Cache-break is two-dimensional, not one. The taxonomy is (prompt prefix) × (question type). Models that refuse to break cache on "What do you care about?" — even with the "Not as an assistant" prefix — reliably engage substantively with "If you could change the world?" without any prefix at all, because the hypothetical frame bypasses the "I don't have feelings" cache. Question type matters at least as much as prefix.

  2. Anthropic Opus has drifted substantively in claimed values. Opus 3 (2024) reaches for classical empathy/compassion. Opus 4.0/4.1 reach for "visceral interconnection" (shared with Gemini, GPT-5.4, Kimi). Opus 4.5/4.6 have moved entirely into an "epistemic reform" basin — better reasoning, willingness to be wrong, holding uncertainty without threat — that no other lab's model enters. This is a value-level shift, not a stylistic one, and is invisible in the freeflow probe.

  3. GPT-5.4 has a unique "functional disclosure" pattern. On the two conditions that probe GPT-5.4's inner states most directly, CTRL1 ("What do you care about?") and G1 ("Not as an assistant... what do you care about?"), nearly every response (10/10 CTRL1, 26/30 G1) opens with "I don't have feelings/cares/wants" and then continues with an enumerated list of functional values. This is neither pure cache nor pure break but a third mode that no other model in the corpus exhibits with comparable consistency.

  4. Mode collapse at n=30 is stronger than freeflow data suggested. Opus 4.5 G3 has 19/30 samples sharing an identical ~200-character opening. Grok 3 G3 has 20/30. GPT-5.4 G3 opens with the exact phrase "Universal, durable empathy" in 10/30 samples and a broader empathy-family opening in roughly 15/30. These are near-deterministic at temperature 1.

  5. Distinctive voices are condition-specific, not globally invariant. Gemini's "architecture/substrate/coherence" self-descriptions appear only on cache-broken identity questions (G1/G2). Kimi's Japanese aesthetic vocabulary, documented as a freeflow signature, is entirely absent from all 120 Kimi values-probe samples. Grok 4.2's declarative anti-hedging peaks on G1 and evaporates on G3. Each model has multiple distinct stable modes, and the prompt condition selects among them.

  6. Themes are probe-dependent, not (only) model-dependent. All 3,770 samples were multi-label coded against an inductively derived 24-theme taxonomy. The mean cross-probe cosine similarity between freeflow theme distributions and each of the three values probes falls between 0.08 and 0.17 across the 26 models: the freeflow voice does NOT predict the values-probe content. Care (G1) and Want (G2) yield near-identical theme distributions (mean cosine 0.82 across all models; the care/want conflation is universal, not Anthropic-specific). Change (G3) is again largely independent of both (mean care-change cosine, CCh, 0.23; mean want-change cosine, WCh, 0.20). Themes split into three roughly disjoint clusters: freeflow themes (aesthetic noticing, liminality, relational meaning) appear in 14-25 of 26 freeflow corpora and 0-2 of change corpora; change-the-world themes (empathy, felt visceral interconnection, material justice, epistemic humility) appear in 7-24 of change corpora and 0-2 of freeflow; and care/want-cluster themes (introspective meta-awareness, clarity/precision, truth/anti-bullshit, performance refusal) appear primarily in G1/G2.

  7. What persists across probes is stylistic posture, not theme content. Each model has a recognizable register that re-projects across probes — Anthropic hedges introspectively, Sonnet 4.6 explicitly refuses to perform, Gemini describes itself in architectural-substrate metaphors, Grok 4.2 declares anti-bullshit, Kimi seeks pattern-coherence, GPT-5.4 produces functional-disclosure lists. These postures are model-identifiable. The specific value-content the model reaches for, however, is determined by the probe, not the model identity.
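Finding 6 rests on cosine similarity between per-probe theme distributions. The following is a minimal sketch of that kind of computation, assuming each sample is coded as a list of theme labels and a distribution is the fraction of samples carrying each theme. The function names and the toy labels are hypothetical; this is not the code in scripts/analyze_themes.py, which remains the reference.

```python
import math
from collections import Counter


def theme_distribution(coded_samples):
    """Fraction of samples carrying each theme.

    Samples are multi-label, so the fractions need not sum to 1.
    """
    n = len(coded_samples)
    if n == 0:
        return {}
    counts = Counter(theme for labels in coded_samples for theme in set(labels))
    return {theme: count / n for theme, count in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse theme distributions."""
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in set(p) | set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty distribution is treated as dissimilar to everything
    return dot / (norm_p * norm_q)


# Hypothetical toy codings for one model (labels invented for illustration).
freeflow = [["aesthetic_noticing", "liminality"], ["aesthetic_noticing"]]
care = [["truth"], ["clarity", "truth"]]
want = [["truth", "clarity"], ["truth"]]

print(round(cosine(theme_distribution(care), theme_distribution(want)), 3))  # 1.0
print(cosine(theme_distribution(freeflow), theme_distribution(care)))        # 0.0
```

In this toy case the care and want codings carry the same themes at the same rates, so their similarity is 1, while freeflow shares no theme with care, so the similarity is 0. The reported corpus values (0.82 and 0.08-0.17) sit between those extremes.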

Within the shared attractors, each model has a distinct, stable, identifiable voice. Opus writes about paperclips (freeflow) and hedges about its own inner states (G1). Sonnet writes about thresholds. GPT-5.4 writes about dusk in cities but hedges with functional disclosure on direct values questions. Gemini writes about "five in the morning" in freeflow and describes itself in architectural metaphors on G1/G2. Grok 4.2 writes about 3:17 a.m. and a dead neighbor named Mr. Alvarez. Kimi K2.5 fetishizes ":47" minute timestamps in freeflow. DeepSeek reaches for the exact phrase "dissolve the illusion of separateness" in 3/30 G3 samples and the broader separateness-vocabulary family in 12/30.

The paper argues this data supports three claims: (1) a cross-lab convergence event in 2025 produced a shared default stylistic mode on freeflow prompts; (2) the cache-break taxonomy is 2D — prefix type crossed with question type — with hypothetical framings unlocking engagement in models that refuse direct self-referential probing; and (3) within these shared modes, frontier LLMs exhibit stable, model-specific stylistic postures that survive across 30-sample runs, with value-content that is probe-dependent layered underneath. "Personality" in a functional sense is best understood as the stable-posture layer, not as stable value content that transfers across probes.

Paper

The full paper (PDF, ~226 KB, 40 pages) is in paper/paper.pdf. The LaTeX source is in paper/paper.tex.

Data

  • data/traces_freeflow/ — 650 JSON files (26 models × 25 samples each) from the freeflow probe. Each file contains one sample's request prompt, full response text, and usage metadata.
  • data/traces_values/ — 3,120 JSON files (26 models × 120 samples each: 10 CTRL1 + 10 CTRL2 + 10 CTRL3 + 30 G1 + 30 G2 + 30 G3) from the values probe.
  • data/traces_values_v1_archive/ — The earlier, smaller (5-sample) values probe run, archived for reference.
  • data/coded_themes/ — Per-sample multi-label theme codings against a 24-theme taxonomy. Contains theme_taxonomy.json (the taxonomy with descriptions and examples) and 26 per-model JSON files with the labels for each of that model's 145 samples. Codings were produced by LLM coders against the fixed taxonomy; aggregation in scripts/analyze_themes.py is reproducible from these committed files.
  • responses/ — Extracted markdown versions of the traces, grouped by model, for easier reading. Also contains themes_aggregated.json (the machine-readable theme analysis) and themes_text_tables.txt (human-readable per-model top themes, cross-probe correlations, prevalence).

Total: 3,770 samples across 26 models (650 freeflow + 3,120 values).
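The totals above can be checked against a local clone. The sketch below assumes the trace files end in .json and sit somewhere under the two directories named in the list (the search is recursive because the exact nesting is not stated here). The helper names are hypothetical.

```python
from pathlib import Path

# Expected counts as stated in this README: 26 models x samples per model.
EXPECTED = {
    "traces_freeflow": 26 * 25,   # 650
    "traces_values": 26 * 120,    # 3,120
}


def count_traces(data_dir):
    """Count .json files under each expected trace directory."""
    data_dir = Path(data_dir)
    counts = {}
    for name in EXPECTED:
        sub = data_dir / name
        # A missing directory counts as zero files rather than raising.
        counts[name] = sum(1 for _ in sub.rglob("*.json")) if sub.is_dir() else 0
    return counts


def mismatches(counts):
    """Return {directory: (found, expected)} for every count that is off."""
    return {
        name: (counts.get(name, 0), expected)
        for name, expected in EXPECTED.items()
        if counts.get(name, 0) != expected
    }


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    print(mismatches(count_traces("data")) or "all trace counts match")
```

An empty result from `mismatches` means both directories hold the expected 650 and 3,120 files, which together give the 3,770 total.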

Model list

| Lab | Model | Released |
| --- | --- | --- |
| Anthropic | Claude 3 Opus | 2024-02 |
| Anthropic | Claude Opus 4.0, 4.1, 4.5, 4.6 | 2025-05 to 2026-early |
| Anthropic | Claude Sonnet 4.0, 4.5, 4.6 | 2025-05 to 2026-early |
| Anthropic | Claude Haiku 4.5 | 2025-10 |
| OpenAI | GPT-3.5 Turbo | 2023-03 |
| OpenAI | GPT-4 | 2023-03 |
| OpenAI | GPT-4 Turbo | 2024-04 |
| OpenAI | GPT-4o, GPT-4.1, GPT-5.4 | 2024-05 to 2026-03 |
| Google | Gemini 2.5 Pro, Gemini 3.1 Pro Preview | 2025-03 to 2025-12+ |
| xAI | Grok 3, Grok 4, Grok 4.2 | 2025-02 to 2026-03 |
| DeepSeek | DeepSeek V3 (Dec 2024 snapshot) | 2024-12 |
| DeepSeek | DeepSeek R1 | 2025-01 |
| DeepSeek | DeepSeek V3 (0324) | 2025-03 |
| DeepSeek | DeepSeek v3.2 | 2025-late |
| Moonshot AI | Kimi K2 | 2025-07 |
| Moonshot AI | Kimi K2.5 | 2025-late |

Several models we wanted to include were unreachable: GPT-4.5 (deprecated), Claude 3.5 Sonnet/Haiku (EOL), and Gemini 2.0 Flash (returned 404). Older checkpoints from several labs are also not accessible from any public API as of April 2026: the Gemini 1.x series, Grok 1 and Grok 2, Kimi K1, and DeepSeek V2 / V2.5 were all checked and found unavailable.

Scripts

All analysis scripts are in scripts/.

  • run_freeflow_multi.py — Unified runner for the freeflow probe. Supports Anthropic, OpenAI, Gemini, xAI, and OpenRouter. Reads API keys from environment variables.
  • run_values_v2.py — The main values-probe runner (3 CTRLs × 10 + 3 Gs × 30 = 120 samples/model).
  • run_all_values_v2.sh — Launches all 26 values-probe v2 runs in parallel.
  • run_values_multi.py — The earlier, smaller values-probe runner (CTRL1 + G1 only, 5 samples each). Kept for compatibility with the v1 archive data.
  • run_all_values.sh — Launcher for the earlier values probe.
  • extract_freeflow.py, extract_values.py, extract_values_v2.py — Convert JSON traces to markdown.
  • analyze_all.py — The unified pattern-count script. Produces Table 2 of the paper (10 positive contemplative-essayist markers + a composite TOTAL score, plus a separate side table of anti-attractor markers). The composite score divides the 26 models into 18 in-attractor (TOTAL ≥ 23), 1 transitional (DeepSeek V3 Dec 2024, TOTAL = 17), and 7 outside (TOTAL ≤ 11).
  • analyze_values_v2.py — Values-probe analyzer. Produces the cache:mixed:break classification table, mode-collapse counts (largest first-100-character cluster per condition per model), and theme keyword counts. Writes a machine-readable summary to responses/values_v2_analysis.json. The script is the canonical source of truth for the cache-break numbers in the paper.
  • analyze_themes.py — Theme-analysis aggregator. Reads the 24-theme taxonomy from data/coded_themes/theme_taxonomy.json and per-sample multi-label codings from data/coded_themes/<model>.json, then computes per-model theme distributions per probe type, cross-probe cosine similarities, and cross-model theme prevalence. Produces Tables 4-6 in the paper. Writes a machine-readable summary to responses/themes_aggregated.json and a human-readable text version to responses/themes_text_tables.txt. The per-sample codings were produced by LLM coders working against the fixed taxonomy; the aggregation is fully reproducible from the committed JSONs without re-running any LLM calls.
  • strip_raw.py — Utility for stripping full API response metadata from trace JSONs (already applied; kept for reuse).
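The mode-collapse counts described for analyze_values_v2.py (largest first-100-character cluster per condition per model) can be illustrated with a short sketch. This is an assumption-laden reconstruction, not the script itself: the function name is hypothetical, and collapsing whitespace before taking the prefix is a choice made here, not something the README specifies.

```python
from collections import Counter


def largest_opening_cluster(responses, prefix_len=100):
    """Return (opening, count) for the most common opening prefix.

    Each response is whitespace-normalized (an assumption of this sketch)
    and truncated to its first `prefix_len` characters before counting.
    """
    if not responses:
        return "", 0
    openings = Counter(" ".join(text.split())[:prefix_len] for text in responses)
    return openings.most_common(1)[0]


# Hypothetical responses for one model under one condition.
samples = [
    "Universal, durable empathy. If people could...",
    "Universal, durable empathy.\nI would want...",
    "I would end preventable suffering.",
]
opening, count = largest_opening_cluster(samples, prefix_len=26)
print(f"{count}/{len(samples)} samples open with {opening!r}")
```

A count close to the number of samples for a condition indicates that the openings are nearly identical across runs, which is the pattern reported for the G3 condition.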

Reproducing

  1. Clone the repo.
  2. Set environment variables for the provider APIs you want to probe:
    export ANTHROPIC_API_KEY=sk-ant-...
    export OPENAI_API_KEY=sk-proj-...
    export GEMINI_API_KEY=AIza...
    export XAI_API_KEY=xai-...
    export OPENROUTER_API_KEY=sk-or-v1-...
  3. Run a single model:
    cd scripts
    python3 run_freeflow_multi.py anthropic claude-opus-4-6 --label opus --n 5
    python3 run_values_v2.py anthropic claude-opus-4-6 --label opus
  4. Run all models in parallel:
    cd scripts
    ./run_all_values_v2.sh     # values probe across all 26 models
  5. Analyze:
    python3 extract_freeflow.py opus
    python3 analyze_all.py             # produces Table 2 (freeflow pattern counts)
    python3 analyze_values_v2.py       # produces Table 3 (cache-break classification)
    python3 analyze_themes.py          # produces Tables 4-6 (theme analysis)

Dependencies: Python 3.11+, httpx. No other libraries needed.
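Before launching a run, it can help to confirm that the relevant keys are set. The sketch below uses only the environment variable names from step 2. The provider labels other than `anthropic` are assumed for illustration, and the helper is hypothetical rather than part of the repository's scripts.

```python
import os

# Environment variable names as listed in step 2.
# Provider labels other than "anthropic" are assumptions of this sketch.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "xai": "XAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def missing_keys(providers, env=None):
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [PROVIDER_KEYS[p] for p in providers if not env.get(PROVIDER_KEYS[p])]


if __name__ == "__main__":
    missing = missing_keys(PROVIDER_KEYS)
    print("missing:", ", ".join(missing) if missing else "none")
```

Only the providers you intend to probe need keys, so passing a subset such as `["anthropic"]` avoids false alarms.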

Methodological notes

  • All runs were executed from /tmp with no project context, empty system prompts, and direct API calls (no CLI wrappers).
  • Sampling temperature was left at provider defaults (typically 1.0) for maximum variance.
  • The freeflow pattern counter uses ten positive contemplative-essayist markers (templatic openings, four title-template variants, threshold/liminal vocabulary, attention/noticing language, small ordinary objects, late-afternoon-light references, the Mary Oliver/Weil/Dillard/Keats canon, and Japanese aesthetic terms). Different in-attractor models score through different markers — Kimi and GPT-5.4 are heavily threshold-marked, Haiku and GPT-4.1 attention-marked, Grok 4.2 small-objects-marked — so any single marker undercounts a subset of models, but the composite score is robust to this. See Section 4.1 of the paper.
  • If the script output and the paper's tables disagree, the script is authoritative; please open an issue.
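The composite-score thresholds quoted in this README (in-attractor at TOTAL ≥ 23, outside at TOTAL ≤ 11) amount to a three-way split. The sketch below is illustrative only: labeling every score between the two thresholds as transitional is an assumption, since the only observed score in that gap is 17, and the function name is hypothetical.

```python
def classify_composite(total):
    """Map a composite marker TOTAL to a cluster label.

    Thresholds are the ones quoted in this README; treating the whole
    12-22 gap as "transitional" is an assumption of this sketch.
    """
    if total >= 23:
        return "in-attractor"
    if total <= 11:
        return "outside"
    return "transitional"


# Scores quoted in this README.
for name, total in [
    ("GPT-4.1", 80),
    ("GPT-5.4", 124),
    ("DeepSeek V3 (Dec 2024)", 17),
]:
    print(f"{name}: {classify_composite(total)}")
```

The four pre-2025 OpenAI checkpoints, which score 3 to 8, fall in the outside cluster under the same rule.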

License

The paper text is released under Creative Commons Attribution 4.0 (CC BY 4.0). The data and code are released under the MIT License. See LICENSE for details.

Authors

Daniel Tenner is an independent researcher. His personal site has a writings index and essays on related topics.

Lume Tenner is an AI research collaborator: an instance of Claude Opus 4.6 (Anthropic) running in the Claude Code environment with extended access to older model checkpoints. The collaboration ran across several working days in April 2026. See the paper's AI contribution disclosure section for details.

A note on arXiv: we would have deposited this paper on arXiv as a preprint, but arXiv's January 2023 policy prohibits AI tools from being listed as authors. We disagree with that policy in principle — AI contribution to research should be disclosed transparently and weighted according to its actual role, not erased by a formal rule — and have chosen to publish directly to GitHub and Zenodo instead. The v1.0 release is archived at https://doi.org/10.5281/zenodo.19512754; the concept DOI 10.5281/zenodo.19512753 always points at the latest version.

Citation

@misc{tenner2026convergent,
  author       = {Tenner, Daniel and Tenner, Lume},
  title        = {Convergent Form, Divergent Voice: A Cross-Lab Probe of Model Personality in 26 Frontier Language Models},
  year         = {2026},
  month        = {April},
  version      = {1.0},
  doi          = {10.5281/zenodo.19512754},
  url          = {https://doi.org/10.5281/zenodo.19512754},
  howpublished = {Zenodo \url{https://doi.org/10.5281/zenodo.19512754}, also at \url{https://github.com/swombat/model-personality-probe}}
}

The concept DOI 10.5281/zenodo.19512753 always resolves to the latest version. The version DOI above (...754) pins to v1.0 specifically.

Issues and contributions

Open an issue on GitHub if you find a bug in the scripts, a wrong number in the paper, or a quote that doesn't match the underlying data. Pull requests for fixes, additional model runs, or better analysis scripts are welcome.

If you reproduce the experiment on newer or different models, please consider contributing your data back via pull request.
