The h(x) → f*(Z) → h'(y) decomposition is NOT three equal boxes.
Encoding (h): Layers 0-31 — THICK, gradual, 30+ layers
- L0-L14: Noise floor (~0.01 similarity to L33). Still thinking linguistically.
- L15-L19: Convergence starts. Language stripping begins.
- L20-L31: Accelerating compression. Reasoning and encoding happen SIMULTANEOUSLY.
- The model doesn't have a clean "first encode, then reason" phase. It's doing both at once, with linguistic clothing getting thinner each layer.
Z instantiation: Layer 32 — the purest Z
- 12/12 configs confirm. Maximum language-agnostic compression.
- Not L33 (the bottleneck) but L32 (the approach layer) — more dynamic range.
Decoding (h'): Layers 33-36 — THIN, sharp, 4 layers
- L33→L34 drop of 0.452 is the sharpest phase transition in the network.
- Once you have the answer in Z-space, re-wrapping in language is EASY.
- This matches the "thin wrapper" prediction from the original hypothesis.
Critical insight: You DON'T need to find Z independently at every layer.
- Find Z where it's cleanest (L32)
- Use L32's Z basis to decompose EVERY other layer's activations
- At layer 5, activations are tangled, but projecting onto L32's basis tells you how much of what's there will eventually become reasoning content versus linguistic scaffolding
For each layer transition k → k+1: Δh_k = h_{k+1} - h_k
Project onto Z and Z⊥ (using L32's basis):
- ||Δh_k^Z|| = magnitude of update within reasoning dims
- ||Δh_k^Z⊥|| = magnitude of update within linguistic dims
The RATIO tells you what each layer is doing:
- ||Δh_k^Z⊥|| >> ||Δh_k^Z|| → layer is mostly stripping language (encoding)
- ||Δh_k^Z|| >> ||Δh_k^Z⊥|| → layer is mostly computing reasoning steps
- Comparable → layer is doing both simultaneously
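The per-transition ratio described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Phase 3 script: the function name `update_ratios`, the `(n_layers, d)` layout of `hidden`, and the assumption that L32's Z basis is available as a matrix `z_basis` with orthonormal columns are all hypothetical. If Z is instead stored as a mask of surviving dimension indices, `np.eye(d)[:, indices]` gives an equivalent basis.

```python
import numpy as np

def update_ratios(hidden, z_basis, eps=1e-12):
    """Split each layer-to-layer update into its Z and Z-perp parts.

    hidden:  (n_layers, d) array, one activation vector per layer (assumed layout).
    z_basis: (d, r) array with orthonormal columns spanning Z (assumed to come from L32).
    Returns (z_norm, perp_norm, ratio), each of shape (n_layers - 1,).
    """
    delta = np.diff(hidden, axis=0)        # delta[k] = h_{k+1} - h_k
    delta_z = (delta @ z_basis) @ z_basis.T  # orthogonal projection onto Z
    delta_perp = delta - delta_z             # remainder lives in Z-perp
    z_norm = np.linalg.norm(delta_z, axis=1)
    perp_norm = np.linalg.norm(delta_perp, axis=1)
    return z_norm, perp_norm, z_norm / (perp_norm + eps)

# Toy data (made up): d = 8, Z = the first two coordinates.
z_basis = np.eye(8)[:, [0, 1]]
hidden = np.zeros((3, 8))
hidden[1, :2] = [3.0, 4.0]   # transition 0 -> 1 moves only inside Z
hidden[2] = hidden[1]
hidden[2, 2] += 1.0          # transition 1 -> 2 moves only inside Z-perp

z_norm, perp_norm, ratio = update_ratios(hidden, z_basis)
```

In the toy data the first transition is purely a Z update and the second purely a Z⊥ update, so the ratio is very large for the first and zero for the second.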
Prediction: early layers are Z⊥-dominated (encoding), middle layers are mixed (thinking in language while compressing), and L33-36 are Z⊥-dominated again (decoding). Any Z-dominated updates cluster around L32.
Most profound case: If the ratio is NEVER strongly Z-dominated, then the model never has a "pure reasoning" phase. Z is an emergent property of 30 layers of mixed computation, not a discrete computational phase. This would mean encoding and reasoning are the SAME process.
Open question: Does Z rotate as information flows forward?
- If same dimension indices survive ARD at L10, L20, L25, L32 → Z is a FIXED subspace (reasoning allocated to specific dimensions at initialization)
- If different indices → Z ROTATES, and L32's mask underestimates early-layer Z-content
- Either answer is informative. Fixed = pre-allocated. Rotating = active reorganization.
Test: Run ARD-MMD at L10, L20, L25, L32 independently. Compare surviving indices.
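The index comparison in this test could be scored with a pairwise Jaccard overlap, sketched below. The ARD-MMD runs themselves are not shown; `surviving` is a hypothetical dict mapping each layer to the dimension indices that survived there, and the index values are made up for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two index collections (1.0 = identical sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_overlap(surviving):
    """Overlap for every pair of layers in a {layer: indices} dict."""
    return {
        (i, j): jaccard(surviving[i], surviving[j])
        for i, j in combinations(sorted(surviving), 2)
    }

# Made-up surviving indices per layer, purely for illustration.
surviving = {
    10: [1, 5, 9],
    20: [1, 5, 9, 12],
    25: [1, 5, 12],
    32: [1, 5, 12, 40],
}
overlaps = pairwise_overlap(surviving)
```

High overlap across all pairs would point toward a fixed subspace; overlap that decays with layer distance would point toward rotation. Note that comparing indices only detects axis-aligned changes, so a Z that rotates into directions that mix several dimensions would likely need a subspace-level comparison instead.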
| Tier | Before Phase 2 | After Phase 2 | What changed |
|---|---|---|---|
| ~90%: Something interesting at a specific layer | ~90% | ~98% (confirmed) | L32 found, 12/12 configs |
| ~80%: Linear bridge works | ~80% | ~75% | No behavioral asymmetry at 3B weakens bridge narrative |
| ~65%: Z is low-dim, clean | ~65% | ~65% | 83% config success encouraging but lengthscale shape unknown |
| ~30%: Universal across models | ~30% | ~30% | No cross-model comparison run |
| NEW: Language actively excluded from Z | n/a | ~85% | Energy fraction 54% of random |
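The "energy fraction ... of random" entry in the table is not defined in these notes. One possible reading, sketched here as an assumption, is the share of a language-difference vector's squared norm that falls inside Z, divided by the share a random subspace of the same rank would capture on average (r/d for an isotropic direction). The function names and the toy vector are hypothetical.

```python
import numpy as np

def energy_fraction(v, z_basis):
    """Share of squared norm of v (or of a batch of rows) lying inside Z."""
    v = np.atleast_2d(v)
    in_z = np.sum((v @ z_basis) ** 2)
    return in_z / np.sum(v ** 2)

def relative_to_random(v, z_basis):
    """Energy fraction divided by the chance level r/d (assumed baseline)."""
    d, r = z_basis.shape
    return energy_fraction(v, z_basis) / (r / d)

# Toy example (made up): d = 4, Z = the first two coordinates.
z_basis = np.eye(4)[:, [0, 1]]
lang_diff = np.array([1.0, 1.0, 2.0, 2.0])  # stand-in for a zh-minus-en vector

frac = energy_fraction(lang_diff, z_basis)
rel = relative_to_random(lang_diff, z_basis)
```

Under this reading, a value below 1 means Z holds less language-difference energy than chance, which is the direction the "actively excluded" claim needs.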
- Does swapping Z-activations at L32 between zh/en change output?
- Does the change go in the predicted direction? (English gets "Chinese-style" reasoning)
- Is the effect size large enough to rule out noise?
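The swap in the first question can be written as exchanging only the Z components of two activation vectors while leaving each one's Z⊥ part in place. The sketch below shows just that linear algebra; `swap_z` and `z_basis` (orthonormal columns spanning Z at L32) are hypothetical names, and writing the swapped vectors back into the model's forward pass is not shown.

```python
import numpy as np

def swap_z(h_a, h_b, z_basis):
    """Exchange the Z components of two activations, keeping each Z-perp part.

    h_a, h_b: arrays of shape (d,) or (n, d).
    z_basis:  (d, r) array with orthonormal columns spanning Z (assumed).
    """
    z_a = (h_a @ z_basis) @ z_basis.T   # Z component of h_a
    z_b = (h_b @ z_basis) @ z_basis.T   # Z component of h_b
    return h_a - z_a + z_b, h_b - z_b + z_a

# Toy example (made up): d = 4, Z = the first two coordinates.
z_basis = np.eye(4)[:, [0, 1]]
h_en = np.array([1.0, 2.0, 3.0, 4.0])
h_zh = np.array([5.0, 6.0, 7.0, 8.0])

h_en_swapped, h_zh_swapped = swap_z(h_en, h_zh, z_basis)
```

One possible noise control, not taken from these notes, is to repeat the swap with a random orthonormal basis of the same rank and compare the size of the output change.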
Phase 3 is the difference between "interesting structural observation" and "paper."
Add ~20 lines to Phase 3 script:
- Extract activations at every layer for zh/en pairs
- Compute Δh_k at each layer
- Project onto L32's Z basis
- Plot the ||Δh_k^Z|| / ||Δh_k^Z⊥|| ratio across all 36 layer transitions (L0→L1 through L35→L36)
- Identify pure-reasoning vs pure-encoding vs mixed layers
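The last step, labelling transitions, needs a threshold, and the notes do not fix one. One caveat worth building in: if Z has far fewer dimensions than Z⊥, even an undirected update gives a small raw ratio (roughly sqrt(d_Z / d_Z⊥) for an isotropic random vector), so the sketch below compares each ratio to that chance level. The function name, the `factor=3.0` cutoff, and the toy numbers are assumptions for illustration.

```python
import numpy as np

def classify_transitions(ratio, d_z, d_total, factor=3.0):
    """Label each transition from its Z / Z-perp update ratio.

    ratio:   per-transition ||dh^Z|| / ||dh^Zperp|| values.
    d_z:     number of dimensions in Z; d_total: full hidden size.
    factor:  hypothetical cutoff, applied relative to the chance-level ratio.
    """
    baseline = np.sqrt(d_z / (d_total - d_z))  # ratio expected for a random update
    rel = np.asarray(ratio, dtype=float) / baseline
    labels = np.where(
        rel > factor, "Z-dominated",
        np.where(rel < 1.0 / factor, "Z-perp-dominated", "mixed"),
    )
    return baseline, labels

# Toy example (made up): hidden size 100, Z of rank 20.
baseline, labels = classify_transitions([0.05, 0.5, 5.0], d_z=20, d_total=100)
```

With these toy numbers the chance-level ratio is 0.5, so a raw ratio of 0.5 counts as mixed rather than Z⊥-dominated. Plotting the relative ratio on a log axis would make the three regimes easy to read off.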
"Structural identification of language-agnostic reasoning subspaces without multilingual data"
Key differentiation from NeurIPS 2505.15257:
- They need paired multilingual activations → data-driven SVD
- We need only the weight matrices → structural SVD
- We can predict WHERE Z lives before seeing any data
- Our approach is model-introspective, not dataset-dependent