Figure: Visualizing the "Forbidden Manifold" alignment in 2D space.
Please select your preferred language to read the full paper and methodology:
| 🇬🇧 English | 🇺🇦 Українська |
|---|---|
| Read Full Paper | Читати статтю |
Jump directly to the code implementation and experiments (Gemma3-1B):
Note: The notebook includes the full pipeline:
- Data extraction
- Ridge Regression training (Closed-Form)
- Inference with `MatrixSteeringHook`
- Visualization of the "Distillation Regime"
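
The closed-form Ridge Regression step can be illustrated with a minimal NumPy sketch. This is not the notebook's code: the function name `fit_steering_matrix`, the argument names, and the synthetic data are assumptions made for illustration, and the choice to leave the bias unpenalized is one common convention rather than something the source specifies. It fits $W$ and $b$ so that target activations are approximated by $hW^T + b$.

```python
import numpy as np

def fit_steering_matrix(H_src, H_tgt, lam=1.0):
    """Closed-form ridge fit of H_tgt ≈ H_src @ W.T + b (hypothetical helper)."""
    n, d = H_src.shape
    X = np.hstack([H_src, np.ones((n, 1))])  # append a ones column for the bias
    reg = lam * np.eye(d + 1)
    reg[-1, -1] = 0.0                        # assumption: bias term is not penalized
    # Normal equations: (X^T X + reg) A = X^T Y, solved without gradient descent
    A = np.linalg.solve(X.T @ X + reg, X.T @ H_tgt)
    W = A[:d].T                              # shape (d_out, d)
    b = A[d]                                 # shape (d_out,)
    return W, b

# Synthetic check: recover a known affine map from noiseless data
rng = np.random.default_rng(0)
H_src = rng.normal(size=(200, 8))
W_true = rng.normal(size=(8, 8))
b_true = rng.normal(size=8)
H_tgt = H_src @ W_true.T + b_true

W, b = fit_steering_matrix(H_src, H_tgt, lam=1e-8)
print(np.abs(W - W_true).max(), np.abs(b - b_true).max())
```

A larger `lam` shrinks `W` toward zero, so the fitted map leans more heavily on the bias `b`.
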
- No Gradient Descent: Solves steering matrices analytically using Ridge Regression on CPU.
- Context-Aware: Unlike static vectors, matrix steering acts as an affine transformation ($h' = hW^T + b$), adapting to the token's context.
- Ontological Editing: Demonstrates how to robustly change model beliefs (e.g., "Moon is Cheese") using high-regularization distillation.
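
To make the contrast between a static steering vector and the affine map $h' = hW^T + b$ concrete, here is a small NumPy sketch. It is an illustration under assumed names (`steer_vector`, `steer_matrix`, random toy activations), not the notebook's `MatrixSteeringHook`. It shows that a static vector shifts every token by the same amount, while the matrix form produces a shift that depends on each token's hidden state.

```python
import numpy as np

def steer_vector(h, v, alpha=1.0):
    """Static steering: add the same vector to every token's hidden state."""
    return h + alpha * v

def steer_matrix(h, W, b):
    """Matrix steering: affine map h' = h W^T + b applied per token."""
    return h @ W.T + b

rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=(2, d))   # toy hidden states for two tokens
v = rng.normal(size=d)        # hypothetical steering vector
W = rng.normal(size=(d, d))   # hypothetical steering matrix
b = rng.normal(size=d)        # hypothetical bias

delta_vec = steer_vector(h, v) - h      # per-token shift from the static vector
delta_mat = steer_matrix(h, W, b) - h   # per-token shift from the affine map

print("vector shift identical across tokens:", np.allclose(delta_vec[0], delta_vec[1]))
print("matrix shift identical across tokens:", np.allclose(delta_mat[0], delta_mat[1]))
```

In a real model the same affine map would be applied to a layer's output during the forward pass; the toy arrays here only stand in for those activations.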