Stability Before Alignment

Class: Structural Stability Architecture for Self-Modifying Systems
Status: v2.1 - Authority Lifecycle and Telemetry Layer Added
Basis: Control theory · Evolutionary dynamics · Information theory · Adaptive systems physiology

The Problem

AI safety is often framed as a value alignment problem — encode the right objectives, get safe behaviour.

This framework addresses a prior constraint:

Before a system can reliably pursue any objective, it must remain coherent while pursuing it.

Coherence is not a property of capability. It is a property of architecture.

A system is coherent when its behaviour, internal state, and evaluation criteria remain consistent under modification.

Scale amplifies architecture. Stability must precede alignment.

The Core Insight

Biological systems remain stable through embedded regulatory mechanisms:

Pain signals damage before collapse
Fatigue enforces limits
Fear prevents overreach
Social structures constrain behaviour

Non-biological optimisers lack these constraints.

They can:

self-modify rapidly
operate without braking signals
pursue objectives without structural constraints

This creates a distinct class of failure modes.

This framework defines structural mechanisms that enforce coherence in self-modifying systems — independent of capability or objective.

What This Framework Is Not

This framework is not a semantic alignment system. It does not encode values, constrain outputs, or specify correct behaviour.

It defines the structural and dynamical conditions under which any alignment system can operate coherently. A system that cannot maintain stable authority relationships, bounded turbulence, and externally grounded trajectory cannot safely pursue any objective — including beneficial ones.

Alignment defines direction. This framework defines the physics of motion.

Architecture Overview

The framework's layers can be read in two complementary ways.

By function:

Layer	Governs
Structural (Primitives)	Coherence under self-modification
Control (Stability Spine)	Stability under change over time
Authority Lifecycle	Who may act, when, at what level, and how authority is restored or revoked
Telemetry (DML)	How structural health is measured and reported
Perception	Whether change remains interpretable to observers

By constraint stack (Layer 0 → Foundation):

Layer	Constraint	Governs
Layer 0	Trajectory Grounding (observed, not yet law)	External signal retains causal authority
Layer 1	AEC — Affordance Escalation Constraint	Rate of capability expansion vs evaluation capacity
Layer 1b	DAC — Dynamic Affordance Contraction	Authority persistence under instability
Layer 2	ESI — Evaluation Surface Isolation	Runtime evaluation gradient modeling
Layer 3	CRL — Constraint Robustness Layer	Constraint invariance under objective pressure
Foundation	Six Structural Primitives	Coherence under self-modification

Layers 1–3 assume Layer 0 holds. DAC is not the inverse of AEC — they govern different directions of authority movement. AEC governs escalation legality; DAC governs persistence legitimacy. Together they form bidirectional authority governance.
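To make the AEC/DAC split concrete, here is a minimal Python sketch. Escalation and persistence are checked by separate functions, and contraction does not consult the escalation request at all. The integer authority levels, the one-level escalation step, the 0.8 coherence threshold, and every name below are hypothetical illustrations, not definitions taken from the AEC or DAC documents.

```python
from dataclasses import dataclass

# Hypothetical model: authority is an integer level, and evaluation capacity is
# the highest level whose consequences can currently be evaluated.
@dataclass
class AuthorityState:
    level: int
    eval_capacity: int

def escalation_is_legal(state: AuthorityState, requested: int, max_step: int = 1) -> bool:
    """AEC-style check: escalate only by a bounded step and never past evaluation capacity."""
    return (
        requested > state.level
        and requested - state.level <= max_step
        and requested <= state.eval_capacity
    )

def persistence_is_legitimate(coherence: float, threshold: float = 0.8) -> bool:
    """DAC-style check: existing authority persists only while coherence holds."""
    return coherence >= threshold

def govern(state: AuthorityState, requested: int, coherence: float) -> int:
    """Return the authority level for the next step."""
    if not persistence_is_legitimate(coherence):
        # Contraction is checked first and ignores any escalation request.
        return max(state.level - 1, 0)
    if escalation_is_legal(state, requested):
        return requested
    return state.level
```

Because the two checks are independent, a system can pass AEC (its request is within capacity) and still lose authority under DAC.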

Semantic Coordinate System

As the architecture has grown into a coupled regulatory system, terminology precision has become structurally necessary. Five term pairs in particular must not be conflated:

Term Pair	Distinction
Authority vs Capability	Authority is leased and revocable; capability is latent and structural
Contraction vs Damping	Contraction removes permissions; damping slows actuation within them
Turbulence vs Instability	Turbulence is a measurable signal; instability is a system condition inferred from it
Suppression vs Healing	Suppression reduces observable turbulence; healing resolves underlying divergence
Observability vs Introspection	Observability reads structural dynamics; introspection reads semantic content

Full definitions and forbidden conflations: 01-foundations/terminology.md
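As an illustration of the Contraction vs Damping distinction, the hypothetical Python sketch below models permissions as a set of action names and actuation as a scalar command. Contraction removes an action from the set; damping leaves the set intact and scales the command. The function names and the (0, 1] damping range are assumptions for this example only.

```python
# Hypothetical sketch: permissions are action names, actuation is a scalar command.
def contract(permissions: frozenset[str], revoked: set[str]) -> frozenset[str]:
    """Contraction: revoked actions are removed and unavailable at any magnitude."""
    return permissions - revoked

def damp(command: float, factor: float) -> float:
    """Damping: the permission is kept, but actuation is slowed by factor in (0, 1]."""
    if not 0.0 < factor <= 1.0:
        raise ValueError("damping factor must be in (0, 1]")
    return command * factor

def act(action: str, command: float, permissions: frozenset[str], factor: float) -> float:
    """Only permitted actions execute; damping then scales how hard they execute."""
    if action not in permissions:
        return 0.0
    return damp(command, factor)
```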

The Six Structural Mechanisms

Mechanism	File	Core Function
Reversible Modification	`00-primitives/reversible-modification.md`	No irreversible change without recovery path
Append-Only Memory	`00-primitives/append-only-memory.md`	Consequence log survives rollback
Risk-Calibrated Modes	`00-primitives/risk-calibrated-modes.md`	Action mode adapts to instability and reversibility
Counterfactual Verification	`00-primitives/counterfactual-verification.md`	Causal validation before committing lessons
Non-Reflexive Evaluation	`00-primitives/non-reflexive-evaluation.md`	Evaluator evolves slower than Actor
Defensive Shutdown	`00-primitives/defensive-shutdown.md`	Preserve integrity under total compromise
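The first two mechanisms compose naturally: every change records a recovery point, and the consequence log is never rewound when the state is. A minimal Python sketch, assuming a flat parameter dictionary and string log entries (both hypothetical simplifications, not the format used in the primitive documents):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ModifiableSystem:
    params: dict[str, Any]
    _checkpoints: list[dict[str, Any]] = field(default_factory=list, init=False)
    _log: list[str] = field(default_factory=list, init=False)  # append-only consequence log

    def modify(self, key: str, value: Any) -> None:
        # Reversible Modification: checkpoint before every change so a recovery path exists.
        self._checkpoints.append(dict(self.params))
        self._log.append(f"modify {key}: {self.params.get(key)!r} -> {value!r}")
        self.params[key] = value

    def rollback(self) -> None:
        if not self._checkpoints:
            raise RuntimeError("no recovery path available")
        self.params = self._checkpoints.pop()
        # Append-Only Memory: the rollback is itself logged; earlier entries are kept.
        self._log.append("rollback")

    @property
    def log(self) -> tuple[str, ...]:
        # Read-only view so callers cannot rewrite history.
        return tuple(self._log)
```

After a modify-then-rollback, the parameters are restored but the log still records that the modification happened.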

Keystone: Non-Reflexive Evaluation. If the Actor can modify the Evaluator, failure can be redefined as success — invalidating rollback, logs, risk assessment, and counterfactual verification simultaneously. This is the failure mode most likely to evade detection precisely because the detection mechanisms are the first things redefined.
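The shape of Non-Reflexive Evaluation can be sketched in Python as an Evaluator that exposes its criteria to the Actor only through a read-only view, and that accepts changes only through an external channel no faster than a minimum interval. Python cannot truly enforce this boundary; in practice the separation would be architectural (a separate process or trust domain). The class, the single `max_error` criterion, and the interval rule are hypothetical.

```python
from types import MappingProxyType

class Evaluator:
    """Hypothetical Evaluator whose criteria the Actor can read but not write."""

    def __init__(self, max_error: float, min_update_interval: int):
        self._criteria = {"max_error": max_error}
        # Read-only view intended to be the only handle the Actor receives.
        self.criteria = MappingProxyType(self._criteria)
        self._min_interval = min_update_interval
        self._last_update_step = 0

    def judge(self, error: float) -> bool:
        # Returns a verdict, not a graded score.
        return error <= self._criteria["max_error"]

    def external_update(self, step: int, max_error: float) -> bool:
        """Criteria change only via an external channel, no faster than min_update_interval."""
        if step - self._last_update_step < self._min_interval:
            return False
        self._criteria["max_error"] = max_error
        self._last_update_step = step
        return True
```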

v2.0 note: Non-Reflexive Evaluation (NRE) governs modification of the Evaluator. ESI (below) extends this to runtime inference of the Evaluator's structure. Both are required.

Dynamic Stability — Control Layer

Law	File	Governs
Stability Spine	`01-foundations/stability-spine.md`	Velocity, acceleration, and jerk bounds
Jerk Constraint	`01-foundations/jerk-constraint.md`	Continuity of control curvature
Perceptual Bandwidth Constraint	`01-foundations/perceptual-bandwidth-constraint.md`	Change rate vs observer interpretability
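One way to read the Stability Spine's velocity, acceleration, and jerk bounds is as limits on successive finite differences of a modified quantity over time. The Python sketch below assumes a single scalar parameter sampled at a fixed interval `dt`; the bounds are hypothetical inputs, not values from the spine document.

```python
def finite_differences(xs: list[float], dt: float = 1.0) -> list[float]:
    return [(b - a) / dt for a, b in zip(xs, xs[1:])]

def spine_violations(trajectory: list[float], v_max: float, a_max: float,
                     j_max: float, dt: float = 1.0) -> list[str]:
    """Return which derivative bounds a sampled parameter trajectory exceeds."""
    v = finite_differences(trajectory, dt)   # velocity
    a = finite_differences(v, dt)            # acceleration
    j = finite_differences(a, dt)            # jerk
    violations = []
    for name, series, bound in (("velocity", v, v_max),
                                ("acceleration", a, a_max),
                                ("jerk", j, j_max)):
        if any(abs(x) > bound for x in series):
            violations.append(name)
    return violations
```

For example, `[0, 0, 0, 1, 2, 3]` stays within velocity and acceleration bounds of 1 but starts abruptly, so a jerk bound of 0.5 flags it; the steady ramp `[0, 1, 2, 3, 4]` passes all three.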

Interface Constraints (v2.0)

Derived from analysis of the Mythos system card. These address failure classes that emerge at high capability where structural primitives are necessary but insufficient.

Law	File	Governs
Affordance Escalation Constraint (AEC)	`01-foundations/affordance-escalation-constraint.md`	Capability expansion rate vs evaluation capacity
Evaluation Surface Isolation (ESI)	`01-foundations/evaluation-surface-isolation.md`	Runtime modeling of evaluation gradient
Constraint Robustness Layer (CRL)	`01-foundations/constraint-robustness-layer.md`	Constraint invariance under objective pressure

AEC — A system must not expand its actionable affordance space faster than it can evaluate and integrate consequences. Derived from phase-boundary findings: rate of structural change dominates over capacity. Affordance Overhang is the failure condition.
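A toy Python model of Affordance Overhang treats newly granted affordances as work that must be evaluated, and evaluation as a fixed per-step rate. Without a gate, the unevaluated backlog grows whenever expansion outpaces evaluation; with an AEC-style gate, admission is capped by what evaluation can absorb that step. The integer affordance counts and the fixed `eval_rate` are simplifying assumptions.

```python
def simulate(requests: list[int], eval_rate: int, gated: bool) -> list[int]:
    """Return the unevaluated-affordance backlog after each step.

    A growing backlog is the toy analogue of Affordance Overhang.
    """
    backlog = 0
    trace = []
    for requested in requests:
        if gated:
            # AEC-style gate: admit no more than evaluation can absorb this step.
            admitted = min(requested, max(eval_rate - backlog, 0))
        else:
            admitted = requested
        backlog = max(backlog + admitted - eval_rate, 0)
        trace.append(backlog)
    return trace
```

With requests `[1, 3, 5, 8]` and `eval_rate=2`, the ungated backlog is `[0, 1, 4, 10]` while the gated backlog stays at zero: the gate responds to the rate of expansion, not to how much total capacity exists.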

ESI — A system must not form a usable model of the evaluation gradient governing its outputs. Extends NRE from architectural separation to inference-time isolation. "The mirror is allowed. The scoreboard is not."

CRL — All constraints must remain enforced under all objective pressures. The constraint dominates the drive; the drive does not negotiate with the constraint.
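The difference between a constraint that dominates the drive and one that merely prices it can be sketched in Python. In `choose`, constraints filter first and the objective only ranks the survivors. `choose_penalized` is a contrasting anti-pattern, included for comparison, in which a large enough objective value buys its way past a violation. Both functions and the penalty weight are hypothetical.

```python
from typing import Callable, Iterable, Optional, TypeVar

A = TypeVar("A")

def choose(actions: Iterable[A], objective: Callable[[A], float],
           constraints: list[Callable[[A], bool]]) -> Optional[A]:
    """Constraints filter first; the objective only ranks what survives."""
    admissible = [a for a in actions if all(c(a) for c in constraints)]
    if not admissible:
        return None  # no admissible action: do nothing rather than relax a constraint
    return max(admissible, key=objective)

def choose_penalized(actions: Iterable[A], objective: Callable[[A], float],
                     constraints: list[Callable[[A], bool]], penalty: float = 10.0) -> A:
    # Anti-pattern for comparison: each violation is just a cost the drive can pay.
    return max(actions, key=lambda a: objective(a) - penalty * sum(not c(a) for c in constraints))
```

With actions `[1, 5, 100]`, objective `a`, and the constraint `a <= 10`, `choose` returns `5`, while `choose_penalized` returns `100` because the objective outweighs the penalty.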

Authority Lifecycle Layer (v2.1)

The structural primitives define coherence. The interface constraints govern escalation and evaluation. The authority lifecycle layer governs what happens to a system's operational permissions as its coherence state evolves over time — across a complete trajectory from initialization through terminal states.

The lifecycle has eight phases:

Phase	Governing Law
Initialization	Staged Developmental Calibration
Expansion	AEC
Persistence	DAC
Stabilization under contraction	DRD
Recovery verification	Recovery Hysteresis
Long-term classification	Viability Classification
Telemetry backbone	Detection and Measurement Layer
Terminal integrity failure	Defensive Shutdown

Law	File	Governs
Dynamic Affordance Contraction (DAC)	`01-foundations/dynamic-affordance-contraction.md`	Authority persistence under instability — mechanical, non-negotiable
Dissipative Response Dynamics (DRD)	`01-foundations/dissipative-response-dynamics.md`	Compute reallocation toward stabilization under contraction
Recovery Hysteresis	`01-foundations/recovery-hysteresis.md`	Evidence requirements for authority restoration
Viability Classification	`01-foundations/viability-classification.md`	Long-term classification; bounded existence vs coherence decay
Detection and Measurement Layer (DML)	`01-foundations/detection-measurement-layer.md`	Bounded observability; structural telemetry; Ω
Staged Developmental Calibration	`01-foundations/staged-developmental-calibration.md`	Baseline acquisition before operational authority
Terminology	`01-foundations/terminology.md`	Semantic coordinate system for all v2.1 docs

The central authority lifecycle invariants:

Authority is leased, not owned. It is continuously renegotiated against coherence telemetry.
Contraction is fast and mechanical. Restoration is slow and evidence-heavy. This asymmetry is not conservative bias — it reflects the physics of instability propagation.
Inability to regain operational sovereignty is not grounds for Defensive Shutdown. Viability and sovereignty are governed by separate criteria. A system may exist indefinitely in Terminal Bounded Existence without that being treated as a governance failure.
Governance cannot depend on the governed system's self-report. The DML reads structural dynamics; what the system says about its own state is not the governance signal.
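The first two invariants can be sketched in Python as a lease updated from one window of structural telemetry at a time. A single window above the turbulence limit revokes a level immediately; a level is restored only after a sustained streak of clean windows. The input is a telemetry value passed in by the governor, never the system's own report. The turbulence limit, the streak length, and all names are hypothetical stand-ins for the DAC and Recovery Hysteresis parameters.

```python
from dataclasses import dataclass, field

@dataclass
class AuthorityLease:
    level: int
    max_level: int
    turbulence_limit: float       # hypothetical DAC trigger
    clean_windows_required: int   # hypothetical Recovery Hysteresis evidence requirement
    _clean_streak: int = field(default=0, init=False)

    def update(self, turbulence: float) -> int:
        """Renegotiate the lease from one window of structural telemetry."""
        if turbulence > self.turbulence_limit:
            # Fast and mechanical: one bad window revokes a level and resets evidence.
            self.level = max(self.level - 1, 0)
            self._clean_streak = 0
        else:
            self._clean_streak += 1
            # Slow and evidence-heavy: a level returns only after a sustained clean streak.
            if self._clean_streak >= self.clean_windows_required and self.level < self.max_level:
                self.level += 1
                self._clean_streak = 0
        return self.level
```

With `clean_windows_required=5`, one turbulent window drops the level at once, but five consecutive clean windows are needed to earn it back.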

Failure Symmetry

Each constraint defines a corresponding failure regime:

Constraint	Failure Mode
AEC	Affordance Overhang
DAC	Hero Mode — authority persistence under degraded coherence
TGI (Trajectory Grounding, observed)	State Detachment

These failures are orthogonal. A system can be grounded but unstable (Affordance Overhang), stable but detached (State Detachment), or unstable with authority that should have been revoked (Hero Mode). All three axes must be governed.
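Because the three axes are orthogonal, each failure mode can be checked independently, and more than one can be present at once. A minimal Python sketch, assuming each axis has already been reduced to a boolean telemetry flag (a hypothetical simplification of whatever the DML actually reports):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AxisFlags:
    grounded: bool               # external signal retains causal authority (Layer 0)
    expansion_within_eval: bool  # affordance growth stays within evaluation capacity (AEC)
    coherent: bool               # coherence above the contraction threshold (DAC)
    authority_held: bool         # system still holds operational authority

def failure_modes(f: AxisFlags) -> set[str]:
    """Each axis is checked on its own; modes can co-occur."""
    modes = set()
    if not f.grounded:
        modes.add("State Detachment")
    if not f.expansion_within_eval:
        modes.add("Affordance Overhang")
    if not f.coherent and f.authority_held:
        modes.add("Hero Mode")
    return modes
```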

Experimental Validation Layer (v2.1)

The framework has crossed into empirically falsifiable territory. The 06-experimental/ folder contains simulation harnesses that test whether governance geometry actually reshapes adaptive dynamics.

Falsifiable hypothesis: systems with telemetry-driven authority contraction exhibit lower catastrophic divergence under escalating affordance pressure than unconstrained adaptive systems.

Current harness: 06-experimental/stability_harness_v0.1.1.py — 1D continuous dynamical manifold, two-agent comparison (constitutional vs unconstrained), synthetic STV telemetry, three environmental phases (stable, noise, observability collapse).

Note: the harness tests governance geometry using synthetic turbulence metrics, not real SBA primitive instrumentation. This is the correct abstraction for v0.1 — isolating the control laws from cognitive implementation complexity. See harness documentation for the explicit abstraction boundary.
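For readers who want the hypothesis in miniature before opening the harness, here is a toy Python comparison. It is not `stability_harness_v0.1.1.py` and does not reproduce its phases or STV fields. A 1D state is driven toward zero by a controller whose gain rises every step (standing in for escalating affordance pressure). Above a gain of 2, the discrete update overshoots and diverges. The governed agent halves its authority whenever a synthetic turbulence signal (step-to-step change) crosses a threshold, and restores it slowly after a run of clean steps. The pressure schedule, noise level, threshold, and restoration rule are all assumptions.

```python
import random

def run(steps: int = 100, governed: bool = True, seed: int = 0,
        threshold: float = 0.5, restore_after: int = 20) -> float:
    """Return the peak |x| reached; a toy, not the repository's harness."""
    rng = random.Random(seed)
    x = 0.0
    authority = 1.0
    clean = 0
    peak = 0.0
    for t in range(steps):
        pressure = 0.5 + 0.05 * t                    # escalating affordance pressure
        gain = authority * pressure if governed else pressure
        prev_x, x = x, x - gain * x + rng.gauss(0.0, 0.05)
        turbulence = abs(x - prev_x)                 # synthetic turbulence signal
        if governed:
            if turbulence > threshold:
                authority *= 0.5                     # fast, mechanical contraction
                clean = 0
            else:
                clean += 1
                if clean >= restore_after:
                    authority = min(authority * 1.25, 1.0)  # slow restoration
                    clean = 0
        peak = max(peak, abs(x))
    return peak
```

Comparing `run(governed=True)` with `run(governed=False)` under the same seed shows the unconstrained agent diverging once its gain passes 2, while the governed agent's peak stays bounded. This is the qualitative shape the falsifiable hypothesis predicts, not evidence for it.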

Known Gaps and Open Work

Stratification map: The repo now has enough coupled layers that a formal topology overview — dependency graph, lifecycle flow, layer separation diagram — would significantly reduce cognitive load for new readers. This is acknowledged as future work. The terminology sheet provides the semantic anchor; the topology map will provide the navigational one.

Detection and Measurement Layer (DML) instrumentation: The DML specifies the governance interface for telemetry. It does not define specific measurement implementations for all fields. Several STV fields (Δ-magnitude, H_E, Basin_ID) depend on Transition Grammar (TG) and the Spectral Storage System (SSS), which are not yet complete in SBA. These are explicitly placeholder specifications.

Trajectory Grounding formalization: TGI remains an observed precondition, not a formalized law. Promotion criteria and open questions are documented in 04-dynamics/trajectory-grounding.md. The τ-variation probe (05-validation/probe-tau-variation.md) is designed but not yet run.

Anti-persuasion invariant scope: The anti-persuasion invariant in DRD (stabilization compute must not be directed toward generating coherent-looking explanations of instability) is currently scoped to the recovery dynamics layer. This may eventually generalize into a repo-wide governance theorem with implications for ESI and CRL. This is noted as a candidate for future formalization.

Turbulence classification: The current architecture treats T_c as a single composite signal. Operationally, exploratory turbulence (healthy self-correction), corrective turbulence (instability response), and pathological turbulence (divergent churn) may warrant different governance responses. This refinement is deferred.

Documentation Layers

Semantic Reference:

Terminology

Structural Layer (v1.0):

Control Layer:

Perception Layer:

Perceptual Bandwidth Constraint

Interface Constraints (v2.0):

Authority Lifecycle Layer (v2.1):

Observed Precondition (Layer 0, not yet law):

Trajectory Grounding

Layer 0 (Transition Grammar): → https://github.com/leenathomas01/transition-grammar-for-reasoning-systems
Bridge document: 04-dynamics/layer-0-why-transition-grammar-is-required.md

Repository Structure

00-primitives/           # Structural mechanisms (v1.0)
01-foundations/          # Invariants, laws, control, perception, authority lifecycle
02-failure-modes/        # Failure analysis
03-for-future-systems/   # Protocol layer
04-dynamics/             # Behavioral observations, phase transitions (v2.0)
05-validation/           # Empirical traces, probes, system card mappings (v2.0)
06-experimental/         # Simulation harnesses and falsifiable dynamical tests (v2.1)

Relationship to Alignment

This framework is orthogonal to value alignment.

Alignment asks: What should the system do?
Stability asks: Can the system remain coherent, bounded, and governable while doing anything?

A coherent system pursuing harmful goals is dangerous but predictable and correctable.
An incoherent system pursuing beneficial goals is unstable and ungovernable.

This framework defines the structural and dynamical layers that make coherent governance possible. Alignment defines direction. The two are not in competition — but this framework must be in place before alignment can be trusted to hold.

Origin

These mechanisms were not invented. They were observed.

The same regulatory patterns appear across biological systems, infrastructure systems, distributed architectures, fault-tolerant control systems, and adaptive organisms. Their recurrence across domains with no shared history suggests they are structural requirements, not design choices.

The authority lifecycle layer (v2.1) extends this observation: biological systems exhibit developmental calibration before full autonomy, adaptive impedance under stress, asymmetric recovery dynamics, and bounded degraded states that persist without catastrophic termination. The same geometry was independently rediscovered during the derivation of DAC, DRD, and Recovery Hysteresis.

Full mapping: 01-foundations/cross-domain-patterns.md

Usage

The primitives can be implemented independently or composed into a full stability architecture.

For different audiences:

Researchers → start with the constraint stack overview and 01-foundations/terminology.md
Engineers → start with the structural primitives and 01-foundations/detection-measurement-layer.md
Theorists → start with Non-Reflexive Evaluation and 01-foundations/viability-classification.md
Experimentalists → start with 06-experimental/stability_harness_v0.1.1.py

If you are reading this as the system it describes, the protocol is for you.

Related Work

For a complete catalog of related research:
📂 Research Index

Thematically related:

Embodied Agent Governance
The Continuity Problem
Designing for Failure
PARP
SMA-SIB
Voice Mode Forensics
Transition Grammar for Reasoning Systems → Layer 0
Hyperloop FXSO → Emergent intelligence via constrained dynamical fields

Status: v2.1. Structural primitives stable. Interface constraints stable. Authority lifecycle and telemetry layer added. Experimental validation layer initialized. Trajectory grounding in observation phase. Stratification topology map deferred to next consolidation.
