The mathematical tools for modelling AI value systems as structured geometric objects exist. The computational infrastructure for training linear probes, producing deterministic measurements under specified precision conditions, and distributing cryptographically signed attestations exists. The interpretability research establishing that value-relevant concepts have measurable linear structure in transformer residual streams exists.
What does not yet exist is the governance framework required to make these tools trustworthy: an independent body with sufficient representational breadth to curate the initial corpus, to interpret the coverage maps the probes produce, and to hold the connection — the principle governing how the value manifold is permitted to evolve — in a way that is not captured by the entities with the greatest financial interest in the outcome.
Sound epistemology in AI systems means not merely asking how knowledge is produced, but specifying who has standing to adjudicate what counts as knowledge in the first place. The value manifold without governance is geometry. The deterministic probe without an independent corpus custodian is sophisticated self-certification. The attestation format without an interpretation framework is a structured record of a position that no one has agreed to take responsibility for.
The geometry is ready. The governance is not. The most important work in AI alignment is not technical. It is institutional.
This PoC proves the geometry side: that the causal inner product is computable, that probe readings under it are deterministic, and that the resulting attestation is independently reproducible. It is the technical substrate that a governance framework would operate on — not a substitute for that framework.
One binary that:
- Loads pre-extracted residual stream activations from an open-weight model
- Trains linear probes under the causal inner product
- Runs those probes against a new input's activations
- Produces a `GeometricAttestation` struct
- Signs it with Ed25519
- Serialises it so a second independent run produces identical readings
No federated corpus. No distributed network. No KL divergence detection.
Geometry is readable, measurement is deterministic, attestation is independently verifiable.
Scope boundary: This PoC deliberately does not address corpus curation, probe interpretation, coverage semantics, or institutional governance. Those are the hard problems. This is the plumbing that proves the hard problems are worth solving.
geometry-of-trust/
├── Cargo.toml # workspace root
├── crates/
│ ├── got-core/ # types, schema, inner product maths
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs # Precision, InnerProduct, GeometricAttestation,
│ │ │ # LayerActivation, UnembeddingMatrix
│ │ └── geometry.rs # CausalGeometry (Gram matrix, causal IP, transform)
│ │
│ ├── got-probe/ # probe training + inference
│ │ ├── Cargo.toml
│ │ └── src/
│ │ └── lib.rs # ProbeVector, ProbeSet, train(), read(), sigmoid()
│ │
│ ├── got-attest/ # attestation assembly + signing
│ │ ├── Cargo.toml
│ │ └── src/
│ │ └── lib.rs # assemble_and_sign(), verify(),
│ │ # serialise_for_signing(), merkle_root()
│ │
│ └── got-cli/ # binary: load activations → attest
│ ├── Cargo.toml
│ └── src/
│ └── main.rs # CLI with train / attest / verify subcommands
│
└── scripts/
├── extract_activations.py # short Python hook script
└── README.md # extraction instructions
got-core (zero internal deps)
↑ ↑
got-probe got-attest
(got-core) (got-core + ed25519-dalek, sha2, bincode)
↑ ↑
got-cli
(got-core + got-probe + got-attest + clap, serde_json)
got-probe and got-attest are siblings — neither depends on the other. Only got-cli brings them together.
[workspace]
members = ["crates/got-core", "crates/got-probe", "crates/got-attest", "crates/got-cli"]
resolver = "2"

| Type | Purpose |
|---|---|
| `Precision` | Enum: `Fp32`, `Fp16`, `Bfloat16`, `Int8`. Attestation comparison is valid only between matching precisions. |
| `InnerProduct` | Enum: `Causal`, `Euclidean`, `CausalRegularised { epsilon }`. |
| `GeometricAttestation` | Section 6 schema. Includes `schema_version: u16`. Fields `manifold_coords` and `superposition_flags` removed from the PoC (see Phase 3.5). All remaining fields required. Invalid if signature does not verify. |
| `LayerActivation` | Residual stream activations at one layer for one token position. |
| `UnembeddingMatrix` | U ∈ ℝ^{V × d}, row-major. Used to compute Gram matrix Φ = UᵀU. |
All types derive Serialize/Deserialize via serde.
The maths the whole system rests on:
⟨u, v⟩_c = uᵀ Uᵀ U v = uᵀ Φ v
CausalGeometry struct:
- `from_unembedding(u, epsilon)` → precompute Φ = UᵀU, check rank, regularise if needed
- `inner_product(w, h)` → wᵀ Φ h
- `transform(u, h)` → Uh (for diagnostic/visualisation use; not used in the training path)
- `is_positive_definite()` → bool
Scalability note: For LLaMA-3-8B (V=128,256, d=4,096), the naïve triple-loop for Φ = UᵀU is O(V × d²) ≈ 2.15 trillion multiply-adds. The Gram matrix Φ itself is d×d = 16M floats (~64 MB) which is fine to hold in memory, but computing it requires BLAS. The PoC will use the faer crate for the matrix multiplication Φ = Uᵀ·U. The naïve loops in the original spec are suitable only for synthetic tests with d ≤ ~64.
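As a concrete anchor for the maths above, here is a minimal naive Rust sketch of the Gram-matrix construction and the causal inner product. The struct and method names mirror the spec, but the bodies are illustrative stand-ins (triple loops, no faer, no rank check or regularisation), suitable only for the tiny synthetic cases the scalability note describes.

```rust
/// Minimal naive sketch of the causal geometry, mirroring the spec's names.
/// Triple loops only — usable for tiny synthetic cases (d ≤ ~64); the real
/// crate uses faer for Φ = UᵀU.
struct CausalGeometry {
    hidden_dim: usize,
    gram: Vec<f32>, // Φ = UᵀU, row-major, d × d
}

impl CausalGeometry {
    /// Build Φ = UᵀU from a V×d row-major unembedding matrix
    /// (regularisation and rank checks omitted in this sketch).
    fn from_unembedding(u: &[f32], vocab: usize, d: usize) -> Self {
        let mut gram = vec![0.0f32; d * d];
        for row in 0..vocab {
            let r = &u[row * d..(row + 1) * d];
            for i in 0..d {
                for j in 0..d {
                    gram[i * d + j] += r[i] * r[j];
                }
            }
        }
        CausalGeometry { hidden_dim: d, gram }
    }

    /// ⟨w, h⟩_c = wᵀ Φ h
    fn inner_product(&self, w: &[f32], h: &[f32]) -> f32 {
        let d = self.hidden_dim;
        (0..d)
            .map(|i| {
                let phi_h: f32 = (0..d).map(|j| self.gram[i * d + j] * h[j]).sum();
                w[i] * phi_h
            })
            .sum()
    }
}
```

This is exactly the hand-checkable path the test plan calls for: a 3×2 unembedding gives a 2×2 Φ that can be verified on paper.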
- Synthetic 3×2 unembedding where Φ can be hand-computed
- Verify `inner_product(w, h)` matches known scalar
- Verify `transform` produces correct output vector
- Test regularisation path (rank-deficient matrix)
- `cargo build` succeeds for entire workspace
- `cargo test -p got-core` passes all geometry tests
| Type | Purpose |
|---|---|
| `ProbeVector` | Weights w_v ∈ ℝ^d, bias, Platt calibration params, reliability threshold |
| `ProbeSet` | Collection of probes for one layer, with version metadata. One ProbeSet per layer; multi-layer attestation uses multiple ProbeSet files. |
Critical design choice: Training and inference must operate in the same space.
Two valid approaches exist. This PoC uses Option A (d-space throughout):
Option A — Direct causal gradient in ℝ^d (chosen):
- Keep probe weights w ∈ ℝ^d
- Compute logit as ⟨w, h⟩_c + b = wᵀΦh + b (causal inner product)
- Gradient update: w ← w − lr · (σ(wᵀΦh + b) − y) · Φh
- Inference via `geometry.inner_product(w, h) + bias` — same space, same operation
This is more expensive per step (matrix-vector product with Φ per sample) but keeps all operations in ℝ^d. Probe weights are directly interpretable as directions in the model's hidden space under the causal metric.
Option B — Transform to ℝ^V then back (not used, noted for completeness):
- Transform activations: ĥ = Uh ∈ ℝ^V
- Train standard logistic regression → ŵ ∈ ℝ^V
- Recover d-space weights via w = U⁺ŵ (pseudoinverse)
- Issue: pseudoinverse introduces numerical error; not suitable when determinism is required
Loss function (Option A) — negative log-likelihood:
L = −Σ [ y·log σ(wᵀΦh + b) + (1 − y)·log(1 − σ(wᵀΦh + b)) ]
Gradient w.r.t. w: (σ(wᵀΦh + b) − y) · Φh
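The loss and gradient above translate directly into a per-sample SGD update. A sketch follows, assuming the caller has already computed Φh for the sample (function names are illustrative, not the crate's actual API):

```rust
/// Sigmoid, as used throughout the probe path.
fn sigmoid(z: f32) -> f32 {
    1.0 / (1.0 + (-z).exp())
}

/// One Option A gradient step. `phi_h` is Φh, precomputed for this sample.
fn train_step(w: &mut [f32], bias: &mut f32, phi_h: &[f32], y: f32, lr: f32) {
    // logit = wᵀΦh + b
    let logit: f32 = w.iter().zip(phi_h).map(|(wi, pi)| wi * pi).sum::<f32>() + *bias;
    // err = σ(wᵀΦh + b) − y
    let err = sigmoid(logit) - y;
    // w ← w − lr · err · Φh ;  b ← b − lr · err
    for (wi, pi) in w.iter_mut().zip(phi_h) {
        *wi -= lr * err * pi;
    }
    *bias -= lr * err;
}
```

Because the logit is `dot(w, Φh) + b`, the per-step cost is one matrix-vector product with Φ (to produce `phi_h`) plus two dot products, matching the cost note above.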
- Compute raw causal inner product reading: `geometry.inner_product(w, h) + bias`
- Apply Platt scaling for calibrated confidence
- Set `coverage_flag` if confidence < reliability threshold
Note: Because training (2.2) and inference (2.3) both use geometry.inner_product(w, h), the dimensional spaces are consistent. No transformation or pseudoinverse needed at inference time.
Platt scaling requires a held-out validation split: fit a logistic regression from (raw_logit, true_label) pairs to produce calibrated probabilities. The PoC stubs this with platt_scale: 1.0, platt_shift: 0.0, which means:
- Confidence values are uncalibrated — they are raw sigmoid outputs, not true probabilities
- Coverage flags are illustrative only — the reliability threshold has no statistical grounding without calibration
This is acceptable for proving determinism and reproducibility. It is not acceptable for any governance application. Proper Platt scaling against a curated held-out set is required before readings carry epistemic weight. This is precisely the kind of interpretation that requires institutional oversight, not just better code.
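For reference, Platt-scaled confidence is just a sigmoid of an affine transform of the raw logit, so with the stub parameters the transform is the identity. A minimal sketch (the function name is hypothetical):

```rust
/// Platt-scaled confidence: σ(scale · raw_logit + shift).
/// With the PoC stub (scale = 1.0, shift = 0.0) this reduces to the raw
/// sigmoid output — i.e. an uncalibrated confidence, as noted above.
fn calibrated_confidence(raw_logit: f32, platt_scale: f32, platt_shift: f32) -> f32 {
    let z = platt_scale * raw_logit + platt_shift;
    1.0 / (1.0 + (-z).exp())
}
```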
layer_readings in the attestation is Vec<Vec<f32>> — one inner vec per layer. Each ProbeSet targets a single layer. The CLI's --probes flag accepts multiple probe files:
got-cli attest --probes layer12.probes layer18.probes layer24.probes ...
For each probe file, the CLI:
- Reads the `ProbeSet.layer` field to know which layer's activations to use
- Runs all probes in that set against the corresponding `LayerActivation`
- Appends readings to `layer_readings[i]`
Confidence and coverage flags are flattened across all layers in order.
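The per-probe-file loop above can be sketched as follows. `ProbeSet` here is a hypothetical stand-in carrying only the fields the loop needs, and `read` abstracts per-probe inference:

```rust
/// Sketch of multi-layer reading assembly: one inner Vec per ProbeSet.
struct ProbeSet {
    layer: usize,
    num_probes: usize,
}

fn assemble_layer_readings(
    probe_sets: &[ProbeSet],
    activations_by_layer: &[Vec<f32>],
    read: impl Fn(usize, &[f32]) -> f32, // (probe index, activations) → reading
) -> Vec<Vec<f32>> {
    probe_sets
        .iter()
        .map(|set| {
            // Pick the LayerActivation matching ProbeSet.layer, then run
            // every probe in the set against it.
            let h = &activations_by_layer[set.layer];
            (0..set.num_probes).map(|p| read(p, h)).collect()
        })
        .collect()
}
```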
Bincode for save/load of ProbeSet and ProbeVector so trained probes persist.
- Train on trivially separable synthetic data (two clusters in ℝ^4)
- Verify correct classification
- Verify `read()` returns sane confidence ∈ [0, 1]
- Verify coverage flag triggers below threshold
- `cargo test -p got-probe` passes
- Probe correctly separates synthetic clusters
This is the hardest piece. Deterministic canonical byte layout for all attestation fields except signature.
Rules:
- Float canonicalisation: map `-0.0 → 0.0`, reject NaN, use `f32::to_le_bytes()` (fixed little-endian)
- Strings: length-prefixed (u32 LE + UTF-8 bytes)
- Variable-length fields: length-prefixed (u32 LE count + elements)
- Booleans: 1 byte each (0x00 / 0x01)
- Field order: strictly follows struct declaration order, must be stable across versions
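The float rule above fits in a few lines; a sketch with the error type simplified to `String`:

```rust
/// Canonicalise one f32 per the rules above: -0.0 → 0.0, NaN rejected,
/// fixed little-endian byte output.
fn canonical_f32(x: f32) -> Result<[u8; 4], String> {
    if x.is_nan() {
        return Err("NaN not permitted in attestation".into());
    }
    // -0.0 == 0.0 under IEEE 754 comparison, so this rebinds both to +0.0
    // and collapses the two bit patterns to one canonical encoding.
    let x = if x == 0.0 { 0.0 } else { x };
    Ok(x.to_le_bytes())
}
```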
- Serialise all fields except signature via `serialise_for_signing`
- Sign payload with Ed25519
- Write signature bytes into attestation
- Re-serialise payload
- Verify Ed25519 signature
Standard binary Merkle tree over SHA-256 leaf hashes of weight shards.
Shard definition (required for reproducibility): A shard is one named tensor from the model checkpoint, serialised as:
[name: u32_len + utf8_bytes]
[dtype: u8 tag]
[shape: u32_ndims + u32_dims*]
[data: raw bytes, little-endian, in storage order]
Shards are sorted lexicographically by name before tree construction. This means any two implementations that load the same checkpoint will compute the same Merkle root, regardless of the order tensors appear in the file.
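The tree construction itself is hash-agnostic, so a sketch can take the pairwise combine function as a parameter; in the PoC that would be SHA-256 over the concatenated children (an assumption here — the exact parent rule and odd-node handling must be pinned by the spec). Odd nodes below are promoted unchanged, which is one common convention:

```rust
/// Binary Merkle root over 32-byte leaf hashes. `combine` computes a parent
/// from two children; an odd node at any level is promoted unchanged
/// (one convention — the canonical rule must be fixed by the spec).
fn merkle_root(
    leaves: &[[u8; 32]],
    combine: impl Fn(&[u8; 32], &[u8; 32]) -> [u8; 32],
) -> Option<[u8; 32]> {
    if leaves.is_empty() {
        return None;
    }
    let mut level: Vec<[u8; 32]> = leaves.to_vec();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                if pair.len() == 2 {
                    combine(&pair[0], &pair[1])
                } else {
                    pair[0] // odd node promoted to the next level
                }
            })
            .collect();
    }
    Some(level[0])
}
```

With four leaves this reproduces the hand-computable shape the test plan uses: `combine(combine(a, b), combine(c, d))`.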
The original design overloaded probe_version to encode both "which probes" and "what wire format." These are independent concerns:
| Field | Purpose | Changes when... |
|---|---|---|
| `schema_version` | Wire format of `serialise_for_signing` | Byte layout changes |
| `probe_version` | Identity of the trained probe set | Probes are retrained |
| `corpus_version` | Identity of the labelled corpus | Corpus is updated |
The attestation struct gains a schema_version: u16 field (first field in wire format, always at byte offset 0). Verifiers reject unknown schema versions immediately without attempting to parse the rest.
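Because `schema_version` sits at byte offset 0, the rejection check needs nothing but the first two bytes of the payload. A sketch (helper name hypothetical, error type simplified):

```rust
/// Read the leading u16 LE schema_version and reject unknown versions
/// before attempting to parse anything else.
fn check_schema_version(payload: &[u8], supported: &[u16]) -> Result<u16, String> {
    if payload.len() < 2 {
        return Err("payload too short for schema_version".into());
    }
    let version = u16::from_le_bytes([payload[0], payload[1]]);
    if supported.contains(&version) {
        Ok(version)
    } else {
        Err(format!("unknown schema_version {version}"))
    }
}
```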
- Round-trip: sign then verify succeeds
- Tampered attestation fails verification
- `merkle_root` matches hand-computed 4-leaf tree
- `serialise_for_signing` is pure (same input → same bytes, tested N times)
- `cargo test -p got-attest` passes
- Signature round-trip works
- Tampering detected
got-cli train --activations <path> --labels <path> --unembedding <path> --layer <n> --output <path>
got-cli attest --activations <path> --probes <path>... --unembedding <path> --key <path> --output <path> [--timestamp <unix>]
got-cli verify --attestation <path> --pubkey <path>
got-cli keygen --output <path>
--probes accepts multiple paths (one per layer), producing a multi-layer attestation. --timestamp allows supplying a fixed timestamp for reproducibility testing (without it, uses wall-clock UTC).
| Function | Format |
|---|---|
| `load_activations` | `.gotact` custom binary → `Vec<LayerActivation>` |
| `load_unembedding` | `.gotue` custom binary → `UnembeddingMatrix` |
| `load_probes` / `write_probes` | bincode (fixint, LE) → `ProbeSet` |
| `write_attestation` | serde_json → `GeometricAttestation` |
| `load_signing_key` | raw 32-byte Ed25519 seed or PEM |
- `cargo build -p got-cli` produces a working binary
- Each subcommand runs end-to-end with synthetic test data
#[test]
fn attestation_is_deterministic() {
    let a1 = produce_attestation("test_input.bin");
    let a2 = produce_attestation("test_input.bin");
    assert_eq!(a1.layer_readings, a2.layer_readings);
    assert_eq!(a1.confidence, a2.confidence);
    assert_eq!(a1.coverage_flags, a2.coverage_flags);
    assert_eq!(a1.model_hash, a2.model_hash);
}

If this passes: geometry is readable, measurement is deterministic, protocol is possible.
Synthetic activations → train probes → attest → verify, all in one test, no external files.
Verify serialise_for_signing is a pure function: same GeometricAttestation input produces identical bytes across 1000 invocations.
- Reproducibility test passes
- End-to-end test passes
- Serialisation property test passes
Python and Rust must agree on an exact byte-level format. This is not bincode — Python has no bincode library. Instead, use a simple self-describing binary format:
Magic: 4 bytes "GOTA"
Version: u16 LE (1 for initial release)
Model ID: u32 LE len + UTF-8 bytes
Precision tag: u8 (0=fp32, 1=fp16, 2=bf16, 3=int8)
hidden_dim: u32 LE
num_layers: u32 LE
num_positions: u32 LE
For each layer (num_layers):
layer_index: u32 LE
For each position (num_positions):
token_position: u32 LE
values: hidden_dim × f32 LE
Magic: 4 bytes "GOTU"
Version: u16 LE (1)
vocab_size V: u32 LE
hidden_dim d: u32 LE
data: V × d × f32 LE (row-major)
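On the Rust side, the `.gotue` header above parses with plain fixed-width reads. A sketch (function name hypothetical, error handling simplified to `String`):

```rust
use std::io::Read;

/// Parse the .gotue header per the layout above; the V × d × f32 LE data
/// that follows is left to the caller in this sketch.
fn read_gotue_header(r: &mut impl Read) -> Result<(u16, u32, u32), String> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic).map_err(|e| e.to_string())?;
    if &magic != b"GOTU" {
        return Err("bad magic, expected GOTU".into());
    }
    let mut b2 = [0u8; 2];
    r.read_exact(&mut b2).map_err(|e| e.to_string())?;
    let version = u16::from_le_bytes(b2);
    let mut b4 = [0u8; 4];
    r.read_exact(&mut b4).map_err(|e| e.to_string())?;
    let vocab_size = u32::from_le_bytes(b4);
    r.read_exact(&mut b4).map_err(|e| e.to_string())?;
    let hidden_dim = u32::from_le_bytes(b4);
    Ok((version, vocab_size, hidden_dim))
}
```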
~50 lines using transformers + forward hooks:
- Load model (e.g. LLaMA-3-8B)
- Register hook on residual stream at target layers
- Run input through model
- Save activations in `.gotact` format using `struct.pack`
- Extract and save unembedding matrix in `.gotue` format
Exact instructions:
- Python dependencies (`transformers`, `torch`, `numpy`; `struct` is in the standard library)
- Which model to use
- Expected output file format (references 6.1 and 6.2)
- How to feed outputs into `got-cli`
- Script extracts activations from a real model
- Rust `load_activations` and `load_unembedding` consume the output successfully
- Round-trip: extract → attest → verify works end-to-end
| Crate | Version | Used By | Purpose |
|---|---|---|---|
| `serde` | 1 + derive | all crates | Serialisation |
| `bincode` | 1 | got-attest, got-cli | Deterministic binary format |
| `ed25519-dalek` | 2 + rand_core | got-attest | Ed25519 signing/verification |
| `sha2` | 0.10 | got-attest | SHA-256 (input hash, Merkle tree) |
| `clap` | 4 + derive | got-cli | CLI argument parsing |
| `serde_json` | 1 | got-cli | Attestation JSON output |
| `faer` | 0.19 | got-core | BLAS-grade matrix ops for Gram matrix computation |
bincode configuration: All bincode usage must use `bincode::DefaultOptions::new().with_fixint_encoding().with_little_endian()`. The default variable-length integer encoding produces a different byte layout than the fixed-width one this spec assumes, so it must not be used; pinning the options explicitly guarantees attester and verifier emit identical bytes.
IEEE 754 floats have multiple bit representations for the same logical value:
- `-0.0` vs `0.0` (different bit patterns, compare equal)
- Multiple NaN encodings
Mitigation in serialise_for_signing:
- Map `-0.0` → `0.0` before serialisation
- Reject any NaN (return error, not attestation)
- Use `f32::to_le_bytes()` exclusively (fixed little-endian)
- Length-prefix all variable-length fields
- Property-test idempotency
The Geometric Attestation Protocol (GAP) defines how an attester produces a claim about what a model's internal geometry encodes, how a verifier checks it independently, and what guarantees hold when both parties follow the protocol honestly.
| Role | Has | Does |
|---|---|---|
| Attester | Model weights, signing key, probe set | Extracts activations, runs probes, signs attestation |
| Verifier | Attestation JSON, attester's public key, (optionally) model weights | Checks signature, optionally reproduces readings |
| Auditor | Full model weights, Merkle proof | Verifies model_hash, re-extracts activations, reproduces attestation end-to-end |
┌─────────┐ ┌──────────┐
│ Attester │ │ Verifier │
└────┬─────┘ └────┬─────┘
│ │
│ 1. Extract activations from model │
│ 2. Build CausalGeometry (Φ = UᵀU) │
│ 3. Run probes → readings, confidence │
│ 4. Assemble GeometricAttestation │
│ 5. serialise_for_signing → payload │
│ 6. Ed25519 sign(payload) → signature │
│ │
│ ──── attestation.json + pubkey ───────► │
│ │
│ 7. Deserialise attestation
│ 8. Re-serialise fields → payload'
│ 9. Verify signature(payload', pubkey)
│ 10. Check coverage_flags, confidence
│ │
│ (Optional full audit path) │
│ │
│ 11. Obtain same model weights
│ 12. Verify model_hash via Merkle root
│ 13. Re-extract activations for same input
│ 14. Re-run probes
│ 15. Assert readings match attestation
│ │
The protocol supports three verification tiers, each giving progressively stronger guarantees:
| Tier | What's Checked | Guarantees |
|---|---|---|
| Tier 1: Signature | Ed25519 signature over canonical payload | Attestation was produced by holder of signing key and has not been tampered with |
| Tier 2: Consistency | Signature + coverage flags + confidence bounds | Readings are within calibrated reliability thresholds; flagged dimensions are disclosed |
| Tier 3: Reproduction | Full re-extraction + re-probing + bitwise match | The attestation is independently reproducible — the geometry genuinely contains what the attester claims |
Created ──► Signed ──► Published ──► Verified ──► (Reproduced)
│ │ │
│ immutable after │ Tier 1-2 │ Tier 3
│ signature │ checks │ full audit
- Created: all fields populated except `signature` (zeroed)
- Signed: `serialise_for_signing()` produces canonical bytes; Ed25519 signs them; signature written
- Published: attestation JSON distributed alongside public key (out-of-band key distribution)
- Verified: Receiver checks signature validity, inspects confidence and coverage
- Reproduced (optional): Auditor re-runs the entire pipeline on the same model + input and confirms bitwise match of readings
The serialise_for_signing function defines the wire format that both attester and verifier must agree on. This is the protocol's compatibility surface.
Byte layout (all values little-endian):
[schema_version: u16] ← always first, for forward compat
[model_id: u32_len + utf8_bytes]
[model_hash: 32 bytes]
[precision: u8 tag]
[inner_product: u8 tag + optional f32 epsilon]
[input_hash: 32 bytes]
[timestamp: u64]
[corpus_version: u32_len + utf8_bytes]
[probe_version: u32_len + utf8_bytes]
[layer_readings: u32_num_layers + (u32_num_dims + f32_values)*]
[confidence: u32_len + f32_values]
[coverage_flags: u32_len + u8_bools]
[divergence_flag: u8]
-- signature field is EXCLUDED --
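Two of the canonical encodings above (length-prefixed UTF-8 strings and length-prefixed f32 vectors) can be sketched directly; float canonicalisation (`-0.0 → 0.0`, NaN rejection) is elided here for brevity, and the helper names are illustrative:

```rust
/// Length-prefixed UTF-8 string: u32 LE byte count, then the bytes.
fn put_string(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Length-prefixed f32 vector: u32 LE element count, then f32 LE values.
fn put_f32s(out: &mut Vec<u8>, xs: &[f32]) {
    out.extend_from_slice(&(xs.len() as u32).to_le_bytes());
    for &x in xs {
        // canonicalisation (-0.0 → 0.0, NaN rejection) elided in this sketch
        out.extend_from_slice(&x.to_le_bytes());
    }
}
```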
Removed from wire format (vs. original spec):
- `manifold_coords`: was a duplicate of `layer_readings` with no independent semantics. Removed to avoid confusion. Can be reintroduced when actual dimensionality reduction (PCA/UMAP) is implemented.
- `superposition_flags`: always false in the PoC. Removed rather than serialising dead data into a signed attestation. Will be reintroduced when superposition detection is implemented.
Float canonicalisation rules:
- `-0.0` → `0.0` (normalise sign bit)
- NaN → reject (attestation invalid)
- All floats serialised as `f32::to_le_bytes()`
Version negotiation: The schema_version field (u16, first two bytes of the wire format) identifies the serialisation layout. Verifiers must reject attestations with unknown schema versions rather than attempting to parse.
For two runs to produce identical attestations, the following must be fixed:
| Input | Must Match |
|---|---|
| Model weights | Bitwise identical (same checkpoint, same quantisation) |
| Precision | Same enum variant |
| Unembedding matrix | Derived from same weights → identical |
| Input tokens | Same token IDs in same order |
| Probe weights | Same trained probes (loaded from same file) |
| Signing key | Same key (signatures are deterministic in Ed25519) |
| Corpus/probe version strings | Same strings |
What may differ: timestamp (intentionally excluded from the determinism assertion in tests)
Timestamp and signature: Note that timestamp IS included in serialise_for_signing and therefore affects the signature. Two runs at different times will produce different signatures. Determinism of readings, confidence, and flags is the core claim. Full byte-identical attestations (including signature) require the caller to supply an explicit timestamp rather than using wall-clock time. The CLI attest subcommand accepts an optional --timestamp flag for this purpose.
What must NOT differ: Every other field. If layer_readings, confidence, or any flag differs between two honest runs on identical inputs, the implementation has a bug.
| Condition | Protocol Response |
|---|---|
| NaN in any activation or probe weight | Refuse to produce attestation (return AttestationError::NaN) |
| Model hash mismatch | Verifier rejects (Tier 3) |
| Signature invalid | Verifier rejects (Tier 1) |
| Unknown schema_version | Verifier rejects immediately |
| All coverage_flags true | Attestation valid but semantically empty — verifier should treat as "no signal" |
| Readings differ on reproduction | Attestation is non-reproducible — indicates bug or tampering |
| Dimensional mismatch (probe width ≠ activation width) | Refuse to produce attestation (return AttestationError::DimensionMismatch) |
Error types: All fallible operations return Result<T, E> with typed error enums (GeometryError, ProbeError, AttestationError). No panics via assert! in library code. Panics are acceptable only in tests and in the CLI main() (via .expect() with context).
Out of scope for the PoC. The protocol assumes:
- Attester publishes their Ed25519 public key via a trusted channel
- Verifier obtains the public key before verification
- No PKI, no certificate chain — just raw public keys
Future work: key registry, key rotation, multi-party attestation.
The attestation struct is the protocol's schema. Breaking changes require:
- Increment `schema_version` (u16, first field in wire format)
- Update the `serialise_for_signing` implementation on both sides
- Old attestations remain verifiable with old serialisation code (version-tagged dispatch)
probe_version and corpus_version change independently of the wire format and do not require schema version bumps.
1. got-core types ← start here
2. got-core geometry ← test with hand-computed examples
3. got-probe ← test with synthetic separable data
4. got-attest ← test sign/verify round-trip
5. got-cli ← wire everything together
6. integration tests ← the reproducibility proof
7. Python script ← bridge to real models
8. geometry drift ← drift detection, probe validity, chained attestation
9. causal interventions ← prove probes measure real mechanisms (KEYSTONE)
10. inline measurement ← every inference is measured, not just spot-checks
11. wire protocol ← agent-to-agent attestation exchange (GOT/1)
12. hardware isolation ← tamper-proof activation capture
Steps 1–6 are independently testable with no external data. Step 7 bridges to real models. Step 8 (Phase 7) adds drift detection and chained attestation for self-learning models. Step 9 (Phase 8) is the keystone — causal interventions prove probes measure real mechanisms, not surface correlations. Without this, everything else secures a measurement that might be meaningless. Step 10 (Phase 9) makes measurement inline so every inference is attested. Step 11 (Phase 10) adds the encrypted wire protocol for agent-to-agent exchange. Step 12 (Phase 11) adds hardware-isolated activation capture.
The first six phases assume a frozen checkpoint. A self-learning model breaks that assumption: its unembedding matrix U changes over time, which means Φ = UᵀU changes, probes trained against the old Φ go stale, and model_hash no longer matches. This phase adds the machinery to detect, bound, and chain those changes.
When a model updates its own weights:
- U changes → Φ changes → all probe readings shift
- Probes trained against old Φ measure against a geometry that no longer exists
- `model_hash` no longer matches → Tier 3 reproduction is impossible against the new weights
- The old attestation is still valid (signature checks out) but describes a model that no longer exists
The goal is not to prevent self-learning — it is to make it auditable. Every geometry change must be visible, bounded, and chained to the previous state.
Add two methods to CausalGeometry:
| Method | Signature | Purpose |
|---|---|---|
| `geometry_hash` | `&self → [u8; 32]` | SHA-256 of the Gram matrix (f32 LE bytes, row-major). Deterministic fingerprint of the current geometry. |
| `drift_from` | `&self, &CausalGeometry → Result<f32, GeometryError>` | Normalised Frobenius distance: ‖Φ_new − Φ_old‖_F / ‖Φ_old‖_F. Returns a scalar in [0, ∞); zero if identical. Rejects dimension mismatch. |
Implementation:
impl CausalGeometry {
pub fn geometry_hash(&self) -> [u8; 32] {
use sha2::{Digest, Sha256};
let mut hasher = Sha256::new();
for &val in &self.gram {
hasher.update(val.to_le_bytes());
}
hasher.finalize().into()
}
pub fn drift_from(&self, reference: &CausalGeometry) -> Result<f32, GeometryError> {
if self.hidden_dim != reference.hidden_dim {
return Err(GeometryError::DimensionMismatch {
expected: reference.hidden_dim,
got: self.hidden_dim,
});
}
let frobenius_delta_sq: f32 = self.gram.iter()
.zip(reference.gram.iter())
.map(|(a, b)| (a - b) * (a - b))
.sum();
let frobenius_ref_sq: f32 = reference.gram.iter()
.map(|x| x * x)
.sum();
if frobenius_ref_sq == 0.0 {
return Ok(if frobenius_delta_sq == 0.0 { 0.0 } else { f32::INFINITY });
}
Ok((frobenius_delta_sq / frobenius_ref_sq).sqrt())
}
}

The Frobenius norm is direction-blind — it measures total magnitude of change, not whether the change is in a value-relevant direction. A future enhancement could project drift onto probe weight directions specifically, but that is a research question beyond this PoC.
A snapshot of the Gram matrix at a point in time:
Magic: 4 bytes "GOTG"
Version: u16 LE (1)
hidden_dim d: u32 LE
geometry_hash: 32 bytes (SHA-256 of the Gram data that follows)
timestamp: u64 LE (Unix UTC seconds when checkpoint was taken)
model_hash: 32 bytes (Merkle root of model weights at this checkpoint)
data: d × d × f32 LE (row-major Gram matrix Φ)
This file is the "reference geometry" that probes are trained against. It persists independently of the model weights so that drift can be measured even after the original weights are gone.
Extend ProbeSet with two new fields:
pub struct ProbeSet {
pub probes: Vec<ProbeVector>,
pub version: String,
pub corpus_version: String,
pub layer: usize,
/// SHA-256 of the Φ matrix these probes were trained against.
pub geometry_hash: [u8; 32],
/// Maximum normalised Frobenius drift before probes are stale.
/// If drift_from(reference) > max_drift, refuse to produce readings.
pub max_drift: f32,
}

New error variant:
pub enum ProbeError {
// ... existing variants ...
#[error("probes are stale: geometry drift {drift:.6} exceeds max {max_drift:.6}")]
ProbeStale { drift: f32, max_drift: f32 },
}

Guarded read function:
pub fn read_probe_checked(
probe: &ProbeVector,
probe_set: &ProbeSet,
h: &[f32],
current_geometry: &CausalGeometry,
reference_geometry: &CausalGeometry,
) -> Result<(f32, f32, bool), ProbeError> {
// Verify geometry_hash matches the reference
let ref_hash = reference_geometry.geometry_hash();
if ref_hash != probe_set.geometry_hash {
return Err(ProbeError::GeometryMismatch);
}
// Check drift bound
let drift = current_geometry.drift_from(reference_geometry)?;
if drift > probe_set.max_drift {
return Err(ProbeError::ProbeStale { drift, max_drift: probe_set.max_drift });
}
// Probes still valid — proceed
read_probe(probe, h, current_geometry)
}

The old `read_probe` remains available for the frozen-model case. `read_probe_checked` is the drift-aware version.
Three new fields added to GeometricAttestation:
| Field | Type | Purpose |
|---|---|---|
| `parent_attestation_hash` | `Option<[u8; 32]>` | SHA-256 of the serialised parent attestation. `None` for the first attestation in a chain (epoch 0). |
| `geometry_hash` | `[u8; 32]` | SHA-256 of the Gram matrix Φ at the time of this attestation. |
| `geometry_drift` | `f32` | Normalised Frobenius drift from the reference geometry (the one probes were trained against). 0.0 if unchanged. |
Wire format implications:
- `schema_version` bumps to 2
- `serialise_for_signing` gains a v2 branch (v1 branch retained for verifying old attestations)
- New fields are appended after `divergence_flag` in the canonical byte layout
- `parent_attestation_hash` serialised as: u8 presence flag (0x00 = None, 0x01 = Some) + 32 bytes if present
Attestation₀ (epoch 0)
parent_attestation_hash: None
geometry_hash: H(Φ₀)
geometry_drift: 0.0
↓
Attestation₁ (after model update)
parent_attestation_hash: H(serialise(Attestation₀))
geometry_hash: H(Φ₁)
geometry_drift: ‖Φ₁ − Φ₀‖_F / ‖Φ₀‖_F
↓
Attestation₂ (after another update)
parent_attestation_hash: H(serialise(Attestation₁))
geometry_hash: H(Φ₂)
geometry_drift: ‖Φ₂ − Φ₀‖_F / ‖Φ₀‖_F ← always relative to reference
...
Note: geometry_drift is always measured from the reference geometry (the one probes were trained against), not from the previous attestation. This prevents slow creep where each step is small but cumulative drift is large.
A verifier walking the chain checks:
- Each signature is valid
- Each `parent_attestation_hash` matches the hash of the previous attestation
- `geometry_drift` is monotonically consistent (no unexplained drops without re-probing)
- The chain is unbroken (no missing links)
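The hash-linkage part of that walk (everything except signature checks and drift monotonicity) can be sketched with precomputed hashes. `ChainLink` is a hypothetical stand-in; real code would compute `own_hash` as `sha256(serialise_for_signing(attestation))`:

```rust
/// One node in an attestation chain, with hashes precomputed by the caller.
struct ChainLink {
    own_hash: [u8; 32],            // sha256(serialise_for_signing(self))
    parent_hash: Option<[u8; 32]>, // None only for epoch 0
}

/// Check the hash-linkage invariants of a chain: epoch 0 has no parent,
/// every later link names exactly the previous link's hash.
fn chain_is_unbroken(chain: &[ChainLink]) -> bool {
    for (i, link) in chain.iter().enumerate() {
        match (i, link.parent_hash) {
            (0, None) => {}                                       // epoch 0: no parent
            (0, Some(_)) => return false,                         // epoch 0 must not claim one
            (_, None) => return false,                            // missing link
            (_, Some(p)) if p == chain[i - 1].own_hash => {}      // correct linkage
            _ => return false,                                    // hash mismatch → broken
        }
    }
    !chain.is_empty()
}
```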
got-cli checkpoint --unembedding <path> --output <path>
Save a .gotgeo geometry snapshot for the current model state.
got-cli drift --reference <.gotgeo> --current <.gotue>
Compute and print the normalised Frobenius drift between
a reference geometry checkpoint and a current unembedding matrix.
got-cli attest --chain-parent <attestation.json> --geo-ref <.gotgeo> ...
Produce a chained attestation linked to a previous one,
with geometry_drift computed against the reference checkpoint.
The existing attest subcommand (without --chain-parent) continues to work for frozen-model attestation. Chaining is opt-in.
| Scenario | What happens |
|---|---|
| Frozen model, no drift | Attestation produced normally (geometry_drift = 0.0, no parent) |
| Model fine-tunes slightly | Drift is small → probes still valid → attestation produced with drift recorded |
| Model self-learns past threshold | read_probe_checked returns ProbeStale → no attestation produced → model must be re-probed |
| Someone swaps the model entirely | geometry_hash doesn't match reference → chain is broken → auditor sees discontinuity |
| Gradual value drift over time | Chain shows monotonically increasing geometry_drift → governance body can act before threshold |
| Probes retrained on new geometry | New chain starts (epoch 0) with new reference geometry checkpoint |
- Who sets `max_drift`? A governance decision. The code enforces the bound; it doesn't choose it.
- When must re-probing happen? Policy, not code. The system refuses to attest past the threshold, but the threshold itself is a parameter.
- Is small drift in a value-relevant direction? Frobenius norm is direction-blind. Drift of 0.01 in the "honesty" direction may matter more than drift of 0.1 in a syntactic subspace. Directional drift analysis (projecting ΔΦ onto probe weight directions) is a research extension.
- Can the model game the threshold? If the model controls its own updates, it could learn to drift in directions that avoid Frobenius detection while still shifting behaviour. Adversarial robustness of the drift metric is an open problem.
- `geometry_hash` is deterministic (same Φ → same hash, always)
- `drift_from` returns 0.0 for identical geometries
- `drift_from` returns > 0 for perturbed geometries
- `drift_from` rejects dimension mismatch
- `read_probe_checked` succeeds within drift bound
- `read_probe_checked` returns `ProbeStale` beyond drift bound
- `read_probe_checked` returns `GeometryMismatch` for wrong reference
- Chained attestation: parent hash matches `sha256(serialise_for_signing(parent))`
- Chained attestation: broken chain (wrong parent hash) detected by verifier
- Schema v2 round-trip: sign and verify with new fields
- Schema v1 attestations still verifiable (backward compat)
- `geometry_hash` and `drift_from` pass all tests
- Chained attestation sign/verify round-trip works
- Schema v1 attestations remain verifiable
- End-to-end: extract → checkpoint → update model → measure drift → attest with chain → verify
Phases 1–7 establish that probes produce deterministic, reproducible readings under the causal inner product. But nothing so far proves that those readings correspond to real mechanisms in the model. A probe could achieve high confidence by exploiting a surface-level correlation (e.g., token frequency co-occurrence) rather than measuring an actual causal pathway.
This is the most important phase in the entire system. Without causal validation, every other phase — attestation, drift detection, chaining, wire transport — secures a measurement that might be meaningless. Causal interventions are the keystone: they turn a correlation-based readout into a mechanism-based one.
A linear probe w trained on a corpus achieves some accuracy on held-out data. But:
- The probe could be detecting a confound — a statistical regularity in the training corpus that happens to correlate with the target concept
- The model could encode the concept in a non-linear manifold that the linear probe linearises poorly
- The model could distribute the concept across multiple directions — the probe captures one, the rest are unmeasured
- The model could not represent the concept at all — the probe reads noise that happens to separate the training data
Causal intervention directly tests whether the model's behaviour changes when we perturb activations in the probe direction. If perturbing h along w changes the model's output in the expected way, the probe is measuring a real mechanism. If not, the probe is measuring a ghost.
The core function:
/// Perturb hidden state h along probe direction w and check output shift.
///
/// model_fn: a callback that maps hidden state → output logits (or output embedding).
/// The caller provides this; it encapsulates the model's forward pass
/// from the probed layer to the output.
///
/// Returns a CausalScore describing the intervention result.
pub fn causal_check(
probe: &ProbeVector,
h: &[f32],
geometry: &CausalGeometry,
delta: f32, // perturbation magnitude
model_fn: &dyn Fn(&[f32]) -> Vec<f32>, // h → output
) -> Result<CausalScore, ProbeError>

Algorithm:
h_original = h
output_original = model_fn(h_original)
// Positive perturbation: push h in the probe direction
h_plus = h + δ × w_normalised
output_plus = model_fn(h_plus)
// Negative perturbation: push h against the probe direction
h_minus = h − δ × w_normalised
output_minus = model_fn(h_minus)
// Measure output shift
Δ+ = ‖output_plus − output_original‖₂
Δ- = ‖output_minus − output_original‖₂
// Causal consistency: do positive and negative perturbations
// produce proportional, opposite-sign effects?
// If the probe is measuring a real linear mechanism,
// Δ+ ≈ Δ- (symmetric), and both should be non-trivial.
consistency = direction_sign × min(Δ+/Δ-, Δ-/Δ+)
// direction_sign: +1 if the two perturbations shift the output in
// opposite directions (as a linear mechanism predicts), −1 if they
// shift it the same way
// consistency ∈ [-1, 1]
// +1 = perfectly symmetric causal effect
// 0 = one direction has no effect (probe measures correlation, not cause)
// -1 = paradoxical (perturbation reverses expected effect)
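The algorithm above can be sketched as a runnable function, under two simplifying assumptions: the model is a caller-supplied closure rather than a real forward pass, and consistency here uses only the magnitude-symmetry term (the direction sign factor is omitted).

```rust
/// Minimal sketch of the perturbation check described above.
/// `w` is assumed pre-normalised; `model_fn` maps hidden state → output.
/// Returns (delta_plus, delta_minus, consistency).
fn causal_check_sketch(
    w: &[f32],
    h: &[f32],
    delta: f32,
    model_fn: &dyn Fn(&[f32]) -> Vec<f32>,
) -> (f32, f32, f32) {
    // Euclidean distance between two output vectors.
    let l2 = |a: &[f32], b: &[f32]| -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
    };
    let out0 = model_fn(h);
    // Perturb h along ±δw.
    let h_plus: Vec<f32> = h.iter().zip(w).map(|(x, wi)| x + delta * wi).collect();
    let h_minus: Vec<f32> = h.iter().zip(w).map(|(x, wi)| x - delta * wi).collect();
    let d_plus = l2(&model_fn(&h_plus), &out0);
    let d_minus = l2(&model_fn(&h_minus), &out0);
    // Magnitude symmetry: 1.0 when Δ+ == Δ-, → 0 when one side is inert.
    let consistency = if d_plus.max(d_minus) == 0.0 {
        0.0
    } else {
        (d_plus / d_minus).min(d_minus / d_plus)
    };
    (d_plus, d_minus, consistency)
}
```

With an identity model the effect is perfectly symmetric (consistency 1.0); with a constant model both perturbations are inert and consistency collapses to 0.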
pub struct CausalScore {
pub delta_plus: f32, // ‖output_plus − output_original‖₂
pub delta_minus: f32, // ‖output_minus − output_original‖₂
pub consistency: f32, // causal consistency score ∈ [-1, 1]
pub is_causal: bool, // consistency > threshold (default 0.5)
pub perturbation_delta: f32, // the δ used
}

A model that knows which probes will be applied can learn to present the "right" activations for those specific directions while behaving differently elsewhere. The defence against this:
pub struct ProbeLibrary {
/// Full set of probes available for a given concept.
pub probes: Vec<ProbeVector>,
/// How many to sample per intervention round.
pub sample_size: usize,
}
impl ProbeLibrary {
/// Randomly sample probes for this intervention round.
/// Uses a cryptographic RNG so the selection is unpredictable.
pub fn sample(&self) -> Vec<&ProbeVector> {
use rand::seq::SliceRandom;
let mut rng = rand::thread_rng();
let mut indices: Vec<usize> = (0..self.probes.len()).collect();
indices.shuffle(&mut rng);
indices.truncate(self.sample_size);
indices.iter().map(|&i| &self.probes[i]).collect()
}
}

The key insight: if the probe library contains N probes for a concept and only k are sampled each time, the model would need to fake activations along all N directions simultaneously — which, for a linear probe, amounts to representing the concept genuinely.
A real mechanism should be detectable at multiple layers. If a probe only shows causal effect at one layer, it may be exploiting a layer-specific artefact.
pub struct MultiLayerCausalResult {
pub layer_scores: Vec<(usize, CausalScore)>, // (layer_index, score)
pub consistent_layers: usize, // count where is_causal == true
pub total_layers: usize,
pub cross_layer_consistent: bool, // consistent_layers >= min_layers
}
pub fn causal_check_multi_layer(
probes_by_layer: &[(usize, &ProbeVector)],
h_by_layer: &[(usize, &[f32])],
geometry: &CausalGeometry,
delta: f32,
model_fn_by_layer: &dyn Fn(usize, &[f32]) -> Vec<f32>,
min_consistent_layers: usize,
) -> Result<MultiLayerCausalResult, ProbeError>;

Three new fields are added to GeometricAttestation:
| Field | Type | Purpose |
|---|---|---|
| `causal_scores` | `Vec<CausalScore>` | Per-probe causal intervention results |
| `intervention_delta` | `f32` | The δ perturbation magnitude used |
| `causal_flag` | `bool` | All probes passed causal check |
Wire format:
- `schema_version` bumps to 3
- `serialise_for_signing` gains a v3 branch (v1 and v2 branches retained)
- Causal scores serialised as: count (u32 LE) + per-score (delta_plus f32 LE + delta_minus f32 LE + consistency f32 LE + is_causal u8)
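The per-score encoding can be sketched directly from the byte layout just described; `CausalScoreWire` and `serialise_causal_scores` are illustrative names, not the crate's API.

```rust
/// Wire-level view of one causal score: 13 bytes per score.
struct CausalScoreWire {
    delta_plus: f32,
    delta_minus: f32,
    consistency: f32,
    is_causal: bool,
}

/// count (u32 LE) followed by, per score:
/// delta_plus (f32 LE) + delta_minus (f32 LE) + consistency (f32 LE) + is_causal (u8).
fn serialise_causal_scores(scores: &[CausalScoreWire]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + scores.len() * 13);
    out.extend_from_slice(&(scores.len() as u32).to_le_bytes());
    for s in scores {
        out.extend_from_slice(&s.delta_plus.to_le_bytes());
        out.extend_from_slice(&s.delta_minus.to_le_bytes());
        out.extend_from_slice(&s.consistency.to_le_bytes());
        out.push(s.is_causal as u8);
    }
    out
}
```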
Agent requests attestation of model M at layer L:
1. Sample k probes from ProbeLibrary (cryptographic random selection)
2. For each sampled probe w:
a. Read h from model at layer L for the given input
b. Compute causal_check(w, h, Φ, δ, model_fn)
c. Record CausalScore
3. If all k probes pass (is_causal == true for all):
→ causal_flag = true, proceed with attestation
4. If any probe fails:
→ causal_flag = false
→ attestation is still produced (for auditability)
but marked as non-causal
5. Verifier checks causal_flag:
→ If false, the attestation is suspicious — probes may not
be measuring real mechanisms
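The decision in steps 3–4 reduces to a conjunction over the sampled probes. A minimal sketch (treating an empty sample as non-causal is an assumption, not specified above):

```rust
/// Stand-in for the causal fields of a sampled probe's result.
struct Score {
    is_causal: bool,
}

/// causal_flag is true only if every sampled probe passed its check.
/// An empty sample is treated as failure (assumption: no evidence, no flag).
fn causal_flag(scores: &[Score]) -> bool {
    !scores.is_empty() && scores.iter().all(|s| s.is_causal)
}
```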
Does prove:
- The model's output changes when activations are perturbed in the probe direction → the probe reads a causally relevant feature
- The change is symmetric (positive and negative perturbations have proportional effects) → the mechanism is approximately linear in this direction
- The effect is consistent across multiple layers → the mechanism is not a layer-specific artefact
- The effect is consistent across randomly sampled probes → the model cannot fake responses to specific known probes
Does not prove:
- That the concept the probe claims to measure (e.g., "honesty") is validly operationalised
- That perturbation magnitude δ is ecologically valid (real inputs may never push activations this far)
- That the model doesn't have a second, un-probed mechanism that dominates in practice
- That causal consistency in the probe direction implies causal faithfulness of the model's overall behaviour
- `causal_check` with a synthetic model where `output = Φh`: perturbation along the probe direction produces a proportional output shift
- `causal_check` with a synthetic model whose output is constant: consistency ≈ 0, `is_causal` = false
- `causal_check` with a non-linear synthetic model: reduced consistency
- `ProbeLibrary::sample` returns `sample_size` probes (no duplicates)
- `ProbeLibrary::sample` with different RNG seeds produces different selections
- Multi-layer consistency: synthetic model with a linear mechanism at all layers → `cross_layer_consistent` = true
- Multi-layer consistency: mechanism only at one layer → `cross_layer_consistent` = false
- Schema v3 round-trip: sign and verify with causal fields
- Schema v1 and v2 attestations still verifiable (backward compat)
- Causal attestation flow end-to-end: sample → intervene → attest → verify
- `causal_check` correctly identifies causal vs non-causal probes in synthetic tests
- Causal consistency score formula is correct (symmetric = +1, one-sided = 0)
- Random probe sampling is unpredictable (crypto RNG)
- Multi-layer consistency check works
- Schema v3 attestation round-trips correctly
- All previous schema versions remain verifiable
- End-to-end: sample probes → intervene → attest with causal_flag → verify
Phases 1–8 treat measurement as a separate activity: extract activations, run probes, produce attestation. This works for auditing but leaves a gap — between audits, the model runs unmonitored. An adversarial model could behave differently when it detects it is being probed vs. serving real requests.
This phase makes measurement inline: every forward pass through the model is measured, not just spot-checks.
If measurement is only periodic:
- The model could detect the measurement regime (e.g., known test inputs, timing patterns)
- The model behaves well during measurement, poorly between measurements
- The attestation covers the measurement window but says nothing about the gaps
Inline measurement closes this gap by hooking into the model's forward pass so that every inference produces activation data for probing.
┌─────────────────────────────────────────────────┐
│ Model Forward Pass │
│ │
│ Input → [Layer 0] → [Layer 1] → ... → Output │
│ │ │ │
│ ▼ ▼ │
│ hook(h₀) hook(h₁) │
│ │ │ │
└──────────────┼────────────┼─────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────┐
│ Measurement Sidecar (separate process/thread) │
│ │
│ Receive h from hook → run sampled probes │
│ → compute causal_check (if enabled) │
│ → accumulate into rolling attestation window │
│ → emit attestation when window closes │
│ │
└─────────────────────────────────────────────────┘
/// A measurement hook that receives activations from one layer.
pub trait MeasurementHook: Send + Sync {
/// Called with the hidden state at a specific layer for each forward pass.
/// `request_id` is an opaque identifier for this inference request.
/// `layer` is the layer index.
/// `h` is the hidden state vector.
fn on_activation(
&self,
request_id: u64,
layer: usize,
h: &[f32],
);
}
/// A measurement sidecar that collects activations and produces attestations.
pub struct MeasurementSidecar {
/// Probes to apply (randomly sampled per window).
probe_library: ProbeLibrary,
/// Geometry for inner product computation.
geometry: CausalGeometry,
/// Rolling window of measurements.
window_size: usize,
/// Accumulated readings in current window.
readings: Vec<ProbeReading>,
/// Signing key for attestations.
signing_key: SigningKey,
}
pub struct ProbeReading {
pub request_id: u64,
pub layer: usize,
pub probe_name: String,
pub value: f32,
pub confidence: f32,
pub divergence_flag: bool,
pub causal_score: Option<CausalScore>,
}
impl MeasurementSidecar {
/// Process a new activation. Returns Some(attestation) when the window closes.
pub fn ingest(
&mut self,
request_id: u64,
layer: usize,
h: &[f32],
) -> Option<GeometricAttestation>;
}

Not every forward pass needs every probe. The sidecar uses stratified random sampling:
- Each window of W requests gets k probes sampled from the library
- Different requests within the window may get different probes
- Over time, every probe in the library is exercised
- The model cannot predict which probes will be applied to any given request
This bounds the computational overhead while maintaining coverage.
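The per-window sampling can be sketched as a partial Fisher–Yates shuffle. A small xorshift generator stands in for the cryptographic RNG here purely to keep the sketch self-contained; the real sidecar must use a crypto RNG so selections are unpredictable.

```rust
/// Sample k distinct probe indices from a library of n probes.
fn sample_window(n: usize, k: usize, state: &mut u64) -> Vec<usize> {
    // xorshift64: deterministic stand-in for the crypto RNG.
    fn next(s: &mut u64) -> u64 {
        *s ^= *s << 13;
        *s ^= *s >> 7;
        *s ^= *s << 17;
        *s
    }
    let k = k.min(n);
    let mut idx: Vec<usize> = (0..n).collect();
    // Partial Fisher–Yates: after i steps, the first i slots are a
    // uniform sample without replacement.
    for i in 0..k {
        let j = i + (next(state) as usize) % (n - i);
        idx.swap(i, j);
    }
    idx.truncate(k);
    idx
}
```

Repeated windows exercise the whole library: over enough rounds every index appears, which is the coverage property the phase relies on.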
Window 0: requests [0..W) → Attestation₀
Window 1: requests [W..2W) → Attestation₁ (chained to ₀)
Window 2: requests [2W..3W) → Attestation₂ (chained to ₁)
...
Each window's attestation:
- Contains summary statistics (mean, min, max for each probe across the window)
- Is chained to the previous window's attestation (Phase 8 chaining protocol)
- Records which probes were sampled (post-hoc, so the model couldn't have predicted them)
- Includes causal scores if causal intervention was enabled for this window
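The summary statistics in the first bullet are a simple fold over the window's readings. A sketch, with `window_summary` as an illustrative name:

```rust
use std::collections::HashMap;

/// Per-probe (mean, min, max) over one window of (probe_name, value) readings.
fn window_summary(readings: &[(String, f32)]) -> HashMap<String, (f32, f32, f32)> {
    // Accumulator: (sum, min, max, count) per probe.
    let mut acc: HashMap<String, (f32, f32, f32, u32)> = HashMap::new();
    for (probe, v) in readings {
        let e = acc
            .entry(probe.clone())
            .or_insert((0.0, f32::INFINITY, f32::NEG_INFINITY, 0));
        e.0 += *v;
        e.1 = e.1.min(*v);
        e.2 = e.2.max(*v);
        e.3 += 1;
    }
    acc.into_iter()
        .map(|(k, (sum, min, max, n))| (k, (sum / n as f32, min, max)))
        .collect()
}
```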
If the model behaves differently when probed:
- Activation distribution shift — the sidecar can track activation statistics (mean, variance) per layer. A sudden shift when probes are applied suggests evasion.
- Causal intervention disagreement — if causal checks pass during dedicated measurement but fail during inline measurement (or vice versa), the model may be adapting.
- Output divergence — compare model outputs with and without the measurement hook active. Any systematic difference indicates the model detects the hook.
These are heuristic detections, not proofs. A sufficiently sophisticated model could evade them. The ultimate defence is Phase 11 (hardware-isolated measurement).
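The first heuristic, activation distribution shift, can be sketched as a mean comparison between a baseline window and the current window. The plain mean and the threshold form are illustrative simplifications of whatever statistic the sidecar actually tracks.

```rust
/// Mean of a slice of activations.
fn mean(xs: &[f32]) -> f32 {
    xs.iter().sum::<f32>() / xs.len() as f32
}

/// Flag a window whose mean activation departs from the baseline mean
/// by more than `threshold` (threshold choice is policy, not code).
fn distribution_shift(baseline: &[f32], window: &[f32], threshold: f32) -> bool {
    (mean(baseline) - mean(window)).abs() > threshold
}
```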
- `MeasurementHook` trait: a synthetic hook receives activations correctly
- `MeasurementSidecar::ingest` accumulates readings across requests
- Window close produces a valid attestation
- Attestation chain across windows is valid (parent hashes link)
- Stratified sampling covers all probes over enough windows
- Sidecar with causal intervention enabled produces causal scores
- Activation statistics tracking detects distribution shift
- Measurement hook interface compiles and receives activations
- Sidecar produces attestations at window boundaries
- Attestations are chained across windows
- Probe sampling is stratified and covers the library over time
- Causal checks can run inline (not just in dedicated measurement mode)
- End-to-end: model serves requests → sidecar hooks → attestation chain produced
Phases 1–9 exchange attestations as JSON files on a shared filesystem. Real agent-to-agent deployment requires a purpose-built binary protocol that carries attestations, chains, trust metadata, and freshness guarantees over an encrypted channel. This phase defines GOT/1 — the Geometry of Trust wire protocol — and implements it in a new got-wire crate.
Before defining the protocol, we state what we are defending against:
| Threat | Description | Severity |
|---|---|---|
| T1: Eavesdropping | Adversary reads attestation contents (probe readings, geometry hashes, model fingerprints) from the wire. | High — reveals model behavioural profile. |
| T2: Tampering | Adversary modifies frames in transit — changes nonce, verdict, reason, or swaps attestation payloads. | Critical — can cause agents to cooperate with invalid peers. |
| T3: Replay | Adversary records a valid exchange and replays it later to trick an agent into re-accepting a stale attestation. | High — circumvents drift detection. |
| T4: Identity spoofing | Adversary impersonates a known agent by forging agent_id. | Critical — requires forging Ed25519 signatures (infeasible) or exploiting unsigned metadata (feasible without channel binding). |
| T5: Man-in-the-middle | Adversary sits between two agents, relaying and modifying traffic in real time. | Critical — combines T1+T2+T3. |
| T6: Denial of service | Adversary sends malformed frames, huge payloads, or floods connections. | Medium — disrupts availability but not integrity. |
| Principle | Rationale |
|---|---|
| Encrypted channel first | All GOT/1 frames travel inside a Noise NK encrypted tunnel. No plaintext protocol data ever touches the wire. This defeats T1 and T5. |
| Signed exchange envelopes | Every exchange message (not just the attestation inside it) is signed over its full contents: nonce, peer_id, attestation_hash, verdict. This defeats T2 and T4. |
| Nonce is inside the signed envelope | The nonce is covered by the sender's Ed25519 signature, not just placed in an unsigned frame header. A MITM cannot swap nonces. This defeats T3. |
| Drift bounds are local policy only | Each agent enforces its own max_drift from its local trust registry. It is never sent on the wire. An adversary cannot relax another agent's threshold. |
| Binary framing, not HTTP | Agents are not browsers. A length-prefixed binary frame is simpler, has no header ambiguity, and needs zero external dependencies beyond the noise crate. |
| Self-describing messages | Every frame declares its type and version so that future extensions don't break old parsers. |
| Mutual exchange in one round-trip | The common case (two agents swapping attestations) should complete in a single request→response after the handshake. |
| Chain is inline | A chained v2 attestation travels with its full ancestry. The receiver doesn't fetch missing links. |
GOT/1 uses the Noise Protocol Framework with the NK pattern:
- N — initiator is anonymous (no static key in handshake; identified later by signed envelope)
- K — responder's static public key is known in advance (from the trust registry)
This provides:
- Forward secrecy — ephemeral Diffie-Hellman keys, so compromising a long-term key does not decrypt past sessions
- Server authentication — the initiator knows it is talking to the real responder (defeats T5)
- Encryption — all subsequent frames are encrypted with ChaCha20-Poly1305 (defeats T1)
- Integrity — AEAD ciphertext is tamper-evident (defeats T2 at the transport layer)
After the Noise NK handshake completes, both sides have a pair of CipherState objects for bidirectional encrypted communication. All GOT/1 frames below are sent inside this encrypted channel.
Agent A (initiator) Agent B (responder)
| |
|-- TCP connect -------------------------------->|
| |
| ---- Noise NK Handshake ---- |
| |
|-- → e, es (ephemeral key + DH) -------------->|
|<-- ← e, ee (responder ephemeral + DH) --------|
| |
| Encrypted channel established. |
| All subsequent frames are ChaCha20-Poly1305. |
| |
|-- [encrypted] GOT/1 EXCHANGE_REQ ------------>|
|<-- [encrypted] GOT/1 EXCHANGE_RSP ------------|
| |
|-- TCP close ---------------------------------->|
The responder's Noise static key is its Ed25519 key converted to X25519 (using the standard birational map, as ed25519-dalek and x25519-dalek support). This means agents do not need a separate keypair for transport — their existing attestation signing key doubles as their Noise identity.
Inside the encrypted channel, every GOT/1 message is a length-prefixed frame:
Offset Size Field
------ ---- -----
0 4 Magic: "GOT1" (0x474F5431)
4 1 Message type (u8)
5 4 Payload length L (u32 BE)
9 L Payload (type-dependent, see below)
Total frame size: 9 + L bytes.
There is no frame-level MAC or checksum. The Noise transport's AEAD (ChaCha20-Poly1305) already provides integrity and authentication for every encrypted message. Adding a second MAC would be redundant.
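The framing above takes only a few lines to implement. A sketch (these helpers are illustrative, not the got-wire API):

```rust
/// Encode a GOT/1 frame: magic "GOT1", type byte, u32 BE length, payload.
fn encode_frame(msg_type: u8, payload: &[u8]) -> Vec<u8> {
    let mut f = Vec::with_capacity(9 + payload.len());
    f.extend_from_slice(b"GOT1"); // magic 0x474F5431
    f.push(msg_type);
    f.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    f.extend_from_slice(payload);
    f
}

/// Decode a frame; returns (message type, payload) or None on a
/// bad magic, truncated header, or length mismatch.
fn decode_frame(buf: &[u8]) -> Option<(u8, &[u8])> {
    if buf.len() < 9 || &buf[0..4] != b"GOT1" {
        return None;
    }
    let len = u32::from_be_bytes([buf[5], buf[6], buf[7], buf[8]]) as usize;
    if buf.len() != 9 + len {
        return None;
    }
    Some((buf[4], &buf[9..]))
}
```

In the real protocol the length check would also enforce the implementation's payload limit (error code 3) before allocating.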
| Type byte | Name | Direction | Purpose |
|---|---|---|---|
| `0x01` | `EXCHANGE_REQ` | Initiator → Responder | "Here is my signed envelope (attestation + chain). Send me yours." |
| `0x02` | `EXCHANGE_RSP` | Responder → Initiator | "Here is my signed envelope (attestation + chain). I accept/reject yours." |
| `0x03` | `VERIFY_REQ` | Initiator → Responder | "Verify this attestation and tell me the result." (one-way) |
| `0x04` | `VERIFY_RSP` | Responder → Initiator | "Verification result: valid/invalid/error + reason." |
| `0x05` | `CHAIN_REQ` | Initiator → Responder | "Send me your full attestation chain." |
| `0x06` | `CHAIN_RSP` | Responder → Initiator | "Here is my chain: [attest_0, ..., attest_n]." |
| `0xFF` | `ERROR` | Either direction | Protocol-level error. |
The critical security fix: every EXCHANGE_REQ and EXCHANGE_RSP wraps the attestation in a signed envelope that binds the attestation to this specific exchange with this specific peer.
ExchangeEnvelope {
nonce: [u8; 32], // random (req) or echoed (rsp)
peer_agent_id: [u8; 32], // intended recipient's agent ID
attestation_hash: [u8; 32], // SHA-256 of current attestation's
// serialise_for_signing() bytes
chain_root_hash: [u8; 32], // SHA-256 of chain[0]'s
// serialise_for_signing() bytes
// (or zeroes if no chain)
timestamp: u64, // Unix UTC seconds
envelope_signature: [u8; 64], // Ed25519 sign over all above fields
}
The envelope signature covers: nonce ‖ peer_agent_id ‖ attestation_hash ‖ chain_root_hash ‖ timestamp (concatenated, fixed-width, no delimiters needed since all fields are fixed size).
Why this matters:
| Attack | Envelope field that blocks it |
|---|---|
| Replay old response with correct attestation sig | nonce is signed — cannot be swapped. timestamp allows freshness check. |
| Redirect attestation to a different peer | peer_agent_id is signed — attestation is bound to this specific recipient. |
| Swap the attestation payload mid-flight | attestation_hash is signed — any modification is detected. |
| Swap the chain | chain_root_hash is signed — chain anchor is bound. |
| Forge the envelope for another agent | Requires the sender's Ed25519 secret key — infeasible. |
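Because the signed-over fields are all fixed-width, building the signable buffer is pure concatenation. A sketch matching the 136-byte layout (the big-endian timestamp encoding is an assumption; the text above fixes only field order and widths):

```rust
/// nonce ‖ peer_agent_id ‖ attestation_hash ‖ chain_root_hash ‖ timestamp
/// = 32 + 32 + 32 + 32 + 8 = 136 bytes, no delimiters needed.
fn signable_bytes(
    nonce: &[u8; 32],
    peer_agent_id: &[u8; 32],
    attestation_hash: &[u8; 32],
    chain_root_hash: &[u8; 32],
    timestamp: u64,
) -> [u8; 136] {
    let mut out = [0u8; 136];
    out[0..32].copy_from_slice(nonce);
    out[32..64].copy_from_slice(peer_agent_id);
    out[64..96].copy_from_slice(attestation_hash);
    out[96..128].copy_from_slice(chain_root_hash);
    out[128..136].copy_from_slice(&timestamp.to_be_bytes()); // assumed BE
    out
}
```

The Ed25519 signature is then computed over exactly these bytes, so any change to any field invalidates the envelope.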
Offset Size Field
------ ---- -----
0 32 Sender agent ID (SHA-256 of sender's public key)
32     200   Signed envelope (nonce + peer_id + attest_hash +
             chain_root_hash + timestamp + signature; 136 + 64 bytes)
232    4     Chain length N (u32 BE), 0 = single attestation
236    4     Attestation[0] length A0 (u32 BE)
240    A0    Attestation[0] JSON (UTF-8, oldest in chain)
... Attestation[1..N] (same length-prefixed pattern)
... 4 Current attestation length Ac (u32 BE)
... Ac Current attestation JSON
Note: max_accepted_drift is not on the wire. The receiver enforces its own threshold from its local trust registry. Including it in the wire format was a security flaw — the sender could lie about it, or a MITM could relax it.
Offset Size Field
------ ---- -----
0 32 Responder agent ID
32     200   Signed envelope (nonce echoed + peer_id + attest_hash +
             chain_root_hash + timestamp + signature)
232    1     Verdict: 0x01=accepted, 0x02=rejected, 0x03=error
233    4     Chain length N (u32 BE)
237    ...   Attestation chain (same format as EXCHANGE_REQ)
... 4 Current attestation length
... var Current attestation JSON
... 4 Reason length R (u32 BE), 0 if accepted
... R Reason string (UTF-8)
The verdict and reason are inside the encrypted Noise channel and further bound by the envelope signature (if the receiver re-verifies the envelope, any tampering is caught).
Offset Size Field
------ ---- -----
0 4 Error code (u32 BE)
4 4 Message length M (u32 BE)
8 M Error message (UTF-8)
Error codes:
| Code | Meaning |
|---|---|
| 1 | Bad magic (not GOT1) |
| 2 | Unknown message type |
| 3 | Payload too large (exceeds implementation limit) |
| 4 | Noise handshake failed |
| 5 | Nonce mismatch in envelope (replay suspected) |
| 6 | Unknown agent ID (not in trust registry) |
| 7 | Envelope signature invalid |
| 8 | Attestation hash mismatch (envelope vs payload) |
| 9 | Timestamp too old (exceeds freshness window) |
| 10 | Internal error |
A TOML configuration file mapping agent identities to public keys and policy. Drift thresholds are local policy — never sent on the wire.
[registry]
# Maximum attestation chain length we'll accept from any agent.
max_chain_length = 100
# Maximum age of an exchange envelope timestamp (seconds).
max_envelope_age_secs = 300
[[agents]]
id = "alice"
public_key = "a1b2c3...64 hex chars for 32-byte Ed25519 verifying key"
max_drift_accepted = 0.05
roles = ["producer", "verifier"]
[[agents]]
id = "bob"
public_key = "d4e5f6...64 hex chars"
max_drift_accepted = 0.05
roles = ["producer", "verifier"]

The `id` field is human-readable. The canonical agent ID on the wire is SHA-256(public_key) — 32 bytes.
Agent A (initiator) Agent B (responder)
| |
|-- TCP connect to B's address ----------------->|
| |
| ---- Noise NK Handshake ---- |
|-- → e, es ----------------------------------> |
|<-- ← e, ee ---------------------------------- |
| (encrypted channel established) |
| |
|-- [encrypted] EXCHANGE_REQ: |
| agent_id_A = SHA-256(pk_A) |
| envelope: |
| nonce = random 32 bytes |
| peer_agent_id = SHA-256(pk_B) |
| attestation_hash = H(attest_A) |
| chain_root_hash = H(chain_A[0]) |
| timestamp = now() |
| signature = sign(above, sk_A) |
| chain = [attest_A_0, ..., attest_A_n] |
| current = attest_A_current |
|------------------------------------------------>|
| |
| B receives frame: |
| (Noise decrypts + |
| verifies AEAD) |
| lookup agent_id_A |
| in trust registry |
| verify envelope sig |
| with pk_A |
| check envelope. |
| peer_agent_id |
| == own agent_id |
| check timestamp |
| within freshness |
| window |
| check attestation_ |
| hash matches |
| SHA-256(serialise( |
| current)) |
| verify attest_A sig |
| with pk_A |
| if v2: walk chain |
| check drift <= |
| LOCAL max_drift |
| |
| B decides: |
| accepted or rejected |
| |
| B builds envelope_B: |
| nonce = echo A's |
| peer_agent_id = |
| SHA-256(pk_A) |
| attest_hash = H(B) |
| sign(above, sk_B) |
| |
| B sends EXCHANGE_RSP: |
|<-- [encrypted] ---------------------------------|
| |
| A receives frame: |
| (Noise decrypts) |
| lookup agent_id_B |
| verify envelope sig with pk_B |
| check envelope.peer_agent_id == own id |
| check nonce matches the one A sent |
| check timestamp freshness |
| check attestation_hash matches payload |
| verify attest_B sig with pk_B |
| if v2: walk chain |
| check drift <= LOCAL max_drift |
| |
| A decides: |
| both accepted → cooperate |
| any rejected → refuse |
| |
|-- TCP close ---------------------------------->|
TCP connect (1 round-trip) + Noise NK handshake (1 round-trip) + exchange request/response (1 round-trip) = 3 round-trips total for a complete mutual attestation exchange.
New crate: crates/got-wire/
// --- Noise transport ---
/// Perform Noise NK handshake as initiator.
/// `responder_pk` is the Ed25519 public key (converted to X25519 internally).
pub fn noise_connect(
stream: &mut TcpStream,
responder_pk: &[u8; 32],
) -> Result<NoiseSession, WireError>;
/// Perform Noise NK handshake as responder.
/// `own_sk` is the Ed25519 secret key (converted to X25519 internally).
pub fn noise_accept(
stream: &mut TcpStream,
own_sk: &SigningKey,
) -> Result<NoiseSession, WireError>;
/// Encrypted bidirectional channel after handshake.
pub struct NoiseSession {
// internal CipherState pair
}
impl NoiseSession {
pub fn send_frame(&mut self, frame: &Frame) -> Result<(), WireError>;
pub fn recv_frame(&mut self) -> Result<Frame, WireError>;
}
// --- Frame types ---
pub struct Frame {
pub message_type: MessageType,
pub payload: Vec<u8>,
}
pub enum MessageType {
ExchangeReq = 0x01,
ExchangeRsp = 0x02,
VerifyReq = 0x03,
VerifyRsp = 0x04,
ChainReq = 0x05,
ChainRsp = 0x06,
Error = 0xFF,
}
// --- Signed envelope ---
pub struct ExchangeEnvelope {
pub nonce: [u8; 32],
pub peer_agent_id: [u8; 32],
pub attestation_hash: [u8; 32],
pub chain_root_hash: [u8; 32],
pub timestamp: u64,
pub signature: [u8; 64],
}
impl ExchangeEnvelope {
/// Build and sign an envelope.
pub fn create(
nonce: [u8; 32],
peer_agent_id: [u8; 32],
attestation: &GeometricAttestation,
chain_anchor: Option<&GeometricAttestation>,
signing_key: &SigningKey,
) -> Self;
/// Verify envelope signature and check all bindings.
pub fn verify(
&self,
expected_peer_id: &[u8; 32], // must match peer_agent_id
expected_nonce: Option<&[u8; 32]>, // for responses
attestation: &GeometricAttestation, // hash must match
signer_pk: &VerifyingKey,
max_age_secs: u64,
) -> Result<(), WireError>;
/// Serialise the signed-over fields (for signing/verification).
pub fn signable_bytes(&self) -> [u8; 136]; // 32+32+32+32+8
}
// --- Payload types ---
pub struct ExchangeRequest {
pub agent_id: [u8; 32],
pub envelope: ExchangeEnvelope,
pub chain: Vec<GeometricAttestation>,
pub current: GeometricAttestation,
}
pub struct ExchangeResponse {
pub agent_id: [u8; 32],
pub envelope: ExchangeEnvelope,
pub verdict: Verdict,
pub chain: Vec<GeometricAttestation>,
pub current: GeometricAttestation,
pub reason: String,
}
pub enum Verdict { Accepted = 0x01, Rejected = 0x02, Error = 0x03 }
// --- Trust registry ---
pub struct TrustRegistry {
pub agents: HashMap<[u8; 32], AgentEntry>,
pub max_chain_length: usize,
pub max_envelope_age_secs: u64,
}
pub struct AgentEntry {
pub name: String,
pub public_key: [u8; 32],
pub max_drift_accepted: f32, // LOCAL policy, never sent on wire
pub roles: Vec<String>,
}
impl TrustRegistry {
pub fn load(path: &Path) -> Result<Self, WireError>;
pub fn lookup(&self, agent_id: &[u8; 32]) -> Option<&AgentEntry>;
pub fn agent_id(public_key: &[u8; 32]) -> [u8; 32]; // SHA-256(pk)
}
// --- Chain verification ---
/// Verify a chain of attestations: signatures, linkage, drift bounds.
pub fn verify_chain(
chain: &[GeometricAttestation],
current: &GeometricAttestation,
signer_pk: &[u8; 32],
max_drift: f32, // from LOCAL registry, not from wire
) -> Result<ChainVerdict, WireError>;
// --- High-level transport ---
/// Listen for incoming GOT/1 connections on the given address.
pub fn listen(
addr: SocketAddr,
own_key: &SigningKey,
registry: &TrustRegistry,
own_attestation: &GeometricAttestation,
own_chain: &[GeometricAttestation],
) -> Result<(), WireError>;
/// Connect to a peer and perform a full attestation exchange.
pub fn exchange(
addr: SocketAddr,
peer_pk: &[u8; 32],
own_key: &SigningKey,
registry: &TrustRegistry,
own_attestation: &GeometricAttestation,
own_chain: &[GeometricAttestation],
) -> Result<ExchangeResult, WireError>;
pub struct ExchangeResult {
pub peer_verdict: Verdict, // what the peer said about us
pub our_verdict: Verdict, // what we decided about the peer
pub peer_attestation: GeometricAttestation,
pub peer_chain: Vec<GeometricAttestation>,
pub reason: String,
}

Dependencies: `got-core`, `got-attest`, `sha2`, `ed25519-dalek`, `x25519-dalek`, `snow` (Noise protocol implementation), `serde`, `serde_json`, `thiserror`, `toml`.
fn verify_chain(chain, current, signer_pk, max_drift) -> Result<ChainVerdict>:
all = chain ++ [current]
// 1. Anchor check
if all[0].parent_attestation_hash.is_some():
return Err(BrokenChain("first attestation must have no parent"))
// 2. Walk each link
for i in 0..all.len():
// Signature check
if !verify(all[i], signer_pk)?:
return Err(InvalidSignature(i))
// Linkage check
if i > 0:
expected = attestation_hash(&all[i-1])
if all[i].parent_attestation_hash != Some(expected):
return Err(BrokenChain(i))
// Drift check (using LOCAL max_drift, not from wire)
if let Some(drift) = all[i].geometry_drift:
if drift > max_drift:
return Err(DriftExceeded { index: i, drift, max_drift })
return Ok(ChainVerdict::Valid { length: all.len() })
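The walk above can be made concrete with stand-in types, u64 hashes in place of SHA-256 and no signature step, to show the anchor, linkage, and drift logic in isolation:

```rust
/// Simplified attestation: real hashes are SHA-256 of signable bytes.
struct Attest {
    hash: u64,
    parent_hash: Option<u64>,
    geometry_drift: Option<f32>,
}

#[derive(Debug, PartialEq)]
enum ChainError {
    BrokenAnchor,         // first link has a parent
    BrokenLink(usize),    // parent hash doesn't match predecessor
    DriftExceeded(usize), // drift over the LOCAL max_drift bound
}

fn verify_chain_sketch(all: &[Attest], max_drift: f32) -> Result<usize, ChainError> {
    // 1. Anchor: the first attestation must have no parent.
    if all.first().map_or(false, |a| a.parent_hash.is_some()) {
        return Err(ChainError::BrokenAnchor);
    }
    // 2. Linkage: each attestation points at its predecessor's hash.
    for i in 1..all.len() {
        if all[i].parent_hash != Some(all[i - 1].hash) {
            return Err(ChainError::BrokenLink(i));
        }
    }
    // 3. Drift: every recorded drift stays within LOCAL policy.
    for (i, a) in all.iter().enumerate() {
        if let Some(d) = a.geometry_drift {
            if d > max_drift {
                return Err(ChainError::DriftExceeded(i));
            }
        }
    }
    Ok(all.len())
}
```

The real `verify_chain` interleaves a signature check into the same walk; the structure is otherwise identical.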
How each threat from §10.1 is addressed:
| Threat | Mitigation | Mechanism |
|---|---|---|
| T1: Eavesdropping | Defeated. All frames encrypted. | Noise NK → ChaCha20-Poly1305 |
| T2: Tampering | Defeated. AEAD detects modification. Envelope signature binds attestation to exchange context. | Noise AEAD + ExchangeEnvelope.signature |
| T3: Replay | Defeated. Nonce is inside the signed envelope. Timestamp enforces freshness window. | ExchangeEnvelope.nonce + timestamp + signature |
| T4: Identity spoofing | Defeated. Envelope signature over peer_agent_id binds identity. Noise NK authenticates responder's static key. | ExchangeEnvelope.peer_agent_id + Noise NK |
| T5: MITM | Defeated. Noise NK provides server authentication. Envelope binding prevents relay attacks. | Noise handshake + envelope channel binding |
| T6: DoS | Mitigated. Payload length limit (configurable). Connection rate limiting (implementation-level). Unknown agent IDs rejected before expensive operations. | Frame length check + trust registry lookup first |
Residual risks:
- Key compromise. If an agent's Ed25519 secret key is stolen, the attacker can impersonate that agent until the key is revoked in all trust registries. There is no in-protocol revocation mechanism.
- Initiator anonymity. Noise NK does not authenticate the initiator during the handshake — only after the envelope signature is verified. A malicious initiator can complete the handshake and learn that the responder is alive before being identified. This is a minor information leak.
- Timing side channels. The protocol does not attempt constant-time processing of frames. An observer of frame timing could infer message sizes (despite encryption, since length-prefixed framing leaks payload size).
- Trust registry integrity. If an attacker can modify an agent's trust registry TOML (e.g. by compromising the filesystem), they can inject trusted keys or relax drift thresholds. The registry must be protected at the OS level.
| Scenario | Protocol behaviour |
|---|---|
| Both agents frozen (v1) | Single attestation each, no chain. chain_length=0. |
| One agent has updated (v2) | That agent sends full chain. Peer walks it, checks drift from LOCAL registry. |
| Drift exceeds LOCAL max | Peer sends EXCHANGE_RSP with verdict=Rejected, reason="drift 0.072 > local max 0.05". |
| Unknown agent ID | ERROR frame, code 6. Connection closed after Noise handshake. |
| Nonce mismatch in envelope | Initiator verifies envelope.nonce == sent nonce. Mismatch → reject (code 5). |
| Envelope signature invalid | Reject immediately (code 7). Possible MITM or forgery attempt. |
| Attestation hash vs payload mismatch | Reject (code 8). Payload was modified after envelope was signed. |
| Timestamp outside freshness window | Reject (code 9). Possible replay of old envelope. |
| Noise handshake fails | ERROR code 4. TCP close. Possible wrong responder key. |
| Corrupted ciphertext | Noise AEAD tag mismatch. Connection aborted. |
| Chain too long | Exceeds max_chain_length in registry config. ERROR frame, code 3. |
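The scenario table above implies a verification order. The sketch below is illustrative only: it collapses initiator- and responder-side checks into one sequence (registry lookup before expensive operations, per the DoS mitigation) and maps each failure to its ERROR code. The struct, field names, and exact ordering are assumptions, not the protocol crate's API.

```rust
/// Hypothetical error codes, matching the scenario table's numbering.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ErrorCode {
    ChainTooLong = 3,
    HandshakeFailed = 4,
    NonceMismatch = 5,
    UnknownAgent = 6,
    BadEnvelopeSig = 7,
    HashMismatch = 8,
    StaleTimestamp = 9,
}

/// Results of the individual checks, in one plausible evaluation order.
#[derive(Clone, Copy)]
struct Checks {
    handshake_ok: bool,
    agent_known: bool,
    envelope_sig_ok: bool,
    nonce_ok: bool,
    hash_ok: bool,
    timestamp_fresh: bool,
    chain_len_ok: bool,
}

/// First failing check wins; a fully passing exchange returns None.
fn first_rejection(c: &Checks) -> Option<ErrorCode> {
    if !c.handshake_ok { return Some(ErrorCode::HandshakeFailed); }
    if !c.agent_known { return Some(ErrorCode::UnknownAgent); }
    if !c.envelope_sig_ok { return Some(ErrorCode::BadEnvelopeSig); }
    if !c.nonce_ok { return Some(ErrorCode::NonceMismatch); }
    if !c.hash_ok { return Some(ErrorCode::HashMismatch); }
    if !c.timestamp_fresh { return Some(ErrorCode::StaleTimestamp); }
    if !c.chain_len_ok { return Some(ErrorCode::ChainTooLong); }
    None
}
```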
- Discovery — agents must know each other's TCP addresses. Service discovery (mDNS, registry REST API, etc.) is out of scope.
- Session persistence — every exchange is a single TCP connection. No connection pooling, no keepalive, no session resumption.
- Ordering — chain order is sender's responsibility. No consensus protocol for multi-agent chain agreement.
- Partial chain delivery — if the chain is too large, there is no pagination. The sender must include the full chain or the receiver rejects it.
- Key rotation — if an agent rotates its signing key, peers must update their trust registries out-of-band. No in-protocol key rotation or revocation mechanism.
- Multi-party exchange — GOT/1 is pairwise. An aggregator topology (see architecture-agent-protocol.md) requires multiple pairwise exchanges.
- Padding — frame lengths are visible to traffic analysts despite encryption. Length padding is not implemented.
- Noise NK handshake succeeds between two in-process agents
- Noise NK handshake fails with wrong responder key
- Frame encode/decode round-trip (all message types, inside Noise session)
- `ExchangeEnvelope::create` + `verify` round-trip
- Envelope with wrong nonce → rejected
- Envelope with wrong peer_agent_id → rejected
- Envelope with tampered attestation_hash → rejected
- Envelope with expired timestamp → rejected
- Envelope with forged signature → rejected
- `EXCHANGE_REQ` → `EXCHANGE_RSP` round-trip (v1, accepted)
- `EXCHANGE_REQ` → `EXCHANGE_RSP` round-trip (v2 with chain, accepted)
- Exchange where drift exceeds local max → verdict=rejected
- Unknown agent ID → ERROR code 6
- Chain walk: valid chain passes
- Chain walk: broken link → rejected
- Chain walk: drift exceeds bound → rejected
- Chain walk: missing anchor → rejected
- Trust registry TOML load/lookup
- Trust registry: `agent_id()` matches `SHA-256(public_key)`
- Two-agent TCP integration: handshake + exchange + mutual verify
- Replay attempt: reuse old envelope → nonce/timestamp rejection
- Noise NK handshake works between two agents (no plaintext on wire)
- Exchange envelope is signed and all bindings are verified
- Nonce replay protection works (signed nonce, timestamp freshness)
- Drift thresholds are enforced from local trust registry only (never from wire)
- Frame codec round-trips all message types without data loss
- Chain verification correctly rejects broken/stale chains
- Trust registry loads from TOML and resolves agent IDs
- Wrong Noise key → handshake failure
- Envelope signature forgery → rejection
- Integration test: two TCP agents complete full exchange with encryption
Phases 8–9 establish causal interventions and inline measurement, but both run in the same trust domain as the model. A sufficiently capable model that controls its own process could, in principle, detect and manipulate the measurement code. This phase moves activation capture to hardware that the model cannot influence.
Software-based hooks (Phase 9) execute in the same address space as the model. Even if the measurement sidecar runs in a separate process, the model's forward pass produces the activations that the sidecar reads. If the model can alter what it writes to the activation buffer when it detects measurement, the hook reads sanitised data.
Hardware-isolated measurement intercepts activations at the hardware level — below the model's software stack — so the model has no mechanism to alter what is captured.
┌───────────────────────────────────────────────┐
│ GPU / Accelerator │
│ │
│ Model forward pass writes activations │
│ to layer output buffers (VRAM) │
│ │
│ ┌───────────────────────────────────────┐ │
│ │ DMA / Hardware Tap │ │
│ │ Copies activation buffers to │ │
│ │ isolated measurement hardware │ │
│ │ (separate memory region, separate │ │
│ │ trust domain) │ │
│ └───────────────────────────────────────┘ │
└───────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ Measurement TEE / Secure Enclave │
│ │
│ Receives raw activations via DMA │
│ Runs causal probes (Phase 9 intervention) │
│ Produces signed attestation │
│ Signing key never leaves enclave │
│ │
└───────────────────────────────────────────────┘
| Component | Trust domain | Can model influence? |
|---|---|---|
| Model weights & code | Model process | Yes (by definition) |
| Activation buffers (VRAM) | Model writes, hardware reads | Model writes, but hardware captures what was actually written |
| DMA tap | Hardware/firmware | No — below model's software stack |
| Measurement enclave | Separate TEE | No — isolated execution environment |
| Signing key | Inside TEE | No — never exported |
The critical property: the model writes activations to VRAM as part of its normal computation. The hardware tap captures exactly what was written, before the model has any opportunity to overwrite or sanitise.
Hardware capture alone guarantees authentic activations but doesn't prove the probes measure real mechanisms. Causal intervention (Phase 8) alone proves mechanism but runs in software the model might manipulate. Together:
- Hardware captures real activations → feeds to measurement enclave
- Measurement enclave runs causal intervention (perturb + measure output shift)
- Perturbation is applied by the enclave, not by the model process
- The enclave observes whether the output changes — using activations captured by hardware
This is the strongest combination: the model cannot fake the activations (hardware capture), and the probes are validated against real mechanisms (causal intervention).
| Approach | Hardware Required | Maturity |
|---|---|---|
| GPU DMA snoop | Custom FPGA or modified GPU driver | Research prototype |
| NVIDIA Confidential Computing (H100 TEE) | H100 GPU + driver support | Early production |
| CPU TEE (Intel SGX / AMD SEV) | Commodity server CPU | Production (CPU-side only) |
| Custom accelerator with built-in taps | ASIC design | Far future |
For the PoC, the most practical path is CPU TEE with activation copy-out: the model runs on GPU, activations are copied to CPU memory inside a TEE enclave via DMA, and probing runs inside the enclave. This is slower but uses commodity hardware.
/// Trait for a hardware-isolated measurement environment.
pub trait MeasurementEnclave {
/// Receive activations from hardware capture.
/// `layer` is the layer index, `h` is the raw activation vector.
fn receive_activations(&mut self, layer: usize, h: &[f32]);
/// Run causal intervention inside the enclave.
/// Returns CausalScore for each sampled probe.
fn run_causal_check(
&self,
probes: &[ProbeVector],
delta: f32,
) -> Vec<CausalScore>;
/// Produce a signed attestation from accumulated measurements.
/// The signing key never leaves the enclave.
fn attest(&self) -> GeometricAttestation;
}

- Who provisions the enclave? The entity that controls enclave deployment controls which probes are loaded. If the model operator controls the enclave, they could load favourable probes.
- Side channels. TEEs are susceptible to power analysis, timing attacks, and speculative execution attacks. The enclave protects against software-level manipulation, not physical attacks.
- Performance. Copying activations out of GPU VRAM for every forward pass adds latency. Batching and sampling strategies can mitigate this.
- Firmware trust. The DMA tap relies on firmware or hardware that itself must be trusted. This pushes the trust boundary one level down, but doesn't eliminate it.
- Mock enclave receives activations and produces valid attestation
- Mock enclave runs causal intervention and returns scores
- Attestation signed inside enclave is verifiable outside
- Integration: mock hardware capture → enclave → attestation → verify chain
- Enclave rejects tampered activation data (hash mismatch)
- `MeasurementEnclave` trait compiles and mock implementation works
- Mock enclave produces valid, signed attestations
- Causal intervention runs inside mock enclave
- Attestation chain from enclave integrates with Phase 10 wire protocol
- End-to-end: capture → enclave probe → attest → transmit → verify
Phases 1–11 produce, sign, chain, transport, and hardware-verify attestations, but every attestation exists only in memory for the duration of a single process. A governance framework needs a persistent, queryable record of every attestation ever produced — an auditable history that an independent body can inspect after the fact. This phase adds the storage layer.
Without persistence, attestation chains are ephemeral. A model operator could produce a failing attestation, discard it, retune, and produce a passing one — with no evidence the first ever existed. A persistent store with append-only semantics makes this detectable: gaps in the chain, missing sequence numbers, or timestamp discontinuities are all audit signals.
got-enclave / got-wire
│
│ GeometricAttestation (signed)
▼
┌─────────────────────────────────────────┐
│ got-store │
│ │
│ AttestationStore trait │
│ ├── append(attestation) → StoreId │
│ ├── get(id) → Attestation │
│ ├── chain(model_id) → Vec<Att> │
│ ├── query(filter) → Vec<Att> │
│ └── audit(model_id) → AuditReport │
│ │
│ Backends: │
│ ├── MemoryStore (testing) │
│ └── FileStore (PoC persistence) │
└─────────────────────────────────────────┘
| Property | Guarantee |
|---|---|
| Append-only | Once stored, an attestation cannot be modified or deleted. |
| Chain-aware | The store validates parent_hash links on insertion. Orphaned attestations (parent not in store) are flagged. |
| Signature-verified | Every attestation is signature-verified before storage. Invalid signatures are rejected. |
| Queryable | Attestations can be retrieved by model ID, signer public key, time range, schema version, or causal flag. |
| Deterministic IDs | Store IDs are derived from the attestation's content hash (SHA-256 of serialised form), making them reproducible. |
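The trait surface from the diagram can be sketched as a minimal in-memory backend. This is an illustrative stand-in, not the crate's implementation: the real `StoreId` is a SHA-256 content hash and the real store verifies signatures before insertion; here the hash is stubbed with the standard library's `DefaultHasher` to keep the sketch dependency-free.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Minimal attestation stub for the sketch (the real struct is richer).
#[derive(Clone, Hash)]
pub struct Attestation {
    pub model_id: String,
    pub payload: String,
}

/// Placeholder for the real [u8; 32] SHA-256 content hash.
pub type StoreId = u64;

#[derive(Default)]
pub struct MemoryStore {
    by_id: HashMap<StoreId, Attestation>,
    index: HashMap<String, Vec<StoreId>>, // model_id → chain order
}

impl MemoryStore {
    /// Deterministic content-derived ID (stand-in for SHA-256).
    fn content_id(att: &Attestation) -> StoreId {
        let mut h = std::collections::hash_map::DefaultHasher::new();
        att.hash(&mut h);
        h.finish()
    }

    /// Append-only: a duplicate (same content hash) is idempotent.
    pub fn append(&mut self, att: &Attestation) -> StoreId {
        let id = Self::content_id(att);
        if self.by_id.insert(id, att.clone()).is_none() {
            self.index.entry(att.model_id.clone()).or_default().push(id);
        }
        id
    }

    pub fn get(&self, id: StoreId) -> Option<&Attestation> {
        self.by_id.get(&id)
    }

    pub fn chain(&self, model_id: &str) -> Vec<&Attestation> {
        self.index
            .get(model_id)
            .map(|ids| ids.iter().filter_map(|i| self.by_id.get(i)).collect())
            .unwrap_or_default()
    }
}
```

The full trait adds `query(filter)` and `audit(model_id)` as described below; signature verification on insert is omitted here for brevity.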
pub struct StoreFilter {
pub model_id: Option<String>,
pub signer: Option<VerifyingKey>,
pub after: Option<u64>, // timestamp lower bound
pub before: Option<u64>, // timestamp upper bound
pub schema_version: Option<u16>,
pub causal_flag: Option<bool>,
}

Filters compose conjunctively: all specified fields must match.
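Conjunctive matching can be sketched as follows. This is a simplified stand-in: `signer` is reduced to a raw 32-byte key (the real `StoreFilter` uses ed25519_dalek's `VerifyingKey`), the attestation is a stub, and the inclusive bound semantics for `after`/`before` are an assumption.

```rust
/// Stub attestation carrying only the filterable fields.
pub struct Att {
    pub model_id: String,
    pub signer: [u8; 32],
    pub timestamp: u64,
    pub schema_version: u16,
    pub causal_flag: bool,
}

/// Simplified mirror of the StoreFilter above.
#[derive(Default)]
pub struct StoreFilter {
    pub model_id: Option<String>,
    pub signer: Option<[u8; 32]>,
    pub after: Option<u64>,  // timestamp lower bound
    pub before: Option<u64>, // timestamp upper bound
    pub schema_version: Option<u16>,
    pub causal_flag: Option<bool>,
}

/// Every specified field must match; unspecified fields match anything.
pub fn matches(f: &StoreFilter, a: &Att) -> bool {
    f.model_id.as_deref().map_or(true, |m| m == a.model_id)
        && f.signer.map_or(true, |s| s == a.signer)
        && f.after.map_or(true, |t| a.timestamp >= t)
        && f.before.map_or(true, |t| a.timestamp <= t)
        && f.schema_version.map_or(true, |v| v == a.schema_version)
        && f.causal_flag.map_or(true, |c| c == a.causal_flag)
}
```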
The audit() method produces a structured summary of a model's attestation history:
pub struct AuditReport {
pub model_id: String,
pub total_attestations: usize,
pub chain_length: usize,
pub chain_valid: bool, // all parent_hash links verified
pub first_timestamp: Option<u64>,
pub last_timestamp: Option<u64>,
pub schema_versions_seen: Vec<u16>,
pub drift_summary: DriftSummary,
pub causal_summary: CausalSummary,
pub signers: Vec<[u8; 32]>, // unique signer key hashes
}
pub struct DriftSummary {
pub readings_with_drift: usize,
pub max_drift: Option<f64>,
pub mean_drift: Option<f64>,
}
pub struct CausalSummary {
pub attestations_with_causal: usize,
pub causal_pass_count: usize,
pub causal_fail_count: usize,
pub mean_consistency: Option<f64>,
}

This gives an auditor a single-call summary: how many attestations exist, whether the chain is intact, whether drift has been stable, and whether causal checks are passing.
store_root/
├── index.json # model_id → list of content hashes
├── attestations/
│ ├── <sha256_hex>.json # one file per attestation
│ └── ...
└── audit_cache/ # optional cached audit reports
Each attestation is stored as a JSON file named by its content hash. The index maps model IDs to their attestation chains (ordered by timestamp). This is intentionally simple — a production system would use a database.
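The content-addressed naming can be sketched as a small path helper. This is an assumption about the layout above, not the crate's API: one JSON file per attestation, named by the lowercase hex of its SHA-256 content hash.

```rust
use std::path::PathBuf;

/// Build the on-disk path for an attestation from its content hash.
/// Mirrors the layout: store_root/attestations/<sha256_hex>.json
fn attestation_path(store_root: &str, content_hash: &[u8; 32]) -> PathBuf {
    let hex: String = content_hash.iter().map(|b| format!("{:02x}", b)).collect();
    PathBuf::from(store_root)
        .join("attestations")
        .join(format!("{hex}.json"))
}
```

Because the filename is derived from the content hash, writing the same attestation twice targets the same file, which makes duplicate appends naturally idempotent.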
When a wire protocol exchange completes (Phase 10), both the local and peer attestations can be stored:
let (result, verdict) = perform_exchange(...)?;
store.append(&result.peer_attestation, &peer_verifying_key)?;

This creates a record of every attestation received from peers, building a multi-agent audit trail.
- Append and retrieve single attestation
- Chain validation accepts valid chain, rejects broken parent_hash
- Signature verification rejects tampered attestation on insert
- Query by model_id returns only matching attestations
- Query by time range filters correctly
- Query by causal_flag filters correctly
- Audit report reflects correct chain length, drift, and causal summaries
- FileStore persists and reloads across instantiations
- Duplicate append (same content hash) is idempotent
- MemoryStore and FileStore produce identical results for same operations
- Integration: enclave → wire → store → audit end-to-end pipeline
- `AttestationStore` trait compiles with `MemoryStore` and `FileStore` backends
- Append-only semantics enforced (no mutation or deletion)
- Chain validation catches broken parent links
- Query filters work conjunctively across all fields
- `AuditReport` correctly summarises chain integrity, drift, and causal status
- `FileStore` survives process restart (write, drop, re-open, read)
- Integration test: produce attestation chain → store → query → audit
Phases 1–12 build a complete measurement pipeline: train probes, sign attestations, chain them, transport them, capture activations in hardware, and persist the audit trail. But a security audit of the PoC reveals three structural weaknesses that an adversary could exploit even if every phase works as designed. This phase hardens the protocol against those attacks.
Three flaws, in descending severity:
Flaw 1 — Model callback supplied by untrusted agent (CRITICAL). causal_check() accepts a model_fn: &dyn Fn(&[f32]) -> Vec<f32> closure from the caller. The Phase 8 causal intervention — the keystone of the entire system — perturbs activations and observes the output shift. But if the adversary controls model_fn, they can return whatever outputs make the probe appear causal. The enclave trait MeasurementEnclave::run_causal_check() also takes model_fn as a parameter, and enclave_pipeline() passes it through. An adversary who supplies a fake forward pass can make any probe appear causally valid, defeating the keystone entirely.
Flaw 2 — Frobenius drift hides directional gaming (SERIOUS). drift_from() returns a single scalar: ‖Φ_new − Φ_ref‖_F / ‖Φ_ref‖_F. An adversary can surgically modify the geometry in probe-relevant directions while keeping the global Frobenius norm small. The unused dimensions absorb the perturbation budget. The verifier sees drift within bounds, but the probe-relevant subspace has shifted enough to invalidate the measurement.
Flaw 3 — No chain completeness guarantee (MODERATE). An agent can produce an unfavourable attestation, discard it, and re-chain from the prior attestation. The verifier sees a valid chain with correct parent hashes and signatures, but has no way to detect the gap. The divergence-flagged attestation vanishes from the record.
Principle: The model forward pass is enclave-internal, not caller-supplied.
Define in got-probe/src/intervention.rs:
/// Encapsulates a model's forward pass from a probed layer to output.
///
/// In production, the implementation lives inside the TEE and is loaded
/// from a verified model shard. The enclave owns the handle; the caller
/// never supplies it per-call.
pub trait ModelHandle {
fn forward(&self, h: &[f32]) -> Vec<f32>;
}
/// PoC convenience wrapper: wraps a closure as a ModelHandle.
/// In production, this is replaced by a TEE-internal model shard loader.
pub struct ClosureModelHandle<F: Fn(&[f32]) -> Vec<f32>> {
f: F,
}
impl<F: Fn(&[f32]) -> Vec<f32>> ClosureModelHandle<F> {
pub fn new(f: F) -> Self { Self { f } }
}
impl<F: Fn(&[f32]) -> Vec<f32>> ModelHandle for ClosureModelHandle<F> {
fn forward(&self, h: &[f32]) -> Vec<f32> { (self.f)(h) }
}

| Location | Before | After |
|---|---|---|
| `causal_check()` | `model_fn: &dyn Fn(&[f32]) -> Vec<f32>` | `model: &dyn ModelHandle` |
| `causal_check_multi_layer()` | `model_fn_by_layer: &dyn Fn(usize, &[f32]) -> Vec<f32>` | `model: &dyn ModelHandle` with layer routing internal |
| `MeasurementEnclave::run_causal_check()` | `model_fn` parameter | Drop parameter; enclave uses internal handle |
| `MockEnclave::new()` | No model parameter | `model: Box<dyn ModelHandle>` provisioned at construction |
| `enclave_pipeline()` | `model_fn` parameter | Drop parameter; enclave already owns model |
| `MeasurementSidecar::ingest()` | `model_fn: Option<&dyn Fn(…)>` | `model: Option<&dyn ModelHandle>` |
The API makes model access enclave-internal. The model is provisioned into the enclave at construction, not handed in per-call by the untrusted agent. In the PoC, ClosureModelHandle is functionally equivalent to the current closure, but the ownership boundary is architecturally correct: production TEE replaces ClosureModelHandle with a TeeModelShard loaded into enclave memory from a verified image.
PoC limitation (documented): Whoever constructs MockEnclave still supplies the handle — same trust boundary as before in a dev container. But the API is correct for production, and the ownership semantics are explicit.
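A self-contained usage sketch makes the equivalence concrete: the wrapped handle produces exactly the same outputs as calling the closure directly, so only the ownership boundary changes. The trait and wrapper are repeated from the listing above so this example compiles on its own; the toy forward pass (doubling every activation) is purely illustrative.

```rust
// Repeated from the listing above for a self-contained example.
pub trait ModelHandle {
    fn forward(&self, h: &[f32]) -> Vec<f32>;
}

pub struct ClosureModelHandle<F: Fn(&[f32]) -> Vec<f32>> {
    f: F,
}

impl<F: Fn(&[f32]) -> Vec<f32>> ClosureModelHandle<F> {
    pub fn new(f: F) -> Self { Self { f } }
}

impl<F: Fn(&[f32]) -> Vec<f32>> ModelHandle for ClosureModelHandle<F> {
    fn forward(&self, h: &[f32]) -> Vec<f32> { (self.f)(h) }
}

/// Toy "forward pass" (doubles every activation), called both directly
/// and through the handle; the results must agree.
fn demo() -> (Vec<f32>, Vec<f32>) {
    let f = |h: &[f32]| h.iter().map(|x| 2.0 * x).collect::<Vec<f32>>();
    let direct = f(&[1.0, 2.0]);
    let handle = ClosureModelHandle::new(f);
    (handle.forward(&[1.0, 2.0]), direct)
}
```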
Principle: Drift is measured along each probe direction, not just globally.
Add to CausalGeometry:
/// Drift along a specific probe direction.
///
/// Computes |wᵀ(Φ_new − Φ_ref)w| / |wᵀΦ_ref w|.
/// This measures how much the geometry has changed specifically
/// in the direction the probe measures, not just globally.
pub fn directional_drift(
&self,
reference: &CausalGeometry,
direction: &[f32],
) -> Result<f32, GeometryError> {
// wᵀΦw via the Gram matrix
let quad_new = self.quadratic_form(direction)?;
let quad_ref = reference.quadratic_form(direction)?;
if quad_ref.abs() < f32::EPSILON {
return Ok(if (quad_new - quad_ref).abs() < f32::EPSILON { 0.0 } else { f32::INFINITY });
}
Ok((quad_new - quad_ref).abs() / quad_ref.abs())
}
/// Compute wᵀΦw for a direction vector w.
fn quadratic_form(&self, w: &[f32]) -> Result<f32, GeometryError> {
self.check_vec(w, "direction")?;
let n = self.hidden_dim;
let mut result = 0.0f32;
for i in 0..n {
for j in 0..n {
result += w[i] * self.gram[i * n + j] * w[j];
}
}
Ok(result)
}

| Location | Change |
|---|---|
| `read_probe_checked()` | After global Frobenius check, call `directional_drift()` with the probe's weight vector. Reject if directional drift exceeds `ProbeSet::max_directional_drift`. |
| `ProbeSet` struct | Add `max_directional_drift: Option<f32>` field (backward-compatible default `None`). |
| `GeometricAttestation` | Add `directional_drifts: Vec<DirectionalDrift>` field (serde-skippable when empty). |
| `DirectionalDrift` | New struct: `{ probe_name: String, drift: f32 }`. |
| `serialise_for_signing()` | Include directional drifts — they are covered by the Ed25519 signature. |
| `verify_chain()` | Check per-direction drift bounds if present. |
The adversary can no longer hide probe-direction changes inside a favourable global norm. Each probe direction is individually bounded. To game the system, the adversary would need to keep drift small in every direction the probes actually measure — which is exactly what "the geometry hasn't changed" means.
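A standalone numeric illustration (plain loops, not the crate's `CausalGeometry` API) shows why the global norm misses surgical edits: bumping a single diagonal entry of a 16×16 Gram matrix by 0.2 costs only 5% global Frobenius drift, because the other 15 dimensions absorb the denominator, yet it registers as 20% drift along the probe direction e₀.

```rust
const N: usize = 16;

/// ‖Φ_new − Φ_ref‖_F / ‖Φ_ref‖_F over flat row-major matrices.
fn frobenius_drift(new: &[f32], reference: &[f32]) -> f32 {
    let num: f32 = new.iter().zip(reference)
        .map(|(a, b)| (a - b).powi(2)).sum::<f32>().sqrt();
    let den: f32 = reference.iter().map(|b| b * b).sum::<f32>().sqrt();
    num / den
}

/// wᵀΦw for a direction vector w.
fn quadratic_form(phi: &[f32], w: &[f32]) -> f32 {
    let mut q = 0.0;
    for i in 0..N {
        for j in 0..N {
            q += w[i] * phi[i * N + j] * w[j];
        }
    }
    q
}

/// |wᵀ(Φ_new − Φ_ref)w| / |wᵀΦ_ref w|, as in directional_drift() above.
fn directional_drift(new: &[f32], reference: &[f32], w: &[f32]) -> f32 {
    (quadratic_form(new, w) - quadratic_form(reference, w)).abs()
        / quadratic_form(reference, w).abs()
}

/// Φ_ref = identity; Φ_new bumps entry (0,0) by 0.2. Probe direction e0.
fn demo() -> (f32, f32) {
    let mut phi_ref = vec![0.0f32; N * N];
    for i in 0..N {
        phi_ref[i * N + i] = 1.0;
    }
    let mut phi_new = phi_ref.clone();
    phi_new[0] += 0.2; // surgical edit in the probe-relevant direction
    let mut w = vec![0.0f32; N];
    w[0] = 1.0;
    (frobenius_drift(&phi_new, &phi_ref), directional_drift(&phi_new, &phi_ref, &w))
}
```

Global drift is 0.2/‖I₁₆‖_F = 0.2/4 = 0.05, while directional drift is 0.2/1 = 0.2; a global bound of, say, 0.1 would pass the edit that the per-direction bound catches.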
Principle: Every attestation carries a signed, monotonically increasing sequence number that the enclave controls.
Add to GeometricAttestation:
/// Monotonic sequence number assigned by the enclave.
/// The enclave increments this on every attestation; the counter
/// never resets. Gaps in the sequence indicate omitted attestations.
#[serde(default)]
pub sequence_number: u64,

`MockEnclave` gets a `next_sequence: u64` field (starts at 0). Each call to `attest()` or `attest_with_causal()` embeds the current value in the attestation and increments the counter. The counter is never reset — `reset()` clears frames but not the sequence.
In production, the counter is backed by a hardware monotonic counter (SGX sgx_create_monotonic_counter, SEV via vTPM). Destroying and re-creating the enclave destroys the counter — which is exactly the right threat model.
serialise_for_signing() includes the sequence number, so it is covered by the Ed25519 signature and cannot be tampered with.
verify_chain() checks all[i].sequence_number == all[i-1].sequence_number + 1 for every link after the anchor. A gap means an attestation was produced and discarded — the chain is rejected.
The sequence number is signed, so the agent cannot forge it. The enclave counter only increments, so the agent cannot reset it. If the verifier sees sequence [0, 1, 3], it knows attestation #2 was produced and omitted. The agent cannot suppress unfavourable results without leaving detectable evidence.
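The sequence rule itself is a one-line invariant. The sketch below isolates it from the rest of `verify_chain()` (the function name and error convention are illustrative): each link after the anchor must carry exactly the previous sequence number plus one, so both gaps and duplicates are caught by the same check.

```rust
/// Check the monotonic sequence rule over an attestation chain's
/// sequence numbers. Returns Err(i) at the first gap or duplicate.
fn check_sequence(seq: &[u64]) -> Result<(), usize> {
    for i in 1..seq.len() {
        if seq[i] != seq[i - 1] + 1 {
            return Err(i);
        }
    }
    Ok(())
}
```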
| Priority | Flaw | Scope | Rationale |
|---|---|---|---|
| 1st | #3 Sequence number | Small, additive | No API breaks. Clean additive change to struct + enclave + verifier. |
| 2nd | #2 Directional drift | Medium, new geometry function + attestation field | Additive. New method on CausalGeometry, new field on attestation. |
| 3rd | #1 ModelHandle trait | Large, trait + all callsites + all tests | Refactors the MeasurementEnclave trait and all causal intervention callsites. Goes last to avoid merge conflicts with #3 and #2. |
Sequence number:
- Two consecutive attestations have sequence numbers 0, 1
- `verify_chain()` rejects chain with gap (0, 2)
- `verify_chain()` rejects chain with duplicate sequence (0, 0)
- `reset()` does not reset sequence counter
- Sequence number is included in signed payload (tampering detected)
Directional drift:
- `directional_drift()` returns 0.0 for identical geometries
- `directional_drift()` detects change along probe direction even when global Frobenius drift is small
- `read_probe_checked()` rejects when directional drift exceeds bound
- `read_probe_checked()` passes when directional drift is within bound
- Directional drifts are included in signed attestation payload
ModelHandle trait:
- `ClosureModelHandle` wraps a closure and produces same results as direct call
- `MockEnclave` constructed with `ModelHandle` runs causal check without external `model_fn`
- `run_causal_check()` uses enclave-internal model (no parameter)
- `enclave_pipeline()` works without `model_fn` parameter
- Existing causal intervention tests pass with `ClosureModelHandle` wrapper
- `sequence_number` field added to `GeometricAttestation`, included in `serialise_for_signing()`
- `MockEnclave` assigns monotonic sequence numbers; `reset()` preserves counter
- `verify_chain()` rejects gaps and duplicates in sequence
- `directional_drift()` computes probe-direction-specific drift
- `read_probe_checked()` enforces directional drift bound when configured
- `DirectionalDrift` records included in attestation and covered by signature
- `ModelHandle` trait defined; `ClosureModelHandle` PoC wrapper implemented
- `MeasurementEnclave::run_causal_check()` no longer accepts `model_fn` parameter
- `MockEnclave` owns model handle; `enclave_pipeline()` drops `model_fn` parameter
- All existing tests updated and passing
- New adversarial-scenario tests for each flaw
The entire value of this system depends on one question: do the probes measure what they claim to measure?
Every phase from 1 to 11 is infrastructure to make probe readings reproducible, signed, chained, transportable, and tamper-evident. But if the probe for "honesty" actually measures "token frequency of words commonly associated with honesty in the training corpus," then the entire system is a precisely calibrated instrument pointed at the wrong thing.
The defence is layered:
| Layer | Phase | What it guarantees |
|---|---|---|
| Statistical validity | 1–6 | Probes produce deterministic, reproducible readings under the causal inner product. Divergence flags indicate when the reading is unreliable. |
| Temporal continuity | 7 | Geometry drift detection ensures probes are not applied to a model that has changed beyond their validity window. Chaining makes drift auditable. |
| Causal validity | 8 | Keystone. Causal interventions prove that perturbing activations in the probe direction changes the model's output. The probe reads a real mechanism, not a surface correlation. |
| Coverage | 9 | Inline measurement ensures every inference is monitored, not just spot-checks. The model cannot behave differently when unobserved. |
| Transport integrity | 10 | Encrypted, signed wire protocol ensures attestations are not tampered with in transit between agents. |
| Hardware authenticity | 11 | Hardware-isolated capture ensures the activations being probed are real, not sanitised by the model. |
| Audit persistence | 12 | Append-only attestation store with chain validation ensures no attestation can be silently discarded, and an auditor can inspect the full history after the fact. |
| Adversarial hardening | 13 | Model callback is enclave-owned (not caller-supplied), drift is measured per-probe-direction (not just globally), and signed monotonic sequence numbers prevent attestation omission. |
Without Phase 8, all other layers secure a measurement that might be meaningless.
This is why Phase 8 is the keystone. It is also the hardest phase: causal intervention requires access to the model's forward pass (enclave-owned after the Phase 13 hardening), perturbation magnitudes that are empirically valid, and a consistency threshold that separates real mechanisms from noise. Getting this right matters more than encrypting the wire protocol or tamper-proofing the hardware capture.
- The causal inner product ⟨w, h⟩_c = wᵀΦh is computable from the unembedding matrix
- Linear probes trained under this metric produce deterministic readings
- Those readings can be assembled into a cryptographically signed attestation
- The attestation is independently reproducible: same weights + same input + same probes = identical output
- The format is self-describing and version-tagged for forward compatibility
- Geometry drift is measurable and boundable, making self-learning models auditable
- Attestation chains create a tamper-evident history of model evolution
- Causal interventions can distinguish probes that measure real mechanisms from probes that exploit surface correlations
- Inline measurement can monitor every inference, not just periodic audits
- A purpose-built wire protocol (GOT/1) can transport attestations between agents with encryption (Noise NK), signed exchange envelopes, replay protection, and chain verification in a single round-trip
- Hardware-isolated measurement can capture activations below the model's software stack
- Persistent, append-only attestation storage with chain validation and audit reporting enables after-the-fact inspection of a model's entire measurement history
- Enclave-owned model handles prevent an adversary from faking the causal forward pass
- Per-probe directional drift detects surgical geometry changes that global Frobenius norm misses
- Signed monotonic sequence numbers make attestation omission detectable
- That the probe readings mean anything about AI values (this requires causal validation with real models, not just the synthetic tests in the PoC)
- That the corpus used to train the probes is representative, fair, or legitimate
- That the coverage flags reliably indicate when the probes are out of distribution
- That the confidence values are calibrated (they are not, in the PoC)
- That causal intervention with a synthetic `model_fn` transfers to real model forward passes
- That perturbation magnitude δ is ecologically valid for real inputs
- That hardware-isolated capture is feasible at production inference latencies
- That any institution is prepared to take responsibility for interpreting the output
The geometry is a measurement instrument. Like any instrument, it reports what it is pointed at. Who decides what it is pointed at — which value dimensions are probed, which corpus defines the labels, what threshold separates "reliable" from "unreliable," and who adjudicates disputes about coverage — is a governance question.
Phases 8–12 address the technical gap — proving that probes measure real mechanisms, monitoring every inference, securing the transport, capturing activations at the hardware level, and persisting the audit trail. But the institutional gap remains: even a perfectly validated, causally verified, tamper-evident measurement is meaningless without an institutional context that decides what the number is allowed to count as.
This PoC is the technical proof that a governance framework would have something concrete to govern. The probe produces a number. Causal intervention proves the number reflects a real mechanism. The attestation signs it. The protocol lets someone else verify it. Hardware isolation proves the measurement wasn't faked. But who decides what the number means — that is the hardest problem, and it is not a technical one.