feat(kv-cache): encrypted SSD KV cache — primitives, verification, and flag-gated engine integration#237

Open: anupsv wants to merge 27 commits into master from feat/ssd-kv-cache


anupsv commented May 29, 2026

Summary

Encrypted-at-rest KV cache for the Swift provider: persist prefill KV to
disk encrypted, reload after restart/eviction to skip re-prefill. 12
commits, ~3.5k LOC, 103 KV-cache tests green.

DRAFT — two gates before merge:

  1. Submodule PR must merge first. This branch bumps
    libs/mlx-swift-lm to a branch (feat/encrypted-prefix-cache-persistence)
    adding the additive PrefixCachePersistence hook. That submodule PR
    has to land before this can go to master.
  2. 2-Mac smoke test pending. End-to-end behavior with the flag ON
    can't run in CI — needs the M5 Thunderbolt pair. All unit/integration
    coverage here runs on swift test.

What's in it

Crypto primitives (P0): EncryptedKVStore (AES-256-GCM, per-file
random DEK, HKDF-derived per-chunk nonces, metadata-as-AAD tamper
binding, atomic write + dir fsync) and KVCacheKEK (envelope: per-file
DEK wrapped by an SE-derived KEK held in Keychain). Async + sync
(writeSync/readSync) paths share the format.

Cache machinery (P1–P3): PrefixCacheRAM (LRU), PrefixDigest +
PrefixCacheIndex (exact-checkpoint lookup), PrefixCacheManager
(orchestration actor with the MB-1 model-binding guard), and
KVCacheSerializer ([KVCache] ↔ encryptable bytes, bf16-exact).

Live integration (Path 2) — backs the engine's own in-GPU block
prefix cache with our encryption: a PrefixCachePersistence hook
(submodule) calls EncryptedPrefixCachePersistence on block evict/load,
so evicted blocks are AES-GCM-encrypted to SSD and reloaded instead of
re-prefilled. Gated behind the default-off DARKBLOOM_PREFIX_CACHE flag.

Verified before building

  • Cross-model cacheability + the rotating-cache snapshot/restore hazard
    resolved empirically (a review claimed temporalOrder scrambles on
    restore; tests prove it's correct when state+metaState round-trip
    together).
  • Qwen3.5/3.6/3.7 MoE + Gemma-4 26B-A4B + GPT-OSS-20B confirmed from
    source: all hybrid (sliding/recurrent), so the cache is
    exact-checkpoint (no arbitrary longest-prefix); Mamba layers are RAM-only
    (recurrent state isn't a per-token prefix). Adversarial review of P0–P2
    found 0 critical / 0 high; fixes applied.

⚠️ SECURITY — TB-007 (must read before enabling)

Severity: HIGH (inherent, when the flag is enabled) — mitigated to LOW as-shipped.

  • Inherent — HIGH: cross-tenant information disclosure (TTFT timing oracle + shared prefix blocks) in a multi-tenant provider. It breaks tenant isolation, a core privacy property. It is a timing/inference side-channel, not direct content exfiltration (a tenant cannot read another's KV without already knowing the exact tokens) — hence HIGH, not CRITICAL.
  • Residual — LOW (as shipped): the cache is default-OFF, opt-in only via DARKBLOOM_PREFIX_CACHE, and enabling it requires an explicit operator threat-model sign-off. Flag off (the default) ⇒ no cache ⇒ no exposure.

The engine prefix cache was deliberately disabled (TB-007) for a
cross-tenant data-leak / TTFT side-channel: the provider cannot see
tenant identity, so the cache is shared across consumers. This PR adds
encryption-at-rest (disk-theft defense) but does NOT close the
in-process cross-tenant sharing/timing channel. It ships only behind
the default-off flag and requires an explicit operator threat-model
sign-off. Flag off = today's exact behavior (engine prefixCache: nil).

Notes / follow-ups

  • The block-level engine integration supersedes the checkpoint-level
    PrefixCacheManager/Index/Digest/RAM for the live path (the
    engine already owns lookup/LRU/indexing). Those remain for non-engine
    use; only EncryptedKVStore + KVCacheSerializer + KEK are on the
    live path.
  • Model binding uses modelId; weight-hash binding (invalidate on weight
    change under the same id) is a follow-up.
  • Design + threat model: docs/ssd-kv-cache-design.md.


Gajesh2007 and others added 14 commits May 27, 2026 22:29
…arrival path

Provider (Swift):
- Add NetworkPowerAssertion: IOKit assertions for NetworkClientActive
  and BackgroundTask, acquired for the entire provider session
- Keeps the macOS network stack alive during sleep so APN pushes
  (courier.push.apple.com:5223) can be delivered for MDM commands
- No root required — uses IOPMAssertionCreateWithName API

Coordinator:
- Increase SecurityInfo timeout from 30s to 90s to cover Power Nap
  cycles (every ~15 minutes on AC power)
- Add OnLateSecurityInfoCallback: when a SecurityInfo webhook arrives
  after the 90s timeout, retroactively upgrade the self_signed provider
  to hardware trust instead of silently dropping the response
- Wire up the callback in main.go alongside the existing SetOnMDA
  callback, using the same ForEachProvider + serial match pattern

…in-backed

Reflects commit 4a0dae5 (PersistentEnclaveKey.swift). Key changes:

- TB-003 how_it_works: document Security framework persistent key, access group
  SLDQ2GJ6TL.io.darkbloom.provider, kSecAttrIsPermanent, and the
  errSecMissingEntitlement fallback behaviour on patched binaries
- TB-003 current_limitations: add two new limitations — team-scoped cross-binary
  keychain access, and silent ephemeral fallback that defers rejection to the
  coordinator rather than failing at the process boundary
- TB-009 how_it_works: rewrite SE key lifecycle section to reflect persistent
  identity across restarts; rotation now requires explicit keychain deletion
- T-013 (binary tampering) mitigations: add keychain access group enforcement as
  a fourth, implemented mitigation; update detection_hint
- T-033 (attestation replay) affected_files: add PersistentEnclaveKey.swift and
  AttestationSigner.swift; update mitigation wording
- T-035 (repudiation after rotation) description, mitigations, detection_hint:
  reframe rotation as an explicit operator action rather than automatic per
  launch; note kSecAttrIsPermanent as a positive mitigation; add open finding
  that coordinator cannot detect opportunistic keychain delete + re-registration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

First phase of SSD KV cache per docs/ssd-kv-cache-design.md. P0
delivers the cryptographic plumbing only; no integration with
BatchScheduler yet (that's P3).

Adds:

  - `KVCache/KeyWrappingService` — protocol abstracting wrap/unwrap
    of the KEK at rest. Two impls: `InMemoryKeyWrappingService`
    (tests) and `SecureEnclaveKeyWrappingService` (prod; ECIES
    via the existing persistent SE identity using
    `.eciesEncryptionStandardX963SHA256AESGCM`).

  - `KVCache/WrappedKEKStorage` — protocol abstracting the at-rest
    persistence layer. `KeychainWrappedKEKStorage` for prod
    (data-protection Keychain with
    `kSecAttrAccessibleAfterFirstUnlockThisDeviceOnly`),
    `InMemoryWrappedKEKStorage` for tests. Two
    layers (wrap + storage) so the entitlement-gated Keychain path
    is independently swappable from the SE-bound wrap path; both
    are individually exercised in tests.

  - `KVCache/KVCacheKEK` — actor that generates/persists the KEK
    on first run, holds the unwrapped key in actor-isolated state,
    and exposes per-file DEK wrap/unwrap with AAD binding.

  - `KVCache/EncryptedKVStore` — on-disk format codec for the
    `.darkbloom-kv` file. Layout: magic ‖ version ‖ flags ‖
    file_IV ‖ wrapped_DEK ‖ metadata-JSON ‖ encrypted chunks. Per-
    chunk nonces derived via HKDF-SHA256 from (DEK, file_IV,
    chunk_index) — no nonce reuse, no nonce storage. Metadata
    bytes are bound as AAD on every chunk seal AND on the DEK
    wrap, so tampering with any field surfaces as an authentication
    failure on the first chunk decrypt (and on DEK unwrap too,
    via belt-and-suspenders).

  - `Security/PersistentEnclaveKey+ECIES.swift` — adds
    `eciesEncrypt/eciesDecrypt` to the existing SE identity. The
    only modification to existing code is bumping `privateKey` from
    `private` to `internal` so the extension can reach it; the SE
    still owns the key material.

Threat model + format details: docs/ssd-kv-cache-design.md.
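
For illustration only, a minimal sketch of the per-chunk nonce derivation and metadata-as-AAD binding described above, assuming CryptoKit (or swift-crypto off Apple platforms). The `info` layout, the domain tag, the little-endian index encoding, and the helper `sealChunk` are assumptions made for this example, not the actual `.darkbloom-kv` format.

```swift
import Foundation
#if canImport(CryptoKit)
import CryptoKit
#else
import Crypto  // swift-crypto mirrors the CryptoKit API off Apple platforms
#endif

// Assumed info layout: tag ‖ file_IV ‖ chunk_index (little-endian UInt32).
func deriveChunkNonce(dek: SymmetricKey, fileIV: Data, chunkIndex: UInt32) throws -> AES.GCM.Nonce {
    var info = Data("example-kv-chunk-nonce".utf8)  // hypothetical domain tag
    info.append(fileIV)
    withUnsafeBytes(of: chunkIndex.littleEndian) { info.append(contentsOf: $0) }
    // Expand-only HKDF: the DEK is already a uniform 256-bit key.
    let okm = HKDF<SHA256>.expand(pseudoRandomKey: dek, info: info, outputByteCount: 12)
    return try okm.withUnsafeBytes { try AES.GCM.Nonce(data: Data($0)) }
}

// Seals one chunk with the file's metadata bytes bound as AAD.
func sealChunk(_ chunk: Data, index: UInt32, dek: SymmetricKey,
               fileIV: Data, metadata: Data) throws -> AES.GCM.SealedBox {
    let nonce = try deriveChunkNonce(dek: dek, fileIV: fileIV, chunkIndex: index)
    return try AES.GCM.seal(chunk, using: dek, nonce: nonce, authenticating: metadata)
}

let dek = SymmetricKey(size: .bits256)  // per-file random DEK
let fileIV = Data((0..<16).map { _ in UInt8.random(in: .min ... .max) })
let metadata = Data(#"{"modelHash":"abc123","numLayers":2}"#.utf8)
let chunk = Data("kv bytes".utf8)

let sealed = try sealChunk(chunk, index: 0, dek: dek, fileIV: fileIV, metadata: metadata)
let opened = try AES.GCM.open(sealed, using: dek, authenticating: metadata)

// Flipping one metadata bit makes authentication fail on decrypt.
var tampered = metadata
tampered[0] ^= 0x01
let tamperedOpen = try? AES.GCM.open(sealed, using: dek, authenticating: tampered)
```

Because the nonce is recomputed from (DEK, file_IV, chunk_index) on read, nothing nonce-related needs to be stored per chunk in this sketch.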

Tests:
  - KeyWrappingServiceTests — 7 cases: wrap roundtrip, unique
    ciphertexts per call, tamper detection (ct + tag), wrong key,
    malformed input, empty plaintext.
  - KVCacheKEKTests — 10 cases including SE-roundtrip
    + Keychain-storage roundtrip (both skip on missing
    entitlement, same pattern as existing PersistentEnclaveKey
    tests).
  - EncryptedKVStoreTests — 13 cases covering single/multi-chunk
    roundtrip, metadata-only read, tamper at magic / version /
    metadata / chunk-ct levels, truncation, wrong-KEK,
    chunk-count/size mismatch, and the HKDF nonce derivation
    determinism + diversity properties.

All 32 tests pass (2 skip on missing keychain-access-groups
entitlement in unsigned debug builds; same pattern as
PersistentEnclaveKeyTests).

Corrects a misleading claim in the SSD KV cache design: §5.1
previously stated that `model_hash` in the metadata AAD "binds the
cache to a specific model file — reloading after a model upgrade fails
closed." That overstated what the crypto does.

The AES-GCM layer uses the file's OWN metadata (read back from disk) as
AAD on both the DEK unwrap and every chunk seal. This proves the bytes
weren't altered since write (tamper-evidence) but does NOT verify the
file belongs to the currently-loaded model: a structurally valid cache
file authored for model A decrypts cleanly while model B is loaded,
because both supply the file's embedded metadata as AAD. The cipher
cannot distinguish "right model" from "wrong model."

Adds:
  - §5.1 rewritten to spell out what the AAD does (tamper-evidence)
    vs does NOT (model-binding).
  - §8.1.1 new section defining invariant MB-1: PrefixCacheManager
    MUST check meta.modelHash == currentLoadedModelHash (and an
    architectural-shape guard for 12-char hash-prefix collisions)
    BEFORE unwrapping the DEK or decrypting. Includes the guard
    pseudocode and three P3 regression tests that must fail if the
    guard is removed.
  - §10 two new failure-mode rows (cross-model file, shape mismatch),
    each noting they're caught by MB-1, not the crypto.
  - §14 P3 phase now explicitly includes MB-1 + its rejection tests.
  - docs/ssd-kv-cache-model-binding.{mmd,svg} — read-path flowchart
    showing where MB-1 sits (between metadata-read and decrypt) and a
    callout box on what the AES-GCM layer does vs doesn't guarantee.
    Animated SVG (draws on in a browser; renders static when embedded).

No code change — P0 crypto is unaffected; MB-1 is an application-layer
guard that lands with the P3 BatchScheduler integration.
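
A minimal sketch of what an MB-1 style check could look like, assuming the metadata and binding carry the fields named above. The type names (`CacheFileMetadata`, `ModelBinding`, `BindingRejection`) and the function `checkModelBinding` are hypothetical and are not taken from the design doc's pseudocode.

```swift
// Hypothetical view of the fields read back from a cache file's metadata.
struct CacheFileMetadata {
    let modelHash: String
    let numLayers: Int
    let kvHeads: Int
    let headDim: Int
}

// Hypothetical description of the currently loaded model.
struct ModelBinding {
    let modelHash: String
    let numLayers: Int
    let kvHeads: Int
    let headDim: Int
}

enum BindingRejection: Equatable {
    case modelMismatch
    case shapeMismatch
}

/// Returns nil when the file may be decrypted, or the reason it must be rejected.
/// Intended to run after a metadata-only read and BEFORE any DEK unwrap or chunk decrypt.
func checkModelBinding(_ meta: CacheFileMetadata, against binding: ModelBinding) -> BindingRejection? {
    guard meta.modelHash == binding.modelHash else { return .modelMismatch }
    // Shape guard: covers a collision on the truncated hash prefix.
    guard meta.numLayers == binding.numLayers,
          meta.kvHeads == binding.kvHeads,
          meta.headDim == binding.headDim else { return .shapeMismatch }
    return nil
}

let loaded = ModelBinding(modelHash: "aaaaaaaaaaaa", numLayers: 24, kvHeads: 8, headDim: 64)
let fromOtherModel = CacheFileMetadata(modelHash: "bbbbbbbbbbbb", numLayers: 24, kvHeads: 8, headDim: 64)
let hashCollision = CacheFileMetadata(modelHash: "aaaaaaaaaaaa", numLayers: 30, kvHeads: 8, headDim: 64)
let matching = CacheFileMetadata(modelHash: "aaaaaaaaaaaa", numLayers: 24, kvHeads: 8, headDim: 64)
```

The point of the ordering is that a rejected file never reaches the cipher, which could not tell the two models apart anyway.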

…dow models

Empirically resolves the open correctness question gating prefix cache
for sliding-window models (GPT-OSS-20B, Gemma-4 26B-A4B MoE). Two
review agents disagreed on whether RotatingKVCache survives
snapshot -> restore -> resume; an adversarial lens claimed
temporalOrder() scrambles token order on restore + multi-token prefill.

These tests settle it from runtime behaviour. Each token t is encoded
as K=t, V=t+100 so the returned (keys,values) reveal exact token order;
any scrambling shows as a content mismatch vs a never-reset reference.

RotatingKVCacheRestoreTests (single-stream, the circular-buffer case):
  - multi-token prefill after restore matches reference (the adversary's
    exact scenario — wrap the buffer, snapshot, restore, prefill)
  - single-token decode after restore matches
  - pre-wrap restore matches
  - omitting metaState on restore DOES corrupt order — proving idx/offset
    serialization is load-bearing (the failure mode is real but only
    under misuse: state without metaState)

BatchRotatingExtractRoundtripTests (the batched path our design uses):
  - extract(row) isolates the correct row (no cross-row leakage)
  - extract(row) -> snapshot -> restore -> resume matches extract ->
    resume, for both multi-token and single-token continuations

Verdict: rotating-cache restore is CORRECT when state AND metaState are
restored together (the loadPromptCache contract). BatchRotatingKVCache
keeps a linear front-trimmed buffer (no circular wrap), and its
extract(idx) returns a single-stream RotatingKVCache with a fully
populated metaState (keep,maxCacheSize,step,offset,idx) — so our
design path (extract row -> single-stream -> serialize) sidesteps the
batched cache's own incomplete metaState. The metaState-sync
requirement is now a guarded invariant.
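
To make the failure mode concrete, here is a toy circular buffer, not the real RotatingKVCache: it stores plain token ids instead of K/V tensors and keeps only a write index as its "metaState". `ToyRotatingCache` and everything in it are assumptions for illustration.

```swift
// Toy stand-in for a rotating cache: `slots` plays the role of state,
// `idx` plays the role of metaState (the next physical write position).
struct ToyRotatingCache {
    let capacity: Int
    var slots: [Int]
    var idx: Int

    init(capacity: Int, slots: [Int] = [], idx: Int = 0) {
        self.capacity = capacity
        self.slots = slots
        self.idx = idx
    }

    mutating func append(_ token: Int) {
        if slots.count < capacity { slots.append(token) } else { slots[idx] = token }
        idx = (idx + 1) % capacity
    }

    /// Oldest-to-newest view of the buffer; depends on `idx` once the buffer has wrapped.
    func temporalOrder() -> [Int] {
        guard slots.count == capacity else { return slots }
        return Array(slots[idx...] + slots[..<idx])
    }
}

var reference = ToyRotatingCache(capacity: 4)
for t in 0..<6 { reference.append(t) }  // the buffer has wrapped at this point

// Snapshot both halves, then restore once correctly and once without the index.
let snapshotSlots = reference.slots
let snapshotIdx = reference.idx
var restored = ToyRotatingCache(capacity: 4, slots: snapshotSlots, idx: snapshotIdx)
var misuse = ToyRotatingCache(capacity: 4, slots: snapshotSlots)  // idx dropped

// Resume all three with the same continuation.
for t in 6..<8 {
    reference.append(t)
    restored.append(t)
    misuse.append(t)
}
```

In this toy, `restored` tracks `reference` exactly, while `misuse` overwrites the wrong slots and reports a scrambled order, which mirrors the state-without-metaState case the tests guard against.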

All 7 tests pass. No production code changed.

Rewrites §4.4 and adds §4.5 to reflect what was verified against the
actual mlx-swift-lm source + the empirical rotating-restore tests.

Corrections to the earlier draft:
  - "slice the first N columns" was wrong: caches are 4-D
    [B,kvHeads,seq,headDim], sliced on axis 2, and the snapshot is taken
    by extractBatched(row) from the live batched caches — not naive
    slicing.
  - Arbitrary longest-prefix match (old O2) is DROPPED for the models we
    serve. Recurrent (Mamba/GatedDeltaNet) and sliding-window layers
    cannot be sliced to a shorter prefix, so reuse is EXACT-CHECKPOINT
    only: hit when the incoming prompt's prefix is byte-identical to a
    cached checkpoint boundary (e.g. end of a system prompt). Covers the
    dominant shared-system-prompt case.

New §4.5 — verified per-model cacheability (cache type detected at LOAD
time, never hardcoded; MoE is irrelevant — it only changes the FFN):
  - Qwen3.5 MoE / Qwen3-Next: hybrid MambaCache + KVCacheSimple →
    exact-checkpoint (recurrent state restorable at boundary, not
    sliceable).
  - Gemma-4 26B-A4B MoE: sliding RotatingKVCache(512) + full; only 15
    non-shared caches to snapshot (20 layers KV-share, auto-reconstructed).
  - GPT-OSS-20B MoE: sliding RotatingKVCache(128) + full; attention
    sinks are learned weights, not KV state → no snapshot impact.
  - Unsupported cache types (Chunked/Quantized/CacheList/DeepseekV4
    pooling) gated out at load time → cold path, no error.

Invariant MS-1 (metaState-sync): extracted caches must persist
metaState in sync with state (RotatingKVCache's idx/offset drive
temporalOrder). EncryptedKVStoreMetadata already has the metaState
field; regression-guarded by omittingMetaStateOnRestoreCorruptsOrder.

Also: status line updated (P0 landed + cacheability verified), O2 marked
dropped, per-request flow diagram shows exact-checkpoint + MB-1 guard,
§15 records what verification resolved + adds [Q8].

The decrypted RAM tier of the SSD KV cache (design §4.1). Holds
recently-used prefix KV snapshots as live [any KVCache] (one extracted
single-stream cache per layer) so a repeat request whose prompt prefix
is byte-identical to a cached checkpoint hits RAM instead of decrypting
from SSD or running a cold prefill.

Scope: RAM only — no SSD, no encryption, no BatchScheduler wiring (those
are P2-P4). Plain final class (non-Sendable: holds MLXArrays), to be
owned by the PrefixCacheManager actor in P3; tests run single-threaded.

Design points:
  - Keyed by (modelHash, prefixDigest). modelHash is the locally-computed
    weight hash, so a lookup for model B structurally cannot return model
    A's entry — the RAM-tier half of MB-1.
  - get() returns copy() of each stored cache (upstream's blessed clone,
    state.map { $0[.ellipsis] }), so a consumer can seed a batch row and
    decode into the returned caches without corrupting the stored
    snapshot. This is the load-bearing invariant.
  - LRU eviction by a monotonic use-counter (not wall-clock — keeps it
    deterministic, avoids Date.now()), bounded by BOTH an entry count and
    a byte budget (measured via MLXArray.nbytes over each cache's state).
  - clear(modelHash:) for model unload; entriesForFlush(modelHash:)
    returns copies for the P4 SSD-flush path without removing.
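
A rough sketch of the eviction policy described in the design points, assuming a generic value type in place of `[any KVCache]`. `ToyPrefixLRU` and `CacheKey` are hypothetical names; copy-on-get is not modeled (a Swift value type is copied on return anyway), and refusing an entry larger than the byte budget is a choice made for this sketch.

```swift
struct CacheKey: Hashable {
    let modelHash: String
    let prefixDigest: String
}

final class ToyPrefixLRU<Value> {
    private struct Entry {
        var value: Value
        var bytes: Int
        var lastUse: UInt64
    }

    private var entries: [CacheKey: Entry] = [:]
    private var useCounter: UInt64 = 0  // monotonic, no wall clock involved
    private(set) var totalBytes = 0
    let maxEntries: Int
    let maxBytes: Int

    init(maxEntries: Int, maxBytes: Int) {
        self.maxEntries = maxEntries
        self.maxBytes = maxBytes
    }

    func get(_ key: CacheKey) -> Value? {
        guard var entry = entries[key] else { return nil }
        useCounter += 1
        entry.lastUse = useCounter
        entries[key] = entry
        return entry.value
    }

    /// Returns false when the entry alone would exceed the byte budget.
    @discardableResult
    func put(_ key: CacheKey, value: Value, bytes: Int) -> Bool {
        guard bytes <= maxBytes else { return false }
        if let old = entries[key] { totalBytes -= old.bytes }  // replace without leaking bytes
        useCounter += 1
        entries[key] = Entry(value: value, bytes: bytes, lastUse: useCounter)
        totalBytes += bytes
        // Evict least-recently-used entries until BOTH bounds hold.
        while entries.count > maxEntries || totalBytes > maxBytes {
            guard let victim = entries.min(by: { $0.value.lastUse < $1.value.lastUse }) else { break }
            totalBytes -= victim.value.bytes
            entries.removeValue(forKey: victim.key)
        }
        return true
    }
}

let lru = ToyPrefixLRU<String>(maxEntries: 2, maxBytes: 100)
let keyA = CacheKey(modelHash: "m", prefixDigest: "a")
let keyB = CacheKey(modelHash: "m", prefixDigest: "b")
let keyC = CacheKey(modelHash: "m", prefixDigest: "c")
lru.put(keyA, value: "A", bytes: 10)
lru.put(keyB, value: "B", bytes: 10)
_ = lru.get(keyA)                      // A becomes the most recently used entry
lru.put(keyC, value: "C", bytes: 10)   // the entry-count bound evicts B
```

Keying on (modelHash, prefixDigest) means a lookup for one model cannot return another model's entry, whatever the eviction state.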

Tests (9, all pass):
  - hit/miss + tokenCount round-trip
  - MB-1: lookup keyed by modelHash (no cross-model bleed)
  - distinct digests are distinct entries
  - getReturnsIndependentCopy: mutate a returned copy (append 5 tokens),
    assert the stored snapshot stays at its original offset — proves
    copy()-on-get protects the snapshot
  - LRU eviction by entry count (recently-used survives, LRU evicted)
  - eviction by byte budget
  - clear(modelHash:) drops only that model
  - put replaces an existing key without leaking byte accounting
  - entriesForFlush returns per-model copies without removing

The exact-checkpoint lookup layer (design §4.4, §7; [Q3] resolved to
JSON, not SQLite). No SSD I/O of cache payloads and no BatchScheduler
wiring yet — that's P3 (held for review).

PrefixDigest — checkpoint keys:
  - Checkpoints at the O9 boundaries (256/512/1024/2048/4096/8192).
  - For a prompt's token array, computes SHA-256 of the first c tokens
    at each checkpoint c <= count, in a SINGLE pass by snapshotting the
    rolling hash at each boundary. Proven
    (checkpointDigestEqualsIndependentPrefixHash) to equal an independent
    hash of the first c
    tokens — so a longer cached prefix is findable from a shorter shared
    one, and two prompts sharing a system prompt agree on every
    checkpoint digest within the shared region.
  - Tokens hashed as little-endian Int64 after a domain-separation tag;
    stable across machines, can't collide with other SHA uses.
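
The single-pass idea can be sketched as follows, assuming CryptoKit (or swift-crypto). The domain tag string and the function names `checkpointDigests` and `independentDigest` are made up for this example; only the boundary list and the little-endian Int64 token encoding come from the description above.

```swift
import Foundation
#if canImport(CryptoKit)
import CryptoKit
#else
import Crypto  // swift-crypto mirrors the CryptoKit API off Apple platforms
#endif

let boundaries: Set<Int> = [256, 512, 1024, 2048, 4096, 8192]
let domainTag = Data("example-prefix-digest-v1".utf8)  // hypothetical tag

// Feeds one token into the running hash as a little-endian Int64.
func absorb(_ token: Int, into hasher: inout SHA256) {
    withUnsafeBytes(of: Int64(token).littleEndian) { hasher.update(bufferPointer: $0) }
}

/// One pass over the prompt; copies the running hash at each boundary and
/// finalizes the copy, so the original keeps absorbing tokens.
func checkpointDigests(tokens: [Int]) -> [Int: SHA256.Digest] {
    var hasher = SHA256()
    hasher.update(data: domainTag)
    var digests: [Int: SHA256.Digest] = [:]
    for (i, token) in tokens.enumerated() {
        absorb(token, into: &hasher)
        if boundaries.contains(i + 1) {
            let snapshot = hasher  // SHA256 is a value type, so this is an independent copy
            digests[i + 1] = snapshot.finalize()
        }
    }
    return digests
}

/// Reference: hash a prefix from scratch, for comparison with the snapshots.
func independentDigest(tokens: ArraySlice<Int>) -> SHA256.Digest {
    var hasher = SHA256()
    hasher.update(data: domainTag)
    for token in tokens { absorb(token, into: &hasher) }
    return hasher.finalize()
}

let promptA = Array(0..<600)
let promptB = Array(0..<300) + Array(1000..<1300)  // shares only the first 300 tokens with A
let digestsA = checkpointDigests(tokens: promptA)
let digestsB = checkpointDigests(tokens: promptB)
```

Here the two prompts agree at checkpoint 256 and diverge at 512, which is the shared-system-prompt behavior the digest is meant to provide.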

PrefixCacheIndex — JSON-persisted, in-RAM:
  - Maps (modelHash, digestHex) -> {relativePath, tokenCount, fileBytes,
    createdAt, lastHitAt, hitCount}.
  - findLongestCheckpoint(modelHash:tokens:): computes the prompt's
    checkpoint digests and returns the entry for the LONGEST checkpoint
    present for that model — the exact-checkpoint match. Partitioned by
    modelHash (MB-1: model B can't match model A's entries).
  - record / touch / remove / removeModel / entriesLRUFirst (eviction
    order for P6) / rebuild(from:) (recover when JSON missing/corrupt).
  - Atomic JSON write-back on save(); dirty-tracked. A corrupt index
    file is treated as empty (logged), not fatal — SSD files are self-
    describing and the index rebuilds from them.
  - Timestamps passed in by the caller (now: Int64), keeping the type
    deterministic and clock-free for tests.

Tests (16, all pass): digest determinism + prefix-sensitivity,
single-pass == independent-hash equivalence, shared-prefix agreement,
boundary/token-count handling; index exact + longest-checkpoint match,
divergent-prefix miss, MB-1 model scoping, touch metadata, remove/
removeModel, LRU ordering, JSON persistence round-trip, corrupt-file
recovery, rebuild.

A 5-dimension adversarial review (findings verified by refutation)
found 0 critical, 0 high, 6 medium, 9 low. Fixes for the cheap/
clearly-correct ones; the two genuinely-P3 items (PCR-1 sending-returns
across the actor boundary, XC-3 MB-1 guard + tests) are deferred to P3
where the PrefixCacheManager actor lands. KV-3 (redundant magic field)
is won't-fix.

Crypto / format:
  - KV-1, XC-2: fix 3-way doc drift on per-chunk nonce derivation. The
    code uses HKDF-Expand-only (Extract skipped — DEK is already a
    uniform 256-bit key, RFC 5869 §3.3) with file_IV folded into `info`,
    NOT file_IV as an HKDF salt. Header + byte-layout comments now match
    the implementation and the function doc. (No crypto change — write
    and read already shared deriveChunkNonce, so round-trip was correct.)
  - KV-2: fsync the containing directory (F_FULLFSYNC, fsync fallback)
    after the atomic rename in EncryptedKVStore.write, so a just-renamed
    cache file is durable across power loss. Best-effort: a miss only
    costs a cold prefill.
  - KV-4: SecureEnclaveKeyWrappingService.unwrap classifies auth
    failures by structured OSStatus (errSecDecode/errSecAuthFailed/
    errSecParam) instead of substring-matching a locale-dependent
    error string.
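
As a rough illustration of the KV-2 pattern (atomic rename, then sync the containing directory), assuming POSIX `open`/`fsync` and Darwin's `F_FULLFSYNC`. The helper names `syncDirectory` and `writeDurably` are hypothetical, and `Data.write(.atomic)` stands in for whatever rename step EncryptedKVStore actually performs.

```swift
import Foundation

/// Best-effort directory sync: tries F_FULLFSYNC on Darwin, falls back to fsync.
func syncDirectory(_ directory: URL) -> Bool {
    let fd = open(directory.path, O_RDONLY)
    guard fd >= 0 else { return false }
    defer { close(fd) }
    #if canImport(Darwin)
    if fcntl(fd, F_FULLFSYNC) == 0 { return true }
    #endif
    return fsync(fd) == 0
}

/// Writes via a temporary file plus rename, then syncs the parent directory
/// so the rename itself is more likely to survive power loss.
func writeDurably(_ data: Data, to url: URL) throws {
    try data.write(to: url, options: .atomic)
    // A failed sync is ignored here: in this design a lost file only costs a cold prefill.
    _ = syncDirectory(url.deletingLastPathComponent())
}

let exampleDir = FileManager.default.temporaryDirectory
    .appendingPathComponent("kv-fsync-example-\(UUID().uuidString)")
try FileManager.default.createDirectory(at: exampleDir, withIntermediateDirectories: true)
let exampleFile = exampleDir.appendingPathComponent("block.bin")
try writeDurably(Data([1, 2, 3]), to: exampleFile)
```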

RAM tier:
  - PCR-2: PrefixCacheRAM.put refuses (and counts) an entry whose own
    size exceeds maxBytes instead of storing-then-self-evicting into a
    silent no-op. put now returns Bool (@discardableResult).
  - PCR-3: byte accounting uses innerState() (physical, step-allocated
    buffers) instead of state() (trimmed logical view), so maxBytes
    bounds true resident RAM.

Index / digest:
  - PCI-1: PrefixCacheIndex.save writes via Data.write(.atomic) directly
    (Foundation does aux-file + rename) instead of a manual tmp-<uuid> +
    replaceItemAt, which could leak a UUID-named orphan on a crash
    between write and replace with no sweep.
  - PCI-3, XC-4: entriesLRUFirst adds a deterministic secondary key
    (digestHex) so equal-lastHitAt entries order stably.
  - PD-1: PrefixDigest.checkpoints dedups boundaries so a duplicated
    caller-supplied boundary isn't double-emitted.
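
The PCI-3 tie-break amounts to sorting on a compound key. A small sketch, assuming an entry type with just the two relevant fields (`IndexEntry` and the free function below are illustrative, not the real PrefixCacheIndex API):

```swift
struct IndexEntry: Equatable {
    let digestHex: String
    let lastHitAt: Int64
}

/// Least-recently-hit first; equal timestamps fall back to digestHex so the
/// order does not depend on dictionary iteration order.
func entriesLRUFirst(_ entries: [IndexEntry]) -> [IndexEntry] {
    entries.sorted { ($0.lastHitAt, $0.digestHex) < ($1.lastHitAt, $1.digestHex) }
}

let unordered = [
    IndexEntry(digestHex: "ff", lastHitAt: 10),
    IndexEntry(digestHex: "0a", lastHitAt: 10),
    IndexEntry(digestHex: "77", lastHitAt: 5),
]
let evictionOrder = entriesLRUFirst(unordered)
```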

Tests:
  - XC-1: new batchRotatingExtractMatchesIndependentSingleStreamReference
    — builds the reference WITHOUT extract() (an independent single-
    stream RotatingKVCache fed row 0's tokens) and compares resume,
    proving extract() is semantically equivalent, not merely idempotent.
  - new: ramRejectsEntryLargerThanByteBudget (PCR-2),
    indexLRUTieBreakIsDeterministic (PCI-3).
  - ramPutReplacesExistingKeyWithoutLeakingBytes rewritten to assert the
    real no-leak invariant under physical byte accounting (PCR-3).

All KV cache tests pass (46 in the affected set; full P0-P2 suite green).

The missing primitive the SSD tier needs: convert an extracted
[any KVCache] (one single-stream cache per layer, from
BatchedCache.extractBatched) into raw byte chunks + a layout
descriptor, and back. Chunks feed straight into EncryptedKVStore so
plaintext KV NEVER touches disk — we deliberately do NOT route through
upstream savePromptCache (it writes a plaintext .safetensors and its
reconstruction helper is private). Byte round-trip via
MLXArray.asData(.copy) / MLXArray(data:shape:dtype:) is dtype-agnostic,
so bf16 round-trips exactly.
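
Why a raw byte round-trip is exact for bf16 can be shown without MLX: if the 16-bit patterns are copied rather than converted, no rounding can occur. The sketch below uses plain `[UInt16]` bit patterns and a hypothetical `ChunkLayout` descriptor; it does not use or imitate the MLXArray API, and it assumes writer and reader share native byte order.

```swift
import Foundation

// Hypothetical layout descriptor stored alongside the raw bytes.
struct ChunkLayout: Codable, Equatable {
    let shape: [Int]
    let dtype: String
}

/// Copies the bit patterns verbatim; no numeric conversion takes place.
func serialize(bits: [UInt16], shape: [Int]) -> (bytes: Data, layout: ChunkLayout) {
    precondition(shape.reduce(1, *) == bits.count, "shape must match element count")
    let bytes = bits.withUnsafeBufferPointer { Data(buffer: $0) }
    return (bytes, ChunkLayout(shape: shape, dtype: "bfloat16"))
}

/// Rebuilds the element array from raw bytes using the layout's element count.
func deserialize(bytes: Data, layout: ChunkLayout) -> [UInt16] {
    let count = layout.shape.reduce(1, *)
    precondition(bytes.count == count * MemoryLayout<UInt16>.size, "byte count must match layout")
    var bits = [UInt16](repeating: 0, count: count)
    _ = bits.withUnsafeMutableBufferPointer { bytes.copyBytes(to: $0) }
    return bits
}

// bf16 bit patterns for 1.0, -2.0, a NaN and the smallest subnormal.
let original: [UInt16] = [0x3F80, 0xC000, 0x7FC1, 0x0001]
let (rawBytes, layout) = serialize(bits: original, shape: [1, 1, 2, 2])
let roundTripped = deserialize(bytes: rawBytes, layout: layout)
```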

Reconstruction uses each cache type's PUBLIC state + metaState setters.

Scope — SSD-serializable: KVCacheSimple + RotatingKVCache (the
attention + sliding-window caches Gemma-4 26B-A4B and GPT-OSS-20B use,
plus all pure-attention models).

NOT SSD-serializable: MambaCache / ArraysCache (recurrent). Their
metaState setter deliberately traps (assertionFailure) and the real
reconstruction path (ArraysCache.restoreFromMetaState) is `internal` to
MLXLMCommon — unreachable from ProviderCore. Rebuilding recurrent state
via the partial public API can't be verified correct without running
the model, and a wrong recurrent state silently emits garbage tokens,
so the serializer REFUSES recurrent caches (and any hybrid stack
containing one) rather than guess. Consequence: hybrid models
(Qwen3.5/Next) get the RAM tier only — which uses copy(), no
serialization — not SSD persistence. SSD-for-Mamba is a documented
follow-up gated on upstream exposing a public reconstruction.
(This constraint was found by the test suite: an earlier attempt to set
MambaCache.metaState tripped the upstream assertionFailure — caught
before it could become a latent bug.)

Also unsupported: ChunkedKVCache, QuantizedKVCache, CacheList. serialize
throws on any unsupported layer; the P3 manager's load-time capability
gate keeps them out first.

Tests (7, all pass): KVCacheSimple state round-trip; bf16 exact
fidelity; resume-equivalence for KVCacheSimple AND wrapped
RotatingKVCache (reconstructed cache continues generation identically);
Mamba rejected (RAM-only); Chunked rejected + attention/sliding stack
accepted; full end-to-end serialize -> EncryptedKVStore.write (layout in
metaState) -> read -> deserialize.

The three-tier orchestration layer (design §4), one manager per loaded
model. Standalone and fully tested; the BatchScheduler wiring is the
NEXT step and is deliberately not in this commit (review checkpoint
before touching live inference).

Closes the two review findings deferred from P0-P2:
  - XC-3: the MB-1 model-binding guard now EXISTS and is enforced on the
    SSD load path — readMetadataOnly first, verify metadata.modelHash ==
    binding.modelHash AND the architectural shape (numLayers/kvHeads/
    headDim) BEFORE unwrap/decrypt, drop the index entry + count the
    mismatch on failure. This catches a wrong-model file the crypto
    cannot (a valid file from another model decrypts cleanly because the
    AAD is its own metadata).
  - PCR-1: the non-Sendable [any KVCache] crosses the actor boundary via
    documented @unchecked Sendable transfer types (PrefixLookupResult
    out, SendableKVCaches in). `sending` is unusable here because values
    produced through the actor-isolated PrefixCacheRAM are inferred into
    the actor's region; the boxes are sound because the caches are always
    fresh (RAM hits copy(), SSD hits freshly deserialized) and single-
    owner — matching the codebase's existing UncheckedSendable idiom.

Behavior:
  - lookup(tokens:): exact-checkpoint match, RAM tier (longest checkpoint
    first) then SSD tier (MB-1-guarded read -> deserialize -> promote to
    RAM). Returns fresh caller-owned caches + tokenCount + tier.
  - store(tokens:checkpointLength:caches:): write-back to RAM only.
  - flushToSSD(): serialize RAM entries not already on SSD -> encrypt via
    EncryptedKVStore (layout JSON in metaState) -> record + save index.
    Skips non-serializable stacks defensively.
  - capability gate: ssdEnabled requires index+kek+cacheDir; a model
    whose caches aren't KVCacheSerializer-supported (Mamba hybrids) runs
    RAM-only. Timestamps injected (now:) for determinism.

Tests (8): RAM hit + longest-checkpoint-wins; full SSD round-trip
(store -> flush -> clearRAM -> SSD hit -> promote-to-RAM); restart
persistence across manager instances; MB-1 rejects a cross-model file
even when the index points B at A's file (the symlink/collision case) —
asserts modelMismatches counted; SSD-disabled-without-backing; miss on
sub-checkpoint prompt.

Full KV cache suite green: 98 tests.

The encrypted SSD backend for the engine's in-GPU block prefix cache
(Path 2). Conforms to MLXLMCommon.PrefixCachePersistence (added to the
submodule): the engine calls saveBlock on LRU eviction and loadBlock on
a block-hash miss, so evicted blocks are AES-GCM-encrypted to disk
(surviving eviction AND restart) instead of dropped + re-prefilled.

Reuses our crypto primitives — EncryptedKVStore + KVCacheSerializer +
the KEK — keyed by the engine's content-addressed block hash. (Note:
this block-level integration supersedes the checkpoint-level
PrefixCacheManager/Index/Digest/RAM for the live path — the engine
already owns lookup/LRU/indexing; those layers remain for any non-engine
use but aren't on this path.)

EncryptedKVStore: refactored the body-build / header-assemble / atomic-
write / chunk-decrypt into shared sync helpers, and added writeSync /
readSync taking a SymmetricKey KEK directly. The engine step loop is
synchronous and can't await the KVCacheKEK actor, so the persistence
holds an already-unwrapped KEK and does synchronous crypto + I/O. The
async write/read now delegate to the same helpers — format is identical
(the 15 EncryptedKVStore tests still pass).

EncryptedPrefixCachePersistence: sync saveBlock/loadBlock; MB-1 guard on
load (metadata.modelHash + shape before decrypt); KVCacheSimple-only
(matches the engine's prefix cache); best-effort save (never throws).

SECURITY (TB-007): this adds encryption-at-rest (disk-theft defense) but
does NOT close the in-process cross-tenant sharing / TTFT side-channel —
the provider can't see tenant identity. Default-off flag + explicit
threat-model sign-off required (the BatchScheduler flag is the next
commit). Documented in the type header.

Tests (5): save/load round-trip; on-disk bytes are encrypted (DBKV magic
present, no plaintext); MB-1 rejects wrong model; wrong KEK -> nil (no
crash); and END-TO-END through the real upstream PrefixCache — maxBlocks=1
forces eviction (saveBlock) then fetch reloads from encrypted SSD
(loadBlock). EncryptedKVStore async suite still green.

Wires the encrypted prefix cache into live inference behind the
DARKBLOOM_PREFIX_CACHE env flag (default OFF — unset = exact current
behavior, prefixCache nil).

When set, makeBatchedEngine builds an EncryptedPrefixCachePersistence
and passes it as the engine's Scheduler.prefixCache, so the engine's
in-GPU block cache is backed by AES-GCM-encrypted SSD storage (evicted
blocks persist + survive restart; fetch misses reload from disk).

Guards before enabling:
  - architecture must expose numLayers/kvHeads/headDim (from the
    config.json already parsed in snapshotContainer) — else disabled.
  - KEK must be Secure-Enclave-wrapped + Keychain-persisted (so files
    survive restart). If unavailable (no SE / entitlement) we REFUSE to
    enable rather than fall back to an ephemeral key that would silently
    break restart-reuse.
  - per-model dir keyed by sha256(modelId)[:12] under the OS cache dir.
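
The gating logic above can be pictured as a pure decision function. This is a sketch under assumptions: the names `Architecture`, `PrefixCacheDecision`, and `decidePrefixCache` are invented here, and treating any non-empty value of DARKBLOOM_PREFIX_CACHE as "set" is a guess, since the accepted values are not stated in this commit message.

```swift
import Foundation

struct Architecture {
    let numLayers: Int
    let kvHeads: Int
    let headDim: Int
}

enum PrefixCacheDecision: Equatable {
    case disabled(reason: String)
    case enabled
}

/// Decides whether the encrypted prefix cache may be wired into the engine.
/// Every failed guard leaves the cache off, i.e. the pre-existing behavior.
func decidePrefixCache(environment: [String: String],
                       architecture: Architecture?,
                       persistentKEKAvailable: Bool) -> PrefixCacheDecision {
    // Assumption: any non-empty value counts as opting in.
    guard let flag = environment["DARKBLOOM_PREFIX_CACHE"], !flag.isEmpty else {
        return .disabled(reason: "flag unset")
    }
    guard architecture != nil else {
        return .disabled(reason: "architecture shape unavailable")
    }
    // No fallback to an ephemeral key: that would silently break restart reuse.
    guard persistentKEKAvailable else {
        return .disabled(reason: "no persistent SE-wrapped KEK")
    }
    return .enabled
}

let arch = Architecture(numLayers: 16, kvHeads: 8, headDim: 64)
let decision = decidePrefixCache(environment: ProcessInfo.processInfo.environment,
                                 architecture: arch,
                                 persistentKEKAvailable: true)
```

Keeping the decision separate from engine construction makes each refusal path testable without Secure Enclave hardware.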

SECURITY (TB-007): enabling re-opens the cross-tenant data-leak / TTFT
side-channel that was deliberately gated — the provider cannot see
tenant identity, so the cache is shared across consumers. This commit
adds encryption-at-rest (disk-theft defense) but does NOT close the
in-process channel. Ships ONLY behind the default-off flag with an
explicit operator threat-model sign-off. Loud warning logged on enable.

modelId is used as the model-binding key; weight-hash binding (to
invalidate on a weight change under the same id) is a documented
follow-up.

Full KV cache suite green: 103 tests. End-to-end live behavior on real
hardware still needs the 2-Mac smoke test (cannot run in CI).

Points libs/mlx-swift-lm at the additive PrefixCachePersistence change
(branch feat/encrypted-prefix-cache-persistence, 6f79f04) that the
encrypted prefix-cache wiring depends on. The submodule change is
default-nil (no behavior change unless darkbloom passes a persistence
backend), so this bump is safe with the flag off.

NOTE: the submodule change lives on a branch, not yet merged to the
submodule's main. The corresponding submodule PR must merge before this
parent change can land on master.

anupsv added 2 commits May 28, 2026 19:39
Opt-in integration test (gated by DARKBLOOM_MODEL_TEST_DIR, so CI /
plain `swift test` skip it) that drives the REAL engine with a REAL
model through our encrypted prefix-cache path. Builds the BatchedEngine
with an EncryptedPrefixCachePersistence (injected in-memory KEK), then:

  - generates a prompt with cache OFF (reference),
  - generates the same prompt twice with cache ON,
  - asserts the cache-ON greedy (temp 0) output is byte-IDENTICAL to the
    reference — i.e. prefix KV reuse does not corrupt generation,
  - asserts the cache engaged (hits > 0, tokens_saved > 0).

Validated on a real M5 Max with mlx-community/Llama-3.2-1B-Instruct-bf16:
output matched the uncached reference exactly and the cache saved 256
prefill tokens (one full block) on the repeat request.

…validation

Adds PersistentEnclaveKey.makeTransient() — a non-persisted Secure
Enclave key (kSecAttrIsPermanent=false, no keychain access group). Since
it never touches a keychain access group it needs NO
keychain-access-groups entitlement, so SE crypto (sign / ECIES) can be
exercised on real hardware from unsigned builds. (loadOrCreate, which
persists under the team access group, still requires a signed build —
that's the same SecItem path the production attestation key uses.)

Adds kv-se-harness — a TEST-ONLY standalone executable (not a product,
not shipped) that uses a transient SE key to validate the encrypted
KV-cache key path on real hardware without code signing:
  - ECIES wrap (public) + unwrap (SE-private) via the new
    PersistentEnclaveKey+ECIES / SecureEnclaveKeyWrappingService code,
  - KVCacheKEK generate -> SE-wrap -> store -> read -> SE-unwrap,
    recovering identical key material,
  - DEK wrap/unwrap under the recovered KEK,
  - tamper rejection (flipped wrapped-DEK byte fails auth).

Validated on a real M5 Max (unsigned): all five checks PASS, closing the
SE-crypto validation gap. The only piece still requiring a signed build
is keychain PERSISTENCE of the key (production-proven attestation path),
not any code this branch introduced.

anupsv commented May 29, 2026

Hardware validation — M5 Max (m5-max-128gb-1, macOS 26.4.1, Swift 6.3.1)

Tested on a real M5 Max (real Metal / MLX / Secure Enclave), not just CI.

Full KV-cache suite: 103/103 pass on real Metal — including the end-to-end encrypted evict → AES-GCM encrypt → SSD → reload through the real engine PrefixCache, bf16 byte-exact round-trips, and the on-disk-is-encrypted check.

Live model path (new LivePrefixCacheModelTests, env-gated, real mlx-community/Llama-3.2-1B-Instruct-bf16):

  • cache-ON greedy output byte-identical to the cache-OFF reference → prefix KV reuse does not corrupt generation;
  • cache engaged: hits=1, 256 prefill tokens saved (one full block) on the repeat request.

Secure-Enclave crypto (new kv-se-harness, transient SE key → no entitlement → unsigned build): ECIES wrap/unwrap on the real SE, KEK generate→SE-wrap→store→read→SE-unwrap (identical material), DEK round-trip, and tamper rejection — all PASS on real hardware.

Still requires a signed build (not exercised here): keychain persistence of the SE key under the team access group. macOS AMFI SIGKILLs an ad-hoc binary carrying a restricted keychain-access-groups entitlement, so this needs a real Developer-ID-signed bundle. Note: this is the same SecItem path the shipping attestation key already uses — no new logic in this PR depends on it beyond what's already in production.

Net: the entire data path + SE crypto introduced by this PR is hardware-validated; the one unexercised piece is production-proven keychain storage that only a signed build (release CI) can run.

Master was force-updated, diverging the branch base. Single conflict in
coordinator/cmd/coordinator/main.go — purely from the rewritten base,
NOT from any KV-cache change (this branch is provider-swift only).

The conflict was two forms of the same SetOnLateSecurityInfo callback:
the branch carried the older inherited version (LookupDevice HTTP inside
the per-provider lock); master refactored it to collect candidates under
the lock and do the HTTP lookups outside it. Resolved by taking master's
version of the whole file — same feature, better lock discipline, and
the branch has no independent change there.

Submodule pointer (libs/mlx-swift-lm @ 6f79f04, the PrefixCachePersistence
hook) preserved. Coordinator builds clean post-merge.

github-actions Bot commented May 29, 2026

This PR introduces an opt-in encrypted SSD KV-cache (prefix cache) backed by a SE-wrapped KEK, wires weightHash through to the engine for cache identity binding, and adds a #if DEBUG-only transient SE key factory; the changes are net-neutral-to-positive on security posture for the production path, but the new SSD cache feature introduces attack surface not fully covered by existing threats.


Trust boundaries touched

| TB | Name | Relevance |
|---|---|---|
| TB-003 | Provider operator vs. process | New KV cache files written to SSD; KEK lifecycle via SE |
| TB-007 | Provider inference engine | Prefix cache sharing changes cross-tenant isolation assumptions |
| TB-009 | Apple attestation chain | SE key reused for ECIES KEK-wrap, expanding its use beyond signing |

Per-threat assessment

T-007 — Provider serves manipulated model outputs (BatchScheduler.swift)
✅ Strengthens. The weightHash is now passed all the way into makeBatchedEngine and used as the cache identity key for the prefix-cache persistence layer (see makeEncryptedPrefixPersistenceIfEnabled). A re-download under the same modelId with different weights will produce a different weightHash and therefore miss the cache, preventing stale KV from being served as if it came from the new weights. This is a positive improvement to the weight-binding story, though coordinator-side enforcement remains fail-open (SEC-007, unchanged).

T-028 — Residual inference data in GPU memory (BatchScheduler.swift)
⚠️ Weakens slightly. Previously prefixCache: nil was hardcoded (marked // SECURITY: TB-007). The cache is now conditionally live when DARKBLOOM_PREFIX_CACHE=1. Prefix blocks contain decoded KV-cache tensors, which are themselves derived from decrypted prompt tokens. The comment in the diff explicitly acknowledges "cross-tenant sharing / TTFT side-channel" is NOT closed by this change. The RAM-resident prefix cache (PrefixCacheRAM) holds KV tensors across requests; a subsequent tenant's request that partially matches a prior prompt's prefix will read those tensors. This is a new in-memory cross-tenant data residue path that did not exist before this PR. The feature is opt-in and flag-gated, which limits exposure, but operators who enable it in production should understand this is explicitly out-of-threat-model.

T-008 — Provider sends plaintext SSE chunks on encryption failure (ProviderLoop.swift)
ℹ️ Neutral. The only change here is passing weightHash: modelInfo.weightHash to loadModel. No change to encryption error handling.

T-009 — Swift provider excluded from private-text routing (ProviderLoop.swift)
ℹ️ Neutral. No change to privacy flag reporting or the routing gate.

T-033 — Attestation blob replay (PersistentEnclaveKey.swift)
ℹ️ Neutral. makeTransient() is #if DEBUG-gated and confirmed compiled out of release builds. Transient keys cannot satisfy coordinator attestation (different public key fingerprint), so no replay surface is introduced in production.

T-035 — Provider denies actions after key rotation (PersistentEnclaveKey.swift)
ℹ️ Neutral. privateKey visibility changed from private to internal to allow PersistentEnclaveKey+ECIES to call SecKeyCreateDecryptedData. The private material never leaves the SE; this is a handle-level change. No new export path is opened.


New attack surface NOT covered by existing threats

NEW-001 — SSD KV cache files as an exfiltration / tampering target (not covered)

Files: provider-swift/Sources/ProviderCore/KVCache/EncryptedKVStore.swift, EncryptedPrefixCachePersistence.swift, KVCacheKEK.swift, KeyWrappingService.swift (not shown in diff)

The encrypted SSD cache persists KV tensors derived from decrypted consumer prompts to disk. The threat model notes the operator has full filesystem read access (TB-003) and the X25519 key was historically the highest-risk on-disk secret. Now there is a second category of on-disk data: SE-encrypted KV blobs keyed by a Keychain-persisted KEK. Relevant concerns:

  1. KEK persistence in Keychain: The KEK is stored under the same access group (SLDQ2GJ6TL.io.darkbloom.provider) as the attestation key. The existing TB-003 limitation already notes any same-team-ID binary with the keychain-access-groups entitlement can read this group. The KEK is now an additional item in that group, so a same-team binary could load the KEK and decrypt SSD cache files offline without the SE — this is a new, concrete exfiltration path for prompt-derived data that does not require defeating the SE.

  2. Cache file path/naming leaks model usage: Even if contents are encrypted, the existence and size of cache files on the SSD reveals which model prefixes were recently used, potentially leaking information about consumer activity to the operator.

  3. Cache poisoning by a malicious operator: If an operator can write to the SSD cache directory, they could swap encrypted blobs. The AEAD on each block (AES-256-GCM with metadata as AAD, per the PR summary; the scheme is not visible in the truncated diff) should detect tampering at read time, but the failure mode (silently skip vs. crash vs. log) is not visible here and should be confirmed to fail closed.

Recommended action: Before enabling DARKBLOOM_PREFIX_CACHE in any production deployment, confirm:

  • The KEK Keychain item has tighter access control (e.g. kSecAttrAccessibleWhenUnlockedThisDeviceOnly + kSecAccessControlUserPresence or at minimum a separate access group from the attestation key).
  • AEAD decryption failures on cache read terminate the request rather than serving stale/empty KV silently.
  • The threat model is updated to cover on-disk KV-derived data as a new asset category.

NEW-002 — privateKey visibility widened to internal without audit of all in-package callers

File: PersistentEnclaveKey.swift line ~70

Changing private → internal on the SecKey handle means any future file added to ProviderCore can call SecKeyCreateDecryptedData (or other Security framework functions) on it directly, bypassing the +ECIES extension's intended API surface. This is low-risk today but represents a module-level capability expansion. Suggest adding a // MARK: - intentionally internal for ECIES extension only comment and a CI lint rule (e.g. periphery or a custom grep check) to prevent additional callers.


Open findings resolved by this PR

None of the SEC-* open findings are resolved by this PR. SEC-007 (weight hash fail-open) is partially addressed at the cache-identity layer but the coordinator-side gate is unchanged.


🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR

…ning)

Addresses the threat-model-review finding that PersistentEnclaveKey.
makeTransient() — the entitlement-free transient-SE-key factory — was
public and ungated, so it shipped in release builds.

Independent verification confirmed the factual claim (public, no guard)
but found the exploit risk overstated: only the test-only kv-se-harness
calls it (production uses loadOrCreate), and a transient key can't pass
coordinator-side attestation anyway (SIP/SecureBoot/MDA gates). Still,
compiling a test-only entitlement-free SE-key path out of release builds
is correct defense-in-depth, so it's now #if DEBUG.

Verified: debug build (harness) still compiles with makeTransient;
release ProviderCore compiles without it (nothing else references it).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 581cc60106


try? FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)

let binding = PrefixCacheModelBinding(
modelHash: modelId, modelDtype: "unknown", modelArch: "unknown",

P2: Bind prefix cache entries to weight hash

When DARKBLOOM_PREFIX_CACHE is enabled, this stores both the cache directory key and metadata.modelHash from the mutable model id rather than the downloaded weight identity. The model catalog/download path already supports replacing snapshots/local for the same model.id with a different aggregateSHA256, so after a model registry update or re-download under the same id, old encrypted KV files in darkbloom/kv/<sha(modelId)> still pass the loadBlock model/shape checks and can seed generation with KV computed from previous weights, corrupting outputs. Use the manifest aggregate/weight hash (or clear this directory on model replacement) as the binding instead of modelId alone.

return await container.perform { ctx -> BatchedEngine in
let prefixCache: PrefixCache? = persistence.map {
PrefixCache(
config: PrefixCacheConfig(blockSize: 256, maxBlocks: 4096),

P2: Bound in-memory prefix cache by KV budget

With DARKBLOOM_PREFIX_CACHE enabled, this creates an in-memory engine prefix cache that can retain 4096 * 256 tokens worth of KV outside the scheduler's active kvBudget; on larger models or shared long system prompts, that can consume tens or hundreds of GB of unified memory before the cache evicts, causing provider OOMs even though admission control still thinks requests fit. Derive maxBlocks from the model's KV-byte estimate/available memory or make it a much smaller operator-tuned budget.

metaState: [layoutJSON],
chunkPlaintextSizes: chunks.map { $0.count }
)
try EncryptedKVStore.writeSync(to: fileURL(blockHash), metadata: meta, chunks: chunks, kekKey: kekKey)

P2: Add eviction for persisted prefix blocks

When the opt-in prefix cache is enabled, every block evicted from the engine is written to this per-model directory, but this persistence backend never deletes entries or enforces a byte/TTL limit. A provider serving diverse prompts will keep accumulating .darkbloom-kv files until the user's cache volume fills, which can break later cache writes and model downloads; add a disk budget/sweep or skip persistence when the budget is exhausted.

Comment thread docs/threat-model.yaml Outdated
Comment on lines +286 to +288
io.darkbloom.provider.attestation-signing.v1. Subsequent launches load the
existing key by access group and label query. The private key never leaves the
SE silicon; only signing operations are available. ProviderLoop tries the

P2: Correct SE key label and allowed operations

This threat-model update is stale relative to the code it documents: PersistentEnclaveKey.defaultLabel is ...attestation-signing.v2, and this same key is now also used by PersistentEnclaveKey+ECIES for KEK unwrap/decryption, not only signing. Because this document is the security sign-off record for the new KV cache, leaving it as v1/signing-only hides the actual keychain item to audit and the new decryption capability granted to same-team entitled binaries.

A second review pass (crypto-security-auditor) on the G1/G2 fixes found
two more instances of the same uncatchable-crash class still reachable in
the foreign/tampered-file threat model — both defeat G2's invariant that a
malformed file becomes a recoverable cold miss, never a process crash:

- State-setter array count: validateMetaState checks metaState but NOT the
  per-layer state-array count. reconstruct's "c.state = arrays" runs the
  KVCacheSimple/RotatingKVCache state setter, which fatalErrors unless
  given exactly 2 arrays. deserialize only checked the AGGREGATE chunk
  count, so a foreign file with 1 or 3 arrays in a layer crashed the
  provider. Now require 0 or 2 arrays per layer (throw otherwise).

- MLXArray init precondition: validateLayout binds dims [1]/[3] but not
  that shape.product*dtype.size == chunk byte length, nor that dims are
  non-negative / non-overflowing. MLXArray(data:shape:dtype:) hard-traps
  (shapePrecondition) on a mismatch, and shape.reduce(1,*) traps on a
  negative dim or Int overflow. deserialize now computes an overflow-safe
  expected byte count and rejects any descriptor whose chunk length
  disagrees, before constructing the MLXArray.

Both throw KVCacheSerializerError.reconstructionFailed -> cold miss.
Tests: deserializeRejects{WrongStateArrayCount,ShapeByteLengthMismatch,
NegativeDimShape} -- each would crash the test process pre-fix. 100 KV
tests green.
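
The shape/byte-length rule described in this commit message could be sketched roughly as below. This is an illustration only, not the branch's code: `ShapeError`, `expectedByteCount`, `validateChunk`, and the `elementSize` parameter are hypothetical names, and the only behavior assumed is what the message states (reject negative dims and overflowing products, then compare against the chunk length before any array is constructed).

```swift
import Foundation

// Hypothetical error type for this sketch. The commit itself reports
// KVCacheSerializerError.reconstructionFailed instead.
enum ShapeError: Error, Equatable {
    case negativeDimension
    case overflow
    case byteLengthMismatch(expected: Int, actual: Int)
}

/// Computes product(shape) * elementSize without ever trapping.
/// Assumes elementSize is a small positive number (e.g. 2 for bf16).
func expectedByteCount(shape: [Int], elementSize: Int) throws -> Int {
    var total = elementSize
    for dim in shape {
        // A negative dim would make a plain reduce(1, *) meaningless or trap later.
        guard dim >= 0 else { throw ShapeError.negativeDimension }
        let (product, overflowed) = total.multipliedReportingOverflow(by: dim)
        guard !overflowed else { throw ShapeError.overflow }
        total = product
    }
    return total
}

/// Rejects a descriptor whose declared shape disagrees with the chunk length,
/// so the caller can treat the file as a cold miss instead of crashing.
func validateChunk(shape: [Int], elementSize: Int, chunkByteCount: Int) throws {
    let expected = try expectedByteCount(shape: shape, elementSize: elementSize)
    guard expected == chunkByteCount else {
        throw ShapeError.byteLengthMismatch(expected: expected, actual: chunkByteCount)
    }
}
```

Because every failure is a thrown error rather than a precondition, a caller can wrap the whole decode in `do`/`catch` and fall back to re-prefill.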

anupsv commented May 29, 2026

Follow-up adversarial review — confirming the Codex bug class isn't lurking elsewhere

Ran a multi-lens sweep of the whole encrypted-KV-cache surface (6 lenses: binding-completeness, file-lifecycle/durability, crypto-correctness, serialization-fidelity, index-integrity/paths, concurrency/integration), each finding adversarially verified by an independent skeptic, plus a completeness critic. 11 raw findings; 8 refuted (unreachable / already-guarded / out-of-scope roadmap items), the rest confirmed and fixed below. Two further residuals surfaced when a crypto auditor reviewed the fixes themselves.

Fixed

1. fetchPrefix length desync — corrupts generation (submodule, found by 2 lenses independently)
On a cold (persistence) hit with a saturated block pool (allocateBlock()==nil), the loaded block was merged into the returned cache but cachedTokens was computed from matchedIds (GPU-resident only), undercounting it. remainingTokens then overlapped the seeded KV → the scheduler re-prefilled those positions → duplicated KV at wrong offsets → corrupted output. Fix: cachedTokens = matchedPerBlock.count * bs (the actual merged width). Submodule cc79368.

2. KV shape not bound to the model (G1) — the MB-1 guard compared only metadata integers; the tensors that seed attention come from layout.layers[].arrays[].shape, never cross-checked. A self-consistent file whose layout disagrees with the live model (weights changed under same id, or a foreign file) would seed wrong-shaped KV. Added validateLayout (binds rank + dims [1]=kvHeads, [3]=headDim) in both load paths.

3. metaState fed to fatalError-ing setters (G2)
reconstruct handed unvalidated strings to the engine's metaState setters, which fatalError (uncatchable) on malformed input. A single stale/foreign RotatingKVCache file would crash the provider. Pre-validate and throw → cold miss.

4. SSD path traversal (B)
loadFromSSD joined the unauthenticated index's entry.relativePath to cacheDir (../ could escape). Now the path is reconstructed deterministically from the trusted binding + digest; the stored relativePath is ignored.

5–6. Two residual uncatchable-crash gaps (found reviewing the fixes) — same class as G2, still reachable: the state setter also fatalErrors on array-count≠2 (only aggregate chunk count was checked), and MLXArray(data:shape:dtype:) hard-traps when shape.product*dtype.size != byteCount (or on a negative/overflowing dim). Both now throw → cold miss.
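
The path-reconstruction idea in item 4 can be illustrated with a small sketch. Everything here is an assumption made for illustration: `reconstructedCacheURL`, `bindingHex`, and `digestHex` are hypothetical names, and the directory layout shown is not necessarily the one the branch uses. The point is only that the path is built from trusted, format-checked values and never from a string stored in the index.

```swift
import Foundation

/// Hypothetical helper: builds the on-disk location from trusted inputs only.
/// Returns nil if either component is not plain lowercase hex, so "/" and ".."
/// can never reach the path.
func reconstructedCacheURL(cacheDir: URL, bindingHex: String, digestHex: String) -> URL? {
    func isLowerHex(_ s: String) -> Bool {
        // ASCII '0'...'9' is 48...57, 'a'...'f' is 97...102.
        !s.isEmpty && s.utf8.allSatisfy { (48...57).contains($0) || (97...102).contains($0) }
    }
    guard isLowerHex(bindingHex), isLowerHex(digestHex) else { return nil }
    return cacheDir
        .appendingPathComponent(bindingHex, isDirectory: true)
        .appendingPathComponent(digestHex + ".darkbloom-kv", isDirectory: false)
}
```

With this shape, a malicious `relativePath` such as `../../somewhere` has no effect, because it is never consulted when locating the file.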

Refuted (no action)

Torn/orphan-file "permanent defeat" (atomic write + self-heal on re-store), serializer fatalError on inconsistent layout (no producible/forgeable input reaches it — GCM AAD + chunk-size pins), layout.version skew (single writer, version gate), TTL/expiry "stale forever" (reserved field, not a control; not the seed class), unbounded disk growth (documented P6, unwired path).

Coverage

100 provider-swift KV tests + 11 submodule prefix-cache tests green. New regressions: cold-saturated-pool no-overlap, validateLayout accept/reject, malformed-metaState (3), wrong-array-count, shape/byte-length mismatch, negative-dim, malicious-relativePath ignored; the MB-1 and prefix-hash manager tests were rewritten to exercise the residual threats (model-dir-prefix collision; same-dir on-disk swap) now that the path is reconstructed.

Commits: cc79368 (submodule), da3fb6a3, 413c3082.

New docs/ssd-kv-cache.md documents the encrypted SSD KV cache as it
actually runs: the two tiers (wired engine block cache + the built-but-
unwired checkpoint manager), the store/evict/load data path, the DBKV file
format, the SE-wrapped-KEK / per-file-DEK envelope, KVCacheSerializer, the
DARKBLOOM_PREFIX_CACHE flag + prerequisites, the load-path verification
ladder (model/shape/prefix-hash/path/tensor-shape/decode-safety, all fail
closed to a cold miss), exact-checkpoint matching, failure modes, on-disk
layout, the TB-007 security model, and a code/test map. Cross-linked from
the design doc, which remains the rationale/threat-model/phased plan.
anupsv added 2 commits May 29, 2026 12:38
…P2s)

Addresses 3 Codex P2 findings on the encrypted SSD KV cache (all only
matter when DARKBLOOM_PREFIX_CACHE is enabled, default off):

- Weight-hash binding (#1): the cache dir key and metadata modelHash were
  derived from the mutable modelId. A re-download under the same id with
  different weights left old encrypted KV that still passed the
  model/shape guards and could seed generation with stale-weight KV,
  corrupting output. Now bound to ModelInfo.weightHash when available
  (falls back to modelId) for both the directory and the MB-1 binding, so
  a weight change yields a fresh dir + binding and old KV is invalidated.

- Memory budget (#2): the engine block cache used a fixed maxBlocks=4096
  (4096*256 tokens of KV held OUTSIDE the scheduler kvBudget) -> tens to
  hundreds of GB on large models -> OOM even though admission thinks
  requests fit. maxBlocks is now derived from a memory budget
  (DARKBLOOM_PREFIX_CACHE_MAX_GB, default 1/8 physical RAM) and the
  model's per-token KV bytes; the cache disables itself if even one block
  would not fit.

- Disk eviction (#3): EncryptedPrefixCachePersistence wrote a file per
  evicted block and never cleaned up, growing until the volume filled
  (breaking later cache writes and model downloads). Added an LRU
  byte-budget sweep (DARKBLOOM_PREFIX_CACHE_DISK_GB, default 10 GB; 0 =
  unlimited), amortized so the directory scan does not run on every block.

Tests: prefixCacheBindingId / prefixCacheMaxBlocks helpers (weight binding
+ memory scaling/clamp); disk-budget eviction keeps usage within budget
while evicting oldest. 101 KV + 10 BatchScheduler tests green.
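
As a rough sketch of the memory-budget rule above: the block count is the budget divided by the bytes one block of KV occupies, clamped to the old ceiling, with "not even one block fits" meaning the cache is disabled. The function name `sketchMaxBlocks` and its signature are hypothetical (the branch's `prefixCacheMaxBlocks` helper may look different), and the 32 KiB-per-token figure in the usage lines is an assumed example value, not a measurement.

```swift
import Foundation

/// Hypothetical sizing rule: how many prefix-cache blocks fit in the budget.
/// Returns nil when not even one block fits, i.e. "disable the cache".
func sketchMaxBlocks(budgetBytes: Int, kvBytesPerToken: Int,
                     blockSize: Int = 256, ceiling: Int = 4096) -> Int? {
    guard budgetBytes > 0, kvBytesPerToken > 0, blockSize > 0 else { return nil }
    // Bytes of KV held by one full block, computed without trapping on overflow.
    let (bytesPerBlock, overflowed) = kvBytesPerToken.multipliedReportingOverflow(by: blockSize)
    guard !overflowed else { return nil }
    let fit = budgetBytes / bytesPerBlock
    guard fit >= 1 else { return nil }
    return min(fit, ceiling)
}

// Default budget named in the commit message: 1/8 of physical RAM.
let defaultBudgetBytes = Int(ProcessInfo.processInfo.physicalMemory / 8)

// Usage with an assumed 32 KiB of KV per token:
let blocksFor16GiB = sketchMaxBlocks(budgetBytes: 16 << 30, kvBytesPerToken: 32_768)
let blocksFor64GiB = sketchMaxBlocks(budgetBytes: 64 << 30, kvBytesPerToken: 32_768)
```

Under those assumptions one block is 8 MiB, so a 16 GiB budget admits 2048 blocks and a 64 GiB budget is clamped to the 4096 ceiling.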
… doc

- threat-model.yaml (#4): the SE-key block was stale relative to the code
  it documents as the KV-cache security sign-off record. Corrected the
  label v1 -> v2 (current defaultLabel; legacy v1 is migrated on load) and
  documented that the same persistent SE key now performs ECIES
  unwrap/decryption for the KV-cache KEK, not only ECDSA signing -- so the
  decryption capability granted to entitled same-team binaries and the
  actual keychain item to audit are both visible.

- ssd-kv-cache.md: synced to the weight-hash binding, memory-bounded
  maxBlocks, and disk-eviction behavior, plus the two new budget env vars
  (DARKBLOOM_PREFIX_CACHE_MAX_GB / _DISK_GB).

anupsv commented May 29, 2026

Codex review (round 2) — verified + addressed

All four P2 findings verified against the code and fixed. Commits 2ebbcea0 (code+tests) and 3ed78753 (docs).

#1 — Bind prefix cache to weight hash (BatchScheduler:320): real, fixed.
Confirmed: the dir key and metadata.modelHash came from the mutable modelId, so a re-download under the same id with different weights left old KV that still passes the model/shape guards → stale-weight KV could seed generation. Now bound to ModelInfo.weightHash when the catalog provides it (falls back to modelId), for both the on-disk directory and the MB-1 binding — a weight change yields a fresh dir + binding, so old KV is neither found nor accepted. (ModelInfo.weightHash was already available at the load-model call site; threaded through loadModel → makeBatchedEngine → makeEncryptedPrefixPersistenceIfEnabled.)

#2 — Bound in-memory prefix cache by KV budget (BatchScheduler:235): real, fixed.
Confirmed: maxBlocks=4096 × blockSize=256 = 1,048,576 tokens, roughly 400 GB of KV worst-case, held outside the scheduler's kvBudget. maxBlocks is now derived from a memory budget (DARKBLOOM_PREFIX_CACHE_MAX_GB, default 1/8 physical RAM) ÷ the model's kvBytesPerToken (reusing the existing estimator), clamped to the old ceiling; the cache disables itself if even one block won't fit.

#3 — Add eviction for persisted prefix blocks (EncryptedPrefixCachePersistence:75): real, fixed.
Confirmed: the wired backend wrote a file per evicted block with no cleanup → unbounded growth. Added an LRU byte-budget sweep (DARKBLOOM_PREFIX_CACHE_DISK_GB, default 10 GB; 0 = unlimited), evicting oldest .darkbloom-kv by mtime, amortized so the dir scan doesn't run on every block. (Note: an earlier review pass had refuted "unbounded growth" — but that was about the unwired PrefixCacheManager tier; Codex correctly targets the wired EncryptedPrefixCachePersistence, which did need this.)

#4 — Correct SE key label and allowed operations (threat-model.yaml:288): real, fixed.
Confirmed stale: defaultLabel is ...attestation-signing.v2 (doc said v1), and the same persistent SE key is now also used by PersistentEnclaveKey+ECIES for KEK unwrap/decryption (doc said "only signing operations are available"). Corrected the label (v1→v2, legacy migrated on load) and documented the ECIES decryption capability + the keychain item to audit, in all three places the doc referenced the key.

Tests: prefixCacheBindingId / prefixCacheMaxBlocks (weight binding + memory scaling/clamp) and disk-budget eviction (stays within budget, evicts oldest). 101 KV + 10 BatchScheduler tests green. The how-it-works doc (docs/ssd-kv-cache.md) is synced to all three behaviors + the new env vars.
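
The LRU byte-budget sweep from #3 can be sketched as a pure selection function: sort by modification time and pick the oldest files until the remaining total fits. This is a hedged illustration, not the code in EncryptedPrefixCachePersistence; `CachedFile` and `filesToEvict` are hypothetical names, and treating a budget of 0 as "unlimited" mirrors the env-var convention described above.

```swift
import Foundation

// Hypothetical record of one cache file as seen by a directory scan.
struct CachedFile: Equatable {
    let name: String
    let byteCount: Int
    let modified: Date
}

/// Returns the files to delete, oldest first, so the remaining total fits the
/// budget. A budget of 0 (or less) is treated as unlimited: nothing is evicted.
func filesToEvict(_ files: [CachedFile], budgetBytes: Int) -> [CachedFile] {
    guard budgetBytes > 0 else { return [] }
    var total = files.reduce(0) { $0 + $1.byteCount }
    var victims: [CachedFile] = []
    for file in files.sorted(by: { $0.modified < $1.modified }) {
        if total <= budgetBytes { break }
        victims.append(file)
        total -= file.byteCount
    }
    return victims
}
```

Keeping the selection separate from the file-system calls makes it easy to unit-test, and lets the caller decide how often to run the directory scan that feeds it.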

The on-disk prefix-cache budget defaulted to a flat 10 GB regardless of
the actual disk. Make the default 50% of the cache volume's free capacity,
measured live at model load (volumeAvailableCapacityForImportantUsage,
falling back to raw available capacity). The DARKBLOOM_PREFIX_CACHE_DISK_GB
env override still wins (0 = unlimited); a near-full disk yields a tiny
positive budget (evict-almost-everything) rather than the env's
0-means-unlimited; if free space can't be read, falls back to 10 GB.

Split into a pure, testable policy (resolveDiskBudget) + a free-space
probe (volumeFreeBytes). Tests: 50%-of-free, env override, env-0-unlimited,
near-full -> tiny positive, unknown -> 10 GB fallback, and a live
free-capacity read. 113 KV + BatchScheduler tests green.
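
The budget policy in this commit message could look roughly like the pure function below. It is a sketch under stated assumptions: `DiskBudget` and `sketchResolveDiskBudget` are hypothetical names (the branch's `resolveDiskBudget` may have a different signature), the env override is assumed to be already parsed into bytes, and `max(1, free / 2)` is just one way to get the "tiny positive" budget on a near-full disk. The free-space probe itself is left out.

```swift
import Foundation

// Hypothetical result type: either no limit, or a byte budget.
enum DiskBudget: Equatable {
    case unlimited
    case bytes(Int)
}

// 10 GB fallback named in the commit message (taken here as 10 * 2^30 bytes).
let fallbackDiskBudgetBytes = 10 * (1 << 30)

/// envOverrideBytes: parsed DARKBLOOM_PREFIX_CACHE_DISK_GB, nil if unset or invalid.
/// freeBytes: free capacity of the cache volume, nil if it could not be read.
func sketchResolveDiskBudget(envOverrideBytes: Int?, freeBytes: Int?) -> DiskBudget {
    // The env override wins; 0 means unlimited.
    if let env = envOverrideBytes, env >= 0 {
        return env == 0 ? .unlimited : .bytes(env)
    }
    // Unknown free space: fall back to the fixed default.
    guard let free = freeBytes, free >= 0 else {
        return .bytes(fallbackDiskBudgetBytes)
    }
    // Half of free space, but never 0, since 0 would read as "unlimited".
    return .bytes(max(1, free / 2))
}
```

Separating the policy from the probe is what makes each branch (override, unlimited, near-full, unknown) testable without touching a real volume.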
…versized writes

Address the adversarial review of the budget/binding changes.

- Env crash (medium, x2): prefixCacheBudgetBytes (MAX_GB) and
  resolveDiskBudget (DISK_GB) did Int(gb * 2^30) guarded only by a sign
  check. Double("inf")/"1e400"/huge values pass and Int(Double) TRAPS on
  non-finite/overflow (uncatchable) -> provider crash on model load. MAX_GB
  is read unconditionally to size maxBlocks, so it crashed even with the
  cache OFF. Now reject non-finite / out-of-range values back to the
  default. Extracted a pure resolveMemoryBudget mirror for testability.

- Orphaned directories (low): keying the cache dir by weightHash (the prior
  weight-binding fix) created a fresh, never-swept directory on every
  re-download. Key the dir by the MODEL id instead (stable) and keep the
  MB-1 binding on the weight hash: stale-weight files are rejected AND
  deleted by loadBlock on access and aged out by the sweep — invalidation
  without leaking dirs.

- Write-then-delete treadmill (low): when a single block exceeds the disk
  budget (tiny env value / near-full disk) saveBlock wrote then immediately
  evicted it. Skip the write up front instead.

Tests: resolveMemoryBudget + resolveDiskBudget now cover inf/NaN/overflow
(would have trapped pre-fix); write-skip-when-oversized; delete-on-MB1-
mismatch. Docs: corrected the sample log line, the dir-key scheme, and an
explicit per-model (not global), measured-once disk-bounding note. 116 KV +
BatchScheduler tests green.
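
The env-crash class described above comes from `Int(Double)` trapping on non-finite or out-of-range values. A minimal sketch of a conversion that cannot trap follows; `safeBytes(fromGB:)` is a hypothetical helper name, not the function in the branch, and returning nil stands in for "fall back to the default".

```swift
import Foundation

/// Hypothetical helper: parses a GB value from an env string and converts it
/// to bytes. Returns nil (caller uses the default) for anything that is not a
/// finite, non-negative number small enough to fit in Int.
func safeBytes(fromGB text: String?) -> Int? {
    guard let text = text, let gb = Double(text) else { return nil }
    // Rejects inf and NaN, which a plain sign check lets through.
    guard gb.isFinite, gb >= 0 else { return nil }
    let bytes = gb * 1_073_741_824.0  // 2^30 bytes per GB
    // Double(Int.max) rounds up to 2^63, so strictly-less-than keeps Int(bytes) in range.
    guard bytes < Double(Int.max) else { return nil }
    return Int(bytes)
}
```

A caller might use it as `safeBytes(fromGB: ProcessInfo.processInfo.environment["DARKBLOOM_PREFIX_CACHE_MAX_GB"])` and substitute the default budget when the result is nil.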