feat(kv-cache): encrypted SSD KV cache — primitives, verification, and flag-gated engine integration#237
anupsv wants to merge 27 commits into
…arrival path
Provider (Swift):
- Add NetworkPowerAssertion: IOKit assertions for NetworkClientActive and
BackgroundTask, acquired for the entire provider session
- Keeps the macOS network stack alive during sleep so APN pushes
(courier.push.apple.com:5223) can be delivered for MDM commands
- No root required: uses the IOPMAssertionCreateWithName API
Coordinator:
- Increase SecurityInfo timeout from 30s to 90s to cover Power Nap cycles
(every ~15 minutes on AC power)
- Add OnLateSecurityInfoCallback: when a SecurityInfo webhook arrives
after the 90s timeout, retroactively upgrade the self_signed provider to
hardware trust instead of silently dropping the response
- Wire up the callback in main.go alongside the existing SetOnMDA
callback, using the same ForEachProvider + serial match pattern
…in-backed
Reflects commit 4a0dae5 (PersistentEnclaveKey.swift). Key changes:
- TB-003 how_it_works: document Security framework persistent key, access
group SLDQ2GJ6TL.io.darkbloom.provider, kSecAttrIsPermanent, and the
errSecMissingEntitlement fallback behaviour on patched binaries
- TB-003 current_limitations: add two new limitations: team-scoped
cross-binary keychain access, and silent ephemeral fallback that defers
rejection to the coordinator rather than failing at the process boundary
- TB-009 how_it_works: rewrite SE key lifecycle section to reflect
persistent identity across restarts; rotation now requires explicit
keychain deletion
- T-013 (binary tampering) mitigations: add keychain access group
enforcement as a fourth, implemented mitigation; update detection_hint
- T-033 (attestation replay) affected_files: add
PersistentEnclaveKey.swift and AttestationSigner.swift; update mitigation
wording
- T-035 (repudiation after rotation) description, mitigations,
detection_hint: reframe rotation as an explicit operator action rather
than automatic per launch; note kSecAttrIsPermanent as a positive
mitigation; add open finding that coordinator cannot detect opportunistic
keychain delete + re-registration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
First phase of SSD KV cache per docs/ssd-kv-cache-design.md. P0
delivers the cryptographic plumbing only; no integration with
BatchScheduler yet (that's P3).
Adds:
- `KVCache/KeyWrappingService` — protocol abstracting wrap/unwrap
of the KEK at rest. Two impls: `InMemoryKeyWrappingService`
(tests) and `SecureEnclaveKeyWrappingService` (prod; ECIES
via the existing persistent SE identity using
`.eciesEncryptionStandardX963SHA256AESGCM`).
- `KVCache/WrappedKEKStorage` — protocol abstracting the at-rest
persistence layer. `KeychainWrappedKEKStorage` for prod (data-
protection Keychain with `kSecAttrAccessibleAfterFirstUnlock-
ThisDeviceOnly`), `InMemoryWrappedKEKStorage` for tests. Two
layers (wrap + storage) so the entitlement-gated Keychain path
is independently swappable from the SE-bound wrap path; both
are individually exercised in tests.
- `KVCache/KVCacheKEK` — actor that generates/persists the KEK
on first run, holds the unwrapped key in actor-isolated state,
and exposes per-file DEK wrap/unwrap with AAD binding.
- `KVCache/EncryptedKVStore` — on-disk format codec for the
`.darkbloom-kv` file. Layout: magic ‖ version ‖ flags ‖
file_IV ‖ wrapped_DEK ‖ metadata-JSON ‖ encrypted chunks. Per-
chunk nonces derived via HKDF-SHA256 from (DEK, file_IV,
chunk_index) — no nonce reuse, no nonce storage. Metadata
bytes are bound as AAD on every chunk seal AND on the DEK
wrap, so tampering with any field surfaces as an authentication
failure on the first chunk decrypt (and on DEK unwrap too,
via belt-and-suspenders).
- `Security/PersistentEnclaveKey+ECIES.swift` — adds
`eciesEncrypt/eciesDecrypt` to the existing SE identity. The
only modification to existing code is bumping `privateKey` from
`private` to `internal` so the extension can reach it; the SE
still owns the key material.
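A minimal sketch of the per-chunk nonce derivation and AAD binding described above, assuming CryptoKit and the expand-only HKDF form that a later commit in this thread documents. The function names, the `example-kv-chunk-nonce` label, and the exact `info` byte layout are illustrative assumptions, not the shipped `EncryptedKVStore` format.

```swift
import CryptoKit
import Foundation

/// Derives a 96-bit AES-GCM nonce for one chunk (hypothetical helper).
/// Assumed info layout: label ‖ file_IV ‖ big-endian chunk index.
func sketchChunkNonce(dek: SymmetricKey, fileIV: Data, chunkIndex: UInt32) throws -> AES.GCM.Nonce {
    var info = Data("example-kv-chunk-nonce".utf8)   // assumed domain label
    info.append(fileIV)
    withUnsafeBytes(of: chunkIndex.bigEndian) { info.append(contentsOf: $0) }
    // Expand-only: the DEK is already a uniform 256-bit key, so Extract is skipped.
    let okm = HKDF<SHA256>.expand(pseudoRandomKey: dek, info: info, outputByteCount: 12)
    return try AES.GCM.Nonce(data: okm.withUnsafeBytes { Data($0) })
}

/// Seals one chunk; the metadata bytes are authenticated as AAD, not encrypted.
func sketchSealChunk(_ plaintext: Data, dek: SymmetricKey, fileIV: Data,
                     chunkIndex: UInt32, metadata: Data) throws -> Data {
    let nonce = try sketchChunkNonce(dek: dek, fileIV: fileIV, chunkIndex: chunkIndex)
    let box = try AES.GCM.seal(plaintext, using: dek, nonce: nonce, authenticating: metadata)
    return box.ciphertext + box.tag   // the nonce is re-derived on read, never stored
}

/// Opens one chunk; fails authentication if the chunk OR the metadata changed.
func sketchOpenChunk(_ sealed: Data, dek: SymmetricKey, fileIV: Data,
                     chunkIndex: UInt32, metadata: Data) throws -> Data {
    let nonce = try sketchChunkNonce(dek: dek, fileIV: fileIV, chunkIndex: chunkIndex)
    let box = try AES.GCM.SealedBox(nonce: nonce,
                                    ciphertext: sealed.dropLast(16),
                                    tag: sealed.suffix(16))
    return try AES.GCM.open(box, using: dek, authenticating: metadata)
}
```

Because the nonce is a deterministic function of (DEK, file_IV, chunk_index), the reader needs no stored nonce, and a fresh file_IV per file keeps nonces distinct across files that share a DEK-derivation path.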
Threat model + format details: docs/ssd-kv-cache-design.md.
Tests:
- KeyWrappingServiceTests — 7 cases: wrap roundtrip, unique
ciphertexts per call, tamper detection (ct + tag), wrong key,
malformed input, empty plaintext.
- KVCacheKEKTests — 10 cases including SE-roundtrip
+ Keychain-storage roundtrip (both skip on missing
entitlement, same pattern as existing PersistentEnclaveKey
tests).
- EncryptedKVStoreTests — 13 cases covering single/multi-chunk
roundtrip, metadata-only read, tamper at magic / version /
metadata / chunk-ct levels, truncation, wrong-KEK,
chunk-count/size mismatch, and the HKDF nonce derivation
determinism + diversity properties.
All 32 tests pass (2 skip on missing keychain-access-groups
entitlement in unsigned debug builds; same pattern as
PersistentEnclaveKeyTests).
Corrects a misleading claim in the SSD KV cache design: §5.1
previously stated that `model_hash` in the metadata AAD "binds the
cache to a specific model file — reloading after a model upgrade fails
closed." That overstated what the crypto does.
The AES-GCM layer uses the file's OWN metadata (read back from disk) as
AAD on both the DEK unwrap and every chunk seal. This proves the bytes
weren't altered since write (tamper-evidence) but does NOT verify the
file belongs to the currently-loaded model: a structurally valid cache
file authored for model A decrypts cleanly while model B is loaded,
because both supply the file's embedded metadata as AAD. The cipher
cannot distinguish "right model" from "wrong model."
Adds:
- §5.1 rewritten to spell out what the AAD does (tamper-evidence)
vs does NOT (model-binding).
- §8.1.1 new section defining invariant MB-1: PrefixCacheManager
MUST check meta.modelHash == currentLoadedModelHash (and an
architectural-shape guard for 12-char hash-prefix collisions)
BEFORE unwrapping the DEK or decrypting. Includes the guard
pseudocode and three P3 regression tests that must fail if the
guard is removed.
- §10 two new failure-mode rows (cross-model file, shape mismatch),
each noting they're caught by MB-1, not the crypto.
- §14 P3 phase now explicitly includes MB-1 + its rejection tests.
- docs/ssd-kv-cache-model-binding.{mmd,svg} — read-path flowchart
showing where MB-1 sits (between metadata-read and decrypt) and a
callout box on what the AES-GCM layer does vs doesn't guarantee.
Animated SVG (draws on in a browser; renders static when embedded).
No code change — P0 crypto is unaffected; MB-1 is an application-layer
guard that lands with the P3 BatchScheduler integration.
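The MB-1 ordering (compare identity and shape first, touch key material only afterwards) can be sketched as follows. This is not the guard pseudocode from the design doc; the type and function names are hypothetical and the metadata fields are reduced to the ones the commit message names.

```swift
/// Hypothetical subset of the metadata embedded in a cache file.
struct SketchCacheMetadata {
    let modelHash: String
    let numLayers: Int
    let kvHeads: Int
    let headDim: Int
}

/// Hypothetical description of the currently loaded model.
struct SketchLoadedModel {
    let modelHash: String
    let numLayers: Int
    let kvHeads: Int
    let headDim: Int
}

enum SketchBindingError: Error, Equatable {
    case modelHashMismatch
    case shapeMismatch
}

/// Must run BEFORE any DEK unwrap or chunk decrypt. AES-GCM would accept a
/// valid file from another model, because the file supplies its own AAD.
func sketchCheckModelBinding(meta: SketchCacheMetadata, loaded: SketchLoadedModel) throws {
    guard meta.modelHash == loaded.modelHash else {
        throw SketchBindingError.modelHashMismatch
    }
    // Shape guard: catches a truncated-hash collision between different architectures.
    guard meta.numLayers == loaded.numLayers,
          meta.kvHeads == loaded.kvHeads,
          meta.headDim == loaded.headDim else {
        throw SketchBindingError.shapeMismatch
    }
}
```

A caller would read the metadata, run this check, and only on success proceed to unwrap the DEK; a thrown error is treated as a cache miss.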
…dow models
Empirically resolves the open correctness question gating prefix cache
for sliding-window models (GPT-OSS-20B, Gemma-4 26B-A4B MoE). Two
review agents disagreed on whether RotatingKVCache survives
snapshot -> restore -> resume; an adversarial lens claimed
temporalOrder() scrambles token order on restore + multi-token prefill.
These tests settle it from runtime behaviour. Each token t is encoded
as K=t, V=t+100 so the returned (keys,values) reveal exact token order;
any scrambling shows as a content mismatch vs a never-reset reference.
RotatingKVCacheRestoreTests (single-stream, the circular-buffer case):
- multi-token prefill after restore matches reference (the adversary's
exact scenario — wrap the buffer, snapshot, restore, prefill)
- single-token decode after restore matches
- pre-wrap restore matches
- omitting metaState on restore DOES corrupt order — proving idx/offset
serialization is load-bearing (the failure mode is real but only
under misuse: state without metaState)
BatchRotatingExtractRoundtripTests (the batched path our design uses):
- extract(row) isolates the correct row (no cross-row leakage)
- extract(row) -> snapshot -> restore -> resume matches extract ->
resume, for both multi-token and single-token continuations
Verdict: rotating-cache restore is CORRECT when state AND metaState are
restored together (the loadPromptCache contract). BatchRotatingKVCache
keeps a linear front-trimmed buffer (no circular wrap), and its
extract(idx) returns a single-stream RotatingKVCache with a fully
populated metaState (keep,maxCacheSize,step,offset,idx) — so our
design path (extract row -> single-stream -> serialize) sidesteps the
batched cache's own incomplete metaState. The metaState-sync
requirement is now a guarded invariant.
All 7 tests pass. No production code changed.
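Why dropping metaState corrupts order can be seen in a toy ring buffer. The sketch below is a deliberately simplified stand-in, not upstream's RotatingKVCache (no `keep`, no `step`, integers instead of tensors); `ToyRingCache` and its members are hypothetical names.

```swift
/// Toy circular token buffer. `idx` is the next slot to overwrite and
/// `offset` counts every token ever appended.
struct ToyRingCache {
    let capacity: Int
    var storage: [Int] = []
    var idx = 0
    var offset = 0

    init(capacity: Int) { self.capacity = capacity }

    mutating func append(_ token: Int) {
        if storage.count < capacity {
            storage.append(token)
        } else {
            storage[idx] = token          // overwrite the oldest slot
        }
        idx = (idx + 1) % capacity
        offset += 1
    }

    /// Oldest-to-newest view; only correct if idx/offset match the storage.
    func temporalOrder() -> [Int] {
        guard offset > capacity else { return storage }
        return Array(storage[idx...] + storage[..<idx])
    }

    /// Restore from a snapshot. Passing `meta: nil` models the misuse case:
    /// the raw state is restored but idx/offset fall back to defaults.
    static func restore(capacity: Int, state: [Int], meta: (idx: Int, offset: Int)?) -> ToyRingCache {
        var cache = ToyRingCache(capacity: capacity)
        cache.storage = state
        cache.idx = meta?.idx ?? 0
        cache.offset = meta?.offset ?? state.count
        return cache
    }
}
```

With capacity 4 and tokens 0 through 5 appended, the raw storage is `[4, 5, 2, 3]`; only `idx` says that slot 2 holds the oldest token, which is the information a state-only restore loses.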
Rewrites §4.4 and adds §4.5 to reflect what was verified against the
actual mlx-swift-lm source + the empirical rotating-restore tests.
Corrections to the earlier draft:
- "slice the first N columns" was wrong: caches are 4-D
[B,kvHeads,seq,headDim], sliced on axis 2, and the snapshot is taken
by extractBatched(row) from the live batched caches — not naive
slicing.
- Arbitrary longest-prefix match (old O2) is DROPPED for the models we
serve. Recurrent (Mamba/GatedDeltaNet) and sliding-window layers
cannot be sliced to a shorter prefix, so reuse is EXACT-CHECKPOINT
only: hit when the incoming prompt's prefix is byte-identical to a
cached checkpoint boundary (e.g. end of a system prompt). Covers the
dominant shared-system-prompt case.
New §4.5 — verified per-model cacheability (cache type detected at LOAD
time, never hardcoded; MoE is irrelevant — it only changes the FFN):
- Qwen3.5 MoE / Qwen3-Next: hybrid MambaCache + KVCacheSimple →
exact-checkpoint (recurrent state restorable at boundary, not
sliceable).
- Gemma-4 26B-A4B MoE: sliding RotatingKVCache(512) + full; only 15
non-shared caches to snapshot (20 layers KV-share, auto-reconstructed).
- GPT-OSS-20B MoE: sliding RotatingKVCache(128) + full; attention
sinks are learned weights, not KV state → no snapshot impact.
- Unsupported cache types (Chunked/Quantized/CacheList/DeepseekV4
pooling) gated out at load time → cold path, no error.
Invariant MS-1 (metaState-sync): extracted caches must persist
metaState in sync with state (RotatingKVCache's idx/offset drive
temporalOrder). EncryptedKVStoreMetadata already has the metaState
field; regression-guarded by omittingMetaStateOnRestoreCorruptsOrder.
Also: status line updated (P0 landed + cacheability verified), O2 marked
dropped, per-request flow diagram shows exact-checkpoint + MB-1 guard,
§15 records what verification resolved + adds [Q8].
The decrypted RAM tier of the SSD KV cache (design §4.1). Holds
recently-used prefix KV snapshots as live [any KVCache] (one extracted
single-stream cache per layer) so a repeat request whose prompt prefix
is byte-identical to a cached checkpoint hits RAM instead of decrypting
from SSD or running a cold prefill.
Scope: RAM only — no SSD, no encryption, no BatchScheduler wiring (those
are P2-P4). Plain final class (non-Sendable: holds MLXArrays), to be
owned by the PrefixCacheManager actor in P3; tests run single-threaded.
Design points:
- Keyed by (modelHash, prefixDigest). modelHash is the locally-computed
weight hash, so a lookup for model B structurally cannot return model
A's entry — the RAM-tier half of MB-1.
- get() returns copy() of each stored cache (upstream's blessed clone,
state.map { $0[.ellipsis] }), so a consumer can seed a batch row and
decode into the returned caches without corrupting the stored
snapshot. This is the load-bearing invariant.
- LRU eviction by a monotonic use-counter (not wall-clock — keeps it
deterministic, avoids Date.now()), bounded by BOTH an entry count and
a byte budget (measured via MLXArray.nbytes over each cache's state).
- clear(modelHash:) for model unload; entriesForFlush(modelHash:)
returns copies for the P4 SSD-flush path without removing.
Tests (9, all pass):
- hit/miss + tokenCount round-trip
- MB-1: lookup keyed by modelHash (no cross-model bleed)
- distinct digests are distinct entries
- getReturnsIndependentCopy: mutate a returned copy (append 5 tokens),
assert the stored snapshot stays at its original offset — proves
copy()-on-get protects the snapshot
- LRU eviction by entry count (recently-used survives, LRU evicted)
- eviction by byte budget
- clear(modelHash:) drops only that model
- put replaces an existing key without leaking byte accounting
- entriesForFlush returns per-model copies without removing
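The eviction policy above (monotonic use-counter, bounded by both entry count and bytes) can be sketched without MLX. This toy tracks only sizes, so it does not model copy()-on-get; `ToyPrefixLRU` and its members are hypothetical, and the refusal of oversized entries mirrors a later fix in this thread.

```swift
/// Toy LRU bounded by entry count AND byte budget, ordered by a use-counter.
struct ToyPrefixLRU {
    struct Key: Hashable {
        let modelHash: String
        let prefixDigest: String
    }
    private struct Entry {
        var bytes: Int
        var lastUse: UInt64
    }

    private var entries: [Key: Entry] = [:]
    private var clock: UInt64 = 0          // monotonic counter, no wall clock
    private(set) var totalBytes = 0
    let maxEntries: Int
    let maxBytes: Int

    init(maxEntries: Int, maxBytes: Int) {
        self.maxEntries = maxEntries
        self.maxBytes = maxBytes
    }

    func contains(_ key: Key) -> Bool { entries[key] != nil }

    /// Returns false when the entry alone exceeds the byte budget.
    @discardableResult
    mutating func put(_ key: Key, bytes: Int) -> Bool {
        guard bytes <= maxBytes else { return false }
        if let old = entries[key] { totalBytes -= old.bytes }   // replace without leaking
        clock += 1
        entries[key] = Entry(bytes: bytes, lastUse: clock)
        totalBytes += bytes
        evictIfNeeded()
        return true
    }

    /// Marks the entry as most recently used and returns its size.
    mutating func get(_ key: Key) -> Int? {
        guard var entry = entries[key] else { return nil }
        clock += 1
        entry.lastUse = clock
        entries[key] = entry
        return entry.bytes
    }

    private mutating func evictIfNeeded() {
        while entries.count > maxEntries || totalBytes > maxBytes {
            guard let victim = entries.min(by: { $0.value.lastUse < $1.value.lastUse }) else { return }
            totalBytes -= victim.value.bytes
            entries.removeValue(forKey: victim.key)
        }
    }
}
```

Keying on `(modelHash, prefixDigest)` means a lookup for one model cannot return another model's entry, which is the structural half of MB-1 for the RAM tier.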
The exact-checkpoint lookup layer (design §4.4, §7; [Q3] resolved to
JSON, not SQLite). No SSD I/O of cache payloads and no BatchScheduler
wiring yet — that's P3 (held for review).
PrefixDigest — checkpoint keys:
- Checkpoints at the O9 boundaries (256/512/1024/2048/4096/8192).
- For a prompt's token array, computes SHA-256 of the first c tokens
at each checkpoint c <= count, in a SINGLE pass by snapshotting the
rolling hash at each boundary. Proven (checkpointDigestEquals-
IndependentPrefixHash) to equal an independent hash of the first c
tokens — so a longer cached prefix is findable from a shorter shared
one, and two prompts sharing a system prompt agree on every
checkpoint digest within the shared region.
- Tokens hashed as little-endian Int64 after a domain-separation tag;
stable across machines, can't collide with other SHA uses.
PrefixCacheIndex — JSON-persisted, in-RAM:
- Maps (modelHash, digestHex) -> {relativePath, tokenCount, fileBytes,
createdAt, lastHitAt, hitCount}.
- findLongestCheckpoint(modelHash:tokens:): computes the prompt's
checkpoint digests and returns the entry for the LONGEST checkpoint
present for that model — the exact-checkpoint match. Partitioned by
modelHash (MB-1: model B can't match model A's entries).
- record / touch / remove / removeModel / entriesLRUFirst (eviction
order for P6) / rebuild(from:) (recover when JSON missing/corrupt).
- Atomic JSON write-back on save(); dirty-tracked. A corrupt index
file is treated as empty (logged), not fatal — SSD files are self-
describing and the index rebuilds from them.
- Timestamps passed in by the caller (now: Int64), keeping the type
deterministic and clock-free for tests.
Tests (16, all pass): digest determinism + prefix-sensitivity,
single-pass == independent-hash equivalence, shared-prefix agreement,
boundary/token-count handling; index exact + longest-checkpoint match,
divergent-prefix miss, MB-1 model scoping, touch metadata, remove/
removeModel, LRU ordering, JSON persistence round-trip, corrupt-file
recovery, rebuild.
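The single-pass checkpoint digest can be sketched with CryptoKit's incremental SHA-256: feed tokens one at a time and finalize a copy of the hasher at each boundary. The domain tag string and function names below are assumptions for illustration; the real PrefixDigest tag and output encoding are not shown in this thread.

```swift
import CryptoKit
import Foundation

let sketchBoundaries = [256, 512, 1024, 2048, 4096, 8192]
let sketchDomainTag = Data("example-prefix-digest-v1".utf8)   // assumed tag

func sketchHex<D: Sequence>(_ digest: D) -> String where D.Element == UInt8 {
    digest.map { String(format: "%02x", $0) }.joined()
}

/// One pass over the tokens; snapshots the rolling hash at each boundary.
func sketchCheckpointDigests(tokens: [Int], boundaries: [Int] = sketchBoundaries) -> [Int: String] {
    let wanted = Set(boundaries.filter { $0 > 0 && $0 <= tokens.count })
    var hasher = SHA256()
    hasher.update(data: sketchDomainTag)
    var digests: [Int: String] = [:]
    for (i, token) in tokens.enumerated() {
        // Tokens are hashed as little-endian Int64.
        withUnsafeBytes(of: Int64(token).littleEndian) { hasher.update(bufferPointer: $0) }
        if wanted.contains(i + 1) {
            let snapshot = hasher            // value copy; the rolling hasher continues
            digests[i + 1] = sketchHex(snapshot.finalize())
        }
    }
    return digests
}

/// Independent reference: hash exactly the given prefix from scratch.
func sketchPrefixDigest(_ prefix: [Int]) -> String {
    var hasher = SHA256()
    hasher.update(data: sketchDomainTag)
    for token in prefix {
        withUnsafeBytes(of: Int64(token).littleEndian) { hasher.update(bufferPointer: $0) }
    }
    return sketchHex(hasher.finalize())
}
```

Two prompts that share their first 300 tokens agree on the 256 checkpoint and diverge at 512, which is the property the longest-checkpoint lookup relies on.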
A 5-dimension adversarial review (findings verified by refutation)
found 0 critical, 0 high, 6 medium, 9 low. Fixes for the cheap/
clearly-correct ones; the two genuinely-P3 items (PCR-1 sending-returns
across the actor boundary, XC-3 MB-1 guard + tests) are deferred to P3
where the PrefixCacheManager actor lands. KV-3 (redundant magic field)
is won't-fix.
Crypto / format:
- KV-1, XC-2: fix 3-way doc drift on per-chunk nonce derivation. The
code uses HKDF-Expand-only (Extract skipped — DEK is already a
uniform 256-bit key, RFC 5869 §3.3) with file_IV folded into `info`,
NOT file_IV as an HKDF salt. Header + byte-layout comments now match
the implementation and the function doc. (No crypto change — write
and read already shared deriveChunkNonce, so round-trip was correct.)
- KV-2: fsync the containing directory (F_FULLFSYNC, fsync fallback)
after the atomic rename in EncryptedKVStore.write, so a just-renamed
cache file is durable across power loss. Best-effort: a miss only
costs a cold prefill.
- KV-4: SecureEnclaveKeyWrappingService.unwrap classifies auth
failures by structured OSStatus (errSecDecode/errSecAuthFailed/
errSecParam) instead of substring-matching a locale-dependent
error string.
RAM tier:
- PCR-2: PrefixCacheRAM.put refuses (and counts) an entry whose own
size exceeds maxBytes instead of storing-then-self-evicting into a
silent no-op. put now returns Bool (@discardableResult).
- PCR-3: byte accounting uses innerState() (physical, step-allocated
buffers) instead of state() (trimmed logical view), so maxBytes
bounds true resident RAM.
Index / digest:
- PCI-1: PrefixCacheIndex.save writes via Data.write(.atomic) directly
(Foundation does aux-file + rename) instead of a manual tmp-<uuid> +
replaceItemAt, which could leak a UUID-named orphan on a crash
between write and replace with no sweep.
- PCI-3, XC-4: entriesLRUFirst adds a deterministic secondary key
(digestHex) so equal-lastHitAt entries order stably.
- PD-1: PrefixDigest.checkpoints dedups boundaries so a duplicated
caller-supplied boundary isn't double-emitted.
Tests:
- XC-1: new batchRotatingExtractMatchesIndependentSingleStreamReference
— builds the reference WITHOUT extract() (an independent single-
stream RotatingKVCache fed row 0's tokens) and compares resume,
proving extract() is semantically equivalent, not merely idempotent.
- new: ramRejectsEntryLargerThanByteBudget (PCR-2),
indexLRUTieBreakIsDeterministic (PCI-3).
- ramPutReplacesExistingKeyWithoutLeakingBytes rewritten to assert the
real no-leak invariant under physical byte accounting (PCR-3).
All KV cache tests pass (46 in the affected set; full P0-P2 suite green).
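Two of the index/digest fixes above (PCI-3 and PD-1) are small enough to sketch directly. The types and function names are hypothetical; only the ordering rule (lastHitAt, then digestHex) and the boundary dedup come from the commit message.

```swift
/// Hypothetical reduced index entry.
struct SketchIndexEntry: Equatable {
    let digestHex: String
    let lastHitAt: Int64
}

/// Eviction order: least recently hit first, ties broken by digestHex so the
/// result does not depend on dictionary iteration order.
func sketchEntriesLRUFirst(_ entries: [SketchIndexEntry]) -> [SketchIndexEntry] {
    entries.sorted { ($0.lastHitAt, $0.digestHex) < ($1.lastHitAt, $1.digestHex) }
}

/// Sorted, de-duplicated checkpoint boundaries that fit within `count` tokens.
func sketchDedupedBoundaries(_ boundaries: [Int], upTo count: Int) -> [Int] {
    Array(Set(boundaries.filter { $0 > 0 && $0 <= count })).sorted()
}
```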
The missing primitive the SSD tier needs: convert an extracted
[any KVCache] (one single-stream cache per layer, from
BatchedCache.extractBatched) into raw byte chunks + a layout descriptor,
and back. Chunks feed straight into EncryptedKVStore so plaintext KV
NEVER touches disk. We deliberately do NOT route through upstream
savePromptCache (it writes a plaintext .safetensors and its
reconstruction helper is private).
Byte round-trip via MLXArray.asData(.copy) / MLXArray(data:shape:dtype:)
is dtype-agnostic, so bf16 round-trips exactly. Reconstruction uses each
cache type's PUBLIC state + metaState setters.
Scope:
- SSD-serializable: KVCacheSimple + RotatingKVCache (the attention +
sliding-window caches Gemma-4 26B-A4B and GPT-OSS-20B use, plus all
pure-attention models).
- NOT SSD-serializable: MambaCache / ArraysCache (recurrent). Their
metaState setter deliberately traps (assertionFailure) and the real
reconstruction path (ArraysCache.restoreFromMetaState) is `internal` to
MLXLMCommon, so it is unreachable from ProviderCore. Rebuilding recurrent
state via the partial public API can't be verified correct without
running the model, and a wrong recurrent state silently emits garbage
tokens, so the serializer REFUSES recurrent caches (and any hybrid stack
containing one) rather than guess. Consequence: hybrid models
(Qwen3.5/Next) get the RAM tier only (which uses copy(), no
serialization), not SSD persistence. SSD-for-Mamba is a documented
follow-up gated on upstream exposing a public reconstruction. (This
constraint was found by the test suite: an earlier attempt to set
MambaCache.metaState tripped the upstream assertionFailure, caught before
it could become a latent bug.)
- Also unsupported: ChunkedKVCache, QuantizedKVCache, CacheList.
serialize throws on any unsupported layer; the P3 manager's load-time
capability gate keeps them out first.
Tests (7, all pass): KVCacheSimple state round-trip; bf16 exact fidelity;
resume-equivalence for KVCacheSimple AND wrapped RotatingKVCache
(reconstructed cache continues generation identically); Mamba rejected
(RAM-only); Chunked rejected + attention/sliding stack accepted; full
end-to-end serialize -> EncryptedKVStore.write (layout in metaState) ->
read -> deserialize.
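The "raw byte chunk + layout descriptor" idea can be illustrated without MLX by round-tripping a plain `[Float]`. This is a Foundation-only analogy under stated assumptions: `ToyLayout`, `toyEncodeChunk`, and `toyDecodeChunk` are hypothetical, and the real serializer works on MLXArray state with its own layout JSON.

```swift
import Foundation

/// Hypothetical layout descriptor stored alongside the raw bytes.
struct ToyLayout: Codable, Equatable {
    let shape: [Int]
    let dtype: String
    let byteCount: Int
}

enum ToyChunkError: Error {
    case layoutMismatch
}

/// Copies the elements' raw bytes into a chunk and describes them.
func toyEncodeChunk(_ values: [Float], shape: [Int]) -> (chunk: Data, layout: ToyLayout) {
    precondition(shape.reduce(1, *) == values.count, "shape must cover all elements")
    let chunk = values.withUnsafeBufferPointer { Data(buffer: $0) }
    return (chunk, ToyLayout(shape: shape, dtype: "float32", byteCount: chunk.count))
}

/// Rebuilds the array from bytes, refusing anything the layout does not describe.
func toyDecodeChunk(_ chunk: Data, layout: ToyLayout) throws -> [Float] {
    let count = layout.shape.reduce(1, *)
    guard layout.dtype == "float32",
          chunk.count == layout.byteCount,
          chunk.count == count * MemoryLayout<Float>.stride else {
        throw ToyChunkError.layoutMismatch
    }
    var out = [Float](repeating: 0, count: count)
    // Byte copy into typed storage avoids alignment assumptions about `chunk`.
    _ = out.withUnsafeMutableBufferPointer { chunk.copyBytes(to: $0) }
    return out
}
```

Because the copy is byte-for-byte, the round trip is exact for any element type of fixed width, which is the same reason the commit reports exact bf16 fidelity.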
The three-tier orchestration layer (design §4), one manager per loaded
model. Standalone and fully tested; the BatchScheduler wiring is the
NEXT step and is deliberately not in this commit (review checkpoint
before touching live inference).
Closes the two review findings deferred from P0-P2:
- XC-3: the MB-1 model-binding guard now EXISTS and is enforced on the
SSD load path — readMetadataOnly first, verify metadata.modelHash ==
binding.modelHash AND the architectural shape (numLayers/kvHeads/
headDim) BEFORE unwrap/decrypt, drop the index entry + count the
mismatch on failure. This catches a wrong-model file the crypto
cannot (a valid file from another model decrypts cleanly because the
AAD is its own metadata).
- PCR-1: the non-Sendable [any KVCache] crosses the actor boundary via
documented @unchecked Sendable transfer types (PrefixLookupResult
out, SendableKVCaches in). `sending` is unusable here because values
produced through the actor-isolated PrefixCacheRAM are inferred into
the actor's region; the boxes are sound because the caches are always
fresh (RAM hits copy(), SSD hits freshly deserialized) and single-
owner — matching the codebase's existing UncheckedSendable idiom.
Behavior:
- lookup(tokens:): exact-checkpoint match, RAM tier (longest checkpoint
first) then SSD tier (MB-1-guarded read -> deserialize -> promote to
RAM). Returns fresh caller-owned caches + tokenCount + tier.
- store(tokens:checkpointLength:caches:): write-back to RAM only.
- flushToSSD(): serialize RAM entries not already on SSD -> encrypt via
EncryptedKVStore (layout JSON in metaState) -> record + save index.
Skips non-serializable stacks defensively.
- capability gate: ssdEnabled requires index+kek+cacheDir; a model
whose caches aren't KVCacheSerializer-supported (Mamba hybrids) runs
RAM-only. Timestamps injected (now:) for determinism.
Tests (8): RAM hit + longest-checkpoint-wins; full SSD round-trip
(store -> flush -> clearRAM -> SSD hit -> promote-to-RAM); restart
persistence across manager instances; MB-1 rejects a cross-model file
even when the index points B at A's file (the symlink/collision case) —
asserts modelMismatches counted; SSD-disabled-without-backing; miss on
sub-checkpoint prompt.
Full KV cache suite green: 98 tests.
The encrypted SSD backend for the engine's in-GPU block prefix cache
(Path 2). Conforms to MLXLMCommon.PrefixCachePersistence (added to the
submodule): the engine calls saveBlock on LRU eviction and loadBlock on a
block-hash miss, so evicted blocks are AES-GCM-encrypted to disk
(surviving eviction AND restart) instead of dropped + re-prefilled.
Reuses our crypto primitives (EncryptedKVStore + KVCacheSerializer + the
KEK), keyed by the engine's content-addressed block hash.
(Note: this block-level integration supersedes the checkpoint-level
PrefixCacheManager/Index/Digest/RAM for the live path. The engine already
owns lookup/LRU/indexing; those layers remain for any non-engine use but
aren't on this path.)
EncryptedKVStore: refactored the body-build / header-assemble / atomic-
write / chunk-decrypt into shared sync helpers, and added writeSync /
readSync taking a SymmetricKey KEK directly. The engine step loop is
synchronous and can't await the KVCacheKEK actor, so the persistence
holds an already-unwrapped KEK and does synchronous crypto + I/O. The
async write/read now delegate to the same helpers, so the format is
identical (the 15 EncryptedKVStore tests still pass).
EncryptedPrefixCachePersistence: sync saveBlock/loadBlock; MB-1 guard on
load (metadata.modelHash + shape before decrypt); KVCacheSimple-only
(matches the engine's prefix cache); best-effort save (never throws).
SECURITY (TB-007): this adds encryption-at-rest (disk-theft defense) but
does NOT close the in-process cross-tenant sharing / TTFT side-channel,
because the provider can't see tenant identity. Default-off flag +
explicit threat-model sign-off required (the BatchScheduler flag is the
next commit). Documented in the type header.
Tests (5): save/load round-trip; on-disk bytes are encrypted (DBKV magic
present, no plaintext); MB-1 rejects wrong model; wrong KEK -> nil (no
crash); and END-TO-END through the real upstream PrefixCache: maxBlocks=1
forces eviction (saveBlock) then fetch reloads from encrypted SSD
(loadBlock). EncryptedKVStore async suite still green.
Wires the encrypted prefix cache into live inference behind the
DARKBLOOM_PREFIX_CACHE env flag (default OFF — unset = exact current
behavior, prefixCache nil).
When set, makeBatchedEngine builds an EncryptedPrefixCachePersistence
and passes it as the engine's Scheduler.prefixCache, so the engine's
in-GPU block cache is backed by AES-GCM-encrypted SSD storage (evicted
blocks persist + survive restart; fetch misses reload from disk).
Guards before enabling:
- architecture must expose numLayers/kvHeads/headDim (from the
config.json already parsed in snapshotContainer) — else disabled.
- KEK must be Secure-Enclave-wrapped + Keychain-persisted (so files
survive restart). If unavailable (no SE / entitlement) we REFUSE to
enable rather than fall back to an ephemeral key that would silently
break restart-reuse.
- per-model dir keyed by sha256(modelId)[:12] under the OS cache dir.
SECURITY (TB-007): enabling re-opens the cross-tenant data-leak / TTFT
side-channel that was deliberately gated — the provider cannot see
tenant identity, so the cache is shared across consumers. This commit
adds encryption-at-rest (disk-theft defense) but does NOT close the
in-process channel. Ships ONLY behind the default-off flag with an
explicit operator threat-model sign-off. Loud warning logged on enable.
modelId is used as the model-binding key; weight-hash binding (to
invalidate on a weight change under the same id) is a documented
follow-up.
Full KV cache suite green: 103 tests. End-to-end live behavior on real
hardware still needs the 2-Mac smoke test (cannot run in CI).
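A sketch of the flag gate and the per-model directory naming described above. The parsing rule (any non-empty value other than "0" enables) is an assumption, since the commit only states that unset means off; the function names are hypothetical, and the `darkbloom/kv/<prefix>` layout follows the path quoted in the review comment further down.

```swift
import CryptoKit
import Foundation

/// Default OFF: an unset variable keeps the exact current behaviour.
/// Assumption: empty and "0" are also treated as off.
func sketchPrefixCacheEnabled(env: [String: String] = ProcessInfo.processInfo.environment) -> Bool {
    guard let value = env["DARKBLOOM_PREFIX_CACHE"] else { return false }
    return !value.isEmpty && value != "0"
}

/// Per-model directory keyed by the first 12 hex characters of sha256(modelId).
func sketchPerModelCacheDir(modelId: String, cachesRoot: URL) -> URL {
    let digest = SHA256.hash(data: Data(modelId.utf8))
    let hex = digest.map { String(format: "%02x", $0) }.joined()
    return cachesRoot
        .appendingPathComponent("darkbloom")
        .appendingPathComponent("kv")
        .appendingPathComponent(String(hex.prefix(12)))
}
```

Note that a directory keyed by modelId inherits the limitation the commit calls out: new weights published under the same id map to the same directory until weight-hash binding lands.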
Points libs/mlx-swift-lm at the additive PrefixCachePersistence change (branch feat/encrypted-prefix-cache-persistence, 6f79f04) that the encrypted prefix-cache wiring depends on. The submodule change is default-nil (no behavior change unless darkbloom passes a persistence backend), so this bump is safe with the flag off. NOTE: the submodule change lives on a branch, not yet merged to the submodule's main. The corresponding submodule PR must merge before this parent change can land on master.
Opt-in integration test (gated by DARKBLOOM_MODEL_TEST_DIR, so CI /
plain `swift test` skip it) that drives the REAL engine with a REAL
model through our encrypted prefix-cache path. Builds the BatchedEngine
with an EncryptedPrefixCachePersistence (injected in-memory KEK), then:
- generates a prompt with cache OFF (reference),
- generates the same prompt twice with cache ON,
- asserts the cache-ON greedy (temp 0) output is byte-IDENTICAL to the
reference — i.e. prefix KV reuse does not corrupt generation,
- asserts the cache engaged (hits > 0, tokens_saved > 0).
Validated on a real M5 Max with mlx-community/Llama-3.2-1B-Instruct-bf16:
output matched the uncached reference exactly and the cache saved 256
prefill tokens (one full block) on the repeat request.
…validation
Adds PersistentEnclaveKey.makeTransient() — a non-persisted Secure
Enclave key (kSecAttrIsPermanent=false, no keychain access group). Since
it never touches a keychain access group it needs NO
keychain-access-groups entitlement, so SE crypto (sign / ECIES) can be
exercised on real hardware from unsigned builds. (loadOrCreate, which
persists under the team access group, still requires a signed build —
that's the same SecItem path the production attestation key uses.)
Adds kv-se-harness — a TEST-ONLY standalone executable (not a product,
not shipped) that uses a transient SE key to validate the encrypted
KV-cache key path on real hardware without code signing:
- ECIES wrap (public) + unwrap (SE-private) via the new
PersistentEnclaveKey+ECIES / SecureEnclaveKeyWrappingService code,
- KVCacheKEK generate -> SE-wrap -> store -> read -> SE-unwrap,
recovering identical key material,
- DEK wrap/unwrap under the recovered KEK,
- tamper rejection (flipped wrapped-DEK byte fails auth).
Validated on a real M5 Max (unsigned): all five checks PASS, closing the
SE-crypto validation gap. The only piece still requiring a signed build
is keychain PERSISTENCE of the key (production-proven attestation path),
not any code this branch introduced.
Hardware validation — M5 Max (
Master was force-updated, diverging the branch base. Single conflict in coordinator/cmd/coordinator/main.go — purely from the rewritten base, NOT from any KV-cache change (this branch is provider-swift only). The conflict was two forms of the same SetOnLateSecurityInfo callback: the branch carried the older inherited version (LookupDevice HTTP inside the per-provider lock); master refactored it to collect candidates under the lock and do the HTTP lookups outside it. Resolved by taking master's version of the whole file — same feature, better lock discipline, and the branch has no independent change there. Submodule pointer (libs/mlx-swift-lm @ 6f79f04, the PrefixCachePersistence hook) preserved. Coordinator builds clean post-merge.
This PR introduces an opt-in encrypted SSD KV-cache (prefix cache) backed by a SE-wrapped KEK, wires
Trust boundaries touched
Per-threat assessment
T-007: Provider serves manipulated model outputs (
T-028: Residual inference data in GPU memory (
T-008: Provider sends plaintext SSE chunks on encryption failure (
T-009: Swift provider excluded from private-text routing (
T-033: Attestation blob replay (
T-035: Provider denies actions after key rotation (
New attack surface NOT covered by existing threats
NEW-001: SSD KV cache files as an exfiltration / tampering target (not covered)
Files:
The encrypted SSD cache persists KV tensors derived from decrypted consumer prompts to disk. The threat model notes the operator has full filesystem read access (TB-003) and the X25519 key was historically the highest-risk on-disk secret. Now there is a second category of on-disk data: SE-encrypted KV blobs keyed by a Keychain-persisted KEK. Relevant concerns:
Recommended action: Before enabling
NEW-002
File:
Changing
Open findings resolved by this PR
None of the SEC-* open findings are resolved by this PR. SEC-007 (weight hash fail-open) is partially addressed at the cache-identity layer but the coordinator-side gate is unchanged.
🔐 Threat model:
…ning)
Addresses the threat-model-review finding that
PersistentEnclaveKey.makeTransient() (the entitlement-free
transient-SE-key factory) was public and ungated, so it shipped in
release builds.
Independent verification confirmed the factual claim (public, no guard)
but found the exploit risk overstated: only the test-only kv-se-harness
calls it (production uses loadOrCreate), and a transient key can't pass
coordinator-side attestation anyway (SIP/SecureBoot/MDA gates). Still,
compiling a test-only entitlement-free SE-key path out of release builds
is correct defense-in-depth, so it's now #if DEBUG.
Verified: debug build (harness) still compiles with makeTransient;
release ProviderCore compiles without it (nothing else references it).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 581cc60106
    try? FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
    let binding = PrefixCacheModelBinding(
        modelHash: modelId, modelDtype: "unknown", modelArch: "unknown",
Bind prefix cache entries to weight hash
When DARKBLOOM_PREFIX_CACHE is enabled, this stores both the cache directory key and metadata.modelHash from the mutable model id rather than the downloaded weight identity. The model catalog/download path already supports replacing snapshots/local for the same model.id with a different aggregateSHA256, so after a model registry update or re-download under the same id, old encrypted KV files in darkbloom/kv/<sha(modelId)> still pass the loadBlock model/shape checks and can seed generation with KV computed from previous weights, corrupting outputs. Use the manifest aggregate/weight hash (or clear this directory on model replacement) as the binding instead of modelId alone.
    return await container.perform { ctx -> BatchedEngine in
        let prefixCache: PrefixCache? = persistence.map {
            PrefixCache(
                config: PrefixCacheConfig(blockSize: 256, maxBlocks: 4096),
Bound in-memory prefix cache by KV budget
With DARKBLOOM_PREFIX_CACHE enabled, this creates an in-memory engine prefix cache that can retain 4096 * 256 tokens worth of KV outside the scheduler's active kvBudget; on larger models or shared long system prompts, that can consume tens or hundreds of GB of unified memory before the cache evicts, causing provider OOMs even though admission control still thinks requests fit. Derive maxBlocks from the model's KV-byte estimate/available memory or make it a much smaller operator-tuned budget.
        metaState: [layoutJSON],
        chunkPlaintextSizes: chunks.map { $0.count }
    )
    try EncryptedKVStore.writeSync(to: fileURL(blockHash), metadata: meta, chunks: chunks, kekKey: kekKey)
Add eviction for persisted prefix blocks
When the opt-in prefix cache is enabled, every block evicted from the engine is written to this per-model directory, but this persistence backend never deletes entries or enforces a byte/TTL limit. A provider serving diverse prompts will keep accumulating .darkbloom-kv files until the user's cache volume fills, which can break later cache writes and model downloads; add a disk budget/sweep or skip persistence when the budget is exhausted.
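A byte-budget sweep of the kind suggested here could look roughly like the sketch below. `CacheFile` and `filesToEvict` are hypothetical names, and the function only decides which files to delete; scanning the directory and removing the files is left out.

```swift
/// Hypothetical view of one persisted block file.
struct CacheFile {
    let name: String
    let bytes: Int
    let lastUsed: Int // seconds since some epoch; larger means more recent
}

/// Returns the names to delete, least recently used first, until the
/// remaining total fits within `budgetBytes`.
func filesToEvict(_ files: [CacheFile], budgetBytes: Int) -> [String] {
    var total = files.reduce(0) { $0 + $1.bytes }
    var evicted: [String] = []
    for file in files.sorted(by: { $0.lastUsed < $1.lastUsed }) {
        if total <= budgetBytes { break }
        total -= file.bytes
        evicted.append(file.name)
    }
    return evicted
}

let files = [
    CacheFile(name: "b", bytes: 4, lastUsed: 2),
    CacheFile(name: "a", bytes: 4, lastUsed: 1),
    CacheFile(name: "c", bytes: 4, lastUsed: 3),
]
print(filesToEvict(files, budgetBytes: 8)) // ["a"]
```

Keeping the decision pure makes it easy to test without touching the file system; a real sweep would also need to tolerate files disappearing between the scan and the delete.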
    io.darkbloom.provider.attestation-signing.v1. Subsequent launches load the
    existing key by access group and label query. The private key never leaves the
    SE silicon; only signing operations are available. ProviderLoop tries the
Correct SE key label and allowed operations
This threat-model update is stale relative to the code it documents: PersistentEnclaveKey.defaultLabel is ...attestation-signing.v2, and this same key is now also used by PersistentEnclaveKey+ECIES for KEK unwrap/decryption, not only signing. Because this document is the security sign-off record for the new KV cache, leaving it as v1/signing-only hides the actual keychain item to audit and the new decryption capability granted to same-team entitled binaries.
A second review pass (crypto-security-auditor) on the G1/G2 fixes found
two more instances of the same uncatchable-crash class still reachable in
the foreign/tampered-file threat model — both defeat G2's invariant that a
malformed file becomes a recoverable cold miss, never a process crash:
- State-setter array count: validateMetaState checks metaState but NOT the
per-layer state-array count. reconstruct's "c.state = arrays" runs the
KVCacheSimple/RotatingKVCache state setter, which fatalErrors unless
given exactly 2 arrays. deserialize only checked the AGGREGATE chunk
count, so a foreign file with 1 or 3 arrays in a layer crashed the
provider. Now require 0 or 2 arrays per layer (throw otherwise).
- MLXArray init precondition: validateLayout binds dims [1]/[3] but not
that shape.product*dtype.size == chunk byte length, nor that dims are
non-negative / non-overflowing. MLXArray(data:shape:dtype:) hard-traps
(shapePrecondition) on a mismatch, and shape.reduce(1,*) traps on a
negative dim or Int overflow. deserialize now computes an overflow-safe
expected byte count and rejects any descriptor whose chunk length
disagrees, before constructing the MLXArray.
Both throw KVCacheSerializerError.reconstructionFailed -> cold miss.
Tests: deserializeRejects{WrongStateArrayCount,ShapeByteLengthMismatch,
NegativeDimShape} -- each would crash the test process pre-fix. 100 KV
tests green.
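The two checks described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `ReconstructionError`, `checkArrayCount`, `expectedByteCount`, and `checkChunk` are hypothetical names, and the real code throws KVCacheSerializerError.reconstructionFailed.

```swift
enum ReconstructionError: Error {
    case badArrayCount, badShape, byteLengthMismatch
}

/// Each layer must carry either no state arrays or exactly two.
func checkArrayCount(_ count: Int) throws {
    guard count == 0 || count == 2 else { throw ReconstructionError.badArrayCount }
}

/// Product of the dims times the element size, rejecting negative dims and
/// any intermediate Int overflow instead of trapping.
func expectedByteCount(shape: [Int], elementSize: Int) throws -> Int {
    guard elementSize > 0 else { throw ReconstructionError.badShape }
    var total = elementSize
    for dim in shape {
        guard dim >= 0 else { throw ReconstructionError.badShape }
        let (product, overflowed) = total.multipliedReportingOverflow(by: dim)
        guard !overflowed else { throw ReconstructionError.badShape }
        total = product
    }
    return total
}

/// Run before constructing the array so a mismatch becomes a thrown error.
func checkChunk(shape: [Int], elementSize: Int, chunkByteCount: Int) throws {
    guard try expectedByteCount(shape: shape, elementSize: elementSize) == chunkByteCount else {
        throw ReconstructionError.byteLengthMismatch
    }
}
```

The point of `multipliedReportingOverflow(by:)` is that plain `*` on `Int` traps on overflow, which is exactly the uncatchable-crash class the commit is closing.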
Follow-up adversarial review: confirming the Codex bug class isn't lurking elsewhere

Ran a multi-lens sweep of the whole encrypted-KV-cache surface (6 lenses: binding-completeness, file-lifecycle/durability, crypto-correctness, serialization-fidelity, index-integrity/paths, concurrency/integration), each finding adversarially verified by an independent skeptic, plus a completeness critic. 11 raw findings; 8 refuted (unreachable / already-guarded / out-of-scope roadmap items), the rest confirmed and fixed below. Two further residuals surfaced when a crypto auditor reviewed the fixes themselves.

Fixed
1. …
2. KV shape not bound to the model (G1): the MB-1 guard compared only metadata integers; the tensors that seed attention come from …
3. metaState fed to fatalError-ing setters (G2): …
4. SSD path traversal (B): …
5–6. Two residual uncatchable-crash gaps (found reviewing the fixes): same class as G2, still reachable: the …

Refuted (no action)
Torn/orphan-file "permanent defeat" (atomic write + self-heal on re-store), serializer …

Coverage
100 provider-swift KV tests + 11 submodule prefix-cache tests green. New regressions: cold-saturated-pool no-overlap, validateLayout accept/reject, malformed-metaState (3), wrong-array-count, shape/byte-length mismatch, negative-dim, malicious-relativePath ignored; the MB-1 and prefix-hash manager tests were rewritten to exercise the residual threats (model-dir-prefix collision; same-dir on-disk swap) now that the path is reconstructed.

Commits: …
New docs/ssd-kv-cache.md documents the encrypted SSD KV cache as it actually runs: the two tiers (wired engine block cache + the built-but-unwired checkpoint manager), the store/evict/load data path, the DBKV file format, the SE-wrapped-KEK / per-file-DEK envelope, KVCacheSerializer, the DARKBLOOM_PREFIX_CACHE flag + prerequisites, the load-path verification ladder (model/shape/prefix-hash/path/tensor-shape/decode-safety, all fail closed to a cold miss), exact-checkpoint matching, failure modes, on-disk layout, the TB-007 security model, and a code/test map. Cross-linked from the design doc, which remains the rationale/threat-model/phased plan.
…P2s)
Addresses 3 Codex P2 findings on the encrypted SSD KV cache (all only matter when DARKBLOOM_PREFIX_CACHE is enabled, default off):
- Weight-hash binding (#1): the cache dir key and metadata modelHash were derived from the mutable modelId. A re-download under the same id with different weights left old encrypted KV that still passed the model/shape guards and could seed generation with stale-weight KV, corrupting output. Now bound to ModelInfo.weightHash when available (falls back to modelId) for both the directory and the MB-1 binding, so a weight change yields a fresh dir + binding and old KV is invalidated.
- Memory budget (#2): the engine block cache used a fixed maxBlocks=4096 (4096*256 tokens of KV held OUTSIDE the scheduler kvBudget) -> tens to hundreds of GB on large models -> OOM even though admission thinks requests fit. maxBlocks is now derived from a memory budget (DARKBLOOM_PREFIX_CACHE_MAX_GB, default 1/8 physical RAM) and the model's per-token KV bytes; the cache disables itself if even one block would not fit.
- Disk eviction (#3): EncryptedPrefixCachePersistence wrote a file per evicted block and never cleaned up, growing until the volume filled (breaking later cache writes and model downloads). Added an LRU byte-budget sweep (DARKBLOOM_PREFIX_CACHE_DISK_GB, default 10 GB; 0 = unlimited), amortized so the directory scan does not run on every block.
Tests: prefixCacheBindingId / prefixCacheMaxBlocks helpers (weight binding + memory scaling/clamp); disk-budget eviction keeps usage within budget while evicting oldest. 101 KV + 10 BatchScheduler tests green.
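To make the memory-budget arithmetic concrete, here is a minimal sketch. It assumes every layer is a plain attention layer with the same KV shape (hybrid models would need per-layer accounting), and both function names and the `upperBound` clamp value are hypothetical rather than the PR's prefixCacheMaxBlocks helper.

```swift
/// Bytes of KV state per token: keys plus values for every layer.
func kvBytesPerToken(layers: Int, kvHeads: Int, headDim: Int, elementSize: Int) -> Int {
    return 2 * layers * kvHeads * headDim * elementSize
}

/// Number of blocks that fit in the budget, or nil when not even one block
/// fits (the cache would then be disabled). `upperBound` is an assumed clamp.
func maxBlocksSketch(budgetBytes: Int, bytesPerToken: Int,
                     blockSize: Int = 256, upperBound: Int = 4096) -> Int? {
    guard budgetBytes > 0, bytesPerToken > 0, blockSize > 0 else { return nil }
    let (bytesPerBlock, overflowed) = bytesPerToken.multipliedReportingOverflow(by: blockSize)
    guard !overflowed else { return nil }
    let blocks = budgetBytes / bytesPerBlock
    guard blocks >= 1 else { return nil }
    return min(blocks, upperBound)
}

// Example: 32 layers, 8 KV heads, head dim 128, bf16 (2 bytes per element).
let perToken = kvBytesPerToken(layers: 32, kvHeads: 8, headDim: 128, elementSize: 2)
let blocks = maxBlocksSketch(budgetBytes: 8 << 30, bytesPerToken: perToken)
```

In this example a token costs 128 KiB, a 256-token block costs 32 MiB, and an 8 GiB budget holds 256 blocks. The fixed maxBlocks=4096 would have allowed 128 GiB for the same model, which is the scale of over-commit the finding describes.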
… doc
- threat-model.yaml (#4): the SE-key block was stale relative to the code it documents as the KV-cache security sign-off record. Corrected the label v1 -> v2 (current defaultLabel; legacy v1 is migrated on load) and documented that the same persistent SE key now performs ECIES unwrap/decryption for the KV-cache KEK, not only ECDSA signing -- so the decryption capability granted to entitled same-team binaries and the actual keychain item to audit are both visible.
- ssd-kv-cache.md: synced to the weight-hash binding, memory-bounded maxBlocks, and disk-eviction behavior, plus the two new budget env vars (DARKBLOOM_PREFIX_CACHE_MAX_GB / _DISK_GB).
Codex review (round 2): verified + addressed

All four P2 findings verified against the code and fixed. Commits …
- #1 Bind prefix cache to weight hash (BatchScheduler:320): real, fixed.
- #2 Bound in-memory prefix cache by KV budget (BatchScheduler:235): real, fixed.
- #3 Add eviction for persisted prefix blocks (EncryptedPrefixCachePersistence:75): real, fixed.
- #4 Correct SE key label and allowed operations (threat-model.yaml:288): real, fixed.

Tests: …
The on-disk prefix-cache budget defaulted to a flat 10 GB regardless of the actual disk. Make the default 50% of the cache volume's free capacity, measured live at model load (volumeAvailableCapacityForImportantUsage, falling back to raw available capacity). The DARKBLOOM_PREFIX_CACHE_DISK_GB env override still wins (0 = unlimited); a near-full disk yields a tiny positive budget (evict-almost-everything) rather than the env's 0-means-unlimited; if free space can't be read, falls back to 10 GB. Split into a pure, testable policy (resolveDiskBudget) + a free-space probe (volumeFreeBytes). Tests: 50%-of-free, env override, env-0-unlimited, near-full -> tiny positive, unknown -> 10 GB fallback, and a live free-capacity read. 113 KV + BatchScheduler tests green.
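The policy half of this split can be sketched as a pure function. This is an illustration only: `DiskBudget` and `resolveDiskBudgetSketch` are hypothetical, the environment value is assumed to have been parsed to bytes already, and GB is treated as GiB, which the commit message does not specify.

```swift
enum DiskBudget: Equatable {
    case unlimited
    case bytes(Int)
}

/// Fallback when free space cannot be read (10 GiB; the unit is an assumption).
let fallbackBudgetBytes = 10 * 1_073_741_824

/// - envOverrideBytes: parsed env override, nil when unset; 0 means unlimited.
/// - freeBytes: free capacity of the cache volume, nil when it cannot be read.
func resolveDiskBudgetSketch(envOverrideBytes: Int?, freeBytes: Int?) -> DiskBudget {
    if let override = envOverrideBytes, override >= 0 {
        return override == 0 ? .unlimited : .bytes(override)
    }
    if let free = freeBytes, free >= 0 {
        // Half of free space, but never 0: a computed 0 must not be mistaken
        // for the env's "0 = unlimited".
        return .bytes(max(1, free / 2))
    }
    return .bytes(fallbackBudgetBytes)
}
```

Separating the policy from the free-space probe is what lets each case in the test list (50% of free, env override, env 0, near-full, unknown) be checked without a real volume.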
…versized writes
Address the adversarial review of the budget/binding changes.
- Env crash (medium, x2): prefixCacheBudgetBytes (MAX_GB) and
resolveDiskBudget (DISK_GB) did Int(gb * 2^30) guarded only by a sign
check. Double("inf")/"1e400"/huge values pass and Int(Double) TRAPS on
non-finite/overflow (uncatchable) -> provider crash on model load. MAX_GB
is read unconditionally to size maxBlocks, so it crashed even with the
cache OFF. Now reject non-finite / out-of-range values back to the
default. Extracted a pure resolveMemoryBudget mirror for testability.
- Orphaned directories (low): keying the cache dir by weightHash (the prior
weight-binding fix) created a fresh, never-swept directory on every
re-download. Key the dir by the MODEL id instead (stable) and keep the
MB-1 binding on the weight hash: stale-weight files are rejected AND
deleted by loadBlock on access and aged out by the sweep — invalidation
without leaking dirs.
- Write-then-delete treadmill (low): when a single block exceeds the disk
budget (tiny env value / near-full disk) saveBlock wrote then immediately
evicted it. Skip the write up front instead.
Tests: resolveMemoryBudget + resolveDiskBudget now cover inf/NaN/overflow
(would have trapped pre-fix); write-skip-when-oversized; delete-on-MB1-
mismatch. Docs: corrected the sample log line, the dir-key scheme, and an
explicit per-model (not global), measured-once disk-bounding note. 116 KV +
BatchScheduler tests green.
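The env-crash fix comes down to validating the Double before converting it to Int. A minimal sketch of that guard follows; `budgetBytes(fromGB:defaultBytes:)` is a hypothetical helper, not the PR's resolveMemoryBudget, and it treats GB as GiB.

```swift
/// Parses a gigabyte value from an environment string and converts it to
/// bytes, returning `defaultBytes` for anything that is missing, unparsable,
/// negative, non-finite, or too large to fit in Int.
func budgetBytes(fromGB raw: String?, defaultBytes: Int) -> Int {
    guard let raw = raw, let gb = Double(raw), gb.isFinite, gb >= 0 else {
        return defaultBytes
    }
    let bytes = gb * 1_073_741_824.0
    // Double(Int.max) rounds up to 2^63, so a strict "<" keeps Int(bytes) in range.
    guard bytes.isFinite, bytes < Double(Int.max) else { return defaultBytes }
    return Int(bytes)
}
```

A sign check alone is not enough because `Int(_:)` on a Double traps for infinity, NaN, and out-of-range values, and that trap cannot be caught.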
Summary
Encrypted-at-rest KV cache for the Swift provider: persist prefill KV to
disk encrypted, reload after restart/eviction to skip re-prefill. 12
commits, ~3.5k LOC, 103 KV-cache tests green.
What's in it
- Crypto primitives (P0): EncryptedKVStore (AES-256-GCM, per-file random DEK, HKDF-derived per-chunk nonces, metadata-as-AAD tamper binding, atomic write + dir fsync) and KVCacheKEK (envelope: per-file DEK wrapped by an SE-derived KEK held in Keychain). Async + sync (writeSync/readSync) paths share the format.
- Cache machinery (P1–P3): PrefixCacheRAM (LRU), PrefixDigest + PrefixCacheIndex (exact-checkpoint lookup), PrefixCacheManager (orchestration actor with the MB-1 model-binding guard), and KVCacheSerializer ([KVCache] ↔ encryptable bytes, bf16-exact).
- Live integration (Path 2): backs the engine's own in-GPU block prefix cache with our encryption: a PrefixCachePersistence hook (submodule) calls EncryptedPrefixCachePersistence on block evict/load, so evicted blocks are AES-GCM-encrypted to SSD and reloaded instead of re-prefilled. Gated behind the default-off DARKBLOOM_PREFIX_CACHE flag.

Verified before building
- … resolved empirically (a review claimed temporalOrder scrambles on restore; tests prove it's correct when state+metaState round-trip together).
- … source: all hybrid (sliding/recurrent), so the cache is exact-checkpoint (no arbitrary longest-prefix); Mamba layers are RAM-only (recurrent state isn't a per-token prefix).

Adversarial review of P0–P2 found 0 critical / 0 high; fixes applied.
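For readers unfamiliar with the construction listed under "Crypto primitives", the following CryptoKit sketch shows one way per-chunk nonce derivation and metadata-as-AAD binding can fit together. It is not the EncryptedKVStore format: the function names, the `kv-chunk-nonce` label, and the ciphertext-plus-tag layout are all assumptions made for illustration.

```swift
import CryptoKit
import Foundation

/// Derives a 12-byte AES-GCM nonce for one chunk from the file's DEK.
/// The info label is hypothetical; the chunk index keeps nonces distinct.
func chunkNonce(dek: SymmetricKey, chunkIndex: UInt32) throws -> AES.GCM.Nonce {
    var info = Data("kv-chunk-nonce".utf8)
    withUnsafeBytes(of: chunkIndex.bigEndian) { info.append(contentsOf: $0) }
    let derived = HKDF<SHA256>.deriveKey(
        inputKeyMaterial: dek, salt: Data(), info: info, outputByteCount: 12)
    return try AES.GCM.Nonce(data: derived.withUnsafeBytes { Data($0) })
}

/// Encrypts one chunk; the file metadata is authenticated as AAD, so any
/// change to it makes decryption fail. Returns ciphertext followed by the tag.
func sealChunk(_ plaintext: Data, index: UInt32, dek: SymmetricKey, metadata: Data) throws -> Data {
    let nonce = try chunkNonce(dek: dek, chunkIndex: index)
    let box = try AES.GCM.seal(plaintext, using: dek, nonce: nonce, authenticating: metadata)
    return box.ciphertext + box.tag
}

/// Re-derives the nonce from the index, so the nonce is never stored.
func openChunk(_ sealed: Data, index: UInt32, dek: SymmetricKey, metadata: Data) throws -> Data {
    let tagStart = sealed.count - 16
    guard tagStart >= 0 else { throw CryptoKitError.incorrectParameterSize }
    let nonce = try chunkNonce(dek: dek, chunkIndex: index)
    let box = try AES.GCM.SealedBox(
        nonce: nonce, ciphertext: sealed.prefix(tagStart), tag: sealed.suffix(16))
    return try AES.GCM.open(box, using: dek, authenticating: metadata)
}
```

Because the DEK is random per file, deriving the nonce from the DEK and the chunk index avoids nonce reuse under one key without storing a nonce per chunk. Wrapping the DEK under the SE-derived KEK is a separate step that this sketch does not cover.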
Severity: HIGH (inherent, when the flag is enabled), mitigated to LOW as-shipped.
… DARKBLOOM_PREFIX_CACHE, and enabling it requires an explicit operator threat-model sign-off. Flag off (the default) ⇒ no cache ⇒ no exposure.

The engine prefix cache was deliberately disabled (TB-007) for a cross-tenant data-leak / TTFT side-channel: the provider cannot see tenant identity, so the cache is shared across consumers. This PR adds encryption-at-rest (disk-theft defense) but does NOT close the in-process cross-tenant sharing/timing channel. It ships only behind the default-off flag and requires an explicit operator threat-model sign-off. Flag off = today's exact behavior (engine prefixCache: nil).

Notes / follow-ups
- … PrefixCacheManager/Index/Digest/RAM for the live path (the engine already owns lookup/LRU/indexing). Those remain for non-engine use; only EncryptedKVStore + KVCacheSerializer + KEK are on the live path.
- … modelId; weight-hash binding (invalidate on weight change under the same id) is a follow-up.
- … docs/ssd-kv-cache-design.md.