feat(kv-cache): encrypted SSD KV cache — primitives, verification, and flag-gated engine integration by anupsv · Pull Request #237 · Layr-Labs/d-inference

anupsv · 2026-05-29T01:32:21Z

Summary

Encrypted-at-rest KV cache for the Swift provider: persist prefill KV to
disk encrypted, reload after restart/eviction to skip re-prefill. 12
commits, ~3.5k LOC, 103 KV-cache tests green.

DRAFT — two gates before merge:

Submodule PR must merge first. This branch bumps
libs/mlx-swift-lm to a branch (feat/encrypted-prefix-cache-persistence)
adding the additive PrefixCachePersistence hook. That submodule PR
has to land before this can go to master.

2-Mac smoke test pending. End-to-end behavior with the flag ON
can't run in CI — needs the M5 Thunderbolt pair. All unit/integration
coverage here runs on swift test.

What's in it

Crypto primitives (P0) — EncryptedKVStore (AES-256-GCM, per-file
random DEK, HKDF-derived per-chunk nonces, metadata-as-AAD tamper
binding, atomic write + dir fsync) and KVCacheKEK (envelope: per-file
DEK wrapped by an SE-derived KEK held in Keychain). Async + sync
(writeSync/readSync) paths share the format.

Cache machinery (P1–P3) — PrefixCacheRAM (LRU), PrefixDigest +
PrefixCacheIndex (exact-checkpoint lookup), PrefixCacheManager
(orchestration actor with the MB-1 model-binding guard), and
KVCacheSerializer ([KVCache] ↔ encryptable bytes, bf16-exact).

Live integration (Path 2) — backs the engine's own in-GPU block
prefix cache with our encryption: a PrefixCachePersistence hook
(submodule) calls EncryptedPrefixCachePersistence on block evict/load,
so evicted blocks are AES-GCM-encrypted to SSD and reloaded instead of
re-prefilled. Gated behind the default-off DARKBLOOM_PREFIX_CACHE flag.

Verified before building

Cross-model cacheability + the rotating-cache snapshot/restore hazard
resolved empirically (a review claimed temporalOrder scrambles on
restore; tests prove it's correct when state+metaState round-trip
together).
Qwen3.5/3.6/3.7 MoE + Gemma-4 26B-A4B + GPT-OSS-20B confirmed from
source: all hybrid (sliding/recurrent), so the cache is
exact-checkpoint (no arbitrary longest-prefix); Mamba layers are
RAM-only (recurrent state isn't a per-token prefix).
Adversarial review of P0–P2 found 0 critical / 0 high; fixes applied.

⚠️ SECURITY — TB-007 (must read before enabling)

Severity: HIGH (inherent, when the flag is enabled) — mitigated to LOW as-shipped.

Inherent — HIGH: cross-tenant information disclosure (TTFT timing oracle + shared prefix blocks) in a multi-tenant provider. It breaks tenant isolation, a core privacy property. It is a timing/inference side-channel, not direct content exfiltration (a tenant cannot read another's KV without already knowing the exact tokens) — hence HIGH, not CRITICAL.
Residual — LOW (as shipped): the cache is default-OFF, opt-in only via DARKBLOOM_PREFIX_CACHE, and enabling it requires an explicit operator threat-model sign-off. Flag off (the default) ⇒ no cache ⇒ no exposure.

The engine prefix cache was deliberately disabled (TB-007) for a
cross-tenant data-leak / TTFT side-channel: the provider cannot see
tenant identity, so the cache is shared across consumers. This PR adds
encryption-at-rest (disk-theft defense) but does NOT close the
in-process cross-tenant sharing/timing channel. It ships only behind
the default-off flag and requires an explicit operator threat-model
sign-off. Flag off = today's exact behavior (engine prefixCache: nil).

Notes / follow-ups

The block-level engine integration supersedes the checkpoint-level
PrefixCacheManager/Index/Digest/RAM for the live path (the
engine already owns lookup/LRU/indexing). Those remain for non-engine
use; only EncryptedKVStore + KVCacheSerializer + KEK are on the
live path.
Model binding uses modelId; weight-hash binding (invalidate on weight
change under the same id) is a follow-up.
Design + threat model: docs/ssd-kv-cache-design.md.


…arrival path

Provider (Swift):
- Add NetworkPowerAssertion: IOKit assertions for NetworkClientActive and BackgroundTask, acquired for the entire provider session
- Keeps the macOS network stack alive during sleep so APN pushes (courier.push.apple.com:5223) can be delivered for MDM commands
- No root required: uses the IOPMAssertionCreateWithName API

Coordinator:
- Increase SecurityInfo timeout from 30s to 90s to cover Power Nap cycles (every ~15 minutes on AC power)
- Add OnLateSecurityInfoCallback: when a SecurityInfo webhook arrives after the 90s timeout, retroactively upgrade the self_signed provider to hardware trust instead of silently dropping the response
- Wire up the callback in main.go alongside the existing SetOnMDA callback, using the same ForEachProvider + serial match pattern

…in-backed

Reflects commit 4a0dae5 (PersistentEnclaveKey.swift).

Key changes:
- TB-003 how_it_works: document Security framework persistent key, access group SLDQ2GJ6TL.io.darkbloom.provider, kSecAttrIsPermanent, and the errSecMissingEntitlement fallback behaviour on patched binaries
- TB-003 current_limitations: add two new limitations: team-scoped cross-binary keychain access, and silent ephemeral fallback that defers rejection to the coordinator rather than failing at the process boundary
- TB-009 how_it_works: rewrite SE key lifecycle section to reflect persistent identity across restarts; rotation now requires explicit keychain deletion
- T-013 (binary tampering) mitigations: add keychain access group enforcement as a fourth, implemented mitigation; update detection_hint
- T-033 (attestation replay) affected_files: add PersistentEnclaveKey.swift and AttestationSigner.swift; update mitigation wording
- T-035 (repudiation after rotation) description, mitigations, detection_hint: reframe rotation as an explicit operator action rather than automatic per launch; note kSecAttrIsPermanent as a positive mitigation; add open finding that coordinator cannot detect opportunistic keychain delete + re-registration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

First phase of SSD KV cache per docs/ssd-kv-cache-design.md. P0 delivers the cryptographic plumbing only; no integration with BatchScheduler yet (that's P3).

Adds:
- `KVCache/KeyWrappingService`: protocol abstracting wrap/unwrap of the KEK at rest. Two impls: `InMemoryKeyWrappingService` (tests) and `SecureEnclaveKeyWrappingService` (prod; ECIES via the existing persistent SE identity using `.eciesEncryptionStandardX963SHA256AESGCM`).
- `KVCache/WrappedKEKStorage`: protocol abstracting the at-rest persistence layer. `KeychainWrappedKEKStorage` for prod (data-protection Keychain with `kSecAttrAccessibleAfterFirstUnlockThisDeviceOnly`), `InMemoryWrappedKEKStorage` for tests. Two layers (wrap + storage) so the entitlement-gated Keychain path is independently swappable from the SE-bound wrap path; both are individually exercised in tests.
- `KVCache/KVCacheKEK`: actor that generates/persists the KEK on first run, holds the unwrapped key in actor-isolated state, and exposes per-file DEK wrap/unwrap with AAD binding.
- `KVCache/EncryptedKVStore`: on-disk format codec for the `.darkbloom-kv` file. Layout: magic ‖ version ‖ flags ‖ file_IV ‖ wrapped_DEK ‖ metadata-JSON ‖ encrypted chunks. Per-chunk nonces derived via HKDF-SHA256 from (DEK, file_IV, chunk_index): no nonce reuse, no nonce storage. Metadata bytes are bound as AAD on every chunk seal AND on the DEK wrap, so tampering with any field surfaces as an authentication failure on the first chunk decrypt (and on DEK unwrap too, as belt-and-suspenders).
- `Security/PersistentEnclaveKey+ECIES.swift`: adds `eciesEncrypt/eciesDecrypt` to the existing SE identity. The only modification to existing code is bumping `privateKey` from `private` to `internal` so the extension can reach it; the SE still owns the key material.

Threat model + format details: docs/ssd-kv-cache-design.md.

Tests:
- KeyWrappingServiceTests (7 cases): wrap roundtrip, unique ciphertexts per call, tamper detection (ct + tag), wrong key, malformed input, empty plaintext.
- KVCacheKEKTests (10 cases), including SE-roundtrip + Keychain-storage roundtrip (both skip on missing entitlement, same pattern as existing PersistentEnclaveKey tests).
- EncryptedKVStoreTests (13 cases), covering single/multi-chunk roundtrip, metadata-only read, tamper at magic / version / metadata / chunk-ct levels, truncation, wrong-KEK, chunk-count/size mismatch, and the HKDF nonce derivation determinism + diversity properties.

All 32 tests pass (2 skip on missing keychain-access-groups entitlement in unsigned debug builds; same pattern as PersistentEnclaveKeyTests).
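The per-chunk sealing described in this commit can be sketched with CryptoKit. This is a minimal illustration, not the PR's code: the `info` layout, the label string, the 16-byte `fileIV`, and the `deriveChunkNonce` signature are assumptions made for the example. Only the overall shape (HKDF-SHA256 over DEK, file_IV and chunk index; metadata bytes as AAD) comes from the description above, and the sketch uses the expand step only, which the later review notes in this PR say the implementation also does.

```swift
import CryptoKit
import Foundation

/// Derive a 96-bit AES-GCM nonce for one chunk.
/// Assumption: info = label ‖ file_IV ‖ big-endian chunk index (illustrative layout).
func deriveChunkNonce(dek: SymmetricKey, fileIV: Data, chunkIndex: UInt32) throws -> AES.GCM.Nonce {
    var info = Data("example-kv-chunk-nonce".utf8)   // hypothetical label
    info.append(fileIV)
    withUnsafeBytes(of: chunkIndex.bigEndian) { info.append(contentsOf: $0) }
    // Expand-only: the DEK is already a uniform 256-bit key.
    let okm = HKDF<SHA256>.expand(pseudoRandomKey: dek, info: info, outputByteCount: 12)
    return try okm.withUnsafeBytes { try AES.GCM.Nonce(data: Data($0)) }
}

let dek = SymmetricKey(size: .bits256)
let fileIV = Data((0..<16).map { _ in UInt8.random(in: 0...255) })
let metadata = Data(#"{"modelHash":"example"}"#.utf8)   // stands in for the metadata JSON
let chunk = Data("kv bytes for chunk 0".utf8)

// Writer: seal the chunk with the derived nonce, binding the metadata as AAD.
let nonce = try deriveChunkNonce(dek: dek, fileIV: fileIV, chunkIndex: 0)
let sealed = try AES.GCM.seal(chunk, using: dek, nonce: nonce, authenticating: metadata)

// Reader: re-derive the nonce instead of reading it from disk.
let readNonce = try deriveChunkNonce(dek: dek, fileIV: fileIV, chunkIndex: 0)
let box = try AES.GCM.SealedBox(nonce: readNonce, ciphertext: sealed.ciphertext, tag: sealed.tag)
let opened = try AES.GCM.open(box, using: dek, authenticating: metadata)

// Any change to the metadata bytes makes authentication fail.
let tamperedMetadata = Data(#"{"modelHash":"other"}"#.utf8)
let tamperedOpens = (try? AES.GCM.open(box, using: dek, authenticating: tamperedMetadata)) != nil

// Different chunk indices yield different nonces under the same DEK and file_IV.
let nonce1 = try deriveChunkNonce(dek: dek, fileIV: fileIV, chunkIndex: 1)
let noncesDiffer = Data(nonce) != Data(nonce1)
```

Because the nonce is a deterministic function of the key, the file IV and the chunk index, the file does not need to store one per chunk, and the fresh per-file DEK keeps nonces from repeating across files.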

Corrects a misleading claim in the SSD KV cache design: §5.1 previously stated that `model_hash` in the metadata AAD "binds the cache to a specific model file — reloading after a model upgrade fails closed." That overstated what the crypto does.

The AES-GCM layer uses the file's OWN metadata (read back from disk) as AAD on both the DEK unwrap and every chunk seal. This proves the bytes weren't altered since write (tamper-evidence) but does NOT verify the file belongs to the currently-loaded model: a structurally valid cache file authored for model A decrypts cleanly while model B is loaded, because both supply the file's embedded metadata as AAD. The cipher cannot distinguish "right model" from "wrong model."

Adds:
- §5.1 rewritten to spell out what the AAD does (tamper-evidence) vs does NOT (model-binding).
- §8.1.1 new section defining invariant MB-1: PrefixCacheManager MUST check meta.modelHash == currentLoadedModelHash (and an architectural-shape guard for 12-char hash-prefix collisions) BEFORE unwrapping the DEK or decrypting. Includes the guard pseudocode and three P3 regression tests that must fail if the guard is removed.
- §10 two new failure-mode rows (cross-model file, shape mismatch), each noting they're caught by MB-1, not the crypto.
- §14 P3 phase now explicitly includes MB-1 + its rejection tests.
- docs/ssd-kv-cache-model-binding.{mmd,svg}: read-path flowchart showing where MB-1 sits (between metadata-read and decrypt) and a callout box on what the AES-GCM layer does vs doesn't guarantee. Animated SVG (draws on in a browser; renders static when embedded).

No code change: P0 crypto is unaffected; MB-1 is an application-layer guard that lands with the P3 BatchScheduler integration.
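The MB-1 ordering (compare the file's metadata against the loaded model before touching any key material) might look roughly like the sketch below. It is not the design doc's pseudocode: every type and function name here (`CacheShape`, `CacheFileMetadata`, `ModelBinding`, `MB1Rejection`, `enforceMB1`) is hypothetical, and only the checked fields (model hash, numLayers/kvHeads/headDim) come from the PR text.

```swift
import Foundation

// Hypothetical types for illustration only.
struct CacheShape: Equatable {
    let numLayers: Int
    let kvHeads: Int
    let headDim: Int
}

struct CacheFileMetadata {
    let modelHash: String
    let shape: CacheShape
}

struct ModelBinding {
    let modelHash: String
    let shape: CacheShape
}

enum MB1Rejection: Error, Equatable {
    case modelHashMismatch
    case shapeMismatch
}

/// Application-layer guard: must run BEFORE DEK unwrap or chunk decrypt,
/// because the AES-GCM layer authenticates against the file's own metadata.
func enforceMB1(file: CacheFileMetadata, loaded: ModelBinding) throws {
    guard file.modelHash == loaded.modelHash else { throw MB1Rejection.modelHashMismatch }
    // Second check covers a collision on a truncated hash prefix.
    guard file.shape == loaded.shape else { throw MB1Rejection.shapeMismatch }
}

/// Small helper so callers can inspect the outcome without do/catch noise.
func mb1Rejection(file: CacheFileMetadata, loaded: ModelBinding) -> MB1Rejection? {
    do {
        try enforceMB1(file: file, loaded: loaded)
        return nil
    } catch let rejection as MB1Rejection {
        return rejection
    } catch {
        return nil
    }
}

let shapeA = CacheShape(numLayers: 16, kvHeads: 8, headDim: 64)
let shapeB = CacheShape(numLayers: 24, kvHeads: 8, headDim: 64)
let loaded = ModelBinding(modelHash: "aaaaaaaaaaaa", shape: shapeA)

let sameModel = mb1Rejection(file: CacheFileMetadata(modelHash: "aaaaaaaaaaaa", shape: shapeA), loaded: loaded)
let otherModel = mb1Rejection(file: CacheFileMetadata(modelHash: "bbbbbbbbbbbb", shape: shapeA), loaded: loaded)
let hashCollision = mb1Rejection(file: CacheFileMetadata(modelHash: "aaaaaaaaaaaa", shape: shapeB), loaded: loaded)
```

A caller would read only the metadata, run the guard, and proceed to unwrap and decrypt only when it returns without throwing.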

…dow models

Empirically resolves the open correctness question gating prefix cache for sliding-window models (GPT-OSS-20B, Gemma-4 26B-A4B MoE). Two review agents disagreed on whether RotatingKVCache survives snapshot -> restore -> resume; an adversarial lens claimed temporalOrder() scrambles token order on restore + multi-token prefill. These tests settle it from runtime behaviour.

Each token t is encoded as K=t, V=t+100 so the returned (keys,values) reveal exact token order; any scrambling shows as a content mismatch vs a never-reset reference.

RotatingKVCacheRestoreTests (single-stream, the circular-buffer case):
- multi-token prefill after restore matches reference (the adversary's exact scenario: wrap the buffer, snapshot, restore, prefill)
- single-token decode after restore matches
- pre-wrap restore matches
- omitting metaState on restore DOES corrupt order, proving idx/offset serialization is load-bearing (the failure mode is real but only under misuse: state without metaState)

BatchRotatingExtractRoundtripTests (the batched path our design uses):
- extract(row) isolates the correct row (no cross-row leakage)
- extract(row) -> snapshot -> restore -> resume matches extract -> resume, for both multi-token and single-token continuations

Verdict: rotating-cache restore is CORRECT when state AND metaState are restored together (the loadPromptCache contract). BatchRotatingKVCache keeps a linear front-trimmed buffer (no circular wrap), and its extract(idx) returns a single-stream RotatingKVCache with a fully populated metaState (keep,maxCacheSize,step,offset,idx), so our design path (extract row -> single-stream -> serialize) sidesteps the batched cache's own incomplete metaState. The metaState-sync requirement is now a guarded invariant.

All 7 tests pass. No production code changed.
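Why the write index has to travel with the buffer contents can be seen in a toy ring buffer. This is a deliberately simplified model, not upstream's RotatingKVCache (which also has `keep`, `step`, and real tensors); the `Ring` type and its fields are hypothetical and exist only to show the failure mode the tests guard against.

```swift
/// Toy circular buffer of token ids (a stand-in for the K/V tensors).
struct Ring {
    var slots: [Int] = []   // physical slot order
    var idx = 0             // next physical slot to overwrite
    var offset = 0          // total tokens ever written
    let capacity: Int

    init(capacity: Int) {
        self.capacity = capacity
    }

    mutating func append(_ token: Int) {
        if slots.count < capacity {
            slots.append(token)
        } else {
            slots[idx] = token          // wrap: overwrite the oldest slot
        }
        idx = (idx + 1) % capacity
        offset += 1
    }

    /// Oldest-to-newest view; depends on idx once the buffer has wrapped.
    func temporalOrder() -> [Int] {
        guard slots.count == capacity else { return slots }
        return Array(slots[idx...] + slots[..<idx])
    }
}

var live = Ring(capacity: 4)
for token in 0..<6 { live.append(token) }   // wraps: physical slots are [4, 5, 2, 3]

// Correct restore: buffer contents AND bookkeeping travel together.
var restored = Ring(capacity: 4)
restored.slots = live.slots
restored.idx = live.idx
restored.offset = live.offset

// Misuse: contents restored, bookkeeping left at its defaults.
var restoredWithoutMeta = Ring(capacity: 4)
restoredWithoutMeta.slots = live.slots

let reference = live.temporalOrder()
let goodOrder = restored.temporalOrder()
let badOrder = restoredWithoutMeta.temporalOrder()
```

With the index restored the temporal view matches the reference; without it the same bytes are read back in physical order, which is the scrambling the adversarial review was worried about.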

Rewrites §4.4 and adds §4.5 to reflect what was verified against the actual mlx-swift-lm source + the empirical rotating-restore tests.

Corrections to the earlier draft:
- "slice the first N columns" was wrong: caches are 4-D [B,kvHeads,seq,headDim], sliced on axis 2, and the snapshot is taken by extractBatched(row) from the live batched caches, not naive slicing.
- Arbitrary longest-prefix match (old O2) is DROPPED for the models we serve. Recurrent (Mamba/GatedDeltaNet) and sliding-window layers cannot be sliced to a shorter prefix, so reuse is EXACT-CHECKPOINT only: hit when the incoming prompt's prefix is byte-identical to a cached checkpoint boundary (e.g. end of a system prompt). Covers the dominant shared-system-prompt case.

New §4.5: verified per-model cacheability (cache type detected at LOAD time, never hardcoded; MoE is irrelevant, it only changes the FFN):
- Qwen3.5 MoE / Qwen3-Next: hybrid MambaCache + KVCacheSimple → exact-checkpoint (recurrent state restorable at boundary, not sliceable).
- Gemma-4 26B-A4B MoE: sliding RotatingKVCache(512) + full; only 15 non-shared caches to snapshot (20 layers KV-share, auto-reconstructed).
- GPT-OSS-20B MoE: sliding RotatingKVCache(128) + full; attention sinks are learned weights, not KV state → no snapshot impact.
- Unsupported cache types (Chunked/Quantized/CacheList/DeepseekV4 pooling) gated out at load time → cold path, no error.

Invariant MS-1 (metaState-sync): extracted caches must persist metaState in sync with state (RotatingKVCache's idx/offset drive temporalOrder). EncryptedKVStoreMetadata already has the metaState field; regression-guarded by omittingMetaStateOnRestoreCorruptsOrder.

Also: status line updated (P0 landed + cacheability verified), O2 marked dropped, per-request flow diagram shows exact-checkpoint + MB-1 guard, §15 records what verification resolved + adds [Q8].

The decrypted RAM tier of the SSD KV cache (design §4.1). Holds recently-used prefix KV snapshots as live [any KVCache] (one extracted single-stream cache per layer) so a repeat request whose prompt prefix is byte-identical to a cached checkpoint hits RAM instead of decrypting from SSD or running a cold prefill.

Scope: RAM only; no SSD, no encryption, no BatchScheduler wiring (those are P2-P4). Plain final class (non-Sendable: holds MLXArrays), to be owned by the PrefixCacheManager actor in P3; tests run single-threaded.

Design points:
- Keyed by (modelHash, prefixDigest). modelHash is the locally-computed weight hash, so a lookup for model B structurally cannot return model A's entry: the RAM-tier half of MB-1.
- get() returns copy() of each stored cache (upstream's blessed clone, state.map { $0[.ellipsis] }), so a consumer can seed a batch row and decode into the returned caches without corrupting the stored snapshot. This is the load-bearing invariant.
- LRU eviction by a monotonic use-counter (not wall-clock, which keeps it deterministic and avoids Date.now()), bounded by BOTH an entry count and a byte budget (measured via MLXArray.nbytes over each cache's state).
- clear(modelHash:) for model unload; entriesForFlush(modelHash:) returns copies for the P4 SSD-flush path without removing.

Tests (9, all pass):
- hit/miss + tokenCount round-trip
- MB-1: lookup keyed by modelHash (no cross-model bleed)
- distinct digests are distinct entries
- getReturnsIndependentCopy: mutate a returned copy (append 5 tokens), assert the stored snapshot stays at its original offset, which proves copy()-on-get protects the snapshot
- LRU eviction by entry count (recently-used survives, LRU evicted)
- eviction by byte budget
- clear(modelHash:) drops only that model
- put replaces an existing key without leaking byte accounting
- entriesForFlush returns per-model copies without removing
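The eviction policy (a monotonic use-counter instead of wall-clock time, bounded by both an entry count and a byte budget) can be sketched without MLX. `LRUStore` below is a hypothetical stand-in that tracks only sizes, not caches; it also refuses entries larger than the whole budget, which is the behaviour the later review fix PCR-2 in this PR describes for `put`.

```swift
/// Size-only LRU bookkeeping (hypothetical; the real tier stores [any KVCache]).
struct LRUStore<Key: Hashable> {
    private struct Entry {
        var bytes: Int
        var lastUse: UInt64
    }

    private var entries: [Key: Entry] = [:]
    private var clock: UInt64 = 0          // monotonic use-counter, no Date needed
    private(set) var totalBytes = 0
    let maxEntries: Int
    let maxBytes: Int

    init(maxEntries: Int, maxBytes: Int) {
        self.maxEntries = maxEntries
        self.maxBytes = maxBytes
    }

    var count: Int { entries.count }

    func contains(_ key: Key) -> Bool { entries[key] != nil }

    /// Mark a key as just used. Returns false on a miss.
    @discardableResult
    mutating func touch(_ key: Key) -> Bool {
        guard entries[key] != nil else { return false }
        clock += 1
        entries[key]!.lastUse = clock
        return true
    }

    /// Insert or replace. Returns false if the entry alone exceeds the byte budget.
    @discardableResult
    mutating func put(_ key: Key, bytes: Int) -> Bool {
        guard bytes <= maxBytes else { return false }
        if let old = entries[key] { totalBytes -= old.bytes }   // replace without leaking bytes
        clock += 1
        entries[key] = Entry(bytes: bytes, lastUse: clock)
        totalBytes += bytes
        // Evict least-recently-used entries until BOTH bounds hold.
        while entries.count > maxEntries || totalBytes > maxBytes {
            guard let victim = entries.min(by: { $0.value.lastUse < $1.value.lastUse })?.key else { break }
            totalBytes -= entries[victim]!.bytes
            entries[victim] = nil
        }
        return true
    }
}

var store = LRUStore<String>(maxEntries: 2, maxBytes: 100)
store.put("a", bytes: 10)
store.put("b", bytes: 10)
store.touch("a")                      // "a" is now more recent than "b"
store.put("c", bytes: 10)             // over the entry limit: evicts "b"
let oversizedAccepted = store.put("huge", bytes: 101)
store.put("c", bytes: 5)              // replacing "c" adjusts the byte total
```

Using a counter rather than timestamps keeps eviction order reproducible in tests, since two operations can never share a "time".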

The exact-checkpoint lookup layer (design §4.4, §7; [Q3] resolved to JSON, not SQLite). No SSD I/O of cache payloads and no BatchScheduler wiring yet; that's P3 (held for review).

PrefixDigest (checkpoint keys):
- Checkpoints at the O9 boundaries (256/512/1024/2048/4096/8192).
- For a prompt's token array, computes SHA-256 of the first c tokens at each checkpoint c <= count, in a SINGLE pass by snapshotting the rolling hash at each boundary. Proven (checkpointDigestEqualsIndependentPrefixHash) to equal an independent hash of the first c tokens, so a longer cached prefix is findable from a shorter shared one, and two prompts sharing a system prompt agree on every checkpoint digest within the shared region.
- Tokens hashed as little-endian Int64 after a domain-separation tag; stable across machines, can't collide with other SHA uses.

PrefixCacheIndex (JSON-persisted, in-RAM):
- Maps (modelHash, digestHex) -> {relativePath, tokenCount, fileBytes, createdAt, lastHitAt, hitCount}.
- findLongestCheckpoint(modelHash:tokens:): computes the prompt's checkpoint digests and returns the entry for the LONGEST checkpoint present for that model, i.e. the exact-checkpoint match. Partitioned by modelHash (MB-1: model B can't match model A's entries).
- record / touch / remove / removeModel / entriesLRUFirst (eviction order for P6) / rebuild(from:) (recover when JSON missing/corrupt).
- Atomic JSON write-back on save(); dirty-tracked. A corrupt index file is treated as empty (logged), not fatal: SSD files are self-describing and the index rebuilds from them.
- Timestamps passed in by the caller (now: Int64), keeping the type deterministic and clock-free for tests.

Tests (16, all pass): digest determinism + prefix-sensitivity, single-pass == independent-hash equivalence, shared-prefix agreement, boundary/token-count handling; index exact + longest-checkpoint match, divergent-prefix miss, MB-1 model scoping, touch metadata, remove/removeModel, LRU ordering, JSON persistence round-trip, corrupt-file recovery, rebuild.
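The single-pass checkpoint digest can be illustrated with CryptoKit's incremental SHA-256. A sketch under assumptions: the domain tag string and the function names are hypothetical, and the real PrefixDigest may frame its input differently. What it shares with the description above is the technique: hash tokens as little-endian Int64 after a tag, and snapshot the running hash at each boundary.

```swift
import CryptoKit
import Foundation

// Hypothetical domain-separation tag; the real one is not given in the PR text.
let domainTag = Data("example.prefix-digest.v1".utf8)

func hex<S: Sequence>(_ bytes: S) -> String where S.Element == UInt8 {
    bytes.map { String(format: "%02x", $0) }.joined()
}

/// Feed one token into the hash as a little-endian Int64.
func feed(_ hasher: inout SHA256, token: Int) {
    withUnsafeBytes(of: Int64(token).littleEndian) { hasher.update(bufferPointer: $0) }
}

/// One pass over the prompt; returns checkpoint length -> digest hex.
func checkpointDigests(
    _ tokens: [Int],
    boundaries: [Int] = [256, 512, 1024, 2048, 4096, 8192]
) -> [Int: String] {
    var hasher = SHA256()
    hasher.update(data: domainTag)
    var digests: [Int: String] = [:]
    var consumed = 0
    // Set(...) dedups caller-supplied boundaries; sorted() gives ascending order.
    for boundary in Set(boundaries).sorted() where boundary > 0 && boundary <= tokens.count {
        for token in tokens[consumed..<boundary] { feed(&hasher, token: token) }
        consumed = boundary
        let snapshot = hasher                      // value copy of the rolling state
        digests[boundary] = hex(snapshot.finalize())
    }
    return digests
}

/// Reference: hash the first `count` tokens from scratch.
func independentPrefixHash(_ tokens: [Int], count: Int) -> String {
    var hasher = SHA256()
    hasher.update(data: domainTag)
    for token in tokens.prefix(count) { feed(&hasher, token: token) }
    return hex(hasher.finalize())
}

let promptA = Array(0..<600)
let promptB = Array(0..<512) + [9_999, 9_998]    // shares the first 512 tokens with A
let digestsA = checkpointDigests(promptA)
let digestsB = checkpointDigests(promptB)
```

Because `SHA256` is a value type, copying it at a boundary is what lets one pass produce every checkpoint digest, and two prompts that share a prefix agree on every checkpoint inside it.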

A 5-dimension adversarial review (findings verified by refutation) found 0 critical, 0 high, 6 medium, 9 low. Fixes for the cheap/clearly-correct ones; the two genuinely-P3 items (PCR-1 sending-returns across the actor boundary, XC-3 MB-1 guard + tests) are deferred to P3 where the PrefixCacheManager actor lands. KV-3 (redundant magic field) is won't-fix.

Crypto / format:
- KV-1, XC-2: fix 3-way doc drift on per-chunk nonce derivation. The code uses HKDF-Expand-only (Extract skipped: DEK is already a uniform 256-bit key, RFC 5869 §3.3) with file_IV folded into `info`, NOT file_IV as an HKDF salt. Header + byte-layout comments now match the implementation and the function doc. (No crypto change: write and read already shared deriveChunkNonce, so round-trip was correct.)
- KV-2: fsync the containing directory (F_FULLFSYNC, fsync fallback) after the atomic rename in EncryptedKVStore.write, so a just-renamed cache file is durable across power loss. Best-effort: a miss only costs a cold prefill.
- KV-4: SecureEnclaveKeyWrappingService.unwrap classifies auth failures by structured OSStatus (errSecDecode/errSecAuthFailed/errSecParam) instead of substring-matching a locale-dependent error string.

RAM tier:
- PCR-2: PrefixCacheRAM.put refuses (and counts) an entry whose own size exceeds maxBytes instead of storing-then-self-evicting into a silent no-op. put now returns Bool (@discardableResult).
- PCR-3: byte accounting uses innerState() (physical, step-allocated buffers) instead of state() (trimmed logical view), so maxBytes bounds true resident RAM.

Index / digest:
- PCI-1: PrefixCacheIndex.save writes via Data.write(.atomic) directly (Foundation does aux-file + rename) instead of a manual tmp-<uuid> + replaceItemAt, which could leak a UUID-named orphan on a crash between write and replace with no sweep.
- PCI-3, XC-4: entriesLRUFirst adds a deterministic secondary key (digestHex) so equal-lastHitAt entries order stably.
- PD-1: PrefixDigest.checkpoints dedups boundaries so a duplicated caller-supplied boundary isn't double-emitted.

Tests:
- XC-1: new batchRotatingExtractMatchesIndependentSingleStreamReference: builds the reference WITHOUT extract() (an independent single-stream RotatingKVCache fed row 0's tokens) and compares resume, proving extract() is semantically equivalent, not merely idempotent.
- new: ramRejectsEntryLargerThanByteBudget (PCR-2), indexLRUTieBreakIsDeterministic (PCI-3).
- ramPutReplacesExistingKeyWithoutLeakingBytes rewritten to assert the real no-leak invariant under physical byte accounting (PCR-3).

All KV cache tests pass (46 in the affected set; full P0-P2 suite green).
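The PCI-3 tie-break is small enough to show directly. The sketch below uses a hypothetical, trimmed-down `IndexEntry` and a free function named after `entriesLRUFirst`; the real method lives on PrefixCacheIndex and its signature is not shown in this PR text.

```swift
/// Trimmed-down index entry (hypothetical; the real entry carries more fields).
struct IndexEntry: Equatable {
    let digestHex: String
    let lastHitAt: Int64
}

/// Eviction order: oldest hit first, ties broken by digestHex so the
/// result does not depend on dictionary iteration order.
func entriesLRUFirst(_ entries: [IndexEntry]) -> [IndexEntry] {
    entries.sorted { lhs, rhs in
        if lhs.lastHitAt != rhs.lastHitAt { return lhs.lastHitAt < rhs.lastHitAt }
        return lhs.digestHex < rhs.digestHex
    }
}

let entries = [
    IndexEntry(digestHex: "ff01", lastHitAt: 20),
    IndexEntry(digestHex: "b2c3", lastHitAt: 10),
    IndexEntry(digestHex: "a1b2", lastHitAt: 10),
]
let order = entriesLRUFirst(entries).map(\.digestHex)
let orderFromReversedInput = entriesLRUFirst(entries.reversed()).map(\.digestHex)
```

Without the secondary key, two entries with the same `lastHitAt` could be evicted in either order from run to run, which makes eviction tests flaky.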

The missing primitive the SSD tier needs: convert an extracted [any KVCache] (one single-stream cache per layer, from BatchedCache.extractBatched) into raw byte chunks + a layout descriptor, and back. Chunks feed straight into EncryptedKVStore so plaintext KV NEVER touches disk — we deliberately do NOT route through upstream savePromptCache (it writes a plaintext .safetensors and its reconstruction helper is private). Byte round-trip via MLXArray.asData(.copy) / MLXArray(data:shape:dtype:) is dtype-agnostic, so bf16 round-trips exactly. Reconstruction uses each cache type's PUBLIC state + metaState setters. Scope — SSD-serializable: KVCacheSimple + RotatingKVCache (the attention + sliding-window caches Gemma-4 26B-A4B and GPT-OSS-20B use, plus all pure-attention models). NOT SSD-serializable: MambaCache / ArraysCache (recurrent). Their metaState setter deliberately traps (assertionFailure) and the real reconstruction path (ArraysCache.restoreFromMetaState) is `internal` to MLXLMCommon — unreachable from ProviderCore. Rebuilding recurrent state via the partial public API can't be verified correct without running the model, and a wrong recurrent state silently emits garbage tokens, so the serializer REFUSES recurrent caches (and any hybrid stack containing one) rather than guess. Consequence: hybrid models (Qwen3.5/Next) get the RAM tier only — which uses copy(), no serialization — not SSD persistence. SSD-for-Mamba is a documented follow-up gated on upstream exposing a public reconstruction. (This constraint was found by the test suite: an earlier attempt to set MambaCache.metaState tripped the upstream assertionFailure — caught before it could become a latent bug.) Also unsupported: ChunkedKVCache, QuantizedKVCache, CacheList. serialize throws on any unsupported layer; the P3 manager's load-time capability gate keeps them out first. 
Tests (7, all pass): KVCacheSimple state round-trip; bf16 exact fidelity; resume-equivalence for KVCacheSimple AND wrapped RotatingKVCache (reconstructed cache continues generation identically); Mamba rejected (RAM-only); Chunked rejected + attention/sliding stack accepted; full end-to-end serialize -> EncryptedKVStore.write (layout in metaState) -> read -> deserialize.


The three-tier orchestration layer (design §4), one manager per loaded model. Standalone and fully tested; the BatchScheduler wiring is the NEXT step and is deliberately not in this commit (review checkpoint before touching live inference).

Closes the two review findings deferred from P0-P2:
- XC-3: the MB-1 model-binding guard now EXISTS and is enforced on the SSD load path: readMetadataOnly first, verify metadata.modelHash == binding.modelHash AND the architectural shape (numLayers/kvHeads/headDim) BEFORE unwrap/decrypt, drop the index entry + count the mismatch on failure. This catches a wrong-model file the crypto cannot (a valid file from another model decrypts cleanly because the AAD is its own metadata).
- PCR-1: the non-Sendable [any KVCache] crosses the actor boundary via documented @unchecked Sendable transfer types (PrefixLookupResult out, SendableKVCaches in). `sending` is unusable here because values produced through the actor-isolated PrefixCacheRAM are inferred into the actor's region; the boxes are sound because the caches are always fresh (RAM hits copy(), SSD hits freshly deserialized) and single-owner, matching the codebase's existing UncheckedSendable idiom.

Behavior:
- lookup(tokens:): exact-checkpoint match, RAM tier (longest checkpoint first) then SSD tier (MB-1-guarded read -> deserialize -> promote to RAM). Returns fresh caller-owned caches + tokenCount + tier.
- store(tokens:checkpointLength:caches:): write-back to RAM only.
- flushToSSD(): serialize RAM entries not already on SSD -> encrypt via EncryptedKVStore (layout JSON in metaState) -> record + save index. Skips non-serializable stacks defensively.
- capability gate: ssdEnabled requires index+kek+cacheDir; a model whose caches aren't KVCacheSerializer-supported (Mamba hybrids) runs RAM-only.

Timestamps injected (now:) for determinism.

Tests (8): RAM hit + longest-checkpoint-wins; full SSD round-trip (store -> flush -> clearRAM -> SSD hit -> promote-to-RAM); restart persistence across manager instances; MB-1 rejects a cross-model file even when the index points B at A's file (the symlink/collision case), and asserts modelMismatches is counted; SSD-disabled-without-backing; miss on sub-checkpoint prompt.

Full KV cache suite green: 98 tests.

The encrypted SSD backend for the engine's in-GPU block prefix cache (Path 2). Conforms to MLXLMCommon.PrefixCachePersistence (added to the submodule): the engine calls saveBlock on LRU eviction and loadBlock on a block-hash miss, so evicted blocks are AES-GCM-encrypted to disk (surviving eviction AND restart) instead of dropped + re-prefilled. Reuses our crypto primitives (EncryptedKVStore + KVCacheSerializer + the KEK), keyed by the engine's content-addressed block hash. (Note: this block-level integration supersedes the checkpoint-level PrefixCacheManager/Index/Digest/RAM for the live path: the engine already owns lookup/LRU/indexing; those layers remain for any non-engine use but aren't on this path.)

EncryptedKVStore: refactored the body-build / header-assemble / atomic-write / chunk-decrypt into shared sync helpers, and added writeSync / readSync taking a SymmetricKey KEK directly. The engine step loop is synchronous and can't await the KVCacheKEK actor, so the persistence holds an already-unwrapped KEK and does synchronous crypto + I/O. The async write/read now delegate to the same helpers, so the format is identical (the 15 EncryptedKVStore tests still pass).

EncryptedPrefixCachePersistence: sync saveBlock/loadBlock; MB-1 guard on load (metadata.modelHash + shape before decrypt); KVCacheSimple-only (matches the engine's prefix cache); best-effort save (never throws).

SECURITY (TB-007): this adds encryption-at-rest (disk-theft defense) but does NOT close the in-process cross-tenant sharing / TTFT side-channel: the provider can't see tenant identity. Default-off flag + explicit threat-model sign-off required (the BatchScheduler flag is the next commit). Documented in the type header.
Tests (5): save/load round-trip; on-disk bytes are encrypted (DBKV magic present, no plaintext); MB-1 rejects wrong model; wrong KEK -> nil (no crash); and END-TO-END through the real upstream PrefixCache — maxBlocks=1 forces eviction (saveBlock) then fetch reloads from encrypted SSD (loadBlock). EncryptedKVStore async suite still green.

Wires the encrypted prefix cache into live inference behind the DARKBLOOM_PREFIX_CACHE env flag (default OFF: unset = exact current behavior, prefixCache nil). When set, makeBatchedEngine builds an EncryptedPrefixCachePersistence and passes it as the engine's Scheduler.prefixCache, so the engine's in-GPU block cache is backed by AES-GCM-encrypted SSD storage (evicted blocks persist + survive restart; fetch misses reload from disk).

Guards before enabling:
- architecture must expose numLayers/kvHeads/headDim (from the config.json already parsed in snapshotContainer); otherwise disabled.
- KEK must be Secure-Enclave-wrapped + Keychain-persisted (so files survive restart). If unavailable (no SE / entitlement) we REFUSE to enable rather than fall back to an ephemeral key that would silently break restart-reuse.
- per-model dir keyed by sha256(modelId)[:12] under the OS cache dir.

SECURITY (TB-007): enabling re-opens the cross-tenant data-leak / TTFT side-channel that was deliberately gated: the provider cannot see tenant identity, so the cache is shared across consumers. This commit adds encryption-at-rest (disk-theft defense) but does NOT close the in-process channel. Ships ONLY behind the default-off flag with an explicit operator threat-model sign-off. Loud warning logged on enable.

modelId is used as the model-binding key; weight-hash binding (to invalidate on a weight change under the same id) is a documented follow-up.

Full KV cache suite green: 103 tests. End-to-end live behavior on real hardware still needs the 2-Mac smoke test (cannot run in CI).
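The flag check and the per-model directory key might be sketched as follows. Two assumptions are made loudly here: the PR says only "when set", so treating any non-empty value as enabled is a guess, and `[:12]` is read as the first 12 hex characters of the SHA-256 digest. The function names are hypothetical.

```swift
import CryptoKit
import Foundation

/// Default-off gate. Assumption: any non-empty value of the variable enables it.
func prefixCacheFlagEnabled(
    _ environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
    guard let value = environment["DARKBLOOM_PREFIX_CACHE"] else { return false }
    return !value.isEmpty
}

/// Per-model directory name. Assumption: first 12 hex characters of sha256(modelId).
func modelCacheDirectoryName(modelId: String) -> String {
    let digest = SHA256.hash(data: Data(modelId.utf8))
    let hexDigest = digest.map { String(format: "%02x", $0) }.joined()
    return String(hexDigest.prefix(12))
}

let disabledByDefault = prefixCacheFlagEnabled([:])
let enabledWhenSet = prefixCacheFlagEnabled(["DARKBLOOM_PREFIX_CACHE": "1"])
let dirName = modelCacheDirectoryName(modelId: "mlx-community/Llama-3.2-1B-Instruct-bf16")
let otherDirName = modelCacheDirectoryName(modelId: "some-other-model")
```

Taking the environment as a parameter keeps the gate testable without mutating the process environment. Keying the directory by a hash of modelId avoids path-unsafe characters such as `/` in model ids.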

Points libs/mlx-swift-lm at the additive PrefixCachePersistence change (branch feat/encrypted-prefix-cache-persistence, 6f79f04) that the encrypted prefix-cache wiring depends on. The submodule change is default-nil (no behavior change unless darkbloom passes a persistence backend), so this bump is safe with the flag off. NOTE: the submodule change lives on a branch, not yet merged to the submodule's main. The corresponding submodule PR must merge before this parent change can land on master.

vercel · 2026-05-29T01:32:27Z


Project	Deployment	Actions	Updated (UTC)
d-inference	Ready	Preview	May 29, 2026 9:22pm
d-inference-console-ui-dev	Ready	Preview	May 29, 2026 9:22pm
d-inference-landing	Ready	Preview	May 29, 2026 9:22pm

Opt-in integration test (gated by DARKBLOOM_MODEL_TEST_DIR, so CI / plain `swift test` skip it) that drives the REAL engine with a REAL model through our encrypted prefix-cache path.

Builds the BatchedEngine with an EncryptedPrefixCachePersistence (injected in-memory KEK), then:
- generates a prompt with cache OFF (reference),
- generates the same prompt twice with cache ON,
- asserts the cache-ON greedy (temp 0) output is byte-IDENTICAL to the reference, i.e. prefix KV reuse does not corrupt generation,
- asserts the cache engaged (hits > 0, tokens_saved > 0).

Validated on a real M5 Max with mlx-community/Llama-3.2-1B-Instruct-bf16: output matched the uncached reference exactly and the cache saved 256 prefill tokens (one full block) on the repeat request.

…validation

Adds PersistentEnclaveKey.makeTransient(), a non-persisted Secure Enclave key (kSecAttrIsPermanent=false, no keychain access group). Since it never touches a keychain access group it needs NO keychain-access-groups entitlement, so SE crypto (sign / ECIES) can be exercised on real hardware from unsigned builds. (loadOrCreate, which persists under the team access group, still requires a signed build; that's the same SecItem path the production attestation key uses.)

Adds kv-se-harness, a TEST-ONLY standalone executable (not a product, not shipped) that uses a transient SE key to validate the encrypted KV-cache key path on real hardware without code signing:
- ECIES wrap (public) + unwrap (SE-private) via the new PersistentEnclaveKey+ECIES / SecureEnclaveKeyWrappingService code,
- KVCacheKEK generate -> SE-wrap -> store -> read -> SE-unwrap, recovering identical key material,
- DEK wrap/unwrap under the recovered KEK,
- tamper rejection (flipped wrapped-DEK byte fails auth).

Validated on a real M5 Max (unsigned): all five checks PASS, closing the SE-crypto validation gap. The only piece still requiring a signed build is keychain PERSISTENCE of the key (production-proven attestation path), not any code this branch introduced.

anupsv · 2026-05-29T02:40:27Z

Hardware validation — M5 Max (`m5-max-128gb-1`, macOS 26.4.1, Swift 6.3.1)

Tested on a real M5 Max (real Metal / MLX / Secure Enclave), not just CI.

Full KV-cache suite: 103/103 pass on real Metal — including the end-to-end encrypted evict → AES-GCM encrypt → SSD → reload through the real engine PrefixCache, bf16 byte-exact round-trips, and the on-disk-is-encrypted check.

Live model path (new LivePrefixCacheModelTests, env-gated, real mlx-community/Llama-3.2-1B-Instruct-bf16):

cache-ON greedy output byte-identical to the cache-OFF reference → prefix KV reuse does not corrupt generation;
cache engaged: hits=1, 256 prefill tokens saved (one full block) on the repeat request.

Secure-Enclave crypto (new kv-se-harness, transient SE key → no entitlement → unsigned build): ECIES wrap/unwrap on the real SE, KEK generate→SE-wrap→store→read→SE-unwrap (identical material), DEK round-trip, and tamper rejection — all PASS on real hardware.

Still requires a signed build (not exercised here): keychain persistence of the SE key under the team access group. macOS AMFI SIGKILLs an ad-hoc binary carrying a restricted keychain-access-groups entitlement, so this needs a real Developer-ID-signed bundle. Note: this is the same SecItem path the shipping attestation key already uses — no new logic in this PR depends on it beyond what's already in production.

Net: the entire data path + SE crypto introduced by this PR is hardware-validated; the one unexercised piece is production-proven keychain storage that only a signed build (release CI) can run.

Master was force-updated, diverging the branch base. Single conflict in coordinator/cmd/coordinator/main.go — purely from the rewritten base, NOT from any KV-cache change (this branch is provider-swift only). The conflict was two forms of the same SetOnLateSecurityInfo callback: the branch carried the older inherited version (LookupDevice HTTP inside the per-provider lock); master refactored it to collect candidates under the lock and do the HTTP lookups outside it. Resolved by taking master's version of the whole file — same feature, better lock discipline, and the branch has no independent change there. Submodule pointer (libs/mlx-swift-lm @ 6f79f04, the PrefixCachePersistence hook) preserved. Coordinator builds clean post-merge.

github-actions · 2026-05-29T03:02:20Z

This PR introduces an opt-in encrypted SSD KV-cache (prefix cache) backed by a SE-wrapped KEK, wires weightHash through to the engine for cache identity binding, and adds a #if DEBUG-only transient SE key factory; the changes are net-neutral-to-positive on security posture for the production path, but the new SSD cache feature introduces attack surface not fully covered by existing threats.

Trust boundaries touched

| TB | Name | Relevance |
|---|---|---|
| TB-003 | Provider operator vs. process | New KV cache files written to SSD; KEK lifecycle via SE |
| TB-007 | Provider inference engine | Prefix cache sharing changes cross-tenant isolation assumptions |
| TB-009 | Apple attestation chain | SE key reused for ECIES KEK-wrap, expanding its use beyond signing |

Per-threat assessment

T-007 — Provider serves manipulated model outputs (BatchScheduler.swift)
✅ Strengthens. The weightHash is now passed all the way into makeBatchedEngine and used as the cache identity key for the prefix-cache persistence layer (in makeEncryptedPrefixPersistenceIfEnabled). A re-download under the same modelId with different weights will produce a different weightHash and therefore miss the cache, preventing stale KV from being served as if it came from the new weights. This is a positive improvement to the weight-binding story, though coordinator-side enforcement remains fail-open (SEC-007, unchanged).

T-028 — Residual inference data in GPU memory (BatchScheduler.swift)
⚠️ Weakens slightly. Previously prefixCache: nil was hardcoded (marked // SECURITY: TB-007). The cache is now conditionally live when DARKBLOOM_PREFIX_CACHE=1. Prefix blocks contain decoded KV-cache tensors, which are themselves derived from decrypted prompt tokens. The comment in the diff explicitly acknowledges "cross-tenant sharing / TTFT side-channel" is NOT closed by this change. The RAM-resident prefix cache (PrefixCacheRAM) holds KV tensors across requests; a subsequent tenant's request that partially matches a prior prompt's prefix will read those tensors. This is a new in-memory cross-tenant data residue path that did not exist before this PR. The feature is opt-in and flag-gated, which limits exposure, but operators who enable it in production should understand this is explicitly out-of-threat-model.

T-008 — Provider sends plaintext SSE chunks on encryption failure (ProviderLoop.swift)
ℹ️ Neutral. The only change here is passing weightHash: modelInfo.weightHash to loadModel. No change to encryption error handling.

T-009 — Swift provider excluded from private-text routing (ProviderLoop.swift)
ℹ️ Neutral. No change to privacy flag reporting or the routing gate.

T-033 — Attestation blob replay (PersistentEnclaveKey.swift)
ℹ️ Neutral. makeTransient() is #if DEBUG-gated and confirmed compiled out of release builds. Transient keys cannot satisfy coordinator attestation (different public key fingerprint), so no replay surface is introduced in production.

T-035 — Provider denies actions after key rotation (PersistentEnclaveKey.swift)
ℹ️ Neutral. privateKey visibility changed from private to internal to allow PersistentEnclaveKey+ECIES to call SecKeyCreateDecryptedData. The private material never leaves the SE; this is a handle-level change. No new export path is opened.

New attack surface NOT covered by existing threats

NEW-001 — SSD KV cache files as an exfiltration / tampering target (not covered)

Files: provider-swift/Sources/ProviderCore/KVCache/EncryptedKVStore.swift, EncryptedPrefixCachePersistence.swift, KVCacheKEK.swift, KeyWrappingService.swift (not shown in diff)

The encrypted SSD cache persists KV tensors derived from decrypted consumer prompts to disk. The threat model notes the operator has full filesystem read access (TB-003) and the X25519 key was historically the highest-risk on-disk secret. Now there is a second category of on-disk data: SE-encrypted KV blobs keyed by a Keychain-persisted KEK. Relevant concerns:

KEK persistence in Keychain: The KEK is stored under the same access group (SLDQ2GJ6TL.io.darkbloom.provider) as the attestation key. The existing TB-003 limitation already notes any same-team-ID binary with the keychain-access-groups entitlement can read this group. The KEK is now an additional item in that group, so a same-team binary could load the KEK and decrypt SSD cache files offline without the SE — this is a new, concrete exfiltration path for prompt-derived data that does not require defeating the SE.
Cache file path/naming leaks model usage: Even if contents are encrypted, the existence and size of cache files on the SSD reveals which model prefixes were recently used, potentially leaking information about consumer activity to the operator.
Cache poisoning by a malicious operator: If an operator can write to the SSD cache directory, they could swap encrypted blobs. The AEAD on each block (AES-256-GCM per the PR summary; the exact scheme was not visible in the truncated diff under review) should detect tampering at read time, but the failure mode (silently skip vs. crash vs. log) is not visible here and should be confirmed to fail closed.

Recommended action: Before enabling DARKBLOOM_PREFIX_CACHE in any production deployment, confirm:

The KEK Keychain item has tighter access control (e.g. kSecAttrAccessibleWhenUnlockedThisDeviceOnly + kSecAccessControlUserPresence or at minimum a separate access group from the attestation key).
AEAD decryption failures on cache read terminate the request rather than serving stale/empty KV silently.
The threat model is updated to cover on-disk KV-derived data as a new asset category.

NEW-002 — privateKey visibility widened to internal without audit of all in-package callers

File: PersistentEnclaveKey.swift line ~70

Changing private → internal on the SecKey handle means any future file added to ProviderCore can call SecKeyCreateDecryptedData (or other Security framework functions) on it directly, bypassing the +ECIES extension's intended API surface. This is low-risk today but represents a module-level capability expansion. Suggest adding a // MARK: - intentionally internal for ECIES extension only comment and a CI lint rule (e.g. periphery or a custom grep check) to prevent additional callers.

Open findings resolved by this PR

None of the SEC-* open findings are resolved by this PR. SEC-007 (weight hash fail-open) is partially addressed at the cache-identity layer but the coordinator-side gate is unchanged.

🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR

…ning) Addresses the threat-model-review finding that PersistentEnclaveKey.makeTransient(), the entitlement-free transient-SE-key factory, was public and ungated, so it shipped in release builds. Independent verification confirmed the factual claim (public, no guard) but found the exploit risk overstated: only the test-only kv-se-harness calls it (production uses loadOrCreate), and a transient key can't pass coordinator-side attestation anyway (SIP/SecureBoot/MDA gates). Still, compiling a test-only entitlement-free SE-key path out of release builds is correct defense-in-depth, so it's now #if DEBUG. Verified: debug build (harness) still compiles with makeTransient; release ProviderCore compiles without it (nothing else references it).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 581cc60106

chatgpt-codex-connector · 2026-05-29T07:31:39Z

+        try? FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
+
+        let binding = PrefixCacheModelBinding(
+            modelHash: modelId, modelDtype: "unknown", modelArch: "unknown",


Bind prefix cache entries to weight hash

When DARKBLOOM_PREFIX_CACHE is enabled, this stores both the cache directory key and metadata.modelHash from the mutable model id rather than the downloaded weight identity. The model catalog/download path already supports replacing snapshots/local for the same model.id with a different aggregateSHA256, so after a model registry update or re-download under the same id, old encrypted KV files in darkbloom/kv/<sha(modelId)> still pass the loadBlock model/shape checks and can seed generation with KV computed from previous weights, corrupting outputs. Use the manifest aggregate/weight hash (or clear this directory on model replacement) as the binding instead of modelId alone.

chatgpt-codex-connector · 2026-05-29T07:31:39Z

+        return await container.perform { ctx -> BatchedEngine in
+            let prefixCache: PrefixCache? = persistence.map {
+                PrefixCache(
+                    config: PrefixCacheConfig(blockSize: 256, maxBlocks: 4096),


Bound in-memory prefix cache by KV budget

With DARKBLOOM_PREFIX_CACHE enabled, this creates an in-memory engine prefix cache that can retain 4096 * 256 tokens worth of KV outside the scheduler's active kvBudget; on larger models or shared long system prompts, that can consume tens or hundreds of GB of unified memory before the cache evicts, causing provider OOMs even though admission control still thinks requests fit. Derive maxBlocks from the model's KV-byte estimate/available memory or make it a much smaller operator-tuned budget.

chatgpt-codex-connector · 2026-05-29T07:31:39Z

+                metaState: [layoutJSON],
+                chunkPlaintextSizes: chunks.map { $0.count }
+            )
+            try EncryptedKVStore.writeSync(to: fileURL(blockHash), metadata: meta, chunks: chunks, kekKey: kekKey)


Add eviction for persisted prefix blocks

When the opt-in prefix cache is enabled, every block evicted from the engine is written to this per-model directory, but this persistence backend never deletes entries or enforces a byte/TTL limit. A provider serving diverse prompts will keep accumulating .darkbloom-kv files until the user's cache volume fills, which can break later cache writes and model downloads; add a disk budget/sweep or skip persistence when the budget is exhausted.

chatgpt-codex-connector · 2026-05-29T07:31:39Z

+      io.darkbloom.provider.attestation-signing.v1. Subsequent launches load the
+      existing key by access group and label query. The private key never leaves the
+      SE silicon; only signing operations are available. ProviderLoop tries the


Correct SE key label and allowed operations

This threat-model update is stale relative to the code it documents: PersistentEnclaveKey.defaultLabel is ...attestation-signing.v2, and this same key is now also used by PersistentEnclaveKey+ECIES for KEK unwrap/decryption, not only signing. Because this document is the security sign-off record for the new KV cache, leaving it as v1/signing-only hides the actual keychain item to audit and the new decryption capability granted to same-team entitled binaries.

A second review pass (crypto-security-auditor) on the G1/G2 fixes found two more instances of the same uncatchable-crash class still reachable in the foreign/tampered-file threat model. Both defeat G2's invariant that a malformed file becomes a recoverable cold miss, never a process crash:

- State-setter array count: validateMetaState checks metaState but NOT the per-layer state-array count. reconstruct's "c.state = arrays" runs the KVCacheSimple/RotatingKVCache state setter, which fatalErrors unless given exactly 2 arrays. deserialize only checked the AGGREGATE chunk count, so a foreign file with 1 or 3 arrays in a layer crashed the provider. Now require 0 or 2 arrays per layer (throw otherwise).
- MLXArray init precondition: validateLayout binds dims [1]/[3] but not that shape.product*dtype.size == chunk byte length, nor that dims are non-negative / non-overflowing. MLXArray(data:shape:dtype:) hard-traps (shapePrecondition) on a mismatch, and shape.reduce(1,*) traps on a negative dim or Int overflow. deserialize now computes an overflow-safe expected byte count and rejects any descriptor whose chunk length disagrees, before constructing the MLXArray.

Both throw KVCacheSerializerError.reconstructionFailed -> cold miss.

Tests: deserializeRejects{WrongStateArrayCount,ShapeByteLengthMismatch,NegativeDimShape} -- each would crash the test process pre-fix. 100 KV tests green.
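The overflow-safe byte-count check described above might look roughly like the following. This is a sketch under assumptions: `expectedByteCount` and `chunkMatchesShape` are hypothetical names, and the element size is passed in as a plain integer rather than taken from an MLX dtype.

```swift
/// Returns the byte count implied by `shape` and `elementSize`, or nil when a
/// dimension is negative or the running product would overflow Int.
/// Hypothetical helper; the real check lives in the serializer's deserialize path.
func expectedByteCount(shape: [Int], elementSize: Int) -> Int? {
    guard elementSize > 0 else { return nil }
    var total = elementSize
    for dim in shape {
        guard dim >= 0 else { return nil }
        // multipliedReportingOverflow never traps; it reports overflow as a flag.
        let (product, overflow) = total.multipliedReportingOverflow(by: dim)
        if overflow { return nil }
        total = product
    }
    return total
}

/// True only when the chunk length agrees with the descriptor's shape.
/// A caller would throw (and treat the file as a cold miss) when this is false,
/// before any array is constructed from the bytes.
func chunkMatchesShape(chunkLength: Int, shape: [Int], elementSize: Int) -> Bool {
    expectedByteCount(shape: shape, elementSize: elementSize) == chunkLength
}

// A bf16 tensor of shape [1, 8, 256, 64] occupies 1*8*256*64*2 = 262144 bytes.
let goodShape = chunkMatchesShape(chunkLength: 262_144, shape: [1, 8, 256, 64], elementSize: 2)
let negativeDim = expectedByteCount(shape: [1, -8, 256, 64], elementSize: 2)
let overflowing = expectedByteCount(shape: [Int.max, 2], elementSize: 2)
```

Using `multipliedReportingOverflow` instead of `shape.reduce(1, *)` is what turns the trap into a recoverable nil.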

anupsv · 2026-05-29T07:44:54Z

Follow-up adversarial review — confirming the Codex bug class isn't lurking elsewhere

Ran a multi-lens sweep of the whole encrypted-KV-cache surface (6 lenses: binding-completeness, file-lifecycle/durability, crypto-correctness, serialization-fidelity, index-integrity/paths, concurrency/integration), each finding adversarially verified by an independent skeptic, plus a completeness critic. 11 raw findings; 8 refuted (unreachable / already-guarded / out-of-scope roadmap items), the rest confirmed and fixed below. Two further residuals surfaced when a crypto auditor reviewed the fixes themselves.

Fixed

1. fetchPrefix length desync — corrupts generation (submodule, found by 2 lenses independently)
On a cold (persistence) hit with a saturated block pool (allocateBlock()==nil), the loaded block was merged into the returned cache but cachedTokens was computed from matchedIds (GPU-resident only), undercounting it. remainingTokens then overlapped the seeded KV → the scheduler re-prefilled those positions → duplicated KV at wrong offsets → corrupted output. Fix: cachedTokens = matchedPerBlock.count * bs (the actual merged width). Submodule cc79368.

2. KV shape not bound to the model (G1) — the MB-1 guard compared only metadata integers; the tensors that seed attention come from layout.layers[].arrays[].shape, never cross-checked. A self-consistent file whose layout disagrees with the live model (weights changed under same id, or a foreign file) would seed wrong-shaped KV. Added validateLayout (binds rank + dims [1]=kvHeads, [3]=headDim) in both load paths.

3. metaState fed to fatalError-ing setters (G2) — reconstruct handed unvalidated strings to the engine's metaState setters, which fatalError (uncatchable) on malformed input. A single stale/foreign RotatingKVCache file would crash the provider. Pre-validate and throw → cold miss.

4. SSD path traversal (B) — loadFromSSD joined the unauthenticated index's entry.relativePath to cacheDir (../ could escape). Now the path is reconstructed deterministically from the trusted binding + digest; the stored relativePath is ignored.

5–6. Two residual uncatchable-crash gaps (found reviewing the fixes) — same class as G2, still reachable: the state setter also fatalErrors on array-count≠2 (only aggregate chunk count was checked), and MLXArray(data:shape:dtype:) hard-traps when shape.product*dtype.size != byteCount (or on a negative/overflowing dim). Both now throw → cold miss.
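As an illustration of fix 4, the on-disk path can be rebuilt from trusted values only, so nothing read from the unauthenticated index reaches the file system. The sketch below is an assumption about the shape of that logic: `blockFileURL`, the two-level directory layout, and the lowercase-hex restriction are all hypothetical; only the `.darkbloom-kv` extension comes from the discussion above.

```swift
import Foundation

/// Builds the cache file URL from a binding id and a prefix digest, both expected
/// to be lowercase hex strings computed locally. Returns nil for anything else,
/// so "/" and ".." can never appear in a path component.
/// Hypothetical helper; the stored relativePath is deliberately not a parameter.
func blockFileURL(cacheDir: URL, bindingHex: String, digestHex: String) -> URL? {
    let hexDigits = "0123456789abcdef"
    func isLowerHex(_ s: String) -> Bool {
        !s.isEmpty && s.allSatisfy { hexDigits.contains($0) }
    }
    guard isLowerHex(bindingHex), isLowerHex(digestHex) else { return nil }
    return cacheDir
        .appendingPathComponent(bindingHex, isDirectory: true)
        .appendingPathComponent(digestHex + ".darkbloom-kv", isDirectory: false)
}

let cacheDir = URL(fileURLWithPath: "/tmp/kv")
let valid = blockFileURL(cacheDir: cacheDir, bindingHex: "ab12", digestHex: "cd34")
let escaped = blockFileURL(cacheDir: cacheDir, bindingHex: "ab12", digestHex: "../../etc/passwd")
```

The design point is that the function has no way to accept a caller-supplied relative path at all, which is a stronger guarantee than sanitizing one.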

Refuted (no action)

Torn/orphan-file "permanent defeat" (atomic write + self-heal on re-store), serializer fatalError on inconsistent layout (no producible/forgeable input reaches it — GCM AAD + chunk-size pins), layout.version skew (single writer, version gate), TTL/expiry "stale forever" (reserved field, not a control; not the seed class), unbounded disk growth (documented P6, unwired path).

Coverage

100 provider-swift KV tests + 11 submodule prefix-cache tests green. New regressions: cold-saturated-pool no-overlap, validateLayout accept/reject, malformed-metaState (3), wrong-array-count, shape/byte-length mismatch, negative-dim, malicious-relativePath ignored; the MB-1 and prefix-hash manager tests were rewritten to exercise the residual threats (model-dir-prefix collision; same-dir on-disk swap) now that the path is reconstructed.

Commits: cc79368 (submodule), da3fb6a3, 413c3082.

New docs/ssd-kv-cache.md documents the encrypted SSD KV cache as it actually runs: the two tiers (wired engine block cache + the built-but-unwired checkpoint manager), the store/evict/load data path, the DBKV file format, the SE-wrapped-KEK / per-file-DEK envelope, KVCacheSerializer, the DARKBLOOM_PREFIX_CACHE flag + prerequisites, the load-path verification ladder (model/shape/prefix-hash/path/tensor-shape/decode-safety, all fail closed to a cold miss), exact-checkpoint matching, failure modes, on-disk layout, the TB-007 security model, and a code/test map. Cross-linked from the design doc, which remains the rationale/threat-model/phased plan.

…P2s) Addresses 3 Codex P2 findings on the encrypted SSD KV cache (all only matter when DARKBLOOM_PREFIX_CACHE is enabled, default off):

- Weight-hash binding (#1): the cache dir key and metadata modelHash were derived from the mutable modelId. A re-download under the same id with different weights left old encrypted KV that still passed the model/shape guards and could seed generation with stale-weight KV, corrupting output. Now bound to ModelInfo.weightHash when available (falls back to modelId) for both the directory and the MB-1 binding, so a weight change yields a fresh dir + binding and old KV is invalidated.
- Memory budget (#2): the engine block cache used a fixed maxBlocks=4096 (4096*256 tokens of KV held OUTSIDE the scheduler kvBudget) -> tens to hundreds of GB on large models -> OOM even though admission thinks requests fit. maxBlocks is now derived from a memory budget (DARKBLOOM_PREFIX_CACHE_MAX_GB, default 1/8 physical RAM) and the model's per-token KV bytes; the cache disables itself if even one block would not fit.
- Disk eviction (#3): EncryptedPrefixCachePersistence wrote a file per evicted block and never cleaned up, growing until the volume filled (breaking later cache writes and model downloads). Added an LRU byte-budget sweep (DARKBLOOM_PREFIX_CACHE_DISK_GB, default 10 GB; 0 = unlimited), amortized so the directory scan does not run on every block.

Tests: prefixCacheBindingId / prefixCacheMaxBlocks helpers (weight binding + memory scaling/clamp); disk-budget eviction keeps usage within budget while evicting oldest. 101 KV + 10 BatchScheduler tests green.

… doc

- threat-model.yaml (#4): the SE-key block was stale relative to the code it documents as the KV-cache security sign-off record. Corrected the label v1 -> v2 (current defaultLabel; legacy v1 is migrated on load) and documented that the same persistent SE key now performs ECIES unwrap/decryption for the KV-cache KEK, not only ECDSA signing -- so the decryption capability granted to entitled same-team binaries and the actual keychain item to audit are both visible.
- ssd-kv-cache.md: synced to the weight-hash binding, memory-bounded maxBlocks, and disk-eviction behavior, plus the two new budget env vars (DARKBLOOM_PREFIX_CACHE_MAX_GB / _DISK_GB).

anupsv · 2026-05-29T19:38:44Z

Codex review (round 2) — verified + addressed

All four P2 findings verified against the code and fixed. Commits 2ebbcea0 (code+tests) and 3ed78753 (docs).

#1 — Bind prefix cache to weight hash (BatchScheduler:320): real, fixed.
Confirmed: the dir key and metadata.modelHash came from the mutable modelId, so a re-download under the same id with different weights left old KV that still passes the model/shape guards → stale-weight KV could seed generation. Now bound to ModelInfo.weightHash when the catalog provides it (falls back to modelId), for both the on-disk directory and the MB-1 binding — a weight change yields a fresh dir + binding, so old KV is neither found nor accepted. (ModelInfo.weightHash was already available at the load-model call site; threaded through loadModel → makeBatchedEngine → makeEncryptedPrefixPersistenceIfEnabled.)
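A small sketch of the binding rule in #1: prefer the weight hash, fall back to the model id, and reject any stored block whose binding differs from the live one. The type and function names here (`ModelBinding`, `liveBinding`, `acceptsStoredBlock`) are hypothetical stand-ins, not the signatures of PrefixCacheModelBinding or prefixCacheBindingId.

```swift
/// Minimal stand-in for the model-binding metadata stored with each block.
struct ModelBinding: Equatable {
    let modelHash: String
}

/// Uses the weight hash when the catalog provides one, otherwise the model id.
func liveBinding(modelId: String, weightHash: String?) -> ModelBinding {
    if let hash = weightHash, !hash.isEmpty {
        return ModelBinding(modelHash: hash)
    }
    return ModelBinding(modelHash: modelId)
}

/// A stored block is usable only when its binding equals the live model's binding.
func acceptsStoredBlock(stored: ModelBinding, live: ModelBinding) -> Bool {
    stored == live
}

// Same model id, different weights: the old block must be rejected.
let oldWeights = liveBinding(modelId: "org/model", weightHash: "aaaa")
let newWeights = liveBinding(modelId: "org/model", weightHash: "bbbb")
let staleAccepted = acceptsStoredBlock(stored: oldWeights, live: newWeights)

// No weight hash available: the binding degrades to the model id.
let fallback = liveBinding(modelId: "org/model", weightHash: nil)
```

The fallback keeps the cache usable for models without a catalog hash, at the cost of the weaker id-only binding for those models.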

#2 — Bound in-memory prefix cache by KV budget (BatchScheduler:235): real, fixed.
Confirmed: maxBlocks=4096 × blockSize=256 = 1,048,576 tokens of KV (roughly 400 GB worst-case), held outside the scheduler's kvBudget. maxBlocks is now derived from a memory budget (DARKBLOOM_PREFIX_CACHE_MAX_GB, default 1/8 physical RAM) ÷ the model's kvBytesPerToken (reusing the existing estimator), clamped to the old ceiling; the cache disables itself if even one block won't fit.
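The sizing rule in #2 reduces to a few lines of integer arithmetic. The sketch below assumes a hypothetical `maxBlocks` function that returns nil when the cache should be disabled; the block size of 256 and the ceiling of 4096 are the values quoted above, while the example budgets and per-token sizes are made up.

```swift
/// Number of prefix-cache blocks that fit in `budgetBytes`, clamped to `ceiling`.
/// Returns nil when not even one block fits, meaning the cache should stay off.
/// Hypothetical helper, not the signature of prefixCacheMaxBlocks.
func maxBlocks(budgetBytes: Int, kvBytesPerToken: Int,
               blockSize: Int = 256, ceiling: Int = 4096) -> Int? {
    guard budgetBytes > 0, kvBytesPerToken > 0, blockSize > 0 else { return nil }
    let (bytesPerBlock, overflow) = kvBytesPerToken.multipliedReportingOverflow(by: blockSize)
    guard !overflow else { return nil }
    let blocks = budgetBytes / bytesPerBlock
    guard blocks >= 1 else { return nil }
    return min(blocks, ceiling)
}

// 16 GiB budget, 128 KiB of KV per token: 32 MiB per block, so 512 blocks.
let typical = maxBlocks(budgetBytes: 16 << 30, kvBytesPerToken: 128 << 10)
// Tiny per-token KV: the raw quotient (65536) is clamped to the old ceiling.
let clamped = maxBlocks(budgetBytes: 16 << 30, kvBytesPerToken: 1 << 10)
// 1 MiB budget cannot hold a single 32 MiB block: cache disabled.
let disabled = maxBlocks(budgetBytes: 1 << 20, kvBytesPerToken: 128 << 10)
```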

#3 — Add eviction for persisted prefix blocks (EncryptedPrefixCachePersistence:75): real, fixed.
Confirmed: the wired backend wrote a file per evicted block with no cleanup → unbounded growth. Added an LRU byte-budget sweep (DARKBLOOM_PREFIX_CACHE_DISK_GB, default 10 GB; 0 = unlimited), evicting oldest .darkbloom-kv by mtime, amortized so the dir scan doesn't run on every block. (Note: an earlier review pass had refuted "unbounded growth" — but that was about the unwired PrefixCacheManager tier; Codex correctly targets the wired EncryptedPrefixCachePersistence, which did need this.)
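The eviction policy in #3 (oldest files first until usage fits the budget, with 0 meaning unlimited) can be sketched as a pure function over directory metadata. `CachedFile` and `filesToEvict` are hypothetical; the real sweep also handles the directory scan, amortization, and deletion, none of which is shown.

```swift
import Foundation

/// Metadata for one cache file, as a directory scan might report it.
struct CachedFile {
    let name: String
    let size: Int
    let modified: Date
}

/// Returns the names to delete, oldest modification time first, until the
/// remaining total fits within `budgetBytes`. A budget of 0 means unlimited.
func filesToEvict(_ files: [CachedFile], budgetBytes: Int) -> [String] {
    guard budgetBytes > 0 else { return [] }
    var total = files.reduce(0) { $0 + $1.size }
    var evicted: [String] = []
    for file in files.sorted(by: { $0.modified < $1.modified }) {
        if total <= budgetBytes { break }
        total -= file.size
        evicted.append(file.name)
    }
    return evicted
}

let files = [
    CachedFile(name: "c", size: 100, modified: Date(timeIntervalSince1970: 3)),
    CachedFile(name: "a", size: 100, modified: Date(timeIntervalSince1970: 1)),
    CachedFile(name: "b", size: 100, modified: Date(timeIntervalSince1970: 2)),
]
// 300 bytes on disk, 150-byte budget: the two oldest files go.
let toEvict = filesToEvict(files, budgetBytes: 150)
let unlimited = filesToEvict(files, budgetBytes: 0)
```

Keeping the policy separate from the file-system calls makes it testable without touching disk.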

#4 — Correct SE key label and allowed operations (threat-model.yaml:288): real, fixed.
Confirmed stale: defaultLabel is ...attestation-signing.v2 (doc said v1), and the same persistent SE key is now also used by PersistentEnclaveKey+ECIES for KEK unwrap/decryption (doc said "only signing operations are available"). Corrected the label (v1→v2, legacy migrated on load) and documented the ECIES decryption capability + the keychain item to audit, in all three places the doc referenced the key.

Tests: prefixCacheBindingId / prefixCacheMaxBlocks (weight binding + memory scaling/clamp) and disk-budget eviction (stays within budget, evicts oldest). 101 KV + 10 BatchScheduler tests green. The how-it-works doc (docs/ssd-kv-cache.md) is synced to all three behaviors + the new env vars.

The on-disk prefix-cache budget defaulted to a flat 10 GB regardless of the actual disk. Make the default 50% of the cache volume's free capacity, measured live at model load (volumeAvailableCapacityForImportantUsage, falling back to raw available capacity). The DARKBLOOM_PREFIX_CACHE_DISK_GB env override still wins (0 = unlimited); a near-full disk yields a tiny positive budget (evict-almost-everything) rather than the env's 0-means-unlimited; if free space can't be read, falls back to 10 GB. Split into a pure, testable policy (resolveDiskBudget) + a free-space probe (volumeFreeBytes). Tests: 50%-of-free, env override, env-0-unlimited, near-full -> tiny positive, unknown -> 10 GB fallback, and a live free-capacity read. 113 KV + BatchScheduler tests green.
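The policy described in this commit can be written as a pure function, which is what makes it testable without a real volume. The sketch below is an assumption about its shape: `DiskBudget`, `resolveDiskBudgetSketch`, the use of GiB for "GB", and the upper bound on the env value are all illustrative choices rather than the signature of resolveDiskBudget.

```swift
/// Outcome of the disk-budget policy.
enum DiskBudget: Equatable {
    case unlimited
    case bytes(Int)
}

/// - envGB: parsed value of the env override, if set.
/// - freeBytes: free capacity of the cache volume, or nil if it could not be read.
func resolveDiskBudgetSketch(envGB: Double?, freeBytes: Int?) -> DiskBudget {
    let gib = 1 << 30
    // A sane env override wins; 0 explicitly means unlimited.
    // The 1_000_000 GB cap is an assumed bound that keeps the Int conversion safe.
    if let gb = envGB, gb.isFinite, gb >= 0, gb <= 1_000_000 {
        if gb == 0 { return .unlimited }
        return .bytes(max(1, Int(gb * Double(gib))))
    }
    // Free space unknown: fall back to a flat 10 GB.
    guard let free = freeBytes, free >= 0 else { return .bytes(10 * gib) }
    // Default: half of free space, but never 0, which would read as unlimited.
    return .bytes(max(1, free / 2))
}

let halfOfFree = resolveDiskBudgetSketch(envGB: nil, freeBytes: 200 << 30)
let envOverride = resolveDiskBudgetSketch(envGB: 5, freeBytes: 200 << 30)
let envUnlimited = resolveDiskBudgetSketch(envGB: 0, freeBytes: 200 << 30)
let nearFull = resolveDiskBudgetSketch(envGB: nil, freeBytes: 1)
let unknown = resolveDiskBudgetSketch(envGB: nil, freeBytes: nil)
```

Modelling "unlimited" as its own case avoids overloading 0, which is the ambiguity the near-full-disk rule exists to sidestep.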

…versized writes

Address the adversarial review of the budget/binding changes.

- Env crash (medium, x2): prefixCacheBudgetBytes (MAX_GB) and resolveDiskBudget (DISK_GB) did Int(gb * 2^30) guarded only by a sign check. Double("inf")/"1e400"/huge values pass and Int(Double) TRAPS on non-finite/overflow (uncatchable) -> provider crash on model load. MAX_GB is read unconditionally to size maxBlocks, so it crashed even with the cache OFF. Now reject non-finite / out-of-range values back to the default. Extracted a pure resolveMemoryBudget mirror for testability.
- Orphaned directories (low): keying the cache dir by weightHash (the prior weight-binding fix) created a fresh, never-swept directory on every re-download. Key the dir by the MODEL id instead (stable) and keep the MB-1 binding on the weight hash: stale-weight files are rejected AND deleted by loadBlock on access and aged out by the sweep, giving invalidation without leaking dirs.
- Write-then-delete treadmill (low): when a single block exceeds the disk budget (tiny env value / near-full disk) saveBlock wrote then immediately evicted it. Skip the write up front instead.

Tests: resolveMemoryBudget + resolveDiskBudget now cover inf/NaN/overflow (would have trapped pre-fix); write-skip-when-oversized; delete-on-MB1-mismatch.

Docs: corrected the sample log line, the dir-key scheme, and an explicit per-model (not global), measured-once disk-bounding note.

116 KV + BatchScheduler tests green.
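The env-crash fix comes down to validating a Double before converting it to Int, since `Int(_:)` traps on infinity, NaN, and out-of-range values. A minimal sketch of such a guard follows; `gigabytesToBytes` is a hypothetical name, and returning nil stands in for "fall back to the default".

```swift
/// Converts a gigabyte count (2^30 bytes each) to bytes without ever trapping.
/// Returns nil for negative, NaN, infinite, or too-large inputs.
func gigabytesToBytes(_ gb: Double) -> Int? {
    guard gb.isFinite, gb >= 0 else { return nil }
    let bytes = gb * 1_073_741_824.0
    // Double(Int.max) rounds up to 2^63, so a strict comparison keeps the
    // value inside Int's representable range before truncating.
    guard bytes < Double(Int.max) else { return nil }
    return Int(bytes)
}

let normal = gigabytesToBytes(2)
let infinite = gigabytesToBytes(.infinity)
let notANumber = gigabytesToBytes(.nan)
let huge = gigabytesToBytes(1e300)
let negative = gigabytesToBytes(-1)
```

A sign check alone lets `.infinity` and `1e300` through, which is the gap the commit describes.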

Gajesh2007 and others added 14 commits May 27, 2026 22:29

anupsv added 2 commits May 28, 2026 19:39

anupsv marked this pull request as ready for review May 29, 2026 03:26

chatgpt-codex-connector Bot reviewed May 29, 2026

anupsv added 2 commits May 29, 2026 12:38

anupsv wants to merge 27 commits into master from feat/ssd-kv-cache
