diff --git a/CHANGELOG.md b/CHANGELOG.md index 194184c..77fde54 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,38 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added +- **CODE-EVAL.md** — Forensic architectural audit (zero-knowledge code extraction, critical assessment, roadmap reconciliation, prescriptive blueprint). +- **M16 Capstone** — New milestone in ROADMAP.md addressing all 9 audit flaws and 10 concerns (C1–C10). 13 task cards, ~698 LoC, ~21h estimated. +- **Concerns C8–C10** — Three new architectural concerns identified by the audit: crypto adapter LSP violation (C8), FixedChunker quadratic allocation (C9), encrypt-then-chunk dedup loss (C10). +- **CasError codes** — `RESTORE_TOO_LARGE` and `ENCRYPTION_BUFFER_EXCEEDED` registered in canonical error code table. + +### Changed +- **VaultService test observability wiring** — `VaultService.test.js` now passes a `mockObservability()` port to all tests instead of relying on the silent no-op default. `rotateVaultPassphrase.test.js` now passes `SilentObserver` explicitly. If observability wiring breaks, the test suite will catch it. +- **`NodeCryptoAdapter.encryptBuffer` JSDoc** — `@returns` annotation corrected to `Promise<...>`, matching the async implementation. +- **`maxRestoreBufferSize` documented** — constructor JSDoc and `#config` type in `ContentAddressableStore` now include the parameter. +- **ROADMAP.md heading level** — added `## Task Cards` heading between `# M16` and `### 16.1` to satisfy MD001 heading-increment rule. + +### Fixed +- **Post-decompression size guard** — `_restoreBuffered` now enforces `maxRestoreBufferSize` after decompression, not just before. Compressed payloads that inflate beyond the configured limit now throw `RESTORE_TOO_LARGE` instead of silently allocating unbounded memory. 
+- **CLI passphrase prompt deferral** — `resolveEncryptionKey` now checks vault metadata before calling `resolvePassphrase`, avoiding unnecessary TTY prompts for unencrypted vaults. Store action recipient-conflict check inspects flags/env without consuming stdin. +- **CRLF passphrase normalization** — `readPassphraseFile` now strips trailing `\r\n` (Windows line endings) in addition to `\n`, preventing passphrase mismatches from Windows-edited files. +- **Constructor validation** — `CasService.maxRestoreBufferSize` (integer >= 1024), `WebCryptoAdapter.maxEncryptionBufferSize` (finite, positive), and `FixedChunker.chunkSize` (positive integer) are now validated at construction time, preventing silent misconfiguration. +- **Error-path test hardening** — `orphanedBlobs`, `restoreGuard`, `kdfBruteForce`, and `conformance` tests now fail explicitly when expected errors are not thrown (previously silent pass-through). +- **16.8 — CasError portability guard** — `Error.captureStackTrace` now guarded with a runtime check. CasError constructs correctly on runtimes where `captureStackTrace` is unavailable (e.g. Firefox and other engines without V8's `captureStackTrace`). +- **16.9 — Pre-commit hook + hooks directory** — `scripts/git-hooks/` renamed to `scripts/hooks/` per CLAUDE.md convention. New `pre-commit` hook runs lint gate. `install-hooks.sh` updated accordingly. +- **16.1 — Crypto adapter behavioral normalization** — `NodeCryptoAdapter.encryptBuffer` now returns a Promise (was sync), matching Bun/Web. `decryptBuffer` validates key on all adapters. `NodeCryptoAdapter.createEncryptionStream` guards `finalize()` with `STREAM_NOT_CONSUMED`. New conformance test suite asserts identical contracts across all adapters. +- **16.2 — Memory restore guard** — `CasService` accepts `maxRestoreBufferSize` (default 512 MiB). `_restoreBuffered` throws `RESTORE_TOO_LARGE` with `{ size, limit }` meta when encrypted/compressed restore would exceed the limit. Unencrypted streaming restore is unaffected. 
+- **16.11 — Passphrase input security** — New `--vault-passphrase-file` CLI option reads passphrase from file (use `-` for stdin). Interactive TTY prompt added as fallback when no other passphrase source is available. `resolvePassphrase` is now async with priority: file → flag → env → TTY → undefined. +- **16.6 — Chunk size upper bound** — CasService, FixedChunker, and CdcChunker now reject chunk sizes exceeding 100 MiB. CasService logs a warning when chunk size exceeds 10 MiB. +- **16.3 — Web Crypto encryption buffer guard** — `WebCryptoAdapter` accepts `maxEncryptionBufferSize` (default 512 MiB). Throws `ENCRYPTION_BUFFER_EXCEEDED` when streaming encryption exceeds the limit, since Web Crypto AES-GCM is a one-shot API. NodeCryptoAdapter uses true streaming and is unaffected. +- **16.5 — Encrypt-then-chunk dedup warning** — `CasService.store()` now logs a warning when encryption is combined with CDC chunking, since ciphertext is pseudorandom and content-defined boundaries provide no dedup benefit. +- **16.10 — Orphaned blob tracking** — `STREAM_ERROR` now includes `meta.orphanedBlobs` — an array of OIDs for blobs successfully written before the stream failure. Error metric includes `orphanedBlobs` count for observability. +- **16.4 — FixedChunker pre-allocated buffer** — Replaced `Buffer.concat()` loop with a pre-allocated `Buffer.allocUnsafe(chunkSize)` working buffer, eliminating O(n²) copies for many small input buffers. Matches the allocation strategy used by `CdcChunker`. +- **16.7 — Lifecycle method naming** — Added `inspectAsset()` (replaces `deleteAsset()`) and `collectReferencedChunks()` (replaces `findOrphanedChunks()`) as canonical names on both `CasService` and the facade. Old names are preserved as deprecated aliases that emit observability warnings. Type definitions updated with `@deprecated` JSDoc. 
+- **16.12 — KDF brute-force awareness** — `CasService` now emits `decryption_failed` metric with slug context when decryption fails with `INTEGRITY_ERROR` during encrypted restore. CLI adds a 1-second delay after `INTEGRITY_ERROR` to slow brute-force attempts. Library API imposes no delay — callers manage their own rate-limiting policy. +- **16.13 — GCM nonce collision docs + encryption counter** — `SECURITY.md` moved to project root with new sections: GCM nonce bound (2^32 NIST limit), key rotation frequency, KDF parameter guidance, and passphrase entropy recommendations. Vault metadata now tracks `encryptionCount`, incremented per encrypted `addToVault()`. Observability warning emitted when count exceeds 2^31. `VaultService` accepts optional `observability` port. + ## [5.2.4] — Prism polish (2026-03-03) ### Fixed diff --git a/CODE-EVAL.md b/CODE-EVAL.md new file mode 100644 index 0000000..3ff5cce --- /dev/null +++ b/CODE-EVAL.md @@ -0,0 +1,605 @@ +# Forensic Architectural Audit: `@git-stunts/git-cas` + +**Audit Date:** 2026-03-03 +**Repository State:** `0f7f8e658e6cd094176541ac68d33b2a6ec75a91` (HEAD, `main`) +**Auditor:** Claude Opus 4.6, operating under zero-knowledge forensic protocol +**Version Under Audit:** 5.2.4 + +--- + +## Activity Log — Discovery Narrative + +The exploration began at the repository root with a simultaneous five-pronged dive: core domain services, infrastructure adapters, ports/codecs/chunkers, test structure, and type definitions. The first thing that jumped out — before reading a single line of code — was the file tree. Thirty-one source files, twelve bin files, sixty-one test files. A 3.1:1 test-to-source ratio. That alone telegraphs intent: someone cares about correctness here. + +The ports directory was my Rosetta Stone. Six abstract base classes — `CryptoPort`, `CodecPort`, `GitPersistencePort`, `GitRefPort`, `ObservabilityPort`, `ChunkingPort` — each throwing `'Not implemented'`. Textbook hexagonal architecture. 
I already knew this was a ports-and-adapters system before reading a single service file. + +`CasService.js` at 911 lines is the gravitational center. It imports no infrastructure directly — only ports. Good. `KeyResolver.js` (220 lines) handles all cryptographic key orchestration, recently extracted from CasService (the M15 Prism task card confirmed this). `VaultService.js` (467 lines) operates on a separate Git ref (`refs/cas/vault`) with compare-and-swap concurrency control. + +The three crypto adapters (`NodeCryptoAdapter`, `WebCryptoAdapter`, `BunCryptoAdapter`) are where I started changing my initial opinions. I expected copy-paste sloppiness — instead I found runtime-specific optimizations (Bun's native `CryptoHasher`, Web Crypto's `subtle` API) all converging on identical cryptographic parameters: AES-256-GCM, 12-byte nonce, 16-byte tag, SHA-256 content hashing. But the behavioral discrepancies between adapters (see Phase 2) tell a more nuanced story. + +The CDC chunker (`CdcChunker.js`) surprised me. A hand-rolled buzhash rolling hash with a 64-byte sliding window, xorshift64-seeded lookup table, and three-phase processing pipeline (fill window, feed pre-minimum, scan boundary). This is not commodity code — it's a bespoke content-defined chunking engine. + +The test suite confirmed the architecture: 833+ unit tests, crypto is never mocked (always real adapters), persistence is always mocked (in-memory maps), integration tests gate on Docker (`GIT_STUNTS_DOCKER=1`). The fuzz testing coverage is noteworthy — 50-iteration fuzz rounds for crypto, chunking, and store/restore. + +The CLI (`bin/git-cas.js`, 657 lines) implements a full TEA (The Elm Architecture) interactive dashboard. That's architecturally ambitious for a storage utility. + +My opinion shifted most dramatically on the vault system. I initially expected a simple key-value store backed by a file. 
Instead, it's a full commit chain on `refs/cas/vault` with optimistic concurrency control, exponential backoff retries, percent-encoded slug names, and atomic compare-and-swap ref updates. This is distributed-systems thinking applied to a local Git repo. + +--- + +## Phase 1: Zero-Knowledge Code Extraction + +### Deduced Value Proposition + +This system is a **content-addressed storage engine that uses Git's object database as its persistence layer**, with optional AES-256-GCM encryption, gzip compression, content-defined chunking, and a vault-based indexing system backed by Git refs. + +The core problem it solves: **storing, encrypting, versioning, and retrieving binary blobs entirely within Git's native object model** — no external servers, no sidecar databases, no LFS endpoints. Everything lives in `.git/objects` and is transportable via standard Git push/pull/clone. + +### Comprehensive Feature Set (Implemented) + +1. **Store**: Chunk a byte stream (fixed-size or CDC), optionally compress (gzip), optionally encrypt (AES-256-GCM), write chunks as Git blobs, produce a manifest. +2. **Restore**: Read chunks from Git blobs, verify SHA-256 integrity, decrypt, decompress, reassemble. +3. **Streaming Restore**: `restoreStream()` yields chunks as an async iterable — O(chunk_size) memory for unencrypted data. +4. **Content-Defined Chunking (CDC)**: Buzhash rolling hash with configurable min/max/target sizes. Deduplication-friendly. +5. **Fixed-Size Chunking**: Default 256 KiB, configurable. +6. **Merkle Tree Manifests**: Automatic manifest splitting when chunk count exceeds threshold (default 1000). Sub-manifest references with startIndex/chunkCount. +7. **Envelope Encryption**: DEK/KEK model. Random 32-byte DEK encrypts data; each recipient's KEK wraps the DEK independently. +8. **Multi-Recipient Management**: Add/remove recipients without re-encrypting data. +9. **Key Rotation**: Re-wrap DEK with new KEK. No data re-encryption — O(1) key rotation. +10. 
**Passphrase-Based Encryption**: PBKDF2 or scrypt KDF with configurable parameters. +11. **Vault System**: Git-ref-backed (`refs/cas/vault`) content registry with CAS (compare-and-swap) concurrency control. +12. **Vault Passphrase Rotation**: Re-wrap all envelope-encrypted vault entries with a new passphrase-derived KEK. +13. **Integrity Verification**: Per-chunk SHA-256 + GCM auth tag for encrypted data. +14. **Orphan Detection**: `findOrphanedChunks()` — reference-counting analysis across vault entries. +15. **Codec Pluggability**: JSON (human-readable) or CBOR (compact binary) manifests. +16. **Multi-Runtime Support**: Node.js 22, Bun, Deno — with runtime-specific crypto adapters. +17. **Observability**: Structured metrics (`chunk:stored`, `file:stored`, `integrity:pass/fail`), log levels, span tracing. +18. **CLI**: 18 commands including store, restore, verify, inspect, rotate, vault management, and an interactive TEA dashboard. +19. **Parallel I/O**: Semaphore-bounded concurrent blob writes (store) and read-ahead window (restore). +20. **File I/O Helpers**: `storeFile()` / `restoreFile()` for file-to-file convenience. + +### API Surface & Boundary + +**Public entrypoints** (as defined by package.json/jsr.json exports): + +| Entrypoint | Module | Primary Export | +|---|---|---| +| `.` (root) | `index.js` | `ContentAddressableStore` facade class | +| `./service` | `src/domain/services/CasService.js` | `CasService` (direct domain access) | +| `./schema` | `src/domain/schemas/ManifestSchema.js` | Zod schemas (ManifestSchema, ChunkSchema, etc.) 
| + +**Facade API** (`ContentAddressableStore`): + +| Method | Return | +|---|---| +| `store(options)` | `Promise` | +| `restore(options)` | `Promise<{ buffer, bytesWritten }>` | +| `restoreStream(options)` | `AsyncIterable` | +| `createTree(options)` | `Promise` (tree OID) | +| `readManifest(options)` | `Promise` | +| `verifyIntegrity(options)` | `Promise` | +| `deleteAsset(options)` | `Promise<{ slug, chunksOrphaned }>` | +| `findOrphanedChunks(options)` | `Promise<{ referenced, total }>` | +| `rotateKey(options)` | `Promise` | +| `addRecipient(options)` | `Promise` | +| `removeRecipient(options)` | `Promise` | +| `listRecipients(manifest)` | `string[]` | +| `deriveKey(options)` | `Promise<{ key, salt, params }>` | +| `getVaultService()` | `VaultService` | +| `rotateVaultPassphrase(options)` | `Promise<{ commitOid, rotatedSlugs, skippedSlugs }>` | + +**External system interface:** +- **Ingress**: File paths, byte streams (`AsyncIterable`), encryption keys (32-byte `Buffer`), passphrases (strings), vault slugs (strings). +- **Egress**: Git blob/tree OIDs (40-char hex strings), `Manifest` value objects, byte buffers, vault entries. +- **Infrastructure boundary**: All Git operations flow through `@git-stunts/plumbing` → `git` CLI subprocess. 
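The port boundary described above can be illustrated with a minimal sketch of the abstract-base-class pattern: a port whose methods throw `'Not implemented'` until an adapter supplies them. `InMemoryFixedChunker` is a hypothetical adapter written for this sketch; the library's real ports live in `src/domain/ports/` and differ in detail.

```javascript
// Port base class: every method throws until an adapter overrides it,
// mirroring the six abstract ports described in the Activity Log.
class ChunkingPort {
  async *chunk(_source) {
    throw new Error('Not implemented');
  }
}

// Hypothetical adapter (illustrative only): accumulate incoming pieces
// and emit fixed-size slices.
class InMemoryFixedChunker extends ChunkingPort {
  constructor(chunkSize) {
    super();
    if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
      throw new Error('chunkSize must be a positive integer');
    }
    this.chunkSize = chunkSize;
  }

  async *chunk(source) {
    let pending = Buffer.alloc(0);
    for await (const piece of source) {
      pending = Buffer.concat([pending, piece]);
      while (pending.length >= this.chunkSize) {
        yield pending.subarray(0, this.chunkSize);
        pending = pending.subarray(this.chunkSize);
      }
    }
    if (pending.length > 0) yield pending;
  }
}
```

The domain only ever sees `ChunkingPort`; which adapter is wired in is the facade's concern, which is why the dependency direction stays strictly inward.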
+ +### Internal Architecture & Components + +``` +┌─────────────────────────────────────────────────────────┐ +│ ContentAddressableStore (index.js) — Facade │ +│ Wires ports, exposes unified API │ +└──────────────────────┬──────────────────────────────────┘ + │ + ┌─────────────┼──────────────┐ + │ │ │ +┌────────▼──────┐ ┌────▼─────┐ ┌─────▼──────────────────┐ +│ CasService │ │ Vault │ │ rotateVaultPassphrase │ +│ (911 lines) │ │ Service │ │ (standalone function) │ +│ │ │(467 lines│ └────────────────────────┘ +│ ┌───────────┐ │ └──────────┘ +│ │KeyResolver│ │ +│ │(220 lines)│ │ +│ └───────────┘ │ +└───────┬───────┘ + │ depends on (ports only) + ┌─────┼──────┬──────────┬────────────┐ + │ │ │ │ │ +┌─▼─┐ ┌▼──┐ ┌─▼──┐ ┌────▼────┐ ┌─────▼─────┐ +│Git│ │Git│ │Cry-│ │Observ- │ │Chunking │ +│Per│ │Ref│ │pto │ │ability │ │Port │ +│sis│ │Port│ │Port│ │Port │ │ │ +│ten│ │ │ │ │ │ │ │ │ +│ce │ │ │ │ │ │ │ │ │ +└─┬─┘ └─┬─┘ └──┬─┘ └────┬───┘ └─────┬─────┘ + │ │ │ │ │ + ▼ ▼ ▼ ▼ ▼ +┌───────────────────────────────────────────────┐ +│ Infrastructure Adapters │ +│ │ +│ GitPersistenceAdapter NodeCryptoAdapter │ +│ GitRefAdapter WebCryptoAdapter │ +│ FileIOHelper BunCryptoAdapter │ +│ EventEmitterObserver │ +│ JsonCodec / CborCodec SilentObserver │ +│ FixedChunker StatsCollector │ +│ CdcChunker │ +└───────────────────────────────────────────────┘ +``` + +The dependency direction is strictly inward: domain depends on ports (interfaces), infrastructure depends on ports (implements). The facade wires them together. No domain module imports any infrastructure module. + +### Mechanics & Internals + +#### Algorithms + +**Content-Defined Chunking (Buzhash):** +- Rolling hash over a 64-byte sliding window. +- Lookup table: 256-entry `Uint32Array` generated via xorshift64 PRNG seeded with `0x6a09e667f3bcc908` (SHA-256's first fractional prime constant — a nice touch). +- Hash update: `hash = (rotl32(hash, 1) ^ table[outgoing] ^ table[incoming]) >>> 0`. 
+- Boundary detection: `(hash & mask) === 0` where `mask = (1 << floor(log2(targetChunkSize))) - 1`. +- Three-phase pipeline: fill window (first 64 bytes), feed pre-minimum (accumulate until min chunk size), scan boundary (check on each byte until boundary or max). +- **Complexity**: O(n) where n = input bytes. Each byte requires one table lookup, one XOR, one rotate. The mask test is O(1). + +**Encryption:** +- AES-256-GCM with 12-byte random nonce and 16-byte authentication tag. +- Streaming encryption wraps the chunk pipeline (encrypt-then-chunk: the ciphertext is chunked, not the plaintext). +- DEK wrapping uses the same AES-256-GCM as data encryption — the DEK is treated as a 32-byte plaintext. + +**Key Derivation:** +- PBKDF2-HMAC-SHA-512 (default 100,000 iterations) or scrypt (default N=16384, r=8, p=1). +- Salt: 32 bytes random, stored in manifest. + +**Integrity:** +- SHA-256 digest per chunk (computed at store time, verified at restore time). +- GCM authentication tag for encrypted data (verified during decryption). +- Manifests validated by Zod schemas at construction time. + +#### Storage & Data Structures + +**Git Object Database:** +- Chunks stored as Git blobs via `git hash-object -w --stdin`. +- Manifests stored as Git blobs (JSON or CBOR encoded). +- Trees constructed via `git mktree` with mode `100644 blob` entries. +- Vault state stored as a commit chain on `refs/cas/vault`: + - Each commit points to a tree containing: `.vault.json` metadata blob + one `040000 tree` entry per vault slug. + +**In-Memory:** +- `Manifest` and `Chunk` are frozen value objects (immutable after construction). +- `Semaphore` uses a FIFO queue of promise resolvers. +- `StatsCollector` accumulates metrics in private fields. +- CDC chunker allocates a `Buffer.allocUnsafe(maxChunkSize)` working buffer per `chunk()` invocation. + +#### Memory Management + +**Store path:** +- Semaphore-bounded: at most `concurrency` chunk buffers in flight simultaneously. 
+- CDC chunker holds one `maxChunkSize` working buffer (~1 MiB default) plus the 64-byte sliding window. +- After chunking, the working buffer is copied via `Buffer.from(subarray)` — no aliasing. + +**Restore path (streaming, unencrypted):** +- Read-ahead window: up to `concurrency` chunk-sized buffers in memory. +- Chunks are yielded and become eligible for GC immediately after consumption. + +**Restore path (buffered, encrypted/compressed):** +- **All chunks are concatenated into a single buffer before decryption.** This is the documented memory amplification concern (Roadmap C1). A 1 GB encrypted file requires ~1 GB in memory for decryption, plus the decrypted result. + +**Web Crypto streaming encryption:** +- The `createEncryptionStream` on `WebCryptoAdapter` **buffers the entire stream** internally because Web Crypto's AES-GCM is a one-shot API. This silently converts O(chunk_size) memory to O(total_file_size) memory on Deno (Roadmap C4). + +#### Performance Characteristics + +| Operation | Time Complexity | Space Complexity | Blocking? 
| +|---|---|---|---| +| Store (fixed chunking) | O(n) | O(concurrency × chunkSize) | Git subprocess I/O | +| Store (CDC chunking) | O(n) | O(maxChunkSize + concurrency × chunkSize) | Git subprocess I/O | +| Restore (streaming, plain) | O(n) | O(concurrency × chunkSize) | Git subprocess I/O | +| Restore (buffered, encrypted) | O(n) | **O(n)** — full file in memory | Git subprocess I/O + decrypt | +| createTree (v1, < threshold) | O(k) where k = chunks | O(k) for tree entries | Git subprocess | +| createTree (v2, Merkle) | O(k) | O(k / threshold) sub-manifests | Git subprocess | +| readManifest (v2) | O(k) | O(sub-manifest count) reads | Git subprocess × sub-manifests | +| Key rotation | O(1) | O(1) — only re-wraps DEK | Constant | +| Vault CAS update | O(entries) | O(entries) for tree rebuild | Git subprocess | +| CDC boundary scan | O(n) total | O(1) per byte (table lookup + XOR) | CPU-bound | + +**Critical bottleneck:** Git subprocess spawning. Every `writeBlob`, `readBlob`, `writeTree`, `readTree` operation spawns a `git` child process. For a file with 1000 chunks at concurrency 4, that's ~1000 `git hash-object` invocations + ~1000 `git cat-file` invocations on restore. The `@git-stunts/plumbing` layer mitigates this somewhat but cannot eliminate the per-operation process overhead. + +--- + +## Phase 2: The Critical Assessment + +### Use Cases & Fitness + +**Optimized for:** +- Single-file binary asset storage (firmware images, data bundles, encrypted archives) in the 1 KB to ~500 MB range. +- Git monorepos where binary assets must travel with the code. +- Air-gapped or offline environments where external services are unavailable. +- Multi-recipient access control without re-encrypting data. + +**Where it will break:** +- **Files > 1 GB encrypted**: The `_restoreBuffered` path requires the entire file in memory for decryption. A 4 GB file on a machine with 8 GB RAM will OOM. +- **High-frequency writes**: Each chunk write spawns a Git subprocess. 
With ~5 ms process-spawn overhead per write, throughput caps at roughly 200 chunks/second single-threaded, far below any high-frequency ingest target. +- **Large repositories (>10 GB)**: Git's own performance degrades with ODB size. `git gc` becomes slow, pack files grow. +- **Web Crypto runtime (Deno) with large files**: The streaming encryption adapter silently buffers the entire file due to Web Crypto API limitations. +- **Concurrent vault mutations from multiple processes**: The CAS retry mechanism (3 attempts, 50-200ms backoff) handles light contention but will fail under sustained concurrent writes. + +### Design Trade-offs + +**1. Git subprocess for every blob operation vs. libgit2/in-process Git** + +- **Evidence:** + + - **Claim:** Every blob read/write spawns a `git` child process via `@git-stunts/plumbing`. + - **Primary Evidence:** `src/infrastructure/adapters/GitPersistenceAdapter.js:11-17` (`writeBlob` calls `plumbing.execute`) + - **Supporting Context:** `plumbing.execute()` and `plumbing.executeStream()` spawn `git` subprocesses. + - **Discovery Path:** `index.js` → `GitPersistenceAdapter` → `plumbing.execute` → `git hash-object` + - **Cryptographic Proof:** `git hash-object src/infrastructure/adapters/GitPersistenceAdapter.js` = `797be53113174ff8e86104fa97afda0748dd3fce` + +- **Systemic effect:** Process spawn overhead (~2-10ms per invocation) dominates I/O for small chunks. A 100 MB file with 256 KiB chunks = ~400 subprocess invocations for store + ~400 for restore. The `Policy.timeout(30_000)` wrapper adds resilience but not performance. +- **Trade-off rationale:** Using the `git` CLI ensures correctness across all Git configurations (bare repos, worktrees, custom object stores, alternates) without reimplementing Git's object database. It also means zero native dependencies — critical for multi-runtime support. + +**2. Encrypt-then-chunk vs. 
chunk-then-encrypt** + +- **Evidence:** + + - **Claim:** Encryption wraps the source stream before chunking, meaning ciphertext is what gets chunked — not plaintext. + - **Primary Evidence:** `src/domain/services/CasService.js:store()` — encryption stream wraps source before passing to `_chunkAndStore`. + - **Supporting Context:** The encryption stream is created first (`crypto.createEncryptionStream(key)`), then the encrypted output is piped through the chunker. + - **Cryptographic Proof:** `git hash-object src/domain/services/CasService.js` = `9d1370ca88697992847c131bba7d74f726a2cd8c` + +- **Systemic effect:** CDC deduplication is **completely defeated** for encrypted data because AES-GCM ciphertext is pseudorandom — identical plaintext produces different ciphertext (random nonce). This means encrypted CDC-chunked files get zero deduplication benefit. The chunking metadata is still recorded in the manifest, but it serves no dedup purpose. +- **Trade-off rationale:** The alternative (chunk-then-encrypt) would require per-chunk nonces and auth tags, significantly complicating the manifest schema and increasing metadata overhead. The current design keeps crypto simple (one nonce, one tag, one DEK for the whole file). + +**3. Full-buffer decrypt vs. streaming decrypt** + +- **Evidence:** + + - **Claim:** Encrypted/compressed restores buffer the entire file before decryption. + - **Primary Evidence:** `src/domain/services/CasService.js:_restoreBuffered()` — concatenates all chunk buffers then calls `decrypt()`. + - **Cryptographic Proof:** `git hash-object src/domain/services/CasService.js` = `9d1370ca88697992847c131bba7d74f726a2cd8c` + +- **Systemic effect:** Memory usage is O(file_size) for encrypted restores. The `restoreStream()` API exists and is O(chunk_size) for plaintext, but encrypted paths silently degrade to O(n). +- **Trade-off rationale:** AES-256-GCM produces a single authentication tag for the entire ciphertext. 
Verifying the tag requires processing all ciphertext. Streaming authenticated decryption would require a different AEAD construction (e.g., STREAM from libsodium, or chunked AES-GCM with per-chunk tags). + +**4. Vault as Git commit chain vs. flat file** + +- **Evidence:** + + - **Claim:** The vault uses Git commits on `refs/cas/vault` with CAS (compare-and-swap) updates. + - **Primary Evidence:** `src/domain/services/VaultService.js:VAULT_REF`, `#casUpdateRef`, `#retryMutation` + - **Cryptographic Proof:** `git hash-object src/domain/services/VaultService.js` = `d5a1ac2b1a771e9a3a7ac1652c6f40e0f0cbffaa` + +- **Systemic effect:** Every vault mutation (add, remove, init) creates a new Git commit. This provides full audit history but grows the commit graph linearly. Over thousands of vault mutations, `git log refs/cas/vault` becomes slow. The CAS semantics handle concurrent writes gracefully but are limited to 3 retries with short backoff — insufficient for high-contention scenarios. +- **Trade-off rationale:** Using Git's native commit/ref mechanism means the vault is automatically included in `git push/pull/clone`. No separate sync mechanism needed. The audit trail is a natural consequence. + +**5. Semaphore-based concurrency vs. worker pool** + +- **Evidence:** + + - **Claim:** Parallel blob I/O uses a counting semaphore, not a proper worker/thread pool. + - **Primary Evidence:** `src/domain/services/Semaphore.js` — FIFO counting semaphore; `CasService.js:_chunkAndStore` — semaphore-guarded fan-out. + - **Cryptographic Proof:** `git hash-object src/domain/services/Semaphore.js` = `507ed14668364491797a68ed906b346b01ddd488` + +- **Systemic effect:** All concurrency is async I/O multiplexing on the event loop. There's no CPU parallelism for hashing or encryption. SHA-256 and AES-GCM run on the main thread (in Node.js). For CPU-bound workloads this is a bottleneck, but since the dominant cost is Git subprocess I/O, async concurrency is the correct choice. 
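The semaphore-guarded fan-out from trade-off 5 can be sketched as follows. This is an illustrative reconstruction of the pattern the audit describes (a FIFO queue of promise resolvers), not the library's actual `Semaphore` class; `mapBounded` is a hypothetical helper added for the demonstration.

```javascript
// FIFO counting semaphore: acquire() resolves immediately while permits
// remain, otherwise queues a resolver that release() wakes in FIFO order.
class Semaphore {
  #permits;
  #waiters = [];

  constructor(permits) {
    this.#permits = permits;
  }

  acquire() {
    if (this.#permits > 0) {
      this.#permits -= 1;
      return Promise.resolve();
    }
    return new Promise((resolve) => this.#waiters.push(resolve));
  }

  release() {
    const next = this.#waiters.shift(); // oldest waiter first
    if (next) next();
    else this.#permits += 1;
  }
}

// Bounded fan-out: start all tasks, but let at most `limit` run at once.
// This is async I/O multiplexing only — no CPU parallelism, as noted above.
async function mapBounded(items, limit, worker) {
  const sem = new Semaphore(limit);
  return Promise.all(items.map(async (item) => {
    await sem.acquire();
    try {
      return await worker(item);
    } finally {
      sem.release();
    }
  }));
}
```

Because the dominant cost is Git subprocess I/O rather than CPU work, this event-loop multiplexing is sufficient; a worker pool would add complexity without moving the bottleneck.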
+ +### Flaws & Limitations + +#### Flaw 1: Crypto Adapter Behavioral Inconsistencies + +- **Evidence:** + + - **Claim:** The three crypto adapters have inconsistent validation and error-handling behavior. + - **Primary Evidence:** `NodeCryptoAdapter.js:26-36`, `BunCryptoAdapter.js:25-44`, `WebCryptoAdapter.js:28-44` + - **Supporting Context:** + - `NodeCryptoAdapter.encryptBuffer` is synchronous; `BunCryptoAdapter.encryptBuffer` and `WebCryptoAdapter.encryptBuffer` are async. + - `BunCryptoAdapter.decryptBuffer` calls `_validateKey(key)`; `NodeCryptoAdapter.decryptBuffer` and `WebCryptoAdapter.decryptBuffer` do not. + - `NodeCryptoAdapter.createEncryptionStream` has no premature-finalize guard; Bun and Web adapters throw `CasError('STREAM_NOT_CONSUMED')`. + - **Cryptographic Proof:** + - `git hash-object src/infrastructure/adapters/NodeCryptoAdapter.js` = `f89898c5ec1892dd965e6ed69ac5373883ed1650` + - `git hash-object src/infrastructure/adapters/BunCryptoAdapter.js` = `1d8b8ce4def9cd8be885e5065041dbe0a0b6d0ac` + - `git hash-object src/infrastructure/adapters/WebCryptoAdapter.js` = `5a70733d945387a8a8101013157811aa654958c6` + +- **Impact:** Liskov Substitution violation. Code that works correctly on Bun (where `decryptBuffer` validates the key type early) may fail with a cryptic `node:crypto` error on Node.js (where the key is passed directly to `createDecipheriv`). The missing premature-finalize guard on Node means a bug in stream consumption produces undefined behavior on Node but a clear error on Bun/Deno. +- **Severity:** Medium. The callers generally `await` all results (which papers over sync-vs-async), and CasService always calls `_validateKey` before encrypting. But the asymmetry is a maintenance hazard. + +#### Flaw 2: Memory Amplification on Encrypted Restore + +- **Evidence:** + + - **Claim:** Encrypted restores load the entire file into memory. 
+ - **Primary Evidence:** `src/domain/services/CasService.js:_restoreBuffered()` — `Buffer.concat(chunkBuffers)` before `this.decrypt()`. + - **Cryptographic Proof:** `git hash-object src/domain/services/CasService.js` = `9d1370ca88697992847c131bba7d74f726a2cd8c` + +- **Impact:** Restoring a 1 GB encrypted file requires ~2 GB of heap (ciphertext buffer + plaintext output). No guard, no warning, no configurable limit. +- **Severity:** High for large files. The roadmap acknowledges this as concern C1 and estimates ~20 LoC to add a `maxRestoreBufferSize` guard. + +#### Flaw 3: Web Crypto Stream Buffering + +- **Evidence:** + + - **Claim:** `WebCryptoAdapter.createEncryptionStream` silently buffers the entire stream. + - **Primary Evidence:** `src/infrastructure/adapters/WebCryptoAdapter.js:64-84` — `const chunks = []; for await (const chunk of source) { chunks.push(chunk); } const buffer = Buffer.concat(chunks);` + - **Cryptographic Proof:** `git hash-object src/infrastructure/adapters/WebCryptoAdapter.js` = `5a70733d945387a8a8101013157811aa654958c6` + +- **Impact:** On Deno, `createEncryptionStream` provides a streaming API but has O(n) memory behavior. Users expect O(chunk_size) memory from a streaming API. This is deceptive. +- **Severity:** Medium. Deno is a secondary runtime, and the roadmap flags this as concern C4. + +#### Flaw 4: FixedChunker Quadratic Buffer Allocation + +- **Evidence:** + + - **Claim:** `FixedChunker.chunk()` uses `Buffer.concat()` in a loop, creating a new buffer allocation per input chunk. + - **Primary Evidence:** `src/infrastructure/chunkers/FixedChunker.js:20` — `buffer = Buffer.concat([buffer, data]);` + - **Cryptographic Proof:** `git hash-object src/infrastructure/chunkers/FixedChunker.js` = `1477e185f16730ad13028454cecb1fb2ac785889` + +- **Impact:** For a source that yields many small buffers (e.g., 4 KB network reads), `Buffer.concat([buffer, data])` is called for each read. 
This copies the accumulated buffer each time, yielding O(n^2/chunkSize) total memory copies where n is file size. In contrast, `CdcChunker` uses a pre-allocated working buffer with zero intermediate copies. +- **Severity:** Low in practice (the source is typically a file stream with 64 KiB reads), but architecturally inconsistent with the CDC chunker's careful buffer management. + +#### Flaw 5: CDC Deduplication Defeated by Encrypt-Then-Chunk + +- **Evidence:** + + - **Claim:** Encryption is applied before chunking, destroying content-addressable deduplication. + - **Primary Evidence:** `src/domain/services/CasService.js:store()` — encryption wraps source before `_chunkAndStore`. + - **Cryptographic Proof:** `git hash-object src/domain/services/CasService.js` = `9d1370ca88697992847c131bba7d74f726a2cd8c` + +- **Impact:** The primary value proposition of CDC is sub-file deduplication. For encrypted files, CDC provides zero dedup benefit over fixed chunking. Users who enable both encryption and CDC chunking get CDC's overhead (rolling hash computation) without its benefit. +- **Severity:** Medium. This is an inherent limitation of the encrypt-then-chunk design. Fixing it would require per-chunk encryption (chunk-then-encrypt), which is a significant architectural change. + +#### Flaw 6: No Upper Bound on Chunk Size + +- **Evidence:** + + - **Claim:** `FixedChunker` accepts any positive `chunkSize` value without an upper bound. + - **Primary Evidence:** `src/infrastructure/chunkers/FixedChunker.js:9` — no validation beyond ChunkingPort base. + - **Supporting Context:** `CdcChunker` has configurable `maxChunkSize` (default 1 MiB) but no hard upper limit either. `resolveChunker` validates `chunkSize > 0` for fixed but has no ceiling. 
+ - **Cryptographic Proof:** `git hash-object src/infrastructure/chunkers/FixedChunker.js` = `1477e185f16730ad13028454cecb1fb2ac785889` + +- **Impact:** A user could set `chunkSize: 10 * 1024 * 1024 * 1024` (10 GB) and the system would attempt to buffer a 10 GB chunk. The roadmap flags this as concern C3. +- **Severity:** Low (user misconfiguration, not a bug in normal usage). + +#### Flaw 7: `deleteAsset` Is Misleadingly Named + +- **Evidence:** + + - **Claim:** `deleteAsset()` does not delete anything — it only reads metadata. + - **Primary Evidence:** `src/domain/services/CasService.js:deleteAsset()` — reads manifest and returns `{ slug, chunksOrphaned }`. + - **Cryptographic Proof:** `git hash-object src/domain/services/CasService.js` = `9d1370ca88697992847c131bba7d74f726a2cd8c` + +- **Impact:** API confusion. Similarly, `findOrphanedChunks()` doesn't find orphans — it finds referenced chunks. Both methods are analysis tools masquerading as lifecycle operations. +- **Severity:** Low (naming issue, not a functional defect). + +#### Flaw 8: Error.captureStackTrace Portability + +- **Evidence:** + + - **Claim:** `CasError` uses `Error.captureStackTrace` which is V8-specific. + - **Primary Evidence:** `src/domain/errors/CasError.js:5` — `Error.captureStackTrace(this, this.constructor);` + - **Cryptographic Proof:** `git hash-object src/domain/errors/CasError.js` = `6acc1da7e28ed698571f861900081d8b044cde57` + +- **Impact:** This is a no-op on non-V8 engines. Since the project targets Node (V8), Bun (JSC), and Deno (V8), it's a no-op on Bun's JavaScriptCore. Not a crash risk (it degrades gracefully), but indicates incomplete multi-runtime awareness. +- **Severity:** Negligible. + +#### Flaw 9: Missing pre-commit Hook + +- **Evidence:** + + - **Claim:** The project has a pre-push hook but no pre-commit hook. + - **Primary Evidence:** `scripts/git-hooks/pre-push` exists; `scripts/git-hooks/pre-commit` does not. 
+ - **Supporting Context:** The CLAUDE.md global instructions specify that pre-commit should run lint. The hooks directory is also named `git-hooks` rather than the conventional `hooks` specified in CLAUDE.md. + +- **Impact:** Lint failures are not caught until push time. A developer can accumulate many unlinted commits before discovering issues. +- **Severity:** Low (process issue, not a code defect). + +### Innovation vs. Commodity + +**Novel or distinctive:** +1. **Git ODB as a CAS backend** — No other library treats Git's native object store as a general-purpose content-addressed storage layer with this level of sophistication (Merkle manifests, codec pluggability, vault indexing). +2. **Buzhash CDC implementation** — Hand-rolled, well-optimized, with a clever xorshift64 seeded table. Not copy-pasted from a library. +3. **DEK/KEK envelope encryption with zero-cost key rotation** — The key rotation model (re-wrap DEK, don't re-encrypt data) is architecturally elegant and matches the patterns used by KMS systems like AWS KMS. +4. **Vault as a Git commit chain** — Using Git refs for an atomic, auditable key-value store is creative. +5. **Multi-runtime JS with runtime-specific crypto** — Three crypto adapters targeting three JS runtimes is uncommon in the Node ecosystem. + +**Commodity:** +1. **AES-256-GCM encryption** — Standard AEAD construction, correctly implemented. +2. **PBKDF2/scrypt KDF** — Standard KDF choices with standard parameters. +3. **Zod schema validation** — Standard validation library, standard usage. +4. **Hexagonal architecture** — Well-known pattern, well-executed. +5. **Commander.js CLI** — Standard CLI framework, standard usage. + +**Assessment:** This codebase introduces genuinely novel abstractions (Git ODB as CAS, vault commit chain, zero-cost key rotation) while building on commodity cryptographic primitives. The combination is the innovation — not any individual component. 
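The zero-cost rotation model noted above (re-wrap the DEK, never re-encrypt data) is worth making concrete. A minimal sketch, assuming nothing about the library's actual API: the names `wrapDek`, `unwrapDek`, and `rotateKek` are illustrative stand-ins, not git-cas methods.

```js
import crypto from 'node:crypto';

// Hypothetical minimal envelope: a random DEK encrypts the data;
// the KEK only ever encrypts (wraps) the DEK itself.
function wrapDek(dek, kek) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', kek, iv);
  const wrapped = Buffer.concat([cipher.update(dek), cipher.final()]);
  return { iv, wrapped, tag: cipher.getAuthTag() };
}

function unwrapDek({ iv, wrapped, tag }, kek) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', kek, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(wrapped), decipher.final()]);
}

// Rotation unwraps with the old KEK and re-wraps under the new one.
// The DEK, and therefore every ciphertext it produced, is untouched.
function rotateKek(envelope, oldKek, newKek) {
  return wrapDek(unwrapDek(envelope, oldKek), newKek);
}
```

Because only the few-dozen-byte envelope changes, rotation cost is independent of stored data volume, which is the same trade KMS systems make.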
+ +--- + +## Phase 3: The Reality Check + +### Roadmap Reconciliation + +The roadmap lists 9 milestones (M7–M15). **All 9 are marked CLOSED.** There are zero open milestones. + +| Milestone | Roadmap Status | Verified in Code | Reconciliation | +|---|---|---|---| +| M7 Horizon | CLOSED (v2.0.0) | Yes — Merkle manifests (v2), compression, sub-manifests all implemented | Accurate | +| M8 Spit Shine | CLOSED (v4.0.1) | Yes — CryptoPort refactor, verify command, error handler all present | Accurate | +| M9 Cockpit | CLOSED (v4.0.1) | Yes — 18 CLI commands, --json flag, hints system all present | Accurate | +| M10 Hydra | CLOSED (v5.0.0) | Yes — CdcChunker with buzhash, resolveChunker, CDC params in manifest | Accurate | +| M11 Locksmith | CLOSED (v5.1.0) | Yes — addRecipient, removeRecipient, listRecipients, envelope encryption | Accurate | +| M12 Carousel | CLOSED (v5.2.0) | Yes — rotateKey, keyVersion tracking, DEK re-wrapping | Accurate | +| M13 Bijou | CLOSED (v3.1.0) | Yes — dashboard TUI, progress bars, encryption card, manifest view, heatmap | Accurate | +| M14 Conduit | CLOSED (v4.0.0) | Yes — restoreStream, ObservabilityPort, Semaphore, parallel I/O | Accurate | +| M15 Prism | CLOSED | Yes — async sha256 on NodeCryptoAdapter, KeyResolver extracted | Accurate | + +**Verdict: The roadmap is 100% accurate.** Every claimed milestone is verifiable in the codebase. No phantom features, no vaporware. This is unusual — most roadmaps overstate completion. + +### Backlog Triage + +The roadmap identifies 7 concerns (C1–C7) and 6 visions (V1–V6). Cross-referencing against Phase 2 findings: + +**Concerns already identified by the roadmap that Phase 2 confirmed:** + +| Concern | Roadmap Estimate | Phase 2 Finding | Agreement | +|---|---|---|---| +| C1: Memory amplification on encrypted restore | High severity, ~20 LoC | Flaw 2: Confirmed. O(n) memory for encrypted restores. 
| Full agreement | +| C2: Orphaned blob accumulation after STREAM_ERROR | Medium, ~20 LoC | Not independently discovered — the error handling drains promises correctly. Low priority. | Agreement on low urgency | +| C3: No upper bound on chunk size | Medium, ~6 LoC | Flaw 6: Confirmed. FixedChunker accepts any positive value. | Full agreement | +| C4: Web Crypto silent memory buffering | Medium, ~15 LoC | Flaw 3: Confirmed. `createEncryptionStream` buffers everything on Deno. | Full agreement | +| C5: Passphrase exposure in shell history | High, ~90 LoC | Not a code defect; architectural limitation of CLI passphrase flags. | Agreement | +| C6: No KDF brute-force rate limiting | Low, ~10 LoC | Not independently discovered. Low priority. | Agreement | +| C7: GCM nonce collision risk at scale | Low, ~20 LoC | Not practically exploitable. 2^48 encryptions needed for birthday bound on 96-bit nonce. | Agreement on low priority | + +**Critical architectural flaws from Phase 2 that ARE MISSING from the backlog:** + +1. **Crypto adapter behavioral inconsistencies (Flaw 1)** — The three adapters have different validation/error behavior. This is not mentioned in any concern or backlog item. The M15 Prism milestone addressed `sha256` async consistency but left the encrypt/decrypt inconsistencies untouched. + +2. **CDC deduplication defeated by encrypt-then-chunk (Flaw 5)** — The fundamental design decision that encryption wraps the stream before chunking is not flagged as a concern or limitation in the roadmap. The Feature Matrix claims "Sub-file deduplication: Via chunking" without noting it only works for unencrypted data. + +3. **FixedChunker quadratic buffer allocation (Flaw 4)** — Minor but missing from backlog. The CDC chunker received significant optimization attention; the fixed chunker did not. + +**Backlog items that should be deprioritized:** + +- **V1 Snapshot Trees** (~410 LoC, ~19h) — Nice to have but doesn't address any Phase 2 flaw. 
+- **V5 Watch Mode** (~220 LoC, ~10h) — Feature creep for a storage library. +- **V3 Manifest Diff Engine** (~180 LoC, ~8h) — Diagnostic tooling, not a stability concern. + +**Backlog items that should be prioritized:** + +- **C1 Memory amplification guard** — This is the highest-severity technical debt. 20 LoC to add a configurable ceiling. +- **Crypto adapter normalization** — Not in backlog. Needs to be added. ~30 LoC to align all three adapters. +- **V4 CompressionPort** (~180 LoC, ~8h) — Gzip-only compression is a significant limitation. zstd would provide 2-3x better compression ratios with faster decompression. + +--- + +## Phase 4: The Blueprint for Success + +### Month 1: Triage & Foundation + +**Week 1–2: Crypto Adapter Normalization** + +Align all three crypto adapters to identical behavioral contracts: + +1. Add `_validateKey(key)` call to `NodeCryptoAdapter.decryptBuffer()` and `WebCryptoAdapter.decryptBuffer()`. +2. Add premature-finalize guard to `NodeCryptoAdapter.createEncryptionStream()`. +3. Make `NodeCryptoAdapter.encryptBuffer()` explicitly async (return `Promise`). +4. Add a cross-adapter behavioral test suite that asserts identical behavior for all three adapters given the same inputs. + +*Estimated: ~50 LoC changes, ~100 LoC tests.* + +**Week 2: Memory Safety Guards** + +1. Add `maxRestoreBufferSize` option to CasService constructor (default: 512 MiB). Throw `CasError('RESTORE_TOO_LARGE')` if the concatenated chunk buffer exceeds this limit in `_restoreBuffered()`. +2. Add buffer size guard to `WebCryptoAdapter.createEncryptionStream()` — throw if accumulated buffer exceeds a configurable limit. +3. Add upper bound validation to `FixedChunker` constructor (e.g., max 100 MiB) and `CdcChunker` (already has `maxChunkSize` but no ceiling on the ceiling). 
+ +*Estimated: ~40 LoC changes, ~30 LoC tests.* + +**Week 3: FixedChunker Buffer Optimization** + +Replace the `Buffer.concat([buffer, data])` loop in `FixedChunker.chunk()` with a pre-allocated working buffer pattern matching `CdcChunker`: + +```js +const buf = Buffer.allocUnsafe(this.#chunkSize); +let offset = 0; +for await (const data of source) { + let srcPos = 0; + while (srcPos < data.length) { + const n = Math.min(data.length - srcPos, this.#chunkSize - offset); + data.copy(buf, offset, srcPos, srcPos + n); + offset += n; + srcPos += n; + if (offset === this.#chunkSize) { + yield Buffer.from(buf); + offset = 0; + } + } +} +if (offset > 0) yield Buffer.from(buf.subarray(0, offset)); +``` + +*Estimated: ~20 LoC change.* + +**Week 4: Missing pre-commit Hook + Process Hygiene** + +1. Add `scripts/git-hooks/pre-commit` that runs `pnpm run lint`. +2. Rename `scripts/git-hooks/` to `scripts/hooks/` to match CLAUDE.md convention (or update CLAUDE.md — choose one). +3. Add `Error.captureStackTrace` guard in `CasError`: `if (Error.captureStackTrace) Error.captureStackTrace(this, this.constructor);` + +*Estimated: ~10 LoC changes.* + +### Month 2: Structural Evolution + +**CompressionPort Abstraction (V4)** + +The current gzip-only compression is hardcoded. Introduce a `CompressionPort` abstract class with `compress(source)` and `decompress(source)` async generator methods. Implement `GzipCompressor` (existing behavior) and `ZstdCompressor` (via `node:zlib` or `zstd-codec`). Update `CompressionSchema` to accept `'gzip' | 'zstd'`. + +*Estimated: ~180 LoC, aligns with V4 vision.* + +**Document the Encrypt-Then-Chunk Limitation** + +This is not fixable without a major architectural change (chunk-then-encrypt with per-chunk AEAD). The correct action is: + +1. Document that CDC deduplication is ineffective for encrypted data. +2. Consider emitting a warning when `encryption + chunking.strategy === 'cdc'` are both specified. +3. 
If the user explicitly opts in, allow it — but make the trade-off visible. + +*Estimated: ~10 LoC (warning), documentation update.* + +**Interactive Passphrase Prompt (V6)** + +Address concern C5 (passphrase exposure in shell history) by adding TTY-based passphrase prompts with echo disabled. Fall back to flag-based input when stdin is not a TTY. + +*Estimated: ~90 LoC, aligns with V6 vision.* + +### Month 3: Strategic Re-alignment + +**Portable Bundles (V2)** + +The air-gapped use case is a key differentiator. Implement `.casb` bundle files that package manifest + chunks for transport without Git. This enables: +- Export: `git cas export --slug <slug> --out archive.casb` +- Import: `git cas import --bundle archive.casb` + +*Estimated: ~340 LoC, aligns with V2 vision.* + +**Garbage Collection Automation** + +The `deleteAsset` and `findOrphanedChunks` methods are analysis-only. Complete the lifecycle: +1. Rename `deleteAsset` to `inspectAsset` or `getAssetMetadata` (breaking change). +2. Implement actual GC via `git prune` after vault entry removal. +3. Add `git cas gc` CLI command with `--dry-run` support. + +*Estimated: ~80 LoC.* + +**CI Hardening** + +1. Add `dependabot.yml` for dependency updates. +2. Add `CODEOWNERS` file. +3. Add security scanning (e.g., `npm audit` in CI). +4. Add `SECURITY.md` at project root (currently missing, noted in CLAUDE.md scaffolding requirements). + +--- + +### Executive Conclusion + +**Health: Strong.** This is a well-architected, thoroughly tested codebase with a clear domain model, strict port/adapter boundaries, and an unusually high test-to-code ratio (3.1:1). The 833+ unit tests with real crypto (never mocked) and fuzz coverage demonstrate a commitment to correctness that is rare in the Node.js ecosystem. 
+ +**Intellectual Property Value: Moderate-High.** The novel contributions — Git ODB as CAS, buzhash CDC with xorshift-seeded tables, zero-cost DEK/KEK key rotation, vault commit chains with CAS semantics — represent genuine engineering innovation. These are not reimplementations of existing libraries; they are original abstractions built on well-understood primitives. + +**Technical Debt: Low.** The roadmap's 7 concerns accurately catalog the known issues. Phase 2 surfaced only 3 additional findings (crypto adapter inconsistencies, encrypt-then-chunk dedup limitation, FixedChunker buffer allocation), none of which are critical. The most urgent issue — memory amplification on encrypted restore — is a ~20 LoC fix. + +**Long-term Viability: Good with caveats.** The system is viable for its target niche (Git-native encrypted binary storage). The Git subprocess bottleneck limits throughput for very high-frequency operations, but this is an acceptable trade-off for correctness and portability. The encrypt-then-chunk design is a permanent architectural constraint that limits CDC's value for encrypted data — this should be prominently documented rather than "fixed." + +**The Honest Assessment:** This codebase punches above its weight. A ~3,900 LoC core library with 12,000 LoC of tests, multi-runtime support, envelope encryption, CDC chunking, Merkle manifests, and an interactive TUI — all with zero native dependencies and no external server requirements. The architecture is clean, the test coverage is comprehensive, and the roadmap is honest. The identified flaws are minor and addressable. This is a well-maintained project by someone who takes software engineering seriously. 
+ +--- + +*Audit conducted at commit `0f7f8e658e6cd094176541ac68d33b2a6ec75a91`.* +*All blob hashes verified via `git hash-object` against live repository state.* diff --git a/README.md b/README.md index 21b3946..1013a25 100644 --- a/README.md +++ b/README.md @@ -304,7 +304,7 @@ git cas store ./data.bin --slug my-data --tree --json - [Guide](./GUIDE.md) — progressive walkthrough - [API Reference](./docs/API.md) — full method documentation - [Architecture](./ARCHITECTURE.md) — hexagonal design overview -- [Security](./docs/SECURITY.md) — crypto design and threat model +- [Security](./SECURITY.md) — crypto design and threat model ## When to use git-cas (and when not to) diff --git a/ROADMAP.md b/ROADMAP.md index 99ddfc4..4f79fc4 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -9,7 +9,7 @@ This roadmap is structured as: 3. **Contracts** — Return/throw semantics for all public methods 4. **Version Plan** — Table mapping versions to milestones 5. **Milestone Dependency Graph** — ASCII diagram -6. **Milestones & Task Cards** — 7 milestones (4 closed, 3 open), remaining task cards +6. **Milestones & Task Cards** — 8 milestones (7 closed, 1 open), remaining task cards 7. **Feature Matrix** — Competitive landscape vs. Git LFS, git-annex, Restic, Age, DVC 8. **Competitive Analysis** — When to use git-cas and when not to, with concrete scenarios @@ -56,6 +56,8 @@ Single registry of all error codes used across the codebase. Each code is a stri | `CANNOT_REMOVE_LAST_RECIPIENT` | Cannot remove the last recipient — at least one must remain. | Task 11.2 | | `ROTATION_NOT_SUPPORTED` | Key rotation requires envelope encryption (DEK/KEK model). Legacy manifests must be re-stored. | Task 12.1 | | `STREAM_NOT_CONSUMED` | `finalize()` called on encryption stream before the generator was fully consumed. | v4.0.1 | +| `RESTORE_TOO_LARGE` | Encrypted/compressed file exceeds `maxRestoreBufferSize`. Buffered restore would OOM. Suggest increasing limit or storing without encryption. 
| M16 | +| `ENCRYPTION_BUFFER_EXCEEDED` | Web Crypto adapter accumulated buffer exceeds limit during streaming encryption (Deno-specific). Suggest Node.js/Bun or unencrypted store. | M16 | --- @@ -191,6 +193,7 @@ Return and throw semantics for every public method (current and planned). | v3.1.0 | M13 | Bijou | TUI dashboard & progress | ✅ | | v5.0.0 | M10 | Hydra | Content-defined chunking | ✅ | | v5.1.0 | M11 | Locksmith | Multi-recipient encryption | ✅ | | v5.2.0 | M12 | Carousel | Key rotation | ✅ | +| v5.3.0 | M16 | Capstone | Audit remediation — all CODE-EVAL.md findings | 🔲 | --- @@ -206,6 +209,8 @@ M8 Spit Shine + M9 Cockpit (v4.0.1) ✅ M10 Hydra ──────────── ✅ v5.0.0 M11 Locksmith ──────── ✅ v5.1.0 └──► M12 Carousel ── ✅ v5.2.0 +M15 Prism ─────────────── ✅ + └──► M16 Capstone ────── 🔲 v5.3.0 ``` --- @@ -223,6 +228,7 @@ M11 Locksmith ──────── ✅ v5.1.0 | M10| Hydra | Content-defined chunking | v5.0.0 | 4 | ~690 | ~22h | ✅ CLOSED | | M11| Locksmith | Multi-recipient encryption | v5.1.0 | 4 | ~580 | ~20h | ✅ CLOSED | | M12| Carousel | Key rotation | v5.2.0 | 4 | ~400 | ~13h | ✅ CLOSED | +| M16| Capstone | Audit remediation | v5.3.0 | 13 | ~698 | ~21h | 🔲 OPEN | Completed task cards are in [COMPLETED_TASKS.md](./COMPLETED_TASKS.md). Superseded tasks are in [GRAVEYARD.md](./GRAVEYARD.md). @@ -262,6 +268,445 @@ All tasks completed (12.1–12.4). See [COMPLETED_TASKS.md](./COMPLETED_TASKS.md --- +# M16 — Capstone (v5.3.0) 🔲 OPEN + +Remediation milestone addressing all negative findings from the [CODE-EVAL.md](./CODE-EVAL.md) forensic architectural audit. Covers 9 code flaws (Phase 2), 7 pre-existing concerns (C1–C7), and 3 newly identified concerns (C8–C10). No new features — strictly hardening, correctness, and hygiene. + +**Source:** `CODE-EVAL.md` at commit `0f7f8e6` + +**Priority key:** P0 = critical (high severity), P1 = important (medium), P2 = housekeeping (low/negligible). 
+ +--- + +## Task Cards + +### 16.1 — Crypto Adapter Behavioral Normalization *(P0)* — C8 + +**Problem** + +The three CryptoPort adapters (Node, Bun, Web) have inconsistent validation and error-handling behavior — a Liskov Substitution violation. Specifically: + +1. `NodeCryptoAdapter.encryptBuffer()` is synchronous; Bun and Web are async. +2. `BunCryptoAdapter.decryptBuffer()` calls `_validateKey(key)`; Node and Web do not. +3. `NodeCryptoAdapter.createEncryptionStream()` has no premature-finalize guard; Bun and Web throw `CasError('STREAM_NOT_CONSUMED')`. + +Code that works on Bun (early key validation) may produce a cryptic `node:crypto` error on Node. A bug in stream consumption produces undefined behavior on Node but a clear error on Bun/Deno. + +**Fix** + +1. Add `_validateKey(key)` call to `NodeCryptoAdapter.decryptBuffer()` and `WebCryptoAdapter.decryptBuffer()`. +2. Add `streamFinalized` guard + `CasError('STREAM_NOT_CONSUMED')` to `NodeCryptoAdapter.createEncryptionStream()`. +3. Make `NodeCryptoAdapter.encryptBuffer()` explicitly `async` (return `Promise`). +4. Add a cross-adapter behavioral conformance test suite asserting identical behavior for all three adapters given the same inputs. 
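Item 2 of the fix can be sketched as follows. `CasError` here is a reduced stand-in with only a `code` field, and the stream object's shape is an assumption; the real adapter wraps `node:crypto` ciphers rather than a plain transform callback.

```js
// Illustrative stand-in for the project's CasError class.
class CasError extends Error {
  constructor(code, meta = {}) {
    super(code);
    this.code = code;
    this.meta = meta;
  }
}

// Sketch of a premature-finalize guard: finalize() refuses to run
// until the encrypt generator has fully consumed its source.
function createGuardedEncryptionStream(transform) {
  let consumed = false;
  async function* encrypt(source) {
    for await (const chunk of source) {
      yield transform(chunk);
    }
    consumed = true; // flips only once the source is exhausted
  }
  return {
    encrypt,
    finalize() {
      if (!consumed) throw new CasError('STREAM_NOT_CONSUMED');
      return { finalized: true };
    },
  };
}
```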
+ +**Files:** +- `src/infrastructure/adapters/NodeCryptoAdapter.js` +- `src/infrastructure/adapters/WebCryptoAdapter.js` +- New: `test/unit/infrastructure/adapters/CryptoAdapter.conformance.test.js` + +**Tests:** +```js +describe('16.1: CryptoPort LSP conformance', () => { + // Run the same assertions against all three adapters + for (const [name, adapter] of adapters) { + it(`${name}.encryptBuffer returns a Promise`, ...); + it(`${name}.decryptBuffer rejects invalid key type before crypto error`, ...); + it(`${name}.decryptBuffer rejects wrong-length key before crypto error`, ...); + it(`${name}.createEncryptionStream.finalize() throws STREAM_NOT_CONSUMED if not consumed`, ...); + } +}); +``` + +| Estimate | ~50 LoC changes, ~100 LoC tests, ~4h | +|----------|---------------------------------------| + +--- + +### 16.2 — Memory Restore Guard *(P0)* — C1 + +**Problem** + +`_restoreBuffered()` concatenates ALL chunk blobs into a single buffer before decryption. A 1 GB encrypted file requires ~2 GB of heap. No guard, no warning, no configurable limit. + +**Fix** + +Add `maxRestoreBufferSize` option to CasService constructor (default 512 MiB). Before `Buffer.concat()` in `_restoreBuffered()`, check `manifest.size` against the limit. Throw `CasError('RESTORE_TOO_LARGE')` with an actionable message. 
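The check itself is small; a sketch with a stand-in `CasError` and an assumed helper name (`assertRestorable`), whereas the real change lives inline in `_restoreBuffered()`:

```js
const DEFAULT_MAX_RESTORE_BUFFER = 512 * 1024 * 1024; // 512 MiB default

// Illustrative stand-in for the project's CasError class.
class CasError extends Error {
  constructor(code, meta = {}) {
    super(`${code}: ${meta.size} bytes exceeds maxRestoreBufferSize of ${meta.limit}`);
    this.code = code;
    this.meta = meta;
  }
}

// Check the manifest's recorded size before any Buffer.concat(),
// so the process never performs the oversized allocation at all.
function assertRestorable(manifestSize, limit = DEFAULT_MAX_RESTORE_BUFFER) {
  if (manifestSize > limit) {
    throw new CasError('RESTORE_TOO_LARGE', { size: manifestSize, limit });
  }
}
```

Checking `manifest.size` up front is what makes the guard cheap: the decision is made from metadata before a single chunk blob is read.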
+ +**Files:** +- `src/domain/services/CasService.js` +- `index.js` (facade wiring) +- `index.d.ts` (type update) + +**Tests:** +```js +describe('16.2: Memory guard on encrypted restore', () => { + it('throws RESTORE_TOO_LARGE when manifest.size exceeds maxRestoreBufferSize', ...); + it('succeeds when manifest.size is within maxRestoreBufferSize', ...); + it('does not apply guard to unencrypted uncompressed restoreStream', ...); + it('includes actionable hint in error message', ...); + it('default maxRestoreBufferSize is 512 MiB', ...); +}); +``` + +| Estimate | ~25 LoC changes, ~40 LoC tests, ~2h | +|----------|--------------------------------------| + +--- + +### 16.3 — Web Crypto Encryption Buffer Guard *(P1)* — C4 + +**Problem** + +`WebCryptoAdapter.createEncryptionStream()` silently buffers the entire stream because Web Crypto AES-GCM is a one-shot API. On Deno, a user calling `store()` with a large encrypted source OOMs without warning. + +**Fix** + +Track accumulated bytes in the `encrypt()` generator. When total exceeds a configurable limit (default 512 MiB), throw `CasError('ENCRYPTION_BUFFER_EXCEEDED')` with an actionable message. + +**Files:** +- `src/infrastructure/adapters/WebCryptoAdapter.js` + +**Tests:** +```js +describe('16.3: Web Crypto buffering guard', () => { + it('throws ENCRYPTION_BUFFER_EXCEEDED when accumulated bytes exceed limit', ...); + it('succeeds for data within buffer limit', ...); + it('NodeCryptoAdapter does NOT throw for large streams (true streaming)', ...); +}); +``` + +| Estimate | ~15 LoC changes, ~30 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.4 — FixedChunker Pre-Allocated Buffer *(P2)* — C9 + +**Problem** + +`FixedChunker.chunk()` uses `Buffer.concat([buffer, data])` in a loop. Each call copies the entire accumulated buffer — O(n^2 / chunkSize) total copies for many small input buffers. The CDC chunker uses a pre-allocated working buffer with zero intermediate copies. 
+ +**Fix** + +Replace the concat loop with a pre-allocated `Buffer.allocUnsafe(chunkSize)` working buffer using a copy+offset pattern, matching CdcChunker's approach. + +**Files:** +- `src/infrastructure/chunkers/FixedChunker.js` + +**Tests:** + +Existing tests cover byte-exact correctness. Add: +```js +describe('16.4: FixedChunker buffer efficiency', () => { + it('produces identical output to previous implementation (regression)', ...); + it('handles many small input buffers without excessive allocation', ...); +}); +``` + +| Estimate | ~20 LoC changes, ~15 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.5 — Encrypt-Then-Chunk Dedup Warning *(P1)* — C10 + +**Problem** + +Encryption is applied before chunking, destroying content-addressable deduplication. AES-GCM ciphertext is pseudorandom — identical plaintext produces different ciphertext. Users who enable both encryption and CDC chunking get CDC's overhead without its dedup benefit. + +This is an inherent architectural constraint (not fixable without per-chunk encryption). The correct action is documentation + a runtime warning. + +**Fix** + +1. When `store()` is called with both an encryption key/passphrase/recipients AND `chunker.strategy === 'cdc'`, emit `observability.log('warn', 'CDC deduplication is ineffective with encryption — ciphertext is pseudorandom', { strategy: 'cdc' })`. +2. Add a "Known Limitations" section to the README documenting this trade-off. 
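The dedup loss is directly observable: with a fresh random nonce per store, AES-GCM maps identical plaintext to unrelated ciphertext, so content addresses never collide. A self-contained illustration (not the library's actual code path):

```js
import crypto from 'node:crypto';

// Encrypt the same plaintext twice, each time with a fresh nonce,
// then hash the ciphertexts the way a CAS derives chunk addresses.
function ciphertextAddress(plaintext, key) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return crypto.createHash('sha256').update(ciphertext).digest('hex');
}

const key = crypto.randomBytes(32);
const plaintext = Buffer.from('identical content, stored twice');

const first = ciphertextAddress(plaintext, key);
const second = ciphertextAddress(plaintext, key);
// Same plaintext, same key, yet the addresses differ (except with
// negligible probability), so the CAS layer sees two unrelated
// blobs and deduplicates nothing.
console.log(first === second);
```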
+ +**Files:** +- `src/domain/services/CasService.js` (warning in `store()`) + +**Tests:** +```js +describe('16.5: Encrypt-then-chunk dedup warning', () => { + it('emits warning when encryption + CDC chunking are combined', ...); + it('does not warn for encryption + fixed chunking', ...); + it('does not warn for CDC chunking without encryption', ...); +}); +``` + +| Estimate | ~10 LoC changes, ~20 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.6 — Chunk Size Upper Bound *(P1)* — C3 + +**Problem** + +`CasService` enforces a minimum chunk size (1024 bytes) but no maximum. A user can configure a 4 GB chunk size. Additionally, `FixedChunker` and `CdcChunker` accept arbitrarily large values without validation. + +**Fix** + +1. Add `if (chunkSize > MAX_CHUNK_SIZE)` guard in `CasService` constructor. 100 MiB is the cap — generous while staying within Git hosting limits. +2. Emit `observability.log('warn', ...)` when chunkSize exceeds 10 MiB. +3. Add matching validation in `FixedChunker` constructor: `if (chunkSize > 100 * 1024 * 1024) throw new RangeError(...)`. +4. Add matching validation in `CdcChunker` constructor for `maxChunkSize`. 
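Fix items 1 through 4 share one validation shape. A sketch, where the constant names and the `warn` callback are assumptions rather than existing project symbols:

```js
const MAX_CHUNK_SIZE = 100 * 1024 * 1024; // assumed 100 MiB hard cap
const WARN_CHUNK_SIZE = 10 * 1024 * 1024; // assumed 10 MiB warning threshold

// Shared validation a chunker constructor could run before accepting
// a chunkSize: reject non-integers, zero, negatives, and oversizes.
function validateChunkSize(chunkSize, warn = () => {}) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError(`chunkSize must be a positive integer, got ${chunkSize}`);
  }
  if (chunkSize > MAX_CHUNK_SIZE) {
    throw new RangeError(`chunkSize ${chunkSize} exceeds hard cap of ${MAX_CHUNK_SIZE}`);
  }
  if (chunkSize > WARN_CHUNK_SIZE) {
    warn(`chunkSize ${chunkSize} is unusually large; expect high memory per chunk`);
  }
  return chunkSize;
}
```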
+ +**Files:** +- `src/domain/services/CasService.js` +- `src/infrastructure/chunkers/FixedChunker.js` +- `src/infrastructure/chunkers/CdcChunker.js` + +**Tests:** +```js +describe('16.6: Chunk size upper bound', () => { + it('CasService throws when chunkSize exceeds 100 MiB', ...); + it('CasService accepts chunkSize of exactly 100 MiB', ...); + it('FixedChunker throws when chunkSize exceeds 100 MiB', ...); + it('CdcChunker throws when maxChunkSize exceeds 100 MiB', ...); + it('logs warning when chunkSize exceeds 10 MiB', ...); +}); +``` + +| Estimate | ~15 LoC changes, ~30 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.7 — Lifecycle Method Naming *(P2)* + +**Problem** + +`deleteAsset()` does not delete anything — it reads a manifest and returns metadata about what would be orphaned. `findOrphanedChunks()` doesn't find orphans — it collects referenced chunk OIDs. Both names are misleading. + +**Fix** + +1. Add `inspectAsset({ treeOid })` as the canonical name. `deleteAsset` becomes a deprecated alias that delegates to `inspectAsset`. +2. Add `collectReferencedChunks({ treeOids })` as the canonical name. `findOrphanedChunks` becomes a deprecated alias. +3. Emit `observability.log('warn', 'deleteAsset() is deprecated — use inspectAsset()')` on deprecated path. +4. Update `index.d.ts` with `@deprecated` JSDoc on old methods. + +This is a **non-breaking** deprecation. Removal is deferred to a future major version. 
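The non-breaking shape of the rename can be sketched like this. The observability port is reduced to a bare `log` method and the method body returns placeholder data; only the alias wiring is the point.

```js
// Minimal sketch of a warning-emitting deprecated alias. The real
// inspectAsset() reads a manifest; here it returns placeholder data.
class CasServiceSketch {
  constructor(observability = { log() {} }) {
    this.observability = observability;
  }

  inspectAsset({ treeOid }) {
    // Canonical name: analysis only, nothing is deleted.
    return { slug: `asset:${treeOid}`, chunksOrphaned: 0 };
  }

  /** @deprecated Use inspectAsset() instead. */
  deleteAsset(args) {
    this.observability.log('warn', 'deleteAsset() is deprecated — use inspectAsset()');
    return this.inspectAsset(args);
  }
}
```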
+ +**Files:** +- `src/domain/services/CasService.js` +- `index.js` (facade) +- `index.d.ts` + +**Tests:** +```js +describe('16.7: Lifecycle method naming', () => { + it('inspectAsset returns { slug, chunksOrphaned }', ...); + it('deleteAsset delegates to inspectAsset (deprecated alias)', ...); + it('collectReferencedChunks returns { referenced, total }', ...); + it('findOrphanedChunks delegates to collectReferencedChunks (deprecated alias)', ...); +}); +``` + +| Estimate | ~30 LoC changes, ~25 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.8 — CasError Portability Guard *(P2)* + +**Problem** + +`CasError` calls `Error.captureStackTrace(this, this.constructor)` unconditionally. This is V8-specific — it's a no-op on Bun's JavaScriptCore engine. While it doesn't crash (JSC silently ignores it), it indicates incomplete multi-runtime awareness. + +**Fix** + +Guard the call: `if (Error.captureStackTrace) Error.captureStackTrace(this, this.constructor);` + +**Files:** +- `src/domain/errors/CasError.js` + +**Tests:** +```js +describe('16.8: CasError multi-runtime portability', () => { + it('creates CasError with code and meta', ...); + it('does not throw when Error.captureStackTrace is unavailable', ...); +}); +``` + +| Estimate | ~3 LoC changes, ~10 LoC tests, ~0.5h | +|----------|---------------------------------------| + +--- + +### 16.9 — Pre-Commit Hook + Hooks Directory *(P2)* + +**Problem** + +The project has a `pre-push` hook but no `pre-commit` hook. Lint failures are not caught until push time. Additionally, the hooks directory is `scripts/git-hooks/` rather than `scripts/hooks/` per the CLAUDE.md convention. + +**Fix** + +1. Rename `scripts/git-hooks/` to `scripts/hooks/`. +2. Update `scripts/install-hooks.sh` to reference the new path. +3. Add `scripts/hooks/pre-commit` that runs `pnpm run lint`. +4. Update `.git/config` hooksPath if already set. 
+ +**Files:** +- `scripts/git-hooks/pre-push` → `scripts/hooks/pre-push` +- New: `scripts/hooks/pre-commit` +- `scripts/install-hooks.sh` + +| Estimate | ~15 LoC, ~0.5h | +|----------|-----------------| + +--- + +### 16.10 — Orphaned Blob Tracking *(P1)* — C2 + +**Problem** + +When `_chunkAndStore()` throws `STREAM_ERROR`, chunks already written to Git are orphaned. The error meta reports `chunksDispatched` but not the blob OIDs of successful writes. There's no visibility into what was orphaned. + +**Fix** + +1. After `Promise.allSettled(pending)`, collect blob OIDs from fulfilled results. +2. Include `orphanedBlobs: string[]` in the `STREAM_ERROR` meta. +3. Emit `observability.metric('error', { action: 'orphaned_blobs', count, blobs })`. + +**Files:** +- `src/domain/services/CasService.js` + +**Tests:** +```js +describe('16.10: Orphaned blob tracking on STREAM_ERROR', () => { + it('includes orphanedBlobs array in STREAM_ERROR meta', ...); + it('orphanedBlobs contains blob OIDs from successful writes before failure', ...); + it('orphanedBlobs is empty when stream fails before any writes', ...); + it('emits orphaned_blobs metric via observability', ...); +}); +``` + +| Estimate | ~20 LoC changes, ~30 LoC tests, ~2h | +|----------|--------------------------------------| + +--- + +### 16.11 — Passphrase Input Security *(P0)* — C5 + V6 + +**Problem** + +`--vault-passphrase <passphrase>` puts the passphrase in shell history and process listings. The `GIT_CAS_PASSPHRASE` env var is better but still visible in `/proc/<pid>/environ`. + +**Fix** + +1. **Interactive prompt**: When `--vault-passphrase` is passed without a value and stdin is a TTY, prompt with echo disabled. Confirmation on first use (store/init). +2. **File-based input**: Add `--vault-passphrase-file <path>` flag that reads from a file. +3. **Stdin pipe**: `--vault-passphrase -` reads from stdin. +4. **Documentation**: Security warning in `--help` and README. 
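Item 2's file reader is small but has one subtlety: editors append a trailing newline, and on Windows it is `\r\n`. A sketch (the CLI's actual function may differ in name and options):

```js
import { readFileSync } from 'node:fs';

// Read a passphrase from a file, stripping exactly one trailing
// line ending (\n or \r\n) so an editor-added newline does not
// change the derived key, while preserving inner whitespace.
function readPassphraseFile(path) {
  return readFileSync(path, 'utf8').replace(/\r?\n$/, '');
}
```

Stripping only one terminal line ending, rather than `trim()`, keeps passphrases that intentionally contain leading or trailing spaces intact.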
+ +**Files:** +- `bin/git-cas.js` +- New: `bin/ui/passphrase-prompt.js` + +**Tests:** +```js +describe('16.11: Passphrase input security', () => { + it('reads passphrase from file when --vault-passphrase-file is used', ...); + it('errors when no passphrase source is available in non-TTY mode', ...); + it('--vault-passphrase-file trims trailing newline', ...); +}); +``` + +| Estimate | ~90 LoC, ~30 LoC tests, ~4h | +|----------|------------------------------| + +--- + +### 16.12 — KDF Brute-Force Awareness *(P2)* — C6 + +**Problem** + +`deriveKey()` and the restore path have no rate limiting or audit trail. An attacker can brute-force passphrases at full CPU speed. + +**Fix** + +1. Emit `observability.metric('error', { action: 'decryption_failed', slug })` on every `INTEGRITY_ERROR` during passphrase-based restore. +2. In the CLI layer, add a 1-second delay after each failed passphrase attempt. + +**Files:** +- `src/domain/services/CasService.js` (observability metric) +- `bin/git-cas.js` (CLI delay) + +**Tests:** +```js +describe('16.12: KDF brute-force awareness', () => { + it('emits decryption_failed metric on wrong passphrase', ...); + it('emits metric with slug context for audit trail', ...); + it('library API does NOT rate-limit (callers manage their own policy)', ...); +}); +``` + +| Estimate | ~10 LoC changes, ~20 LoC tests, ~1h | +|----------|--------------------------------------| + +--- + +### 16.13 — GCM Nonce Collision Documentation *(P2)* — C7 + +**Problem** + +AES-256-GCM uses a 96-bit random nonce. Birthday bound is ~2^48; NIST recommends limiting to 2^32 invocations per key. There's no tracking, no warning, and no documentation of the bound. + +**Fix** + +1. Add `SECURITY.md` at project root documenting: GCM nonce bound, recommended key rotation frequency, KDF parameter guidance, passphrase entropy recommendations. +2. Add `encryptionCount` field to vault metadata. Increment per `store()` with encryption. 
Emit observability warning when count exceeds 2^31. + +**Files:** +- New: `SECURITY.md` +- `src/domain/services/VaultService.js` (counter increment) + +**Tests:** +```js +describe('16.13: Nonce usage tracking', () => { + it('vault metadata includes encryptionCount after encrypted store', ...); + it('encryptionCount increments per encrypted store', ...); + it('warns via observability when encryptionCount exceeds threshold', ...); +}); +``` + +| Estimate | ~25 LoC changes, ~20 LoC tests, ~2h | +|----------|--------------------------------------| + +--- + +### M16 Summary + +| Task | Theme | Priority | Severity | Audit Ref | Concern Ref | ~LoC | ~Hours | +|------|-------|----------|----------|-----------|-------------|------|--------| +| 16.1 | Crypto adapter normalization | P0 | High | Flaw 1 | C8 | ~150 | ~4h | +| 16.2 | Memory restore guard | P0 | High | Flaw 2 | C1 | ~65 | ~2h | +| 16.3 | Web Crypto buffer guard | P1 | Medium | Flaw 3 | C4 | ~45 | ~1h | +| 16.4 | FixedChunker buffer optimization | P2 | Low | Flaw 4 | C9 | ~35 | ~1h | +| 16.5 | Encrypt-then-chunk dedup warning | P1 | Medium | Flaw 5 | C10 | ~30 | ~1h | +| 16.6 | Chunk size upper bound | P1 | Medium | Flaw 6 | C3 | ~45 | ~1h | +| 16.7 | Lifecycle method naming | P2 | Low | Flaw 7 | — | ~55 | ~1h | +| 16.8 | CasError portability guard | P2 | Negligible | Flaw 8 | — | ~13 | ~0.5h | +| 16.9 | Pre-commit hook + hooks dir | P2 | Low | Flaw 9 | — | ~15 | ~0.5h | +| 16.10 | Orphaned blob tracking | P1 | Medium | — | C2 | ~50 | ~2h | +| 16.11 | Passphrase input security | P0 | High | — | C5+V6 | ~120 | ~4h | +| 16.12 | KDF brute-force awareness | P2 | Low | — | C6 | ~30 | ~1h | +| 16.13 | GCM nonce collision docs + counter | P2 | Low | — | C7 | ~45 | ~2h | +| **Total** | | | | | | **~698** | **~21h** | + +### Recommended Execution Order + +**Phase 1 — Safety nets (P0):** +16.8, 16.9, 16.1, 16.2, 16.11 + +**Phase 2 — Correctness (P1):** +16.6, 16.3, 16.5, 16.10 + +**Phase 3 — Polish (P2):** +16.4, 16.7, 
16.12, 16.13 + +--- + # 7) Feature Matrix Competitive landscape for content-addressed storage, encrypted binary assets, and large-file Git tooling. Rows represent the union of features across the space — not just what git-cas offers, but what users encounter and expect when evaluating tools in this category. @@ -1170,17 +1615,64 @@ describe('Concern 7: Nonce uniqueness', () => { --- +## Concern 8: Crypto Adapter Liskov Substitution Violation + +**Source:** CODE-EVAL.md, Flaw 1 + +**The Problem** + +The three `CryptoPort` implementations (Node, Bun, Web) differ in observable behavior: + +1. `NodeCryptoAdapter.encryptBuffer()` is synchronous (returns plain object), while Bun and Web return `Promise`. +2. `BunCryptoAdapter.decryptBuffer()` calls `_validateKey(key)` before decryption; Node and Web do not — the invalid key hits `node:crypto` directly, producing a less informative error. +3. `NodeCryptoAdapter.createEncryptionStream()` has no premature-finalize guard. Calling `finalize()` before consuming the stream returns garbage metadata on Node, but throws a clear `CasError('STREAM_NOT_CONSUMED')` on Bun and Deno. + +M15 Prism fixed the `sha256()` async inconsistency but left these three discrepancies untouched. + +**Mitigation:** Task 16.1. + +--- + +## Concern 9: FixedChunker Quadratic Buffer Allocation + +**Source:** CODE-EVAL.md, Flaw 4 + +**The Problem** + +`FixedChunker.chunk()` uses `Buffer.concat([buffer, data])` inside its async loop. Each call allocates a new buffer and copies the accumulated bytes. For a source yielding many small buffers (e.g., 4 KiB network reads into a 256 KiB chunk), this is O(n^2 / chunkSize) total byte copies. The CdcChunker, by contrast, uses a pre-allocated `Buffer.allocUnsafe(maxChunkSize)` with zero intermediate copies. + +**Mitigation:** Task 16.4. 
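
The copy blow-up described above is easy to quantify. A standalone sketch (illustrative only, not the library's code) counting total bytes copied under each accumulation strategy:

```js
// Bytes copied when accumulating `reads` buffers of `readSize` bytes
// via Buffer.concat([buffer, data]) on every append: each call re-copies
// everything accumulated so far, plus the new data.
function copiesWithConcat(reads, readSize) {
  let copied = 0;
  let accumulated = 0;
  for (let i = 0; i < reads; i++) {
    accumulated += readSize;
    copied += accumulated; // concat copies old bytes + new bytes
  }
  return copied;
}

// A pre-allocated chunk buffer (the CdcChunker approach) copies each byte once.
function copiesWithPrealloc(reads, readSize) {
  return reads * readSize;
}

// 64 reads of 4 KiB filling one 256 KiB chunk:
const concat = copiesWithConcat(64, 4096);     // 8,519,680 bytes copied
const prealloc = copiesWithPrealloc(64, 4096); // 262,144 bytes copied
console.log(concat / prealloc);                // 32.5× more copying per chunk
```

The ratio grows linearly with the number of reads per chunk, which is the O(n² / chunkSize) behavior the audit flagged.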
+ +--- + +## Concern 10: CDC Deduplication Defeated by Encrypt-Then-Chunk + +**Source:** CODE-EVAL.md, Flaw 5 + +**The Problem** + +Encryption is applied to the source stream *before* chunking. AES-GCM ciphertext is pseudorandom — identical plaintext produces different ciphertext (different random nonce each time). This means content-defined chunking (CDC) provides **zero deduplication benefit** for encrypted files. Users who combine `recipients` (or `encryptionKey`) with `chunking: { strategy: 'cdc' }` get CDC's computational overhead without its primary value proposition. + +This is a fundamental architectural constraint of the encrypt-then-chunk design. The alternative (chunk-then-encrypt) would require per-chunk nonces and auth tags, significantly complicating the manifest schema. This is documented as a known limitation, not a fixable bug. + +**Mitigation:** Task 16.5 (runtime warning + documentation). + +--- + ## Summary Table -| # | Type | Severity | Fix Cost | Recommended Action | -|---|------|----------|----------|-------------------| -| C1 | Memory amplification | High | ~20 LoC | Add `maxRestoreBufferSize` guard | -| C2 | Orphaned blobs | Medium | ~20 LoC | Report orphaned blob OIDs in error meta | -| C3 | No chunk size cap | Medium | ~6 LoC | Enforce 100 MiB maximum | -| C4 | Web Crypto buffering | Medium | ~15 LoC | Add buffer size guard in WebCryptoAdapter | -| C5 | Passphrase exposure | High | ~90 LoC | Interactive prompt + file-based input | -| C6 | KDF no rate limit | Low | ~10 LoC | Observability metric + CLI delay | -| C7 | GCM nonce collision | Low | ~20 LoC | Document bound + vault usage counter | +| # | Type | Severity | Fix Cost | Recommended Action | Task | +|---|------|----------|----------|--------------------|------| +| C1 | Memory amplification | High | ~20 LoC | Add `maxRestoreBufferSize` guard | **16.2** | +| C2 | Orphaned blobs | Medium | ~20 LoC | Report orphaned blob OIDs in error meta | **16.10** | +| C3 | No chunk size cap | 
Medium | ~6 LoC | Enforce 100 MiB maximum | **16.6** | +| C4 | Web Crypto buffering | Medium | ~15 LoC | Add buffer size guard in WebCryptoAdapter | **16.3** | +| C5 | Passphrase exposure | High | ~90 LoC | Interactive prompt + file-based input | **16.11** | +| C6 | KDF no rate limit | Low | ~10 LoC | Observability metric + CLI delay | **16.12** | +| C7 | GCM nonce collision | Low | ~20 LoC | Document bound + vault usage counter | **16.13** | +| C8 | Crypto adapter LSP violation | Medium | ~50 LoC | Normalize validation + finalize guards | **16.1** | +| C9 | FixedChunker quadratic alloc | Low | ~20 LoC | Pre-allocated buffer | **16.4** | +| C10 | Encrypt-then-chunk dedup loss | Medium | ~10 LoC | Runtime warning + documentation | **16.5** | | # | Type | Theme | Est. Cost | |---|------|-------|-----------| @@ -1189,4 +1681,6 @@ describe('Concern 7: Nonce uniqueness', () => { | V3 | Feature | Manifest diff engine | ~180 LoC, ~8h | | V4 | Feature | CompressionPort + zstd/brotli/lz4 | ~180 LoC, ~8h | | V5 | Feature | Watch mode (continuous sync) | ~220 LoC, ~10h | -| V6 | Feature | Interactive passphrase prompt | ~90 LoC, ~4h | +| V6 | Feature | Interactive passphrase prompt | ~90 LoC, ~4h — subsumed by **16.11** | +| V7 | Feature | Prometheus/OpenTelemetry ObservabilityPort adapter — export metrics (chunk throughput, encryption counts, error rates) to Prometheus or OTLP. The `decryption_failed` and `encryptionCount` metrics from M16 are natural candidates for alerting dashboards. | ~150 LoC, ~6h | +| V8 | Feature | `encryptionCount` auto-rotation — when count reaches a configurable threshold, automatically trigger `rotateVaultPassphrase` with a new passphrase derived from the old one, making nonce exhaustion impossible for long-lived vaults. 
| ~120 LoC, ~5h | diff --git a/docs/SECURITY.md b/SECURITY.md similarity index 91% rename from docs/SECURITY.md rename to SECURITY.md index b626c26..00a5b5e 100644 --- a/docs/SECURITY.md +++ b/SECURITY.md @@ -4,15 +4,50 @@ This document describes the security architecture, cryptographic design, and lim ## Table of Contents -1. [Threat Model](#threat-model) -2. [Cryptographic Design](#cryptographic-design) -3. [Key Handling](#key-handling) -4. [Encryption Flow](#encryption-flow) -5. [Decryption Flow](#decryption-flow) -6. [Chunk Digest Verification](#chunk-digest-verification) -7. [Limitations](#limitations) -8. [Git Object Immutability](#git-object-immutability) -9. [Error Codes for Security Operations](#error-codes-for-security-operations) +1. [Operational Limits](#operational-limits) +2. [Threat Model](#threat-model) +3. [Cryptographic Design](#cryptographic-design) +4. [Key Handling](#key-handling) +5. [Encryption Flow](#encryption-flow) +6. [Decryption Flow](#decryption-flow) +7. [Chunk Digest Verification](#chunk-digest-verification) +8. [Limitations](#limitations) +9. [Git Object Immutability](#git-object-immutability) +10. [Error Codes for Security Operations](#error-codes-for-security-operations) + +--- + +## Operational Limits + +### GCM Nonce Bound + +AES-256-GCM uses a 96-bit random nonce per encryption. NIST SP 800-38D recommends limiting to **2^32 invocations per key** to keep the nonce collision probability below an acceptable threshold. The birthday bound is approximately 2^48 for random 96-bit nonces, but the conservative NIST guidance of 2^32 accounts for the catastrophic consequences of a collision (full plaintext and authentication key recovery). + +git-cas tracks encryption operations via `encryptionCount` in vault metadata. When the count exceeds **2^31** (2,147,483,648), an observability warning is emitted, providing a safety margin before the 2^32 NIST limit. 
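+
+Both figures follow from the birthday approximation for n random 96-bit nonces, p ≈ n²/2^97. A quick sanity check (`nonceCollisionProbability` is an illustrative helper, not part of git-cas):
+
+```js
+// Birthday approximation: collision probability among n random 96-bit nonces.
+function nonceCollisionProbability(n) {
+  return n ** 2 / 2 ** 97;
+}
+
+console.log(nonceCollisionProbability(2 ** 32)); // 2^-33 ≈ 1.16e-10 at the NIST limit
+console.log(nonceCollisionProbability(2 ** 31)); // 2^-35 ≈ 2.9e-11 at the warning threshold
+```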
+
+**Recommended key rotation frequency**: Rotate the vault passphrase (or encryption key) before `encryptionCount` reaches 2^31, or every 90 days, whichever comes first.
+
+### KDF Parameter Guidance
+
+When using passphrase-based encryption, git-cas derives keys using PBKDF2 or scrypt.
+
+| Algorithm | Recommended Parameters | Notes |
+|-----------|----------------------|-------|
+| PBKDF2 | iterations ≥ 600,000 (SHA-256) | OWASP 2024 recommendation |
+| scrypt | N=2^17, r=8, p=1 | ~128 MiB memory |
+
+Higher iteration counts / cost parameters increase resistance to brute-force attacks but also increase the time to derive a key. Choose parameters based on your threat model and latency tolerance.
+
+### Passphrase Entropy Recommendations
+
+| Entropy (bits) | Example | Brute-Force Resistance |
+|---------------|---------|----------------------|
+| < 40 | `password123` | Trivially crackable |
+| 40–60 | 4–5 random dictionary words | Weak against GPU attacks |
+| 60–80 | 6+ random dictionary words or 12+ mixed characters | Moderate |
+| > 80 | 8+ random dictionary words or 16+ mixed characters | Strong |
+
+**Minimum recommendation**: 80+ bits of entropy for vault passphrases. Use a random passphrase generator (e.g., Diceware) rather than human-chosen passwords.
 
 ---
 
diff --git a/bin/actions.js b/bin/actions.js
index d1cb54a..c6388b8 100644
--- a/bin/actions.js
+++ b/bin/actions.js
@@ -56,6 +56,15 @@ function getHint(code) {
   return undefined;
 }
 
+/**
+ * Delay utility for rate-limiting after sensitive failures.
+ * @param {number} ms
+ * @returns {Promise<void>}
+ */
+function delay(ms) {
+  return new Promise((resolve) => { setTimeout(resolve, ms); });
+}
+
 /**
  * Wrap a command action with structured error handling.
 */
@@ -68,6 +77,9 @@ export function runAction(fn, getJson) {
     try {
       await fn(...args);
     } catch (/** @type {any} */ err) {
+      if (err?.code === 'INTEGRITY_ERROR') {
+        await delay(1000);
+      }
       writeError(err, getJson());
       process.exitCode = 1;
     }
diff --git a/bin/git-cas.js b/bin/git-cas.js
index 5e0f30d..0cf848c 100755
--- a/bin/git-cas.js
+++ b/bin/git-cas.js
@@ -12,6 +12,7 @@ import { renderManifestView } from './ui/manifest-view.js';
 import { renderHeatmap } from './ui/heatmap.js';
 import { runAction } from './actions.js';
 import { filterEntries, formatTable, formatTabSeparated } from './ui/vault-list.js';
+import { readPassphraseFile, promptPassphrase } from './ui/passphrase-prompt.js';
 
 const getJson = () => program.opts().json;
 
@@ -75,13 +76,41 @@ async function deriveVaultKey(cas, metadata, passphrase) {
 }
 
 /**
- * Resolve passphrase from --vault-passphrase flag or GIT_CAS_PASSPHRASE env var.
+ * Returns true when a non-interactive passphrase source exists (flag or env).
+ * Does NOT trigger prompts or consume stdin.
  *
  * @param {Record<string, string>} opts
- * @returns {string | undefined}
+ * @returns {boolean}
  */
-function resolvePassphrase(opts) {
-  return opts.vaultPassphrase ?? process.env.GIT_CAS_PASSPHRASE;
+function hasPassphraseSource(opts) {
+  return Boolean(opts.vaultPassphraseFile || opts.vaultPassphrase || process.env.GIT_CAS_PASSPHRASE);
+}
+
+/**
+ * Resolve passphrase from (in priority order):
+ * 1. --vault-passphrase-file
+ * 2. --vault-passphrase
+ * 3. GIT_CAS_PASSPHRASE env var
+ * 4. Interactive TTY prompt (if stdin is a TTY)
+ *
+ * @param {Record<string, string>} opts
+ * @param {{ confirm?: boolean }} [extra]
+ * @returns {Promise<string | undefined>}
+ */
+async function resolvePassphrase(opts, extra = {}) {
+  if (opts.vaultPassphraseFile) {
+    return await readPassphraseFile(opts.vaultPassphraseFile);
+  }
+  if (opts.vaultPassphrase) {
+    return opts.vaultPassphrase;
+  }
+  if (process.env.GIT_CAS_PASSPHRASE) {
+    return process.env.GIT_CAS_PASSPHRASE;
+  }
+  if (process.stdin.isTTY) {
+    return await promptPassphrase({ confirm: extra.confirm || false });
+  }
+  return undefined;
 }
 
 /**
@@ -95,16 +124,18 @@ async function resolveEncryptionKey(cas, opts) {
   if (opts.keyFile) {
     return readKeyFile(opts.keyFile);
   }
-  const passphrase = resolvePassphrase(opts);
-  if (!passphrase) {
+  const metadata = await cas.getVaultMetadata();
+  if (!metadata?.encryption) {
+    if (hasPassphraseSource(opts)) {
+      process.stderr.write('warning: passphrase ignored (vault is not encrypted)\n');
+    }
     return undefined;
   }
-  const metadata = await cas.getVaultMetadata();
-  if (metadata?.encryption) {
-    return deriveVaultKey(cas, metadata, passphrase);
+  const passphrase = await resolvePassphrase(opts);
+  if (!passphrase) {
+    return undefined;
   }
-  process.stderr.write('warning: passphrase ignored (vault is not encrypted)\n');
-  return undefined;
+  return deriveVaultKey(cas, metadata, passphrase);
 }
 
 /**
@@ -186,9 +217,10 @@ program
   .option('--tree', 'Also create a Git tree and print its OID')
   .option('--force', 'Overwrite existing vault entry')
   .option('--vault-passphrase <passphrase>', 'Vault-level passphrase for encryption (prefer GIT_CAS_PASSPHRASE env var)')
+  .option('--vault-passphrase-file <path>', 'Read vault passphrase from file (use - for stdin)')
   .option('--cwd <dir>', 'Git working directory', '.')
   .action(runAction(async (/** @type {string} */ file, /** @type {Record<string, string>} */ opts) => {
-    if (opts.recipient && (opts.keyFile || resolvePassphrase(opts))) {
+    if (opts.recipient && (opts.keyFile || hasPassphraseSource(opts))) {
       throw new Error('Provide --key-file/--vault-passphrase or --recipient, not both');
     }
     if (opts.force && !opts.tree) {
@@ -275,6 +307,7 @@ program
   .option('--oid <oid>', 'Direct tree OID')
   .option('--key-file <path>', 'Path to 32-byte raw encryption key file')
   .option('--vault-passphrase <passphrase>', 'Vault-level passphrase for decryption (prefer GIT_CAS_PASSPHRASE env var)')
+  .option('--vault-passphrase-file <path>', 'Read vault passphrase from file (use - for stdin)')
   .option('--cwd <dir>', 'Git working directory', '.')
   .action(runAction(async (/** @type {Record<string, string>} */ opts) => {
     validateRestoreFlags(opts);
@@ -345,13 +378,14 @@ vault
   .command('init')
   .description('Initialize the vault')
   .option('--vault-passphrase <passphrase>', 'Passphrase for vault-level encryption (prefer GIT_CAS_PASSPHRASE env var)')
+  .option('--vault-passphrase-file <path>', 'Read vault passphrase from file (use - for stdin)')
   .option('--algorithm <alg>', 'KDF algorithm (pbkdf2 or scrypt)', 'pbkdf2')
   .option('--cwd <dir>', 'Git working directory', '.')
   .action(runAction(async (/** @type {Record<string, string>} */ opts) => {
     const cas = createCas(opts.cwd);
     /** @type {{ passphrase?: string, kdfOptions?: { algorithm: 'pbkdf2' | 'scrypt' } }} */
     const initOpts = {};
-    const passphrase = resolvePassphrase(opts);
+    const passphrase = await resolvePassphrase(opts, { confirm: true });
     if (passphrase) {
       initOpts.passphrase = passphrase;
       initOpts.kdfOptions = { algorithm: /** @type {'pbkdf2' | 'scrypt'} */ (opts.algorithm) };
diff --git a/bin/ui/passphrase-prompt.js b/bin/ui/passphrase-prompt.js
new file mode 100644
index 0000000..3e261c9
--- /dev/null
+++ b/bin/ui/passphrase-prompt.js
@@ -0,0 +1,66 @@
+import { createInterface } from 'node:readline';
+import { readFile } from 'node:fs/promises';
+
+/**
+ * Prompts for a passphrase on stderr with echo disabled.
+ *
+ * @param {Object} [options]
+ * @param {boolean} [options.confirm=false] - Require confirmation (ask twice).
+ * @returns {Promise<string>}
+ */
+export async function promptPassphrase({ confirm = false } = {}) {
+  if (!process.stdin.isTTY) {
+    throw new Error(
+      'Cannot prompt for passphrase: stdin is not a TTY. ' +
+      'Use --vault-passphrase-file or GIT_CAS_PASSPHRASE.',
+    );
+  }
+  const pass = await readHidden('Passphrase: ');
+  if (confirm) {
+    const pass2 = await readHidden('Confirm passphrase: ');
+    if (pass !== pass2) {
+      throw new Error('Passphrases do not match');
+    }
+  }
+  return pass;
+}
+
+/**
+ * Reads a passphrase from a file path, or from stdin when path is '-'.
+ *
+ * @param {string} filePath - File path, or '-' for stdin.
+ * @returns {Promise<string>}
+ */
+export async function readPassphraseFile(filePath) {
+  if (filePath === '-') {
+    const chunks = [];
+    for await (const chunk of process.stdin) {
+      chunks.push(chunk);
+    }
+    return Buffer.concat(chunks).toString('utf8').replace(/\r?\n$/, '');
+  }
+  const content = await readFile(filePath, 'utf8');
+  return content.replace(/\r?\n$/, '');
+}
+
+/**
+ * Reads a line with echo disabled.
+ * @param {string} prompt - Prompt text.
+ * @returns {Promise<string>}
+ */
+function readHidden(prompt) {
+  return new Promise((resolve) => {
+    const rl = createInterface({
+      input: process.stdin,
+      output: process.stderr,
+      terminal: true,
+    });
+    process.stderr.write(prompt);
+    // Disable echo before any keystrokes arrive (readline's output hook).
+    rl._writeToOutput = () => {};
+    rl.question('', (answer) => {
+      rl.close();
+      process.stderr.write('\n');
+      resolve(answer);
+    });
+  });
+}
diff --git a/index.d.ts b/index.d.ts
index c59de13..05fa0e4 100644
--- a/index.d.ts
+++ b/index.d.ts
@@ -171,6 +171,8 @@ export interface ContentAddressableStoreOptions {
   concurrency?: number;
   chunking?: ChunkingConfig;
   chunker?: ChunkingPort;
+  /** Maximum bytes to buffer during encrypted/compressed restore. @default 536870912 (512 MiB) */
+  maxRestoreBufferSize?: number;
 }
 
 /** A single vault entry. */
 export interface VaultEntry {
@@ -182,6 +184,8 @@
 /** Vault metadata stored in .vault.json.
*/ export interface VaultMetadata { version: number; + /** Number of encrypted store operations performed with this vault key. */ + encryptionCount?: number; encryption?: { cipher: string; kdf: { @@ -341,10 +345,20 @@ export default class ContentAddressableStore { readManifest(options: { treeOid: string }): Promise; + inspectAsset(options: { + treeOid: string; + }): Promise<{ slug: string; chunksOrphaned: number }>; + + /** @deprecated Use {@link inspectAsset} instead. */ deleteAsset(options: { treeOid: string; }): Promise<{ slug: string; chunksOrphaned: number }>; + collectReferencedChunks(options: { + treeOids: string[]; + }): Promise<{ referenced: Set; total: number }>; + + /** @deprecated Use {@link collectReferencedChunks} instead. */ findOrphanedChunks(options: { treeOids: string[]; }): Promise<{ referenced: Set; total: number }>; diff --git a/index.js b/index.js index 1a643fb..85f9154 100644 --- a/index.js +++ b/index.js @@ -64,14 +64,15 @@ export default class ContentAddressableStore { * @param {number} [options.concurrency=1] - Maximum parallel chunk I/O operations. * @param {{ strategy: string, chunkSize?: number, targetChunkSize?: number, minChunkSize?: number, maxChunkSize?: number }} [options.chunking] - Chunking strategy config. * @param {import('./src/ports/ChunkingPort.js').default} [options.chunker] - Pre-built ChunkingPort instance (advanced). + * @param {number} [options.maxRestoreBufferSize=536870912] - Max buffered restore size in bytes for encrypted/compressed restores (default 512 MiB). 
*/ - constructor({ plumbing, chunkSize, codec, policy, crypto, observability, merkleThreshold, concurrency, chunking, chunker }) { - this.#config = { plumbing, chunkSize, codec, policy, crypto, observability, merkleThreshold, concurrency, chunking, chunker }; + constructor({ plumbing, chunkSize, codec, policy, crypto, observability, merkleThreshold, concurrency, chunking, chunker, maxRestoreBufferSize }) { + this.#config = { plumbing, chunkSize, codec, policy, crypto, observability, merkleThreshold, concurrency, chunking, chunker, maxRestoreBufferSize }; this.service = null; this.#servicePromise = null; } - /** @type {{ plumbing: *, chunkSize?: number, codec?: *, policy?: *, crypto?: *, observability?: *, merkleThreshold?: number, concurrency?: number, chunking?: *, chunker?: * }} */ + /** @type {{ plumbing: *, chunkSize?: number, codec?: *, policy?: *, crypto?: *, observability?: *, merkleThreshold?: number, concurrency?: number, chunking?: *, chunker?: *, maxRestoreBufferSize?: number }} */ #config; /** @type {VaultService|null} */ #vault = null; @@ -111,13 +112,14 @@ export default class ContentAddressableStore { merkleThreshold: cfg.merkleThreshold, concurrency: cfg.concurrency, chunker, + maxRestoreBufferSize: cfg.maxRestoreBufferSize, }); const ref = new GitRefAdapter({ plumbing: cfg.plumbing, policy: cfg.policy, }); - this.#vault = new VaultService({ persistence, ref, crypto }); + this.#vault = new VaultService({ persistence, ref, crypto, observability: this.service.observability }); return this.service; } @@ -314,7 +316,18 @@ export default class ContentAddressableStore { } /** - * Returns deletion metadata for an asset stored in a Git tree. + * Reads a manifest from a Git tree and returns inspection metadata. + * @param {Object} options + * @param {string} options.treeOid - Git tree OID of the asset. 
+ * @returns {Promise<{ slug: string, chunksOrphaned: number }>} + */ + async inspectAsset(options) { + const service = await this.#getService(); + return await service.inspectAsset(options); + } + + /** + * @deprecated Use {@link inspectAsset} instead. * @param {Object} options * @param {string} options.treeOid - Git tree OID of the asset. * @returns {Promise<{ slug: string, chunksOrphaned: number }>} @@ -330,6 +343,17 @@ export default class ContentAddressableStore { * @param {string[]} options.treeOids - Git tree OIDs to analyze. * @returns {Promise<{ referenced: Set, total: number }>} */ + async collectReferencedChunks(options) { + const service = await this.#getService(); + return await service.collectReferencedChunks(options); + } + + /** + * @deprecated Use {@link collectReferencedChunks} instead. + * @param {Object} options + * @param {string[]} options.treeOids - Git tree OIDs to analyze. + * @returns {Promise<{ referenced: Set, total: number }>} + */ async findOrphanedChunks(options) { const service = await this.#getService(); return await service.findOrphanedChunks(options); diff --git a/scripts/hooks/pre-commit b/scripts/hooks/pre-commit new file mode 100755 index 0000000..d5e25a7 --- /dev/null +++ b/scripts/hooks/pre-commit @@ -0,0 +1,13 @@ +#!/usr/bin/env bash + +# pre-commit git hook +# Lint must pass cleanly. Zero errors, zero warnings. + +set -e + +echo "Running pre-commit lint gate..." + +echo "→ Linting..." +pnpm run lint + +echo "✅ Lint passed." 
diff --git a/scripts/git-hooks/pre-push b/scripts/hooks/pre-push similarity index 100% rename from scripts/git-hooks/pre-push rename to scripts/hooks/pre-push diff --git a/scripts/install-hooks.sh b/scripts/install-hooks.sh index fe569e9..567f8d9 100644 --- a/scripts/install-hooks.sh +++ b/scripts/install-hooks.sh @@ -6,7 +6,7 @@ set -e SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -HOOKS_DIR="${SCRIPT_DIR}/git-hooks" +HOOKS_DIR="${SCRIPT_DIR}/hooks" # Make all hooks executable chmod +x "${HOOKS_DIR}"/* diff --git a/src/domain/errors/CasError.js b/src/domain/errors/CasError.js index 6acc1da..54f9ba3 100644 --- a/src/domain/errors/CasError.js +++ b/src/domain/errors/CasError.js @@ -15,6 +15,8 @@ export default class CasError extends Error { this.name = this.constructor.name; this.code = code; this.meta = meta; - Error.captureStackTrace(this, this.constructor); + if (Error.captureStackTrace) { + Error.captureStackTrace(this, this.constructor); + } } } diff --git a/src/domain/services/CasService.d.ts b/src/domain/services/CasService.d.ts index 358579b..80440a8 100644 --- a/src/domain/services/CasService.d.ts +++ b/src/domain/services/CasService.d.ts @@ -131,10 +131,20 @@ export default class CasService { readManifest(options: { treeOid: string }): Promise; + inspectAsset(options: { + treeOid: string; + }): Promise<{ slug: string; chunksOrphaned: number }>; + + /** @deprecated Use {@link inspectAsset} instead. */ deleteAsset(options: { treeOid: string; }): Promise<{ slug: string; chunksOrphaned: number }>; + collectReferencedChunks(options: { + treeOids: string[]; + }): Promise<{ referenced: Set; total: number }>; + + /** @deprecated Use {@link collectReferencedChunks} instead. 
*/ findOrphanedChunks(options: { treeOids: string[]; }): Promise<{ referenced: Set; total: number }>; diff --git a/src/domain/services/CasService.js b/src/domain/services/CasService.js index 9d1370c..b841b54 100644 --- a/src/domain/services/CasService.js +++ b/src/domain/services/CasService.js @@ -35,27 +35,46 @@ export default class CasService { * @param {number} [options.concurrency=1] - Maximum parallel chunk I/O operations. * @param {import('../../ports/ChunkingPort.js').default} [options.chunker] - Chunking strategy (default FixedChunker). */ - constructor({ persistence, codec, crypto, observability, chunkSize = 256 * 1024, merkleThreshold = 1000, concurrency = 1, chunker }) { + constructor({ persistence, codec, crypto, observability, chunkSize = 256 * 1024, merkleThreshold = 1000, concurrency = 1, chunker, maxRestoreBufferSize = 512 * 1024 * 1024 }) { CasService._validateObservability(observability); - if (chunkSize < 1024) { - throw new Error('Chunk size must be at least 1024 bytes'); - } + CasService.#validateConstructorArgs({ chunkSize, merkleThreshold, concurrency, maxRestoreBufferSize }); this.persistence = persistence; this.codec = codec; this.crypto = crypto; this.observability = observability; this.chunkSize = chunkSize; + if (chunkSize > 10 * 1024 * 1024) { + observability.log('warn', `Chunk size ${chunkSize} exceeds 10 MiB — consider a smaller value`, { chunkSize }); + } /** @type {import('../../ports/ChunkingPort.js').default} */ this.chunker = chunker || new FixedChunker({ chunkSize }); + this.merkleThreshold = merkleThreshold; + this.concurrency = concurrency; + this.maxRestoreBufferSize = maxRestoreBufferSize; + this.#keyResolver = new KeyResolver(crypto); + } + + /** + * Validates constructor numeric arguments. 
+ * @private + */ + static #validateConstructorArgs({ chunkSize, merkleThreshold, concurrency, maxRestoreBufferSize }) { + if (chunkSize < 1024) { + throw new Error('Chunk size must be at least 1024 bytes'); + } + const MAX_CHUNK_SIZE = 100 * 1024 * 1024; + if (chunkSize > MAX_CHUNK_SIZE) { + throw new Error(`Chunk size must not exceed ${MAX_CHUNK_SIZE} bytes (100 MiB)`); + } if (!Number.isInteger(merkleThreshold) || merkleThreshold < 1) { throw new Error('Merkle threshold must be a positive integer'); } - this.merkleThreshold = merkleThreshold; if (!Number.isInteger(concurrency) || concurrency < 1) { throw new Error('Concurrency must be a positive integer'); } - this.concurrency = concurrency; - this.#keyResolver = new KeyResolver(crypto); + if (!Number.isInteger(maxRestoreBufferSize) || maxRestoreBufferSize < 1024) { + throw new Error('maxRestoreBufferSize must be a positive integer >= 1024'); + } } /** @@ -127,15 +146,20 @@ export default class CasService { launchWrite(chunk, nextIndex++); } } catch (err) { - await Promise.allSettled(pending); + const settled = await Promise.allSettled(pending); + const orphanedBlobs = settled + .filter((r) => r.status === 'fulfilled') + .map((r) => r.value.blob); if (err instanceof CasError) { throw err; } const casErr = new CasError( `Stream error during store: ${err.message}`, 'STREAM_ERROR', - { chunksDispatched: nextIndex, originalError: err }, + { chunksDispatched: nextIndex, orphanedBlobs, originalError: err }, ); - await Promise.allSettled(pending); - this.observability.metric('error', { code: casErr.code, message: casErr.message }); + this.observability.metric('error', { + code: casErr.code, message: casErr.message, + orphanedBlobs: orphanedBlobs.length, + }); throw casErr; } @@ -257,6 +281,13 @@ export default class CasService { const manifestData = this._buildManifestData(slug, filename, compression); const processedSource = compression ? 
this._compressStream(source) : source; + if (keyInfo.key && this.chunker.strategy === 'cdc') { + this.observability.log( + 'warn', + 'CDC deduplication is ineffective with encryption — ciphertext is pseudorandom', + { strategy: 'cdc' }, + ); + } if (keyInfo.key) { const { encrypt, finalize } = this.crypto.createEncryptionStream(keyInfo.key); await this._chunkAndStore(encrypt(processedSource), manifestData); @@ -469,14 +500,38 @@ export default class CasService { * @private */ async *_restoreBuffered(manifest, key) { + const totalSize = manifest.chunks.reduce((acc, c) => acc + c.size, 0); + if (totalSize > this.maxRestoreBufferSize) { + throw new CasError( + `Encrypted/compressed restore would buffer ${totalSize} bytes ` + + `(limit: ${this.maxRestoreBufferSize}). Increase maxRestoreBufferSize ` + + 'or store without encryption.', + 'RESTORE_TOO_LARGE', + { size: totalSize, limit: this.maxRestoreBufferSize }, + ); + } let buffer = Buffer.concat(await this._readAndVerifyChunks(manifest.chunks)); if (manifest.encryption?.encrypted) { - buffer = await this.decrypt({ buffer, key, meta: manifest.encryption }); + try { + buffer = await this.decrypt({ buffer, key, meta: manifest.encryption }); + } catch (err) { + if (err instanceof CasError && err.code === 'INTEGRITY_ERROR') { + this.observability.metric('error', { action: 'decryption_failed', slug: manifest.slug }); + } + throw err; + } } if (manifest.compression) { buffer = await this._decompress(buffer); + if (buffer.length > this.maxRestoreBufferSize) { + throw new CasError( + `Decompressed restore is ${buffer.length} bytes (limit: ${this.maxRestoreBufferSize})`, + 'RESTORE_TOO_LARGE', + { size: buffer.length, limit: this.maxRestoreBufferSize }, + ); + } } this.observability.metric('file', { @@ -626,7 +681,7 @@ export default class CasService { } /** - * Returns deletion metadata for an asset stored in a Git tree. + * Reads a manifest from a Git tree and returns inspection metadata. 
* Does not perform any destructive Git operations. * * @param {Object} options @@ -634,7 +689,7 @@ export default class CasService { * @returns {Promise<{ chunksOrphaned: number, slug: string }>} * @throws {CasError} MANIFEST_NOT_FOUND if the tree has no manifest */ - async deleteAsset({ treeOid }) { + async inspectAsset({ treeOid }) { const manifest = await this.readManifest({ treeOid }); return { slug: manifest.slug, @@ -642,6 +697,17 @@ export default class CasService { }; } + /** + * @deprecated Use {@link inspectAsset} instead. + * @param {Object} options + * @param {string} options.treeOid - Git tree OID of the asset + * @returns {Promise<{ chunksOrphaned: number, slug: string }>} + */ + async deleteAsset(options) { + this.observability.log('warn', 'deleteAsset() is deprecated — use inspectAsset()'); + return await this.inspectAsset(options); + } + /** * Aggregates referenced chunk blob OIDs across multiple stored assets. * Analysis only — does not delete or modify anything. @@ -651,7 +717,7 @@ export default class CasService { * @returns {Promise<{ referenced: Set, total: number }>} * @throws {CasError} MANIFEST_NOT_FOUND if any treeOid lacks a manifest */ - async findOrphanedChunks({ treeOids }) { + async collectReferencedChunks({ treeOids }) { const referenced = new Set(); let total = 0; @@ -666,6 +732,17 @@ export default class CasService { return { referenced, total }; } + /** + * @deprecated Use {@link collectReferencedChunks} instead. + * @param {Object} options + * @param {string[]} options.treeOids - Git tree OIDs to analyze + * @returns {Promise<{ referenced: Set, total: number }>} + */ + async findOrphanedChunks(options) { + this.observability.log('warn', 'findOrphanedChunks() is deprecated — use collectReferencedChunks()'); + return await this.collectReferencedChunks(options); + } + /** * Derives an encryption key from a passphrase using PBKDF2 or scrypt. 
* @param {Object} options diff --git a/src/domain/services/VaultService.js b/src/domain/services/VaultService.js index d5a1ac2..09009b8 100644 --- a/src/domain/services/VaultService.js +++ b/src/domain/services/VaultService.js @@ -80,16 +80,22 @@ function hasControlChars(str) { export default class VaultService { static VAULT_REF = VAULT_REF; + /** @type {number} Nonce usage warning threshold (2^31). */ + static ENCRYPTION_COUNT_WARN = 2 ** 31; + /** * @param {Object} options * @param {import('../../ports/GitPersistencePort.js').default} options.persistence * @param {import('../../ports/GitRefPort.js').default} options.ref * @param {import('../../ports/CryptoPort.js').default} options.crypto + * @param {import('../../ports/ObservabilityPort.js').default} [options.observability] */ - constructor({ persistence, ref, crypto }) { + constructor({ persistence, ref, crypto, observability }) { this.persistence = persistence; this.ref = ref; this.crypto = crypto; + /** @type {{ metric: Function, log: Function, span: Function }} */ + this.observability = observability || { metric() {}, log() {}, span: () => ({ end() {} }) }; } // --------------------------------------------------------------------------- @@ -389,9 +395,21 @@ export default class VaultService { } const isUpdate = state.entries.has(slug); state.entries.set(slug, treeOid); + const metadata = state.metadata || { version: 1 }; + if (metadata.encryption) { + metadata.encryptionCount = (metadata.encryptionCount || 0) + 1; + if (metadata.encryptionCount >= VaultService.ENCRYPTION_COUNT_WARN) { + this.observability.log( + 'warn', + `Vault encryption count (${metadata.encryptionCount}) exceeds ` + + `${VaultService.ENCRYPTION_COUNT_WARN} — rotate your key`, + { encryptionCount: metadata.encryptionCount }, + ); + } + } return { entries: state.entries, - metadata: state.metadata || { version: 1 }, + metadata, message: isUpdate ? 
`vault: update ${slug}` : `vault: add ${slug}`, }; }); diff --git a/src/infrastructure/adapters/NodeCryptoAdapter.js b/src/infrastructure/adapters/NodeCryptoAdapter.js index f89898c..a317a11 100644 --- a/src/infrastructure/adapters/NodeCryptoAdapter.js +++ b/src/infrastructure/adapters/NodeCryptoAdapter.js @@ -1,6 +1,7 @@ import { createHash, createCipheriv, createDecipheriv, randomBytes, pbkdf2, scrypt } from 'node:crypto'; import { promisify } from 'node:util'; import CryptoPort from '../../ports/CryptoPort.js'; +import CasError from '../../domain/errors/CasError.js'; /** * Node.js implementation of CryptoPort using node:crypto. @@ -28,9 +29,9 @@ export default class NodeCryptoAdapter extends CryptoPort { * @override * @param {Buffer|Uint8Array} buffer - Plaintext to encrypt. * @param {Buffer|Uint8Array} key - 32-byte encryption key. - * @returns {{ buf: Buffer, meta: import('../../ports/CryptoPort.js').EncryptionMeta }} + * @returns {Promise<{ buf: Buffer, meta: import('../../ports/CryptoPort.js').EncryptionMeta }>} */ - encryptBuffer(buffer, key) { + async encryptBuffer(buffer, key) { this._validateKey(key); const nonce = randomBytes(12); const cipher = createCipheriv('aes-256-gcm', key, nonce); @@ -50,6 +51,7 @@ export default class NodeCryptoAdapter extends CryptoPort { * @returns {Buffer} */ decryptBuffer(buffer, key, meta) { + this._validateKey(key); const nonce = Buffer.from(meta.nonce, 'base64'); const tag = Buffer.from(meta.tag, 'base64'); const decipher = createDecipheriv('aes-256-gcm', key, nonce); @@ -66,6 +68,7 @@ export default class NodeCryptoAdapter extends CryptoPort { this._validateKey(key); const nonce = randomBytes(12); const cipher = createCipheriv('aes-256-gcm', key, nonce); + let streamFinalized = false; /** @param {AsyncIterable} source */ const encrypt = async function* (source) { @@ -79,9 +82,16 @@ export default class NodeCryptoAdapter extends CryptoPort { if (final.length > 0) { yield final; } + streamFinalized = true; }; const 
finalize = () => { + if (!streamFinalized) { + throw new CasError( + 'Cannot finalize before the encrypt stream is fully consumed', + 'STREAM_NOT_CONSUMED', + ); + } const tag = cipher.getAuthTag(); return this._buildMeta(nonce.toString('base64'), tag.toString('base64')); }; diff --git a/src/infrastructure/adapters/WebCryptoAdapter.js b/src/infrastructure/adapters/WebCryptoAdapter.js index 5a70733..e40f1ed 100644 --- a/src/infrastructure/adapters/WebCryptoAdapter.js +++ b/src/infrastructure/adapters/WebCryptoAdapter.js @@ -9,6 +9,21 @@ import CasError from '../../domain/errors/CasError.js'; * AES-GCM is a one-shot API (the GCM tag is computed over the entire plaintext). */ export default class WebCryptoAdapter extends CryptoPort { + /** @type {number} */ + #maxEncryptionBufferSize; + + /** + * @param {Object} [options] + * @param {number} [options.maxEncryptionBufferSize=536870912] - Max bytes to buffer during streaming encryption (default 512 MiB). + */ + constructor({ maxEncryptionBufferSize = 512 * 1024 * 1024 } = {}) { + super(); + if (!Number.isFinite(maxEncryptionBufferSize) || maxEncryptionBufferSize <= 0) { + throw new RangeError('maxEncryptionBufferSize must be a finite positive number'); + } + this.#maxEncryptionBufferSize = maxEncryptionBufferSize; + } + /** * @override * @param {Buffer|Uint8Array} buf - Data to hash. 
@@ -73,6 +88,7 @@ export default class WebCryptoAdapter extends CryptoPort { * @returns {Promise} */ async decryptBuffer(buffer, key, meta) { + this._validateKey(key); const nonce = this.#fromBase64(meta.nonce); const tag = this.#fromBase64(meta.tag); const cryptoKey = await this.#importKey(key); @@ -104,49 +120,56 @@ export default class WebCryptoAdapter extends CryptoPort { this._validateKey(key); const nonce = this.randomBytes(12); const cryptoKeyPromise = this.#importKey(key); + const maxBuf = this.#maxEncryptionBufferSize; + const state = { /** @type {Uint8Array|null} */ tag: null, consumed: false }; - // Web Crypto buffers all data for the one-shot AES-GCM call (GCM tag spans the whole plaintext). - /** @type {Buffer[]} */ - const chunks = []; - /** @type {Uint8Array|null} */ - let finalTag = null; - let streamConsumed = false; + const encrypt = WebCryptoAdapter.#makeEncryptGenerator({ cryptoKeyPromise, nonce, maxBuf, state }); + + const finalize = () => { + if (!state.consumed) { + throw new CasError('Cannot finalize before the encrypt stream is fully consumed', 'STREAM_NOT_CONSUMED'); + } + return this._buildMeta(this.#toBase64(nonce), this.#toBase64(/** @type {Uint8Array} */ (state.tag))); + }; + + return { encrypt, finalize }; + } - /** @param {AsyncIterable} source */ - const encrypt = async function* (source) { + /** + * Builds the encrypt async generator for createEncryptionStream. 
+   * @param {{ cryptoKeyPromise: Promise, nonce: Buffer|Uint8Array, maxBuf: number, state: { tag: Uint8Array|null, consumed: boolean } }} ctx
+   * @returns {(source: AsyncIterable) => AsyncGenerator}
+   */
+  static #makeEncryptGenerator({ cryptoKeyPromise, nonce, maxBuf, state }) {
+    return async function* (source) {
+      /** @type {Buffer[]} */
+      const chunks = [];
+      let accumulatedBytes = 0;
       for await (const chunk of source) {
+        accumulatedBytes += chunk.length;
+        if (accumulatedBytes > maxBuf) {
+          throw new CasError(
+            `Streaming encryption buffered ${accumulatedBytes} bytes (limit: ${maxBuf}). ` +
+              'Web Crypto AES-GCM buffers all data. Use Node.js/Bun or store without encryption for large files.',
+            'ENCRYPTION_BUFFER_EXCEEDED',
+            { accumulated: accumulatedBytes, limit: maxBuf },
+          );
+        }
         chunks.push(chunk);
       }
       const buffer = Buffer.concat(chunks);
       const cryptoKey = await cryptoKeyPromise;
       const encrypted = await globalThis.crypto.subtle.encrypt(
         // @ts-ignore -- Uint8Array satisfies BufferSource at runtime
         { name: 'AES-GCM', iv: /** @type {Uint8Array} */ (nonce) },
-        cryptoKey,
-        buffer
+        cryptoKey, buffer,
       );
       const fullBuffer = new Uint8Array(encrypted);
       const tagLength = 16;
-      const ciphertext = fullBuffer.slice(0, -tagLength);
-      finalTag = fullBuffer.slice(-tagLength);
-      streamConsumed = true;
-
-      yield Buffer.from(ciphertext);
+      state.tag = fullBuffer.slice(-tagLength);
+      state.consumed = true;
+      yield Buffer.from(fullBuffer.slice(0, -tagLength));
     };
-
-    const finalize = () => {
-      if (!streamConsumed) {
-        throw new CasError(
-          'Cannot finalize before the encrypt stream is fully consumed',
-          'STREAM_NOT_CONSUMED',
-        );
-      }
-      return this._buildMeta(this.#toBase64(nonce), this.#toBase64(/** @type {Uint8Array} */ (finalTag)));
-    };
-
-    return { encrypt, finalize };
   }

   /**
diff --git a/src/infrastructure/chunkers/CdcChunker.js b/src/infrastructure/chunkers/CdcChunker.js
index 0eaac3d..536f65c 100644
--- a/src/infrastructure/chunkers/CdcChunker.js
+++ 
b/src/infrastructure/chunkers/CdcChunker.js @@ -277,6 +277,11 @@ export default class CdcChunker extends ChunkingPort { `targetChunkSize (${targetChunkSize}) must be in [${minChunkSize}, ${maxChunkSize}]`, ); } + if (maxChunkSize > 100 * 1024 * 1024) { + throw new RangeError( + `maxChunkSize must not exceed 104857600 bytes (100 MiB), got ${maxChunkSize}`, + ); + } this.#minChunkSize = minChunkSize; this.#maxChunkSize = maxChunkSize; diff --git a/src/infrastructure/chunkers/FixedChunker.js b/src/infrastructure/chunkers/FixedChunker.js index 1477e18..4444823 100644 --- a/src/infrastructure/chunkers/FixedChunker.js +++ b/src/infrastructure/chunkers/FixedChunker.js @@ -17,6 +17,14 @@ export default class FixedChunker extends ChunkingPort { */ constructor({ chunkSize = 262144 } = {}) { super(); + if (!Number.isInteger(chunkSize) || chunkSize < 1) { + throw new RangeError(`chunkSize must be a positive integer, got ${chunkSize}`); + } + if (chunkSize > 100 * 1024 * 1024) { + throw new RangeError( + `Chunk size must not exceed 104857600 bytes (100 MiB), got ${chunkSize}`, + ); + } this.#chunkSize = chunkSize; } @@ -36,18 +44,26 @@ export default class FixedChunker extends ChunkingPort { * @yields {Buffer} */ async *chunk(source) { - let buffer = Buffer.alloc(0); + const cs = this.#chunkSize; + const buf = Buffer.allocUnsafe(cs); + let offset = 0; for await (const data of source) { - buffer = Buffer.concat([buffer, data]); - while (buffer.length >= this.#chunkSize) { - yield buffer.slice(0, this.#chunkSize); - buffer = buffer.slice(this.#chunkSize); + let srcPos = 0; + while (srcPos < data.length) { + const n = Math.min(cs - offset, data.length - srcPos); + data.copy(buf, offset, srcPos, srcPos + n); + offset += n; + srcPos += n; + if (offset === cs) { + yield Buffer.from(buf); + offset = 0; + } } } - if (buffer.length > 0) { - yield buffer; + if (offset > 0) { + yield Buffer.from(buf.subarray(0, offset)); } } } diff --git a/test/unit/cli/actions.test.js 
b/test/unit/cli/actions.test.js index 3d4fd3d..e7839c5 100644 --- a/test/unit/cli/actions.test.js +++ b/test/unit/cli/actions.test.js @@ -109,6 +109,40 @@ describe('runAction', () => { }); }); +describe('runAction — INTEGRITY_ERROR rate-limiting', () => { + let stderrSpy; + const originalExitCode = process.exitCode; + + beforeEach(() => { + process.exitCode = undefined; + stderrSpy = vi.spyOn(process.stderr, 'write').mockImplementation(() => true); + }); + + afterEach(() => { + process.exitCode = originalExitCode; + stderrSpy.mockRestore(); + }); + + it('delays ~1s on INTEGRITY_ERROR before writing output', async () => { + const err = Object.assign(new Error('bad key'), { code: 'INTEGRITY_ERROR' }); + const action = runAction(async () => { throw err; }, () => false); + const start = Date.now(); + await action(); + const elapsed = Date.now() - start; + expect(elapsed).toBeGreaterThanOrEqual(900); + expect(process.exitCode).toBe(1); + }); + + it('no delay for non-INTEGRITY_ERROR codes', async () => { + const err = Object.assign(new Error('gone'), { code: 'MISSING_KEY' }); + const action = runAction(async () => { throw err; }, () => false); + const start = Date.now(); + await action(); + const elapsed = Date.now() - start; + expect(elapsed).toBeLessThan(200); + }); +}); + describe('HINTS', () => { it('contains expected error codes', () => { expect(HINTS).toHaveProperty('MISSING_KEY'); diff --git a/test/unit/cli/passphrase-prompt.test.js b/test/unit/cli/passphrase-prompt.test.js new file mode 100644 index 0000000..3587713 --- /dev/null +++ b/test/unit/cli/passphrase-prompt.test.js @@ -0,0 +1,37 @@ +import { describe, it, expect, afterEach } from 'vitest'; +import { writeFile, unlink } from 'node:fs/promises'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; +import { readPassphraseFile } from '../../../bin/ui/passphrase-prompt.js'; + +describe('readPassphraseFile', () => { + const tmpPath = join(tmpdir(), `test-passphrase-${Date.now()}.txt`); + + 
afterEach(async () => { + try { await unlink(tmpPath); } catch { /* may not exist */ } + }); + + it('reads from file and trims trailing newline', async () => { + await writeFile(tmpPath, 'my-secret\n', 'utf8'); + const result = await readPassphraseFile(tmpPath); + expect(result).toBe('my-secret'); + }); + + it('preserves content without trailing newline', async () => { + await writeFile(tmpPath, 'no-newline', 'utf8'); + const result = await readPassphraseFile(tmpPath); + expect(result).toBe('no-newline'); + }); + + it('preserves internal newlines', async () => { + await writeFile(tmpPath, 'line1\nline2\n', 'utf8'); + const result = await readPassphraseFile(tmpPath); + expect(result).toBe('line1\nline2'); + }); + + it('strips trailing CRLF (Windows line ending)', async () => { + await writeFile(tmpPath, 'win-secret\r\n', 'utf8'); + const result = await readPassphraseFile(tmpPath); + expect(result).toBe('win-secret'); + }); +}); diff --git a/test/unit/domain/errors/CasError.test.js b/test/unit/domain/errors/CasError.test.js new file mode 100644 index 0000000..9f7fb99 --- /dev/null +++ b/test/unit/domain/errors/CasError.test.js @@ -0,0 +1,37 @@ +import { describe, it, expect } from 'vitest'; +import CasError from '../../../../src/domain/errors/CasError.js'; + +describe('CasError', () => { + it('sets name, code, and meta properties', () => { + const err = new CasError('boom', 'TEST_CODE', { foo: 'bar' }); + expect(err.name).toBe('CasError'); + expect(err.message).toBe('boom'); + expect(err.code).toBe('TEST_CODE'); + expect(err.meta).toEqual({ foo: 'bar' }); + }); + + it('defaults meta to empty object', () => { + const err = new CasError('msg', 'CODE'); + expect(err.meta).toEqual({}); + }); + + it('is an instance of Error', () => { + const err = new CasError('msg', 'CODE'); + expect(err).toBeInstanceOf(Error); + }); + + it('constructs correctly when Error.captureStackTrace is unavailable', () => { + const original = Error.captureStackTrace; + Error.captureStackTrace = 
undefined; + try { + const err = new CasError('no-stack', 'NO_STACK', { x: 1 }); + expect(err.name).toBe('CasError'); + expect(err.code).toBe('NO_STACK'); + expect(err.meta).toEqual({ x: 1 }); + expect(err.message).toBe('no-stack'); + expect(err).toBeInstanceOf(Error); + } finally { + Error.captureStackTrace = original; + } + }); +}); diff --git a/test/unit/domain/services/CasService.chunkSizeBound.test.js b/test/unit/domain/services/CasService.chunkSizeBound.test.js new file mode 100644 index 0000000..05d6c9b --- /dev/null +++ b/test/unit/domain/services/CasService.chunkSizeBound.test.js @@ -0,0 +1,44 @@ +import { describe, it, expect, vi } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; +import SilentObserver from '../../../../src/infrastructure/adapters/SilentObserver.js'; + +const testCrypto = await getTestCryptoAdapter(); + +const MiB = 1024 * 1024; + +function makeService(chunkSize, observability) { + return new CasService({ + persistence: { writeBlob: vi.fn(), writeTree: vi.fn(), readBlob: vi.fn() }, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize, + observability: observability || new SilentObserver(), + }); +} + +describe('CasService — chunk size upper bound', () => { + it('throws when chunkSize > 100 MiB', () => { + expect(() => makeService(100 * MiB + 1)).toThrow(/must not exceed/i); + }); + + it('accepts exactly 100 MiB', () => { + const service = makeService(100 * MiB); + expect(service.chunkSize).toBe(100 * MiB); + }); + + it('warns when chunkSize > 10 MiB', () => { + const observability = { + metric: vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; + makeService(11 * MiB, observability); + expect(observability.log).toHaveBeenCalledWith( + 'warn', + expect.stringContaining('exceeds 10 MiB'), + expect.objectContaining({ 
chunkSize: 11 * MiB }), + ); + }); +}); diff --git a/test/unit/domain/services/CasService.dedupWarning.test.js b/test/unit/domain/services/CasService.dedupWarning.test.js new file mode 100644 index 0000000..d2d0342 --- /dev/null +++ b/test/unit/domain/services/CasService.dedupWarning.test.js @@ -0,0 +1,65 @@ +import { describe, it, expect, vi } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; +import CdcChunker from '../../../../src/infrastructure/chunkers/CdcChunker.js'; +import FixedChunker from '../../../../src/infrastructure/chunkers/FixedChunker.js'; + +const testCrypto = await getTestCryptoAdapter(); + +function makeObserver() { + return { + metric: vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; +} + +function makeService(chunker, observability) { + return new CasService({ + persistence: { writeBlob: vi.fn().mockResolvedValue('oid'), writeTree: vi.fn(), readBlob: vi.fn() }, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize: 1024, + observability, + chunker, + }); +} + +describe('CasService — CDC + encryption dedup warning', () => { + it('emits warning when encryption + CDC', async () => { + const obs = makeObserver(); + const service = makeService(new CdcChunker({ minChunkSize: 1024, targetChunkSize: 2048, maxChunkSize: 4096 }), obs); + const key = Buffer.alloc(32, 0xab); + + async function* source() { yield Buffer.alloc(2048, 0xcc); } + await service.store({ source: source(), slug: 'enc-cdc', filename: 'f.bin', encryptionKey: key }); + + const warnCalls = obs.log.mock.calls.filter((c) => c[0] === 'warn' && c[1].includes('CDC deduplication')); + expect(warnCalls).toHaveLength(1); + expect(warnCalls[0][2]).toEqual({ strategy: 'cdc' }); + }); + + it('does NOT warn for encryption + fixed chunking', async () => { + const obs = 
makeObserver(); + const service = makeService(new FixedChunker({ chunkSize: 1024 }), obs); + const key = Buffer.alloc(32, 0xab); + + async function* source() { yield Buffer.alloc(2048, 0xcc); } + await service.store({ source: source(), slug: 'enc-fixed', filename: 'f.bin', encryptionKey: key }); + + const warnCalls = obs.log.mock.calls.filter((c) => c[0] === 'warn' && c[1].includes('CDC deduplication')); + expect(warnCalls).toHaveLength(0); + }); + + it('does NOT warn for CDC without encryption', async () => { + const obs = makeObserver(); + const service = makeService(new CdcChunker({ minChunkSize: 1024, targetChunkSize: 2048, maxChunkSize: 4096 }), obs); + + async function* source() { yield Buffer.alloc(2048, 0xcc); } + await service.store({ source: source(), slug: 'plain-cdc', filename: 'f.bin' }); + + const warnCalls = obs.log.mock.calls.filter((c) => c[0] === 'warn' && c[1].includes('CDC deduplication')); + expect(warnCalls).toHaveLength(0); + }); +}); diff --git a/test/unit/domain/services/CasService.kdfBruteForce.test.js b/test/unit/domain/services/CasService.kdfBruteForce.test.js new file mode 100644 index 0000000..063893b --- /dev/null +++ b/test/unit/domain/services/CasService.kdfBruteForce.test.js @@ -0,0 +1,106 @@ +import { describe, it, expect, vi } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; +import Manifest from '../../../../src/domain/value-objects/Manifest.js'; + +const testCrypto = await getTestCryptoAdapter(); + +const CHUNK_DATA = Buffer.alloc(128, 0xaa); +const CHUNK_DIGEST = await testCrypto.sha256(CHUNK_DATA); + +function setup() { + const observability = { + metric: vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; + const mockPersistence = { + writeBlob: vi.fn(), + writeTree: vi.fn(), + readBlob: 
vi.fn().mockResolvedValue(CHUNK_DATA), + readTree: vi.fn(), + }; + const service = new CasService({ + persistence: mockPersistence, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize: 1024, + observability, + }); + return { service, observability }; +} + +function encryptedManifest(slug) { + return new Manifest({ + slug, + filename: `${slug}.bin`, + size: 128, + chunks: [ + { index: 0, size: 128, digest: CHUNK_DIGEST, blob: 'blob-0' }, + ], + encryption: { + algorithm: 'aes-256-gcm', + nonce: 'deadbeef', + tag: 'cafebabe', + encrypted: true, + }, + }); +} + +describe('16.12: KDF brute-force — decryption_failed metric', () => { + it('emits metric on wrong key', async () => { + const { service, observability } = setup(); + const manifest = encryptedManifest('secret-file'); + const wrongKey = testCrypto.randomBytes(32); + + try { + await service.restore({ manifest, encryptionKey: wrongKey }); + expect.unreachable('should have thrown'); + } catch (err) { + expect(err.code).toBe('INTEGRITY_ERROR'); + } + + const dfMetrics = observability.metric.mock.calls.filter( + (c) => c[0] === 'error' && c[1].action === 'decryption_failed', + ); + expect(dfMetrics.length).toBe(1); + }); + + it('includes slug context for audit trail', async () => { + const { service, observability } = setup(); + const manifest = encryptedManifest('audit-slug'); + const wrongKey = testCrypto.randomBytes(32); + + try { + await service.restore({ manifest, encryptionKey: wrongKey }); + } catch { + // expected + } + + const dfMetrics = observability.metric.mock.calls.filter( + (c) => c[0] === 'error' && c[1].action === 'decryption_failed', + ); + expect(dfMetrics[0][1]).toHaveProperty('slug', 'audit-slug'); + }); +}); + +describe('16.12: KDF brute-force — library rate-limiting', () => { + it('library API does NOT rate-limit', async () => { + const { service } = setup(); + const manifest = encryptedManifest('rate-test'); + const wrongKey = testCrypto.randomBytes(32); + + const start = Date.now(); 
+ let caught; + try { + await service.restore({ manifest, encryptionKey: wrongKey }); + expect.unreachable('should have thrown INTEGRITY_ERROR'); + } catch (err) { + caught = err; + } + const elapsed = Date.now() - start; + expect(caught?.code).toBe('INTEGRITY_ERROR'); + expect(elapsed).toBeLessThan(500); + }); +}); diff --git a/test/unit/domain/services/CasService.lifecycle.test.js b/test/unit/domain/services/CasService.lifecycle.test.js new file mode 100644 index 0000000..acd7630 --- /dev/null +++ b/test/unit/domain/services/CasService.lifecycle.test.js @@ -0,0 +1,119 @@ +import { describe, it, expect, vi } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; +import { digestOf } from '../../../helpers/crypto.js'; + +const testCrypto = await getTestCryptoAdapter(); + +function makeChunk(index, seed, blobOid) { + return { index, size: 1024, digest: digestOf(seed), blob: blobOid }; +} + +function setup() { + const mockPersistence = { + writeBlob: vi.fn(), + writeTree: vi.fn(), + readBlob: vi.fn(), + readTree: vi.fn(), + }; + const observability = { + metric: vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; + const service = new CasService({ + persistence: mockPersistence, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize: 1024, + observability, + }); + return { mockPersistence, observability, service }; +} + +function mockManifest(mockPersistence, manifest) { + const codec = new JsonCodec(); + mockPersistence.readTree.mockResolvedValue([ + { mode: '100644', type: 'blob', oid: 'mf-oid', name: 'manifest.json' }, + ]); + mockPersistence.readBlob.mockResolvedValue(codec.encode(manifest)); +} + +describe('16.7: inspectAsset (canonical name)', () => { + it('returns { slug, chunksOrphaned }', async () => { + const { service, mockPersistence } = 
setup(); + const manifest = { + slug: 'asset-1', filename: 'f.bin', size: 2048, + chunks: [makeChunk(0, 'c0', 'b0'), makeChunk(1, 'c1', 'b1')], + }; + mockManifest(mockPersistence, manifest); + const result = await service.inspectAsset({ treeOid: 'tree-1' }); + expect(result).toEqual({ slug: 'asset-1', chunksOrphaned: 2 }); + }); +}); + +describe('16.7: deleteAsset (deprecated alias)', () => { + it('delegates to inspectAsset and returns same result', async () => { + const { service, mockPersistence } = setup(); + const manifest = { + slug: 'asset-2', filename: 'g.bin', size: 1024, + chunks: [makeChunk(0, 'd0', 'b0')], + }; + mockManifest(mockPersistence, manifest); + const result = await service.deleteAsset({ treeOid: 'tree-2' }); + expect(result).toEqual({ slug: 'asset-2', chunksOrphaned: 1 }); + }); + + it('emits deprecation warning via observability', async () => { + const { service, mockPersistence, observability } = setup(); + const manifest = { + slug: 'x', filename: 'x.bin', size: 0, chunks: [], + }; + mockManifest(mockPersistence, manifest); + await service.deleteAsset({ treeOid: 'tree-x' }); + expect(observability.log).toHaveBeenCalledWith( + 'warn', 'deleteAsset() is deprecated — use inspectAsset()', + ); + }); +}); + +describe('16.7: collectReferencedChunks (canonical name)', () => { + it('returns { referenced, total }', async () => { + const { service, mockPersistence } = setup(); + const manifest = { + slug: 'asset-3', filename: 'h.bin', size: 2048, + chunks: [makeChunk(0, 'e0', 'b0'), makeChunk(1, 'e1', 'b1')], + }; + mockManifest(mockPersistence, manifest); + const result = await service.collectReferencedChunks({ treeOids: ['tree-3'] }); + expect(result.referenced.size).toBe(2); + expect(result.total).toBe(2); + }); +}); + +describe('16.7: findOrphanedChunks (deprecated alias)', () => { + it('delegates to collectReferencedChunks', async () => { + const { service, mockPersistence } = setup(); + const manifest = { + slug: 'asset-4', filename: 'i.bin', 
size: 1024, + chunks: [makeChunk(0, 'f0', 'b0')], + }; + mockManifest(mockPersistence, manifest); + const result = await service.findOrphanedChunks({ treeOids: ['tree-4'] }); + expect(result.referenced.size).toBe(1); + expect(result.total).toBe(1); + }); + + it('emits deprecation warning via observability', async () => { + const { service, mockPersistence, observability } = setup(); + const manifest = { + slug: 'y', filename: 'y.bin', size: 0, chunks: [], + }; + mockManifest(mockPersistence, manifest); + await service.findOrphanedChunks({ treeOids: ['tree-y'] }); + expect(observability.log).toHaveBeenCalledWith( + 'warn', 'findOrphanedChunks() is deprecated — use collectReferencedChunks()', + ); + }); +}); diff --git a/test/unit/domain/services/CasService.orphanedBlobs.test.js b/test/unit/domain/services/CasService.orphanedBlobs.test.js new file mode 100644 index 0000000..5903ccd --- /dev/null +++ b/test/unit/domain/services/CasService.orphanedBlobs.test.js @@ -0,0 +1,96 @@ +import { describe, it, expect, vi, beforeEach } from 'vitest'; +import CasService from '../../../../src/domain/services/CasService.js'; +import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js'; +import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js'; + +const testCrypto = await getTestCryptoAdapter(); + +function failingSource(chunksBeforeError, chunkSize = 1024) { + let yielded = 0; + return { + [Symbol.asyncIterator]() { + return { + async next() { + if (yielded >= chunksBeforeError) { + throw new Error('simulated stream failure'); + } + yielded++; + return { value: Buffer.alloc(chunkSize, 0xaa), done: false }; + }, + }; + }, + }; +} + +function buildService() { + let blobCounter = 0; + const mockPersistence = { + writeBlob: vi.fn().mockImplementation(() => Promise.resolve(`blob-${blobCounter++}`)), + writeTree: vi.fn().mockResolvedValue('mock-tree-oid'), + readBlob: vi.fn().mockResolvedValue(Buffer.from('data')), + }; + const observability = { + metric: 
vi.fn(), + log: vi.fn(), + span: vi.fn().mockReturnValue({ end: vi.fn() }), + }; + const service = new CasService({ + persistence: mockPersistence, + crypto: testCrypto, + codec: new JsonCodec(), + chunkSize: 1024, + observability, + }); + return { service, mockPersistence, observability }; +} + +describe('CasService — orphaned blob tracking in STREAM_ERROR', () => { + let service; + let observability; + + beforeEach(() => { + ({ service, observability } = buildService()); + }); + + it('STREAM_ERROR meta includes orphanedBlobs array', async () => { + try { + await service.store({ source: failingSource(3), slug: 'fail', filename: 'f.bin' }); + expect.unreachable('should have thrown STREAM_ERROR'); + } catch (err) { + expect(err.code).toBe('STREAM_ERROR'); + expect(Array.isArray(err.meta.orphanedBlobs)).toBe(true); + } + }); + + it('orphanedBlobs contain OIDs from successful writes', async () => { + try { + await service.store({ source: failingSource(3), slug: 'fail', filename: 'f.bin' }); + expect.unreachable('should have thrown STREAM_ERROR'); + } catch (err) { + expect(err.meta.orphanedBlobs.length).toBe(3); + expect(err.meta.orphanedBlobs).toContain('blob-0'); + expect(err.meta.orphanedBlobs).toContain('blob-1'); + expect(err.meta.orphanedBlobs).toContain('blob-2'); + } + }); + + it('empty array when stream fails before any writes', async () => { + try { + await service.store({ source: failingSource(0), slug: 'fail', filename: 'f.bin' }); + expect.unreachable('should have thrown STREAM_ERROR'); + } catch (err) { + expect(err.meta.orphanedBlobs).toEqual([]); + } + }); + + it('emits metric with orphaned blob count', async () => { + try { + await service.store({ source: failingSource(2), slug: 'fail', filename: 'f.bin' }); + } catch { + // expected + } + const errorMetrics = observability.metric.mock.calls.filter((c) => c[0] === 'error'); + expect(errorMetrics.length).toBeGreaterThan(0); + expect(errorMetrics[0][1]).toHaveProperty('orphanedBlobs', 2); + }); +}); 
diff --git a/test/unit/domain/services/CasService.restoreGuard.test.js b/test/unit/domain/services/CasService.restoreGuard.test.js
new file mode 100644
index 0000000..f4402db
--- /dev/null
+++ b/test/unit/domain/services/CasService.restoreGuard.test.js
@@ -0,0 +1,180 @@
+import { describe, it, expect, vi } from 'vitest';
+import CasService from '../../../../src/domain/services/CasService.js';
+import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js';
+import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js';
+import CasError from '../../../../src/domain/errors/CasError.js';
+import SilentObserver from '../../../../src/infrastructure/adapters/SilentObserver.js';
+import Manifest from '../../../../src/domain/value-objects/Manifest.js';
+
+const testCrypto = await getTestCryptoAdapter();
+
+function setup({ maxRestoreBufferSize } = {}) {
+  const mockPersistence = {
+    writeBlob: vi.fn().mockResolvedValue('mock-blob-oid'),
+    writeTree: vi.fn().mockResolvedValue('mock-tree-oid'),
+    readBlob: vi.fn().mockResolvedValue(Buffer.alloc(1024, 0xaa)),
+    readTree: vi.fn(),
+  };
+  const opts = {
+    persistence: mockPersistence,
+    crypto: testCrypto,
+    codec: new JsonCodec(),
+    chunkSize: 1024,
+    observability: new SilentObserver(),
+  };
+  if (maxRestoreBufferSize !== undefined) {
+    opts.maxRestoreBufferSize = maxRestoreBufferSize;
+  }
+  const service = new CasService(opts);
+  return { mockPersistence, service };
+}
+
+function makeEncryptedManifest(chunkSizes) {
+  const chunks = chunkSizes.map((size, i) => ({
+    index: i,
+    size,
+    digest: 'a'.repeat(64),
+    blob: `blob-${i}`,
+  }));
+  return new Manifest({
+    slug: 'test',
+    filename: 'test.bin',
+    size: chunkSizes.reduce((a, b) => a + b, 0),
+    chunks,
+    encryption: {
+      algorithm: 'aes-256-gcm',
+      nonce: Buffer.alloc(12).toString('base64'),
+      tag: Buffer.alloc(16).toString('base64'),
+      encrypted: true,
+    },
+  });
+}
+
+describe('CasService — RESTORE_TOO_LARGE throws on exceed', () => {
+  it('throws RESTORE_TOO_LARGE when chunk sizes exceed limit', async () => {
+    const { service } = setup({ maxRestoreBufferSize: 2000 });
+    const manifest = makeEncryptedManifest([1024, 1024, 1024]);
+
+    await expect(
+      service.restoreStream({ manifest, encryptionKey: Buffer.alloc(32, 0xab) }).next(),
+    ).rejects.toThrow(CasError);
+
+    try {
+      await service.restoreStream({ manifest, encryptionKey: Buffer.alloc(32, 0xab) }).next();
+      expect.unreachable('should have thrown RESTORE_TOO_LARGE');
+    } catch (err) {
+      expect(err.code).toBe('RESTORE_TOO_LARGE');
+      expect(err.meta.size).toBe(3072);
+      expect(err.meta.limit).toBe(2000);
+    }
+  });
+});
+
+describe('CasService — RESTORE_TOO_LARGE succeeds within limit', () => {
+  it('succeeds when within limit', async () => {
+    const { service, mockPersistence } = setup({ maxRestoreBufferSize: 4096 });
+    const key = Buffer.alloc(32, 0xab);
+
+    async function* source() { yield Buffer.alloc(512, 0xaa); }
+    const manifest = await service.store({ source: source(), slug: 'ok', filename: 'ok.bin', encryptionKey: key });
+
+    const storedBlobArgs = mockPersistence.writeBlob.mock.calls.map((c) => c[0]);
+    let blobIdx = 0;
+    mockPersistence.readBlob.mockImplementation(() => Promise.resolve(storedBlobArgs[blobIdx++] || Buffer.alloc(0)));
+
+    const chunks = [];
+    for await (const chunk of service.restoreStream({ manifest, encryptionKey: key })) {
+      chunks.push(chunk);
+    }
+    expect(chunks.length).toBeGreaterThan(0);
+  });
+});
+
+describe('CasService — RESTORE_TOO_LARGE defaults and meta', () => {
+  it('default maxRestoreBufferSize is 512 MiB', () => {
+    const { service } = setup();
+    expect(service.maxRestoreBufferSize).toBe(512 * 1024 * 1024);
+  });
+
+  it('error meta includes size and limit', async () => {
+    const { service } = setup({ maxRestoreBufferSize: 2048 });
+    const manifest = makeEncryptedManifest([1100, 1100]);
+
+    try {
+      await service.restoreStream({ manifest, encryptionKey: Buffer.alloc(32, 0xab) }).next();
+      expect.unreachable('should have thrown RESTORE_TOO_LARGE');
+    } catch (err) {
+      expect(err.code).toBe('RESTORE_TOO_LARGE');
+      expect(err.meta).toHaveProperty('size', 2200);
+      expect(err.meta).toHaveProperty('limit', 2048);
+    }
+  });
+});
+
+describe('CasService — RESTORE_TOO_LARGE after decompression', () => {
+  it('throws when decompressed size exceeds limit', async () => {
+    const { service, mockPersistence } = setup({ maxRestoreBufferSize: 4096 });
+    const key = Buffer.alloc(32, 0xab);
+
+    // Store a small encrypted+compressed manifest that fits pre-decompression
+    async function* source() { yield Buffer.alloc(2048, 0xaa); }
+    const manifest = await service.store({
+      source: source(), slug: 'bomb', filename: 'bomb.bin',
+      encryptionKey: key, compression: { algorithm: 'gzip' },
+    });
+
+    // Wire readBlob to return the stored blobs
+    const storedBlobs = mockPersistence.writeBlob.mock.calls.map((c) => c[0]);
+    let idx = 0;
+    mockPersistence.readBlob.mockImplementation(() => Promise.resolve(storedBlobs[idx++] || Buffer.alloc(0)));
+
+    // Mock _decompress to return a buffer larger than the limit
+    service._decompress = vi.fn().mockResolvedValue(Buffer.alloc(8192, 0xbb));
+
+    await expect(
+      service.restoreStream({ manifest, encryptionKey: key }).next(),
+    ).rejects.toMatchObject({ code: 'RESTORE_TOO_LARGE' });
+  });
+});
+
+describe('CasService — maxRestoreBufferSize validation', () => {
+  it('throws for non-integer', () => {
+    expect(() => setup({ maxRestoreBufferSize: 1.5 })).toThrow();
+  });
+
+  it('throws for value below 1024', () => {
+    expect(() => setup({ maxRestoreBufferSize: 512 })).toThrow();
+  });
+
+  it('throws for NaN', () => {
+    expect(() => setup({ maxRestoreBufferSize: NaN })).toThrow();
+  });
+
+  it('accepts 1024', () => {
+    const { service } = setup({ maxRestoreBufferSize: 1024 });
+    expect(service.maxRestoreBufferSize).toBe(1024);
+  });
+});
+
+describe('CasService — RESTORE_TOO_LARGE does not affect streaming', () => {
+  it('does not apply to unencrypted/uncompressed restoreStream', async () => {
+    const { service, mockPersistence } = setup({ maxRestoreBufferSize: 1024 });
+    const manifest = new Manifest({
+      slug: 'plain',
+      filename: 'plain.bin',
+      size: 2048,
+      chunks: [
+        { index: 0, size: 1024, digest: 'a'.repeat(64), blob: 'blob-0' },
+        { index: 1, size: 1024, digest: 'a'.repeat(64), blob: 'blob-1' },
+      ],
+    });
+
+    mockPersistence.readBlob.mockResolvedValue(Buffer.alloc(1024, 0xcc));
+    service._sha256 = vi.fn().mockResolvedValue('a'.repeat(64));
+
+    const chunks = [];
+    for await (const chunk of service.restoreStream({ manifest })) {
+      chunks.push(chunk);
+    }
+    expect(chunks).toHaveLength(2);
+  });
+});
diff --git a/test/unit/domain/services/VaultService.encryptionCount.test.js b/test/unit/domain/services/VaultService.encryptionCount.test.js
new file mode 100644
index 0000000..c349149
--- /dev/null
+++ b/test/unit/domain/services/VaultService.encryptionCount.test.js
@@ -0,0 +1,95 @@
+import { describe, it, expect, vi } from 'vitest';
+import VaultService from '../../../../src/domain/services/VaultService.js';
+import { getTestCryptoAdapter } from '../../../helpers/crypto-adapter.js';
+
+const testCrypto = await getTestCryptoAdapter();
+
+function encryptedMetadata(overrides = {}) {
+  return {
+    version: 1,
+    encryption: {
+      cipher: 'aes-256-gcm',
+      kdf: { algorithm: 'pbkdf2', salt: 'c2FsdA==', iterations: 100000, keyLength: 32 },
+    },
+    ...overrides,
+  };
+}
+
+function setup(metadata = encryptedMetadata()) {
+  const observability = {
+    metric: vi.fn(),
+    log: vi.fn(),
+    span: vi.fn().mockReturnValue({ end: vi.fn() }),
+  };
+  const persistence = {
+    writeBlob: vi.fn().mockResolvedValue('blob-oid'),
+    writeTree: vi.fn().mockResolvedValue('tree-oid'),
+    readBlob: vi.fn().mockResolvedValue(Buffer.from(JSON.stringify(metadata))),
+    readTree: vi.fn().mockResolvedValue([
+      { mode: '100644', type: 'blob', oid: 'meta-oid', name: '.vault.json' },
+    ]),
+  };
+  const ref = {
+    resolveRef: vi.fn().mockResolvedValue('commit-oid'),
+    resolveTree: vi.fn().mockResolvedValue('root-tree-oid'),
+    createCommit: vi.fn().mockResolvedValue('new-commit-oid'),
+    updateRef: vi.fn().mockResolvedValue(undefined),
+  };
+  const vault = new VaultService({
+    persistence, ref, crypto: testCrypto, observability,
+  });
+  return { vault, persistence, ref, observability };
+}
+
+describe('16.13: Nonce usage tracking — encryptionCount', () => {
+  it('vault metadata includes encryptionCount after add', async () => {
+    const { vault, persistence } = setup();
+    await vault.addToVault({ slug: 'asset-1', treeOid: 'tree-1' });
+
+    const writtenMetadata = JSON.parse(persistence.writeBlob.mock.calls[0][0]);
+    expect(writtenMetadata).toHaveProperty('encryptionCount', 1);
+  });
+
+  it('encryptionCount increments per encrypted store', async () => {
+    const meta = encryptedMetadata({ encryptionCount: 5 });
+    const { vault, persistence } = setup(meta);
+    await vault.addToVault({ slug: 'asset-2', treeOid: 'tree-2' });
+
+    const writtenMetadata = JSON.parse(persistence.writeBlob.mock.calls[0][0]);
+    expect(writtenMetadata.encryptionCount).toBe(6);
+  });
+});
+
+describe('16.13: Nonce usage tracking — threshold warning', () => {
+  it('warns when encryptionCount exceeds threshold', async () => {
+    const threshold = VaultService.ENCRYPTION_COUNT_WARN;
+    const meta = encryptedMetadata({ encryptionCount: threshold - 1 });
+    const { vault, observability } = setup(meta);
+    await vault.addToVault({ slug: 'asset-3', treeOid: 'tree-3' });
+
+    const warnCalls = observability.log.mock.calls.filter(
+      (c) => c[0] === 'warn' && c[1].includes('encryption count'),
+    );
+    expect(warnCalls.length).toBe(1);
+  });
+
+  it('no warning below threshold', async () => {
+    const meta = encryptedMetadata({ encryptionCount: 0 });
+    const { vault, observability } = setup(meta);
+    await vault.addToVault({ slug: 'asset-4', treeOid: 'tree-4' });
+
+    const warnCalls = observability.log.mock.calls.filter(
+      (c) => c[0] === 'warn' && c[1].includes('encryption count'),
+    );
+    expect(warnCalls.length).toBe(0);
+  });
+
+  it('no counter increment for unencrypted vault', async () => {
+    const meta = { version: 1 };
+    const { vault, persistence } = setup(meta);
+    await vault.addToVault({ slug: 'plain-1', treeOid: 'tree-p' });
+
+    const writtenMetadata = JSON.parse(persistence.writeBlob.mock.calls[0][0]);
+    expect(writtenMetadata).not.toHaveProperty('encryptionCount');
+  });
+});
diff --git a/test/unit/domain/services/rotateVaultPassphrase.test.js b/test/unit/domain/services/rotateVaultPassphrase.test.js
index 4539557..a16c9cc 100644
--- a/test/unit/domain/services/rotateVaultPassphrase.test.js
+++ b/test/unit/domain/services/rotateVaultPassphrase.test.js
@@ -34,7 +34,7 @@ async function createDeps(repoDir) {
   const service = new CasService({
     persistence, codec: new JsonCodec(), crypto, observability: new SilentObserver(), chunkSize: 1024,
   });
-  const vault = new VaultService({ persistence, ref, crypto });
+  const vault = new VaultService({ persistence, ref, crypto, observability: new SilentObserver() });
   return { service, vault };
 }
diff --git a/test/unit/infrastructure/adapters/CryptoAdapter.conformance.test.js b/test/unit/infrastructure/adapters/CryptoAdapter.conformance.test.js
new file mode 100644
index 0000000..3361a41
--- /dev/null
+++ b/test/unit/infrastructure/adapters/CryptoAdapter.conformance.test.js
@@ -0,0 +1,55 @@
+import { describe, it, expect } from 'vitest';
+import NodeCryptoAdapter from '../../../../src/infrastructure/adapters/NodeCryptoAdapter.js';
+import WebCryptoAdapter from '../../../../src/infrastructure/adapters/WebCryptoAdapter.js';
+
+/**
+ * Conformance test suite that asserts identical behavioral contracts across
+ * all crypto adapters that can run in the current environment.
+ */
+
+const adapters = [
+  ['NodeCryptoAdapter', new NodeCryptoAdapter()],
+  ['WebCryptoAdapter', new WebCryptoAdapter()],
+];
+
+// BunCryptoAdapter is only available in Bun runtime — skip in Node/Deno
+if (typeof globalThis.Bun !== 'undefined') {
+  const { default: BunCryptoAdapter } = await import(
+    '../../../../src/infrastructure/adapters/BunCryptoAdapter.js'
+  );
+  adapters.push(['BunCryptoAdapter', new BunCryptoAdapter()]);
+}
+
+describe.each(adapters)('%s conformance', (_name, adapter) => {
+  const key = Buffer.alloc(32, 0xab);
+
+  it('encryptBuffer returns a Promise (thenable)', async () => {
+    const result = adapter.encryptBuffer(Buffer.from('hello'), key);
+    expect(typeof result.then).toBe('function');
+    const { buf, meta } = await result;
+    expect(buf).toBeInstanceOf(Buffer);
+    expect(meta.encrypted).toBe(true);
+  });
+
+  it('decryptBuffer rejects INVALID_KEY_TYPE for string key', async () => {
+    const { buf, meta } = await adapter.encryptBuffer(Buffer.from('test'), key);
+    await expect(
+      Promise.resolve().then(() => adapter.decryptBuffer(buf, 'not-a-buffer', meta)),
+    ).rejects.toMatchObject({ code: 'INVALID_KEY_TYPE' });
+  });
+
+  it('decryptBuffer rejects INVALID_KEY_LENGTH for 16-byte key', async () => {
+    const shortKey = Buffer.alloc(16, 0xcc);
+    const { buf, meta } = await adapter.encryptBuffer(Buffer.from('test'), key);
+    await expect(
+      Promise.resolve().then(() => adapter.decryptBuffer(buf, shortKey, meta)),
+    ).rejects.toMatchObject({ code: 'INVALID_KEY_LENGTH' });
+  });
+
+  it('createEncryptionStream.finalize() throws STREAM_NOT_CONSUMED before consumption', () => {
+    const { finalize } = adapter.createEncryptionStream(key);
+    expect(() => finalize()).toThrow(
+      expect.objectContaining({ code: 'STREAM_NOT_CONSUMED' }),
+    );
+  });
+});
diff --git a/test/unit/infrastructure/adapters/WebCryptoAdapter.bufferGuard.test.js b/test/unit/infrastructure/adapters/WebCryptoAdapter.bufferGuard.test.js
new file mode 100644
index 0000000..2d9bf62
--- /dev/null
+++ b/test/unit/infrastructure/adapters/WebCryptoAdapter.bufferGuard.test.js
@@ -0,0 +1,85 @@
+import { describe, it, expect } from 'vitest';
+import WebCryptoAdapter from '../../../../src/infrastructure/adapters/WebCryptoAdapter.js';
+import NodeCryptoAdapter from '../../../../src/infrastructure/adapters/NodeCryptoAdapter.js';
+import CasError from '../../../../src/domain/errors/CasError.js';
+
+const key = Buffer.alloc(32, 0xab);
+
+async function* makeSource(totalBytes, chunkSize = 1024) {
+  let remaining = totalBytes;
+  while (remaining > 0) {
+    const size = Math.min(chunkSize, remaining);
+    yield Buffer.alloc(size, 0xcc);
+    remaining -= size;
+  }
+}
+
+async function consumeStream(encrypt, source) {
+  const chunks = [];
+  for await (const chunk of encrypt(source)) {
+    chunks.push(chunk);
+  }
+  return chunks;
+}
+
+describe('WebCryptoAdapter — ENCRYPTION_BUFFER_EXCEEDED', () => {
+  it('throws ENCRYPTION_BUFFER_EXCEEDED when data exceeds limit', async () => {
+    const adapter = new WebCryptoAdapter({ maxEncryptionBufferSize: 2000 });
+    const { encrypt } = adapter.createEncryptionStream(key);
+
+    await expect(
+      consumeStream(encrypt, makeSource(3000)),
+    ).rejects.toThrow(CasError);
+
+    try {
+      const adapter2 = new WebCryptoAdapter({ maxEncryptionBufferSize: 2000 });
+      const { encrypt: encrypt2 } = adapter2.createEncryptionStream(key);
+      await consumeStream(encrypt2, makeSource(3000));
+      expect.unreachable('should have thrown ENCRYPTION_BUFFER_EXCEEDED');
+    } catch (err) {
+      expect(err.code).toBe('ENCRYPTION_BUFFER_EXCEEDED');
+      expect(err.meta.limit).toBe(2000);
+    }
+  });
+
+  it('succeeds within limit', async () => {
+    const adapter = new WebCryptoAdapter({ maxEncryptionBufferSize: 4096 });
+    const { encrypt, finalize } = adapter.createEncryptionStream(key);
+
+    const chunks = await consumeStream(encrypt, makeSource(1024));
+    expect(chunks.length).toBeGreaterThan(0);
+
+    const meta = finalize();
+    expect(meta.encrypted).toBe(true);
+  });
+});
+
+describe('WebCryptoAdapter — maxEncryptionBufferSize validation', () => {
+  it('throws for NaN', () => {
+    expect(() => new WebCryptoAdapter({ maxEncryptionBufferSize: NaN })).toThrow(RangeError);
+  });
+
+  it('throws for 0', () => {
+    expect(() => new WebCryptoAdapter({ maxEncryptionBufferSize: 0 })).toThrow(RangeError);
+  });
+
+  it('throws for negative', () => {
+    expect(() => new WebCryptoAdapter({ maxEncryptionBufferSize: -1 })).toThrow(RangeError);
+  });
+
+  it('throws for Infinity', () => {
+    expect(() => new WebCryptoAdapter({ maxEncryptionBufferSize: Infinity })).toThrow(RangeError);
+  });
+});
+
+describe('NodeCryptoAdapter — no buffer guard for streaming', () => {
+  it('does NOT throw for same-size stream (true streaming)', async () => {
+    const adapter = new NodeCryptoAdapter();
+    const { encrypt, finalize } = adapter.createEncryptionStream(key);
+
+    const chunks = await consumeStream(encrypt, makeSource(3000));
+    expect(chunks.length).toBeGreaterThan(0);
+
+    const meta = finalize();
+    expect(meta.encrypted).toBe(true);
+  });
+});
diff --git a/test/unit/infrastructure/chunkers/ChunkerBounds.test.js b/test/unit/infrastructure/chunkers/ChunkerBounds.test.js
new file mode 100644
index 0000000..2a5a2d6
--- /dev/null
+++ b/test/unit/infrastructure/chunkers/ChunkerBounds.test.js
@@ -0,0 +1,58 @@
+import { describe, it, expect } from 'vitest';
+import FixedChunker from '../../../../src/infrastructure/chunkers/FixedChunker.js';
+import CdcChunker from '../../../../src/infrastructure/chunkers/CdcChunker.js';
+
+const MiB = 1024 * 1024;
+
+describe('FixedChunker — chunk size upper bound', () => {
+  it('throws when chunkSize > 100 MiB', () => {
+    expect(() => new FixedChunker({ chunkSize: 100 * MiB + 1 })).toThrow(RangeError);
+  });
+
+  it('accepts exactly 100 MiB', () => {
+    const chunker = new FixedChunker({ chunkSize: 100 * MiB });
+    expect(chunker.params.chunkSize).toBe(100 * MiB);
+  });
+});
+
+describe('FixedChunker — chunk size lower bound', () => {
+  it('throws when chunkSize is 0', () => {
+    expect(() => new FixedChunker({ chunkSize: 0 })).toThrow(RangeError);
+  });
+
+  it('throws when chunkSize is negative', () => {
+    expect(() => new FixedChunker({ chunkSize: -1 })).toThrow(RangeError);
+  });
+
+  it('throws when chunkSize is NaN', () => {
+    expect(() => new FixedChunker({ chunkSize: NaN })).toThrow(RangeError);
+  });
+
+  it('throws when chunkSize is not an integer', () => {
+    expect(() => new FixedChunker({ chunkSize: 1.5 })).toThrow(RangeError);
+  });
+
+  it('accepts chunkSize of 1', () => {
+    const chunker = new FixedChunker({ chunkSize: 1 });
+    expect(chunker.params.chunkSize).toBe(1);
+  });
+});
+
+describe('CdcChunker — chunk size upper bound', () => {
+  it('throws when maxChunkSize > 100 MiB', () => {
+    expect(() => new CdcChunker({
+      maxChunkSize: 100 * MiB + 1,
+      minChunkSize: 1024,
+      targetChunkSize: 50 * MiB,
+    })).toThrow(RangeError);
+  });
+
+  it('accepts exactly 100 MiB as maxChunkSize', () => {
+    const chunker = new CdcChunker({
+      maxChunkSize: 100 * MiB,
+      minChunkSize: 1024,
+      targetChunkSize: 50 * MiB,
+    });
+    expect(chunker.params.max).toBe(100 * MiB);
+  });
+});
diff --git a/test/unit/infrastructure/chunkers/FixedChunker.test.js b/test/unit/infrastructure/chunkers/FixedChunker.test.js
new file mode 100644
index 0000000..78233f7
--- /dev/null
+++ b/test/unit/infrastructure/chunkers/FixedChunker.test.js
@@ -0,0 +1,63 @@
+import { describe, it, expect } from 'vitest';
+import FixedChunker from '../../../../src/infrastructure/chunkers/FixedChunker.js';
+
+async function* toAsyncIter(buffers) {
+  for (const b of buffers) { yield b; }
+}
+
+async function collect(iter) {
+  const result = [];
+  for await (const chunk of iter) { result.push(chunk); }
+  return result;
+}
+
+describe('16.4: FixedChunker pre-allocated buffer — regression', () => {
+  it('produces byte-exact output for a single large input', async () => {
+    const chunkSize = 64;
+    const chunker = new FixedChunker({ chunkSize });
+    const input = Buffer.alloc(200);
+    for (let i = 0; i < input.length; i++) { input[i] = i & 0xff; }
+
+    const chunks = await collect(chunker.chunk(toAsyncIter([input])));
+    expect(chunks.map((c) => c.length)).toEqual([64, 64, 64, 8]);
+    expect(Buffer.concat(chunks).equals(input)).toBe(true);
+  });
+
+  it('exact multiple of chunkSize produces no partial', async () => {
+    const chunkSize = 128;
+    const chunker = new FixedChunker({ chunkSize });
+    const input = Buffer.alloc(chunkSize * 3, 0xbb);
+    const chunks = await collect(chunker.chunk(toAsyncIter([input])));
+    expect(chunks.length).toBe(3);
+    expect(chunks.every((c) => c.length === chunkSize)).toBe(true);
+  });
+});
+
+describe('16.4: FixedChunker pre-allocated buffer — edge cases', () => {
+  it('many small input buffers reassemble correctly', async () => {
+    const chunkSize = 256;
+    const chunker = new FixedChunker({ chunkSize });
+    const total = 1024;
+    const smallBufs = Array.from({ length: total }, (_, i) => Buffer.from([i & 0xff]));
+
+    const chunks = await collect(chunker.chunk(toAsyncIter(smallBufs)));
+    expect(chunks.length).toBe(4);
+    const reassembled = Buffer.concat(chunks);
+    for (let i = 0; i < total; i++) {
+      expect(reassembled[i]).toBe(i & 0xff);
+    }
+  });
+
+  it('empty source produces no chunks', async () => {
+    const chunker = new FixedChunker({ chunkSize: 64 });
+    const chunks = await collect(chunker.chunk(toAsyncIter([])));
+    expect(chunks.length).toBe(0);
+  });
+
+  it('single byte produces one partial chunk', async () => {
+    const chunker = new FixedChunker({ chunkSize: 64 });
+    const chunks = await collect(chunker.chunk(toAsyncIter([Buffer.from([42])])));
+    expect(chunks.length).toBe(1);
+    expect(chunks[0]).toEqual(Buffer.from([42]));
+  });
+});
diff --git a/test/unit/vault/VaultService.test.js b/test/unit/vault/VaultService.test.js
index a85e219..93d3697 100644
--- a/test/unit/vault/VaultService.test.js
+++ b/test/unit/vault/VaultService.test.js
@@ -34,11 +34,16 @@ function mockCrypto() {
   };
 }
 
+function mockObservability() {
+  return { metric: vi.fn(), log: vi.fn(), span: vi.fn().mockReturnValue({ end: vi.fn() }) };
+}
+
 function createVault(overrides = {}) {
   return new VaultService({
     persistence: overrides.persistence || mockPersistence(),
     ref: overrides.ref || mockRef(),
     crypto: overrides.crypto || mockCrypto(),
+    observability: overrides.observability || mockObservability(),
   });
 }