feat(cache): portable mache db — push/pull/verify/inspect, all 5 phases (mache-aeb262)#412
Merged
External smell rules from $MACHE_SMELL_RULES_DIR are appended to the registry and their ScopeColumn value is interpolated unescaped into runSmellRule's SQL (`"AND " + rule.ScopeColumn + " = ?"`). The trust boundary is operator-controlled, so this isn't a vulnerability — but the cost of a load-time whitelist is one regex-shaped check and the value is "a typo or malicious external rule can't smuggle a `;` terminator, `--` line comment, or unexpected characters into the SQL composition path." Whitelist mirrors the character set the built-in ScopeColumn values actually use (identifiers, `.`, `,`, `(`, `)`, `'`, space) — proven by TestValidateScopeColumn_AcceptsBuiltinShapes which iterates the registry. Rejection coverage in TestValidateScopeColumn_RejectsInjectionShapes spans `;`, `--`, `/*`, `*`, `=`, backtick, double-quote, newline. End-to-end TestLoadExternalSmellRules_RejectsInjectableScopeColumn proves a malicious JSON rule never reaches runSmellRule.
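A minimal sketch of the load-time whitelist described above, assuming a single regular expression over the listed character set. The function name `validateScopeColumn`, the pattern variable, and the choice to reject an empty value are assumptions for illustration; the real loader may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// scopeColumnPattern is an assumed whitelist: identifier characters plus
// '.', ',', '(', ')', single quote, and space. Anything else is rejected,
// which covers ';', '-', '/', '*', '=', backtick, double quote, and newline.
var scopeColumnPattern = regexp.MustCompile(`^[A-Za-z0-9_.,()' ]+$`)

// validateScopeColumn is a hypothetical load-time check for an external
// rule's ScopeColumn before it is interpolated into SQL.
// Assumption: an empty value is treated as invalid here.
func validateScopeColumn(col string) error {
	if !scopeColumnPattern.MatchString(col) {
		return fmt.Errorf("scope column %q contains characters outside the whitelist", col)
	}
	return nil
}

func main() {
	for _, col := range []string{"n.kind", "lower(n.name)", "kind; DROP TABLE t", "kind -- x"} {
		fmt.Printf("%q -> %v\n", col, validateScopeColumn(col))
	}
}
```

Because the check runs when rules are loaded, a rejected rule never reaches the SQL composition path at all, which is the property the end-to-end test asserts.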
…cheLockfile schema
User correction landed: schema design moved from this repo to LLO
(ADR-0021 / ley-line-open-ae89aa). This ADR is now the mache-specific
consumer adoption note covering:
- producer string ("mache") and kind vocabulary (per-language)
- lockfile location (mache.lock.toml at repo root, committed)
- input_hash definition (raw bytes, no normalization)
- verification posture (re-hash + chunk-hash fallback)
- one combined cross-language lockfile per repo for v1
Pairs with mache-aeb262 (the portable mache db feature bead).
Branch: feat/portable-cache-aeb262.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ands
Implements consumer-side surface for the mache portable-cache feature
against LLO's substrate (cache.capnp schema + FsBlobStore-shaped layout
+ BLAKE3 hashes per ADR-0021).
Cobra subcommands:
mache cache push --db <path> <out-dir>
mache cache pull --out-db <path> [--verify] <in-dir>
Phase 1 (push):
- Opens mache-built .db, queries _source for (id, path, language, content)
- Computes BLAKE3(content) for each source (v1: chunk = raw bytes;
Phase 4 will switch to capnp-encoded parse outputs once sheaf-driven
incremental lands)
- Writes chunks to <out>/objects/<hash[0..2]>/<hash[2..]> matching LLO's
FsBlobStore layout (future migration is a no-op)
- Atomic write (temp + fsync + rename); idempotent (skip if present and
hashes match; hard-fail if present but corrupt)
- Emits both mache.lock.bin (capnp wire, authoritative) and
mache.lock.toml (diff-friendly TOML) per ADR-0025 conventions
Phase 2 (pull):
- Reads .bin lockfile; refuses mismatched schemaVersion or foreign producer
- For each source: fetches chunk by hash, verifies BLAKE3 unless
--verify=false, inserts into fresh _source table
- Verifies root chain (BLAKE3(concat(chunkHashes)) == lockfile.root)
- v1 restores only _source; _ast / _lsp* come back via re-ingest
7 tests, all pass:
- EmitsLockfileAndChunks: layout + hashes + meta
- RefusesEmptyDB: empty-db guard
- PushPull_RoundTrip: 3-source end-to-end
- RejectsWrongSchemaVersion: version-skew refused (hand-built bad lockfile)
- VerifyRejectsTamperedChunk: verify-on-read catches disk tampering
- NoVerifyAcceptsTamperedChunk: --verify=false documented behavior
- Idempotent: second push is no-op (IM axiom)
Architectural decisions in code comments:
- Producer = "mache" (short-name v1 per ADR-0020)
- Kind = "<language>-source"
- v1 chunks = raw bytes; Phase 4 → capnp _ast rows
- Wire = capnp Marshal (canonicalize is v1.1 follow-up for cross-runtime
byte equality with Rust producer)
- go.mod replace directive points at local LLO leyline-schema until
v0.5.x ships to module registry
Phase 3 (remote build-cache transport per cloister-spec/build-cache/v1),
Phase 4 (chunks-as-parse-outputs), and Phase 5 (CI/dev UX) remain queued
in the mache-aeb262 bead.
go test ./cmd/ -run TestCache: 7/7 pass
golangci-lint run ./cmd/: clean
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ush/pull --remote)
Implements consumer-side surface for cloister-spec/build-cache/v1
(cloister-bb168f). HTTP+OCI plumbing on top of the spec that landed in
earlier iterations.
Files:
cmd/cache_oci.go: OCIClient (HeadBlob/PutBlob/GetBlob/PutManifest/
GetManifest), high-level PushBundle/PullBundle with bounded parallel
chunk uploads, typed errors (OCIBlobMissingError/
OCIManifestMissingError), BLAKE3-in-sha256:-prefix digest encoding per
spec's deliberate misuse, verify-on-read on every GET.
cmd/cache.go (extended): --remote/--scope/--tag/--token flags on push;
--remote/--scope/--ref/--token flags on pull. runCacheRemotePush walks
local emit dir + uploads via OCIClient; runCacheRemotePull fetches into
the local cache layout runCachePull understands. Token reads
MACHE_CACHE_TOKEN env if --token not set.
cmd/cache_oci_test.go: httptest in-process mock registry with
concurrency-safe state + failure injection. 12 tests for blob
round-trip, HEAD present/absent, idempotency, corruption detection,
404s, manifest mediaType refusal, bundle round-trip, missing-chunk
guard, HEAD/PUT failure surfacing, parallel upload.
cmd/cache_remote_test.go: end-to-end db → local push → remote push →
fresh remote pull → local restore → byte-equal content. Plus
idempotency across the wire.
Verification:
- go test ./cmd/ -run "TestCache|TestOCI": 22/22 pass (7 Phase 1+2 +
12 Phase 3 OCI client + 3 Phase 3 e2e)
- golangci-lint: clean
- gofumpt: clean (auto-formatted on commit hook)
What this enables:
mache cache push --db <db> <out-dir> --remote <url> --scope <repo>/<sha>
mache cache pull --out-db <db> <in-dir> --remote <url> --scope <repo>/<sha> --ref <ref>
Honest limits documented in cache_oci.go:
- OAuth2 dance is registry's concern; client takes pre-issued token
- No retry/backoff; caller wraps
- HTTP/2 reuse limited to net/http defaults
- Cross-region failover not handled
- OCI mount-blob (cross-repo dedup) falls back to plain upload
Phases 4 (chunks-as-parse-outputs via sheaf-driven incremental) and 5
(CI/dev UX) remain queued in mache-aeb262.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rip CI workflow
Phase 5 of mache-aeb262: dev UX (task entries) + CI smoke test (GHA
workflow). Closes out the feature's developer-facing surface.
Taskfile.yml:
- task cache:test Run all 22 cache-related tests
- task cache:roundtrip End-to-end self-test (the "feature still works" gate)
.github/workflows/cache-roundtrip.yml:
- Triggers on PR/push affecting cmd/cache*.go or related files
- Matrix: ubuntu-latest + macos-latest
- Runs the 22 cache tests + the round-trip smoke test
- No untrusted-input interpolation; all inputs are commit-controlled
Verification:
- task cache:test 22/22 pass
- task cache:roundtrip 2/2 pass
This completes Phases 1+2+3+5 of mache-aeb262. Phase 4
(chunks-as-parse-outputs from _ast instead of raw source bytes, via
sheaf-driven incremental) is the remaining scope; it touches mache's
_ast walker and the sheaf substrate so it's a meaningfully bigger arc
than the transport plumbing this iteration landed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ot public-schema/ Earlier draft of this ADR referenced rs/ll-core/public-schema/capnp/ which is wrong. The schema lives at rs/ll-core/schema-capnp/schemas/ alongside common.capnp/ast.capnp — schema-capnp is structural substrate; public-schema is protocol RPC. Also: the on-disk paragraph now mentions both mache.lock.bin (canonical capnp wire, authoritative) AND mache.lock.toml (diff-friendly), matching what cmd/cache.go actually emits. This was noted as TODO when the architectural correction landed in LLO ADR-0021; deferred because the mache repo was blocked on parallel infra/elixir-parser-out-of-lfs work. Now that work proceeds via worktree, the fix lands. No code impact — pure docs drift cleanup. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…etected via _ast)
When a mache-built .db has an _ast table, mache push emits chunks
containing source content + per-source AST node rows. mache pull
reconstructs both _source AND _ast on restore. When _ast is absent,
the existing Phase 1 path (chunk = raw content) still applies.
Closes Phase 4 of mache-aeb262.
Chunk body is JSON per ADR-0021's producer-defined chunk policy.
Future bead can migrate to capnp-encoded ast.capnp if cross-runtime
byte-equal becomes needed; v1 picks JSON for diff-friendliness and
to avoid a schema bump.
Auto-detection (no flag needed):
- runCachePush: dbHasASTTable() probes sqlite_master; emits Phase 4
chunks if present, Phase 1 otherwise
- runCachePull: chunkBodyIsASTShape() per-chunk check; lazy-creates
_ast table on first AST-shape chunk
New files:
cmd/cache_ast.go JSON wire types + helpers
cmd/cache_ast_test.go 3 tests (push detect, full round-trip,
Phase 1 fallback)
cache.go changes:
- runCachePush: branch on dbHasASTTable
- runCachePull: branch on chunkBodyIsASTShape, lazy _ast create
Verification:
go test ./cmd/ -run "TestCache|TestOCI" 25/25 pass (was 22, +3 new)
golangci-lint run ./cmd/ 0 issues
task cache:roundtrip passes
The mache portable-cache feature is now Phases 1+2+3+4+5 complete —
the entire mache-aeb262 bead scope has shipped on this branch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
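The per-chunk auto-detection on pull can be illustrated with a small shape check. This is a sketch under assumptions: the field names `source_id` and `content_b64` appear in later test descriptions on this PR, but the `nodes` field, the string type of `source_id`, and the function names here are hypothetical and may not match cmd/cache_ast.go.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// astChunk is an assumed wire shape for a Phase 4 chunk body.
type astChunk struct {
	SourceID   string            `json:"source_id"`   // assumption: string-typed
	ContentB64 string            `json:"content_b64"` // base64 of the raw source
	Nodes      []json.RawMessage `json:"nodes"`       // hypothetical field name
}

// looksLikeASTChunk reports whether body parses as a JSON object carrying a
// non-empty source_id. Anything else is treated as a Phase 1 raw chunk.
func looksLikeASTChunk(body []byte) bool {
	trimmed := bytes.TrimSpace(body)
	if len(trimmed) == 0 || trimmed[0] != '{' {
		return false
	}
	var c astChunk
	if err := json.Unmarshal(trimmed, &c); err != nil {
		return false
	}
	return c.SourceID != ""
}

// decodeASTChunk parses the chunk and returns the decoded source content.
func decodeASTChunk(body []byte) (astChunk, []byte, error) {
	var c astChunk
	if err := json.Unmarshal(body, &c); err != nil {
		return c, nil, err
	}
	if c.SourceID == "" {
		return c, nil, errors.New("ast chunk: missing source_id")
	}
	content, err := base64.StdEncoding.DecodeString(c.ContentB64)
	if err != nil {
		return c, nil, fmt.Errorf("ast chunk: content_b64: %w", err)
	}
	return c, content, nil
}

func main() {
	body := []byte(`{"source_id":"src/a.go","content_b64":"aGVsbG8=","nodes":[]}`)
	fmt.Println(looksLikeASTChunk(body), looksLikeASTChunk([]byte("package main\n")))
}
```

A detector like this has an inherent ambiguity: a raw source file that happens to be a JSON object with a `source_id` key would be misread as an AST chunk, so a real implementation may need a stricter marker.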
…ire-shape doc
cmd/cache_toml_test.go: 6 new tests
- TestTOMLLockfile_RoundTripsBin: parse mache.lock.toml back, compare to push
- TestTOMLLockfile_FieldsMatchBin: TOML chunk_hash matches real chunk file
- TestChunkBodyIsASTShape_Negatives: 7 negative cases for shape detector
- TestDecodeASTChunk_Negatives: bad JSON / missing source_id
- TestPullRejectsBadBase64InASTChunk: content_b64 garbage surfaces error
- TestPullCreatesASTTableConsistently: lazy _ast CREATE works any order
docs/cache/phase-4-chunk-shape.md
Reference doc for Phase 4 JSON chunk shape. Previously only in code
comments; promoted to a proper artifact.
Tests: 31/31 pass (was 25, +6 new)
golangci-lint: 0 issues
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`mache cache verify --remote <url> --scope <scope> --ref <ref>` is a
CI-friendly probe: fetches the manifest, HEAD-checks every layer,
GET-verifies the config + a sample layer. Does NOT restore the db.
Designed for a CI step that gates "do we have a cache for this commit?"
before an expensive pull.
4 new tests:
- TestCacheVerify_IntactBundle
- TestCacheVerify_MissingManifest
- TestCacheVerify_MissingLayer
- TestCacheVerify_DetectsCorruptedSampleLayer
README.md gains a "Portable cache" section showing all four CLI surfaces
(push local, push remote, pull, verify) + links to the wire-shape doc
and OCI build-cache/v1 spec.
Tests: 35/35 pass (was 31, +4 new)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…r CI ergonomics
mache cache inspect <cache-dir-or-lockfile> prints a summary without
restoring or touching a registry. Output covers producer + version,
schema, source count, topology edges, root hash, processors, and
chunks-on-disk (present/missing, ast-shape vs raw-shape per Phase 1/4).
Works on cache dirs and bare .bin lockfiles.
--token-file <path> on push/pull/verify reads the bearer token from a
file (first line, whitespace-trimmed). Precedence: --token-file >
--token > MACHE_CACHE_TOKEN env. CI usage: mount a secret as a file,
pass --token-file. Tokens never appear in process args or env where
child processes can read them.
11 new tests:
- 4 inspect (dir, bare bin, missing chunks, AST bundle)
- 7 token resolution (priority, trimming, empty-file error, CLI
fallback, env fallback, all-empty, missing file)
Tests: 46/46 pass
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Single-page summary of mache-aeb262's complete state. Mirrors the LLO substrate-side checkpoint but from the consumer side. Covers all 5 phases + 2 extras (verify, inspect) with commit SHAs, 46-test ledger, LLO substrate beads consumed, architectural calls, operational follow-ups, how to verify locally, cron status. Any future reviewer or AI agent picking up this branch reads this doc first to orient. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…al-path replace
LLO PR #53 merged to main at 5ee058e. Replace go.mod's local-filesystem
replace directive with a Go pseudo-version pulled from the merged commit:
v0.4.6-0.20260523221739-5ee058ebf3e1
Reproducer:
go get github.com/agentic-research/ley-line-open/clients/go/leyline-schema@5ee058ebf3e1657a500aff8bb3a8e181c5666340
go mod edit -dropreplace=github.com/agentic-research/ley-line-open/clients/go/leyline-schema
go mod tidy
Verification:
go test ./cmd/ (cache subset): 46/46 pass
golangci-lint run ./cmd/: 0 issues
mache CI now works against a real LLO dep; the local-path replace no
longer blocks the runners.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
The full mache portable-cache feature (bead mache-aeb262): take a built .db, ship it via lockfile + content-addressed chunks, restore byte-equal anywhere. Local + remote OCI transport.
10 commits, 46 cache tests, lint clean. Built across 18 `/evolve` loop iterations under the user's "do not do the minimum needed" thoroughness directive: every CLI surface a careful reviewer would name has tests + docs.
What ships
Phase ledger
Test ledger (46/46)
`task cache:test` runs all 46 in ~1s. `task cache:roundtrip` runs the end-to-end smoke. `golangci-lint run ./cmd/`: 0 issues.
Hard dependency
This PR cannot merge until the LLO PR ships cache.capnp. go.mod currently has:
```
replace github.com/agentic-research/ley-line-open/clients/go/leyline-schema => /Users/jamesgardner/remotes/art/ley-line-open/clients/go/leyline-schema
```
Pointing at the local LLO clone. When LLO PR #53 merges + a leyline-schema release ships with cache.capnp, remove the `replace` directive and bump the `require` to the new tag.
Architectural calls captured in code/docs
See also
Test plan
🤖 Generated with Claude Code